On the challenge of writing accurate source spans on Unicode source code:
https://reedmullanix.com/posts/unicode-source-spans.html
Also (see the footnotes), a fair number of LSP clients assume UTF-8 offsets, even though early versions of LSP mandated UTF-16!
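To make the mismatch concrete, here is a minimal Python sketch (not taken from the linked post) that computes the column of one character under three counting schemes. The function name `column_offsets` and the sample line are made up for illustration.

```python
def column_offsets(line: str, char_index: int) -> dict[str, int]:
    """Column of the code point at `char_index`, counted three ways.

    `column_offsets` is a hypothetical helper, not part of any LSP library.
    """
    prefix = line[:char_index]
    return {
        # Python strings index by Unicode code point.
        "code_points": len(prefix),
        # What a client counting UTF-8 bytes would report.
        "utf8_bytes": len(prefix.encode("utf-8")),
        # UTF-16 code units: each unit is 2 bytes, and characters
        # outside the BMP take two units (a surrogate pair).
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
    }


# Sample source line containing U+00E9 (2 bytes in UTF-8) and
# U+1F389 (4 bytes in UTF-8, 2 code units in UTF-16).
line = 'let s = "h\u00e9llo \U0001F389"; x'
offsets = column_offsets(line, line.index("x"))
print(offsets)
# {'code_points': 19, 'utf8_bytes': 23, 'utf16_units': 20}
```

The three numbers only agree on pure ASCII lines, which is probably why a client that assumes the wrong encoding can go unnoticed for a long time.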
Whilst LLMs don't always give an accurate answer, the UI is really compelling. I keep finding users whose favourite way of doing research is an LLM.
Difftastic does syntax highlighting based on tree-sitter's parse of the *whole file*. That makes its highlighting more accurate than most diff tools can manage.
In this hunk, the opening " of the string literal isn't included, but difftastic still knows that the first lines are from a string.