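In LSP, a position is represented as a zero-based line number and a character offset within that line, counted in UTF-16 code units by default (3.17 lets client and server negotiate UTF-8 or UTF-32 instead): https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position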
This is pretty elegant. An encoding bug can only land you in the wrong column, never on the wrong line, and the editor already knows the line number so it's cheap to compute.
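To make the line/offset split concrete, here is a minimal Python sketch that turns a code-point offset into an LSP-style position. The names `utf16_len` and `to_lsp_position` are made up for illustration, and it assumes lines end in `\n` or `\r\n` (LSP also treats a bare `\r` as a line ending, which this sketch ignores).

```python
def utf16_len(s: str) -> int:
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def to_lsp_position(text: str, offset: int) -> tuple[int, int]:
    """Convert a code-point offset into a zero-based (line, character) pair.

    `character` is counted in UTF-16 code units, the LSP default.
    Assumption: lines are separated by "\n" (or "\r\n"); bare "\r" is not handled.
    """
    before = text[:offset]
    line = before.count("\n")
    line_start = before.rfind("\n") + 1  # 0 when there is no earlier newline
    return line, utf16_len(before[line_start:])


text = "a = 1\ns = '\U0001F980x'\n"
# The 'x' after the crab emoji is the 7th code point on its line (index 6),
# but the emoji takes two UTF-16 code units, so LSP reports character 7.
print(to_lsp_position(text, text.index("x")))
```

Note that the line number comes from counting newlines alone, so it stays right even if the column math uses the wrong encoding.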
I'm experimenting with diagnostics formatting.
* I've added a left margin, showing both the file name and line numbers.
* I'm showing one line of context above/below the offending line.
* I'm using grey for comments.
What do you think? Is there anything you'd change?
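For anyone who wants to play with a similar layout, here is a rough Python sketch of the three ideas above. It is not the implementation behind the post; `render_diagnostic`, the `>|` marker, and the `#` comment prefix are all assumptions made up for illustration.

```python
GREY = "\x1b[90m"   # ANSI "bright black", rendered as grey by most terminals
RESET = "\x1b[0m"


def render_diagnostic(filename, source, line_no, message, comment_prefix="#"):
    """Render a 1-based offending line with one line of context on each side."""
    lines = source.splitlines()
    first = max(1, line_no - 1)
    last = min(len(lines), line_no + 1)
    width = len(str(last))  # pad line numbers so the margin stays aligned
    out = []
    for n in range(first, last + 1):
        text = lines[n - 1]
        if text.lstrip().startswith(comment_prefix):
            text = f"{GREY}{text}{RESET}"  # whole-line comments in grey
        marker = ">" if n == line_no else " "
        out.append(f"{filename}:{n:>{width}} {marker}| {text}")
    out.append(message)
    return "\n".join(out)


source = "# setup\nx = 1\ny = x +\nprint(y)\n"
print(render_diagnostic("demo.py", source, 3, "error: unexpected end of expression"))
```

The sketch only greys out whole-line comments. Trailing comments would need a real tokenizer.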
On the challenge of writing accurate source spans on Unicode source code: https://reedmullanix.com/posts/unicode-source-spans.html
Also (see footnotes) a fair number of LSP clients assume UTF-8 despite early versions of LSP mandating UTF-16!
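A quick way to see why the UTF-8 vs UTF-16 mix-up matters is to compute the same column under each encoding. This is a small Python sketch; `column_in` is a hypothetical helper, not something from the linked post or from LSP itself.

```python
def column_in(line: str, index: int) -> dict[str, int]:
    """Column of line[index], measured in code units of each encoding."""
    prefix = line[:index]
    return {
        "utf-8": len(prefix.encode("utf-8")),            # bytes
        "utf-16": len(prefix.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(prefix),                           # code points
    }


# "\u00e9" is a precomposed e-acute, "\U0001F980" is the crab emoji.
line = 'let s = "\u00e9\U0001F980"; oops'
print(column_in(line, line.index("oops")))
# {'utf-8': 18, 'utf-16': 15, 'utf-32': 14}
```

A client that assumes UTF-8 and a server that speaks UTF-16 would disagree by three columns on this line, and the gap grows with every non-ASCII character before the span.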
Chekhov's repro: If a line of code is included in a bug report, it should contribute to the debugging somewhere.