I rather like the fs.readFileSync API in node. It gives you a buffer, unless you specify a text encoding.
This gently encourages users to think about their encoding, without having a heavyweight unicode datatype.
Related Posts
In LSP, a position is represented as a line number and a column offset (in Unicode code units): https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position
This is pretty elegant. You'll get the correct line regardless of encoding bugs, and the editor already knows the line number so it's cheap to compute.
One advantage I've come to appreciate about Dash/Zeal docsets: it's really nice having focused search.
The text search is constrained to the languages I care about enough to download the docset, substantially increasing the relevance. In Google I'd need to specify the language.
There are *so many* ways that reading a text file can fail.
Maybe it doesn't exist, it's a broken symlink, it's actually a directory, it's not the encoding you expected, or perhaps you just don't have the correct permissions.
Reporting good errors is surprisingly labour intensive.