Unison is exploring the idea of storing source code by the hash of its AST, which allows some interesting refactoring and testing designs: https://www.theregister.co.uk/AMP/2019/09/26/unison_programming_language/
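To make the idea concrete, here is a minimal sketch in Python (not Unison's actual scheme). It assumes the standard `ast` module as the parser and uses a made-up helper name, `ast_hash`. Two snippets that differ only in layout, comments, or redundant parentheses get the same hash, while a real change to the code gets a different one.

```python
import ast
import hashlib


def ast_hash(source: str) -> str:
    """Hash the parsed syntax tree rather than the raw text (hypothetical helper)."""
    tree = ast.parse(source)
    # ast.dump leaves out line/column attributes by default, so whitespace,
    # comments and redundant parentheses do not influence the digest.
    canonical = ast.dump(tree)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = "def add(x, y):\n    return x + y\n"
reformatted = "def add(x,y):  # sum\n\n    return (x + y)\n"
changed = "def add(x, y):\n    return x - y\n"

same_tree = ast_hash(original) == ast_hash(reformatted)
different_tree = ast_hash(original) != ast_hash(changed)
```

Note that this sketch still treats identifier names as part of the tree, so renaming `add` would change the hash, and the `ast.dump` format can vary between Python versions, so these digests are only comparable within one interpreter version.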
https://tigerbeetle.com/blog/2025-02-27-why-we-designed-tigerbeetles-docs-from-scratch/ has an interesting distinction between "physical" and "logical" hash of a tarball.
By storing the hash of the decompressed tarball contents (i.e. the logical hash), they can verify the validity of files without needing to keep the tarball around.
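A rough sketch of the distinction, using only the Python standard library. The function names and the exact definition of the logical hash (sorted file names plus contents, length-prefixed) are assumptions for illustration, not necessarily what the linked post does.

```python
import hashlib
import io
import tarfile


def physical_hash(tarball: bytes) -> str:
    """Hash of the archive bytes exactly as stored (changes with compression)."""
    return hashlib.sha256(tarball).hexdigest()


def hash_entries(entries) -> str:
    """Hash (name, data) pairs in sorted order; length prefixes avoid ambiguity."""
    digest = hashlib.sha256()
    for name, data in sorted(entries):
        encoded = name.encode("utf-8")
        digest.update(len(encoded).to_bytes(8, "big") + encoded)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def logical_hash(tarball: bytes) -> str:
    """Hash of the unpacked contents, independent of how the archive was packed."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        files = [m for m in tar.getmembers() if m.isfile()]
        return hash_entries((m.name, tar.extractfile(m).read()) for m in files)


def make_tarball(files: dict, compress: bool) -> bytes:
    """Build an in-memory tarball, gzip-compressed or plain."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


files = {"docs/index.md": b"# Hello\n", "docs/api.md": b"## API\n"}
compressed = make_tarball(files, compress=True)
plain = make_tarball(files, compress=False)

physical_differs = physical_hash(compressed) != physical_hash(plain)
logical_matches = logical_hash(compressed) == logical_hash(plain)
# The unpacked files can be checked on their own, with no tarball in sight.
files_verify = hash_entries(files.items()) == logical_hash(compressed)
```

The last line is the useful property: once the logical hash is recorded, the extracted files can be re-verified later even if the archive itself has been deleted.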
It feels like rename is by far the most important refactoring operation. If I had an IDE with only one refactoring, I think I'd want rename.
On the challenge of writing accurate source spans on Unicode source code: https://reedmullanix.com/posts/unicode-source-spans.html
Also (see the post's footnotes), a fair number of LSP clients assume UTF-8 offsets, even though early versions of LSP mandated UTF-16!
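A small sketch of why this matters: the same position in a line has a different column number depending on whether you count code points, UTF-8 bytes, or UTF-16 code units. The helper names below are made up for illustration.

```python
def columns_at(line: str, index: int) -> dict:
    """Column of the character at code-point `index`, in three different units."""
    prefix = line[:index]
    return {
        "codepoints": len(prefix),
        "utf8": len(prefix.encode("utf-8")),
        # Each UTF-16 code unit is two bytes; astral characters take two units.
        "utf16": len(prefix.encode("utf-16-le")) // 2,
    }


def utf16_col_to_index(line: str, utf16_col: int) -> int:
    """Convert a UTF-16 column (as in early LSP) to a Python string index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= utf16_col:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)


# "a", U+1F642 (an emoji outside the BMP), U+00E9 (e with acute), then " = 1"
line = "a\U0001F642\u00e9 = 1"
cols = columns_at(line, line.index("="))
# cols == {"codepoints": 4, "utf8": 8, "utf16": 5}
index = utf16_col_to_index(line, cols["utf16"])
```

A client that sends the UTF-8 column (8) to a server expecting UTF-16 (5) would point past the `=` here, which is exactly the kind of off-by-a-few span error that only shows up on non-ASCII lines.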