Unison is exploring the idea of storing source code by the hash of its AST, which enables some interesting designs for refactoring and testing: https://www.theregister.co.uk/AMP/2019/09/26/unison_programming_language/
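A toy illustration of the core idea, using Python's own `ast` module rather than anything from Unison. This is a minimal sketch under stated assumptions: a definition's identity is the SHA-256 of its AST dump with the function name blanked out, so renaming a function or reformatting it (whitespace, comments) keeps the hash, while changing its logic does not. The helper `definition_hash` is hypothetical. Unison also replaces references to other definitions with their hashes, which this sketch does not attempt.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash one Python function definition by its AST, ignoring its name."""
    tree = ast.parse(source)
    (func,) = tree.body  # expect exactly one top-level statement
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    func.name = "_"  # treat the name as a label, not part of the content
    # ast.dump omits line/column attributes by default, and comments never
    # reach the AST, so formatting changes do not affect the result.
    canonical = ast.dump(func)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = definition_hash("def inc(x):\n    return x + 1\n")
b = definition_hash("def increment(x):  # renamed, with a comment\n    return x+1\n")
c = definition_hash("def inc(x):\n    return x + 2\n")
print(a == b)  # True: same logic, different name and formatting
print(a == c)  # False: different logic
```

Because the name lives outside the hash, a rename becomes a metadata change rather than an edit to every caller, which is one source of the refactoring benefits mentioned above.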
https://tigerbeetle.com/blog/2025-02-27-why-we-designed-tigerbeetles-docs-from-scratch/ has an interesting distinction between "physical" and "logical" hash of a tarball.
By storing the hash of the decompressed tarball contents (i.e. the logical hash), they can verify the validity of files without needing to keep the tarball around.
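A minimal sketch of the distinction, assuming a logical hash defined as SHA-256 over each regular file's name and bytes, sorted by name (the exact scheme TigerBeetle uses isn't described here, and the helper names are hypothetical). Two `.tar.gz` blobs that hold the same files but differ in member order and gzip header timestamp get different physical hashes and the same logical hash.

```python
import gzip
import hashlib
import io
import tarfile


def make_tar(files: dict[str, bytes], order: list[str]) -> bytes:
    """Build an uncompressed tar archive in memory, adding members in `order`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in order:
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def physical_hash(blob: bytes) -> str:
    """Hash of the exact archive bytes, compression and all."""
    return hashlib.sha256(blob).hexdigest()


def logical_hash(blob: bytes) -> str:
    """Hash of the decompressed contents: file names and bytes, sorted by name."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        members = sorted((m for m in tar.getmembers() if m.isfile()),
                         key=lambda m: m.name)
        for m in members:
            data = tar.extractfile(m).read()
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            for field in (m.name.encode("utf-8"), data):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()


files = {"a.txt": b"hello\n", "b.txt": b"world\n"}
tgz1 = gzip.compress(make_tar(files, ["a.txt", "b.txt"]), mtime=0)
tgz2 = gzip.compress(make_tar(files, ["b.txt", "a.txt"]), mtime=1)
print(physical_hash(tgz1) == physical_hash(tgz2))  # False
print(logical_hash(tgz1) == logical_hash(tgz2))    # True
```

Since the logical hash depends only on the extracted files, it can be recomputed from an unpacked directory long after the original tarball is gone.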
On storing ASTs in flat arrays for performance, and the relationship with bytecode interpreters:
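A rough illustration of the idea, not taken from the linked material; the `Op`/`Node` types and both evaluators are hypothetical. Nodes live in one flat list and refer to children by index instead of by pointer. If the list is laid out in post-order (children before parents), a single left-to-right pass with a value stack evaluates it, which is exactly how a stack-machine bytecode interpreter runs.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    LIT = auto()
    ADD = auto()
    MUL = auto()


@dataclass(frozen=True)
class Node:
    op: Op
    value: int = 0  # literal value (LIT only)
    lhs: int = -1   # array index of the left child (binary ops)
    rhs: int = -1   # array index of the right child (binary ops)


def eval_tree(nodes: list[Node], i: int) -> int:
    """Recursive evaluation that follows child indices instead of pointers."""
    n = nodes[i]
    if n.op is Op.LIT:
        return n.value
    a, b = eval_tree(nodes, n.lhs), eval_tree(nodes, n.rhs)
    return a + b if n.op is Op.ADD else a * b


def eval_linear(nodes: list[Node]) -> int:
    """One pass with a stack; correct only because the array is in post-order,
    so the child indices are never consulted: the array is already bytecode."""
    stack: list[int] = []
    for n in nodes:
        if n.op is Op.LIT:
            stack.append(n.value)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if n.op is Op.ADD else a * b)
    return stack.pop()


# (2 + 3) * 4, laid out in post-order; the root is the last element.
ast_nodes = [
    Node(Op.LIT, 2),             # index 0
    Node(Op.LIT, 3),             # index 1
    Node(Op.ADD, lhs=0, rhs=1),  # index 2
    Node(Op.LIT, 4),             # index 3
    Node(Op.MUL, lhs=2, rhs=3),  # index 4
]
print(eval_tree(ast_nodes, len(ast_nodes) - 1))  # 20
print(eval_linear(ast_nodes))                    # 20
```

The flat layout is also where the performance argument comes from: nodes sit contiguously in memory, and indices are smaller than pointers.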
I really like the one-module-per-file model of JavaScript or Python.
If you're storing code in files, you might as well leverage file boundaries. If modules are a separate abstraction (e.g. Rust, OCaml), they are one more concept to learn, and it's harder to decide how to organise code.
