On storing ASTs in flat arrays for performance, and the relationship with bytecode interpreters:
miniblog.
Related Posts
ASTs typically discard comments, and that's usually what you want.
The only time (AFAICS) that preserving comments is useful is for writing a code formatter.
Could you write a formatter in terms of a list of lexemes? A CST is a non-trivial bit of code for one use case.
Counter-intuitively, if you're writing a parser for a programming language, you need it to be a total function. As soon as you build IDE tooling, you need ASTs from invalid or incomplete input.
The parser should return (Ast, List<Error>) rather than Result<Ast, Error>.
Difftastic is effectively computing the "tree edit distance" between two ASTs, and there's a bunch of papers on this topic. Literature review is hard though: sometimes a paper takes a while to digest, only to realise that they're solving a slightly different problem.