Writing decent indentation for a major mode is very hard. You need a non-trivial, performant parser. I dare not look at perl-mode or C++.
Related Posts
Counter-intuitively, if you're writing a parser for a programming language, you need it to be a total function. As soon as you build IDE tooling, you need ASTs from invalid or incomplete input.
The parser should return (Ast, List<Error>) rather than Result<Ast, Error>.
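A minimal sketch of that shape, in Python for brevity. The toy s-expression grammar, the `Node` type, and the error strings are all made up for illustration. The point is only that `parse` returns an AST plus a list of errors for malformed input, instead of bailing out at the first problem.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "list", "atom", or "error"
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(src):
    # Toy lexer: parentheses and whitespace-separated atoms only.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(src):
    """Return (ast, errors) for any token sequence instead of raising."""
    tokens = tokenize(src)
    errors = []
    pos = 0

    def parse_expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = Node("list")
            while pos < len(tokens) and tokens[pos] != ")":
                node.children.append(parse_expr())
            if pos < len(tokens):
                pos += 1           # consume the matching ")"
            else:
                # Keep the partial list and record the problem.
                errors.append("unclosed '(': reached end of input")
            return node
        if tok == ")":
            # Stray closer: record it and keep an error node in the tree.
            errors.append("unexpected ')'")
            return Node("error", tok)
        return Node("atom", tok)

    root = Node("list")            # implicit top-level sequence
    while pos < len(tokens):
        root.children.append(parse_expr())
    return root, errors


ast, errors = parse("(a (b c)")    # incomplete input, as an editor would see it
print(len(ast.children), errors)
```

Even with the closing paren missing, the caller still gets the nested `(b c)` list back, which is what IDE features like highlighting or completion need. A real implementation would also attach source positions to nodes and errors, and would avoid unbounded recursion on deeply nested input.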
TIL Tcl has a notion of 'safe interpreters', a mode where you can run untrusted code in a sandbox: https://www.tcl.tk/man/tcl8.4/TclCmd/safe.htm
Not many programming languages have this, but it's way safer to include it in the implementation than to try to build it as a userland library.
ASTs typically discard comments, and that's usually what you want.
The only time (AFAICS) that preserving comments is useful is when writing a code formatter.
Could you write a formatter in terms of a list of lexemes? A CST is a non-trivial bit of code for one use case.
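To probe that question, here is a rough sketch of a formatter that works purely on lexemes, with no tree at all. It assumes a hypothetical Lisp-like language with `;` line comments and no string literals. The names `lex` and `format_lexemes` are invented for the example. It keeps the original line breaks, normalizes spacing within a line, and re-indents each line by paren depth.

```python
import re

# Hypothetical lexeme grammar; comments and newlines are kept as lexemes.
TOKEN_RE = re.compile(
    r"(?P<comment>;[^\n]*)|(?P<newline>\n)|(?P<open>\()|(?P<close>\))"
    r"|(?P<atom>[^\s();]+)"
)


def lex(src):
    """Return (kind, text) pairs; other whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]


def join_line(texts):
    """Join lexeme texts with single spaces, but none inside parens."""
    s = ""
    for t in texts:
        if s and not s.endswith("(") and t != ")":
            s += " "
        s += t
    return s


def format_lexemes(lexemes, indent="  "):
    out = []          # finished lines
    line = []         # lexeme texts on the current line
    depth = 0         # paren depth after the lexemes seen so far
    line_depth = 0    # depth used to indent the current line

    for kind, text in lexemes:
        if kind == "newline":
            out.append(indent * line_depth + join_line(line) if line else "")
            line = []
            line_depth = depth
            continue
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth = max(depth - 1, 0)
            if not line:
                line_depth = depth   # a leading ")" dedents its own line
        line.append(text)
    if line:
        out.append(indent * line_depth + join_line(line))
    return "\n".join(out)


src = "(defun f (x)   ; add one\n(+ x 1))"
print(format_lexemes(lex(src)))
```

The comment survives because it was never thrown away in the first place. This only covers spacing and indentation, though. A formatter that decides where to break long lines probably needs more structure than a depth counter, so the sketch does not settle whether a CST can be avoided.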