I would have thought that invoking a C compiler would be a solved problem. Looking at Rust's cc crate, there's a remarkably long tail of corner cases to fix:
exotic CPUs, microarchitectures, compiler differences, operating system differences, etc.
Make defaults to a single job, and newer build tools (e.g. ninja) default to roughly the number of logical CPUs, which saturates the machine.
I wish there were an option for 'leave me a little bit of my machine to do stuff'.
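Short of a built-in option, a wrapper can get close. Here is a minimal sketch, assuming you are happy to reserve a fixed number of CPUs; the names `polite_jobs` and `build_command` and the default of 2 reserved CPUs are made up for illustration. It only relies on the `-j N` flag, which both make and ninja accept.

```python
import os


def polite_jobs(reserve=2):
    """Return a job count that leaves `reserve` CPUs idle, never below 1."""
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - reserve)


def build_command(tool="ninja", reserve=2):
    """Build an argv list such as ["ninja", "-j", "6"] (hypothetical helper)."""
    return [tool, "-j", str(polite_jobs(reserve))]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_command()))
```

Both tools also take `-l N` to stop starting new jobs once the load average exceeds N, which may be closer to the 'leave me some machine' behaviour than a fixed job count.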
Suppose you want to make a small, hackable interpreter, so you write an AST walker.
Could you recover the performance an AST walker gives up by supporting lightweight threads that use all the CPUs?
Python's GIL favored single-threaded performance over multithreading; this would be the opposite trade-off.
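For a sense of scale, the kind of AST walker meant here can be very small. This is only a sketch for arithmetic expressions, and the node types `Num` and `BinOp` are hypothetical names, not taken from any particular interpreter. Every node costs a type check, a dictionary lookup and a recursive call, which is where the single-threaded slowness comes from.

```python
import operator
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: object   # Num or BinOp
    right: object  # Num or BinOp


# Map operator symbols to their implementations.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Walk the tree recursively and return the value of the expression."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left), evaluate(node.right))
    raise TypeError(f"unknown node type: {type(node).__name__}")


# 1 + 2 * 3
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
result = evaluate(tree)
```

The whole evaluator is one function, which is what makes it hackable. The open question is whether running many such walkers across cores makes up for the per-node overhead.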