LLMs seem to handle dependency upgrades really well.
The task is well specified, there's usually a build and test suite to check that the changes are correct, and there's often a changelog the model can read too.
Related Posts
I love how the CommonMark Spec has a test suite that's just a JSON array. It's really easy to test a library for compliance, and I've seen developers nerd-sniped into full compliance.
https://spec.commonmark.org/0.31.2/spec.json
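To show how little harness that takes, here is a rough sketch of a compliance check. It assumes each entry carries `markdown`, `html`, and `example` fields, as spec.json does. The two sample entries are made up to match that shape rather than copied from the spec, and `check_compliance` and `toy_render` are hypothetical names, with `toy_render` standing in for whatever library you'd actually test.

```python
import json
from typing import Callable


def check_compliance(spec_json: str, render: Callable[[str], str]) -> list[int]:
    """Return the example numbers whose rendered HTML differs from the expected HTML."""
    failures = []
    for case in json.loads(spec_json):
        if render(case["markdown"]) != case["html"]:
            failures.append(case["example"])
    return failures


# Made-up entries in the same shape as spec.json, trimmed to the fields used above.
SAMPLE = json.dumps([
    {"markdown": "hello\n", "html": "<p>hello</p>\n", "example": 1},
    {"markdown": "*hi*\n", "html": "<p><em>hi</em></p>\n", "example": 2},
])


def toy_render(markdown: str) -> str:
    # Hypothetical stand-in for a real Markdown library:
    # wraps the text in <p> and does no inline parsing at all.
    return f"<p>{markdown.strip()}</p>\n"


failures = check_compliance(SAMPLE, toy_render)
print(failures)  # the toy renderer ignores emphasis, so example 2 fails
```

Swapping `SAMPLE` for the downloaded file's contents and `toy_render` for a real library's render function should be the only changes needed to run against the full suite.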
Is there a programming equivalent of the strawberry problem, i.e. a specific coding task that LLMs are consistently bad at?
One interesting consequence of the rise of LLMs: there's more demand for tools that handle untrusted input.
Arbitrary HTML+JS can be safely run in a browser. Lean can check an arbitrary proof.
Tools like these pair really well with an LLM, which can be wrong but sometimes gives exactly what you want: the tool contains or rejects the bad outputs, so only the good ones get through. Are there other tools in this family?
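The pattern behind this is generate-and-check: an unreliable source proposes, a trusted checker decides. A minimal sketch, where `first_verified` is a hypothetical helper and a multiplication check stands in for a real verifier such as a proof checker or a sandbox:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_verified(candidates: Iterable[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate the trusted checker accepts, or None if none pass."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


# Hypothetical example: an unreliable source proposes factorizations of 91.
guesses = [(9, 10), (7, 12), (7, 13)]
result = first_verified(guesses, lambda pair: pair[0] * pair[1] == 91)
print(result)  # (7, 13)
```

The generator never needs to be trusted here; only `verify` does, which is why a cheap, strict checker is the valuable half of the pair.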