Tests can't guarantee correctness, but a passing suite is often good evidence of it.
When I have a bunch of regression tests for *real* issues, I feel a lot more confident. I'm covering ways I've failed in the past.
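
A minimal sketch of what such a regression suite might look like, using Python's `unittest`. The `slugify` helper and both "past bugs" are hypothetical, invented purely for illustration; the point is only that each test pins down one specific failure that has already happened once.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper: turn a post title into a URL slug."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Trim hyphens left over from leading or trailing punctuation.
    return slug.strip("-")


class SlugifyRegressionTests(unittest.TestCase):
    """Each test records one (imagined) bug that shipped in the past."""

    def test_consecutive_spaces_do_not_produce_double_hyphens(self):
        # Imagined past bug: "a  b" used to become "a--b".
        self.assertEqual(slugify("a  b"), "a-b")

    def test_trailing_punctuation_is_trimmed(self):
        # Imagined past bug: "Hello!" used to become "hello-".
        self.assertEqual(slugify("Hello!"), "hello")
```

Saved to a file, this can be run with `python -m unittest <file>`. Naming each test after the failure it guards against keeps the history of past mistakes readable from the test list alone.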
Related Posts
LLMs seem to handle dependency upgrades really well.
The task is well-specified, there's usually a build/test suite to check correctness of the modifications, and there's often a changelog they can consume too.
It's always seemed odd to me that the Rust stdlib is so lean (no random numbers, regex, HTTP) yet clippy is so big (correctness, performance, style preferences, even 'too many arguments').
Maybe it's because cargo is mature but clippy doesn't have an extension ecosystem?
I'm still figuring out where Copilot fits in my workflow, and I find it works really well *when I know exactly what code I want*.
When I have e.g. two lines in mind, I can see if it will write them (saving me the typing), and it's trivial to validate correctness.