The Turing Test focuses on distinguishing between humans and computers in a text chat.
There are lots of other domains where it's interesting to compare styles. Do humans and computers make different mistakes in speech recognition? How easy is it to spot a chess AI masquerading as human?
Test code is total: we require it to always terminate or it's a failure! It also typically has 100% line and branch coverage.
I feel way less nervous about refactoring tests: you can always just run them.
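As a rough illustration of "terminate or it's a failure", here is a minimal sketch of a runner that treats a timeout as just another failing outcome. The name run_with_deadline and the three result strings are made up for this example; real test frameworks have their own timeout mechanisms.

```python
import subprocess
import sys


def run_with_deadline(source: str, timeout_s: float) -> str:
    """Run Python source in a child process.

    Returns "pass", "fail", or "timeout" (hypothetical labels for this sketch).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after the deadline: non-termination is a failure.
        return "timeout"
    # A failed assertion makes the interpreter exit with a non-zero status.
    return "pass" if result.returncode == 0 else "fail"
```

Running the test in a separate process means a stuck test can be killed cleanly, which a thread-based timeout can't do.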
One subtle behaviour of Claude that wasn't obvious to me: whilst each conversation is transient, permissions persist across conversations.
So if you've given permission to run e.g. 'cargo test' or even 'cargo run', you need to be sure that all future invocations are safe too.
You can see the current permissions with /permissions.
LLMs seem to handle dependency upgrades really well.
The task is well-specified, there's usually a build/test suite to check correctness of the modifications, and there's often a changelog they can consume too.
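The "test suite as a check" part can be sketched as a tiny loop: apply the upgrade, run the tests, and only accept the change if both succeed. This is an assumption about how such a check might look, not a description of what any particular LLM tool does; upgrade_is_safe is a hypothetical helper, and the commands below are stand-ins.

```python
import subprocess
import sys


def upgrade_is_safe(upgrade_cmd: list[str], test_cmd: list[str]) -> bool:
    """Apply an upgrade, then run the tests; True only if both commands succeed."""
    if subprocess.run(upgrade_cmd).returncode != 0:
        return False  # the upgrade itself failed, so there is nothing to test
    return subprocess.run(test_cmd).returncode == 0


# Stand-in commands so the sketch runs anywhere. In a Rust project these
# might be ["cargo", "update"] and ["cargo", "test"].
passing = [sys.executable, "-c", "pass"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(upgrade_is_safe(passing, passing))
print(upgrade_is_safe(passing, failing))
```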