A hierarchy of data cleanliness/readiness levels for training ML systems: https://towardsdatascience.com/ready-set-ai-preparing-nhs-medical-imaging-data-for-the-future-8e85ed5a2824
Includes an interesting argument for centralising data sharing/cleaning at the NHS level.
Related Posts
Fascinating talk on applying deep learning to detecting cheaters in CS:GO: https://www.youtube.com/watch?v=kTiP0zKF9bc
The presenter discusses how they get machine-readable data out of matches, and how they keep a human in the loop (the ML output just feeds the human analysis component).
Many of the GNU manuals are incredibly good. For example, the diff manual includes worked examples: https://www.gnu.org/software/diffutils/manual/html_node/Example-Context.html
What's the secret? My docs are getting better, but they're not on this level.
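For a feel of what that manual page demonstrates, here is a minimal sketch that produces context-format output with Python's standard-library `difflib` rather than GNU diff itself. The file names and contents are made up for illustration and are not the manual's own sample files.

```python
import difflib

# Two hypothetical versions of a short file (not the manual's sample files).
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# context_diff yields lines in the context format: changed lines are
# prefixed with "! ", added lines with "+ ", and removed lines with "- ".
# n=1 keeps one line of unchanged context around each change.
diff_lines = list(
    difflib.context_diff(old, new, fromfile="old.txt", tofile="new.txt", n=1)
)

print("".join(diff_lines), end="")
```

The changed line appears once in each half of the hunk, marked with `!`, which is the same convention the manual's example walks through.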
On the limits and perils of being data driven: https://twitchard.github.io/posts/2022-08-26-metrics-schmetrics.html
(Worthwhile improvements are often not amenable to A/B testing, and metrics can harm intrinsic motivation.)