Playing with perf today! It's really interesting to see low-level details of where compute time is going. Branch prediction works well most of the time! (At least for this workload.)
Based on https://jvns.ca/blog/2014/05/13/profiling-with-perf/ and https://users.rust-lang.org/t/profiling-in-rust-application/18195/2
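
For anyone wanting to poke at the branch prediction side of this, here is a minimal Rust sketch of a branch-heavy toy workload. It is not the workload measured above: the function names, the seed, and the tiny random generator are all made up for illustration. Running it with an extra argument sorts the data first, which should make the `if` much more predictable.

```rust
/// Count how many values are at or above `threshold`.
/// The `if` is the data-dependent branch the CPU has to predict.
fn count_at_or_above(data: &[u8], threshold: u8) -> u64 {
    let mut count = 0u64;
    for &x in data {
        if x >= threshold {
            count += 1;
        }
    }
    count
}

/// Hypothetical helper: a small linear congruential generator,
/// used only so the example needs no external crates.
fn pseudo_random_bytes(n: usize, mut seed: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top byte of the 64-bit state.
        out.push((seed >> 56) as u8);
    }
    out
}

fn main() {
    let mut data = pseudo_random_bytes(1 << 20, 42);

    // Any extra command-line argument means "sort first".
    if std::env::args().nth(1).is_some() {
        data.sort_unstable();
    }

    // Repeat the scan so the loop dominates the run time.
    let mut total = 0u64;
    for _ in 0..200 {
        total += count_at_or_above(std::hint::black_box(&data), 128);
    }
    println!("{total}");
}
```

Something like `perf stat -e branches,branch-misses` on the built binary, once with and once without the extra argument, should show the difference in miss rate. One caveat: an optimizing build may vectorize the loop or replace the branch with a conditional move, in which case the two runs can look nearly identical.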