Optimising fizzbuzz with hand-written x86-64 assembly and AVX2: https://codegolf.stackexchange.com/a/236630
I guess it was only a matter of time, but it's an impressive achievement: 31 GB/second!
Related Posts
Optimising GHC, implementing assembly pretty-printers, and the tradeoffs of implementing against an interface: https://www.tweag.io/blog/2022-12-22-making-ghc-faster-at-emitting-code/
Finally got dynamic string allocation working in my toy compiler! https://github.com/Wilfred/proper-compiler-hat/commit/cd726f45eb0540eb54c2c3c7e0ab75a651c46a43
Implementing intrinsics made this way easier: writing a single large assembly function is a pain.
The more I learn about register allocators, the less I want to write programs in assembly. It's really nice having a compiler minimise spills.