A.1: https://www.akkadia.org/drepper/cpumemory.pdf
- https://developer.arm.com/documentation/102467/0201/Example---matrix-multiplication
- https://developer.arm.com/documentation/den0013/d/Optimizing-Code-to-Run-on-ARM-Processors/ARM-memory-system-optimization/Loop-tiling
- https://salykova.github.io/matmul-cpu
- https://en.wikipedia.org/wiki/Loop_nest_optimization#Overview
- https://siboehm.com/articles/22/Fast-MMM-on-CPU
- https://marek.ai/matrix-multiplication-on-cpu.html