#gpu

inspiration
Flash Attention Is an IO Problem

Standard attention is slow not because of arithmetic but because of memory traffic. Flash Attention solves the IO problem, not the compute problem. That distinction shapes how you should think about every inference optimization that follows.
June 9, 2026
