PagedAttention Is an OS Idea
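Before PagedAttention, LLM serving systems wasted 60–80% of the memory reserved for the KV cache through fragmentation and over-reservation. The fix was not a new neural architecture. It was virtual memory paging, a 1960s operating systems concept, applied at an unexpected layer: the KV cache is split into fixed-size blocks that need not be contiguous, and a per-request block table maps logical token positions to physical blocks.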
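
To make the OS analogy concrete, here is a minimal Python sketch of paging applied to a KV cache. It is a toy model, not vLLM's implementation: the names `contiguous_waste`, `paged_waste`, and `PagedKVAllocator`, along with the sequence lengths and block size used below, are all hypothetical and chosen only for illustration. It counts token slots rather than bytes and ignores the GPU entirely.

```python
import math


def contiguous_waste(seq_lens, max_len):
    """Fraction of reserved slots left unused when every request
    reserves max_len contiguous token slots up front."""
    reserved = len(seq_lens) * max_len
    return 1 - sum(seq_lens) / reserved


def paged_waste(seq_lens, block_size):
    """Fraction of reserved slots left unused when each request holds
    only as many fixed-size blocks as it currently needs."""
    reserved = sum(math.ceil(n / block_size) * block_size for n in seq_lens)
    return 1 - sum(seq_lens) / reserved


class PagedKVAllocator:
    """Toy block allocator (hypothetical): a free list of physical blocks
    plus one block table per request, much like an OS page table."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # physical block ids
        self.tables = {}                     # request id -> physical block ids
        self.lengths = {}                    # request id -> tokens stored

    def append_token(self, req):
        n = self.lengths.get(req, 0)
        # Grab a new block only when the request has none yet
        # or its last block is full.
        if n % self.block_size == 0:
            if not self.free:
                raise MemoryError("no free KV blocks")
            self.tables.setdefault(req, []).append(self.free.pop())
        self.lengths[req] = n + 1

    def locate(self, req, pos):
        """Translate a logical token position to (physical block, offset)."""
        return self.tables[req][pos // self.block_size], pos % self.block_size

    def release(self, req):
        """Return all of a finished request's blocks to the free list."""
        self.free.extend(self.tables.pop(req, []))
        self.lengths.pop(req, None)
```

With the made-up lengths `[100, 300, 50]`, reserving 2048 slots per request leaves about 93% of the reserved slots unused, while 16-token blocks leave about 6% unused, because waste is confined to the last partially filled block of each request. These are toy numbers under the stated assumptions, not measurements of any real system.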