#memory
1 paper
-
inspiration
The KV Cache Is Your Real Memory Budget
The KV cache — not the model weights — is what limits how many tokens you can generate and how many requests you can serve. Understanding it changes how you provision hardware and tune inference.