#long-context
1 paper
-
inspiration
Long Context Is Not Long Attention
Expanding a model's context window does not guarantee it attends to all of that context. The window is a capacity claim. Attention quality across that capacity is a separate, structural problem — and it degrades in ways that are not visible in perplexity scores.