Position Is Not Neutral

A model that fits 128K tokens can still fail to use information you placed at token 60K. The context window is a capacity claim. Where you put information inside that window is a separate engineering decision — one with a measurable performance cost if you get it wrong.

The U-Shaped Curve

In 2023, Liu et al. ran a systematic experiment: they placed a relevant document at different positions inside a long input context, then measured multi-document question-answering accuracy. The result was a U-shaped performance curve.

Models performed best when the relevant information appeared at the very beginning or the very end of the context. Performance degraded significantly — in some cases by more than 20 percentage points — when the same information was placed in the middle. In the worst tested configuration (20–30 documents), GPT-3.5-Turbo's performance fell below its closed-book baseline: the model did better with no documents than with the relevant document buried in the middle.

The phenomenon has a name: Lost in the Middle.

Two Biases, One Mechanism

The U-shape reflects two overlapping biases:

Primacy bias — models attend disproportionately to early tokens. This is structural. Causal masking in a deep autoregressive transformer guarantees that early tokens accumulate more contextualized representations across layers. Chowdhury (2026) showed analytically that primacy bias is not a training artifact — it is a structural inevitability of deep autoregressive transformers. Causal masking alone guarantees it.

Recency bias — models attend disproportionately to recent tokens. Wu et al. (2025) showed that LayerNorm, not just positional encoding, induces this effect. The residual connection architecture preserves identity paths through depth, biasing finite-depth rollout toward recent tokens.

The middle of the context has no such structural advantage. It is simply farther from both anchors.

What This Means in Practice

For RAG systems: If retrieval returns five chunks and you place the highest-relevance chunk third, you have engineered a degraded response. Chunk ordering is not cosmetic — it is part of the retrieval quality problem. The finding from Liu et al. applies directly: reranking by relevance and placing the top result first (or last) is not optional.

For long-document tasks: Summarization over a 50-page document does not uniformly attend to all pages. Instructions or constraints buried in the middle of a system prompt are less likely to be followed than those at the top.

For agentic pipelines: Tool call results appended to the middle of an accumulating context are in the worst position. If a result is critical, it belongs at a structural boundary — beginning of a new message, or last in the sequence.

The Engineering Rule

Place high-salience content at the structural edges of your context. Beginning and end are not arbitrary — they are architecturally privileged positions. This is not a workaround. It is an acknowledgment of what the model is.

The context window tells you the maximum. Position tells you what the model will actually use.

References

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12. https://doi.org/10.1162/tacl_a_00638
Chowdhury, B. D. (2026). Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias. arXiv preprint. https://doi.org/10.48550/arxiv.2603.10123
Wu, X., Wang, Y., Jegelka, S., & Jadbabaie, A. (2025). On the Emergence of Position Bias in Transformers. OpenReview. https://openreview.net/pdf?id=YufVk7I6Ii
Hsieh, C.-Y., Chuang, Y.-S., Li, C., Wang, Z., Le, L. T., Kumar, A., et al. (2024). Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization. Findings of ACL 2024. https://doi.org/10.18653/v1/2024.findings-acl.890