#gqa

inspiration
Attention Heads Are Not Equal

Multi-head attention gives every query head its own key head and value head. That is thorough, and expensive. Grouped-Query Attention shows how much of that is redundant: Llama 3 70B serves 64 query heads from 8 KV heads, shrinks its KV cache 8x relative to full multi-head attention, and loses almost nothing in quality.
June 26, 2026
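
The 8x figure above is plain arithmetic: the KV cache grows linearly with the number of KV heads, so going from 64 to 8 divides it by 8. Here is a minimal sketch of that calculation. Only the 64 query heads and 8 KV heads come from the text; the layer count (80), head dimension (128), 16-bit cache entries, and the 8192-token sequence length are assumptions chosen for illustration, and `kv_cache_bytes` is a hypothetical helper, not part of any library.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one batch of sequences."""
    # Leading factor 2: each layer stores one K tensor and one V tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# From the text: 64 query heads served by 8 KV heads.
N_Q_HEADS = 64
N_KV_HEADS = 8

# Assumptions for illustration (not stated in the text).
N_LAYERS = 80      # assumed decoder depth
HEAD_DIM = 128     # assumed per-head dimension
SEQ_LEN = 8192     # assumed context length for the example

# Full multi-head attention would need one KV head per query head.
mha_bytes = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN)
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

GIB = 1024 ** 3
print(f"MHA cache: {mha_bytes / GIB:.2f} GiB")
print(f"GQA cache: {gqa_bytes / GIB:.2f} GiB")
print(f"Reduction: {mha_bytes / gqa_bytes:.0f}x")
```

Under these assumptions a single 8192-token sequence needs 20 GiB of cache with one KV head per query head and 2.5 GiB with 8 shared KV heads. The ratio depends only on the head counts, so it stays at 8x whatever sequence length, batch size, or precision is chosen.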
