#performance — devinfo.dev

inspiration

Prompt Caching Is the Cheapest Speedup

The fastest token is the one you never recompute. Stop paying twice for a stable prefix.

July 20, 2026

inspiration

Latency and Throughput Are Not the Same Goal

Two systems serving the same model can feel completely different because they optimize for opposite goals.

July 19, 2026

inspiration

Prefix Caching Is Free Throughput

Automatic Prefix Caching in vLLM reuses already-computed KV cache blocks across requests that share an identical prefix, delivering 30–50% throughput gains and up to 10x lower latency for the cost of a single configuration flag.

June 8, 2026