#chunked-prefill
1 paper
-
inspiration
The Scheduler Is Not the Model
You can swap in a faster model and get slower responses. The model is not the bottleneck — the scheduler is. How the serving system decides what to run, in what order, and in what chunks determines your latency and throughput.