Prefill Is the Stall
The gap between submitting a prompt and receiving the first token, the time to first token (TTFT), is not network lag. It is compute. Prefill is a full forward pass over every token in your input, dominated by large matrix multiplications, and decode for that request cannot start until it finishes.
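To get a feel for the size of that stall, here is a rough back-of-envelope sketch. It assumes the common approximation that a forward pass costs about 2 FLOPs per parameter per token, and it ignores the attention term that grows quadratically with prompt length. The function names, the 7B parameter count, the 300 TFLOP/s peak, and the 50% utilization are all hypothetical placeholders, not measurements.

```python
def prefill_flops(n_params: float, prompt_tokens: int) -> float:
    """Approximate prefill cost: ~2 FLOPs per parameter per prompt token.

    Assumption: the quadratic attention term is ignored, so this
    underestimates the cost of very long prompts.
    """
    return 2.0 * n_params * prompt_tokens


def estimate_prefill_seconds(
    n_params: float,
    prompt_tokens: int,
    peak_flops_per_s: float,
    utilization: float = 0.5,  # assumed fraction of peak actually achieved
) -> float:
    """Estimate prefill time as total FLOPs divided by achieved throughput."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return prefill_flops(n_params, prompt_tokens) / (peak_flops_per_s * utilization)


if __name__ == "__main__":
    # Hypothetical setup: 7B-parameter model, 300 TFLOP/s peak accelerator.
    for tokens in (1_000, 8_000, 32_000):
        secs = estimate_prefill_seconds(7e9, tokens, 300e12)
        print(f"{tokens:>6} prompt tokens -> ~{secs:.2f} s of prefill")
```

Under this model the estimate grows linearly with prompt length, so doubling the input roughly doubles the wait before the first token. Real systems will deviate from it, particularly at long context lengths where attention cost is no longer negligible.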