#latency
1 paper
-
inspiration
Speculative Decoding: The Free Tokens
Speculative decoding cuts inference latency 2–3x without changing a single output token. The gain is real. So is the catch.
1 paper
Speculative decoding cuts inference latency 2–3x without changing a single output token. The gain is real. So is the catch.