The Memory Wall: A Field Guide to LLM Inference on Consumer Hardware
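LLM inference on consumer hardware is rarely compute-bound. When generating tokens one at a time for a single user, it is memory-bandwidth-bound: every new token requires reading the model's weights from memory. Understanding that single fact, and the arithmetic that follows from it, determines every sensible hardware and quantization decision you will make when running models on consumer devices.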
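The arithmetic can be sketched in a few lines of Python. This is a rough upper bound, not a benchmark: it assumes a dense model, a single generation stream, and that each token requires streaming every weight from memory exactly once, and it ignores the KV cache, activations, and compute time. The function names, the 7-billion-parameter model, and the 100 GB/s bandwidth figure are illustrative assumptions, not values taken from the paper.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(
    n_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Upper bound on single-stream decode speed.

    Assumes every weight is read from memory once per generated token,
    so throughput cannot exceed bandwidth divided by model size.
    """
    return bandwidth_gb_s * 1e9 / weight_bytes(n_params, bits_per_weight)


# Hypothetical example: a 7e9-parameter model on a device with
# 100 GB/s of memory bandwidth, at three weight precisions.
N_PARAMS = 7e9
BANDWIDTH_GB_S = 100.0

for bits in (16, 8, 4):
    size_gb = weight_bytes(N_PARAMS, bits) / 1e9
    ceiling = max_tokens_per_second(N_PARAMS, bits, BANDWIDTH_GB_S)
    print(f"{bits:>2}-bit: {size_gb:5.1f} GB of weights, "
          f"at most {ceiling:5.1f} tokens/s")
```

Under these assumptions the ceiling scales inversely with bits per weight: halving the precision halves the bytes read per token and doubles the bound, which is why quantization and memory bandwidth matter more than raw compute for this workload.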