#architecture

inspiration
Sparse Is Not Small

A model with 671 billion parameters can cost less to run than a 70-billion-parameter dense model. That is not a marketing claim; it is arithmetic. Mixture of Experts replaces a full forward pass with a routing decision, and the routing decision is the cost model.
June 19, 2026
booklet
The LocalLLM Engine Stack: One API, Multiple Backends, Zero Lock-in

A single OpenAI-compatible endpoint that routes across Ollama, llama.cpp, and FreeLLMAPI with automatic failover. This booklet documents the architecture, routing logic, and deployment of the localllm-engine.
May 27, 2026

Sparse Is Not Small