#self-hosted
5 papers
-
booklet
From Free Tier to Sovereignty: Running Inference on Cloud ARM Instances
Free tier cloud compute promises self-hosted AI. The reality is capacity lotteries, region lock-in, and silent deprecation. This booklet documents what actually works, what does not, and how to build an inference setup that survives policy changes.
-
booklet
Ollama Beyond Defaults: Custom Model Paths on Windows and WSL
Ollama assumes default paths. When your models live elsewhere, the documentation stops helping. This booklet covers every configuration path for Windows native, WSL2, and cross-boundary access.
-
booklet
OpenCode with Local Models: Pointing Your Coding Agent at Your Own Inference
OpenCode is a terminal-first AI coding agent. It expects cloud APIs by default. This booklet shows how to wire it to Ollama, vLLM, or any OpenAI-compatible local endpoint — and what breaks when you do.
-
booklet
The LocalLLM Engine Stack: One API, Multiple Backends, Zero Lock-in
A single OpenAI-compatible endpoint that routes across Ollama, llama.cpp, and FreeLLMAPI with automatic failover. This booklet documents the architecture, routing logic, and deployment of the localllm-engine.
-
whitepaper
Choosing Your Inference Engine: llama.cpp, Ollama, and vLLM
llama.cpp, Ollama, and vLLM are not interchangeable. They solve different problems at different scales. This paper maps the architectural differences, performance characteristics, and deployment tradeoffs to help you pick the right engine for your workload — and understand why the wrong choice costs you in ways that are hard to undo.