#local-inference
2 papers
-
inspiration
Embeddings Are Not Optional
Every RAG pipeline, semantic search index, and similarity feature runs on embeddings. The generation model gets the credit. The embedding model does the work.
-
booklet
OpenCode with Local Models: Pointing Your Coding Agent at Your Own Inference
OpenCode is a terminal-first AI coding agent. It expects cloud APIs by default. This booklet shows how to wire it to Ollama, vLLM, or any OpenAI-compatible local endpoint, and what breaks when you do.