#rag — devinfo.dev

inspiration

Fine-Tuning Is Usually Not the First Move

Reaching for fine-tuning to fix a model is often an expensive wrong turn. Most problems that look like they need fine-tuning are really retrieval or prompting problems. Fine-tuning changes behavior and style; it is a poor and costly way to inject knowledge.

July 17, 2026

inspiration

Retrieval Is a Ranking Problem

Most RAG systems that disappoint are not failing at generation. They are failing at retrieval — and specifically at ranking. Swapping vector databases rarely fixes it. Two-stage retrieval and honest evaluation usually do.

July 14, 2026

inspiration

The Embedding Is Not the Default

Every RAG system encodes text into vectors. The model that produces those vectors — and the dimensionality you accept from it — is an engineering decision. Most engineers make it once, at setup, and never revisit it. That is the wrong posture.

July 10, 2026

inspiration

Position Is Not Neutral

A model with a 128K-token context window can still fail to use information you placed at token 60K. The context window is a capacity claim. Where you put information inside that window is a separate engineering decision, one with a measurable performance cost if you get it wrong.

July 1, 2026

whitepaper

RAG Is a Retrieval Problem: Chunking, Indexing, and Why Engineers Get It Backwards

Most RAG failures happen before the LLM sees a single token. Chunking and indexing are not preprocessing steps — they are architectural decisions that determine what the model can possibly know. This paper maps the engineering decisions that actually matter: chunk strategy, index choice, hybrid retrieval, and the failure modes that remain invisible until production.

June 22, 2026

whitepaper

Fine-Tuning, RAG, or Prompting: An Engineering Decision

Three techniques can improve LLM output quality: prompt engineering, retrieval-augmented generation, and fine-tuning. Each solves a different problem. Choosing the wrong one wastes months and produces worse results than the right one done simply.

June 1, 2026

inspiration

Embeddings Are Not Optional

Every RAG pipeline, semantic search index, and similarity feature runs on embeddings. The generation model gets the credit. The embedding model does the work.

May 31, 2026

inspiration

Retrieval Is the Weakest Link

RAG systems usually fail at retrieval, not generation. Engineers blame the LLM. The problem is upstream.

May 29, 2026