inspiration

Structured Outputs Are a Contract

devinfo.dev — May 27, 2026

devinfo.dev:2026.0012

Structured Outputs Are a Contract

When you call an LLM and parse its response with json.loads(), you are gambling.

The model might forget a closing brace. It might invent a field. It might wrap valid JSON in a markdown code block. These are not edge cases — they are the default behavior of a system optimized for plausibility, not correctness.

Structured output generation eliminates the gamble.

The Mechanism

The insight is simple: at each generation step, the model produces a probability distribution over its entire vocabulary. Constrained decoding intercepts that distribution and masks any token that would violate the declared schema — setting its probability to zero before sampling occurs.

The model never sees the constraint as a restriction. It samples freely, but only from the legally reachable tokens given the current parse state.

The Outlines library formalized this in 2023 using finite-state machines. A JSON schema (or regex, or context-free grammar) is compiled into an FSM at initialization time. Each FSM state maps to a set of valid next tokens. During generation, the runtime performs an O(1) lookup: what state are we in, what tokens are legal, mask everything else.

The compilation cost is paid once. The per-token overhead is microseconds.

Why This Is a Systems Decision

A structured output guarantee changes the contract between your model and everything downstream.

Without it: every consumer of model output must be defensive. Parse errors require retry logic. Schema validation is duplicated across services. A single malformed response can cascade.

With it: the output is typed. Downstream services can treat model responses the same way they treat any other data source — with a schema they can rely on at the boundary.

This is not about making prompts cleaner. It is about moving an implicit assumption (the model will probably return valid JSON) into an explicit invariant (the output is guaranteed to conform to this schema, by construction).

The Tradeoff

Constrained decoding is not free of cost. Highly complex schemas can have large FSM state spaces. Some implementations front-load this as compilation time; others compute token masks incrementally. Libraries like llguidance report ~50μs CPU overhead per token for a 128k tokenizer — negligible for most workloads, but measurable at scale.

There is also an expressiveness tradeoff. A schema tightly constrains not just structure but content. Over-constrain and you push the model into degenerate token sequences. The schema should express what the output shape must be, not attempt to dictate which words the model chooses within valid fields.

What to Use

  • Outlines (dottxt-ai/outlines): JSON schema, regex, CFG, Pydantic models. FSM-based, O(1) per-token cost after compilation. Works with Transformers and vLLM backends.
  • llguidance (guidance-ai/llguidance): Rust-based, ~50μs per token. Supports JSON schemas, regexes, and Lark context-free grammars. Used internally by the Guidance framework.
  • vLLM guided decoding: guided_json, guided_regex, guided_grammar, guided_choice parameters built into the generation API. Supports swappable backends (xgrammar recommended for production).
  • llama.cpp grammar sampling: GGML BNF grammar support via --grammar-file flag. Native, no extra dependencies. Available in all llama.cpp-based servers including Ollama.

The Principle

Every interface between components in a system should have an explicit contract. When the component is an LLM, structured output generation is how you write that contract down — and enforce it at the token level.

Stop parsing. Start contracting.

References

1. Willard, B. T., & Louf, R. (2023). Efficient Guided Generation for Large Language Models. arXiv:2307.09702. https://arxiv.org/abs/2307.09702

2. dottxt-ai. (2024). Outlines — Official Documentation. https://dottxt-ai.github.io/outlines/latest/

3. guidance-ai. (2024). llguidance: LLM Guidance Library. GitHub. https://github.com/guidance-ai/llguidance

4. vLLM Project. (2024). Structured Outputs — vLLM Documentation. https://docs.vllm.ai/en/v0.7.3/features/structured_outputs.html

5. Geng, S., et al. (2023). Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning. EMNLP 2023. https://aclanthology.org/2023.emnlp-main.674.pdf

6. Ugare, S., et al. (2025). JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models. arXiv:2501.10868. https://arxiv.org/abs/2501.10868

Cite as

devinfo.dev. (2026). "Structured Outputs Are a Contract." devinfo.dev:2026.0012. https://devinfo.dev/d/2026.0012

devinfo.dev | https://devinfo.dev/d/2026.0012
Content licensed under CC BY-NC 4.0. Free to share with attribution for non-commercial use.
https://devinfo.dev