Temperature Is Not Creativity

Temperature does one thing: it divides the logits by T before the softmax.

At T=1.0, the distribution is unchanged. At T<1.0, it sharpens — the top token wins more decisively. At T>1.0, it flattens — lower-probability tokens get more of the mass.
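The mechanism fits in a few lines. The sketch below computes p_i = exp(z_i / T) / sum_j exp(z_j / T) using only the Python standard library. The four logits and the helper name `softmax_with_temperature` are made up for illustration. They are not any particular model's output or any library's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by T, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (treat T=0 as greedy argmax instead)")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Lowering T raises the top token's share. Raising T shifts mass toward the lower-ranked tokens. The logits themselves are never changed, only divided by a shared constant.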

That is the whole mechanism.

What It Does Not Do

Temperature does not add new ideas. It does not unlock tokens the model never assigned probability to. It cannot conjure knowledge that is not in the weights.

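Higher temperature only lets sampling venture further down the model's existing probability ranking. It never reorders that ranking. Dividing every logit by the same positive T preserves their order, so a token the model considered unlikely can be drawn more often but never becomes the model's preferred choice. The distribution is reshaped, not expanded.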
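You can check the "reshaped, not expanded" claim directly. This minimal sketch uses made-up logits and assumes a hypothetical fifth token that has been masked out before sampling (logit of negative infinity). Across temperatures it shows two things: the probability ranking stays the same, and the masked token keeps exactly zero probability.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; a logit of -inf yields exactly zero probability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ranking(probs):
    """Token indices ordered from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Hypothetical logits; the last token is masked out (logit -inf).
logits = [3.0, 1.5, 0.2, -0.7, float("-inf")]

for t in (0.3, 1.0, 1.5, 3.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: ranking={ranking(probs)}, masked-token prob={probs[-1]}")
```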

Calling temperature a "creativity slider" is a category error.

The Practical Consequence

Most production systems should run between T=0.0 and T=1.0. Empirical research on reasoning and multiple-choice tasks finds no performance improvement above T=1.0, only degradation. For reproducible outputs, T=0.0 (greedy decoding: always take the most probable token) is the correct default. Batched GPU inference can still introduce small run-to-run differences even at T=0.0.
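Temperature divides the logits, so T=0.0 cannot be applied literally. Samplers treat it as the limiting case, which is greedy argmax. The sketch below is minimal and rests on a few assumptions: `sample_token` is a hypothetical helper over raw logits, not any particular library's API, and the logits are invented.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Return a token index. T=0 is special-cased as greedy argmax, since z/T is undefined."""
    if temperature == 0.0:
        # Greedy decoding: always pick the highest-logit token (the first one on ties).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.Random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.9, 0.5]  # hypothetical: two close candidates and one outlier
rng = random.Random(0)    # fixed seed so the sampled runs repeat in this sketch

greedy = [sample_token(logits, 0.0, rng) for _ in range(5)]
sampled = [sample_token(logits, 1.0, rng) for _ in range(5)]
print("T=0.0:", greedy)  # T=0.0: [0, 0, 0, 0, 0]
print("T=1.0:", sampled)
```

At T=0.0 the near-tie between the first two tokens is always resolved the same way. At T=1.0 both appear, and only the fixed seed makes the run repeatable.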

Higher temperatures are appropriate for brainstorming or diversity-sampling use cases — but that is a deliberate trade of coherence for variance, not a creativity enhancement.
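One way to make the trade visible is to measure the entropy of the sampling distribution. Entropy rises with T and approaches its maximum as T grows large. That maximum is log2 of the vocabulary size, reached by the uniform distribution. This is a sketch with made-up logits. Entropy stands in here for "variance" in the informal sense used above; it is not necessarily the metric the cited studies use.

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means more spread-out (less predictable) sampling."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [4.0, 2.5, 1.0, 0.0, -1.0, -2.0]  # hypothetical logits for a six-token vocabulary

for t in (0.2, 0.7, 1.0, 1.5, 2.0):
    h = entropy_bits(softmax_with_temperature(logits, t))
    print(f"T={t}: entropy={h:.2f} bits (max possible {math.log2(len(logits)):.2f})")
```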

What Creative Output Actually Requires

Genuine novelty in LLM output comes from:

Training data diversity — what the model has seen
Prompt construction — what framing you provide
Sampling breadth — running multiple low-temperature passes and selecting among them
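The third item, sampling breadth, can be sketched as best-of-n selection. The sketch assumes a toy "model" that emits one of three canned outputs. It also assumes a hypothetical `score` function standing in for whatever selection criterion the task has, such as tests, a rubric, or a reviewer. Neither is a real library API. Temperature must be low but nonzero, because at T=0.0 every pass returns the same output.

```python
import math
import random
from typing import Callable, Sequence

def sample_candidate(candidates: Sequence[str], logits: Sequence[float],
                     temperature: float, rng: random.Random) -> str:
    """Stand-in for one generation pass: draw a whole candidate from a temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(list(candidates), weights=weights, k=1)[0]

def best_of_n(candidates: Sequence[str], logits: Sequence[float],
              score: Callable[[str], float], n: int = 8,
              temperature: float = 0.5, seed: int = 0) -> str:
    """Run n low-temperature passes, keep the distinct outputs, return the highest-scoring one."""
    if temperature <= 0:
        raise ValueError("breadth needs T > 0; at T=0 every pass returns the same output")
    rng = random.Random(seed)
    drafts = {sample_candidate(candidates, logits, temperature, rng) for _ in range(n)}
    return max(drafts, key=score)

# Hypothetical toy "model": three possible outputs with fixed logits.
candidates = ["a safe, generic answer", "a sharper answer", "an off-topic answer"]
logits = [2.0, 1.6, -1.0]

# Hypothetical task-specific scorer.
def score(text: str) -> float:
    return {"a sharper answer": 1.0, "a safe, generic answer": 0.5}.get(text, 0.0)

print(best_of_n(candidates, logits, score))
```

The novelty comes from choosing well among several plausible drafts, not from pushing any single pass into the tail.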

When sampling at T=1.5 produces surprising text, that is not the model being creative. It is the model being incoherent. The surprise is noise.

The Engineer's Rule

Set temperature for the task, not for the feeling.

Coding, extraction, classification: T=0.0–0.3. Open-ended generation: T=0.7–1.0. Above 1.0: justify it explicitly or do not use it.
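The rule can be encoded as a guard in configuration code. This is a minimal sketch. The task labels, the `TEMPERATURE_RANGES` table, and `check_temperature` are hypothetical names. The ranges are copied from the rule above, not from any provider's recommendation.

```python
from typing import Optional

# Ranges copied from the rule above; the task labels are hypothetical.
TEMPERATURE_RANGES = {
    "coding": (0.0, 0.3),
    "extraction": (0.0, 0.3),
    "classification": (0.0, 0.3),
    "open_ended": (0.7, 1.0),
}

def check_temperature(task: str, temperature: float,
                      justification: Optional[str] = None) -> float:
    """Validate T against the task's range; anything above 1.0 requires a written reason."""
    if temperature > 1.0:
        if not justification:
            raise ValueError(f"T={temperature} exceeds 1.0; justify it explicitly or lower it")
        return temperature
    low, high = TEMPERATURE_RANGES[task]
    if not low <= temperature <= high:
        raise ValueError(f"T={temperature} is outside the {task} range [{low}, {high}]")
    return temperature

check_temperature("extraction", 0.0)  # accepted
check_temperature("open_ended", 1.3, justification="diversity sampling for a brainstorm batch")
```

Requiring the justification string makes the exception explicit at the call site rather than leaving it to review.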

Temperature is a precision instrument. Treat it like one.

References

1. Peeperkorn, M., et al. (2024). Is Temperature the Creativity Parameter of Large Language Models? arXiv:2405.00492. https://arxiv.org/abs/2405.00492

2. ACL Anthology (2024). Empirical Study of Temperature in LLM Sampling. EMNLP 2024 Findings. https://aclanthology.org/2024.findings-emnlp.432.pdf

3. Brenndoerfer, M. (2024). Decoding Temperature: Controlling Randomness in Language Model Generation. https://mbrenndoerfer.com/writing/decoding-temperature-language-model-generation

4. Engineers of AI (n.d.). Sampling Strategies: Temperature, Top-K, Top-P. https://engineersofai.com/docs/llms/llm-inference/Sampling-Strategies-Temperature-TopK-TopP