Cross-cutting

Temperature (sampling)

Temperature is the LLM sampling parameter that controls randomness in token selection. It divides the logits before the softmax: 0 is treated as greedy decoding (always the most likely token, so output is deterministic), 1 samples from the model's unmodified probability distribution, values below 1 sharpen the distribution, and values above 1 flatten it and produce more diverse output. Typical production settings range from 0 (deterministic tasks) to about 0.7 (creative tasks).
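
The effect of temperature on a distribution can be illustrated with a minimal sketch. It assumes the common formulation in which logits are divided by the temperature before the softmax; the function names `apply_temperature` and `sample_token` and the toy logits are hypothetical and not taken from any particular library.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is handled as greedy argmax")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 means greedy decoding (assumed convention)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy logits for three candidate tokens (illustrative values only).
toy_logits = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(toy_logits, t)])
```

At low temperature almost all probability mass sits on the top token; at higher temperature the mass spreads across the alternatives, which is where the extra diversity comes from.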

May 23, 2026

Temperature choice depends on the task. Classification, extraction, and structured output should use 0 or near 0 because there is typically a single correct answer. Creative writing benefits from higher temperatures because diversity is valued. Reasoning is mixed: the reasoning steps themselves are often close to deterministic, but the final answer can benefit from sampling. The interaction with chain-of-thought matters here. Reasoning at temperature 0 can lock the model into a wrong path, whereas sampling several reasoning chains at a higher temperature and majority-voting their final answers (self-consistency) often outperforms greedy decoding on hard reasoning tasks.
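
Self-consistency, as described above, can be sketched as a majority vote over sampled final answers. In this sketch `sample_answer` is a hypothetical callable standing in for one model call at a temperature above 0 that returns the extracted final answer as a string; no real API is assumed, and the stub sampler only replays fixed answers for illustration.

```python
from collections import Counter


def self_consistency(sample_answer, n_samples=5):
    """Call the sampler n_samples times and return (majority answer, vote share)."""
    answers = [sample_answer() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Stub sampler: replays made-up final answers in place of real model calls.
fake_outputs = iter(["42", "41", "42", "42", "40"])
answer, share = self_consistency(lambda: next(fake_outputs), n_samples=5)
print(answer, share)
```

The vote share gives a rough signal of how strongly the sampled chains agree; a low share suggests the question is hard for the model or the answer extraction is inconsistent.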