Temperature (sampling)
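Temperature is the LLM sampling parameter that controls randomness in token selection. The logits are divided by the temperature before the softmax: at 0, decoding is greedy (always the most likely token) and the output is deterministic; at 1, tokens are sampled from the model's unmodified probability distribution; values between 0 and 1 sharpen the distribution toward the most likely tokens; values above 1 flatten it and produce more diverse output. Typical production settings range from 0 (deterministic tasks) to 0.7 (creative tasks).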
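As a rough illustration of how temperature reshapes the token distribution, the sketch below divides a set of made-up logits by the temperature before applying a softmax. The function name `softmax_with_temperature` and the example logits are assumptions for illustration only, and treating temperature 0 as "all probability on the top token" is a common convention rather than a claim about any particular library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Convention assumed here: temperature 0 means greedy decoding,
        # so all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens (not from a real model).
logits = [2.0, 1.0, 0.1]
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's share shrinking as the temperature rises, which is the flattening effect described above.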
Temperature choice depends on the task. Classification, extraction, and structured output should use 0 or near 0 because there is a single correct answer. Creative writing benefits from higher temperatures because diversity is valued. Reasoning is mixed: the reasoning steps themselves are often deterministic, but the final answer benefits from sampling. The interaction with chain-of-thought matters here: at temperature 0, a single greedy reasoning chain can lock the model into a wrong path, whereas sampling several chains at a higher temperature and majority-voting over their final answers (self-consistency) often outperforms greedy decoding on hard reasoning tasks.
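The majority-voting step of self-consistency can be sketched in a few lines. This is a minimal sketch under stated assumptions: `sample_answer` is a hypothetical callable standing in for "run the model once at a temperature above 0 and extract the final answer", and `make_toy_sampler` is a made-up stand-in that returns canned answers instead of calling a real model.

```python
import random
from collections import Counter

def self_consistency(sample_answer, n_samples=10):
    """Call sample_answer() n_samples times and majority-vote the results.

    Returns the winning answer and the fraction of samples that agreed with it.
    """
    answers = [sample_answer() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

def make_toy_sampler(seed=0):
    """Hypothetical stand-in for an LLM call: mostly right, sometimes wrong."""
    rng = random.Random(seed)
    def sample():
        # Made-up answer distribution, not measured from any model.
        return rng.choices(["42", "41", "24"], weights=[0.6, 0.25, 0.15])[0]
    return sample

answer, agreement = self_consistency(make_toy_sampler(), n_samples=25)
print(answer, agreement)
```

The agreement fraction is a useful by-product: a low value suggests the sampled reasoning chains disagree and the answer deserves less trust.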
Related terms
- Top-p sampling
Top-p (or nucleus) sampling restricts token selection to the smallest set whose cumulative probability exceeds p — typically 0.
- Chain-of-thought (CoT)
Chain-of-thought prompting asks the LLM to reason step by step before producing the final answer — 'let's think through this carefully' or 'show your work'.
- Large language model (LLM)
A large language model is a neural network trained on enormous text corpora to predict the next token given preceding tokens — typically with billions to trillions of parameters.
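Returning to top-p sampling from the list above, the "smallest set whose cumulative probability exceeds p" rule can be sketched as follows. The names `top_p_filter` and `sample_top_p`, the example probabilities, and the value p = 0.9 are all illustrative assumptions, not values taken from this entry. The sketch stops as soon as the running total reaches p (using `>=`); implementations differ on whether the boundary is inclusive.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose total probability
    reaches p, and renormalize so the kept probabilities sum to 1.

    Returns a dict mapping token index to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= p:  # boundary handling is an assumption of this sketch
            break
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}

def sample_top_p(probs, p, rng=random):
    """Draw one token index from the top-p (nucleus) set."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights)[0]

# Made-up distribution over four candidate tokens.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))
print(sample_top_p(probs, 0.9))
```

Top-p is usually combined with temperature: the temperature reshapes the distribution first, and the nucleus cut then removes the low-probability tail.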