Top-p sampling

Top-p (or nucleus) sampling restricts token selection to the smallest set of highest-probability tokens whose cumulative probability reaches or exceeds p, typically 0.9 or 0.95; the probabilities within that set are renormalized before sampling. The technique adapts to the model's confidence: when the model is confident, a few tokens cover the threshold and the set is small; when it is uncertain, the set is large. Top-p often outperforms pure temperature sampling on quality at comparable diversity.
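The selection rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `top_p_filter` and `nucleus_size` are hypothetical, and the sketch assumes the input is already a valid probability distribution over the vocabulary.

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Zero out tokens outside the nucleus and renormalize (illustrative sketch)."""
    probs = np.asarray(probs, dtype=np.float64)
    # Sort token indices from most to least probable.
    order = np.argsort(-probs, kind="stable")
    cumulative = np.cumsum(probs[order])
    # Keep tokens up to and including the first one at which
    # the cumulative probability reaches p.
    cutoff = int(np.searchsorted(cumulative, p, side="left")) + 1
    cutoff = min(cutoff, len(probs))
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def nucleus_size(probs, p=0.9):
    """Number of tokens that survive the top-p filter."""
    return int(np.count_nonzero(top_p_filter(probs, p)))

confident = [0.90, 0.05, 0.03, 0.02]   # one dominant token
uncertain = [0.25, 0.25, 0.25, 0.25]   # flat distribution
small = nucleus_size(confident, p=0.8)
large = nucleus_size(uncertain, p=0.8)
```

With these made-up distributions, the confident case keeps a single token while the flat case keeps all four, which is the adaptive behavior described above.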

May 23, 2026

Top-p was introduced by Holtzman et al. (2019) as a counter to two failure modes of pure temperature sampling: at low temperature, the output is too repetitive; at high temperature, low-quality tokens from the tail of the distribution are sometimes sampled. Top-p truncates the long tail of unlikely tokens at any temperature, so sampling at higher temperatures tends to stay coherent. Production usage typically combines the two: temperature controls diversity, and top-p controls tail truncation. Defaults of temperature=0.7 and top-p=0.95 are reasonable for most general-purpose generation.
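One way to combine the two controls is sketched below, assuming the common ordering in which temperature is applied to the logits first and top-p truncation is applied to the resulting probabilities. The function name `sample_token` and its parameters are hypothetical, chosen for illustration; real inference libraries may order or implement these steps differently.

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.95, rng=None):
    """Sample one token index using temperature scaling, then top-p truncation.

    Illustrative sketch; assumes temperature > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = logits / temperature
    scaled = scaled - scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()

    # Top-p: keep the smallest set of top tokens whose cumulative mass reaches top_p.
    order = np.argsort(-probs, kind="stable")
    cumulative = np.cumsum(probs[order])
    cutoff = min(int(np.searchsorted(cumulative, top_p, side="left")) + 1, len(probs))
    keep = order[:cutoff]

    # Renormalize inside the nucleus and draw one token from it.
    nucleus = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=nucleus))

rng = np.random.default_rng(0)
peaked = sample_token([10.0, 0.0, 0.0, 0.0], rng=rng)   # one token dominates
flat = sample_token([0.0, 0.0, 0.0, 0.0], rng=rng)      # all tokens equally likely
```

In the peaked case the nucleus contains only the dominant token, so it is always selected; in the flat case every token stays in the nucleus and the draw is uniform.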