
Large language model (LLM)

A large language model is a neural network, typically with billions to trillions of parameters, trained on enormous text corpora to predict the next token given the preceding tokens. Modern LLMs (GPT-4, Claude, Gemini, Llama) extend this base capability with instruction tuning, reinforcement learning from human feedback (RLHF), tool use, and long context windows, producing systems that can write, reason, and act on natural-language input.
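The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below is not a neural network: it assumes a bigram count table in place of a trained model, and the names `train_bigram`, `next_token_probs`, and `generate` are hypothetical helpers chosen for this example. It shows only the shape of the loop: estimate a distribution over the next token, pick one, append it, and repeat.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the follow-up counts for `prev` into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, start, max_new_tokens):
    """Greedy autoregressive decoding: append the most likely next token each step."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation was ever observed for this token
        out.append(followers.most_common(1)[0][0])
    return out

# Toy corpus, split on whitespace (real LLMs use subword tokenizers).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
generated = generate(model, "the", 4)
print(probs)
print(generated)
```

A real LLM replaces the count table with a transformer that conditions on the whole preceding context rather than only the last token, and it usually samples from the distribution instead of always taking the most likely token.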

The 'large' threshold has moved over time: GPT-2 (1.5B parameters) counted as large in 2019, while today's frontier models are over 100x larger, so the term no longer has a precise meaning. The defining structural choice across modern LLMs is the transformer architecture (attention-based and parallelisable on GPUs), introduced in 2017. The capability boundary today is set less by raw scale and more by post-training: instruction tuning produces models that follow directions, RLHF steers models toward outputs that people prefer and away from undesirable ones, and constitutional methods train models against an explicit, written set of principles. The most recent axis of progress (2024-2026) has been agentic capability (tool use, long horizons, multi-step reasoning), built on top of the base LLM.
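The attention operation at the core of the transformer can be sketched in a few lines. This is a minimal single-head version of scaled dot-product attention written with plain Python lists; it assumes no masking, no learned projection matrices, and no batching, and the function names `softmax` and `attention` are illustrative rather than taken from any library.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) applied to values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        dim = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs

# Two positions with identical (all-zero) keys: every query attends uniformly.
uniform = attention([[1.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(uniform)
```

Each query's output depends only on that query and the shared keys and values, so the loop over queries has no cross-iteration dependency. That independence is what lets the computation be expressed as matrix multiplications and run in parallel on GPUs.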
