Large language model (LLM)
A large language model is a neural network, typically with billions to trillions of parameters, trained on enormous text corpora to predict the next token given the preceding tokens. Modern LLMs (GPT-4, Claude, Gemini, Llama) extend this base capability with instruction tuning, reinforcement learning from human feedback (RLHF), tool use, and long context windows, producing systems that can write, reason, and act on natural-language input.
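To make "predict the next token given preceding tokens" concrete, here is a minimal sketch that swaps the neural network for a simple bigram count table. This is a deliberately toy stand-in, not how an LLM is implemented: the corpus, the whitespace tokenisation, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

# Hypothetical toy corpus, tokenised by whitespace (real LLMs use subword tokens).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

dist = next_token_distribution(model, "the")
prediction = max(dist, key=dist.get)  # most probable next token after "the"
print(dist, prediction)
```

An LLM plays the same role as `next_token_distribution`, except that it conditions on the whole preceding sequence rather than one token and computes the distribution with learned parameters instead of a lookup table.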
The 'large' threshold has moved over time: GPT-2 (1.5B parameters) was considered large in 2019, while today's frontier models are estimated to be 100x or more larger, so the term no longer has a precise meaning. The defining structural choice across modern LLMs is the transformer architecture, introduced in 2017, which is attention-based and parallelisable on GPUs. The capability boundary today is set less by raw scale than by post-training: instruction tuning trains models to follow directions, RLHF steers them away from undesirable outputs, and constitutional methods train them against an explicit written set of principles. From 2024 to 2026, the main axis of progress has been agentic capability (tool use, long-horizon tasks, multi-step reasoning) built on top of the base LLM.
Related terms
- Retrieval-augmented generation (RAG)
Retrieval-augmented generation is the pattern where an LLM is given relevant context retrieved from an external source — typically via semantic search over a vector database — before generating its response.
- Context window
The context window is the maximum number of tokens an LLM can process in a single request — including the prompt, retrieved context, conversation history, and the generated response.
- Fine-tuning
Fine-tuning continues the training of a pre-trained LLM on a custom dataset — typically a few thousand to a few million examples — to adapt its behaviour to a specific domain, task, or output style.
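The first two related terms interact in practice: retrieved context has to fit inside the context window alongside the question and the space left for the answer. The sketch below illustrates that under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `count_tokens` counts whitespace-separated words rather than real tokens, the linear scan replaces a vector database, and every name and number is hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def count_tokens(text):
    """Whitespace word count; real tokenizers produce different counts."""
    return len(text.split())

def build_prompt(question, documents, context_window, reserved_for_answer):
    """Pack the most similar documents into the prompt without exceeding the budget."""
    budget = context_window - reserved_for_answer - count_tokens(question)
    q = embed(question)
    scored = sorted(((cosine(q, embed(d)), d) for d in documents),
                    key=lambda pair: pair[0], reverse=True)
    chosen = []
    for score, doc in scored:
        cost = count_tokens(doc)
        if score > 0 and cost <= budget:  # skip unrelated or oversized documents
            chosen.append(doc)
            budget -= cost
    return "\n".join(chosen + [question]), chosen

documents = [
    "the context window limits how many tokens fit in one request",
    "fine-tuning continues training on a custom dataset",
    "bananas are yellow",
]
prompt, chosen = build_prompt(
    "how many tokens fit in the context window",
    documents, context_window=30, reserved_for_answer=10,
)
print(prompt)
```

Reserving part of the window for the answer reflects the definition above: the generated response counts against the same limit as the prompt and the retrieved context.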