Fine-tuning
Fine-tuning continues the training of a pre-trained LLM on a custom dataset (typically a few thousand to a few million examples) to adapt its behaviour to a specific domain, task, or output style. The result is a derived model that, when the tuning succeeds, performs better on the target task than the base model given the same prompt.
Fine-tuning was once the default approach to LLM customisation, but the rise of strong instruction-following base models and long context windows has made it less necessary. The current pragmatic guidance is to try prompting first (system prompt, few-shot examples, and RAG), measure the results, and fine-tune only if you can articulate a specific gap that prompting can't close. Typical gaps are strict structured-output requirements for which you have thousands of training examples, domain-specific terminology that the base model consistently gets wrong, and latency requirements where a smaller fine-tuned model beats a larger generic one. The cost is significant: dataset preparation, training compute, and ongoing maintenance as base models evolve.
Related terms
- LoRA adapter
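LoRA (Low-Rank Adaptation) is a fine-tuning technique that trains only a small set of added low-rank decomposition matrices while leaving the base model's weights frozen. A LoRA adapter is the resulting set of trained matrices, which is applied on top of the base model.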
- Quantization
Quantization reduces the numerical precision of LLM weights (typically from FP16 to INT8 or INT4) to shrink memory footprint and speed up inference, with modest accuracy loss.
- Large language model (LLM)
A large language model is a neural network trained on enormous text corpora to predict the next token given preceding tokens — typically with billions to trillions of parameters.
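
To make the LoRA and quantization entries above more concrete, here is a minimal Python sketch. It is illustrative only, not how any particular library implements either technique. The function names (`lora_param_counts`, `quantize_int8`, `dequantize`) and the example dimensions are assumptions chosen for this example. The quantization shown is plain symmetric per-tensor rounding, which is one simple scheme among several.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA.

    Assumes the usual LoRA factorisation: the frozen weight W (d_out x d_in)
    gets an update B @ A, where A is (rank x d_in) and B is (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]


# Hypothetical 4096 x 4096 projection matrix with a rank-8 adapter.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")

# Hypothetical weights: the round trip loses a little precision per value.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, [round(r, 4) for r in restored])
```

For the assumed dimensions, the adapter trains 65,536 parameters against 16,777,216 for the full matrix, which is why LoRA is so much cheaper to train and store. In the quantization round trip, each restored weight differs from the original by at most half of `scale`, which is the source of the modest accuracy loss mentioned above.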