Cross-cutting

Fine-tuning

Fine-tuning continues the training of a pre-trained LLM on a custom dataset, typically a few thousand to a few million examples, to adapt its behaviour to a specific domain, task, or output style. The result is a derived model that, when the fine-tune succeeds, performs better on the target task than the base model given the same prompt.

May 23, 2026

Fine-tuning was once the default approach to LLM customisation; the rise of strong instruction-following base models and long context windows has made it less necessary. The current pragmatic guidance: try prompting first (system prompt + few-shot examples + RAG), measure the result, and fine-tune only if you can articulate a specific gap that prompting can't close. Typical gaps are structured output requirements backed by thousands of training examples, domain-specific terminology the base model consistently gets wrong, and latency requirements where a smaller fine-tuned model beats a larger generic one. The cost is significant: dataset preparation, training compute, and ongoing maintenance as base models evolve.
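The "prompt first, measure, then decide" guidance can be written down as a small check. The sketch below is illustrative only: `EvalResult`, `should_consider_fine_tuning`, the metric names, and the 1,000-example floor are all assumptions made for this example, not part of any library or an established threshold. It assumes you have already evaluated a prompting baseline against a held-out set and recorded its accuracy and latency.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Measured performance of one approach on a held-out eval set (hypothetical shape)."""
    name: str
    accuracy: float        # fraction of eval examples judged correct, 0.0 to 1.0
    p95_latency_ms: float  # 95th-percentile response latency in milliseconds


def should_consider_fine_tuning(
    baseline: EvalResult,
    target_accuracy: float,
    max_latency_ms: float,
    num_training_examples: int,
    min_training_examples: int = 1000,  # assumed floor; tune for your task
) -> tuple[bool, str]:
    """Return (decision, reason) for a measured prompting baseline.

    Fine-tuning is only suggested when the baseline misses a stated target
    AND enough training examples exist to plausibly close the gap.
    """
    quality_gap = baseline.accuracy < target_accuracy
    latency_gap = baseline.p95_latency_ms > max_latency_ms

    if not (quality_gap or latency_gap):
        return False, "prompting baseline already meets targets"

    if num_training_examples < min_training_examples:
        return False, "gap exists, but too few training examples"

    gaps = []
    if quality_gap:
        gaps.append("quality")
    if latency_gap:
        gaps.append("latency")
    return True, "measured gap in " + " and ".join(gaps)


# Example: a prompting baseline that is fast enough but not accurate enough.
baseline = EvalResult(
    name="system prompt + few-shot + RAG",
    accuracy=0.82,
    p95_latency_ms=900.0,
)
decision, reason = should_consider_fine_tuning(
    baseline,
    target_accuracy=0.95,
    max_latency_ms=1500.0,
    num_training_examples=5000,
)
print(decision, reason)
```

The point of the check is that it forces the gap to be stated as a number before any training compute is spent; a `True` result is a prompt to weigh the dataset, compute, and maintenance costs, not a verdict.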