Cross-cutting

LoRA adapter

LoRA (Low-Rank Adaptation) is a fine-tuning technique that freezes the base model's weights and trains only small low-rank decomposition matrices added alongside selected weight matrices. Because so few parameters are trained, LoRA reduces training cost substantially (often cited as 10-100x cheaper than full fine-tuning) and lets multiple specialised adapters share one base model.

May 23, 2026

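LoRA was introduced by Hu et al. (2021) and quickly became the dominant parameter-efficient fine-tuning method. The key insight: full fine-tuning updates billions of parameters, but the weight change needed to adapt a model to a new task tends to have low intrinsic rank. LoRA therefore expresses the update to a weight matrix as the product of two much smaller matrices and trains only those, while the original weights stay frozen. The adapter is typically <1% of base model size, yet it captures most of the benefit at a fraction of the cost. Adapters can be swapped at inference time (multi-tenancy on a shared base model) or merged into the base weights for deployment, in which case the adapter adds no inference latency. QLoRA extends LoRA by quantizing the frozen base model (to 4-bit in the original work) and training the adapters on top, which further reduces memory requirements.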
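The low-rank update and the merge step can be illustrated with a minimal NumPy sketch. It is not taken from any particular library: the class name `LoRALinear`, its methods, and the `rank` and `alpha` parameters are hypothetical names chosen for illustration. It assumes a single frozen weight matrix W of shape (d_out, d_in), a trainable pair B (d_out x r) and A (r x d_in), and the alpha / r scaling and zero-initialised B described by Hu et al. Training itself is omitted.

```python
import numpy as np


class LoRALinear:
    """Illustrative only: a frozen linear layer W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = weight.shape
        self.weight = weight            # frozen base weight, shape (d_out, d_in)
        self.scale = alpha / rank       # scaling applied to the low-rank update
        # A starts as small Gaussian noise and B as zeros, so the update is
        # exactly zero before any training has happened.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def forward(self, x):
        # x has shape (batch, d_in). The low-rank path goes through the
        # rank-r bottleneck and never builds the full d_out x d_in update.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the adapter into the base weight for deployment.
        return self.weight + self.scale * (self.B @ self.A)

    def trainable_fraction(self):
        # Adapter parameters relative to this one base weight matrix.
        return (self.A.size + self.B.size) / self.weight.size


def demo():
    rng = np.random.default_rng(0)
    base = rng.normal(size=(2048, 2048))
    layer = LoRALinear(base, rank=8, alpha=16, rng=rng)
    # Stand-in for training: give B non-zero values so the adapter has an effect.
    layer.B = rng.normal(0.0, 0.01, size=layer.B.shape)
    x = rng.normal(size=(4, 2048))
    same = np.allclose(layer.forward(x), x @ layer.merged_weight().T)
    return layer.trainable_fraction(), same
```

Calling `demo()` on this 2048 x 2048 example gives a trainable fraction of 2 * 8 * 2048 / 2048^2, about 0.78% of the layer, and confirms that the adapter path and the merged weight produce the same outputs. Swapping adapters at inference time amounts to keeping `weight` fixed and replacing `A` and `B`.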