Cross-cutting

Retrieval-augmented generation (RAG)

Retrieval-augmented generation is the pattern in which an LLM is given relevant context retrieved from an external source, typically via semantic search over a vector database, before it generates its response. RAG grounds the model in current, authoritative content it wasn't trained on, which can substantially reduce hallucination on factual queries.

May 23, 2026

RAG emerged as the practical answer to two LLM limitations: knowledge cutoff (the model doesn't know facts from after training) and hallucination (the model invents plausible-but-wrong answers when asked about facts it doesn't have). The pipeline has two phases. At indexing time, source documents are chunked into small passages, each chunk is embedded into a vector, and the vectors are stored in a vector database. At query time, the user query is embedded into the same vector space, the top-K nearest chunks are retrieved, and those chunks are supplied to the model as context. Production RAG adds substantial engineering around chunking strategy, retrieval re-ranking, citation extraction, and answer validation. Agentic RAG (2024+) lets the model iteratively refine its queries and retrieve more context as it reasons.
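
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the in-memory list returned by `build_index` stands in for a vector database, and all function names (`chunk`, `embed`, `cosine`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. The final call to an LLM is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Naive fixed-size chunking: split text into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], max_words: int = 12) -> list[tuple[str, Counter]]:
    """Indexing phase: chunk every document and store (chunk text, chunk vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, max_words)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Query phase: embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question, numbered so they can be cited."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


documents = [
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
]
index = build_index(documents)
question = "Where is the Eiffel Tower?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)
```

In a real deployment, `embed` would call an embedding model and `retrieve` would query a vector database, but the shape of the flow (index once, then embed, retrieve, and prompt per query) stays the same. Numbering the passages in the prompt is one simple way to support the citation extraction mentioned above.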