Cross-cutting

Agentic RAG

Agentic RAG extends traditional retrieval-augmented generation with an agent loop: instead of one retrieval pass before generation, the model iteratively refines its queries, retrieves more context, evaluates whether the answer is complete, and continues until it has enough information. The pattern handles complex queries that no single retrieval can satisfy.

May 23, 2026

Traditional RAG retrieves once and generates once. Agentic RAG turns retrieval into a tool the model can call repeatedly: the first query returns initial context, the model identifies gaps, a second query fills them, the model checks coverage again, and so on. The pattern excels on multi-hop questions ('what is the relationship between X and Y, where Y is mentioned in the paper about X?'), comparative questions, and questions that require synthesis across many sources. The trade-offs are cost (more retrieval calls and more inference) and latency (each step depends on the result of the previous one, so the calls cannot run in parallel). Production agentic RAG systems cap the iteration count and use progress detection to halt when the model isn't converging.
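The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: agentic_retrieve, LoopResult, the retrieve and next_query callables, and the max_iterations and patience parameters are all hypothetical names introduced here. In a real system, next_query would be a model call that inspects the gathered context and either proposes a follow-up query or signals that coverage is sufficient; here it is a hard-coded stub over a two-passage toy corpus.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    context: List[str]   # unique passages gathered across all retrieval calls
    queries: List[str]   # queries issued, in order
    stop_reason: str     # "complete", "no_progress", or "max_iterations"


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    next_query: Callable[[str, List[str]], Optional[str]],
    max_iterations: int = 4,
    patience: int = 1,
) -> LoopResult:
    """Retrieve iteratively until the model is satisfied, stalls, or hits the cap.

    next_query stands in for the model: it returns a follow-up query,
    or None when it judges the context sufficient to answer.
    """
    context: List[str] = []
    queries: List[str] = []
    stalled = 0  # consecutive retrieval calls that added nothing new

    for _ in range(max_iterations):          # hard cap on iteration count
        query = next_query(question, context)
        if query is None:                    # model reports coverage is complete
            return LoopResult(context, queries, "complete")
        queries.append(query)

        added = 0
        for passage in retrieve(query):
            if passage not in context:       # ignore passages already seen
                context.append(passage)
                added += 1

        # Progress detection: halt after `patience` calls with no new passages.
        stalled = 0 if added else stalled + 1
        if stalled >= patience:
            return LoopResult(context, queries, "no_progress")

    return LoopResult(context, queries, "max_iterations")


# Toy stand-ins (assumptions for illustration only).
TOY_CORPUS = {
    "paper x author": ["Paper X was written by Alice."],
    "alice advisor": ["Alice was advised by Bob."],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_CORPUS.get(query.lower(), [])


def toy_next_query(question: str, context: List[str]) -> Optional[str]:
    text = " ".join(context)
    if "advised by" in text:
        return None                  # second hop resolved: enough to answer
    if "written by Alice" in text:
        return "alice advisor"       # first hop resolved: ask the second
    return "paper x author"          # nothing yet: ask the first hop


result = agentic_retrieve(
    "Who advised the author of paper X?", toy_retrieve, toy_next_query
)
print(result.stop_reason, result.queries)
```

In the toy run, the loop issues two queries and stops with stop_reason "complete" once both hops are in context. When the loop ends with "no_progress" or "max_iterations", the caller can still generate a best-effort answer from whatever context was gathered; how to handle that case is a design choice the sketch leaves open.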