Agentic RAG
Agentic RAG extends traditional retrieval-augmented generation with an agent loop: instead of one retrieval pass before generation, the model iteratively refines its queries, retrieves more context, evaluates whether the answer is complete, and continues until it has enough information. The pattern handles complex queries that no single retrieval can satisfy.
Traditional RAG retrieves once and generates once. Agentic RAG turns retrieval into a tool the model can call repeatedly: the first query fetches initial context, the model identifies gaps, a second query fills them, the model checks coverage, and so on. The pattern excels on multi-hop questions ('what is the relationship between X and Y, where Y is mentioned in the paper about X'), comparative questions, and questions requiring synthesis across many sources. The trade-off is cost (more retrieval calls, more inference) and latency (each step waits on the result of the previous one). Production agentic RAG includes a cap on iteration count and progress detection to halt when the model isn't converging.
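The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function agentic_rag, its three callbacks (retrieve, next_query, is_complete), and the toy INDEX are all hypothetical names introduced here. In a real system the callbacks would wrap a retriever and model calls. Progress detection is reduced to its simplest form: stop when a retrieval pass returns no document that is not already in the context.

```python
from typing import Callable

Docs = dict[str, str]  # document id -> text


def agentic_rag(
    question: str,
    retrieve: Callable[[str], Docs],
    next_query: Callable[[str, Docs], str],
    is_complete: Callable[[str, Docs], bool],
    max_iters: int = 5,
) -> tuple[Docs, str]:
    """Retrieve repeatedly until the context suffices, stalls, or the cap is hit."""
    context: Docs = {}
    query = question  # the first query is the question itself
    for _ in range(max_iters):  # cap on iteration count
        new_docs = {k: v for k, v in retrieve(query).items() if k not in context}
        if not new_docs:  # progress detection: this pass added nothing new
            return context, "no_progress"
        context.update(new_docs)
        if is_complete(question, context):  # coverage check
            return context, "complete"
        query = next_query(question, context)  # refine the query to fill the gap
    return context, "max_iters"


# Toy stand-ins (assumptions) so the sketch runs without a model or vector store.
INDEX: dict[str, Docs] = {
    "Where does the author of paper X work?": {"d1": "Paper X was written by Lee."},
    "Where does Lee work?": {"d2": "Lee works at Lab Q."},
}


def toy_retrieve(query: str) -> Docs:
    return INDEX.get(query, {})


def toy_next_query(question: str, context: Docs) -> str:
    # A real system would ask the model what is still missing.
    return "Where does Lee work?"


def toy_is_complete(question: str, context: Docs) -> bool:
    # A real system would ask the model whether the context answers the question.
    return any("works at" in text for text in context.values())


context, status = agentic_rag(
    "Where does the author of paper X work?",
    toy_retrieve,
    toy_next_query,
    toy_is_complete,
)
print(status, sorted(context))  # complete ['d1', 'd2']
```

The returned status string distinguishes the three ways the loop can end, which makes it easy to log how often a deployment stops on the cap or on stalled retrieval rather than on a complete answer.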
Related terms
- Retrieval-augmented generation (RAG)
Retrieval-augmented generation is the pattern where an LLM is given relevant context retrieved from an external source — typically via semantic search over a vector database — before generating its response.
- Agent loop
An agent loop is the orchestration pattern where an LLM iteratively reasons, calls tools, and observes results, continuing until a terminal condition is met (task complete, max iterations reached, or an error).
- Tool use (LLM agent)
Tool use is the LLM-agent pattern in which the model has access to a defined set of tools — read file, search web, run code, query API — and decides which to invoke and with what arguments based on the user's request and the current state.
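The agent loop and tool use entries above can be illustrated together with a small Python sketch. Everything named here is an assumption made for illustration: the TOOLS registry, the run_agent function, the action format (a dict with either "tool" and "args" or "final"), and the scripted_decide function that stands in for the model. The loop ends on one of the three terminal conditions listed above.

```python
from typing import Any, Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

History = list[tuple[dict, Any]]  # (action, observation) pairs


def run_agent(decide: Callable[[History], dict], max_iters: int = 5) -> tuple[str, Any]:
    """Run decide -> call tool -> observe until a terminal condition is met."""
    history: History = []
    for _ in range(max_iters):
        action = decide(history)  # the "model" picks the next step
        if "final" in action:  # terminal condition: task complete
            return "complete", action["final"]
        tool = TOOLS.get(action["tool"])
        if tool is None:  # terminal condition: error (unknown tool)
            return "error", f"unknown tool: {action['tool']}"
        try:
            observation = tool(**action["args"])
        except Exception as exc:  # terminal condition: error (tool failed)
            return "error", str(exc)
        history.append((action, observation))  # the result feeds the next decision
    return "max_iters", None  # terminal condition: max iterations reached


def scripted_decide(history: History) -> dict:
    # Stand-in for a model call: use one tool, then answer from its result.
    if not history:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"The sum is {history[-1][1]}"}


print(run_agent(scripted_decide))  # ('complete', 'The sum is 5')
```

Keeping the tools in a registry keyed by name means the model only ever emits a tool name and arguments, and the loop decides whether that request is valid before anything runs.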