Retrieval-Augmented Generation

Type
Concept
Published
2026-04-29
Aliases
RAG, retrieval augmented generation
Brief definition

A pattern in which relevant documents are retrieved from an external corpus at query time and supplied to an LLM, so that the model can ground its answer in knowledge it was not trained on.

What it is

Retrieval-Augmented Generation (RAG) couples a language model with an external knowledge store. When a user submits a question, the system first retrieves the most relevant chunks of source material from that store, then includes those chunks in the prompt sent to the LLM. The model generates an answer over the augmented context rather than from its training weights alone.
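The retrieve-then-augment loop can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the corpus, the word-overlap scoring function, and the prompt template are all toy stand-ins (real systems use embedding models and an actual LLM call in place of the final print).

```python
# Minimal sketch of the RAG loop: retrieve relevant chunks, splice them
# into the prompt, and hand the augmented prompt to the model.
# CORPUS, score(), and the prompt wording are illustrative placeholders.

CORPUS = [
    "The warranty period for model X is 24 months.",
    "Returns are accepted within 30 days of purchase.",
    "Model X ships with a USB-C charger.",
]

def score(query: str, chunk: str) -> int:
    # Toy relevance score: shared-word count. Production systems use
    # embedding similarity instead (see the naive-RAG sketch below).
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the top k.
    return sorted(CORPUS, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Augment the prompt: retrieved chunks become grounding context.
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

query = "How long is the warranty for model X?"
prompt = build_prompt(query, retrieve(query))
print(prompt)  # in a real system this prompt is sent to the LLM
```

The key property is that the model's answer is conditioned on the retrieved context rather than on its weights alone; everything downstream of `retrieve()` is ordinary prompt construction.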

In its simplest form — sometimes called “naive RAG” — retrieval is a single semantic similarity search over a vector index of pre-embedded document chunks. Production systems extend this with reranking, hybrid keyword-plus-semantic search, query rewriting, hierarchical retrieval, and agentic loops that re-query when the first attempt is insufficient. RAG is best understood as a design space rather than a single algorithm: see the main article for the seven canonical architectures and the 34-technique catalogue.
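The single similarity search at the heart of naive RAG reduces to a nearest-neighbour lookup over pre-embedded chunks. A sketch with hand-made three-dimensional vectors standing in for real embedding-model outputs (the chunk IDs and values are invented for illustration):

```python
import math

# Naive-RAG retrieval: one cosine-similarity search over a vector index
# of pre-embedded chunks. The vectors below are tiny hand-made stand-ins
# for real embedding outputs (which are typically hundreds of dimensions).

INDEX = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.2, 0.8, 0.1],
    "chunk-c": [0.1, 0.2, 0.9],
}

def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    # Rank all chunks by similarity to the embedded query; return top k.
    ranked = sorted(INDEX, key=lambda cid: cosine(query_vec, INDEX[cid]),
                    reverse=True)
    return ranked[:k]

print(nearest([0.85, 0.15, 0.05]))  # → ['chunk-a']
```

Production extensions such as reranking or hybrid search wrap additional stages around this core: a keyword score fused with the cosine score, or a cross-encoder re-scoring the top candidates before they reach the prompt.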

Why it matters

RAG is how most institutional deployments of LLMs ground model outputs in their own data — case law for legal teams, research papers for academic groups, internal documents for enterprises. Without it, the model can only answer from its frozen training data, which is incomplete, out of date, and unverifiable.

But RAG is not free. Retrieval quality drives accuracy more than write strategy or prompt phrasing, and single-vector retrieval has a dimensionality-bounded ceiling on the corpus size it can reliably discriminate (see Semantic Collapse). It also does not eliminate hallucinations: the first preregistered audit of commercial legal RAG tools, Magesh et al. (2024), found Lexis+ AI, Westlaw AI-Assisted Research, and Ask Practical Law AI hallucinated 17–33% of the time despite vendor claims to the contrary. Teams that treat RAG as a feature (“plug in a vector DB”) rather than an architecture run into failures that look like model failures but are actually retrieval failures.