Hybrid Retrieval

A retrieval strategy that combines dense semantic vector search with sparse keyword search (and often a reranking pass), then merges the rankings. It is designed to answer queries that either method alone would miss.

What it is

Dense retrieval (semantic vector search) and sparse retrieval (keyword indices like BM25) fail in different ways. Semantic search misses exact-match queries: a search for “Section 230” or “Donoghue v Stevenson” can return conceptually similar results without ever surfacing the literal match. Keyword search misses paraphrases: a query for “renting a flat” misses documents that only ever say “tenancy agreement.”
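The keyword-side failure can be illustrated with a toy BM25 scorer. This is a minimal sketch, not a production index: the whitespace tokenizer, the two sample documents, and the parameter values k1=1.5 and b=0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real systems
    # usually add stemming, stop-word handling, and punctuation stripping).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term that never appears contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Illustrative documents (hypothetical).
docs = [
    "the tenancy agreement sets out the landlord obligations",
    "renting a flat in london requires a deposit",
]
scores = bm25_scores("renting a flat", docs)
```

Because the first document shares no term with the query, its score is exactly zero, however relevant a tenancy agreement is to someone renting a flat. A dense retriever is the component expected to close that gap.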

Hybrid retrieval runs both methods on the same query, then merges their results, typically by reciprocal rank fusion or by a learned reranker that scores candidates from both pools. When rank fusion is used, a reranking model (often a cross-encoder, which scores query-document pairs more accurately than the initial bi-encoder) is commonly applied to the merged shortlist to produce the final ranking. Research summarised by God of Prompt found that hybrid retrieval cut retrieval failures roughly in half compared to semantic search alone, and reported a near-perfect correlation (r=0.98) between retrieval quality and overall answer accuracy.
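The merge-then-rerank flow described above can be sketched as follows. This is an illustration under stated assumptions: the document IDs are made up, k=60 is a conventional default for reciprocal rank fusion rather than a value given in this entry, and score_pair is a hypothetical stand-in for a cross-encoder.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default (an assumption here).
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties on the ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def rerank(query, candidates, score_pair, top_n=3):
    """Reorder a fused shortlist with a pairwise relevance scorer.

    `score_pair(query, doc_id)` is a hypothetical callable standing in
    for a cross-encoder; higher scores mean more relevant.
    """
    ordered = sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)
    return ordered[:top_n]


# Hypothetical outputs of a dense and a sparse retriever for one query.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_d", "doc_a", "doc_b"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
# fused == ["doc_a", "doc_b", "doc_d", "doc_c"]
```

Documents found by both retrievers (doc_a, doc_b) rise to the top, while doc_d, which only the keyword index surfaced, still survives into the shortlist where the reranker can promote it. Rank fusion uses positions only, so it sidesteps the problem of comparing BM25 scores with cosine similarities on different scales.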

Why it matters

For RAG systems operating on real-world corpora, hybrid retrieval should be the baseline rather than an optimisation. The reason is that production queries mix two types: exploratory questions that benefit from semantic understanding, and precise lookups (case names, statute numbers, defined terms, error codes) that depend on exact matches. A pipeline built only on one method silently fails the other class of query, and users do not always know when they have asked the kind of question their system cannot handle.

Hybrid retrieval also partly mitigates Semantic Collapse: when semantic distances compress at scale, keyword matches still distinguish documents reliably. It does not eliminate the embedding-dimension ceiling — for that, hierarchical or graph-based retrieval is needed — but it raises the floor on what naive RAG can achieve.
