AI coding glossary
RAG (Retrieval-Augmented Generation)
Also known as: retrieval-augmented generation, retrieval augmented generation
In one sentence
An LLM pattern where the system retrieves relevant documents from an index at query time and passes them to the model to ground its answer, reducing hallucination and letting it draw on fresh or private data.
Full definition
RAG (Retrieval-Augmented Generation) is the dominant pattern for grounding LLM answers in fresh or private data. A query is first run against a vector index (or a hybrid BM25 + vector index) to retrieve relevant chunks; those chunks are inserted into the prompt as context; the LLM then generates an answer that cites them. RAG is how Perplexity, Claude.ai's Projects, ChatGPT with Connectors, and most enterprise AI search products work. In AI coding tools, RAG appears as 'context indexing': Cursor's codebase indexing, GitHub Copilot's @workspace, Claude Code's directory awareness. The quality of RAG is bounded by the retrieval step: if the retriever misses the right chunk, the LLM never sees it and cannot reason about it.
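The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the named products work: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector index, and all names (`embed`, `cosine`, `retrieve`, `build_prompt`, `CHUNKS`) are hypothetical. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    # Rank every chunk by similarity to the query and keep the top k.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    # Insert the retrieved chunks into the prompt as numbered context.
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical document chunks standing in for an indexed corpus.
CHUNKS = [
    "The deploy script reads its settings from config/deploy.yaml.",
    "Unit tests live in the tests/ directory and run with pytest.",
    "The billing service retries failed payments three times.",
]

query = "Where does the deploy script read its settings?"
top = retrieve(query, CHUNKS, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Only the chunks returned by `retrieve` reach the prompt, which is why retrieval quality caps answer quality: a chunk that ranks below the cutoff `k` is invisible to the model.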