Retrieval-augmented generation
Also: RAG
A pattern where a generative model is first fed the results of a search over your documents, then asked to write an answer grounded in what it retrieved.
RAG is the standard “chat with my PDFs” architecture. A search over the document collection retrieves candidate passages; the passages are inserted into a prompt; an LLM writes an answer grounded in those passages. It is often conflated with extractive search, which returns source passages verbatim, but in RAG the final answer is still generated text.
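The three steps can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any particular product's pipeline: the retriever scores passages by simple term overlap, and `generate` is a hypothetical stand-in for whatever LLM call a real system would make. All function names here are invented for the example.

```python
from collections import Counter
from typing import Callable

PUNCT = ".,;:!?()\"'"

def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by how often they contain the query terms (toy scoring)."""
    query_terms = set(tokenize(query))
    scored = []
    for index, passage in enumerate(passages):
        counts = Counter(tokenize(passage))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, index))
    # Highest score first; ties keep original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for score, i in scored[:k] if score > 0]

def build_prompt(query: str, retrieved: list[str]) -> str:
    """Step 2: insert the retrieved passages into a prompt."""
    numbered = "\n".join(f"[{n}] {p}" for n, p in enumerate(retrieved, start=1))
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        f"{numbered}\n\nQuestion: {query}\nAnswer:"
    )

def rag_answer(query: str, passages: list[str],
               generate: Callable[[str], str], k: int = 2) -> tuple[str, list[str]]:
    """Step 3: the model writes the answer. `generate` is a hypothetical LLM call."""
    retrieved = retrieve(query, passages, k)
    answer = generate(build_prompt(query, retrieved))
    return answer, retrieved
```

Note that the passages are only an input to the final step: whatever `generate` returns is new text, which is why the answer can drift from the sources it cites.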
Why it matters for your research. A RAG system that cites a chunk is only as trustworthy as the paraphrase it wrote around the citation. The generative step can omit qualifying context, combine retrieved passages in misleading ways, or introduce claims not present in any source. For claims a researcher will cite, reading the retrieved passage directly beats reading a paraphrase of it.
In Archēglyph. We do not do RAG. We return the retrieved spans themselves; reading is the workflow. If you want RAG-style output, you can copy the retrieved spans into a tool that generates it, with eyes open about the trade-off.
Not to be confused with. Semantic search and lexical (BM25) search, which are retrieval steps in a RAG stack but can also be used on their own. That is what we do.
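To make "used alone" concrete, here is a minimal sketch of lexical search that stops at retrieval and returns the matching spans verbatim, with no generative step. It uses the standard Okapi BM25 scoring formula; the function name `bm25_search` is invented for the example, the values `k1=1.5` and `b=0.75` are common defaults assumed here, and none of this is meant to describe Archēglyph's actual implementation.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def bm25_search(query: str, spans: list[str], k: int = 3,
                k1: float = 1.5, b: float = 0.75) -> list[tuple[float, str]]:
    """Return up to k (score, span) pairs, best first. Spans are returned unmodified."""
    if not spans:
        return []
    docs = [tokenize(s) for s in spans]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many spans does each term appear?
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    results = []
    for span, doc in zip(spans, docs):
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = 1 - b + b * len(doc) / avg_len
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        if score > 0:
            results.append((score, span))

    results.sort(key=lambda pair: -pair[0])
    return results[:k]
```

The output is a ranked list of the original text, so a reader checks claims against the source wording itself rather than against a paraphrase.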