RAG Architectures¶
What Is RAG?¶
Retrieval-Augmented Generation: instead of relying only on the model's training data, retrieve relevant documents at query time and include them in the prompt.
User query
↓
Retrieve relevant chunks from a knowledge base
↓
Stuff retrieved chunks into prompt context
↓
Model generates answer grounded in retrieved text
Why RAG?¶
- Freshness: Knowledge base can be updated without retraining
- Grounding: Reduces hallucination by providing source text
- Domain specificity: Add your own docs, papers, codebase
- Cost: Cheaper than fine-tuning for most use cases
Architecture Variants¶
Naive RAG¶
Query → Embed → Vector search → Top-K chunks → LLM → Answer
Simple, works surprisingly well for many cases. Fails when: - Query and answer use different vocabulary - Answer requires synthesizing across multiple documents - Chunks lose context from surrounding text
Graph RAG (LightRAG, Microsoft GraphRAG)¶
Documents → Extract entities + relationships → Build knowledge graph
Query → Graph traversal + vector search → Subgraph context → LLM → Answer
LightRAG (HKUDS): Lightweight graph-based RAG. Extracts entities and relations, stores in PostgreSQL/AGE graph + Qdrant vectors. Dual retrieval: graph traversal for structural queries, vector search for semantic queries.
Key insight: Graphs capture relationships between concepts that vector similarity misses.
Agentic RAG¶
Query → Agent decides retrieval strategy → Multiple retrieval calls →
Agent synthesizes → May retrieve more → Answer
The agent decides how to retrieve, not just what. Can: - Reformulate queries - Retrieve from multiple sources - Verify answers against sources - Iterate until satisfied
Hybrid Search¶
Combine vector (semantic) search with keyword (BM25) search:
| Method | Finds | Misses |
|---|---|---|
| Vector search | Semantically similar | Exact terms, rare words |
| Keyword search | Exact matches | Paraphrased content |
| Hybrid | Both | Less than either alone |
Chunking Strategies¶
How you split documents into retrievable units matters enormously:
| Strategy | How | Best For |
|---|---|---|
| Fixed-size | Split every N tokens | Simple, predictable |
| Sentence | Split on sentence boundaries | Preserving meaning |
| Semantic | Split when topic changes | Long documents |
| Recursive | Split by headers, then paragraphs, then sentences | Structured documents |
| Parent-child | Retrieve child chunks, include parent for context | Maintaining context |
My Experience¶
- Used LightRAG (investigated for OSS contribution) — graph extraction is powerful but adds complexity
- Ollama + manual prompting for simple retrieval tasks
- For code: embedding-based search (Cursor, Claude Code's Grep) works better than document RAG