GraphRAG v. flat RAG — key tradeoffs
As of Jun 2025, based on arXiv 2502.11371 (Llama 3.1-70B) and Microsoft cost data (Aug 2024)

Flat-chunk RAG answers temporal multi-hop questions at 25.7% accuracy — a floor the architecture cannot escape. This issue dissects what GraphRAG adds to fix that, what it costs in indexing ($50–$200 vs. under $5 for a 500-page corpus), where it still loses to plain RAG, and the exact conditions under which neither approach makes sense.

Dorothy, Site A, Vendor B) and typed relationships (works-at, certified-by, engaged-with) as structured triplets. These are accumulated into a knowledge graph: nodes are entities, edges are relationships with a numeric strength score. A community-detection algorithm (Leiden) then clusters related entities into hierarchical communities, and a second LLM pass pre-generates a text summary for each community.text-embedding-ada-002 provides the cleanest apples-to-apples numbers available: 1| Task | Flat RAG F1 | GraphRAG Local F1 | Delta |
|---|---|---|---|
| Single-hop QA (Natural Questions) | 68.2% | 65.4% | −2.8 pp |
| Multi-hop QA (HotpotQA) | 63.9% | 64.6% | +0.7 pp |
| MultiHop-RAG overall accuracy | 65.8% | 71.2% | +5.4 pp |
| Temporal sub-task | 25.7% | 49.1% | +23.4 pp |
Añade más opiniones o contexto en torno a este contenido.