From Local to Global: A Graph RAG Approach to Query-Focused Summarization
https://arxiv.org/abs/2404.16130
GPT5.icon The gist first:
GraphRAG targets "global questions" that span an entire corpus (e.g., What are the main themes? What are the conflicting viewpoints?). An entity-relationship graph is built in advance, and a summary is generated for each community (cluster). When a question is asked, these summaries are combined in a Map→Reduce synthesis to produce the final answer. (arXiv)
What the study did
Conventional "vector RAG" is strong at retrieving local facts but weak at summarizing and synthesizing across an entire corpus (sensemaking). GraphRAG is proposed to address this. (arXiv)
Pipeline (Figure 1)
1. documents → chunking
2. extract entities, relations, and (if necessary) claims from each chunk with LLM
3. build a knowledge graph with them
4. partition the graph into communities (e.g., with the Leiden algorithm)
5. generate a summary for each community (hierarchical, bottom-up)
nishio.icon bottom-up summary generation
6. generate a partial answer to the question from each community summary (Map), then combine them into a final answer (Reduce). (arXiv)
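The Map→Reduce step above can be sketched as follows. This is a minimal sketch, not the paper's implementation: `map_llm` and `reduce_llm` are hypothetical stand-ins for the actual prompts, and the score filtering and top-10 budget are illustrative simplifications.

```python
# Sketch of the Map→Reduce query stage, assuming community summaries
# already exist. `map_llm` / `reduce_llm` are hypothetical stand-ins.

def answer_global_query(question, summaries, map_llm, reduce_llm):
    # Map: each community summary yields a partial answer plus a
    # self-rated helpfulness score.
    partials = [map_llm(question, s) for s in summaries]
    # Drop unhelpful partials; keep the best ones within a
    # context-window budget (10 here is an arbitrary illustration).
    kept = sorted((p for p in partials if p["score"] > 0),
                  key=lambda p: p["score"], reverse=True)[:10]
    # Reduce: synthesize the kept partial answers into a final answer.
    return reduce_llm(question, [p["text"] for p in kept])

# Toy stand-ins to show the data flow (not real LLM calls):
def map_llm(question, summary):
    return {"score": len(summary), "text": summary.upper()}

def reduce_llm(question, texts):
    return " | ".join(texts)

final = answer_global_query("main themes?",
                            ["climate policy", "ai", ""],
                            map_llm, reduce_llm)
# final == "CLIMATE POLICY | AI"
```

The empty summary is filtered out in the Map step, and the remaining partials are merged in score order in the Reduce step.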
Entity extraction is performed with multipart prompts. A self-reflection prompt compensates for the increase in missed entities when chunks are larger. (arXiv)
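The self-reflection loop can be sketched like this; `llm_extract` and `llm_missed_any` are hypothetical stand-ins for the actual extraction and self-reflection prompts, and the cap of two extra passes is an assumption for illustration.

```python
# Sketch of the self-reflection loop: after a first extraction pass,
# the model is asked whether entities were missed and, if so, is
# prompted to extract again with the already-found entities as context.

def extract_with_gleanings(chunk, llm_extract, llm_missed_any,
                           max_gleanings=2):
    entities = set(llm_extract(chunk, already=set()))
    for _ in range(max_gleanings):
        # Self-reflection: did the previous pass miss anything?
        if not llm_missed_any(chunk, entities):
            break
        entities |= set(llm_extract(chunk, already=entities))
    return entities

# Toy stand-ins: three entities are present; each pass finds one more.
found = [["Alice"], ["Bob"], ["Carol"]]
state = {"i": 0}
def llm_extract(chunk, already):
    out = found[state["i"]]
    state["i"] += 1
    return out
def llm_missed_any(chunk, entities):
    return len(entities) < 3
entities = extract_with_gleanings("chunk text", llm_extract, llm_missed_any)
# entities == {"Alice", "Bob", "Carol"}
```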
How it was evaluated
Datasets (on the order of 1 million tokens each)
Podcast: public transcripts of Behind the Tech with Kevin Scott (1,669 chunks of 600 tokens, overlap 100).
News: news articles from 2013-09 to 2023-12 (3,197 chunks of 600 tokens, overlap 100). (arXiv)
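The 600-token / 100-token-overlap chunking used for both corpora can be sketched as a sliding window. Placeholder tokens stand in for a real tokenizer here (the paper counts model tokens), so the boundaries are illustrative only.

```python
# Sketch of sliding-window chunking: `size`-token windows that
# step forward by size - overlap, matching 600/100 from the paper.

def chunk_tokens(tokens, size=600, overlap=100):
    """Slide a window of `size` tokens, stepping by size - overlap."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = [f"t{i}" for i in range(1500)]
chunks = chunk_tokens(tokens)
# Windows cover t0-t599, t500-t1099, t1000-t1499.
```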
Conditions compared: four GraphRAG levels (C0-C3), direct text summarization (TS), and vector RAG (SS). Generation prompts and context windows are held constant across conditions. Community detection uses Leiden via graspologic. (arXiv)
Evaluation uses relative ratings from an LLM-as-a-judge (comprehensiveness, diversity, empowerment, and directness as a control). Objective metrics are also used: the number of factual claims and claim clusters extracted by Claimify. (arXiv)
Main Results
Global methods (GraphRAG/TS) > vector RAG
Comprehensiveness win rates: 72-83% on Podcast, 72-80% on News.
Diversity win rates: 75-82% on Podcast, 62-71% on News.
On the other hand, vector RAG scores highest on directness (i.e., short and direct answers). (arXiv)
Efficiency: the root level C0 consumes 9-43x fewer tokens per question (advantageous for iterative querying toward global understanding). The deeper the level, the more information, but also the more tokens. (arXiv)
On the objective measures as well, the global methods beat vector RAG in the **number of factual claims (comprehensiveness) and number of claim clusters (diversity)** (p<.05). (arXiv)
Implementation tips (from the paper)
Chunk size: larger chunks increase extraction misses, so balance this with the self-reflection passes. (arXiv)
Use of the hierarchy:
C0 (top-level community summaries) = iterative querying at very low cost.
C2/C3 (lower levels) = when you want to push comprehensiveness and diversity a bit further. (arXiv)
Realistic indexing cost: with GPT-4-turbo and 600-token windows, about 281 minutes for the Podcast corpus (VM configuration and API conditions are specified in the paper). (arXiv)
OSS: a Microsoft implementation is available; there are also adaptations for LangChain, LlamaIndex, NebulaGraph, and Neo4j. (arXiv)
Limitations and future work
Validation is limited to roughly the 1-million-token scale and two corpora. Cross-domain generalization and comparison of hallucination rates (e.g., with SelfCheckGPT) remain future work. (arXiv)
When to use GraphRAG (personal opinion)
Questions that require the overall picture, coverage of issues, and multiple perspectives (organizing trends, surveying conflicting views, extracting major themes).
Workloads that repeatedly ask global questions of the same data (low-cost iteration at C0 -> "dig down" to lower levels as needed).
If the only goal is to pinpoint individual facts accurately, a conventional vector RAG is more likely to give a short, direct answer. (arXiv)
---
This page is auto-translated from /nishio/From Local to Global: A Graph RAG Approach to Query-Focused Summarization using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.