Graph Retrieval-Augmented Generation: A Survey
arXiv:2408.08921v2 [cs.AI] 10 Sep 2024
BOCI PENG*, School of Intelligence Science and Technology, Peking University, China
YUN ZHU*, College of Computer Science and Technology, Zhejiang University, China
YONGCHAO LIU, Ant Group, China
XIAOHE BO, Gaoling School of Artificial Intelligence, Renmin University of China, China
HAIZHOU SHI, Rutgers University, US
CHUNTAO HONG, Ant Group, China
YAN ZHANG†, School of Intelligence Science and Technology, Peking University, China
SILIANG TANG, College of Computer Science and Technology, Zhejiang University, China
Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as “hallucination”, lack of domain-specific knowledge, and outdated information. However, the complex structure of relationships among different entities in databases presents challenges for RAG systems. In response, GraphRAG leverages structural information across entities to enable more precise and comprehensive retrieval, capturing relational knowledge and facilitating more accurate, context-aware responses. Given the novelty and potential of GraphRAG, a systematic review of current technologies is imperative. This paper provides the first comprehensive overview of GraphRAG methodologies. We formalize the GraphRAG workflow, encompassing Graph-Based Indexing, Graph-Guided Retrieval, and Graph-Enhanced Generation. We then outline the core technologies and training methods at each stage. Additionally, we examine downstream tasks, application domains, evaluation methodologies, and industrial use cases of GraphRAG. Finally, we explore future research directions to inspire further inquiries and advance progress in the field. In order to track recent progress in this field, we set up a repository at /pengboci/GraphRAG-Survey.
CCS Concepts: • Computing methodologies → Knowledge representation and reasoning; • Information systems → Information retrieval; Data mining.
Additional Key Words and Phrases: Large Language Models, Graph Retrieval-Augmented Generation, Knowledge Graphs, Graph Neural Networks
*Both authors contributed equally to this research.
†Corresponding Author.
Authors’ Contact Information: Boci Peng, School of Intelligence Science and Technology, Peking University, Beijing, China, bcpeng@; Yun Zhu, College of Computer Science and Technology, Zhejiang University, Hangzhou, China, zhuyun_dcd@; Yongchao Liu, Ant Group, Hangzhou, China, yongchao.ly@; Xiaohe Bo, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, bellebxh@; Haizhou Shi, Rutgers University, New Brunswick, New Jersey, US, haizhou.shi@; Chuntao Hong, Ant Group, Hangzhou, China, chuntao.hct@; Yan Zhang, School of Intelligence Science and Technology, Peking University, Beijing, China, zhyzhy001@; Siliang Tang, College of Computer Science and Technology, Zhejiang University, Hangzhou, China, siliang@.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@.
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM 1557-735X/2024/9-ART111
/XXXXXXX.XXXXXXX
J. ACM, Vol. 37, No. 4, Article 111. Publication date: September 2024.
ACM Reference Format:
Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. 2024. Graph Retrieval-Augmented Generation: A Survey. J. ACM 37, 4, Article 111 (September 2024), 41 pages. /XXXXXXX.XXXXXXX
1 Introduction
The development of Large Language Models like GPT-4 [127], Qwen2 [184], and LLaMA [31] has sparked a revolution in the field of artificial intelligence, fundamentally altering the landscape of natural language processing. These models, built on Transformer [161] architectures and trained on diverse and extensive datasets, have demonstrated unprecedented capabilities in understanding, interpreting, and generating human language. The impact of these advancements is profound, stretching across various sectors including healthcare [103, 166, 203], finance [93, 125], and education [46, 169], where they facilitate more nuanced and efficient interactions between humans and machines.
Despite their remarkable language comprehension and text generation capabilities, LLMs may exhibit limitations due to a lack of domain-specific knowledge, real-time updated information, and proprietary knowledge, which are outside LLMs’ pre-training corpus. These gaps can lead to a phenomenon known as “hallucination” [61] where the model generates inaccurate or even fabricated information. Consequently, it is imperative to supplement LLMs with external knowledge to mitigate this problem. Retrieval-Augmented Generation (RAG) [34, 45, 59, 62, 178, 195, 202] emerged as a significant evolution, which aims to enhance the quality and relevance of generated content by integrating a retrieval component within the generation process. The essence of RAG lies in its ability to dynamically query a large text corpus to incorporate relevant factual knowledge into the responses generated by the underlying language models. This integration not only enriches the contextual depth of the responses but also ensures a higher degree of factual accuracy and specificity. RAG has gained widespread attention due to its exceptional performance and broad applications, becoming a key focus within the field.
Although RAG has achieved impressive results and has been widely applied across various domains, it faces limitations in real-world scenarios: (1) Neglecting Relationships: In practice, textual content is not isolated but interconnected. Traditional RAG fails to capture significant structured relational knowledge that cannot be represented through semantic similarity alone. For instance, in a citation network where papers are linked by citation relationships, traditional RAG methods focus on finding the relevant papers based on the query but overlook important citation relationships between papers. (2) Redundant Information: RAG often recounts content in the form of textual snippets when concatenated as prompts. This makes the context excessively long, leading to the “lost in the middle” dilemma [104]. (3) Lacking Global Information: RAG can only retrieve a subset of documents and fails to grasp global information comprehensively, and hence struggles with tasks such as Query-Focused Summarization (QFS).
Graph Retrieval-Augmented Generation (GraphRAG) [32, 58, 119] emerges as an innovative solution to address these challenges. Unlike traditional RAG, GraphRAG retrieves graph elements containing relational knowledge pertinent to a given query from a pre-constructed graph database, as depicted in Figure 1. These elements may include nodes, triples, paths, or subgraphs, which are utilized to generate responses. GraphRAG considers the interconnections between texts, enabling a more accurate and comprehensive retrieval of relational information. Additionally, graph data, such as knowledge graphs, offer abstraction and summarization of textual data, thereby significantly shortening the length of the input text and mitigating concerns of verbosity. By retrieving subgraphs or graph communities, we can access comprehensive information to effectively address the QFS challenge by capturing the broader context and interconnections within the graph structure.
[Figure 1: three panels answering the query “How did the artistic movements of the 19th century impact the development of modern art in the 20th century?”: Direct LLM (Query, LLMs, Response), RAG (Query, Retriever, Retrieved Text, LLMs, Response), and GraphRAG (Query, Retriever, Retrieved Triplets, LLMs, Response).]
Fig. 1. Comparison between Direct LLM, RAG, and GraphRAG. Given a user query, direct answering by LLMs may suffer from shallow responses or lack of specificity. RAG addresses this by retrieving relevant textual information, somewhat alleviating the issue. However, due to the text’s length and flexible natural language expressions of entity relationships, RAG struggles to emphasize “influence” relations, which are the core of the question. In contrast, GraphRAG methods leverage explicit entity and relationship representations in graph data, enabling precise answers by retrieving relevant structured information.
In this paper, we are the first to provide a systematic survey of GraphRAG. Specifically, we begin by introducing the GraphRAG workflow, along with the foundational background knowledge that underpins the field. Then, we categorize the literature according to the primary stages of the GraphRAG process: Graph-Based Indexing (G-Indexing), Graph-Guided Retrieval (G-Retrieval), and Graph-Enhanced Generation (G-Generation) in Section 5, Section 6 and Section 7 respectively, detailing the core technologies and training methods within each phase. Furthermore, we investigate downstream tasks, application domains, evaluation methodologies, and industrial use cases of GraphRAG. This exploration elucidates how GraphRAG is being utilized in practical settings and reflects its versatility and adaptability across various sectors. Finally, acknowledging that research in GraphRAG is still in its early stages, we delve into potential future research directions. This prognostic discussion aims to pave the way for forthcoming studies, inspire new lines of inquiry, and catalyze progress within the field, ultimately propelling GraphRAG toward more mature and innovative horizons.
Our contributions can be summarized as follows:
• We provide a comprehensive and systematic review of existing state-of-the-art GraphRAG methodologies. We offer a formal definition of GraphRAG, outlining its universal workflow which includes G-Indexing, G-Retrieval, and G-Generation.
• We discuss the core technologies underpinning existing GraphRAG systems, including G-Indexing, G-Retrieval, and G-Generation. For each component, we analyze the spectrum of model selection, methodological design, and enhancement strategies currently being explored. Additionally, we contrast the diverse training methodologies employed across these modules.
• We delineate the downstream tasks, benchmarks, application domains, evaluation metrics, current challenges, and future research directions pertinent to GraphRAG, discussing both the progress and prospects of this field. Furthermore, we compile an inventory of existing industry GraphRAG systems, providing insights into the translation of academic research into real-world industry solutions.
[Figure 2: pipeline from Input Query to Output Response. Graph Database & G-Indexing: Open Knowledge Graphs, Self-Constructed Graph Data. G-Retrieval: Query Enhancements (Query Expansion, Query Decomposition), Retriever, Retrieval Results (Nodes, Triplets, Paths, Subgraphs, Hybrid), Knowledge Enhancements (Merging, Pruning). G-Generation: Graph Format (Natural Language, Syntax Tree, Graph Embedding, Adjacency/Edge Table, Node Sequence, Code-Like Forms), Generator with Pre-Generation, Mid-Generation, and Post-Generation Enhancements.]
Fig. 2. The overview of the GraphRAG framework for the question answering task. In this survey, we divide GraphRAG into three stages: G-Indexing, G-Retrieval, and G-Generation. We categorize the retrieval sources into open-source knowledge graphs and self-constructed graph data. Various enhancing techniques like query enhancement and knowledge enhancement may be adopted to boost the relevance of the results. Unlike RAG, which uses retrieved text directly for generation, GraphRAG requires converting the retrieved graph information into patterns acceptable to generators to enhance the task performance.
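The conversion step mentioned in the caption of Figure 2 can be sketched as follows. This is a hedged Python illustration of two of the graph formats named in the figure (natural language and an edge table). The function names, the sentence template, and the pipe-separated layout are assumptions made for this example, not formats prescribed by the survey.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)


def to_natural_language(triplets: List[Triplet]) -> str:
    """Verbalize each triplet as a simple sentence (assumed template)."""
    return " ".join(f"{h} {r} {t}." for h, r, t in triplets)


def to_edge_table(triplets: List[Triplet]) -> str:
    """Render the triplets as a pipe-separated edge table (assumed layout)."""
    rows = ["source | relation | target"]
    rows += [f"{h} | {r} | {t}" for h, r, t in triplets]
    return "\n".join(rows)


# Two triplets taken from the GraphRAG example in Figure 1.
retrieved = [
    ("Claude Monet", "introduced", "new techniques"),
    ("Pablo Picasso", "pioneered", "Cubism"),
]
text_prompt = to_natural_language(retrieved)
table_prompt = to_edge_table(retrieved)
```

Either string could then be placed in the generator's prompt. Which format works better is likely to depend on the generator and the task.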
Organization. The rest of the survey is organized as follows: Section 2 compares related techniques, Section 3 introduces background knowledge, and Section 4 outlines the general process of GraphRAG. Sections 5 to 7 categorize the techniques associated with GraphRAG’s three stages: G-Indexing, G-Retrieval, and G-Generation. Section 8 introduces the training strategies of retrievers and generators. Section 9 summarizes GraphRAG’s downstream tasks, corresponding benchmarks, application domains, evaluation metrics, and industrial GraphRAG systems. Section 10 provides an outlook on future directions. Finally, Section 11 concludes the content of this survey.
2 Comparison with Related Techniques and Surveys
In this section, we compare Graph Retrieval-Augmented Generation (GraphRAG) with related techniques and corresponding surveys, including RAG, LLMs on graphs, and Knowledge Base Question Answering (KBQA).
2.1 RAG
RAG combines external knowledge with LLMs for improved task performance, integrating domain-specific information to ensure factuality and credibility. In the past two years, researchers have written many comprehensive surveys about RAG [34, 45, 59, 62, 178, 195, 202]. For example, Fan et al. [34] and Gao et al. [45] categorize RAG methods from the perspectives of retrieval, generation, and augmentation. Zhao et al. [202] review RAG methods for databases with different modalities. Yu et al. [195] systematically summarize the evaluation of RAG methods. These works provide a structured synthesis of current RAG methodologies, fostering a deeper understanding and suggesting future directions of the area.
From a broad perspective, GraphRAG can be seen as a branch of RAG, which retrieves relevant relational knowledge from graph databases instead of a text corpus. However, compared to text-based RAG, GraphRAG takes into account the relationships between texts and incorporates the structural information as additional knowledge beyond text. Furthermore, during the construction of graph data, raw text data may undergo filtering and summarization processes, enhancing the refinement of information within the graph data. Although previous surveys on RAG have touched upon GraphRAG, they predominantly center on textual data integration. This paper diverges by placing a primary emphasis on the indexing, retrieval, and utilization of structured graph data, which represents a substantial departure from handling purely textual information and spurs the emergence of many new techniques.
2.2 LLMs on Graphs
LLMs are revolutionizing natural language processing due to their excellent text understanding, reasoning, and generation capabilities, along with their generalization and zero-shot transfer abilities. Although LLMs are primarily designed to process pure text and struggle with non-Euclidean data containing complex structural information, such as graphs [49, 165], numerous studies [17, 35, 74, 92, 102, 116, 130, 131, 173, 204] have been conducted in this area. These papers primarily integrate LLMs with GNNs to enhance modeling capabilities for graph data, thereby improving performance on downstream tasks such as node classification, edge prediction, graph classification, and others. For example, Zhu et al. [204] propose an efficient fine-tuning method named ENGINE, which combines LLMs and GNNs through a side structure for enhancing graph representation.
Different from these methods, GraphRAG focuses on retrieving relevant graph elements using queries from an external graph-structured database. In this paper, we provide a detailed introduction to the relevant technologies and applications of GraphRAG, which are not included in previous surveys of LLMs on Graphs.
2.3 KBQA
KBQA is a significant task in natural language processing, aiming to respond to user queries based on external knowledge bases [41, 85, 86, 188], thereby achieving goals such as fact verification, passage retrieval enhancement, and text understanding. Previous surveys typically categorize existing KBQA approaches into two main types: Information Retrieval (IR)-based methods and Semantic Parsing (SP)-based methods. Specifically, IR-based methods [69, 70, 112, 113, 154, 167, 182, 196] retrieve information related to the query from the knowledge graph (KG) and use it to enhance the generation process. In contrast, SP-based methods [16, 19, 36, 48, 153, 191] generate a logical form (LF) for each query and execute it against knowledge bases to obtain the answer.
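The difference between the two KBQA families can be sketched on a toy knowledge graph. This is a minimal Python illustration under stated assumptions: the graph contents, the tuple-based logical form `("tail_of", head, relation)`, and both function names are hypothetical. A real SP-based system would produce the logical form with a trained parser rather than receive it by hand.

```python
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)

KG: List[Triplet] = [
    ("Pablo Picasso", "pioneered", "Cubism"),
    ("Cubism", "emerged in", "early 20th century"),
]


def ir_retrieve(kg: List[Triplet], query: str) -> List[Triplet]:
    """IR-style: return triplets whose head or tail entity is mentioned in the query."""
    q = query.lower()
    return [t for t in kg if t[0].lower() in q or t[2].lower() in q]


def sp_execute(kg: List[Triplet], logical_form: Tuple[str, str, str]) -> Optional[str]:
    """SP-style: execute a logical form ("tail_of", head, relation) against the KG."""
    op, head, relation = logical_form
    if op != "tail_of":
        raise ValueError(f"unsupported operator: {op}")
    for h, r, t in kg:
        if h == head and r == relation:
            return t
    return None


question = "What did Pablo Picasso pioneer?"
# IR-based: the retrieved facts would be passed to a generator as context.
context = ir_retrieve(KG, question)
# SP-based: the logical form (written by hand here) is executed directly.
answer = sp_execute(KG, ("tail_of", "Pablo Picasso", "pioneered"))
```

In the IR-style path the answer still has to be produced by a downstream generator, whereas the SP-style path returns the answer from execution alone.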
GraphRAG and KBQA are closely related, with IR-based KBQA methods representing a subset of GraphRAG approaches focused on downstream applications. In this work, we extend the discussion beyond KBQA to include GraphRAG’s applications across various downstream tasks. Our survey provides a thorough and detailed exploration of GraphRAG technology, offering a comprehensive understanding of existing methods and potential improvements.
3 Preliminaries
In this section, we introduce background knowledge of GraphRAG for easier comprehension of our survey. First, we introduce Text-Attributed Graphs, which are a universal and general format of graph data used in GraphRAG. Then, we provide formal definitions for two types of models that can be used in the retrieval and generation stages: Graph Neural Networks and Language Models.
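3.1 Text-Attributed Graphs
The graph data used in GraphRAG can be represented uniformly as Text-Attributed Graphs (TAGs), where nodes and edges possess textual attributes. Formally, a text-attributed graph can be denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{A}, \{x_u\}_{u \in \mathcal{V}}, \{e_{i,j}\}_{i,j \in \mathcal{E}})$, where $\mathcal{V}$ is the set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges, and $\mathcal{A} \in \{0,1\}^{|\mathcal{V}| \times |\mathcal{V}|}$ is the adjacency matrix. Additionally, $\{x_u\}_{u \in \mathcal{V}}$ and $\{e_{i,j}\}_{i,j \in \mathcal{E}}$ are textual attributes of nodes and edges, respectively. One typical kind of TAGs is Knowledge Graphs (KGs), where nodes are entities, edges are relations among entities, and text attributes are the names of entities and relations.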
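As a concrete reading of this definition, here is a minimal Python sketch of a text-attributed graph that stores node texts and edge texts and derives the adjacency matrix from them. The class name `TextAttributedGraph` and its fields are illustrative assumptions, and the example treats edges as directed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextAttributedGraph:
    """Toy TAG: node_text plays the role of {x_u}, edge_text the role of {e_ij}."""
    node_text: Dict[int, str] = field(default_factory=dict)
    edge_text: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def add_node(self, u: int, text: str) -> None:
        self.node_text[u] = text

    def add_edge(self, i: int, j: int, text: str) -> None:
        if i not in self.node_text or j not in self.node_text:
            raise KeyError("both endpoints must be added as nodes first")
        self.edge_text[(i, j)] = text

    def adjacency(self) -> List[List[int]]:
        """Binary |V| x |V| matrix, with nodes ordered by sorted id."""
        order = sorted(self.node_text)
        index = {u: k for k, u in enumerate(order)}
        a = [[0] * len(order) for _ in order]
        for i, j in self.edge_text:
            a[index[i]][index[j]] = 1
        return a


# A small knowledge graph: entity names on nodes, relation names on edges.
kg = TextAttributedGraph()
kg.add_node(0, "Claude Monet")
kg.add_node(1, "new techniques")
kg.add_node(2, "depiction of light and color")
kg.add_edge(0, 1, "introduced")
kg.add_edge(1, 2, "revolutionized")
A = kg.adjacency()
```

Storing the edges as a dictionary and deriving the matrix on demand keeps the sketch short. A dense matrix would be impractical for large graphs.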
3.2 Graph Neural Networks
Graph Neural Networks (GNNs) are a kind of deep learning framework to model the graph data. Classical GNNs, e.g., GCN [83], GAT [162], GraphSAGE [52], adopt a message-passing manner to obtain node representations. Formally, each node representation $\mathbf{h}_i^{(l-1)}$ in the $l$-th layer is updated by aggregating the information from neighboring nodes and edges:
$$\mathbf{h}_i^{(l)} = \mathrm{UPD}\Big(\mathbf{h}_i^{(l-1)}, \mathrm{AGG}_{j \in \mathcal{N}(i)} \mathrm{MSG}\big(\mathbf{h}_i^{(l-1)}, \mathbf{h}_j^{(l-1)}, \mathbf{e}_{i,j}^{(l-1)}\big)\Big), \tag{1}$$
where $\mathcal{N}(i)$ represents the neighbors of node $i$. MSG denotes the message function, which computes the message based on the node, its neighbor, and the edge between them. AGG refers to the aggregation function that combines the received messages using a permutation-invariant method, such as mean, sum, or max. UPD represents the update function, which updates each node’s attributes with the aggregated messages.
Subsequently, a readout function, e.g., mean, sum, or max pooling, can be applied to obtain the global-level representation:
$$\mathbf{h}_G = \mathrm{READOUT}_{i \in \mathcal{V}_G}\big(\mathbf{h}_i^{(L)}\big). \tag{2}$$
In GraphRAG, GNNs can be utilized to obtain representations of graph data for the retrieval phase, as well as to model the retrieved graph structures.
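The message-passing update and the readout can be traced on a tiny graph. The following pure-Python sketch is only an illustration: the specific MSG, AGG, and UPD choices (neighbor state plus edge feature, mean, and addition) are assumptions picked for readability. Models such as GCN, GAT, or GraphSAGE use learned weights and nonlinearities that are omitted here.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message(h_i: Vector, h_j: Vector, e_ij: Vector) -> Vector:
    """MSG: here the neighbor state plus the edge feature (h_i is unused)."""
    return [a + b for a, b in zip(h_j, e_ij)]


def aggregate(messages: List[Vector], dim: int) -> Vector:
    """AGG: element-wise mean, a permutation-invariant choice."""
    if not messages:
        return [0.0] * dim
    return [sum(col) / len(messages) for col in zip(*messages)]


def update(h_i: Vector, agg: Vector) -> Vector:
    """UPD: add the aggregated message to the previous state."""
    return [a + b for a, b in zip(h_i, agg)]


def gnn_layer(
    h: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    edge_feat: Dict[Tuple[int, int], Vector],
) -> Dict[int, Vector]:
    """Apply one message-passing step, as in Equation (1), to every node."""
    new_h = {}
    for i, h_i in h.items():
        msgs = [message(h_i, h[j], edge_feat[(i, j)]) for j in neighbors[i]]
        new_h[i] = update(h_i, aggregate(msgs, len(h_i)))
    return new_h


def readout(h: Dict[int, Vector]) -> Vector:
    """READOUT as in Equation (2): mean pooling over all node states."""
    vectors = list(h.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Path graph 0 - 1 - 2 with 2-dimensional states and zero edge features.
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
edge_feat = {(i, j): [0.0, 0.0] for i in neighbors for j in neighbors[i]}
h1 = gnn_layer(h0, neighbors, edge_feat)
h_graph = readout(h1)
```

All new states are computed from the previous layer's states, so the order in which nodes are visited does not affect the result.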
3.3 Language Models
Language models (LMs) excel in language understanding and are mainly classified into two types: discriminative and generative. Discriminative models, like BERT [28], RoBERTa [107] and Sentence-BERT [140], focus on estimating the conditional probability $p(y \mid x)$ and are effective in tasks such as text classification and sentiment analysis. In contrast, generative models, including GPT-3 [14] and GPT-4 [127], aim to model the joint probability $p(x, y)$ for tasks like machine translation and text generation. These generative pre-trained models have significantly advanced the field of natural language processing (NLP) by leveraging massive datasets and billions of parameters, contributing to the rise of Large Language Models (LLMs) with outstanding performance across various tasks.
In the early stages, RAG and GraphRAG focused on improving pre-training techniques for discriminative language models [28, 107, 140]. Recently, LLMs such as ChatGPT [128], LLaMA [31], and Qwen2 [184] have shown great potential in language understanding, demonstrating powerful in-context learning capabilities. Subsequently, research on RAG and GraphRAG shifted towards enhancing information retrieval for language models, addressing increasingly complex tasks and mitigating hallucinations, thereby driving rapid advancements in the field.
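4 Overview of GraphRAG
GraphRAG is a framework that leverages external structured knowledge graphs to improve contextual understanding of LMs and generate more informed responses, as depicted in Figure 2. The goal of GraphRAG is to retrieve the most relevant knowledge from databases, thereby enhancing the answers of downstream tasks. The process can be defined as
$$a^* = \arg\max_{a \in A} p(a \mid q, \mathcal{G}), \tag{3}$$
where $a^*$ is the optimal answer of the query $q$ given the TAG $\mathcal{G}$, and $A$ is the set of possible responses. After that, we jointly model the target distribution $p(a \mid q, \mathcal{G})$ with a graph retriever $p_\theta(G \mid q, \mathcal{G})$ and an answer generator $p_\phi(a \mid q, G)$, where $\theta, \phi$ are learnable parameters, and utilize the total probability formula to decompose $p(a \mid q, \mathcal{G})$, which can be formulated as
$$\begin{aligned} p(a \mid q, \mathcal{G}) &= \sum_{G \subseteq \mathcal{G}} p_\phi(a \mid q, G)\, p_\theta(G \mid q, \mathcal{G}) \\ &\approx p_\phi(a \mid q, G^*)\, p_\theta(G^* \mid q, \mathcal{G}), \end{aligned} \tag{4}$$
where $G^*$ is the optimal subgraph. Because the number of candidate subgraphs can grow exponentially with the size of the graph, efficient approximation methods are necessary. The first line of Equation 4 is thus approximated by the second line. Specifically, a graph retriever is employed to extract the optimal subgraph $G^*$, after which the generator produces the answer based on the retrieved subgraph.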
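The effect of this approximation can be seen on a toy example. The sketch below is a hedged Python illustration: the three candidate subgraphs and all probability values are made up, and taking the optimal subgraph to be the one with the highest retriever probability is an assumption of this example. In a real graph the sum would range over far too many subgraphs to enumerate.

```python
from typing import Dict

# Hypothetical retriever distribution over three candidate subgraphs.
p_retrieve: Dict[str, float] = {"G1": 0.7, "G2": 0.2, "G3": 0.1}

# Hypothetical generator distribution over answers for each candidate subgraph.
p_generate: Dict[str, Dict[str, float]] = {
    "G1": {"Cubism": 0.9, "Impressionism": 0.1},
    "G2": {"Cubism": 0.5, "Impressionism": 0.5},
    "G3": {"Cubism": 0.2, "Impressionism": 0.8},
}


def exact_marginal(answer: str) -> float:
    """First line of Equation (4): sum over all candidate subgraphs."""
    return sum(p_generate[g][answer] * p_retrieve[g] for g in p_retrieve)


def top1_approximation(answer: str) -> float:
    """Second line of Equation (4): keep only the single best subgraph."""
    g_star = max(p_retrieve, key=p_retrieve.get)
    return p_generate[g_star][answer] * p_retrieve[g_star]


answers = ["Cubism", "Impressionism"]
# Equation (3): pick the answer with the highest (exact or approximate) score.
best_exact = max(answers, key=exact_marginal)
best_approx = max(answers, key=top1_approximation)
```

With these numbers both scores select the same answer, although the approximate score is smaller than the exact marginal because it drops the other subgraphs. The two can disagree when the retriever distribution is less peaked.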
Therefore, in this survey, we decompose the entire process of GraphRAG into three main stages: Graph-Based Indexing, Graph-Guided Retrieval, and Graph-Enhanced Generation. The overall workflow of GraphRAG is illustrated in Figure 2, and detailed introductions of each stage are as follows.
Graph-Based Indexing (G-Indexing). Graph-Based Indexing constitutes the initial phase of GraphRAG, aimed at identifying or constructing a graph database $\mathcal{G}$ that aligns with downstream tasks and establishing indices on it. The graph database can originate from public knowledge graphs [4, 10, 100, 142, 150, 163], graph data [123], or be constructed based on proprietary data sources such as textual [32, 51, 89, 172] or other forms of data [183]. The indexing process typically includes mapping node and edge properties, establishing pointers between connected nodes, and organizing data to support fast traversal and retrieval operations. Indexing determines the granularity of the subsequent retrieval stage, playing a crucial role in enhancing query efficiency.
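The indexing operations listed above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the function name `build_indices`, the whitespace tokenizer, and the choice of an adjacency map plus a token-level inverted index are all hypothetical simplifications, not the indexing scheme of any particular system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_indices(
    triplets: List[Triplet],
) -> Tuple[Dict[str, List[Tuple[str, str]]], Dict[str, Set[str]]]:
    """Build an adjacency index and a token-level inverted index over node names."""
    adjacency: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for head, relation, tail in triplets:
        adjacency[head].append((relation, tail))  # pointer from head to its neighbor
        for node in (head, tail):
            for token in node.lower().split():
                inverted[token].add(node)  # map each name token to its nodes
    return dict(adjacency), dict(inverted)


triplets = [
    ("Pablo Picasso", "pioneered", "Cubism"),
    ("Cubism", "emerged in", "early 20th century"),
]
adjacency, inverted = build_indices(triplets)

# Look up entry nodes by keyword, then traverse one hop via the adjacency index.
entry_nodes = sorted(inverted.get("picasso", set()))
one_hop = [edge for node in entry_nodes for edge in adjacency.get(node, [])]
```

The inverted index decides which nodes a keyword can reach, and the adjacency index decides how cheaply their neighborhoods can be expanded, which is one way the index fixes the granularity of later retrieval.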
Graph-Guided Retrieval (G-Retrieval). Following graph-based indexing, the graph-guided retrieval phase focuses on extracting pertinent information from the graph database in response to user queries or input. Specifically, given a user query $q$ which is expressed in natural language, the retrieval stage aims to extract the most relevant elements (e.g., entities, triplets, paths, subgraphs) from knowledge graphs, which can be formulated as
$$G^* = \text{G-Retriever}(q, \mathcal{G}), \tag{5}$$
where $G^*$ is the optimal retrieved graph elements and