Automatic Text Summarization

Uploaded by: q*** | IP location: Hubei | Upload date: 2024-01-11 | Format: PPT | Pages: 40


Introduction

Definition of a summary: a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that.

Three key points:
1. A summary may be produced from a single document or from multiple documents.
2. A summary must preserve the important information.
3. A summary should be as concise as possible.

Four key terms:
- extraction: identify the important parts and reproduce them verbatim
- abstraction: reproduce the important material in a different form
- fusion: combine the extracted parts coherently
- compression: discard the unimportant parts

Outline:
- Introduction
- Single-Document Summarization
- Multi-Document Summarization
- Other Approaches to Summarization
- Evaluation

Single-Document Summarization

Early work:
- ranking by word frequency (Luhn)
- sentence position (Baxendale)
- cue words and document skeleton (Edmundson)

Machine learning methods:
- assuming feature independence: naive Bayes
- not assuming feature independence: hidden Markov models, log-linear models
- more recently: neural networks and third-party features

Naive Bayes method: normal distribution

Hidden Markov model, three features:
- position of the sentence
- number of terms in the sentence
- similarity between the sentence terms and the document terms

Multi-Document Summarization

Background:
- mid-1990s, in the news domain
- more demanding than single-document summarization
- a convoluted path of development

1. Abstraction and Information Fusion
SUMMONS
- content planner
- linguistic generator
Dependency trees
[Figure: dependency tree representing the sentence "McVeigh, 27, was charged with the bombing"]

2. Topic-driven Summarization and MMR
MMR: maximal marginal relevance. It applies to a range of tasks, from text retrieval to topic-driven summarization.
- Q: query or user profile
- R: documents retrieved by the search engine
- S: set (of documents already selected)
- Di: candidate document
A different user with different information needs may require a totally different summary of the same document.

3. Graph Spreading Activation

4. Centroid-based Summarization
- differs from earlier systems
- easy to scale and domain-independent
Phase one: group news stories that describe the same event (clustering algorithm)

Phase two: score each sentence
- centroid value Ci
- positional value Pi
- first-sentence overlap Fi, defined as the inner product between the word occurrence vector of sentence i and that of the first sentence of the document
- final score

5. Multilingual Multi-document Summarization
- still in its infancy
- SimFinder: a clustering tool for text; its similarity model uses log-linear regression over a variety of syntactic and lexical features

Other Approaches to Summarization

Introduction: This section describes briefly some unconventional approaches that, rather than aiming to build full summarization systems, investigate some details that underlie the summarization process, and that we conjecture to have a role to play in future research on this field.

- Short Summaries
- Sentence Compression
- Sequential document representation

Short Summaries

Witbrock and Mittal (1999)
- extractive summarization
- headline-style summaries
- Reuters and the Associated Press (publicly available at the LDC)
- For content selection, the model learned a translation model between a document and its summary (Brown et al., 1993).
- "translation model"
- The authors assumed that the probability of a word appearing in a summary is independent of its structure.

Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, called the Viterbi path, that produces a sequence of observed events, especially in the context of Markov information sources and hidden Markov models. The terms "Viterbi path" and "Viterbi algorithm" are also applied to related dynamic programming algorithms that find the single most likely explanation for an observation. In statistical parsing, for example, a dynamic programming algorithm can find the most likely context-free derivation (parse) of a string, sometimes called the "Viterbi parse".

Andrew Viterbi proposed the algorithm in 1967 for decoding convolutional codes over noisy digital communication links. It is widely used to decode convolutional codes in CDMA and GSM digital cellular networks, dial-up modems, satellite and deep-space communications, and 802.11 wireless networks. Today it is also common in speech recognition, keyword spotting, computational linguistics, and bioinformatics. In speech recognition, for example, the acoustic signal is treated as the observed sequence of events and a text string is regarded as the hidden cause of that signal, so the Viterbi algorithm can be applied to the signal to find the most likely text string.

Markov conjecture

Each number appears only once in the tree (that is, there is no positive integer z such that (a,b,z) and (c,d,z) are both solutions of the equation, where a, b, c, d are pairwise distinct positive integers with a>b>z and c>d>z). This statement is the uniqueness conjecture for Markov numbers in number theory; it is not the Markov assumption that the bigram model relies on.

The surface realization model used was a bigram model. Viterbi beam search was used to efficiently find a near-optimal summary. The Markov assumption was violated by using backtracking at every state to strongly discourage paths that repeated terms, since bigrams that start repeating often seem to pathologically overwhelm the search otherwise.

Evaluation

Difficult task:

(1) assessing summary content is inherently difficult
(2) the absence of a standard evaluation metric
(3) manual evaluation is too expensive

1. Human and Automatic Evaluation

- DUC-2001: Document Understanding Conference 2001
- SEE: Summary Evaluation Environment
- MU: model unit
- SU: system unit

Human markings of overlapping units are unstable and inter-human agreement is low, which motivates using automatic metrics: NAMS.

NAMn, n-gram: achieves the best correlation with human judgement

2. ROUGE

Recall-Oriented Understudy for Gisting Evaluation

ROUGE-N: n-gram recall

- closely related to BLEU
- useful when multiple reference summaries are available
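As a minimal sketch of ROUGE-N as n-gram recall over several reference summaries: matched n-grams are summed over all references and divided by the total number of reference n-grams. The function names, the whitespace tokenization, and the example sentences are assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """N-gram recall of `candidate` against one or more reference summaries.

    Tokenization by whitespace is an assumption of this sketch.
    """
    cand_counts = ngrams(candidate.split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.split(), n)
        # Clipped matches: an n-gram is credited at most as often as it
        # occurs in both the candidate and this reference.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    refs = ["the cat was on the mat", "a cat sat on the mat"]
    print(rouge_n("the cat sat on the mat", refs, n=2))
```

Because the denominator counts reference n-grams only, the score rewards covering the references and does not penalize extra material in the candidate, which is what makes the measure recall-oriented.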

Other ROUGE variants

ROUGE-W
- builds on LCS (longest common subsequence)
- the longer the LCS between two summary sentences, the more similar they are
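The LCS idea can be sketched as follows: compute the length of the longest common subsequence of two tokenized sentences with dynamic programming, then turn it into an F-score. This sketch is the plain, unweighted LCS score; the extra weighting that ROUGE-W gives to consecutive matches is deliberately left out. The function names, the `beta` parameter, and the whitespace tokenization are assumptions for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_f_score(candidate, reference, beta=1.0):
    """Unweighted LCS-based F-score between two sentences."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    print(lcs_f_score("police kill the gunman", "police killed the gunman"))
```

Unlike n-gram matching, the subsequence only requires the shared words to appear in the same order, not next to each other.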

ROUGE-S
- a gappy version of ROUGE-N based on skip-bigrams
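A skip-bigram is any ordered pair of words from a sentence, with any number of words allowed in between. The sketch below counts skip-bigrams and computes a recall against one reference. The function names, the optional `max_gap` limit, and the whitespace tokenization are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_gap=None):
    """Counter of in-order word pairs.

    `max_gap` limits how many words may be skipped between the two
    words of a pair; None means no limit.
    """
    pairs = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def rouge_s_recall(candidate, reference, max_gap=None):
    """Skip-bigram recall of `candidate` against a single reference."""
    cand = skip_bigrams(candidate.split(), max_gap)
    ref = skip_bigrams(reference.split(), max_gap)
    total = sum(ref.values())
    matched = sum(min(count, cand[pair]) for pair, count in ref.items())
    return matched / total if total else 0.0


if __name__ == "__main__":
    print(rouge_s_recall("police kill the gunman", "police killed the gunman"))
```

With `max_gap=0` the pairs reduce to ordinary adjacent bigrams, which shows in what sense the skip-bigram score is a gappy variant of bigram matching.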

Summary
- ROUGE performed very well on the DUC-2001 and DUC-2002 datasets
- evaluation remains an open research topic

3. Information-theoretic Evaluation of Summaries

- an information-theoretic method
- Jensen-Shannon divergence
- suits both single-document and multi-document summarization
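To illustrate the Jensen-Shannon divergence, here is a minimal sketch that compares the word distribution of a source text with that of a summary; a lower value means the two distributions are closer. Estimating the distributions from whitespace-split word counts is an assumption of this sketch, as are the function names; the slides do not say how the distributions are obtained.

```python
import math
from collections import Counter


def distribution(text):
    """Relative word frequencies of a whitespace-tokenized text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; q must cover the support of p."""
    return sum(pv * math.log2(pv / q[word]) for word, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    vocab = set(p) | set(q)
    # Mixture distribution: the average of p and q over the joint vocabulary.
    m = {word: 0.5 * (p.get(word, 0.0) + q.get(word, 0.0)) for word in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    doc = distribution("the bomb exploded and the suspect was charged")
    summ = distribution("the suspect was charged")
    print(js_divergence(doc, summ))
```

Averaging against the mixture keeps the measure symmetric and finite even when a word appears in only one of the two texts, which plain KL divergence does not handle.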
