Natural Language Processing with Deep Learning
CS224N / Ling284
Christopher Manning

Lecture 4: Dependency Parsing

Lecture Plan

Syntactic Structure and Dependency Parsing
1. Syntactic Structure: Constituency and Dependency (25 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (20 mins)

Reminders/comments:
• In Assignment 3, out on Tuesday, you build a neural dependency parser using PyTorch
• Start installing and learning PyTorch (Assignment 3 has scaffolding)
• Come to the PyTorch tutorial, Friday 10am (under the Zoom tab, not a Webinar)
• Final project discussions – come meet with us; focus of Thursday class in week 4

1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.
• Starting unit: words
  the, cat, cuddly, by, door
• Words combine into phrases
  the cuddly cat, by the door
• Phrases can combine into bigger phrases
  the cuddly cat by the door

Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.

the cat
a dog
large … in a crate
barking … on the table
cuddly … by the door
large barking
talk to …
walked behind …

Two views of linguistic structure: Dependency structure

• Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

Look in the large crate in the kitchen by the door

Why do we need sentence structure? I

• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• Listeners need to work out what modifies [attaches to] what
• A model needs to understand sentence structure in order to be able to interpret language correctly

Prepositional phrase attachment ambiguity

Scientists count whales from space

(The PP "from space" can attach to the verb "count" – the intended reading – or to the noun "whales", which gives the absurd reading that the whales are from space.)

PP attachment ambiguities multiply

• A key parsing decision is how we 'attach' various constituents
  • PPs, adverbial or participial phrases, infinitives, coordinations, etc.
• Catalan numbers: Cn = (2n)! / [(n+1)! n!]
• An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models (CS228) …
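To make the growth rate concrete, here is a minimal Python sketch (our own illustration, not from the slides) that evaluates the Catalan-number formula above for small n:

from math import factorial

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n + 1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# number of distinct binary attachment structures over n ambiguous sites
for n in range(8):
    print(n, catalan(n))
# prints: 0 1, 1 1, 2 2, 3 5, 4 14, 5 42, 6 132, 7 429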

Coordination scope ambiguity

Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

(Two readings: [shuttle veteran and longtime NASA executive] Fred Gregory – one person – or [shuttle veteran] and [longtime NASA executive Fred Gregory] – two people.)

Adjectival/Adverbial Modifier Ambiguity

Verb Phrase (VP) attachment ambiguity

Dependency paths help extract semantic interpretation – simple practical example: extracting protein–protein interaction

[Figure: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB", with arcs including nsubj, ccomp, mark, advmod, det, case, nmod:with, conj:and, cc]

KaiC ←nsubj– interacts –nmod:with→ SasA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiB

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]
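To show how such paths can be followed mechanically, here is a small hedged sketch (ours, not lecture code): given typed dependency arcs, it finds a path between two tokens by breadth-first search over the undirected dependency graph. The arc list and helper name are hypothetical.

from collections import deque

def dependency_path(arcs, start, goal):
    """arcs: list of (head, dependent, label) triples.
    Returns the list of (token, label, token) hops from start to goal, or None."""
    graph = {}
    for head, dep, label in arcs:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, label in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None

# hypothetical arcs for the example sentence above
arcs = [("interacts", "KaiC", "nsubj"),
        ("interacts", "SasA", "nmod:with"),
        ("SasA", "KaiA", "conj:and"),
        ("SasA", "KaiB", "conj:and")]
print(dependency_path(arcs, "KaiC", "KaiB"))
# [('KaiC', 'nsubj', 'interacts'), ('interacts', 'nmod:with', 'SasA'), ('SasA', 'conj:and', 'KaiB')]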

2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: the same tree, now with typed arcs such as nsubj:pass, aux, obl, case, flat, appos, nmod, cc, conj]

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

• An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)
• Usually, dependencies form a tree (a connected, acyclic, single-root graph)

[Figure: the same dependency tree as on the previous slide]

Pāṇini's grammar (c. 5th century BCE)

Gallery: /indexplus/image/L0032691.html
CC BY 4.0
File: Birchbark MS from Kashmir of the Rupavatra, Wellcome L0032691.jpg

Dependency Grammar/Parsing History

• The idea of dependency structure goes back a long way
  • To Pāṇini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • 20th century invention (R. S. Wells, 1947; then Chomsky 1953, etc.)
• Modern dependency work is often sourced to Lucien Tesnière (1959)
  • Was dominant approach in the "East" in the 20th century (Russia, China, …)
  • Good for free-er word order, inflected languages like Russian (or Latin!)
• Used in some of the earliest parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962) and published on dependency grammar in Language

Dependency Grammar and Dependency Structure

ROOT Discussion of the outstanding issues was completed.

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent – we follow that convention
• We usually add a fake ROOT so every word is a dependent of precisely 1 other node

The rise of annotated data & Universal Dependencies treebanks

Brown corpus (1967; PoS tagged 1979); Lancaster-IBM Treebank (starting late 1980s); Marcus et al. 1993, The Penn Treebank, Computational Linguistics; Universal Dependencies treebanks

The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than writing a grammar (by hand)

But a treebank gives us many things
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate NLP systems

Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: the dependency [discussion → issues] is plausible
2. Dependency distance: most dependencies are between nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed.

Dependency Parsing

• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree (a small validity-check sketch follows below)
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I'll give a talk tomorrow on neural networks
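A minimal sketch (ours, not from the slides) of checking those constraints on a head-index representation: every word has exactly one head, exactly one word attaches to ROOT, and following heads upward never cycles – which together make the dependencies a tree.

def is_valid_tree(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False                  # exactly one dependent of ROOT
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:              # walk up towards ROOT
            if node in seen:
                return False          # found a cycle
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_tree([2, 0, 4, 2]))   # True: word 2 is the single root dependent
print(is_valid_tree([0, 3, 2]))      # False: words 2 and 3 point at each other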

Projectivity

• Definition of a projective parse: there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• Dependencies corresponding to a CFG tree must be projective
  • I.e., by forming dependencies by taking 1 child of each category as head
• Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these nonprojective dependencies

Who did Bill buy the coffee from yesterday?  (a projectivity check is sketched below)
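A small sketch, using the same head-array representation as above (ours, not lecture code), of the projectivity test: two arcs cross iff exactly one endpoint of one lies strictly inside the span of the other. The head indices for the example sentence are illustrative.

def is_projective(heads):
    """heads[i] is the head (0 = ROOT) of word i+1."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a1, b1) in arcs:
        for (a2, b2) in arcs:
            if a1 < a2 < b1 < b2:     # spans overlap without nesting: crossing arcs
                return False
    return True

# "Who did Bill buy the coffee from yesterday?" with illustrative head indices:
# Who->buy, did->buy, Bill->buy, buy->ROOT, the->coffee, coffee->buy, from->Who, yesterday->buy
heads = [4, 4, 4, 0, 6, 4, 1, 4]
print(is_projective(heads))   # False: the from->Who arc crosses other arcs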

3. Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
   Neural graph-based parser: Dozat and Manning (2017) et seq. – very successful!
3. Constraint Satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   E.g., MaltParser (Nivre et al. 2008). Has proven highly effective.

Greedy transition-based parsing [Nivre 2003]

• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right
    • which starts with the ROOT symbol
  • a buffer β, written with top to the left
    • which starts with the input sentence
  • a set of dependency arcs A
    • which starts off empty
  • a set of actions

Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:        σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅

Arc-standard transition-based parser (there are other transition schemes …)

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:        σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅

Analysis of "I ate fish":

Start:      stack [root]             buffer: I ate fish
Shift:      stack [root] I           buffer: ate fish
Shift:      stack [root] I ate       buffer: fish
Left-Arc:   stack [root] ate         buffer: fish         A += nsubj(ate → I)
Shift:      stack [root] ate fish    buffer: (empty)
Right-Arc:  stack [root] ate         buffer: (empty)      A += obj(ate → fish)
Right-Arc:  stack [root]             buffer: (empty)      A += root([root] → ate)
Finish
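The trace above can be reproduced in a few lines. This is a minimal sketch of the arc-standard transition system (our own illustration, driven here by a hand-written action sequence rather than a trained classifier):

def arc_standard(words, actions):
    """words: the input sentence; actions: a sequence of
    ('shift',) / ('left', label) / ('right', label) transitions."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in actions:
        if action[0] == "shift":
            stack.append(buffer.pop(0))
        elif action[0] == "left":
            # Left-Arc: second-from-top becomes a dependent of the top
            dep, head = stack.pop(-2), stack[-1]
            arcs.append((action[1], head, dep))
        elif action[0] == "right":
            # Right-Arc: top becomes a dependent of the new top
            dep = stack.pop()
            arcs.append((action[1], stack[-1], dep))
    return stack, arcs

# "I ate fish", following the worked example above
stack, arcs = arc_standard(
    ["I", "ate", "fish"],
    [("shift",), ("shift",), ("left", "nsubj"),
     ("shift",), ("right", "obj"), ("right", "root")])
print(stack)  # ['ROOT']
print(arcs)   # [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]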

MaltParser [Nivre and Hall 2005]

• We have left to explain how we choose the next action
  • Answer: Stand back, I know machine learning!
• Each action is predicted by a discriminative classifier (e.g., softmax classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better): you keep k good parse prefixes at each time step
• The model's accuracy is fractionally below the state of the art in dependency parsing, but
• It provides very fast linear time parsing, with high accuracy – great for parsing the web

Conventional Feature Representation

binary, sparse; dim = 10^6–10^7
[Figure: a long sparse indicator vector, e.g. 0 0 0 1 0 0 1 0 0 0 1 0 …]

Feature templates: usually a combination of 1–3 elements from the configuration

Indicator features:
s1.w = good ∧ s1.t = JJ
s2.w = has ∧ s2.t = VBZ ∧ s1.w = good
lc(s2).t = PRP ∧ s2.t = VBZ ∧ s1.t = JJ
(a sketch of building such features in code follows below)
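To show what such feature templates amount to in code, here is a hedged sketch (ours, not the lecture's) that turns a parser configuration into a set of fired conjoined indicator features; the configuration dict and field layout are hypothetical.

def indicator_features(config):
    """config maps positions like 's1', 's2', 'lc(s2)' to (word, tag) pairs.
    Returns the fired indicator features; each would index one dimension of a
    ~10^6-10^7 dimensional sparse binary vector."""
    s1_w, s1_t = config["s1"]
    s2_w, s2_t = config["s2"]
    lcs2_t = config["lc(s2)"][1]
    return {
        f"s1.w={s1_w}^s1.t={s1_t}",
        f"s2.w={s2_w}^s2.t={s2_t}^s1.w={s1_w}",
        f"lc(s2).t={lcs2_t}^s2.t={s2_t}^s1.t={s1_t}",
    }

config = {"s1": ("good", "JJ"), "s2": ("has", "VBZ"), "lc(s2)": ("He", "PRP")}
print(indicator_features(config))
# {'s1.w=good^s1.t=JJ', 's2.w=has^s2.t=VBZ^s1.w=good', 'lc(s2).t=PRP^s2.t=VBZ^s1.t=JJ'}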

Evaluation of Dependency Parsing: (labeled) dependency accuracy

Acc = # correct deps / # of deps
UAS = 4/5 = 80%    LAS = 2/5 = 40%

Sentence: ROOT She saw the video lecture  (token indices 0 1 2 3 4 5)

Gold                              Parsed
1  She      → 2   nsubj           1  She      → 2   nsubj
2  saw      → 0   root            2  saw      → 0   root
3  the      → 5   det             3  the      → 4   det
4  video    → 5   nn              4  video    → 5   nsubj
5  lecture  → 2   obj             5  lecture  → 2   ccomp

(these scores are computed in the sketch below)
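A minimal sketch (ours) of computing the two scores directly from the head and label columns above:

def attachment_scores(gold, parsed):
    """gold, parsed: lists of (head_index, label) per word, in order.
    UAS counts correct heads; LAS counts correct heads with correct labels."""
    assert len(gold) == len(parsed)
    uas = sum(g[0] == p[0] for g, p in zip(gold, parsed))
    las = sum(g == p for g, p in zip(gold, parsed))
    n = len(gold)
    return uas / n, las / n

gold   = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"),    (2, "obj")]
parsed = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(attachment_scores(gold, parsed))   # (0.8, 0.4): UAS = 80%, LAS = 40%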

Handling non-projectivity

• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on nonprojective arcs
  2. Use a dependency formalism which only has projective representations
     • A CFG only allows projective structures; you promote the head of projectivity violations
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve nonprojective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser or Dozat and Manning (2017))

4. Why do we gain from a neural dependency parser?
Indicator Features Revisited

• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation
  • More than 95% of parsing time is consumed by feature computation

s1.w = good ∧ s1.t = JJ
s2.w = has ∧ s2.t = VBZ ∧ s1.w = good
lc(s2).t = PRP ∧ s2.t = VBZ ∧ s1.t = JJ

Neural Approach: learn a dense and compact feature representation
dense, dim = ~1000, e.g. [0.1 0.9 -0.2 0.3 … -0.1 -0.5]

A neural dependency parser [Chen and Manning 2014]

• Results on English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser         UAS    LAS    sent./s
MaltParser     89.8   87.2   469
MSTParser      91.4   88.1   10
TurboParser    92.3   89.6   8
C & M 2014     92.0   89.7   654

First win: Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities:
    • NNS (plural noun) should be close to NN (singular noun).
    • nummod (numerical modifier) should be close to amod (adjectival modifier).

[Figure: an embedding space with example word vectors such as come, was, were, is, good]

Extracting Tokens & vector representations from configuration

• We extract a set of tokens based on the stack/buffer positions:

position   word      POS    dep.
s1         good      JJ     ∅
s2         has       VBZ    ∅
b1         control   NN     ∅
lc(s1)     ∅         ∅      ∅
rc(s1)     ∅         ∅      ∅
lc(s2)     He        PRP    nsubj
rc(s2)     ∅         ∅      ∅

• A concatenation of the vector representations of all these is the neural representation of a configuration
(a small extraction sketch follows below)
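A hedged sketch (ours) of this extraction step: pull words, POS tags and dependency labels out of the configuration at a fixed set of positions, so they can be looked up in embedding tables and concatenated. The argument layout is hypothetical.

NULL = "<NULL>"   # placeholder token for missing positions

def extract_tokens(stack, buffer, left_child, right_child, pos, dep):
    """Return parallel word / POS / dep-label token lists for the positions
    s1, s2, b1, lc(s1), rc(s1), lc(s2), rc(s2)."""
    s1 = stack[-1] if len(stack) >= 1 else None
    s2 = stack[-2] if len(stack) >= 2 else None
    b1 = buffer[0] if buffer else None
    positions = [s1, s2, b1,
                 left_child.get(s1), right_child.get(s1),
                 left_child.get(s2), right_child.get(s2)]
    words = [w if w is not None else NULL for w in positions]
    tags  = [pos.get(w, NULL) for w in positions]
    deps  = [dep.get(w, NULL) for w in positions]
    return words, tags, deps

# configuration matching the table above (hypothetical)
words, tags, deps = extract_tokens(
    stack=["ROOT", "has", "good"], buffer=["control", "."],
    left_child={"has": "He"}, right_child={},
    pos={"good": "JJ", "has": "VBZ", "control": "NN", "He": "PRP"},
    dep={"He": "nsubj"})
print(words)  # ['good', 'has', 'control', '<NULL>', '<NULL>', 'He', '<NULL>']
print(tags)   # ['JJ', 'VBZ', 'NN', '<NULL>', '<NULL>', 'PRP', '<NULL>']
print(deps)   # ['<NULL>', '<NULL>', '<NULL>', '<NULL>', '<NULL>', 'nsubj', '<NULL>']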

Second win: Deep Learning classifiers are non-linear classifiers

• A softmax classifier assigns classes y ∈ C based on inputs x ∈ ℝ^d via the probability:
  p(y | x) = exp(W_y · x) / Σ_{c∈C} exp(W_c · x)
• We train the weight matrix W ∈ ℝ^{|C|×d} to minimize the neg. log loss: Σ_i −log p(y_i | x_i)
• Traditional ML classifiers (including Naïve Bayes, SVMs, logistic regression and the softmax classifier) are not very powerful classifiers: they only give linear decision boundaries
  • This can be quite limiting → unhelpful when a problem is complex
  • Wouldn't it be cool to get these correct?
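For concreteness, a small numeric sketch (ours) of the softmax probability and the negative log-likelihood loss just described:

import numpy as np

def softmax_nll(W, xs, ys):
    """W: C x d weight matrix; xs: n x d inputs; ys: n gold class indices.
    Returns the summed negative log-likelihood  sum_i -log p(y_i | x_i)."""
    scores = xs @ W.T                                   # n x C class scores
    scores -= scores.max(axis=1, keepdims=True)         # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(ys)), ys]).sum()

rng = np.random.default_rng(0)
W  = rng.normal(size=(3, 4))        # 3 classes, d = 4
xs = rng.normal(size=(5, 4))        # 5 training examples
ys = np.array([0, 2, 1, 1, 0])
print(softmax_nll(W, xs, ys))       # the quantity gradient descent would minimize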

Neural Networks are more powerful

• Neural networks can learn much more complex functions with nonlinear decision boundaries!
  • Non-linear in the original space, linear for the softmax at the top of the neural network

Visualizations with ConvNetJS by Andrej Karpathy!
/people/karpathy/convnetjs/demo/classify2d.html

Simple feed-forward neural network multi-class classifier

Output layer y:  y = softmax(Uh + b2)   (softmax probabilities)
Hidden layer h:  h = ReLU(Wx + b1)
Input layer x:   x is the result of lookup + concat (each extracted token's embedding is looked up in the embedding matrix and the vectors are concatenated)

• The hidden layer re-represents the input: it moves inputs around in an intermediate layer vector space so that it can be easily classified with a (linear) softmax
• Log loss (cross-entropy error) will be back-propagated to the embeddings
• ReLU = Rectified Linear Unit: rect(z) = max(z, 0); logistic = "sigmoid"

Neural Dependency Parser Model Architecture

Output layer y:  y = softmax(Uh + b2)   (cross-entropy error will be back-propagated to the embeddings)
Hidden layer h:  h = ReLU(Wx + b1)
Input layer x:   lookup + concat
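Since Assignment 3 builds this model in PyTorch, here is a minimal sketch of the architecture just described (embedding lookup + concat → ReLU hidden layer → softmax over transitions). Sizes and names are illustrative, not the assignment's actual scaffold.

import torch
import torch.nn as nn

class NeuralDepParserClassifier(nn.Module):
    def __init__(self, vocab_size, n_features=36, embed_dim=50,
                 hidden_dim=200, n_transitions=79):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # lookup table
        self.hidden = nn.Linear(n_features * embed_dim, hidden_dim)   # W, b1
        self.output = nn.Linear(hidden_dim, n_transitions)            # U, b2

    def forward(self, feature_ids):
        # feature_ids: (batch, n_features) indices of the extracted tokens
        x = self.embed(feature_ids).flatten(1)   # lookup + concat
        h = torch.relu(self.hidden(x))           # h = ReLU(Wx + b1)
        return self.output(h)                    # logits; softmax applied inside the loss

model = NeuralDepParserClassifier(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 36)))          # a batch of 8 configurations
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 79, (8,)))
loss.backward()    # cross-entropy error back-propagates into the embeddings
print(loss.item())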

Dependency parsing for sentence structure

Neural networks can accurately determine the structure of sentences, supporting interpretation

Chen and Manning (2014) was the first simple, successful neural dependency parser

The dense representations (and non-linear classifier) let it outperform other greedy parsers in both accuracy and speed

Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google
• Bigger, deeper networks w
