Natural Language Processing with Deep Learning
CS224N / Ling284
Christopher Manning

Lecture 4: Dependency Parsing

Lecture Plan

Syntactic Structure and Dependency Parsing
1. Syntactic Structure: Constituency and Dependency (25 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (20 mins)

Reminders/comments:
• In Assignment 3, out on Tuesday, you build a neural dependency parser using PyTorch
• Start installing and learning PyTorch (Assignment 3 has scaffolding)
• Come to the PyTorch tutorial, Friday 10am (under the Zoom tab, not a Webinar)
• Final project discussions – come meet with us; focus of Thursday class in week 4

1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.
• Starting unit: words
  the, cat, cuddly, by, door
• Words combine into phrases
  the cuddly cat, by the door
• Phrases can combine into bigger phrases
  the cuddly cat by the door

Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.

the cat
a dog
large … in a crate
barking … on the table
cuddly … by the door
large barking
talk to …
walked behind …

Two views of linguistic structure: Dependency structure

• Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

Look in the large crate in the kitchen by the door

Why do we need sentence structure? I

• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• Listeners need to work out what modifies [attaches to] what
• A model needs to understand sentence structure in order to be able to interpret language correctly

Prepositional phrase attachment ambiguity

Scientists count whales from space

(The PP "from space" can attach to the verb "count" – the intended reading – or to the noun "whales", which gives the absurd reading that the whales are from space.)

PP attachment ambiguities multiply

• A key parsing decision is how we 'attach' various constituents
  • PPs, adverbial or participial phrases, infinitives, coordinations, etc.
• Catalan numbers: Cn = (2n)! / [(n+1)! n!]
• An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models (CS228) …
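To make the growth rate concrete, here is a minimal Python sketch (our own illustration, not from the slides) that evaluates the Catalan-number formula above for small n:

from math import factorial

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n + 1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# number of distinct binary attachment structures over n ambiguous sites
for n in range(8):
    print(n, catalan(n))
# prints: 0 1, 1 1, 2 2, 3 5, 4 14, 5 42, 6 132, 7 429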

Coordination scope ambiguity

Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

(Two readings: [shuttle veteran and longtime NASA executive] Fred Gregory – one person – or [shuttle veteran] and [longtime NASA executive Fred Gregory] – two people.)

Adjectival/Adverbial Modifier Ambiguity

Verb Phrase (VP) attachment ambiguity

Dependency paths help extract semantic interpretation – simple practical example: extracting protein–protein interaction

[Figure: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB", with arcs including nsubj, ccomp, mark, advmod, det, case, nmod:with, conj:and, cc]

KaiC ←nsubj– interacts –nmod:with→ SasA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiA
KaiC ←nsubj– interacts –nmod:with→ SasA –conj:and→ KaiB

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]
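To show how such paths can be followed mechanically, here is a small hedged sketch (ours, not lecture code): given typed dependency arcs, it finds a path between two tokens by breadth-first search over the undirected dependency graph. The arc list and helper name are hypothetical.

from collections import deque

def dependency_path(arcs, start, goal):
    """arcs: list of (head, dependent, label) triples.
    Returns the list of (token, label, token) hops from start to goal, or None."""
    graph = {}
    for head, dep, label in arcs:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, label in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None

# hypothetical arcs for the example sentence above
arcs = [("interacts", "KaiC", "nsubj"),
        ("interacts", "SasA", "nmod:with"),
        ("SasA", "KaiA", "conj:and"),
        ("SasA", "KaiB", "conj:and")]
print(dependency_path(arcs, "KaiC", "KaiB"))
# [('KaiC', 'nsubj', 'interacts'), ('interacts', 'nmod:with', 'SasA'), ('SasA', 'conj:and', 'KaiB')]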

2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: the same tree, now with typed arcs such as nsubj:pass, aux, obl, case, flat, appos, nmod, cc, conj]

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

• An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)
• Usually, dependencies form a tree (a connected, acyclic, single-root graph)

[Figure: the same dependency tree as on the previous slide]

Pāṇini's grammar (c. 5th century BCE)

Gallery: /indexplus/image/L0032691.html
CC BY 4.0
File: Birchbark MS from Kashmir of the Rupavatra, Wellcome L0032691.jpg

Dependency Grammar/Parsing History

• The idea of dependency structure goes back a long way
  • To Pāṇini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • 20th century invention (R. S. Wells, 1947; then Chomsky 1953, etc.)
• Modern dependency work is often sourced to Lucien Tesnière (1959)
  • Was dominant approach in the "East" in the 20th century (Russia, China, …)
  • Good for free-er word order, inflected languages like Russian (or Latin!)
• Used in some of the earliest parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962) and published on dependency grammar in Language

Dependency Grammar and Dependency Structure

ROOT Discussion of the outstanding issues was completed.

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent – we follow that convention
• We usually add a fake ROOT so every word is a dependent of precisely 1 other node

The rise of annotated data & Universal Dependencies treebanks

Brown corpus (1967; PoS tagged 1979); Lancaster-IBM Treebank (starting late 1980s); Marcus et al. 1993, The Penn Treebank, Computational Linguistics; Universal Dependencies treebanks

The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than writing a grammar (by hand)

But a treebank gives us many things
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate NLP systems

Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: the dependency [discussion → issues] is plausible
2. Dependency distance: most dependencies are between nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed.

Dependency Parsing

• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree (a small validity-check sketch follows below)
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I'll give a talk tomorrow on neural networks
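A minimal sketch (ours, not from the slides) of checking those constraints on a head-index representation: every word has exactly one head, exactly one word attaches to ROOT, and following heads upward never cycles – which together make the dependencies a tree.

def is_valid_tree(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False                  # exactly one dependent of ROOT
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:              # walk up towards ROOT
            if node in seen:
                return False          # found a cycle
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_tree([2, 0, 4, 2]))   # True: word 2 is the single root dependent
print(is_valid_tree([0, 3, 2]))      # False: words 2 and 3 point at each other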

Projectivity

• Definition of a projective parse: there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• Dependencies corresponding to a CFG tree must be projective
  • I.e., by forming dependencies by taking 1 child of each category as head
• Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these nonprojective dependencies

Who did Bill buy the coffee from yesterday?  (a projectivity check is sketched below)
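A small sketch, using the same head-array representation as above (ours, not lecture code), of the projectivity test: two arcs cross iff exactly one endpoint of one lies strictly inside the span of the other. The head indices for the example sentence are illustrative.

def is_projective(heads):
    """heads[i] is the head (0 = ROOT) of word i+1."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a1, b1) in arcs:
        for (a2, b2) in arcs:
            if a1 < a2 < b1 < b2:     # spans overlap without nesting: crossing arcs
                return False
    return True

# "Who did Bill buy the coffee from yesterday?" with illustrative head indices:
# Who->buy, did->buy, Bill->buy, buy->ROOT, the->coffee, coffee->buy, from->Who, yesterday->buy
heads = [4, 4, 4, 0, 6, 4, 1, 4]
print(is_projective(heads))   # False: the from->Who arc crosses other arcs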

3. Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
   Neural graph-based parser: Dozat and Manning (2017) et seq. – very successful!
3. Constraint Satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   E.g., MaltParser (Nivre et al. 2008). Has proven highly effective.

Greedy transition-based parsing [Nivre 2003]

• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right
    • which starts with the ROOT symbol
  • a buffer β, written with top to the left
    • which starts with the input sentence
  • a set of dependency arcs A
    • which starts off empty
  • a set of actions

Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:        σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅

Arc-standard transition-based parser (there are other transition schemes …)

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:        σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅

Analysis of "I ate fish":

Start:      stack [root]             buffer: I ate fish
Shift:      stack [root] I           buffer: ate fish
Shift:      stack [root] I ate       buffer: fish
Left-Arc:   stack [root] ate         buffer: fish         A += nsubj(ate → I)
Shift:      stack [root] ate fish    buffer: (empty)
Right-Arc:  stack [root] ate         buffer: (empty)      A += obj(ate → fish)
Right-Arc:  stack [root]             buffer: (empty)      A += root([root] → ate)
Finish
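The trace above can be reproduced in a few lines. This is a minimal sketch of the arc-standard transition system (our own illustration, driven here by a hand-written action sequence rather than a trained classifier):

def arc_standard(words, actions):
    """words: the input sentence; actions: a sequence of
    ('shift',) / ('left', label) / ('right', label) transitions."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in actions:
        if action[0] == "shift":
            stack.append(buffer.pop(0))
        elif action[0] == "left":
            # Left-Arc: second-from-top becomes a dependent of the top
            dep, head = stack.pop(-2), stack[-1]
            arcs.append((action[1], head, dep))
        elif action[0] == "right":
            # Right-Arc: top becomes a dependent of the new top
            dep = stack.pop()
            arcs.append((action[1], stack[-1], dep))
    return stack, arcs

# "I ate fish", following the worked example above
stack, arcs = arc_standard(
    ["I", "ate", "fish"],
    [("shift",), ("shift",), ("left", "nsubj"),
     ("shift",), ("right", "obj"), ("right", "root")])
print(stack)  # ['ROOT']
print(arcs)   # [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]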

MaltParser [Nivre and Hall 2005]

• We have left to explain how we choose the next action
  • Answer: Stand back, I know machine learning!
• Each action is predicted by a discriminative classifier (e.g., softmax classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better): you keep k good parse prefixes at each time step
• The model's accuracy is fractionally below the state of the art in dependency parsing, but
• It provides very fast linear time parsing, with high accuracy – great for parsing the web

Conventional Feature Representation

binary, sparse; dim = 10^6–10^7
[Figure: a long sparse indicator vector, e.g. 0 0 0 1 0 0 1 0 0 0 1 0 …]

Feature templates: usually a combination of 1–3 elements from the configuration

Indicator features:
s1.w = good ∧ s1.t = JJ
s2.w = has ∧ s2.t = VBZ ∧ s1.w = good
lc(s2).t = PRP ∧ s2.t = VBZ ∧ s1.t = JJ
(a sketch of building such features in code follows below)
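To show what such feature templates amount to in code, here is a hedged sketch (ours, not the lecture's) that turns a parser configuration into a set of fired conjoined indicator features; the configuration dict and field layout are hypothetical.

def indicator_features(config):
    """config maps positions like 's1', 's2', 'lc(s2)' to (word, tag) pairs.
    Returns the fired indicator features; each would index one dimension of a
    ~10^6-10^7 dimensional sparse binary vector."""
    s1_w, s1_t = config["s1"]
    s2_w, s2_t = config["s2"]
    lcs2_t = config["lc(s2)"][1]
    return {
        f"s1.w={s1_w}^s1.t={s1_t}",
        f"s2.w={s2_w}^s2.t={s2_t}^s1.w={s1_w}",
        f"lc(s2).t={lcs2_t}^s2.t={s2_t}^s1.t={s1_t}",
    }

config = {"s1": ("good", "JJ"), "s2": ("has", "VBZ"), "lc(s2)": ("He", "PRP")}
print(indicator_features(config))
# {'s1.w=good^s1.t=JJ', 's2.w=has^s2.t=VBZ^s1.w=good', 'lc(s2).t=PRP^s2.t=VBZ^s1.t=JJ'}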

Evaluation of Dependency Parsing: (labeled) dependency accuracy

Acc = # correct deps / # of deps
UAS = 4/5 = 80%    LAS = 2/5 = 40%

Sentence: ROOT She saw the video lecture  (token indices 0 1 2 3 4 5)

Gold                              Parsed
1  She      → 2   nsubj           1  She      → 2   nsubj
2  saw      → 0   root            2  saw      → 0   root
3  the      → 5   det             3  the      → 4   det
4  video    → 5   nn              4  video    → 5   nsubj
5  lecture  → 2   obj             5  lecture  → 2   ccomp

(these scores are computed in the sketch below)
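A minimal sketch (ours) of computing the two scores directly from the head and label columns above:

def attachment_scores(gold, parsed):
    """gold, parsed: lists of (head_index, label) per word, in order.
    UAS counts correct heads; LAS counts correct heads with correct labels."""
    assert len(gold) == len(parsed)
    uas = sum(g[0] == p[0] for g, p in zip(gold, parsed))
    las = sum(g == p for g, p in zip(gold, parsed))
    n = len(gold)
    return uas / n, las / n

gold   = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"),    (2, "obj")]
parsed = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(attachment_scores(gold, parsed))   # (0.8, 0.4): UAS = 80%, LAS = 40%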

Handling non-projectivity

• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on nonprojective arcs
  2. Use a dependency formalism which only has projective representations
     • A CFG only allows projective structures; you promote the head of projectivity violations
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve nonprojective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser or Dozat and Manning (2017))

4. Why do we gain from a neural dependency parser?
Indicator Features Revisited

• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation
  • More than 95% of parsing time is consumed by feature computation

s1.w = good ∧ s1.t = JJ
s2.w = has ∧ s2.t = VBZ ∧ s1.w = good
lc(s2).t = PRP ∧ s2.t = VBZ ∧ s1.t = JJ

Neural Approach: learn a dense and compact feature representation
dense, dim = ~1000, e.g. [0.1 0.9 -0.2 0.3 … -0.1 -0.5]

A neural dependency parser [Chen and Manning 2014]

• Results on English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser         UAS    LAS    sent./s
MaltParser     89.8   87.2   469
MSTParser      91.4   88.1   10
TurboParser    92.3   89.6   8
C & M 2014     92.0   89.7   654

First win: Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities:
    • NNS (plural noun) should be close to NN (singular noun).
    • nummod (numerical modifier) should be close to amod (adjectival modifier).

[Figure: an embedding space with example word vectors such as come, was, were, is, good]

Extracting Tokens & vector representations from configuration

• We extract a set of tokens based on the stack/buffer positions:

position   word      POS    dep.
s1         good      JJ     ∅
s2         has       VBZ    ∅
b1         control   NN     ∅
lc(s1)     ∅         ∅      ∅
rc(s1)     ∅         ∅      ∅
lc(s2)     He        PRP    nsubj
rc(s2)     ∅         ∅      ∅

• A concatenation of the vector representations of all these is the neural representation of a configuration
(a small extraction sketch follows below)
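A hedged sketch (ours) of this extraction step: pull words, POS tags and dependency labels out of the configuration at a fixed set of positions, so they can be looked up in embedding tables and concatenated. The argument layout is hypothetical.

NULL = "<NULL>"   # placeholder token for missing positions

def extract_tokens(stack, buffer, left_child, right_child, pos, dep):
    """Return parallel word / POS / dep-label token lists for the positions
    s1, s2, b1, lc(s1), rc(s1), lc(s2), rc(s2)."""
    s1 = stack[-1] if len(stack) >= 1 else None
    s2 = stack[-2] if len(stack) >= 2 else None
    b1 = buffer[0] if buffer else None
    positions = [s1, s2, b1,
                 left_child.get(s1), right_child.get(s1),
                 left_child.get(s2), right_child.get(s2)]
    words = [w if w is not None else NULL for w in positions]
    tags  = [pos.get(w, NULL) for w in positions]
    deps  = [dep.get(w, NULL) for w in positions]
    return words, tags, deps

# configuration matching the table above (hypothetical)
words, tags, deps = extract_tokens(
    stack=["ROOT", "has", "good"], buffer=["control", "."],
    left_child={"has": "He"}, right_child={},
    pos={"good": "JJ", "has": "VBZ", "control": "NN", "He": "PRP"},
    dep={"He": "nsubj"})
print(words)  # ['good', 'has', 'control', '<NULL>', '<NULL>', 'He', '<NULL>']
print(tags)   # ['JJ', 'VBZ', 'NN', '<NULL>', '<NULL>', 'PRP', '<NULL>']
print(deps)   # ['<NULL>', '<NULL>', '<NULL>', '<NULL>', '<NULL>', 'nsubj', '<NULL>']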

Second win: Deep Learning classifiers are non-linear classifiers

• A softmax classifier assigns classes y ∈ C based on inputs x ∈ ℝ^d via the probability:
  p(y | x) = exp(W_y · x) / Σ_{c∈C} exp(W_c · x)
• We train the weight matrix W ∈ ℝ^{|C|×d} to minimize the neg. log loss: Σ_i −log p(y_i | x_i)
• Traditional ML classifiers (including Naïve Bayes, SVMs, logistic regression and the softmax classifier) are not very powerful classifiers: they only give linear decision boundaries
  • This can be quite limiting → unhelpful when a problem is complex
  • Wouldn't it be cool to get these correct?
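For concreteness, a small numeric sketch (ours) of the softmax probability and the negative log-likelihood loss just described:

import numpy as np

def softmax_nll(W, xs, ys):
    """W: C x d weight matrix; xs: n x d inputs; ys: n gold class indices.
    Returns the summed negative log-likelihood  sum_i -log p(y_i | x_i)."""
    scores = xs @ W.T                                   # n x C class scores
    scores -= scores.max(axis=1, keepdims=True)         # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(ys)), ys]).sum()

rng = np.random.default_rng(0)
W  = rng.normal(size=(3, 4))        # 3 classes, d = 4
xs = rng.normal(size=(5, 4))        # 5 training examples
ys = np.array([0, 2, 1, 1, 0])
print(softmax_nll(W, xs, ys))       # the quantity gradient descent would minimize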

Neural Networks are more powerful

• Neural networks can learn much more complex functions with nonlinear decision boundaries!
  • Non-linear in the original space, linear for the softmax at the top of the neural network

Visualizations with ConvNetJS by Andrej Karpathy!
/people/karpathy/convnetjs/demo/classify2d.html

Simple feed-forward neural network multi-class classifier

Output layer y:  y = softmax(Uh + b2)   (softmax probabilities)
Hidden layer h:  h = ReLU(Wx + b1)
Input layer x:   x is the result of lookup + concat (each extracted token's embedding is looked up in the embedding matrix and the vectors are concatenated)

• The hidden layer re-represents the input: it moves inputs around in an intermediate layer vector space so that it can be easily classified with a (linear) softmax
• Log loss (cross-entropy error) will be back-propagated to the embeddings
• ReLU = Rectified Linear Unit: rect(z) = max(z, 0); logistic = "sigmoid"

Neural Dependency Parser Model Architecture

Output layer y:  y = softmax(Uh + b2)   (cross-entropy error will be back-propagated to the embeddings)
Hidden layer h:  h = ReLU(Wx + b1)
Input layer x:   lookup + concat
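Since Assignment 3 builds this model in PyTorch, here is a minimal sketch of the architecture just described (embedding lookup + concat → ReLU hidden layer → softmax over transitions). Sizes and names are illustrative, not the assignment's actual scaffold.

import torch
import torch.nn as nn

class NeuralDepParserClassifier(nn.Module):
    def __init__(self, vocab_size, n_features=36, embed_dim=50,
                 hidden_dim=200, n_transitions=79):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # lookup table
        self.hidden = nn.Linear(n_features * embed_dim, hidden_dim)   # W, b1
        self.output = nn.Linear(hidden_dim, n_transitions)            # U, b2

    def forward(self, feature_ids):
        # feature_ids: (batch, n_features) indices of the extracted tokens
        x = self.embed(feature_ids).flatten(1)   # lookup + concat
        h = torch.relu(self.hidden(x))           # h = ReLU(Wx + b1)
        return self.output(h)                    # logits; softmax applied inside the loss

model = NeuralDepParserClassifier(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 36)))          # a batch of 8 configurations
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 79, (8,)))
loss.backward()    # cross-entropy error back-propagates into the embeddings
print(loss.item())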

Dependency parsing for sentence structure

Neural networks can accurately determine the structure of sentences, supporting interpretation

Chen and Manning (2014) was the first simple, successful neural dependency parser

The dense representations (and non-linear classifier) let it outperform other greedy parsers in both accuracy and speed

Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google
• Bigger, deeper networks w
