Natural Language Processing with Deep Learning
CS224N/Ling284
Christopher Manning
Lecture 4: Dependency Parsing
Lecture Plan
Syntactic Structure and Dependency Parsing
1. Syntactic Structure: Constituency and Dependency (25 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (20 mins)

Reminders/comments:
• In Assignment 3, out on Tuesday, you build a neural dependency parser using PyTorch
• Start installing and learning PyTorch (Assignment 3 has scaffolding)
• Come to the PyTorch tutorial, Friday 10am (under the Zoom tab, not a Webinar)
• Final project discussions – come meet with us; focus of Thursday class in week 4
1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.
• Starting unit: words (the, cat, cuddly, by, door)
• Words combine into phrases (the cuddly cat, by the door)
• Phrases can combine into bigger phrases (the cuddly cat by the door)
Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.
[Figure: example fragments used to build up CFG rules: the cat, a dog, large in a crate, barking on the table, cuddly by the door, large barking, talk to, walked behind]
Two views of linguistic structure: Dependency structure
• Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

Look in the large crate in the kitchen by the door
Why do we need sentence structure? I
• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• Listeners need to work out what modifies [attaches to] what
• A model needs to understand sentence structure in order to be able to interpret language correctly
Prepositional phrase attachment ambiguity

Scientists count whales from space

The PP "from space" can attach to the verb (the scientists do the counting from space, the sensible reading) or to the noun (whales that are from space).
PP attachment ambiguities multiply
• A key parsing decision is how we 'attach' various constituents
  • PPs, adverbial or participial phrases, infinitives, coordinations, etc.
• Catalan numbers: Cn = (2n)! / [(n+1)! n!] (computed in the sketch after this list)
  • An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models (CS228) ...
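As a quick sanity check on how fast these ambiguities multiply, here is a minimal sketch (mine, not from the lecture) that computes the first few Catalan numbers straight from the closed form above:

    from math import factorial

    def catalan(n: int) -> int:
        """C_n = (2n)! / ((n+1)! * n!)"""
        return factorial(2 * n) // (factorial(n + 1) * factorial(n))

    # Exponential growth: 1, 1, 2, 5, 14, 42, 132, 429, ...
    print([catalan(n) for n in range(8)])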
Coordination scope ambiguity

Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

The coordination can group [shuttle veteran and longtime NASA executive] Fred Gregory (one person) or [shuttle veteran] and [longtime NASA executive Fred Gregory] (two people).

Coordination scope ambiguity
[Further example given as a figure on the slide]

Adjectival/Adverbial Modifier Ambiguity
[Example given as a figure on the slide]

Verb Phrase (VP) attachment ambiguity
[Example given as a figure on the slide]
Dependency paths help extract semantic interpretation – simple practical example: extracting protein-protein interaction

[Figure: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB"]

KaiC ←nsubj interacts nmod:with→ SasA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiB

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]
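To make the pattern concrete, here is a small sketch (mine, not from the cited systems) that walks typed dependency arcs to collect interaction partners, using the relation names from the paths above:

    # A toy dependency graph for the example sentence, as (head, relation, dependent) triples.
    arcs = [
        ("interacts", "nsubj", "KaiC"),
        ("interacts", "nmod:with", "SasA"),
        ("SasA", "conj:and", "KaiA"),
        ("SasA", "conj:and", "KaiB"),
    ]

    def interaction_pairs(arcs):
        """Pattern: X <-nsubj- interacts -nmod:with-> Y (-conj:and-> Z ...)."""
        out = {}
        for h, rel, d in arcs:
            out.setdefault((h, rel), []).append(d)
        pairs = []
        for subj in out.get(("interacts", "nsubj"), []):
            for obj in out.get(("interacts", "nmod:with"), []):
                pairs.append((subj, obj))
                # conjuncts of the object are interaction partners too
                for conj in out.get((obj, "conj:and"), []):
                    pairs.append((subj, conj))
        return pairs

    print(interaction_pairs(arcs))
    # [('KaiC', 'SasA'), ('KaiC', 'KaiA'), ('KaiC', 'KaiB')]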
2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]
Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: the same tree, now with typed arcs: nsubj:pass, aux, obl, nmod, case, cc, conj, flat, appos]
Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)

Usually, dependencies form a tree (a connected, acyclic, single-root graph)

[Figure: the same typed dependency tree as above]
Pā?ini's grammar (c. 5th century BCE)

[Image: birch bark MS from Kashmir of the Rupavatra, Wellcome Library L0032691, CC BY 4.0]
Dependency Grammar/Parsing History
• The idea of dependency structure goes back a long way
  • To Pā?ini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • 20th century invention (R. S. Wells, 1947; then Chomsky 1953, etc.)
• Modern dependency work is often sourced to Lucien Tesnière (1959)
  • Was dominant approach in "East" in 20th century (Russia, China, ...)
  • Good for freer word order, inflected languages like Russian (or Latin!)
• Used in some of the earliest parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962) and published on dependency grammar in Language
Dependency Grammar and Dependency Structure

ROOT Discussion of the outstanding issues was completed .

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent – we follow that convention
• We usually add a fake ROOT so every word is a dependent of precisely 1 other node
The rise of annotated data & Universal Dependencies treebanks

Brown corpus (1967; PoS tagged 1979); Lancaster-IBM Treebank (starting late 1980s); Marcus et al. 1993, The Penn Treebank, Computational Linguistics; Universal Dependencies: https://universaldependencies.org/
The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than writing a grammar (by hand)

But a treebank gives us many things
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate NLP systems
Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: the dependency [discussion → issues] is plausible
2. Dependency distance: most dependencies are between nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed .
Dependency Parsing
• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I'll give a talk tomorrow on neural networks
Projectivity
• Definition of a projective parse: there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words (a direct check is sketched below)
• Dependencies corresponding to a CFG tree must be projective
  • I.e., by forming dependencies by taking 1 child of each category as head
• Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these non-projective dependencies

Who did Bill buy the coffee from yesterday?
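As an illustration (my sketch, not the lecture's code), a parse given as head indices is projective exactly when no two arcs cross, which a pairwise interval check can test:

    def is_projective(heads):
        """heads[i-1] = index of the head of word i (0 = ROOT), words numbered 1..n."""
        arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1)]
        for (l1, r1) in arcs:
            for (l2, r2) in arcs:
                # Two arcs cross iff exactly one endpoint of the second arc
                # lies strictly inside the span of the first.
                if l1 < l2 < r1 < r2:
                    return False
        return True

    print(is_projective([2, 0, 2]))     # "I ate fish" (I->ate, ate->ROOT, fish->ate): True
    print(is_projective([3, 4, 0, 3]))  # arcs (1,3) and (2,4) cross: False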
3. Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
   Neural graph-based parser: Dozat and Manning (2017) et seq. – very successful!
3. Constraint satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   E.g., MaltParser (Nivre et al. 2008). Has proven highly effective.
Greedy transition-based parsing [Nivre 2003]
• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right, which starts with the ROOT symbol
  • a buffer β, written with top to the left, which starts with the input sentence
  • a set of dependency arcs A, which starts off empty
  • a set of actions
Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, ..., wn, A = ?
1. Shift:        σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ?
Arc-standard transition-based parser
(there are other transition schemes ...)

Analysis of "I ate fish":

Start:      σ = [root]            β = I ate fish       A = ?
Shift:      σ = [root] I          β = ate fish
Shift:      σ = [root] I ate      β = fish
Left-Arc:   σ = [root] ate        β = fish             A += nsubj(ate → I)
Shift:      σ = [root] ate fish   β = (empty)
Right-Arc:  σ = [root] ate        β = (empty)          A += obj(ate → fish)
Right-Arc:  σ = [root]            β = (empty)          A += root([root] → ate)
Finish
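A minimal sketch of the arc-standard machine (mine, not the assignment's scaffolding), replaying the derivation above with a hard-coded action sequence; a real parser predicts each action with a classifier, as the next slide explains:

    def arc_standard_parse(words, actions):
        """Run arc-standard transitions. actions: ('shift',), ('left', rel), or ('right', rel)."""
        stack, buffer, arcs = ["ROOT"], list(words), []
        for action in actions:
            if action[0] == "shift":            # move front of buffer onto the stack
                stack.append(buffer.pop(0))
            elif action[0] == "left":           # head = top of stack, dependent = second item
                dep = stack.pop(-2)
                arcs.append((action[1], stack[-1], dep))
            elif action[0] == "right":          # head = second item, dependent = top of stack
                dep = stack.pop()
                arcs.append((action[1], stack[-1], dep))
        assert stack == ["ROOT"] and not buffer  # finish condition
        return arcs

    actions = [("shift",), ("shift",), ("left", "nsubj"),
               ("shift",), ("right", "obj"), ("right", "root")]
    print(arc_standard_parse(["I", "ate", "fish"], actions))
    # [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]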
MaltParser [Nivre and Hall 2005]
• We have left to explain how we choose the next action
  • Answer: Stand back, I know machine learning!
• Each action is predicted by a discriminative classifier (e.g., softmax classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better): you keep k good parse prefixes at each time step
• The model's accuracy is fractionally below the state of the art in dependency parsing, but
• It provides very fast linear time parsing, with high accuracy – great for parsing the web
Conventional Feature Representation

[Figure: a binary, sparse feature vector, e.g. 0 0 0 1 0 0 1 0 ... 0 0 1 0; dim = 10^6 – 10^7]

Feature templates: usually a combination of 1–3 elements from the configuration

Indicator features:
s1.w = good ∧ s1.t = JJ
s2.w = has ∧ s2.t = VBZ ∧ s1.w = good
lc(s2).t = PRP ∧ s2.t = VBZ ∧ s1.t = JJ
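A hedged sketch of instantiating such indicator templates (the positions s1/s2 and lc(·) follow the slide; the dict-based configuration format is my own illustration):

    def indicator_features(config):
        """config maps positions like 's1', 's2', 'lc(s2)' to {'w': word, 't': POS tag}."""
        s1, s2, lc_s2 = config["s1"], config["s2"], config["lc(s2)"]
        return [
            f"s1.w={s1['w']} ∧ s1.t={s1['t']}",
            f"s2.w={s2['w']} ∧ s2.t={s2['t']} ∧ s1.w={s1['w']}",
            f"lc(s2).t={lc_s2['t']} ∧ s2.t={s2['t']} ∧ s1.t={s1['t']}",
        ]  # each string names one dimension set to 1 in the huge sparse vector

    config = {"s1": {"w": "good", "t": "JJ"},
              "s2": {"w": "has", "t": "VBZ"},
              "lc(s2)": {"w": "He", "t": "PRP"}}
    print(indicator_features(config))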
Evaluation of Dependency Parsing: (labeled) dependency accuracy

ROOT She saw the video lecture
 0    1   2   3    4     5

Acc = # correct deps / # of deps
UAS = 4/5 = 80%    LAS = 2/5 = 40%

Gold                      Parsed
1 2 She     nsubj         1 2 She     nsubj
2 0 saw     root          2 0 saw     root
3 5 the     det           3 4 the     det
4 5 video   nn            4 5 video   nsubj
5 2 lecture obj           5 2 lecture ccomp
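A minimal sketch (mine) of computing these scores from (head, label) pairs, reproducing the numbers above:

    def uas_las(gold, parsed):
        """gold, parsed: lists of (head_index, label), one entry per word."""
        n = len(gold)
        uas = sum(g[0] == p[0] for g, p in zip(gold, parsed)) / n  # head correct
        las = sum(g == p for g, p in zip(gold, parsed)) / n        # head and label correct
        return uas, las

    gold   = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"),    (2, "obj")]
    parsed = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
    print(uas_las(gold, parsed))  # (0.8, 0.4)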
Handling non-projectivity
• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on nonprojective arcs
  2. Use a dependency formalism which only has projective representations
     • A CFG only allows projective structures; you promote the head of projectivity violations
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve nonprojective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser or Dozat and Manning (2017))
4. Why do we gain from a neural dependency parser?
Indicator Features Revisited
• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation
  • More than 95% of parsing time is consumed by feature computation

Neural approach: learn a dense and compact feature representation (dense; dim = ~1000)

[Figure: contrasts sparse indicator features such as s1.w = good ∧ s1.t = JJ with a dense vector like 0.1 0.9 -0.2 0.3 ... -0.1 -0.5]
A neural dependency parser [Chen and Manning 2014]
• Results on English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser        UAS    LAS    sent./s
MaltParser    89.8   87.2   469
MSTParser     91.4   88.1   10
TurboParser   92.3   89.6   8
C&M 2014      92.0   89.7   654
First win: Distributed Representations
• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantical similarities.
  • NNS (plural noun) should be close to NN (singular noun).
  • nummod (numerical modifier) should be close to amod (adjective modifier).

[Figure: 2-D embedding plot where similar items (e.g., come, is, good, was/were) lie near each other]
Extracting Tokens & vector representations from configuration
• We extract a set of tokens based on the stack/buffer positions:

Position   word      POS    dep.
s1         good      JJ     ?
s2         has       VBZ    ?
b1         control   NN     ?
lc(s1)     ?         ?      ?
rc(s1)     ?         ?      ?
lc(s2)     He        PRP    nsubj
rc(s2)     ?         ?      ?

A concatenation of the vector representations of all these is the neural representation of a configuration
Second win: Deep Learning classifiers are non-linear classifiers
• A softmax classifier assigns classes y ∈ C based on inputs x ∈ ?^d via the probability:
  p(y|x) = exp(W_y · x) / Σ_{c∈C} exp(W_c · x)
• We train the weight matrix W ∈ ?^{|C|×d} to minimize the neg. log loss: Σ_i ?log p(y_i|x_i)
• Traditional ML classifiers (including Na?ve Bayes, SVMs, logistic regression, and the softmax classifier) are not very powerful classifiers: they only give linear decision boundaries
  • This can be quite limiting → unhelpful when a problem is complex
  • Wouldn't it be cool to get these correct?
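In PyTorch terms, the classifier and loss above look roughly like this (a sketch; the dimensions are arbitrary):

    import torch
    import torch.nn.functional as F

    d, C = 50, 3                      # input dimension, number of classes
    W = torch.randn(C, d, requires_grad=True)

    x = torch.randn(d)                # one input
    y = torch.tensor(1)               # its gold class
    logits = W @ x                    # W_y . x for every class y
    loss = F.cross_entropy(logits.unsqueeze(0), y.unsqueeze(0))  # -log p(y|x)
    loss.backward()                   # a gradient step on W would follow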
Neural Networks are more powerful
• Neural networks can learn much more complex functions with nonlinear decision boundaries!
  • Non-linear in the original space; linear for the softmax at the top of the neural network

Visualizations with ConvNetJS by Andrej Karpathy:
https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
Simple feed-forward neural network multi-class classifier

Output layer y: y = softmax(Uh + b2) (softmax probabilities)
Hidden layer h: h = ReLU(Wx + b1), where ReLU = Rectified Linear Unit: rect(z) = max(z, 0) (logistic = "sigmoid")
Input layer x: result of lookup + concat, x(i, ..., i+d) = Le

The hidden layer re-represents the input (it moves inputs around in an intermediate layer vector space) so it can be easily classified with a (linear) softmax.

Log loss (cross-entropy error) will be back-propagated to the embeddings.
Neural Dependency Parser Model Architecture

Output layer y: y = softmax(Uh + b2)
Hidden layer h: h = ReLU(Wx + b1)
Input layer x: lookup + concat

Cross-entropy error will be back-propagated to the embeddings.
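A hedged PyTorch sketch of this architecture (the class name, dimensions, and number of extracted tokens are illustrative assumptions, not the assignment's exact scaffolding):

    import torch
    import torch.nn as nn

    class NeuralDependencyParser(nn.Module):
        """Feed-forward classifier over a parser configuration, Chen & Manning (2014) style."""
        def __init__(self, vocab_size, n_tokens=48, d_embed=50, d_hidden=200, n_actions=3):
            super().__init__()
            # shared embedding table for words, POS tags, and dependency labels
            self.embed = nn.Embedding(vocab_size, d_embed)
            self.W = nn.Linear(n_tokens * d_embed, d_hidden)  # h = ReLU(Wx + b1)
            self.U = nn.Linear(d_hidden, n_actions)           # y = softmax(Uh + b2)

        def forward(self, token_ids):                         # (batch, n_tokens)
            x = self.embed(token_ids).flatten(1)              # lookup + concat
            h = torch.relu(self.W(x))
            return self.U(h)  # logits; softmax is folded into the cross-entropy loss

    model = NeuralDependencyParser(vocab_size=10_000)
    logits = model(torch.randint(0, 10_000, (32, 48)))        # a random mini-batch
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 3, (32,)))
    loss.backward()  # gradients flow back to the embeddings, as the slide notes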
Dependency parsing for sentence structure

Neural networks can accurately determine the structure of sentences, supporting interpretation

Chen and Manning (2014) was the first simple, successful neural dependency parser

The dense representations (and non-linear classifier) let it outperform other greedy parsers in both accuracy and speed
Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google:
• Bigger, deeper networks ...