An Analysis of Lexical Features in Campaign Speeches

Abstract

With the continuous development of computer science, other disciplines increasingly employ computational methods in their research, and quantitative linguistics is one of them. The development of computer science has given rise to data mining of text corpora. Through text clustering, correlation analysis, visualization, and related techniques, we can understand the content and meaning of a text more scientifically. Text mining now plays an important role in helping researchers analyze user comments, literary works, historical data, and public opinion, especially when processing large volumes of text. As an important government statement and part of the president's public image, the inaugural address of an American president receives wide attention as soon as it is delivered. It is therefore worthwhile to study these addresses with quantitative linguistic methods. This study performs word frequency statistics, part-of-speech tagging, average sentence length calculation, and sentence-by-sentence sentiment analysis on 12 inaugural speeches from Roosevelt to Biden. The results are modeled quantitatively to evaluate the comprehensibility and emotional tendency of each speech. The two research directions are then compared horizontally in order to find regularities and to explain the development trend in connection with the actual situation.

To achieve this, we collected the speech scripts from the Internet as research texts and preprocessed the original texts (word segmentation, sentence segmentation, and removal of punctuation marks) to create a text set suitable for statistics. We then used the Python-based NLTK toolkit to design a text index analysis program. The text set was processed by the program, and the results were organized statistically in Excel. Finally, the quantitative data from different speeches were compared to analyze differences and trends in the linguistic characteristics of inaugural speeches across years. The results show that the textual characteristics and emotional orientation of inaugural addresses differ across years, but the differences show up unevenly across aspects. Owing to the constraints of the speech genre, the proportion of pronouns, articles, prepositions, and other words with weak meaning varies little from year to year. The average sentence length, on the other hand, drops sharply within a certain period. In addition, we find that over time the emotional tendency of inaugural speeches generally declines, that is, the speeches become less and less positive, although they remain positive overall. These findings help us understand trends in texts of a given genre from data, and may help researchers understand changes in the global or domestic environment from similar government documents and interpret the characteristics of texts in context.

Keywords: NLTK, Python, text mining, emotional orientation, inaugural speech, part-of-speech tagging

Contents
Chinese Abstract
Abstract
1 Introduction
2. Literature review
3. Method
3.1 Word frequency statistics
3.2 POS Tagging
3.3 Average sentence length
3.4 Sentiment analysis
3.4.2 Training of the classifiers
4. Result and discussion
4.1 Word frequency statistics
4.1.1 Result and analysis
4.2 POS Tagging
4.2.1 Result and analysis
4.3 Average sentence length
4.3.1 Result and analysis
4.4 Results and analysis of sentiment analysis
5. Conclusion
5.1 Major findings
5.2 Implications of the study
5.3 Limitations and suggestions
Acknowledgments
References
Appendix

1 Introduction

With the booming development of computer science, applying computer programming to textual analysis has become increasingly common. In this process, algorithms can discover more details than manual reading reveals. Moreover, algorithms offer a viable way to process text quantitatively, which is hardly achievable with conventional methods that rely entirely on manual work. More significantly, the results can greatly reduce the influence of the researcher's emotional tendency and preset standpoint. Because of these advances, text mining plays an increasingly important part in large-scale text processing for policymaking and risk evaluation.

In fact, some studies have already been conducted to find changing features in various fields. Text mining has been applied to mass media to capture people's feelings about depression (Yu, 2021), to predict fashion trends in the near future (An & Park, 2020), and even to evaluate the validity of the NPT (Nuclear Non-Proliferation Treaty) from statements made by spokesmen. Text mining is also used in other specialized fields, including law, public opinion, and history. Among these studies, many focus on customers' responses to certain products in order to test whether a marketing strategy is successful, and some examine patients' feedback after medical treatment. In short, most studies evaluate the services offered by companies or other providers on the basis of fragmented comments. However, few studies address the use of algorithms to deal with long texts, which is more difficult because emotional tendency varies and subjects change within a text. There are also studies that analyze long texts, but they focus more on index analysis than on overall sentiment tendency; these studies concentrate on policymaking and special texts such as articles of law and contracts. Emotional tendency and index analysis, however, have rarely been combined.

We believe there is more to reveal in presidential speeches, which can not only help us understand the characteristics of the presidents but also reflect the urgent needs of contemporary American society. To make our conclusions more persuasive, we have chosen the inaugural addresses from Franklin Delano Roosevelt to Joseph Robinette Biden, delivered between 1933 and 2021, as text material, since only elected presidents can, to a certain degree, represent the will of the public and the nation's stance on international affairs. What's more, we believe that the image of an ideal national leader in people's minds is always shifting. Apart from the textual features, we hope to find the hidden regularities beneath the greatest events in America.

The first section briefly introduces the research background, purpose and significance, and structure of this study. The second section presents some previous studies and tries to establish a theoretical basis for our study. In the analysis, we first count the frequency of each word in the text and tag each word with its part of speech. We noticed that many words in the speeches are repeated and carry little information, for example, restrictive articles, meaningless conjunctions, and referential pronouns with repeated meanings. These words often repeat the information to be conveyed within the limited length of the text, and at the same time they make the speech easier for the audience to understand. Therefore, we assume that the proportion of this type of word in the whole text reflects, to some extent, the intelligibility of the speech, and we calculated it for each speech. Considering that sentence length usually contributes greatly to the comprehension of a text, we also include the average sentence length as an index: sentences are delimited by full stops, the length of each sentence is measured, and the average value is calculated. Because word length also affects comprehension, we measure sentence length in characters rather than in words. Next, since the proportion and the average sentence length both measure the comprehensibility of the text from a certain perspective, we examine the complexity of the speeches by combining the two aspects; in the process we normalize the data and look for patterns. We then focus on the emotional tendency of the speeches. The shifting emotional tendency of inaugural addresses can hardly be judged fairly by intuition, so we adopt sentiment analysis to determine the positivity or negativity of each text. In this way, we want to get a sense of the general mood of the inaugural addresses and of the situation from year to year. In the final section, the major findings are illustrated, and we discuss the implications and limitations of our work and give some advice for future studies.

2. Literature review

Text mining and semantic network analysis are used in many fields, including decision making, public opinion surveys, ideological differences in preference, and fashion trends. One study (unknown, 2021) examines how the topics discussed by FOMC members changed over time and finds that discussion of economic modeling dominated during the GFC (Global Financial Crisis), after which the focus shifted to the banking system. Another (Yu, 2021) investigates the public's attitude toward depression on Weibo; with the help of Textmind, it analyzes language features and semantic content. The semantic content includes serious effects, stigma, fighting stigma, calling for understanding, and providing support. Over time, social support (as opposed to professional support) increased significantly, while content on the serious consequences of depression decreased significantly. In another study (Christopher, 2020), statements at the NPT Review Conference, which declare states' positions on the NPT, are used to test for evidence of differences in preference. The authors use Wordfish, a method often used to assess ideological preferences in electoral declarations, to measure preferences for the Treaty on the Non-Proliferation of Nuclear Weapons. It has been claimed that the preferences of nuclear and non-nuclear states for the NPT regime have diverged markedly over time, but the study found no evidence to support this claim; its results suggest that a collapse of the NPT regime driven largely by divergent preferences is unlikely in the near term. A further study (An & Park, 2020) aims to identify fashion trends with design characteristics and to provide consumer-driven fashion design applications through text mining and semantic network analysis. The authors examine texts from fashion blogs that reflect consumer needs and analyze comments about fashion collections. They use semantic network analysis to visualize the dominant trends of the latest season and their corresponding design features, and interpret the results for clothing design. The study thus provides a consumer-driven trend analysis and its application for fashion traders and designers. Such new approaches contribute to a deeper understanding of consumer fashion preferences and add feasibility and effectiveness to forecasting techniques that reveal past trends and predict future ones.
Another study (Yu, 2013) on legal texts found characteristics that mainly involve terminology, legalese, and sentence construction; in general, these arise from the special logic and peculiar usage of legal language. Other scholars try to discover interesting patterns in historical documents, and Lu is one of them. This study (Lu, 2020) mainly uses a purpose-built parallel corpus to conduct text searches and statistics on the research content and keywords, and combines them with landscape discourse to analyze issues such as identity and power relationships reflected in the translation. In the collation stage, Adobe Acrobat 9.0 was used for OCR recognition and transliteration of the collected corpus, and the transliterated corpus was manually proofread to build a Chinese-English parallel corpus, including a multi-version parallel corpus with three main lines and a small corpus centered on the description of "Ukang Country". CUC ParaConc was used for parallel processing of the corpus; the software can retrieve one source text together with up to 16 translations in multiple languages in parallel, which provided a convenient retrieval method and a visual display for the comparative analysis. At the same time, other relevant historical materials and the annotations of line notes not included in the parallel corpus were also consulted, so as to make the research results more scientific and objective.

3. Method

This chapter consists of four parts: word frequency statistics, POS tagging, the calculation of average sentence length, and sentiment analysis. Before introducing the details of processing, we should first address the texts we chose. To make sure that sufficient context is included in the analysis, we collected the inaugural addresses from Franklin Delano Roosevelt to Joseph Robinette Biden, delivered between 1933 and 2021, as text material.

3.1 Word frequency statistics

In a given document, term frequency (TF) refers to the number of times a given word appears in the document. The more often a word appears, the more likely it is to belong to the core vocabulary of the document, and such words are of great significance for a quick understanding of the text. Moreover, determining the distribution of words in the text is a prerequisite for the subsequent steps.

3.2 POS Tagging

Part-of-speech tagging (POS tagging), also called grammatical tagging or word-category disambiguation, is a text processing technique that marks the part of speech of each word in a corpus according to its meaning and context. When a sentence or paragraph is passed to the corresponding NLTK module, each word is marked with its part of speech, such as verb, noun, adjective, or adverb.

3.3 Average sentence length

In written language, text readability is inversely related to average sentence length (Liu Ying, 2014): the smaller the average sentence length, the fewer long words the text contains and the more readable it is, and vice versa. To gauge the readability of a text, evaluating the average sentence length is therefore essential. Since longer words are harder to comprehend, we measure sentence length in characters rather than in words, so that word length is included within this metric.

3.4 Sentiment analysis

This section consists of two parts, data collection and training of the classifier. We explain them in this order: how we collect the data we need, and how the classifier is trained.

3.4.1 Data collection

Since dividing the text into sentences is a prerequisite for calculating the average sentence length, the material processed in that step can be used directly as raw material in this section.

3.4.2 Training of the classifiers

After collecting the data, we analyze the emotion in the speeches. This is a binary classification that indicates whether a given sentence is positive or negative. We use NLTK (Natural Language Toolkit), a Python library. First, we take the movie review dataset provided by NLTK, which contains two types of reviews, positive and negative, and use it to train a Naive Bayes classifier. The data are divided into a training set and a test set: the training set has 1600 samples (800 positive and 800 negative), and the test set has 400 samples (200 positive and 200 negative). After training, we obtain a Naive Bayes classifier with an accuracy of 0.735 on the test set, which we use to predict the emotion of every sentence in the speeches. The flowchart is shown in Figure 3.4.1.

[Figure 3.4.1 Flowchart of programming]

4. Result and discussion

Using each index mentioned above, we analyze all 12 speeches from different perspectives. In this section, the results are presented and processed further to obtain clear quantitative metrics.

4.1 Word frequency statistics

Accurate results cannot be obtained without preprocessing the raw material. A tokenizer is needed to split each sentence in the text into individual words; after that, the frequencies of different words can be counted smoothly. We handle this with NLTK (Natural Language Toolkit) in Python.

4.1.1 Result and analysis

Because the output is large, only part of the results is shown below (TIMES is the number of occurrences; FREQ is the frequency in percent).

WORDS    TIMES   FREQ
and      62      3.416
to       46      2.5344
I        44      2.4242
you      35      1.9284
is       30      1.6529
the      29      1.5978
a        26      1.4325
of       24      1.3223
that     24      1.3223
very     24      1.3223
will     22      1.2121
people   21      1.157
great    17      0.9366
have     17      0.9366

Chart 4.1.1 Word frequency statistics for Trump (partly)

WORDS    TIMES   FREQ
the      126     5.1116
of       95      3.854
to       62      2.5152
in       54      2.1907
we       48      1.9473
our      44      1.785
that     42      1.7039
and      39      1.5822
a        31      1.2576
as       29      1.1765

Chart 4.1.2 Word frequency statistics for Nixon (partly)

It is easy to see that the most frequent words hardly stand for the main content of the text. Words with high frequency are often articles, conjunctions, pronouns, and prepositions, which contribute little to grasping the text.

4.2 POS Tagging

After the tokenizer has been applied to the text material, POS tagging can be carried out. It is completed with the built-in corpus and tagging module of NLTK.

4.2.1 Result and analysis

Part of the results is shown as follows.

Nation       NOUN
there        ADV
They         PRON
This         DET
time         NOUN
We           PRON
Congress     NOUN
days         NOUN
discipline   VERB
duty         NOUN
efforts      NOUN
emergency    NOUN

POS Tagging for Roosevelt (partly)

ahead        ADV
aimed        VERB
alike        ADJ
allies       NOUN
almost       ADV
alone        ADJ
American     ADJ
Argonne      NOUN
Arlington    NOUN
arsenal      ADJ
arsenals     NOUN

POS Tagging for Reagan (partly)

Restrictive articles, meaningless conjunctions, and referential pronouns carry little information, or rather are less meaningful than the stem of the sentence. We therefore assume that the greater the proportion of such repeated words in the whole text, the easier the text is to comprehend. Since the frequency and proportion of these words help us understand how speeches are received by the audience, we introduce:

W: the proportion of words with weak meaning;
w: the number of words with weak meaning;
T: the total number of words in a text.

Here we have:

$$W = \frac{w}{T}$$

After counting:

Name        w       T (total)
Roosevelt   824     1894
Truman      957     2294
Kennedy     588     1178
Nixon       898     2143
Ford        397     888
Carter      520     1237
Reagan      1035    2424
Clinton     646     1600
Bush        647     1598
Obama       1055    2397
Trump       629     1521
Biden       1109    2583

Chart 4.2.3 Counting of weak-meaning and total words

And we have the proportions (figures cut off to two decimals):

Name        W
Roosevelt   0.43
Truman      0.41
Kennedy     0.49
Nixon       0.41
Ford        0.44
Carter      0.42
Reagan      0.42
Clinton     0.40
Bush        0.40
Obama       0.44
Trump       0.41
Biden       0.42

Chart 4.2.4 Proportion of weak-meaning words

According to the calculation, the parameter varies only slightly, between 0.40 and 0.49. This may be due to the style of the speech genre: long, complex sentences are not appropriate in a speech script.

4.3 Average sentence length

We count every sentence that ends with a full stop and measure the length of each sentence. Assuming that the length of each sentence reflects its intelligibility, we further infer that the average sentence length is an important indicator of the intelligibility of the whole speech.

4.3.1 Result and analysis

After processing with NLTK's sentence tokenizer, we obtain the following results (average sentence length in characters; figures cut off to two decimals).

Name        Length
Roosevelt   123
Truman      111.30
Kennedy     129.65
Nixon       101.80
Ford        101
Carter      109.61
Reagan      102.37
Clinton     92.73
Bush        99.87
Obama       111.30
Trump       52.61
Biden       69.64
Average     100.41

Chart 4.3.1 Average sentence length of speeches

To make the data easier to compare, we normalize them. We apply Min-Max normalization, a linear transformation of the original data that maps the values onto the range [0, 1].
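The indices defined in Sections 3.1 to 3.3 and computed in Sections 4.1 to 4.3 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the program used in this study: the regular-expression tokenizer stands in for NLTK's tokenizers, counting only letters and digits toward sentence length is an assumption, and the helper names (tokenize, word_frequency, cut2 and so on) are hypothetical. Fed the counts from Chart 4.2.3 and the lengths from Chart 4.3.1, it reproduces the truncated proportions in Chart 4.2.4 and the normalized values in Chart 4.3.2.

```python
import math
import re
from collections import Counter

# Hypothetical regex tokenizer standing in for NLTK's word tokenizer:
# it keeps words and simple contractions ("I'll") and drops punctuation.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
# A "sentence" here is any run of text that ends with a full stop.
SENTENCE_RE = re.compile(r"[^.]+\.")


def tokenize(text):
    """Split text into word tokens (case is preserved)."""
    return WORD_RE.findall(text)


def word_frequency(tokens, top=10):
    """Return (word, times, frequency in percent) for the most common words."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, n, 100 * n / total) for word, n in counts.most_common(top)]


def weak_word_proportion(w, total):
    """W = w / T: share of weak-meaning words among all words in a text."""
    return w / total


def average_sentence_length(text):
    """Mean sentence length in characters over sentences ending in '.'.

    Assumption: only letters and digits are counted, so spaces and
    punctuation do not add to the length.
    """
    lengths = [sum(ch.isalnum() for ch in s) for s in SENTENCE_RE.findall(text)]
    return sum(lengths) / len(lengths)


def min_max(values):
    """Min-Max normalization: map each value linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def cut2(x):
    """Cut off (truncate rather than round) to two decimals, as in the charts."""
    return math.floor(x * 100) / 100


# Figures taken from Charts 4.2.3 and 4.3.1 of this study.
SPEAKERS = ["Roosevelt", "Truman", "Kennedy", "Nixon", "Ford", "Carter",
            "Reagan", "Clinton", "Bush", "Obama", "Trump", "Biden"]
WEAK = [824, 957, 588, 898, 397, 520, 1035, 646, 647, 1055, 629, 1109]
TOTAL = [1894, 2294, 1178, 2143, 888, 1237, 2424, 1600, 1598, 2397, 1521, 2583]
ASL = [123, 111.30, 129.65, 101.80, 101, 109.61, 102.37,
       92.73, 99.87, 111.30, 52.61, 69.64]

if __name__ == "__main__":
    ratios = [cut2(weak_word_proportion(w, t)) for w, t in zip(WEAK, TOTAL)]
    scaled = [cut2(x) for x in min_max(ASL)]
    for name, r, s in zip(SPEAKERS, ratios, scaled):
        print(f"{name:<10} W={r:.2f}  X*={s:.2f}")

    sample = "We shall prevail. We shall not fail."
    print(word_frequency(tokenize(sample), top=2))
    print(average_sentence_length(sample))
```

Truncating with math.floor rather than rounding matters for matching the charts: Carter's normalized value, for instance, is about 0.7399, which is reported as 0.73.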
The normalized value is computed as

$$X^{*} = \frac{x - \min}{\max - \min}$$

where x is the average sentence length of a speech, and min and max are the smallest and largest average sentence lengths among the 12 speeches. The results are as follows (figures cut off to two decimals):

Name        X*
Roosevelt   0.91
Truman      0.76
Kennedy     1
Nixon       0.63
Ford        0.62
Carter      0.73
Reagan      0.64
Clinton     0.52
Bush        0.61
Obama       0.76
Trump       0
Biden       0.22

Chart 4.3.2 Result of normalization

To make the situation clearer, the results are shown as a bar graph in Figure 4.3.1.

[Figure 4.3.1 Result of normalization]

The graph shows that the average sentence length dropped sharply in Trump's speech, and Biden's speech hardly reaches the level of the speeches before Trump, which may reveal a tendency toward easier speeches. The result may also reflect a large difference between these two presidents and their predecessors.

4.4 Results and analysis of sentiment analysis

Part of the results is shown below.

sentence: "I'll defend America and I will give all, all of you"    emotion: Negative    possibility: 0.55
sentence: "Keep everything I do in your service, thinking not of power but of possibilities, not of personal interest, but the public good"    emotion: Positive    possibility: 0.63
sentence: "And together we shall write an American story of hope, not fear"    emotion: Positive    possibility: 0.59
sentence: "Of unity, not division"    emotion: Negative    possibility: 0.5

Figure 4.4.1 Part of the results of programming

To describe the results simply and quantitatively, we assign Negative the value -1 and Positive the value 1. This step helps us better understand the emotional tendency of a whole text. The sentence-level results are then aggregated with a formula in which E_i is the value that stands for the emotional tendency (positive or negative) of sentence i, P_i is the probability of that assessment, i is the index of the sentence in a text, and E_av is the value that stands for the emotional tendency of the whole text. The results are shown below (figures cut off to two decimals).

Name        E_av
Roosevelt   0.55
Truman      0.72
Kennedy     0.58
Nixon       0.56
Ford        0.40
Carter      0.67
Reagan      0.47
Clinton     0.51
Bush        0.54
Obama       0.55
Trump       0.11
Biden       0.30

Chart 4.4.1 The average emotion tendency

[Figure 4.4.2 The average emotion tendency]

The chart shows that the average emotional tendency declined on the whole over the years, although not monotonically: the lowest values belong to Trump (0.11) and Biden (0.30). This may be related to a deteriorating domestic and international environment, suggesting that even the nation's leaders are less optimistic than before. The result may also indicate that the image of the American president is becoming less positive to some degree, which may help explain why some Americans are unwilling to vote on Election Day.

5. Conclusion

In this section, the major findings, implications of the study, limitations, and suggestions for future study are summarized.

5.1 Major findings

This study uses NLTK to carry out quantitative analysis and further data processing on the texts of American presidential inaugural speeches from four aspects: word frequency, part-of-speech tagging, average sentence length, and sentiment analysis. Applying text mining to the text set, we obtained the following results.

First, in evaluating the intelligibility of the speeches, we found that the most frequent words in the texts are common function words, which may be explained by the style of the speeches. Besides, the proportion of words with weak meaning varies only on a small scale. However, the average sentence length dropped sharply after Obama, which means that the presidents after Obama, Trump and Biden, prefer to express their views in rather short sentences. Short sentences may seem informal on such occasions but appear to be more acceptable to citizens, which also suggests that government language is moving closer to everyday expression. Taken together, these results suggest that the intelligibility of inaugural addresses is increasing.

Second, we assessed the emotional tendency of every sentence in these speeches with the classifier trained previously. Assuming that the average emotional tendency indicates the tendency of the whole text, we notice that it declines on the whole over the years but remains positive in general, which suggests that the nation's leaders are not as optimistic as before.

5.2 Implications of the study

By using NLTK to explore the differences among inaugural addresses as a whole, the present study can be helpful for other studies. First, different types of text differ in their part-of-speech composition, which can be used for text classification. Moreover, extracting word frequencies can help us better understand the main content of a text from a statistical perspective. To a certain extent, the average sentence length of a text can also describe the pragmatic habits of its writer and help us understand the writer's style. Second, sentiment analysis has been widely used to process single sentences or short texts, but overall sentiment analysis of long texts is rarely seen. This study provides a preliminary method for sentiment analysis of long texts. In addition, sentiment analysis of long texts can help us understand the character traits of authors and can be applied to texts that present subjective points of view or personal descriptions.

5.3 Limitations and suggestions

This study focuses on the quantitative analysis of texts and the mathematical modeling of text evaluation. However, it has some limitations. To support more accurate studies in the future, we offer some suggestions for improvement. First, once the word frequency statistics are available, a screening step can be carried out to extract the words strongly relevant to the main content and topic of the text, so as to help grasp its main content. Second, in POS tagging, since the same word can have multiple parts of speech, context should be taken into account to improve tagging accuracy. Third, the mathematical processing of statistical data in this paper is only meant for horizontal comparison of texts of the same type and is not a standardized indicator; caution is therefore needed when applying it to other types of text. Fourth, the mathematical processing of statistical data needs more theoretical grounding and experimental data to evaluate the established model and optimize it further.

Acknowledgments

Reaching this stage has been a great challenge for me. The conflict between my major and minor has always troubled me; luckily, I have had the chance to achieve my final goal. Life at CQU has taught me a lot, which I will remember for my entire life. First and foremost, I would like to thank Professor Yang, without whose guidance I could never have finished this paper. Whenever I had questions during the writing process, Professor Yang patiently guided me and gave me many suggestions, so that I was able to complete the paper. Second, I would like to thank all my classmates who have helped me in my daily life and in my studies. You made me feel the warmth of campus life and made my studies more colorful. Finally, I would like to express my gratitude to my parents for their support and encouragement, which helped me enrich myself and keep moving forward in my minor studies.

References

An, H. & Park, M. (2020). Approaching fashion design trend applications using text mining and semantic network analysis. Fashion and Textiles, (1).
[1]. (2021). Text data analysis using Latent Dir