International Mathematical Modeling Contest, First-Prize Paper

Modeling for Crime Busting

Introduction

A conspiracy has been found to embezzle a company's property and to steal money from the credit cards of the company's customers through the network. An organization, the Intergalactic Crime Modelers (ICM), has identified 7 known conspirators and 8 known non-conspirators and needs to find the other members and the leaders. The conspirators and the suspected conspirators all work for the same company in a large office complex. ICM has recently found a small set of messages from 83 workers in the company. The messages are represented as a network that shows the communication links and the types of messages; it has 83 nodes, 400 links (some involving more than one topic), over 21,000 words of message traffic, and 15 topics (3 of which have been deemed suspicious). The goal of the modeling effort is to identify the most likely conspirators in the office complex. Ideally, the result is a priority list of the most likely conspirators, a discriminating line separating conspirators from non-conspirators, and the likely conspiracy leaders.

Assumptions

Nodes 4, 10, and 16 are the senior managers of the company.

Variable symbols

Variable | Definition
P | Possibility of being a conspirator
Pc | Suspicious rate of one's messages. Pc equals the ratio of the number of one's messages involving Topics 7, 11, and 13 (considered conspiratorial in the following text) to the number of his/her total messages.
wi | Weight determined by the suspicious degree of a certain message topic
Wequ | Equivalent weight. It denotes the sum of the weights of all one's message topics divided by the number of his/her total messages.
Cnc | Value of network centrality
C'nc | Revised Cnc
K | A set of keywords
A | A set of words
ai | Any word i. It is an element of A.
Si | Any sentence i
Sim(ai, aj) | Word similarity of words ai and aj
Sim(Si, Sj) | Sentence similarity of sentences Si and Sj
Dis(a1, a2) | Word distance between words a1 and a2
Len(Si) | Number of words in sentence Si

Analysis of the problem

In order to obtain a priority list ranked by the possibility of being a conspirator, we should analyze the network and the topics, and validate the credibility of the results with the 7 known conspirators and the 8 known non-conspirators. We can see from the problem that ascertaining whether the three senior managers are conspirators is of guiding significance for judging the suspicious degree of the 15 topics and for the priority list. However, we cannot determine the weight of a topic from the status (known conspirator or non-conspirator) of the persons involved. We should use manual analysis or semantic and text analysis of the 15 topics to determine the weights. The higher the weight, the more suspicious the topic. Based on the network and the topic weights, we calculate the suspicious possibility of everyone with certain algorithms.

★ Task 1. The priority list of possible conspirators and leaders

Judge whether the three senior managers are conspirators

Node 4 is not a conspirator

Of the three senior managers, only Node 4 has not communicated with the 7 known conspirators. In addition, the suspicious rate of his messages is the lowest of the three senior managers, i.e. Pc4 = 0.286, Pc10 = 0.357, Pc16 = 0.625.

If Node 4 were a conspirator, then the model would also mark Nodes 10 and 16 as conspirators, and a large portion of the company's workers would be conspirators. This situation is unlikely in reality, so Node 4 is considered a non-conspirator.

Node 10 is a conspirator

We look for the people who discuss the suspicious topics with the 7 known conspirators in the network model. The results are Nodes 7, 10, 13, 17, 28, 36, 38, 60, and 81, among which Nodes 7 and 10 have done so 6 and 4 times respectively, and the others once each. Thus, the senior manager Node 10 is considered a conspirator.

Node 16 is not certain

The identity of Node 16 is not certain for the moment, so we make no assumption about it for now.

Model 1

We analyze the suspicious degree of the 15 topics and assign each of them a weight, which gives W = [w1, w2, ..., w15].

Topics 7, 11, and 13 are determined suspicious. Hence w7, w11, and w13 are all set to 0.9;

Because Node 10 is both a conspirator and a senior manager, the suspicious degree of his high-frequency topics is also high. According to the statistics, Node 10 took part in 14 discussions, in which the frequency of occurrence of Topics 4 and 6 is 0.429, and although only 3 messages involve two or more topics, Topics 4 and 6 occur together twice. Therefore w4 and w6 are 0.7;

Some messages in Topic 7 use Spanish words, possibly as codes. Topics 2 and 12 also involve Spanish, so w2 and w12 are 0.6;

Topic 5 involves several persons including two known non-conspirators, and the consensus in the discussion is that the company has gone overboard on security to the detriment of operations. So the persons involved in this topic are likely to be non-conspirators. Thus, w5 is 0.2;

The contents of the other Topics 1, 3, 8, 9, 10, 14, and 15 are neutral, and we cannot determine whether the persons in these topics tend toward conspiracy. So their weights are all 0.4.

In sum, W = [0.4, 0.6, 0.4, 0.7, 0.2, 0.7, 0.9, 0.4, 0.4, 0.4, 0.9, 0.6, 0.9, 0.4, 0.4].

Find out conspirators with new network G

According to Section 2.1, we assign weights to the set of links in the network. If a link involves more than one topic, its weight is set to the highest of its topics' weights. For a link l, W(l) is called the weight of l, 0 < W(l) < 1.
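The link-weight rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the topic weights are the W vector from Model 1, while the function name `link_weight` and the three example links are made up for the demonstration.

```python
# Topic weights W from Model 1 (index 0 holds Topic 1).
TOPIC_WEIGHTS = [0.4, 0.6, 0.4, 0.7, 0.2, 0.7, 0.9, 0.4,
                 0.4, 0.4, 0.9, 0.6, 0.9, 0.4, 0.4]


def link_weight(topics, weights=TOPIC_WEIGHTS):
    """Return W(l): the highest weight among the topics carried by one link."""
    if not topics:
        raise ValueError("a link must carry at least one topic")
    return max(weights[t - 1] for t in topics)


# Hypothetical links, written as (sender, receiver, topic numbers).
example_links = [(1, 2, [5]), (3, 4, [2, 7]), (5, 6, [1, 3])]
link_weights = {(i, j): link_weight(ts) for i, j, ts in example_links}
```

Because every topic weight lies between 0.2 and 0.9, every link weight produced this way satisfies 0 < W(l) < 1.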

In the meantime, in order to evaluate the possibility of each person being a conspirator, we assign weights to the set of nodes in the network. This gives a function P(v), 0 ≤ P(v) ≤ 1. For all the known conspirators, P(v) = 1, and for the others, P(v) = 0. Thus, we obtain a new network G based on the original network.

For any node i, we define an iterative equation for its P function as follows:

$$P_i = \max \{\, P_j \times W(l_{ij}) \mid l(i,j) \in L \,\} \quad (1)$$

With the iterative equation and the initial P values of the conspirators, we can get the P values of the others through iterative calculation. Its rationale is evident: in a dialogue, if the possibility that the initiator is a conspirator is determined and the suspicious rate of the topic is known, then the possibility that the receiver is a conspirator can be evaluated. In addition, given the prudence required in a criminal investigation, we are conservative in the iterative calculation, that is, the P value of a node stays at the highest value it has ever reached.

Due to the complexity of the network, the iterative change of the P function has aftereffects, so we cannot solve the P function of each node with an ordinary dynamic programming algorithm on a linear structure. But we can solve it with an iterative method on the network, and we can prove that the P function converges during the iteration.

We call a successful iteration a Relax operation. It is evident that the P function has converged if and only if no Relax operation can be executed.

So we get Algorithm 1:

Algorithm 1:

Begin
    Initialize edge weights and node weights;
    Loop
        For each l(i,j) ∈ L do
            If P(i) < P(j) × W(lij) and node i or j is non-conspirator in advance do
                P(i) = P(j) × W(lij); // Relax
    Until there is no Relax operation
End
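A minimal Python sketch of the relaxation loop in Algorithm 1 follows. It rests on two assumptions that the paper does not spell out: each link is used in both directions, and known non-conspirators are held at P = 0 and pass no suspicion on (this reading is taken from the validation step, where that restriction is lifted). The function name `propagate` and the data layout are illustrative only.

```python
def propagate(links, conspirators, non_conspirators=()):
    """Iterate P(i) = max(P(j) * W(l_ij)) until no Relax step applies.

    links: list of (i, j, weight) tuples with 0 < weight < 1.
    Assumption: links work in both directions, and known
    non-conspirators stay at 0 and pass nothing on.
    """
    pinned = set(non_conspirators)
    p = {}
    for i, j, _ in links:
        p.setdefault(i, 0.0)
        p.setdefault(j, 0.0)
    for v in conspirators:
        p[v] = 1.0
    for v in pinned:
        p[v] = 0.0

    changed = True
    while changed:
        changed = False
        for i, j, w in links:
            for dst, src in ((i, j), (j, i)):
                if dst in pinned or src in pinned:
                    continue
                candidate = p[src] * w
                if candidate > p[dst]:  # Relax: keep the highest value seen
                    p[dst] = candidate
                    changed = True
    return p


# Hypothetical four-node network with one known conspirator (node 1).
demo_links = [(1, 2, 0.9), (2, 3, 0.7), (3, 4, 0.9), (1, 4, 0.4)]
demo_p = propagate(demo_links, conspirators=[1])
```

Values only ever increase and every weight is below 1, so a path that loops back can never raise a value again, which is why the loop stops after finitely many passes.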

We can get the final convergent P function for all the persons with the revised algorithm; it is the possibility of being a conspirator listed in the following table. Through this value we can evaluate how likely each person is to be a conspirator in this case. We sort the results from high to low by two keys. The first key is the possibility of being a conspirator (P), and persons with the same P are sorted by the second key, the equivalent suspicious weight (Wequ), from high to low. Thus we get the whole ordering of the 83 persons by suspicious degree from high to low.

Then we use the golden ratio to separate the conspirators from the innocent, that is, the ratio of the number of innocents to the total number is 0.618:1. The calculated number of conspirators is 83 × (1 − 0.618) ≈ 31.7. But as Table 1 shows, the persons ranked 32 to 43 share the same first key. So it is not appropriate to take the 32nd person as a conspirator. Thus 31 persons in total are conspirators. The initial calculation results are shown in Table 1.
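The two-key sort and the golden-ratio cut can be sketched as below. The tie rule used here (move the cut upward until it no longer splits a group with equal P) is one reading of the argument about ranks 32 to 43; `rank_and_cut` and the input layout are hypothetical.

```python
def rank_and_cut(scores, innocent_share=0.618):
    """Sort nodes by (P, W_equ) descending and place the separation line.

    scores: {node: (P, W_equ)}.
    Returns (ranking, number_of_conspirators).
    """
    ranking = sorted(scores, key=lambda v: scores[v], reverse=True)
    cut = round(len(ranking) * (1 - innocent_share))
    # Assumed tie rule: never split a group of nodes that share the same P.
    while 0 < cut < len(ranking) and \
            scores[ranking[cut - 1]][0] == scores[ranking[cut]][0]:
        cut -= 1
    return ranking, cut


# Made-up scores for 83 nodes: ranks 32 to 43 share the same P value.
demo_scores = {}
for v in range(1, 32):
    demo_scores[v] = (0.9, 1 - v / 100)
for v in range(32, 44):
    demo_scores[v] = (0.36, 0.5)
for v in range(44, 84):
    demo_scores[v] = (0.1, 0.3)
demo_ranking, demo_cut = rank_and_cut(demo_scores)
```

With 83 nodes the unrounded count is 83 × 0.382 ≈ 31.7; the rounded cut of 32 falls inside the tied group, so the sketch backs off to 31.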

Table 1

The initial results of the priority list about crime suspicious possibility

Notes: in the "rank" column, the ones marked in black are conspirators, and the ones in green are non-conspirators. The ones marked in pink are senior managers. The dashed line is the separation line between conspirators and non-conspirators.

Find out "Inez", "Bob" and "Carol" (Correction Strategy of Model 1)

In order to avoid a situation like "Carol", we increase the proportion of innocents compared with the given case, that is, the proportion of innocents is 0.618 (greater than 0.5 in the given case), which reduces the possibility of innocents being wronged.

In order to find people like "Inez" and "Bob" who are deeply hidden

The message suspicious rate Pc of everyone is calculated. It is found that Pc of Nodes 7, 16, 21, 28, 33, 51, 54, 56, 57, 67, 72, 75, 79, 80, and 81 is equal to or greater than 0.5. Comparing them with the 31 conspirators in Table 1, we find that Nodes 56, 57, 72, 75, 79, and 80 are left out. By tracking with whom Nodes 56, 57, 72, 75, 79, and 80 have talked and on which topics, we find that although Nodes 75 and 80 are involved in the suspicious topics, they were told about them by known non-conspirators, so their possibilities of being conspirators are not high.
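The message suspicious rate Pc can be computed with a short routine like the one below. It is a sketch under the assumption that a message counts toward both its sender and its receiver; the paper does not say how messages are attributed, and the function name and sample messages are invented for illustration.

```python
SUSPICIOUS_TOPICS = {7, 11, 13}


def suspicious_rates(messages, suspicious=SUSPICIOUS_TOPICS):
    """Return {node: Pc}, the share of a node's messages on suspicious topics.

    messages: iterable of (sender, receiver, topic numbers).
    Assumption: each message is counted for both of its endpoints.
    """
    total, flagged = {}, {}
    for sender, receiver, topics in messages:
        hit = bool(suspicious.intersection(topics))
        for node in (sender, receiver):
            total[node] = total.get(node, 0) + 1
            flagged[node] = flagged.get(node, 0) + hit
    return {node: flagged[node] / total[node] for node in total}


# Hypothetical messages.
demo_messages = [(1, 2, [7]), (1, 3, [2]), (2, 3, [11, 4]), (1, 2, [5])]
demo_rates = suspicious_rates(demo_messages)
candidates = sorted(n for n, pc in demo_rates.items() if pc >= 0.5)
```

The last line applies the same threshold as the text, Pc equal to or greater than 0.5.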

According to the results in Table 1, senior managers Nodes 10 and 16 are conspirators. Tracking the persons who talk about the suspicious Topics 7, 11, and 13 with them, we find Nodes 3, 6, 8, 11, 17, 30, 35, 49, and 51. Comparing them with the 31 conspirators in Table 1, we find that Nodes 8 and 35 are not among them.

In sum, we think Nodes 8, 35, 56, 57, 72, and 79 are deeply hidden conspirators. Finally, we determine 37 conspirators and 46 innocents. The priority list is shown in Table 2.

Table 2

The priority list about crime suspicious possibility

Notes: in the "rank" column, the ones marked in black are conspirators, and the ones in green are non-conspirators. The ones marked in pink are senior managers. The dashed line is the separation line between conspirators and non-conspirators.

Validation of the results

In order to validate the credibility of Model 1, the P values of the 8 known non-conspirators are no longer fixed at 0 throughout the calculation. According to Algorithm 1, the suspicious possibilities of these 8 persons then increase whenever they satisfy the relax condition. Thus we get a new list in which the ranks of the 8 persons are as Table 3 shows.

Table 3

The rank of the 8 known non-conspirators in the priority list

It can be seen that the P values of most of them are still small even when their P values are not restricted to 0, which makes them rank low in the list, so they are still considered non-conspirators. The only exception is Node 2, whose P value is large, so he is mistakenly considered a conspirator. From the above we know that most of the known non-conspirators remain non-conspirators when the condition that restricts their P values to 0 is removed. This agrees with reality. Thus we validate the feasibility of Model 1, and it has relatively high credibility.

In this case, ICM already knows 7 conspirators. We select one of them at random and suppose that his or her identity is unknown, that is, we know only six conspirators. Ceteris paribus, we calculate the possibilities of being a conspirator with Algorithm 1. We choose each of Nodes 18, 21, 37, 43, 49, 54, and 67 in turn and trace its rank when the program runs; the results can be seen in Table 4.

Table 4

The rank of the 8 known non-conspirators in the priority list

As we can see, when one of Nodes 18, 21, 37, 43, and 49 is chosen and the possibilities are calculated with Algorithm 1, its rank is always above 40. Although the ranks of Nodes 37 and 43 do not conform strictly to the result of Task 1, the error is within a reasonable tolerance. Thus, it can be decided that they are accessories to the crime, which conforms to the facts. When we choose Nodes 54 and 67 and calculate the possibilities with Algorithm 1, they obtain low ranks. However, when they are corrected by the strategies in Model 1, we find that the message suspicious rates of Nodes 54 and 67, which are 0.8 and 0.867, are high. Therefore, it can be decided that they are accessories to the crime. As a consequence, the result of Model 1 is dependable.

Thus, this model could provide a more comprehensive guide for various fields in different situations, not just crime.

We analyze the EZ case with the above model and algorithm, and the priority list about criminal suspicious possibility is as follows.

Table 5

The priority list about crime suspicious possibility

We know from Table 5 that the first 6 in the rank list, that is, George, Dave, Ellen, Inez, Bob, and Carol, are most likely conspirators. But for Inez, Bob, and Carol, the first and second sort keys are the same.

So we need to analyze the text and meaning of the 27 messages further to decide their order.

Nominate the conspiracy leaders

We select the 16 persons whose P values are equal to or greater than 0.9 in Table 2; they constitute a special node set in the original network. Selecting these nodes and keeping the links between them, we get a sub-network G'. Its topological structure helps us analyze the organization and composition of the whole criminal gang.

Model 2: Find out the leader candidates

The node representing a leader should have a very high network centrality. A node with high network centrality must not only be in the centre of the network structure, but must also be an important link to other nodes. We introduce three indicators from graph theory to assess network centrality.

Degree: a measure of node activity. It is used to measure whether this node is a central node of the criminal network.

The calculation of Degree is:

$$C_d(k) = \sum_{i=1}^{n} A(i,k) \quad (2)$$

where A is a 0-1 matrix describing the link structure of G'.

Closeness: the sum of the shortest distances between this node and the other nodes. It represents the closeness of this node to the other nodes.

The calculation of Closeness is:

$$C_c(k) = \sum_{i=1}^{n} L(i,k) \quad (3)$$

where L is a matrix describing the shortest path between any two nodes in G'.

Betweenness: a measure of the extent to which a node keeps the other nodes interconnected. It measures the ability of the node to act as a mediator, that is, it occupies the position between two other nodes, and if this node fails, the other two nodes cannot reach each other. This indicator is important, and it directly measures the status of the node in the network's information flow.

The calculation of Betweenness is:

$$C_b(k) = \sum_{i=1}^{n} \sum_{j=1}^{n} F_{ij}(k) \quad (4)$$

where F_{ij}(k) denotes whether the shortest path between nodes i and j passes through node k; it is 1 when it does and 0 when it does not.

In order to combine the three indicators, we define the value of network centrality as

$$C_{nc} = \alpha C_d + \beta C_c + \gamma C_b \quad (\alpha + \beta + \gamma = 1) \quad (5)$$

Based on the different importance of the three indicators, we set α = 0.2, β = 0.3, γ = 0.5.
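Equations (2) to (5) can be evaluated with plain breadth-first search, as in the sketch below. Several choices here are assumptions rather than the authors' stated method: the graph is taken to be connected and undirected, F_ij(k) is set to 1 when k lies on at least one shortest path between i and j, ordered pairs (i, j) are both counted as the double sum suggests, and no normalization is applied because the paper gives none. The names `shortest_paths` and `centrality` are illustrative.

```python
from collections import deque


def shortest_paths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centrality(adj, alpha=0.2, beta=0.3, gamma=0.5):
    """Return {k: (Cd, Cc, Cb, Cnc)} for a connected undirected graph."""
    nodes = list(adj)
    dist = {u: shortest_paths(adj, u) for u in nodes}
    result = {}
    for k in nodes:
        c_d = len(adj[k])                              # equation (2)
        c_c = sum(dist[k][i] for i in nodes)           # equation (3)
        c_b = sum(1 for i in nodes for j in nodes      # equation (4)
                  if len({i, j, k}) == 3
                  and dist[i][k] + dist[k][j] == dist[i][j])
        c_nc = alpha * c_d + beta * c_c + gamma * c_b  # equation (5)
        result[k] = (c_d, c_c, c_b, c_nc)
    return result


# Hypothetical three-node chain 1 - 2 - 3.
demo_adj = {1: [2], 2: [1, 3], 3: [2]}
demo_centrality = centrality(demo_adj)
```

The sketch follows equation (3) literally, so a larger closeness sum means a node is farther from the others; how that interacts with the weighted sum in equation (5) is left exactly as the formula states it.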

We select the 16 nodes whose possibilities of being conspirators are the highest in the original network, keep their links, and form the sub-network G'. With the above evaluation method, we get the Cnc values of the 16 nodes, as shown in Table 6(a).

Optimization of Model 2: to give the order of leader possibilities

The above method searches for the leaders based on the network structure of the criminal gang but neglects the specific talking content, so the result is not very reasonable. If someone's dialogues involve only a few suspicious topics, he cannot be an important person. For the worker represented by Node 18, although his Cnc is very high, he only talks about two of the three suspicious topics, so he cannot be a leader.

Therefore, we use the message suspicious rate Pc defined earlier to modulate Cnc, and eliminate the candidates who are not involved in all three suspicious topics.

The value of the new network centrality is:

$$C'_{nc} = C_{nc} \times P_c \quad (6)$$
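The filtering and modulation step can be sketched as follows. All input values are invented; `leader_ranking` is a hypothetical helper that drops candidates who do not cover Topics 7, 11, and 13 and then ranks the rest by equation (6).

```python
SUSPICIOUS_TOPICS = {7, 11, 13}


def leader_ranking(c_nc, p_c, topics_by_node, required=SUSPICIOUS_TOPICS):
    """Rank leader candidates by C'nc = Cnc * Pc (equation 6).

    Candidates whose messages do not cover every required topic are removed.
    """
    revised = {
        node: c_nc[node] * p_c[node]
        for node in c_nc
        if required <= set(topics_by_node.get(node, ()))
    }
    return sorted(revised.items(), key=lambda item: item[1], reverse=True)


# Made-up centralities, suspicious rates, and topic coverage.
demo_c_nc = {"a": 10.0, "b": 8.0, "c": 6.0}
demo_p_c = {"a": 0.5, "b": 0.75, "c": 0.5}
demo_topics = {"a": {7, 11}, "b": {2, 7, 11, 13}, "c": {7, 11, 13}}
demo_leaders = leader_ranking(demo_c_nc, demo_p_c, demo_topics)
```

In this toy input, candidate "a" has the highest Cnc but is removed because Topic 13 is missing, which mirrors the argument made about Node 18.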

Results analysis

Through modulation, we get the order of suspicious extent for the five leader candidates, as shown in Table 6(b).

Table 6

The priority list of possible conspiracy leaders

Contrasting the results before and after optimization, we find that the optimization changes the order of only a few individuals in the nomination of criminal leaders, and the overall ranking trends and personnel are basically the same. The optimized model improves the accuracy of the results.

★ Task 2

According to Requirement 2, Topic 1 is also connected to the conspiracy and Chris is one of the conspirators.

Find out conspirators with new network G

Because Topic 1 is also connected to the conspiracy, w1 is 0.9. The values of the other weights do not change. Thus, the new weight set is W = [0.9, 0.6, 0.4, 0.7, 0.2, 0.7, 0.9, 0.4, 0.4, 0.4, 0.9, 0.6, 0.9, 0.4, 0.4].

According to Section 2.1 in Model 1, we assign weights to the set of links in the network. For a link l, W(l) is called the weight of l, 0 < W(l) < 1. In the meantime, we assign weights to the set of nodes in the network, and we get a function P(v), 0 ≤ P(v) ≤ 1. For all the known conspirators, P(v) = 1, and for the others, P(v) = 0. Then, we get a new network G.

By applying Model 1 and Algorithm 1 from Task 1, we obtain the results in Table 7.

Table 7

The initial results of the priority list

Notes: in the "rank" column, the ones marked in black are conspirators, and the ones in green are non-conspirators. The ones marked in pink are senior managers. The dashed line is the separation line between conspirators and non-conspirators.

Find out "Inez", "Bob"

With the same method as in Task 1, we analyze those with a high suspicious dialogue rate in order to find the deeply hidden conspirators, and we get the conspirators at Nodes 56, 57, 72, and 79. Then we search for those who have talked about the suspicious topics with senior managers 10 and 16 and compare them with the 31 conspirators in Table 8; we find that Nodes 3, 8, 11, 30, and 35 are possibly conspirators.

In sum, we think Nodes 3, 8, 11, 30, 35, 56, 72, and 79 are deeply hidden conspirators. Finally we determine 40 conspirators and 43 innocents, and their priority list is shown in Table 8.

Table 8

Final result of the priority list about criminal suspicious possibility

Notes: in the "rank" column, the ones marked in black are conspirators, and the ones in green are non-conspirators. The ones marked in pink are senior managers. The dashed line is the separation line between conspirators and non-conspirators.

Differences of results

In the result of Task 1, the conspirators in the original priority list include all those whose Pc is equal to 0.81, but in Task 2 only part of them are included, and we find some who might otherwise have slipped through the net. Therefore, more conspirators are determined in Task 2, while most of the suspicious possibilities do not change noticeably.

★ Task 3

In the calculations of Tasks 1 and 2, the 400 links in the message network are reduced to 15 topics. In other words, there are no more than 15 distinct values of W(l) in the whole network. Such simplification may weaken the exactness of the result, because the suspicious rates of different dialogues within a single topic may differ as well.

It is observed that staff who had been judged non-conspirators still get involved in the suspicious topics, which causes more errors in the result. If the original messages could be obtained, the 400 messages would be dealt with individually. First the set of suspicious messages is narrowed down, and then Text Analysis (TA) and Semantic Network Analysis (SNA) can be applied to identify the suspicious rates of the other messages and to assign them different weights. This method can express the diversity among the messages, enrich the values of the function W(l), and at the same time decrease the possibility of misjudging the non-suspicious topics.

Semantic and text analyses of the content of the 400 messages

Initial processing of the 400 messages using Semantic and Text Analysis respectively

The stop words (such as "a" and "the") can be deleted by text-analysis software. The remaining words of practical meaning then form 400 sets of words. (One set of words is made from one message.)

We introduce the SNA and TA techniques and a model of Sentence Similarity. We define the Sentence Similarity of two sentences S1 and S2 as a real number Sim(S1, S2) ranging from 0 to 1, which is related to the meanings and structures of the two sentences.

Word Similarity

Before explaining how to calculate the Sentence Similarity, let us define the method of computing the Word Similarity. Word Similarity is a numerical value ranging from 0 to 1. A word compared with itself has similarity 1. Conversely, the similarity is 0 when two words cannot replace each other at all.

Another essential index for measuring the relationship between two words is Word Distance. In general, word distance is inversely related to word similarity. We can calculate the distance between two words directly by referring to the tree in the existing HowNet dictionary.

We can easily get the Word Similarity from the Word Distance. For two words a1 and a2, we denote the Word Similarity by Sim(a1, a2) and the Word Distance by Dis(a1, a2). Thus,

$$\mathrm{Sim}(a_1, a_2) = \frac{\alpha}{\mathrm{Dis}(a_1, a_2) + \alpha} \quad (7)$$

where α is an adjustable parameter, which represents the word distance at which the similarity is 0.5. Because the depth of HowNet's tree is not more than 10 levels, the farthest distance between two points does not exceed 20. So we define α = 3.
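Equation (7) is simple enough to check numerically. The sketch below assumes the word distance has already been obtained (for example from a HowNet lookup, which is not reproduced here); `word_similarity` is an illustrative name.

```python
def word_similarity(distance, alpha=3.0):
    """Equation (7): Sim = alpha / (Dis + alpha)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return alpha / (distance + alpha)


same_word_score = word_similarity(0)    # a word compared with itself
half_score = word_similarity(3)         # distance equal to alpha
farthest_score = word_similarity(20)    # largest distance in the tree
```

A distance equal to α gives exactly 0.5, matching the definition of α. With α = 3 the largest distance of 20 gives 3/23, so the formula approaches 0 for unrelated words without reaching it.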

Algorithm about Sentence Similarity

Using Word Similarity, we can design an algorithm to calculate Sentence Similarity.

Word-form Similarity

The word-form similarity of two sentences is measured by the number of synonyms in the sentences.

First we delete the stop words in the sentences, and then we define the word-form similarity to be

$$\mathrm{Sim}_1(S_1, S_2) = \frac{2 \times \mathrm{SameWord}(S_1, S_2)}{\mathrm{Len}(S_1) + \mathrm{Len}(S_2)} \quad (7)$$

where SameWord(S1, S2) is the number of word pairs in S1 and S2 whose Word Similarity is equal to or greater than 0.7, and Len(S) is the number of words in sentence S.
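A small Python sketch of the word-form similarity follows. The paper does not say how word pairs are counted, so the sketch assumes a greedy one-to-one matching, which keeps the result between 0 and 1. The `toy_sim` function is a hypothetical stand-in for a HowNet-based word similarity.

```python
def same_word(s1, s2, sim, threshold=0.7):
    """Count word pairs with similarity >= threshold (greedy one-to-one matching)."""
    unused = list(s2)
    count = 0
    for a in s1:
        for b in unused:
            if sim(a, b) >= threshold:
                unused.remove(b)
                count += 1
                break
    return count


def word_form_similarity(s1, s2, sim, threshold=0.7):
    """Sim1 = 2 * SameWord(S1, S2) / (Len(S1) + Len(S2))."""
    if not s1 and not s2:
        return 0.0
    return 2 * same_word(s1, s2, sim, threshold) / (len(s1) + len(s2))


def toy_sim(a, b):
    # Hypothetical stand-in for a HowNet-based word similarity.
    return 1.0 if a == b else 0.0


# Two made-up sentences with stop words already removed.
first = ["transfer", "money", "tonight"]
second = ["move", "money", "tonight", "quietly"]
demo_sim1 = word_form_similarity(first, second, toy_sim)
```

Here two words match out of seven in total, so the result is 4/7.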

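Semantic Similarity

To reflect the semantic similarity of two sentences, we define the Semantic Similarity as follows.

$$\mathrm{Sim}_2(S_1, S_2) = \frac{1}{2} \left( \frac{1}{m} \sum_{i=1}^{m} \max_{1 \le j \le n} \mathrm{Sim}(a_{1i}, a_{2j}) + \frac{1}{n} \sum_{i=1}^{n} \max_{1 \le j \le m} \mathrm{Sim}(a_{2i}, a_{1j}) \right) \quad (8)$$

where sentences S1 and S2 consist of the words a11, a12, ..., a1m and a21, a22, ..., a2n, respectively.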
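Equation (8) averages, in both directions, the best match each word finds in the other sentence. A minimal sketch, again with a hypothetical exact-match word similarity in place of HowNet:

```python
def semantic_similarity(s1, s2, sim):
    """Equation (8): mean of the two directed best-match averages."""
    if not s1 or not s2:
        return 0.0
    forward = sum(max(sim(a, b) for b in s2) for a in s1) / len(s1)
    backward = sum(max(sim(b, a) for a in s1) for b in s2) / len(s2)
    return 0.5 * (forward + backward)


def exact_match(a, b):
    # Hypothetical stand-in for a HowNet-based word similarity.
    return 1.0 if a == b else 0.0


demo_sim2 = semantic_similarity(["a", "b"], ["a", "c", "d"], exact_match)
```

In the example the forward average is 1/2 and the backward average is 1/3, so the result is 5/12.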

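Sentence Similarity

Using the above characteristics, we can calculate Sentence Similarity as follows:

$$\mathrm{Sim}(S_1, S_2) = \alpha\, \mathrm{Sim}_1(S_1, S_2) + \beta\, \mathrm{Sim}_2(S_1, S_2), \quad \alpha + \beta = 1 \quad (9)$$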

Using Sentence Similarity to assign weights to the content of the messages

Suppose it is already known which messages are suspicious. Then we get a set of suspicious messages U. We introduce W (the weights of the suspicious topics) in order to calculate Pc (the suspicious rate of messages). The suspicious rate of a message can be calculated by comparing the sentence similarity between two messages.

For message i:

1) If it is already considered conspiratorial, wi = 0.9;

2) Otherwise,

$$w_i = \max \{\, 0.9 \times \mathrm{Sim}(S_i, S_j) \mid j \in U \,\} \quad (10)$$

Then we can assign weights to all links reasonably according to the original messages, and get the priority list about crime suspicious possibility of everyone by using Algorithm 1 from Task 1.
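The weighting rule of equation (10), together with the sentence similarity of equation (9), can be sketched end to end as below. Everything named here is illustrative: the word similarity is an exact-match stand-in for HowNet, the pair count uses an assumed greedy one-to-one matching, and the defaults α = 0.2, β = 0.8 are the values the paper chooses when it applies the model to the topics.

```python
def matched_pairs(s1, s2, sim, threshold):
    """Greedy one-to-one count of word pairs with similarity >= threshold."""
    unused, count = list(s2), 0
    for a in s1:
        for b in unused:
            if sim(a, b) >= threshold:
                unused.remove(b)
                count += 1
                break
    return count


def sentence_similarity(s1, s2, sim, alpha=0.2, beta=0.8, threshold=0.7):
    """Equation (9): alpha * Sim1 + beta * Sim2 with alpha + beta = 1."""
    if not s1 or not s2:
        return 0.0
    sim1 = 2 * matched_pairs(s1, s2, sim, threshold) / (len(s1) + len(s2))
    forward = sum(max(sim(a, b) for b in s2) for a in s1) / len(s1)
    backward = sum(max(sim(b, a) for a in s1) for b in s2) / len(s2)
    return alpha * sim1 + beta * 0.5 * (forward + backward)


def message_weights(messages, suspicious_ids, sim, base=0.9):
    """Equation (10): known suspicious messages get 0.9, others the best scaled match."""
    weights = {}
    for mid, words in messages.items():
        if mid in suspicious_ids:
            weights[mid] = base
        else:
            weights[mid] = max(
                (base * sentence_similarity(words, messages[j], sim)
                 for j in suspicious_ids),
                default=0.0,
            )
    return weights


def exact_match(a, b):
    # Hypothetical stand-in for a HowNet-based word similarity.
    return 1.0 if a == b else 0.0


# Made-up messages (stop words removed); message 1 is the known suspicious one.
demo_messages = {
    1: ["wire", "money", "tonight"],
    2: ["wire", "money", "tomorrow"],
    3: ["lunch", "menu"],
}
demo_weights = message_weights(demo_messages, {1}, exact_match)
```

Message 2 shares two of three words with the suspicious message, so its similarity is 2/3 and its weight is 0.6, while the unrelated message 3 gets 0.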

The application of Model 3 in Task 1

Text analysis

The text-analysis software outputs 15 sets of words automatically if we input the 15 topics.

Table 9

15 sets of words from the output of the software

Note: in the topic number column, the ones marked in blue are known suspicious topics, which are used as the basis to judge the other topics.

Calculate Sentence Similarity

Compared with whole messages, the content of the topics is fairly small. So, emphasis should be given to the semantic analysis of each sentence when calculating Sentence Similarity. Here, we define α = 0.2, β = 0.8.

Assign weights to topics

As required, Topics 7, 11, and 13 are considered suspicious. Thus, they are the references for judging the other topics. Then we use Sentence Similarity to identify the suspicious rate of messages.

For topic i:

1) If it is already considered suspicious, wi = 0.9;

2) Otherwise,

$$w_i = \max \{\, 0.9 \times \mathrm{Sim}(S_i, S_j) \mid j \in \{7, 11, 13\} \,\} \quad (11)$$

By using the SNA technique we get a new

W = [0.626, 0.409, 0.298, 0.439, 0.608, 0.608, 0.900, 0.337, 0.379, 0.418, 0.900, 0.419, 0.900, 0.371, 0.364].

Find out the conspirators

Substituting the weights of the 15 topics from Section 2.3 into the calculation of Model 1, we get the graph of each node and its suspicious rate as follows:

Figure 1. The difference in results between semantic network analysis and manual analysis

In Figure 1, the red polyline shows the weights assigned by manual analysis. The blue polyline shows the weights assigned by TA and SNA. It can be seen that the trends of the two lines are similar, with only minor differences at a few points. This suggests that both methods are reasonable. But the method applied in Model 3 is more accurate than manual analysis.

★ Task 4

A general method

In reality, we may need to face more complex problems, such as:

a more complex network;

a vast amount of data;

no key messages provided.

W cannot be identified by applying the method in Model 3, because in this situation there is no message that can serve as a reference for weighting the other messages. So the algorithm needs to be improved with other methods. We give a new method of weighting messages under the following assumptions:

1) All contents of the conversations are honest and true;

2) Some details about the conspiracy are known (such as the motives of the crime, the targets, etc.).

All messages can be weighted by fully exploiting the details of the crime at hand. We also bring the concept of Word Similarity into this model.

Extracting keywords

To abstract the details of the crime, we need to extract some characteristic keywords.

Principles for extracting keywords:

1) Keywords should be content words such as nouns and verbs;

2) The meanings of the words should be consistent;

3) Subject to the principles above, the number of keywords should be as large as possible.

Thus, we get a set of keywords K.

To assign weights to the messages

After
