外文翻譯,英語翻譯,軟件外文翻譯,軟件計(jì)算機(jī)翻譯_第1頁
外文翻譯,英語翻譯,軟件外文翻譯,軟件計(jì)算機(jī)翻譯_第2頁
外文翻譯,英語翻譯,軟件外文翻譯,軟件計(jì)算機(jī)翻譯_第3頁
外文翻譯,英語翻譯,軟件外文翻譯,軟件計(jì)算機(jī)翻譯_第4頁
外文翻譯,英語翻譯,軟件外文翻譯,軟件計(jì)算機(jī)翻譯_第5頁
已閱讀5頁,還剩19頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

CompleteandAccurateCloneDetectioninGraph-basedModelsNamH.Pham,HoanAnhNguyen,TungThanhNguyen,JafarM.Al-Kofahi,TienN.NguyenElectricalandComputerEngineeringDepartmentIowaStateUniversity{nampham,hoan,tung,jafar,tien}@AbstractModel-DrivenEngineering(MDE)hasbecomeanim-portantdevelopmentframeworkformanylarge-scalesoft-ware.Previousresearchhasreportedthatasintraditionalcode-baseddevelopment,cloningalsooccursinMDE.However,therehasbeenlittleworkonclonedetectioninmodelswiththelimitationsondetectionprecisionandcom-pleteness.ThispaperpresentsModelCD,anovelclonedetectiontoolforMatlab/Simulinkmodels,thatisabletoef?cientlyandaccuratelydetectbothexactlymatchedandapproximatemodelclones.ThecoreofModelCDistwonovelgraph-basedclonedetectionalgorithmsthatareabletosystematicallyandincrementallydiscovercloneswithahighdegreeofcompleteness,accuracy,andscalability.Wehaveconductedanempiricalevaluationwithvariousexper-imentalstudiesonmanyreal-worldsystemstodemonstratetheusefulnessofourapproachandtocomparetheperfor-manceofModelCDwithexistingtools.Unfortunately,therehavebeenveryfewworkondetect-ingclonesinmodels.CloneDetectiverepresentsthestate-of-the-artofclonedetectioninMDE.However,ithassev-erallimitations.Theimportantlimitationsareitsinaccuracyandlowdegreeofcompletenessindetection.Theauthorsreportedthatseveralcloneswerenotdetected(e.g.smallclonesarecoveredinlargerclonepairs)[8].Itwasalsore-portedthatmanydetectedclonesbyCloneDetectivearenotinterestingtothedeveloperseventhoughtheyareclonesaccordingtoCloneDetective’sde?nition.Severaldetectedclonegroupsareinaccurateanddonotcarrymuchmean-ingfordevelopers.AnotherkeylimitationisthatCloneDe-tectivealgorithmtendsto?ndaslargeclonesaspossible.Theyaresometimestoolargeandnotuseful,anddonotcorrespondwelltocopy-pastedfragments.UsersareeasilyconfusedwhenCloneDetectivereportssuchlargeclonesinagraphicaleditor.Mostimportantly,CloneDetectivecouldnotdetectapproximateclonesinwhichtwopartsofamodelhaveslightdifferences.Thesecasesoccuroftenwhenusersmakeacopyofafragmentandthenmodifyit.1IntroductionModel-DrivenEngineering(MDE)hasbecomeanim-portantdevelopmentframework.Matlab/Simulinkisapop-ularMDEtoolfordesigningandmodelingsoftwareinmanyproductsfromsmallelectroniccontrolsoftwaretolarge-scale?ightcontrolsystems.Modelsarethecollec-tionoflogicalentitieswhichdescribeasystematmultiplelevelsofabstractionandfromavarietyofperspectives.PreviousstudybyDeissenboecketal.[8]showedthatwiththenatureofusinggraphicaleditorsformodels,clonedfragmentsinSimulinkmodelsoftenappear.Clonedfrag-mentsaretheexactlymatchedorsimilarfragmentsinSimulinkmodels.Similartotraditionalcodeclones,clonesinSimulinkmodelsrequireadditionaleffortsformainte-nanceandmanagement.Forexample,changestooneplacemustbecarriedoutmultipletimesforalloccurrencesofclones.Thus,detectingclonesinmodelsplaysthesameimportantroleasintraditionalsoftwaredevelopment[8].2ApproachOverviewInthispaper,weintroduceanovelclonedetectiontoolforMatlab/Simulinkmodels,namedModelCD,thatisabletodetectbothexactlymatchedandapproximatemodelclones.ThecoreofModelCDistworespectivemodelclonedetectionalgorithms:eScanandaScan.Wedevelopdif-ferentalgorithmsforexact-matchedandapproximateclonedetectionbecausebytakingintoconsiderationthenatureofeachkindofclones,wewereabletodesigndifferentop-timizationtechniquesforeachalgorithmtogainbothef?-ciencyandcompleteness.Thekeyideasofourmethodareasfollows.ASimulinkmodelisrepresentedasasparse,labeleddirectedgraph.Clonesinthatmodelareconsideredasitsweaklycon-nectedandnon-overlappingsubgraphs.Twoalgorithmsde-tectclonesthroughthreesteps:generating,grouping,and?ltering.They?rstgeneratecandidateclones,thengroupthemintoclonegroups,and?nally?lterthosegroupstore-ICSE’09,May16-24,2009,Vancouver,Canada978-1-4244-3452-7/09/$25.00?2009IEEE276movetheredundantones.Toef?cientlygeneratecandidateclones,ModelCDisbasedonanobservationthatiftwosub-graphsarecloned,theymustcontaintwoclonedsubgraphsofasmallersize(sizeismeasuredbythenumberofedges).Thus,thealgorithmsgeneratecandidateclonedsubgraphsfromthesmallesttothelargestsize.Bydoingthatway,thealgorithmscouldsystematicallydiscoverthecloneswithahighdegreeofcompletenessandprecision.Thisalsoal-lowseachalgorithmtoapplyappropriateoptimizationandheuristictechniquestoreducethecandidatesets.Forex-ample,toavoidcombinatorialexplosion,aScanappliesapruningtechniquethatprohibitsun-clonedsubgraphsfrombeingusedinfurthergeneratingofcandidates.Inthegroupingstep,differenttechniquesareusedintwoalgorithms.SinceeScanaimstodetectexactly-matchedclones,i.e.isomorphicnon-overlappingweaklyconnectedsubgraphs,itusescanonicallabeling[17],anadvancedgraphisomorphismtechniquetochecktheisomorphismofthesubgraphsinasparsegraph.Then,itputsthemintogroupsofisomorphicsubgraphs,andusesthosegroupedsubgraphstogeneratelargerisomorphiccandidateswiththeextensionofoneedge.Incontrast,aScanaimstodetectap-proximateclones,i.e.non-overlappingweaklyconnectedsubgraphsthataresimilarinstructure.Itusesournovelvector-basedtechnique,Exas[24],toapproximatethestruc-tureofasubgraphbyacountingvectorofthesequencesofnodes/edges’labels.Clonegroupingisdonebyusinghash-ingandmaximalcliquecovermethodsforthosevectors.ModelCDwasintegratedintoConQAT[8],anopen-sourcesoftwaremaintenanceforprogramsandSimulinkmodels.Thefront-endeditorandvisualizationformodelsareprovidedbyConQAT.Usersareabletospecifythemin-imumandmaximumdesiredclonesizesoradetectingtimelimit.AninterestingfeatureofModelCDisitsabilityofin-crementaloperation.Ifuserswantmoreresults(says,largerclones)afterthe?rstrun,ModelCDisabletocontinueitsexecutionwithoutre-runningthewholeprocess.Thestor-agecostforanincrementalmodeisreasonable.Wehaveperformedanextensiveempiricalevaluationonseveralopen-sourceSimulinksystemsandcomparedtheperformanceofModelCDwiththatofCloneDetective,themodelclonedetectiontoolwithinConQAT.ExperimentalresultsshowthatbotheScanandaScanareef?cientandscalabletotheverylargemodelswithreasonabletimecosts.ComparetoCloneDetective,eScanhaslargerrunningtimebutproducesmorecompleteandaccuratecloneresultswithhigherqualityandmuchmorequantity.Importantly,therunningtimeofeScanforlargemodelsofseveralthousandsofnodesandedgesisonlyintherangeofafewhundredsec-onds.aScanalsohasahighdegreeofcompleteness,preci-sion,andtimeef?ciencyinclonedetection.Nextsectionpresentsourformulationofthemodelclonedetectionproblem.Sections4and5explaineScanandaScanalgorithms.Additionalimprovementstobothalgo-rithmsareinSection6.EvaluationisinSection7.RelatedworkisdiscussedinSection8.Conclusionsappearlast.3GraphRepresentationandFormulation3.1RepresentationofSimulinkModelsTomodelasystemwithMatlab/Simulink,developersusebasicSimulinkblocksandthenthesystemwillbegen-eratedfromthemodel.Simulinkblockscanbeinstantiatedfrommanybasictypessuchasgains,adders,comparisons,switches,etc.Eachblockcanbeassociatedwithattributes,dependingontheblock’stype.Inputsandoutputsofblocksareconnectedtogetherviasignallines.Basicblockscanbecombinedtoformacompositeblockorasubsystem.MoredetailsonMatlab/Simulinkarein[20].This?rstphaseofourtooliscarriedoutinthesameman-nerasinConQAT[8].Basically,itconsistsofthreetasks:(1)parsingthemodelintoadirectedgraphwhereanoderepresentsablockandanedgerepresentsasignalconnec-tion,(2)?atteningallsubsystemsandconvertingthemintographs(thisstepisoptional),and(3)labelingnodesandedgeswiththelabelsdependingontheirattributes.Forex-ample,thelabelforanodeincludesitsblocktype,whileotherinformationarediscarded.AsinConQAT,thelabelofanedgeincludesthelabelsofthesourceandtargetports.Theoutputofthisphaseisalabeled,directedgraphGinwhichthesetofnodesVrepresentsSimulinkblocks,thesetofdirectededgesErepresentsthesignallines,andthelabelingfunctionTassignsthelabelstonodesandedges.Gisamulti-graphbecauseinamodel,theremightbemultiplesignalconnectionsbetweentwoblocks.3.2FormulationGivenG=(V,E,T)astherepresentationgraphofamodelM,letusformulatetheclonedetectionproblem.De?nition1(Fragment)AfragmentfisasetofedgesofGwhichformsaweaklyconnectedsubgraph.Afragmentfwithkedgesiscalledak-fragmentandisdenotedbyfk,i.e.withthesubscriptasitssize.De?nition2(ClonePair)Twofragmentsarecalledaclonepairiftheyaresuf?cientlysimilarwithrespecttoagivensimilaritymeasure.Wecallthemclonedfragmentsandsaythattheyareclonesofeachother.De?nition3(CloneGroup)Aclonegroupisasetofatleasttwofragmentsinwhichanytwofragmentsformaclonepair.277Thus,aclonegroupcontainsonlyclonedfragments.Byde?nition,aclonepairisalsoaclonegroup.Tomodelthenon-redundancyindetectedclonegroups,weusethefol-lowingconcept:De?nition4(CoveredGroup)AclonegroupPissaidtobecoveredbyanothergroupQifandonlyifeachmemberofPisasubgraphofatleastonememberofQ.Ifaclonegroupiscoveredbyanothergroup,itisredun-dantbecausetheinformationofitsmemberclonesisalsocontainedinthegroupcoveringit.De?nition5(CloneDetectioninGraph-basedModels)GivenagraphGandasimilaritymeasure.FindasetCGofclonegroupssatisfying:1.AnyclonepairexistinginGiscoveredbyatleastagroupinthesetCG.2.CGhasnocoveredgroup.Condition1meansthecompletenessofCG,becauseitcontainscloneinformationofallclonesinthegraph.Thatis,everyclonepairofGiseithercontainedinaclonegroupofCGoris“covered”byanotherpairinanotherclonegroup.Eachfragmentinacoveredpairisasubgraphofafragmentinthecoveringpair.Condition2meansthatCGhasnoredundantgroup.Rememberthatbyde?nition,allclonegroupsinCGcontainonlyclonedfragments.Itim-pliesthatCGisalsofullyprecise.andedges.Thislabelisinvariantwithrespecttoisomor-phism.Inotherwords,allisomorphiclabeledsubgraphshavethesamecanonicallabel.Hence,tocheckwhethertwofragmentsareclonedornot,weonlyneedtocomparetheircanonicallabels.Moreinformationaboutcanonicallabelingcanbefoundinanotherdocument[26].Therefore,ineScan,anexactclonegroupisasetofnon-overlappingfragmentshavingthesamesizeandcanon-icallabel.Takingtheunionofallclonegroupsofsizek,wehavethesetofallclonedk-fragments.Thissetiscalledaclonelayerofsizek,denotedbyLk.Observation1InGandclonelayers:1.Everyk-fragmentcanbegeneratedfroma(k?1)-fragmentbyaddingarelevantedge.2.Iftwok-fragmentsareaclonepair,thereexiststwocloned(k?1)-fragments(i.e.subgraphs)withinthem.3.EveryclonepairofLkmustbegeneratedfromaclonepairofLk?1.Fact1iseasytosee.Forfact2,weremindthattwoisomorphicgraphsmusthavetwoisomorphicsubgraphs.Iftheyarenon-overlapping,soaretheirsubgraphs.Fact3canbeeasilyderivedfromthe?rsttwofacts.ThosefactsimplythatLkcanbegeneratedfromLk?1byextendingallclonedfragmentsinLk?1byoneedge,col-lectingthoseresultingfragmentsintoacandidateset,keep-ingonlytheclonedk-fragments,andthengroupingthemintoclonegroups.BygraduallygeneratingL1,L2,...,Lk,wecould?ndallclonegroupspreciselyandcompletely.However,thisgeneratingstrategyisinthebreadth-?rstorder,whichrequiresmuchmemorycosttomaintainallthegroupsandcandidates.Toincreaseef?ciency,eScanfol-lowsadepth-?rstorderonagraphcalledclonelattice.Clonelatticeisalayeredgraphbuiltontheclonelay-ersL1,L2,...,Lk,...EachnodeoftheclonelatticeisaclonedfragmentandthekthlayerofclonelatticecontainsthemembersofLk.Weusethesubscripttoanodetode-noteitslayerindex.Iffkisasubgraphoffk+1,therewillbeanedgefromfktofk+1intheclonelattice.Thatedgerepresentsthegeneratingrelationshipbetweenfkandfk+1.To?ndallclonedfragmentsofallpossiblesizesinG,isin-deedtodiscoverthenodesoftheclonelatticebytraversingfromthenodesofthe?rstlevel.Thegroupingand?lteringprocessisappliedtoallclonedfragments(Section4.2).ThetechniqueofusingthiskindoflatticeisadaptedfromvSi-GraM[17].However,thedetailsofeachstepsaredifferent.TheremainingofthissectiondescribeseScanindetails.4ExactModelCloneDetectionLetusdescribeeScanalgorithm,whichaimsto?ndex-actlymatchedclonesinmodels.Inthiscase,thesimilaritymeasureforapairoffragmentsisde?nedasfollows:De?nition6(ExactClonePair)Twofragmentsf1andf2,withtwocorrespondingsubgraphs(V1,E1)and(V2,E2)inG=(V,E,T)areaclonepairifandonlyif1.Non-overlapping:V1∩V2=?,2.Label-isomorphic:thereexisttwobijectionsm:V1→V2andp:E1→E2suchthat?v∈V1:T(v)=T(m(v))and?e∈E(u,v)∩E1:p(e)∈E(m(u),m(v))∩E2andT(e)=T(p(e)).WeuseE(u,v)todenotethesetofedgesbetweentwonodesuandvingraphG.Inotherwords,aclonepairistwonon-overlappingsub-graphsofGthatareisomorphicregardingthelabelingfunc-tionT.Thus,tocheckiftwofragmentsareaclonepair,oneneedstosolvetheproblemoflabeledgraphisomorphism.Currently,itisnotknowntobeinPorNP-hard[17].How-ever,forthesparsegraphs,weuseanef?cienttechnique,calledcanonicallabeling[26],tosolvethatproblem.Inbrief,foreachsubgraph,acanonicallabeliscomputedbasedonitsstructure(topology)andthelabelsofitsnodes4.1ClonedFragmentsGenerationPseudo-codeofeScan(Figure1)usesthefollowings:?Clones(fk)isthesetcontainingfkandallofitsclones(i.e.allfragmentswhicharenon-overlapping27812345678910111213141516functioneScan(G=(V,E,T))L1←{allcloned1?fragments}foreachf1∈L1doDiscover(f1,Clones(f1))foreachLkdoCG←CG∪Group(Lk)Filter(CG)returnCGfunctionDiscover(fk,Clones(fk))foreachgk∈Clones(fk)doCk+1←Ck+1∪{gk⊕e|e∈E}foreachck+1∈Ck+1doifGeneratingParent(ck+1)=fkthenFind(Clones(ck+1))if|Clones(ck+1)|>1thenLk+1←Lk+1∪Clones(ck+1)Discover(ck+1,Clones(ck+1))traversal?nishes,allclonedfragmentsarecontainedintheclonelayersL1,L2,...,Lmax.Theyaregroupedlayer-by-layerintoclonegroups(line4).Then,resultinggroupsare?lteredtoremovecovered,i.e.redundant,groups(line5).4.2CloneGroupingandFilteringFigure1.ExactCloneDetectionwithandarelabel-isomorphictofk).Section4.4willexplainhowto?ndthisset(line13).?⊕istheextensionoperation:thatis,g⊕ereturnsthefragmentgeneratedbyaddinganedgeetofragmentg.?GeneratingParent(ck+1)returnsthegeneratingpar-entofck+1,i.e.theuniquefragmentthatisusedtogenerateck+1.Thiswillbediscussedlater.eScan?rstdiscoversL1,thesetofallcloned1-fragments,i.e.allrepeatededgesofG(line2).Then,eScanuseseachfragmentsinL1asthestartingpointofadiscoveryprocess(functionDiscover)thattraversestheclonelatticeinthedepth-?rstorder(line3).Whenvisit-ingaclonedk-fragmentfk,eScangeneratesacandidatesetCk+1of(k+1)-fragmentswhichcanbeobtainedfromafragmentinClones(fk)(i.e.fkanditsclones)withanex-tensionofonlyoneedge(lines9-10).Foreachcandidatefragmentck+1∈Ck+1,ifitisaclonedfragment(line14),itsclonesanditselfareaddedintoLk+1(line15)anditisusedinthenextiterationofdiscovery(line16).Acandidateck+1canbegeneratedbyseveralclonedk-fragments(atmostk+1).Thatis,ck+1mightbeexploredandprocessedmanytimes.Toavoidtheseredundantvisits,eScanusesthegeneratingparenttechniquetoensurethateachclonedfragmentisexploredonlyonce.Theideaistoassignforeachclonedfragmentck+1auniquefragmentfkwhichisusedtogenerateck+1.fkiscalledthegeneratingparentofck+1(Section4.3).Then,whilediscoveringfk,aclonedfragmentck+1willbeusedfornextdiscoveryifandonlyiffkisthegeneratingparentofck+1(line12).Sinceck+1hasonlyonegeneratingparent,itisdiscov-eredexactlyonce.Therecursionterminatesifthecandi-datesetisempty,andthetraversalwillbacktrack.AftertheRemindthatalltheisomorphicfragmentshavethesamecanonicallabel.Therefore,the?rststepofgroupingineS-canistopartitioneachclonelayerintosubsetsoffragmentshavingthesamecanonicallabel.Theresultofthisstepisacollectionofsmallersubsetsofisomorphicfragments.Despiteofbeingisomorphic,thefragmentsinasub-setmightnotbeclonedtoallothersbecauseofthenon-overlappingcondition.Thus,thenextphaseofgroupingtaskisto?ndthegroupsofnon-overlappingfragments.Letusgiveanexample.AssumethatasubsetShasfourfrag-mentsa,b,canddisomorphictooneanother.However,coverlapswithbothbandd.Byourde?nition,Sisnotaclonegroup.OnecandetectinSthefollowingclonegroups:(a,b),(a,b,d),(a,c),(a,d),(b,d).ThosegroupscoverallclonepairsinS.However,(a,b),(a,d),and(b,d)areredundantbecausetheyarecoveredby(a,b,d).Themostdesirableresultistwogroups(a,b,d)and(a,c).Therefore,ourgoalisto?ndasetofnon-redundantclonegroupsthatcoverallclonepairsofSandeachgrouphasasizeaslargeaspossible.Toachievethis,eScanrep-resentsSasagraphinwhichnodesarefragmentsandtwonodeshaveanedgeiftheyarenotoverlapped.Eachclonegroupisacliqueofthegraph,i.e.asetofnodessuchthatanytwonodesareconnectedbyanedge.Then,eScanap-pliesBron-Kerbosch,amaximalcliquecoveralgorithm[7],onthegraphto?ndthedesiredclonegroups.Aftergroupingthatwayforallsubsetsofalllayers,wehaveasetCGofallclonegroupsforthegraphG.The?l-teringstepisrequiredtoremoveallcoveredgroupsinCG.AgroupisremovedfromCGifitiscoveredbyanotherremaininggroup(seeDe?nition4forcoveredgroups).4.3GeneratingParentIdenti?cationWedidnotusetheoriginaltechniqueofgeneratingpar-entinvSiGraM[17]toavoiditsexpensivecomputationalcost.Ourproceduretoidentifythegeneratingparentofafragmentck+1isasfollows.Afterck+1isassignedacanon-icallabel,theorderofitsnodesandedgesareuniquelyiden-ti?ed.Then,thelastedgeinthatorderwhichdoesnotdis-connectck+1isidenti?ed.Ifthatedgeisexactlytheedgethatwasjustaddedtofk,thenfkisthegeneratingparentofck+1.Figure2displaysanexample.Supposethataftercanonicallabeling,theorderofedgesinck+1isfrom1to5.Assumethatthefragment(e)isjustusedtocreateck+1byaddingtheedge5.Thus,itisthegeneratingparentofck+1.279trast,eScanisabletodetectthatclonegroup(h1,h2,h3)?rst,whoseelementshavesmallersizes.Then,fromclonedfragmentsh1,h2,orh3,thefragmentsf1,f2andg1,g2areextended,thus,theothertwogroupsarediscovered.5ApproximateModelCloneDetectionAsincodeclones,clonesinmodelshouldbeconsiderednotonlyasexactlybutalsosimilarlymatched.Thatis,afragmentofthemodeliscopiedfromoneplaceandpastedinanotherplacewithsmallchangesofreplacing,addingorremovingblocks.Inthiscase,itstillmaintainsalmostthesamestructurebutisnolongerisomorphictotheoriginal.Therefore,thesimilaritymeasurethatusestheisomorphismrelationandeScanalgorithmarenotapplicable.Tode?neanewandmoreappropriatesimilaritymea-sureinsuchcases,wedevelopExas[24],avector-basedrepresentationandfeatureextractionmethodthatcanap-proximatethestructurewithina(sub)graph.A(sub)graphischaracterizedbyavectorwhoseelementsaretheoccur-rencecountsoftheselectedstructuralfeatureswithinthe(sub)graph.Bydoingthisway,thechangesofthevector,whichcanbemeasuredbyanappropriatedistancefunction,canapproximatelycapturethechangestoafragment.Ifthedistanceissuf?cientlysmall(i.e.smallerthanaspeci?cthresholdδ),therespectivefragmentscouldbeconsideredasclones.Next,wewilldiscussaboutExasvectors.Figure2.GeneratingParent21Figure3.DetectedHiddenClones4.4FindingClonesofaFragmentOuralgorithminvolvesastepthatrequiresthe?ndingofthesetClones(fk)forafragmentfkthatincludesitselfandallofitsclonesingraphG.Ingeneral,itisthesub-graphisomorphism,anNP-hardproblem[11].However,ineScan,thistaskrequiresarelativelyinexpensivecom-putationalcost.LetuscomebacktothecontextofeScan(Figure1)toshowthatClones(ck+1)isasubsetofCk+1.Proof.Assumethatck+1isacloneofck+1.Becauseck+1isgeneratedfromfk,theremustexistaclonedfrag-mentfkoffkthatcangenerateck+1(seeObservation1above).RemindthatCk+1isthesetofalloffragmentsextendedfromafragmentinClones(fk)(i.e.fkanditsclones)withoneedge.Therefore,ck+1∈Ck+1.Fromthisresult,to?ndClones(ck+1),wesearchinCk+1allfragmentsck+1havingthesamecanonicallabelandnon-overlappingwithck+1.Thissearchcanbedoneef?cientlybystoringCk+1asachaininghashtableusingcanonicallabelsaskeys.Sincecalculatingcanonicallabelsforfragmentstakestime,eScanusesacachemechanism.5.1ExasCharacteristicVectors4.5DetectingHiddenClonesOneinterestingfeatureofeScanoverCloneDetective[8]istheabilitytodetectsmallersizeclonesinthecasesthatthelargerclonepairshidesmallerones.Figure3showsanexamplethatCloneDetectivehasfailedtodetectbecauseithastwoseparatephases:clonepairdetectionandpair-to-groupconversion.InFigure3,itdetectstwopairsofmodelclones(f1,f2)and(g1,g2)(representedasshapes).How-ever,becauseit?ndsclonepairswiththesizesaslargeaspossible,theclonegroupof(h1,h2,h3)ismissed.Incon-Exasfocusesontwokindsofstructuralpatternsina(sub)graph,called(p,q)-nodeandn-path.A(p,q)-nodeisanodehavingpincomingandqoutgoingedges.Ann-pathisadirectedpathofnnodes,i.e.asequenceofnnodesinwhichanytwoconsecutivenodesareconnectedbyadi-rectededge.Aspecialcaseisan1-path,whichcontainsonlyonenode.Structuralfeatureofa(p,q)-nodeisthela-belofthenodeandtwonumberspandq.Forann-path,itisasequenceoflabelsofnodesandedgesalongthepath.Figure4showsanillustratedexampleofagraphanditstwoclonedfragmentsAandB[24].Table1listsallpat-ternsandfeaturesextractedfromfragmentA.ItcouldbeeasytocheckthatfragmentB,whichisisomorphictofrag-mentA,hasthesamesetoffeaturesasfragmentA.Toef?cientlydescribethefeaturesetofafragment,Exasusestheoccurrence-countvectorofthefeaturesextractedfromthatfragmentasitscharacteristicvector.Thatis,eachpositioninthevectorisindexedforafeatureandthevalueatthatpositionisthenumberofoccurrencesofthatfeatureinthefragment.Table2showstheindexesofthefeatures,whichareglobalacrossallvectors,andtheiroccurrencecountsinfragmentA.ThevectorsforbothAandBarethesame.Itis(2,1,1,1,1,2,1,1,1,2,1,1,1,1,1).280Table2.VectorIndexingandCountingaboveinsightsareformalizedinourde?nitionforapproxi-mate(similarlymatched)modelclonesasfollows:1De?nition7(SimilarClonePair)Twofragmentsfkand2fhrepresentedbytwosubgraphsG1=(V1,E1)andG2=(V2,E2)withtwocorrespondingExasvectorsv1andv2areclonedifandonlyif:(1)V1∩V2=?,(2)d(v1,v2)≤δ,oand(3)thereexistsapairofsubgraphsGo1ofG1andG2ofoG2suchthatGo1andG2areexactclonesofthesamesizesss≥σand≥σ,fortwogiventhresholdsδandσ.whereFigure4.ExampleofFragments(cont-)iningainmulin-gainin-mulin-mulmul-sumin-gain-sumin-mul-sumin-0-2in-0-1mul-2-1sum-2-0sumgain-sumin-mul-sumgain-1-1Table1.ExampleofPatternsandFeaturesIngeneral,itiseasytoverifythattwoisomorphicfrag-mentshavethesamefeatureset,thus,havethesamevector.Moreover,in[24],weprovedamoregenericproperty.Theorem1IfgrapheditdistanceofG1andG2isλ,thenNv1?v2≤v1?v21≤(2P+4)λwithP=l=1l.bl?1.G1andG2aretwosubgraphsofG.bisthemaximumdegreeofnodesinG(i.e.branchingfactor),andNisthemaximumsizeofn-pathsofinterest.(Sincetheremightexistanin?nitenumberofn-pathsofallsizes,Exasisin-terestedonlyinthen-pathsofcertainlimitedsizes.)Thisresultmeansthat,thevectordistanceoftwofrag-mentsisboundedbytheireditdistance,i.e.similarfrag-ments(havingsmalleditdistance)willhavesmallvectordistance.Therefore,vectordistancecouldbeusedasasim-ilaritymeasureoffragments.Tonormalizethevectordis-tancewithrespecttothevectorsofdifferentlengths,inaS-v1?v2.can,weusethismeasure:d(v1,v2)=12MoredetailsonhowExascanef?cientlycomputeandstorethevectorsforsubgraphscanbefoundin[24].Condition1meansthattwoclonedfragmentsarenon-overlapping.Condition2requiresthemtohavesimilarcharacteristicvectors.Condition3impliesthattheyhaveanisomorphiccorecommonpart.Theratiobetweenthesizeofthecorepartandthatofeachfragmentisatleastσ.Observation2Becausethesizesofthosetwofragmentsarelargerthanthatofthecorepart,k≥s≥hσandh≥s≥kσ.Thisimplieskσ≤h≤k/σ.Thatmeansaclonedh-fragmentofak-fragmentmusthaveitssizehintherangek.[l(k),r(k)]wherel(k)=kσandr(k)=Basedonthoseaforementionedobservations,weusethefollowingstrategiesforouraScandetectionalgorithm:Breadth-FirstTraversal.aScandiscoversthecandidatefragmentsforclonesbytraversingtheclonelatticeinthebreadth-?rsttraversalorder,ratherthandepth-?rstorderineScan.ThisallowsaScantoef?cientlyconsidercandidatefragmentswithdifferentsizestoformclonegroups.ByDe?nition7,twoclonedfragmentsmustcontaintwoisomorphicsubgraphs.CheckingsubgraphisomorphismisNP-hard.Therefore,inaScan,twofragmentsareconsid-eredaclonepairiftheysatisfythosethreeconditionsinwhich(1)and(2)meannon-overlappingandsmallvectordistance.Thus,wesacri?ceprecisionforef?ciency.CandidateWindow.BasedonObservation2,weusethefollowingheuristictoincreaseprecisionandperformance.Sinceanyclonedfragmentofak-fragmenthasitssizeofatleastl(k).Therefore,atthelayerkinthelattice,to?ndclonesofk-fragments,aScanconsidersnotonlythek-fragmentcandidatesbutalsotheclonedfragmentsinpre-viouslayers(fromlayerl(k)tok?1).Candidatewindowisthesetofallthoselayers.5.2DesignStrategiesDuetothenatureofsimilarclones,twoclonedfragmentshavetoshareatleastsomeisomorphiccorepart.Those2811234567891011functionaScan(G=(V,E,T))k←1,Lk←E,CG←Lkrepeatk←k+1Ck←Ck∪{fk?1⊕e|fk?1∈Lk?1,e∈L1}fori=l(k)tok?1doCk←Ck∪LiCG←CG∪Group(Ck)Filter(CG)Lk←{alldetectedclonedk?fragments}untilLk=?returnCGgroupinggivestheresultingclonegroupsatthelevelk.SincethenewgroupsmightcoversomedetectedgroupsinCG,a?lteringprocessisrequired(Section5.4)toremovetheredundantgroups(line8).Atlast,alldetectedclonedk-fragmentsareaddedtoLk.IfLkisnotempty,thepro-cesscontinuesandLkisusedtogeneratethecandidatesofsize(k+1)inthenextiteration.Otherwise,sincenofur-therlevelinthelatticecouldbeexplored,aScanstopsandreturnsthe?nalclonegroups.Notethat,asamek-fragmentmightbegeneratedmanytimes.Toavoidtheredundancy,aScanstorescandidatesetCkasahashsetandremovesallduplicatelygeneratedones.Figure5.ApproximateCloneDetection5.4Pruningtechniques.Thenumberofcandidatesforapprox-imateclonescouldbeverylarge.Weapplyapruningtech-niquethatprohibitsun-clonedfragmentsfrombeingusedinfurthergeneratingofcandidates.Ourheuristicisthat“theclonedfragmentsatasmallsizethatcanformclonegroupsarelikelytobesub-fragmentsofclonedfragmentsatsomelargersizes”.FromObservation2,ak-fragmentfkcannotbeacloneofthefragmentsofsizelargerthanr(k).Thus,atstepr(k)+1,iffkisnotacloneofanygeneratedfragments,itcouldberemovedwithoutreducingthecompleteness.However,aScanremovesitfromconsiderationrightatlevelktoprohi

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論