



CADViST: Visualization Tool for BLAST Alignment of Dengue Virus Sequences

Boonyarat Viriyasaksathian, Yodchanan Wongsawat
Department of Biomedical Engineering, Mahidol University
Nakornpathom, Thailand
g5137363@student.mahidol.ac.th and egyws@mahidol.ac.th

Prapat Suriyaphol
Bioinformatics and Data Management for Research Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University
Bangkok, Thailand
sipur@mucc.mahidol.ac.th

Abstract - Developing a search tool that can simultaneously visualize the genomic sequences it retrieves is a challenging problem. In this paper, we propose a software tool called CADViST. The UnitX graphical representation (previously proposed by the authors) is employed as an alternative way to visualize the results obtained from the Basic Local Alignment Search Tool (BLAST). The proposed software helps users and experts interpret the results more easily, especially in Dengue virus sequence analysis, where different serotypes or subtypes need to be distinguished.

Keywords - BLAST, Dengue Virus, Visualization, Bioinformatics.

I. INTRODUCTION

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence similarity search because of its speed and reasonable accuracy. However, BLAST still lacks a user-friendly graphical representation of its results. Hence, in this paper, we aim to develop a visualization tool that is capable of displaying the text output produced by BLAST.

There are many existing tools for visualizing and analyzing genomic sequences. Each tool is developed for specific tasks, and the tools can be categorized into four approaches, i.e., the base vector, sequential, Fourier transform (FT), and Z-curve approaches.

(1) Base vector approach: Hamori and Ruskin (1983) represented DNA sequences as a three-dimensional curve (H-curve) [1]. Gates (1985) proposed that a graphical representation of a DNA sequence in two-dimensional space was better than the H-curve. Gates' representation uses the four nucleotide bases, i.e., adenine (A), thymine (T), cytosine (C), and guanine (G), whose unit vectors lie on the axes of the Cartesian coordinate system: base A is on the negative y-axis, base T is on the positive y-axis, base G is on the positive x-axis, and base C is on the negative x-axis [2]. About eleven years later, Nandy (1996) proposed a graphical representation to distinguish the features of intron and exon segments of eukaryotic sequences [3]. This representation was similar to Gates' method: the A, G, C, and T nucleotides were plotted on an ACGT-axis system, and the slope of the plot indicated clusters of intron and exon sequences. However, both Nandy's and Gates' methods have high degeneracy, so that sequences such as AGTC, AGTCA, and AGTCAG lead to the same graphical representation [4]. Yau et al. (2003) modified Gates' method by classifying the four nucleotides into a pyrimidine/purine graph on two quadrants of the Cartesian coordinate system: the first quadrant represents the pyrimidines (T and C), and the fourth quadrant represents the purines (A and G) [4]. Recently, the authors proposed a graphical representation especially for Dengue virus sequence analysis, called UnitX, based on the cumulative amounts of amino and keto bases [5].

(2) Sequential approach: Altschul et al. (1990) developed the Basic Local Alignment Search Tool (BLAST) program, one of the most popular tools for genomic sequence analysis. This tool performs a fast similarity search: it compares any two sequences and displays the differences between them on a base-by-base basis [6].

(3) Fourier transform (FT) approach: Anastassiou proposed color spectrograms of biomolecular sequences as a tool for visualizing biomolecular sequence analysis [7], [8]. A spectrogram, which represents the magnitude of the short-time Fourier transform (STFT), is implemented via the discrete Fourier transform (DFT). Analysis of a genomic sequence in the frequency domain via the FT exploits the period-3 property of DNA coding sequences. The color spectrogram is defined using three colors: red, green, and blue. Even though this method yields an impressive graphical representation, its computational complexity is fairly high.

(4) Z-curve approach: Zhang et al. (1994) suggested a practical visualization tool called the Z-curve [8]-[12], and James J. et al. implemented it in a package called MBEToolbox [13]. Because the Z-curve is built from the cumulative components of the genomic sequence, the features it provides can be interpreted quickly, such as the distributions along the sequence of purine/pyrimidine bases, amino/keto bases, and strong/weak hydrogen-bond bases. Since the Z-curve algorithm is simple, it can be applied to genomic sequences of any length. A similar approach, called the 3D D-curve, was presented by Zhang and Tan (2008); it can be viewed as a weighted version of the Z-curve [14].

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

The appropriate graphical representation depends on the characteristics of the genomic sequences of interest. Therefore, in this first version of the proposed software, Dengue virus nucleotide sequences are used to verify its merit. The software is called CADViST, which stands for Classification and Analysis of Dengue Virus Serotype by Visualization Tool. Because it employs UnitX as its visualization tool, the proposed software is well suited to interpreting Dengue virus sequences. However, positioning partial Dengue sequences on the Dengue genome with the UnitX representation alone requires a high computational load. BLAST, on the other hand, is well known as an efficient search tool, but the visualization of its results needs improvement. Therefore, in this paper, we propose software that combines the merits of both BLAST and UnitX: it efficiently locates unknown portions of Dengue virus sequences and simultaneously illustrates graphical representations of the resulting sequences.

This paper is organized as follows. Section II introduces the proposed visualization tool, CADViST. The software architecture of CADViST is described in Section III. Section IV presents the simulation results of the proposed software. Finally, Section V concludes the paper.

II. CADVIST: THE PROPOSED VISUALIZATION TOOL

Classification and Analysis of Dengue Virus Serotype by Visualization Tool, or CADViST, is a visualization tool proposed especially for analyzing Dengue virus sequences. Its components are described in detail as follows.

A. Basic Local Alignment Search Tool (BLAST)

The BLAST program was developed by Stephen F. Altschul and his coworkers at the National Center for Biotechnology Information (NCBI). It is widely used for calculating sequence similarity. BLAST uses a heuristic algorithm to find the best possible results: it finds homologous sequences by locating short matches between two sequences, which makes the search fast. To measure similarity, BLAST scores every possible pair of residues with a scoring matrix and uses statistical theory to produce an expect value (E-value) for each alignment pair. The stand-alone BLAST programs are provided as a compressed package, available as archives for a variety of computer platforms on the BLAST FTP site: /blast/executables/release/. In this paper, we employed stand-alone BLAST version 2.2.22 to generate the BLAST output used as the input of the proposed software (CADViST).

B. UnitX Graphical Representation

The UnitX graphical representation efficiently reveals the distribution of amino/keto bases along the sequence on two quadrants of the Cartesian coordinate system. The first quadrant represents the amount of amino bases (C and A), while the fourth quadrant represents the amount of keto bases (T and G). The unit vectors representing the four nucleotides, i.e., adenine (A), guanine (G), thymine (T), and cytosine (C), are shown in Fig. 1.

Figure 1. The UnitX vectors representing the four nucleotides A, G, C, and T.

By counting the occurrences of bases A, C, G, and T in the sequence, the coordinates (x, y) of the projection onto the X and Y axes with the UnitX representation can be computed as follows:

[Equation for (x, y) not recoverable from the extracted text]

C. Idea of CADViST

In this paper, we employ BLAST in “stand-alone” mode to find the similarity scores between the query sequence and the Dengue virus nucleotide database. The search results obtained from BLAST are then displayed graphically via the UnitX representation.

D. Creating a Nucleotide BLAST Database

The main advantage of the stand-alone BLAST program is the ability to create your own database. To create a nucleotide BLAST database, we need a source file of sequences in FASTA format. This file is processed by the formatdb program included in the stand-alone BLAST package to build the index files of the database. Executing the formatdb command produces three files from the source FASTA file; for nucleotide databases, their extensions are .nhr, .nin, and .nsq [15]. The formatdb command is:

formatdb -p F -i DatabaseName.fasta

The source FASTA file has the form:

>First sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>Second sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>Last sequence description
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

where the Xs are nucleotide codes (A, T, G, or C). In this paper, the database of the proposed software was obtained from NCBI with the keyword “Dengue virus complete genome”. Its 2,184 nucleotide sequences cover the four serotypes of Dengue virus (952, 737, 405, and 90 nucleotide sequences, respectively).

E. Stand-alone Executable BLAST

The stand-alone executable BLAST and the NCBI web-based BLAST programs let users perform BLAST searches via the command line or a website, respectively. Running BLAST on your own machine has many advantages; e.g., the database can be easily edited. In this paper, we employ the stand-alone BLAST program to generate the BLAST output. A BLAST search can be executed via the blastall command as follows:

blastall -p blastn -d DatabaseName.fasta -i QuerySequence.fasta -m 9 -F F > Result.txt

F. Graphical Representation via UnitX

Instead of displaying the search results as text alignments, as BLAST does (Figs. 4(b) and 4(c)), CADViST extracts the information from BLAST and represents the results graphically via the UnitX representation described in Section II.B. Furthermore, users who only need to explore the nature of Dengue virus sequences can use the graphical (UnitX) feature of CADViST on its own.

III. SOFTWARE ARCHITECTURE OF CADVIST

To provide a user-friendly GUI, the proposed CADViST software is written in C#. The GUI of CADViST is shown in Fig. 3. The query sequence can be input either as (1) a text file in FASTA format or (2) text copied and pasted directly into the blank field shown in Fig. 3. Once the input is inserted, the process inside CADViST can be summarized as follows (Fig. 2):

Step 1: Call the stand-alone BLAST program to generate the BLAST output.
Step 2: Extract the sequence accession number and the coordinates of each matched sequence from the BLAST output.
Step 3: Provide the matching regions between the query and the matched sequences identified by the BLAST program, and send the results to the display
unit, i.e., the UnitX representation. The results are shown in Figs. 4(d)-(e).

In addition, CADViST offers options to copy, save, and print the graph and to show point values on the UnitX graph. These options can be selected by right-clicking on the graph.

IV. SIMULATION RESULTS

As an example, we verify the merit of CADViST by finding the similarities between FN429899, a Dengue virus serotype 3 sequence (base positions 2040-7143), and our Dengue virus sequence database. Traditionally, the results obtained from the stand-alone BLAST program consist of two major parts, i.e., (1) one-line descriptions of each database sequence found to match the query sequence (Fig. 4(a)), and (2) the alignments between the query sequence and the matched database sequences (Figs. 4(b)-(c)) [16]. Figs. 4(b) and (c) show the first and second highest-scoring matched sequences, respectively.

Using the information obtained from BLAST, Figs. 4(d)-(e) show the proposed graphical representation via CADViST. The output of the proposed software consists of two main parts, each displaying a graphical representation via UnitX. The first part shows the whole genome of the query sequence (Fig. 4(d)). The second part displays the matched regions between the query and the matched sequences identified by BLAST (Fig. 4(e)). For convenience, Fig. 4(e) shows only the first (FN429899) and second (AY858038) highest-scoring matched sequences; both belong to the same serotype as our query sequence. As expected, the highest-scoring matched sequence is the one from which our query sequence was copied.

Furthermore, Figs. 4(a)-(c) show that the output of BLAST still lacks a user-friendly graphical representation. CADViST therefore offers an alternative way to visualize the resulting sequences obtained from BLAST, as shown in Figs. 4(d)-(e). In Fig. 4(e), the regions of mismatched base pairs can be clearly observed. The results of the proposed software can also be displayed as an overlay on the UnitX representation of the sequences (Fig. 4(d)).

Figure 2. Flowchart of the proposed software
Figure 3. Screenshots of the proposed software

V. CONCLUSIONS

In this paper, we have developed software called CADViST. The proposed software can be used to visually analyze the matched regions identified by
BLAST between the query sequences and the Dengue virus database. The graphical representation is implemented via UnitX, which is especially suitable for analyzing different serotypes of Dengue virus nucleotide sequences. Options in CADViST such as save, print, and showing the raw numeric values on the graph can also benefit bioinformatics experts. Because CADViST is built on the C# framework, it can easily be extended to include more open-source or in-house mathematical models while maintaining its user-friendly GUI.

REFERENCES

[1] E. Hamori and J. Ruskin, “H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences,” The Journal of Biological Chemistry, vol. 258(2), 1983, pp. 1318-1327.
[2] M. A. Gates, “Simple DNA sequence representations,” Nature, vol. 316, 1985.
[3] A. Nandy, “Two-dimensional graphical representation of DNA sequences and intron-exon discrimination in intron-rich sequences,” Bioinformatics, vol. 12(1), 1996, pp. 55-62.
[4] S.-T. Yau, J. Wang, A. Niknejad, C. Lu, N. Jin and Y.-K. Ho, “DNA sequence representation without degeneracy,” Nucleic Acids Research, vol. 31(12), 2003, pp. 3078-3080.
[5] B. Viriyasaksathian, Y. Wongsawat and P. Suriyaphol, “UnitX: Dengue virus sequence graphical representation for serotypes classification,” ISBME 2009, Bangkok, Thailand.
[6] S. F. Altschul, W. Miller, E. Myers and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215(3), 1990, pp. 403-410.
[7] J. Ye, S. McGinnis and T. L. M
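The cumulative UnitX idea in Section II.B (each base adds a unit vector, amino bases pointing into the first quadrant and keto bases into the fourth) can be sketched as a 2-D walk. Fig. 1 and the (x, y) equation are not reproduced in this text, so the 30° and 60° angles below are a hypothetical assumption chosen only to satisfy the stated quadrant constraint; this is an illustrative Python sketch, not the published UnitX vectors or the authors' C# code.

```python
import math
from typing import List, Tuple

# HYPOTHETICAL vectors: only the paper's stated constraint is honored
# (amino A, C in the first quadrant; keto G, T in the fourth quadrant).
# The 30/60 degree angles are an illustrative assumption, not published UnitX.
S3 = math.sqrt(3) / 2
UNIT_VECTORS = {
    "A": (0.5, S3),   # amino, first quadrant, assumed 60 degrees
    "C": (S3, 0.5),   # amino, first quadrant, assumed 30 degrees
    "G": (S3, -0.5),  # keto, fourth quadrant, assumed -30 degrees
    "T": (0.5, -S3),  # keto, fourth quadrant, assumed -60 degrees
}


def cumulative_walk(sequence: str) -> List[Tuple[float, float]]:
    """Return the cumulative (x, y) point after each base, starting at the origin.

    Symbols other than A, C, G, T (e.g. N) are skipped without taking a step.
    """
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in sequence.upper():
        step = UNIT_VECTORS.get(base)
        if step is None:
            continue  # ambiguous symbol: no step
        x += step[0]
        y += step[1]
        points.append((x, y))
    return points
```

For example, `cumulative_walk("AT")[-1]` is `(1.0, 0.0)`: an amino step followed by its mirrored keto step cancels in y. Because every assumed vector has a positive x component, x grows with sequence position, while y drifts up in amino-rich stretches and down in keto-rich ones.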
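Section II.D builds the database from a multi-record FASTA file with “>” description lines. The Python sketch below reads such a file and tallies records per serotype from their descriptions. The header wording it matches (“Dengue virus 3 ...” or “Dengue virus type 1 ...”) is an assumption about how the NCBI titles are phrased, and the `DENV-n` labels are hypothetical names introduced for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterator, Tuple

# ASSUMPTION: descriptions read like "Dengue virus 3 ..." or "Dengue virus type 1 ...".
SEROTYPE_RE = re.compile(r"dengue virus(?:\s+type)?\s*([1-4])\b", re.IGNORECASE)


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs; sequence lines before any '>' are ignored."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def serotype_counts(text: str) -> Dict[str, int]:
    """Count records per serotype label; unmatched descriptions go to 'unknown'."""
    counts = Counter()
    for header, _seq in read_fasta(text):
        m = SEROTYPE_RE.search(header)
        counts[f"DENV-{m.group(1)}" if m else "unknown"] += 1
    return dict(counts)
```

If the headers follow the assumed wording, a simple consistency check on the database described in Section II.D is that the four tallies sum to 2,184 (952 + 737 + 405 + 90).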
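Step 1 of Section III calls the stand-alone BLAST program with the commands given in Sections II.D and II.E. A minimal Python sketch of that step follows, assuming the legacy BLAST 2.2.x `formatdb` and `blastall` executables are on the PATH. The helper names are hypothetical, and the pasted-text handling (dropping any “>” line and whitespace, wrapping at 60 bases) is an illustrative choice rather than CADViST's behavior.

```python
import subprocess
from typing import List


def query_fasta_text(sequence: str, description: str = "query", width: int = 60) -> str:
    """Turn pasted sequence text into one FASTA record (any '>' lines are dropped)."""
    body = [ln for ln in sequence.splitlines() if not ln.lstrip().startswith(">")]
    seq = "".join("".join(ln.split()) for ln in body).upper()
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "".join(f"{line}\n" for line in [f">{description}"] + wrapped)


def write_query_fasta(sequence: str, path: str) -> None:
    with open(path, "w") as fh:
        fh.write(query_fasta_text(sequence))


def formatdb_args(fasta_path: str) -> List[str]:
    # Section II.D: -p F marks the input as nucleotide; -i names the source FASTA.
    return ["formatdb", "-p", "F", "-i", fasta_path]


def blastall_args(db_path: str, query_path: str) -> List[str]:
    # Section II.E: blastn, tabular output with comment lines (-m 9), filtering off (-F F).
    return ["blastall", "-p", "blastn", "-d", db_path, "-i", query_path,
            "-m", "9", "-F", "F"]


def run_blast(db_path: str, query_path: str, result_path: str) -> None:
    """Run blastall and send its stdout to result_path, like '> Result.txt'."""
    with open(result_path, "w") as out:
        subprocess.run(blastall_args(db_path, query_path), stdout=out, check=True)
```

A typical sequence would be to run `subprocess.run(formatdb_args("DatabaseName.fasta"), check=True)` once, then `write_query_fasta(pasted_text, "QuerySequence.fasta")` and `run_blast("DatabaseName.fasta", "QuerySequence.fasta", "Result.txt")` for each query. Passing argument lists rather than a shell string avoids quoting problems with file names.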
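Step 2 of Section III extracts the accession number and match coordinates from the BLAST output. With `-m 9`, legacy `blastall` writes “#” comment lines followed by 12 tab-separated columns (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score). The Python sketch below parses those rows and ranks accessions by best bit score, as in the two top hits of Fig. 4(e). The `accession_of` rule is hypothetical: it assumes the accession is the last “|”-separated field of the subject id, which depends on how the database FASTA headers were written.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject_id: str
    accession: str
    identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def accession_of(subject_id: str) -> str:
    """HYPOTHETICAL rule: last non-empty '|' field with any '.version' suffix dropped."""
    fields = [f for f in subject_id.split("|") if f]
    return fields[-1].split(".")[0] if fields else subject_id


def parse_m9(text: str) -> List[Hit]:
    """Parse blastall -m 9 output: skip '#' comments, read 12 tab-separated columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a hit row
        hits.append(Hit(
            subject_id=cols[1],
            accession=accession_of(cols[1]),
            identity=float(cols[2]),
            q_start=int(cols[6]), q_end=int(cols[7]),
            s_start=int(cols[8]), s_end=int(cols[9]),
            evalue=float(cols[10]), bit_score=float(cols[11]),
        ))
    return hits


def top_accessions(hits: List[Hit], n: int = 2) -> List[str]:
    """Accessions ranked by their best bit score (highest first)."""
    best = {}
    for h in hits:
        best[h.accession] = max(best.get(h.accession, float("-inf")), h.bit_score)
    return sorted(best, key=best.get, reverse=True)[:n]
```

The q. start/q. end and s. start/s. end fields of each `Hit` are what Step 3 needs to cut out the matched regions before plotting them.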