MEGA procedures: multiple sequence alignment, phylogenetic trees

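Basic Bioinformatics and Applications
Wang Xingping

Contents: multiple sequence alignment; molecular evolution analysis and phylogenetic tree construction; prediction and identification of nucleic acid sequences; restriction map construction; primer design.

Multiple Sequence Alignment

Contents: multiple sequence alignment; multiple sequence alignment programs and applications.

Section 1: Multiple sequence alignment

Topics: concept; significance of multiple sequence alignment; scoring functions for multiple sequence alignment; methods of multiple sequence alignment.

1. Concept

Multiple sequence alignment (MSA): align multiple related sequences to achieve optimal matching of the sequences. For ease of description, the process can be defined as follows. Treat the alignment as a two-dimensional table in which each row represents one sequence and each column one residue position. Sequences are entered into the table according to two rules:
(a) the relative order of all residues within each sequence stays unchanged;
(b) identical or similar residues from different sequences are placed in the same column, i.e., identical or similar residues are vertically aligned as far as possible (see Table 1).

Table 1. Definition of multiple sequence alignment: the alignment of five short sequences (I-V). By inserting gaps, most identical or similar residues of the five sequences are placed in the same columns, while the residue order within each sequence is preserved.

2. Significance of multiple sequence alignment
- Describing similarity relationships within a group of sequences, to understand the basic features of a molecular family and to find motifs, conserved regions, etc.
- Describing how closely related a group of homologous sequences are, for use in molecular evolution analysis.
- Sequence homology analysis: the sequence under study is added to a group of homologous sequences from different species and all are compared simultaneously, to determine the degree of homology between it and the other sequences.
- Other applications, such as building profiles and scoring matrices.

3. Methods of multiple sequence alignment

Manual alignment: starting from the output of tested, reasonably reliable programs (with editing tools such as BioEdit, SeaView and GeneDoc), it is quite necessary to refine the alignment by hand, drawing on experimental results or the literature. To support interactive manual alignment, residues with different properties are usually shown in different colors, which helps in judging the similarity between sequences.

Automatic alignment by computer program: a program searches automatically for the best multiple alignment using a specific algorithm (e.g., exhaustive or heuristic algorithms).

Exhaustive alignment method: extends the two-dimensional dynamic programming matrix used in pairwise alignment to a multidimensional matrix, with one dimension per sequence. The computation is very heavy and demands substantial computing resources, so the method is generally used only to align a few short sequences.
DCA (Divide-and-Conquer Alignment): a web-based program that is semiexhaustive. http://bibiserv.techfak.uni-bielefeld.de/dca/

Heuristic algorithms: most practical MSA programs use heuristic algorithms to reduce computational complexity, which grows as more sequences are added. The complexity of aligning n sequences is written $O(m_1 m_2 m_3 \cdots m_n)$, the number of cells in the n-dimensional dynamic programming matrix, where $m_i$ is the length of the i-th sequence. If the sequences are of similar length m, this simplifies to $O(m^n)$, where n is the number of sequences. Clearly, the complexity of alignment grows exponentially with the number of sequences.

Section 2: Multiple sequence alignment programs and applications

Topics: progressive alignment method; iterative alignment; block-based alignment; DNASTAR; DNAMAN.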
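The exhaustive method's extension of pairwise dynamic programming to more dimensions can be made concrete for three sequences. The Python sketch below fills a three-dimensional matrix using a sum-of-pairs column score; the match, mismatch and gap values are arbitrary assumptions (not the defaults of DCA or any other program), and linear gap penalties are used for simplicity.

```python
from itertools import product

# Hypothetical scores chosen for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of symbols within a column; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally scored 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_column(col) -> int:
    """Sum-of-pairs score of one alignment column."""
    return sum(pair_score(col[i], col[j])
               for i in range(len(col)) for j in range(i + 1, len(col)))


def align3(s1: str, s2: str, s3: str):
    """Exhaustive 3-D dynamic programming; returns (score, aligned rows)."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    NEG = float("-inf")
    # score[i][j][k] = best SP score for aligning s1[:i], s2[:j], s3[:k]
    score = [[[NEG] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    score[0][0][0] = 0
    back = {}
    # 2**3 - 1 = 7 predecessor directions: each sequence either advances or gets a gap
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if i == j == k == 0:
                    continue
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue
                    col = (s1[pi] if di else "-",
                           s2[pj] if dj else "-",
                           s3[pk] if dk else "-")
                    cand = score[pi][pj][pk] + sp_column(col)
                    if cand > score[i][j][k]:
                        score[i][j][k] = cand
                        back[(i, j, k)] = (di, dj, dk, col)
    # Trace back from the far corner to rebuild the three aligned rows.
    rows = ["", "", ""]
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):
        di, dj, dk, col = back[(i, j, k)]
        rows = [col[r] + rows[r] for r in range(3)]
        i, j, k = i - di, j - dj, k - dk
    return score[n1][n2][n3], rows
```

Each cell has $2^3 - 1 = 7$ possible predecessors, so for three sequences of length m the work is roughly $7m^3$ column evaluations, and roughly $(2^n - 1)m^n$ in general, which is the exponential growth described above.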

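1. Progressive Alignment Method

Clustal: Clustal implements the progressive alignment approach introduced by Feng and Doolittle in 1987. There are many versions of the Clustal program. ClustalW (Thompson et al., 1994) is among the most widely used multiple sequence alignment programs, and ClustalX is its graphical-interface version. As part of its output, Clustal can produce data for building phylogenetic trees.

The ClustalW program is freely available; software packages can be downloaded from the NCBI/EBI FTP servers. ClustalW guides the user step by step through option menus, allowing the user to choose a scoring matrix, set gap penalties, and so on. ftp://ftp.ebi.ac.uk/pub/software/

The EBI website also offers a web-based ClustalW service: users submit sequences and parameters through a form, and the server returns the results by e-mail (or interactively online). http://www.ebi.ac.uk/clustalw/

ClustalW is flexible about input formats: besides FASTA, it accepts PIR, SWISS-PROT, GDE, Clustal, GCG/MSF, RSF and other formats. The output format can also be chosen (ALN, GCG, PHYLIP, GDE, etc.) to suit the user's needs.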
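Because FASTA is the most common of these input formats, a small reader is handy when preparing sequences for ClustalW or any other aligner. This is a minimal sketch that takes the first word of each ">" header line as the identifier (an assumption; real headers vary) and concatenates the sequence lines beneath it.

```python
def read_fasta(text: str) -> dict:
    """Parse multi-FASTA text into {identifier: sequence}."""
    records = {}
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""  # identifier = first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line.upper())  # sequence line; case normalized to upper
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For example, `read_fasta(">seq1 toy\nACGT\nacg\n>seq2\nTTTT\n")` yields two records, with the wrapped sequence lines of `seq1` joined into one string.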

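In a multiple alignment produced by ClustalW, all sequences are displayed together, and symbols under each column indicate how conserved the residues at that position are: "*" marks fully conserved (identical) positions and "." marks positions with weaker conservation. ClustalW also uses ":" for positions whose residues are strongly similar.

Using ClustalW: go to http://www.ebi.ac.uk/clustalw/ and set the options.

Some options explained. PHYLOGENETIC TREE has three options:
- TREE TYPE: the algorithm used to build the phylogenetic tree, with four choices: none, nj (neighbour joining), phylip, dist.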
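The conservation line can be reproduced for a protein alignment. This sketch assumes the "strong" and "weak" residue groups listed in the Clustal documentation and, for simplicity, leaves any column that contains a gap unmarked; ClustalW's exact rules may differ in edge cases.

```python
# Residue groups as listed in the Clustal documentation (assumed here).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Conservation symbol for one column of residues."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # this sketch leaves gapped columns unmarked
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."  # all residues fall in one weak group
    return " "


def conservation_line(alignment) -> str:
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*alignment))
```

For the three toy rows `ACSW`, `ACTW` and `AC-F`, the line is `** :`: two identical columns, one gapped column, and a W/F column that falls in the strong group FYW.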

- CORRECT DIST: whether to apply a distance correction. For small sequence divergence (<10%) the choice makes no difference; for greater divergence a correction is needed, because observed distances underestimate actual evolutionary distances.
- IGNORE GAPS: when on, gap positions in the sequences are ignored.
For details, see http://www.ebi.ac.uk/clustalw/clustalw_frame.html

Using ClustalW: enter five 16S rRNA gene sequences (AF310602, AF308147, AF283499, AF012090, AF447394) and click "RUN".

T-Coffee (Tree-based Consistency Objective Function for alignment Evaluation): a progressive alignment method. /software/TCoffee.html
When processing a query, T-Coffee performs both global and local pairwise alignments for all possible pairs of the sequences involved. A distance matrix is built from these to derive a guide tree, which then directs a full multiple alignment using the progressive approach. T-Coffee outperforms Clustal when aligning moderately divergent sequences, but it is slower than Clustal.

PRALINE: web-based, http://ibivu.cs.vu.nl/programs/pralinewww/
PRALINE first builds a profile for each sequence by PSI-BLAST database searching; each profile is then used for multiple alignment by the progressive approach. Rather than using a guide tree, it chooses the closest neighbor to be joined to a larger alignment by comparing profile scores. It incorporates protein secondary structure information to modify the profile scores. It is perhaps the most sophisticated and accurate alignment program available, but its computation is extremely slow.

DbClustal: http://igbmc.u-strasbg.fr:8080/DbClustal/dbclustal.html

Poa (Partial order alignments): /poa/
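Both ClustalW's nj tree option and the guide tree that T-Coffee derives from its distance matrix rest on building a tree from pairwise distances. Below is a plain neighbor-joining sketch (not ClustalW's or T-Coffee's code) that returns an unrooted tree in Newick format; the five-taxon matrix in the usage example is a small textbook-style illustration, not data from this document.

```python
def neighbor_joining(labels, dist):
    """Neighbor-joining on a symmetric distance matrix; returns a Newick string."""
    nodes = list(labels)
    # d[x][y] holds the current distance between (possibly merged) nodes x and y
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"  # the Newick subtree doubles as the node name
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                duc = (d[a][c] + d[b][c] - d[a][b]) / 2
                d[u][c] = duc
                d[c][u] = duc
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    # Join the last two nodes by a single edge (unrooted tree)
    return f"({a}:{d[a][b]:g},{b}:0);"


labels = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, D))
# ((c:4,(a:2,b:3):3):2,(d:2,e:1):0);
```

Ties in Q are broken by taking the first pair encountered; different tie-breaking can give a different but equally valid join order.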

2. Iterative Alignment

PRRN: web-based program, http://prrn.ims.u-tokyo.ac.jp/
Uses a doubly nested iterative strategy for multiple alignment, based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions.

3. Block-Based Alignment

DIALIGN2: web-based program, http://bioweb.pasteur.fr/seqanal/interfaces/dialign2.html
It emphasizes block-to-block rather than residue-to-residue comparison; the sequence regions between blocks are left unaligned. The program has been shown to be especially suitable for aligning divergent sequences that share only local similarity.

Match-Box: web-based server, http://www.fundp.ac.be/sciences/biologie/bms/matchbox_submit.shtml
Aims to identify conserved blocks (or boxes) among sequences. The server requires the user to submit a set of sequences in FASTA format, and the results are returned by e-mail.

4. Software: DNASTAR, DNAMAN
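The idea of a conserved block can be illustrated on an alignment that already exists. This toy sketch is not the DIALIGN2 or Match-Box algorithm (both score similar segments rather than demanding identity); it simply reports maximal runs of gap-free, fully identical columns, with `min_len` as an illustrative parameter.

```python
def conserved_blocks(alignment, min_len=3):
    """Half-open (start, end) column ranges of maximal runs of gap-free,
    fully identical columns that are at least min_len long."""
    blocks, start = [], None
    width = len(alignment[0])
    for pos, col in enumerate(zip(*alignment)):
        conserved = "-" not in col and len(set(col)) == 1
        if conserved and start is None:
            start = pos  # a new run begins
        elif not conserved and start is not None:
            if pos - start >= min_len:
                blocks.append((start, pos))
            start = None  # the run is broken
    if start is not None and width - start >= min_len:
        blocks.append((start, width))  # run extends to the last column
    return blocks
```

Loosening the identity test (for example, allowing one mismatch per column) would move this closer to the similarity-based blocks these programs report.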

Molecular Evolution Analysis and Phylogenetic Tree Construction

Contents of this chapter: introduction to molecular evolution analysis; methods for building phylogenetic trees; an example of phylogenetic tree construction.

Section 1: Introduction to molecular evolution analysis

Basic concepts:
- Phylogeny: the history of the origin and evolution of organisms.
- Phylogenetics: the study of evolutionary relationships among species.
- Phylogenetic tree: a representation that describes the evolutionary relationships among species.

Goal of molecular evolution research: starting from molecular characteristics of species, chiefly protein and nucleic acid sequences, to understand the phylogenetic relationships among species. Comparing sequence homology reveals how genes have evolved and the underlying patterns of phylogeny.

Basis of molecular evolution research:
- Basic theory: across different lineages and over sufficiently long evolutionary time scales, many sequences evolve at a nearly constant rate (the molecular clock theory, 1965).
- In practice: although this is often still debated, molecular evolution does reveal some underlying patterns of phylogeny.

Orthologs and paralogs:
- Orthologs: homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not have a similar function.
- Paralogs: homologous sequences within a single species that arose by gene duplication.
The two concepts correspond to two different evolutionary events. Sequences used in molecular evolution analysis must be orthologous in order to truly reflect the evolutionary process.

Phylogenetic trees (also called evolutionary trees) have become an interdisciplinary frontier field. It draws on evolutionary theory, genetics, taxonomy, molecular biology, biochemistry, biophysics and ecology from the life sciences, and on probability and statistics, graph theory, computer science and group theory from mathematics. In 1987 the internationally renowned Cold Spring Harbor Symposium on Quantitative Biology devoted a special session to evolutionary trees, marking the field as one of the frontiers of modern biology; it remains very active.

Phylogenetic tree structure: The lines in the tree are called branches. At the tips of the branches are present-day species or sequences, known as taxa (singular: taxon) or operational taxonomic units (OTUs). The connecting point where two adjacent branches join is called a node; it represents an inferred ancestor of the extant taxa. The bifurcating point at the very bottom of the tree is the root node, which represents the common ancestor of all members of the tree. A group of taxa descended from a single common ancestor is defined as a clade, or monophyletic group. The branching pattern of a tree is called its topology.

Rooted and unrooted trees: the root represents the common ancestor of a group of taxa.

How to determine the root:
- By outgroup: use an outgroup, a sequence that is homologous to the sequences under consideration but separated from them at an early evolutionary time.
- By midpoint: in the absence of a good outgroup, a tree can be rooted by the midpoint rooting approach, in which the midpoint between the two most divergent groups, judged by overall branch lengths, is assigned as the root.
[Figure: a tree rooted by outgroup]

Tree forms:
- Phylograms: show both the branching pattern and branch lengths.
- Cladograms: show only the branching pattern, without branch-length information.

Section 2: Methods for building phylogenetic trees

Molecular phylogenetic tree construction can be divided into five steps: (1) choosing molecular markers; (2) performing multiple sequence alignment; (3) choosing a model of evolution; (4) determining a tree-building method; (5) assessing tree reliability.

Section 3: An example of phylogenetic tree construction

Commonly used phylogenetic analysis software: (1) PHYLIP, (2) PAUP, (3) TREE-PUZZLE, (4) MEGA, (5) PAML, (6) TreeView, (7) VOSTORG, (8) Fitch programs, (9) Phylo_win, (10) ARB, (11) DAMBE, (12) PAL, (13) Bionumerics. For other programs, see /phylip/software.html
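Midpoint rooting places the root halfway along the longest path between two tips. The sketch below works on an unrooted tree given as a hypothetical edge list of (node, node, branch length) and reports the two most divergent tips, the edge that contains the midpoint, and the distance of the root from the first node of that edge; it is an illustration, not MEGA's or PHYLIP's implementation.

```python
from collections import defaultdict


def midpoint_root(edges):
    """edges: (node_a, node_b, length) tuples describing an unrooted tree.
    Returns (tip1, tip2, (x, y), offset): the root lies on edge x-y,
    `offset` away from x, halfway along the longest tip-to-tip path."""
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    leaves = [n for n in adj if len(adj[n]) == 1]

    def paths_from(src):
        # Distance to, and parent of, every node, via iterative DFS from src
        dist, parent, stack = {src: 0.0}, {src: None}, [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    parent[nxt] = node
                    stack.append(nxt)
        return dist, parent

    # The two most divergent tips = the longest leaf-to-leaf path
    best = None
    for u in leaves:
        dist, _ = paths_from(u)
        for v in leaves:
            if best is None or dist[v] > best[0]:
                best = (dist[v], u, v)
    total, u, v = best
    dist, parent = paths_from(u)
    half = total / 2
    # Walk from v toward u until the next node up lies at or before the midpoint
    node = v
    while dist[parent[node]] > half:
        node = parent[node]
    x, y = parent[node], node  # dist[x] <= half < dist[y]
    return u, v, (x, y), half - dist[x]
```

In the test tree, tips A and C are 9 apart, so the root falls 4.5 from A, which is 1.5 along the Y-C branch.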

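MEGA 3 download address

Discrete character data: characters that take two or more discrete values. For example, whether a given position in a DNA sequence is or is not a cleavage site (a binary character), or which of the four bases A, T, G and C occupies a given position (a multistate character).

Similarity and distance data: the relationships among taxa expressed as their pairwise similarities or distances.

Prediction and Identification of Nucleic Acid Sequences

Contents: statistical models of sequence probability information; prediction and identification of nucleic acid sequences.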
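Distance data are usually derived from an alignment. This sketch computes the uncorrected p-distance (the proportion of differing sites, skipping gapped columns) and applies the Jukes-Cantor correction, $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, the simplest model that compensates for observed distances underestimating evolutionary distances. Jukes-Cantor is chosen here only for its simplicity; it is not necessarily the correction ClustalW or MEGA applies by default.

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction undefined: sequences are saturated
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: dict, corrected: bool = True) -> dict:
    """Pairwise distances {(name1, name2): distance} for aligned sequences."""
    names = list(seqs)
    out = {}
    for a in names:
        for b in names:
            p = p_distance(seqs[a], seqs[b])
            out[(a, b)] = jukes_cantor(p) if corrected else p
    return out
```

For one difference in ten sites, p = 0.1 while the corrected distance is about 0.107; the gap widens quickly as p grows, which is why the correction matters for divergent sequences.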

Section 1: Statistical models of sequence probability information

One application of multiple sequence alignments in identifying related sequences in databases is the construction of statistical models: position-specific scoring matrices (PSSMs), profiles, and hidden Markov models (HMMs).

The process of distinguishing "functional" from "non-functional" sequences [flowchart]:
(1) Collect known examples of functional and non-functional sequences (unrelated to one another) and divide them into a training set and a test set (control set).
(2) Build a model for the recognition task and train it on the training set, so that after learning it can process and discriminate sequences correctly.
(3) Check the model on the test set: classify each sequence as "functional" or "non-functional" and compute the recognition accuracy from the results.

Establishing a profile HMM [flowchart]: selection of related sequences, multiple sequence alignment, model construction, model training, parameter adjustment, application. Tools shown: ClustalX, Hmmbuild, Hmmt, Hmmcalibrate.

Hidden Markov Model: applications
- HMMs have more predictive power than profiles. An HMM can differentiate between insertion and deletion states, whereas in profile calculation a single gap penalty score, often determined subjectively, represents either an insertion or a deletion.
- Once an HMM is established from training sequences, it can be used to determine how well an unknown sequence matches the model.
- It can be used to construct multiple alignments of related sequences.
- HMMs can be used in database searching to detect distant sequence homologs.
- HMMs are also used in protein family classification through motif and pattern identification, advanced gene and promoter prediction, transmembrane protein prediction, and protein fold recognition.

Section 2: Prediction and identification of nucleic acid sequences

Contents of this section: concept of nucleic acid sequence prediction; gene prediction; promoter and regulatory element prediction; restriction site analysis and primer design.

1. Concept of nucleic acid sequence prediction

Nucleic acid sequence prediction is the process of using computational methods (computer programs) to locate genes and their expression regulatory elements in genomic sequences and to determine their structures. It includes:
- Gene prediction
- Prediction of gene expression regulatory elements (promoter and regulatory element prediction)
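A position-specific scoring matrix is the simplest of the three statistical models. This sketch builds a log-odds PSSM from a handful of aligned DNA sites and slides it along a sequence; the example sites, the pseudocount of 1 and the uniform background frequencies are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=None):
    """Log2-odds PSSM from equal-length, gap-free aligned sites (uppercase ACGT)."""
    if background is None:
        background = {b: 1 / len(ALPHABET) for b in ALPHABET}  # assumed uniform
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        # Smoothed frequency divided by background, on a log2 scale
        pssm.append({b: math.log2((counts[b] + pseudocount) / total / background[b])
                     for b in ALPHABET})
    return pssm


def score(pssm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pssm, window))


def scan(pssm, seq):
    """(offset, score) for every window of the PSSM's width along seq."""
    width = len(pssm)
    return [(i, score(pssm, seq[i:i + width])) for i in range(len(seq) - width + 1)]


pssm = build_pssm(["TATA", "TATA", "TATT", "TACA"])  # toy sites
best = max(scan(pssm, "GGTATAGG"), key=lambda hit: hit[1])
print(best[0])  # 2, the offset of the TATA window
```

Positive window scores mean the window looks more like the training sites than like background; the pseudocount keeps unseen bases from producing a score of minus infinity.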

Structure of Eukaryotic Genes

AGCATCGAAGTTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGTTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGTTGCATGACGATTGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGAAGTTGCATGACGATGCATGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGTTGCATGACGATTGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGCGATGCATGACCTAGCAAGAAGTTGCATGACGATGCATGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGTTGCATGACGATTGACCTAGTGCATGACTGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGTTGCATGACGATTGACCTAGTGCATGACGATGCATGACCTAGCAGCATCGAAGTTGCATGACGATGCATGACCTAGCAAGAAGTTGCATGACGATGCATGACCTAATGC

2. Gene prediction

Topics: concept and significance of gene prediction; prokaryotic gene identification; difficulties of eukaryotic gene prediction; basis of eukaryotic gene prediction; basic steps and strategies of eukaryotic gene prediction; eukaryotic gene prediction methods and their principles.

2.1 Concept and significance of gene prediction

Concept. Gene prediction: given an uncharacterized DNA sequence, find out:
- Where does the gene start and end? (Detecting the location of open reading frames, ORFs.)
- Which regions code for a protein? (Delineating the structures of introns as well as exons, in eukaryotes.)

Significance. Computational gene finding (gene prediction) is currently one of the most challenging and interesting problems in bioinformatics. It is important because:
- so many genomes are being sequenced so rapidly;
- purely biological methods are time-consuming and costly;
- finding genes in DNA sequences is the foundation for all further investigation (knowledge of the protein-coding regions underpins functional genomics).

2.2 Prokaryotic gene identification

The focus of prokaryotic gene identification is recognizing open reading frames, that is, long coding regions. An open reading frame (ORF) is a stretch of codons uninterrupted by a stop codon, typically running from a start codon to the next in-frame stop codon.

Prokaryotic gene prediction tools: ORF Finder; HMM-based gene finding programs (GeneMark, Glimmer, FGENESB, RBSfinder).
- ORF Finder (Open Reading Frame Finder): /gorf/gorf.html. Example shown: a zinc-binding alcohol dehydrogenase from Francisella novicida.
- GeneMark: trained on a number of complete microbial genomes. /GeneMark/
- Glimmer (Gene Locator and Interpolated Markov Modeler): a UNIX program. /softlab/glimmer/glimmer.html
- FGENESB: a web-based program trained for bacterial sequences.
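The ORF-centred view of prokaryotic gene finding can be sketched as a six-frame scan, the same search space a tool such as ORF Finder examines. This version reports ATG-to-stop ORFs above a minimum length; the 100-codon default and the ATG-only start rule are assumptions, and minus-strand coordinates refer to positions in the reverse-complement sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """ATG...stop ORFs in all six frames.
    Returns (strand, frame, start, end) tuples; end is exclusive and includes
    the stop codon. Minus-strand coordinates index into revcomp(seq)."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close the ORF at the stop codon
    return orfs
```

Because an ORF is opened only by the first in-frame ATG, internal ATGs are treated as part of the longest ORF rather than reported separately.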

- RBSfinder: a UNIX program that predicts start sites. /pub/software/RBSfinder/

Difficulties of eukaryotic gene prediction

Why is gene prediction challenging?
- Coding density: as the ratio of coding to non-coding length decreases, exon prediction becomes more complex.
- Some facts about the human genome: coding regions make up less than 3% of the genome; one gene spans 2,400,000 bp, of which only 14,000 bp are CDS (…

… 0.5 are deemed reliable. This program is trained for sequences from vertebrates, Arabidopsis, and maize. It has been used extensively in annotating the human genome.

Eukaryotic gene prediction methods and their principles

Ab initio-based programs
- GRAIL (Gene Recognition and Assembly Internet Link): a web-based program, /public/tools/, based on a neural network algorithm. The program is trained on several statistical features such as splice junctions, start and stop codons, poly-A sites, promoters, and CpG islands. It scans the query sequence with windows of variable length, scores them for coding potential, and finally outputs a set of exon candidates. It is currently trained for human, mouse, Arabidopsis, Drosophila, and Escherichia coli sequences.
- FGENES (FindGenes): a web-based program that uses linear discriminant analysis (LDA) to determine whether a signal is an exon. Besides FGENES there are many variants: FGENESH uses HMMs; FGENESH C is similarity-based; FGENESH+ combines ab initio and similarity-based approaches.
- MZEF (Michael Zhang's Exon Finder): web-based, /genefinder/. Uses quadratic discriminant analysis (QDA) for exon prediction. Its usefulness in actual gene prediction has not been obvious.
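HMM-based gene finders such as FGENESH, HMMgene and GeneMark decode a sequence into the most probable series of hidden states. The toy sketch below uses just two hypothetical states, non-coding (N) and coding (C), with made-up probabilities in which coding DNA is slightly GC-rich; real gene finders use many more states (exons, introns, splice sites) and trained parameters. Viterbi decoding is done in log space to avoid numerical underflow.

```python
import math

# Hypothetical two-state model: N = non-coding, C = coding. Parameters are invented.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {"N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
        "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3}}


def viterbi(seq: str) -> str:
    """Most probable state path for seq (uppercase ACGT), as a string of labels."""
    log = math.log
    # v[t][s]: best log-probability of any path ending in state s at position t
    v = [{s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + log(TRANS[p][s]))
            row[s] = v[-1][prev] + log(TRANS[prev][s]) + log(EMIT[s][ch])
            ptr[s] = prev
        v.append(row)
        back.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

For an AT-rich flank, a 20-base GC-rich stretch and another AT-rich flank, the decoded path switches to C exactly over the GC stretch, because the gain in emission probability outweighs the two low-probability state switches.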

- HMMgene: web-based, www.cbs.dtu.dk/services/HMMgene. An HMM-based program whose unique feature is a criterion called conditional maximum likelihood for discriminating coding from noncoding features. If a sequence already has a subregion identified as coding, based for example on similarity with cDNAs or proteins in a database, that region is locked as coding. An HMM prediction is then made with a bias toward the locked region and extended from it to predict the rest of the gene's coding regions and even neighboring genes. The program is thus in a way a hybrid algorithm that uses both ab initio and homology-based criteria.

Homology-based programs

Homology-based programs rely on the fact that the exon structures and exon sequences of related species are highly conserved. When potential coding frames in a query sequence are translated and aligned with the closest protein homologs found in databases, near-perfectly matched regions reveal the exon boundaries in the query. This approach assumes that the database sequences are correct, a reasonable assumption given that many of the homologous sequences used for comparison are derived from cDNAs or expressed sequence tags (ESTs) of the same species.
- Advantage: with the support of experimental evidence, this method is rather efficient at finding genes in unknown genomic DNA.
- Disadvantage: it relies on the presence of homologs in databases. If no homologs are available, the method cannot be used; novel genes in a new species cannot be discovered without database matches.

- GenomeScan: web-based server, /genomescan.html. Combines GENSCAN prediction results with BLASTX similarity searches. The user provides genomic DNA and protein sequences from related species. The genomic DNA is translated in all six frames to cover all possible exons, and the translated exons are compared with the user-supplied protein sequences; translated genomic regions with high similarity at the protein level receive higher scores. The same sequence is also analyzed with the GENSCAN algorithm, which assigns exon probability scores. Final exons are assigned based on the combined scores from both analyses.
- EST2Genome: web-based program, http://bioweb.pasteur.fr/seqanal/interfaces/est2genome.html. Defines intron-exon boundaries purely by sequence alignment: it compares an EST (or cDNA) sequence with a genomic DNA sequence containing the corresponding gene, using a dynamic programming-based algorithm.
- TwinScan: a similarity-based gene-finding server that predicts exons. How it works: it uses GENSCAN to predict all possible exons from the genomic sequence; the putative exons are used in BLAST searches to find the closest homologs; the putative exons and their BLAST homologs are aligned to identify the best match; only the closest match from a genome database is used as a template for refining the earlier exon selection and exon boundaries.

Consensus-based programs

These programs keep the predictions agreed on by most programs and remove inconsistent ones. Such an integrated approach may improve specificity by correcting false positives and over-prediction. However, because the procedure penalizes novel predictions, it may lower sensitivity and lead to missed predictions. Two examples of consensus-based programs are GeneComber and DIGIT.
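The core idea of consensus-based prediction, keeping what most programs agree on, can be sketched as a vote over exon coordinates. This minimal version counts only exact (start, end) matches and by default keeps exons supported by a strict majority of predictors; real consensus tools also handle partially overlapping exons and combine scores, so treat `min_votes` and the exact-match rule as simplifying assumptions.

```python
from collections import Counter


def consensus_exons(predictions, min_votes=None):
    """predictions: one list of (start, end) exons per gene finder.
    Returns the sorted exons predicted by at least min_votes programs."""
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1  # strict majority by default
    # set() ensures each program votes at most once per exon
    votes = Counter(exon for pred in predictions for exon in set(pred))
    return sorted(exon for exon, n in votes.items() if n >= min_votes)
```

Raising `min_votes` trades sensitivity for specificity, which mirrors the trade-off described above: exons predicted by only one program (possibly novel ones) are the first to be dropped.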

- GeneComber: a web server, www.bioinformatics.ubc.ca/genecomber/index.php. Combines HMMgene and GENSCAN prediction results and calculates the consistency between the two methods. If the two predictions match, the exon score is reinforced; if not, exons are proposed based on separate threshold scores.
- DIGIT: web server, http://digit.gsc.riken.go.jp/cgi-bin/index.cgi. First, existing gene finders (FGENESH, GENSC…
