Bioinformatics course slides, original English version (28)

Functional characterization of transcripts through the identification of post-transcriptional control elements
Graziano PESOLE, Università di Milano, 7 March 2002
graziano.pesole@unimi.it

[Figure: microarray experiment → result → functional classification, raising the gene classification problem and the gene characterization problem]

Gene characterization problem
[Figure: ESTs assembled into an mRNA (5' end, CDS, 3' end) and matched to a known gene]
- gene identification through database searching (e.g. BLASTX)
- gene identification by amino-acid pattern discovery (e.g. Pfam)
- gene identification by nucleotide pattern discovery (e.g. UTRsite patterns)
- EST assembly

Gene characterization problem
[Figure: mRNA (5', CDS, 3') mapped to genomic DNA and to the encoded protein, including possible alternatively spliced mRNAs]

Gene characterization problem
[Figure: a 3' EST covering only the 3'UTR of an mRNA whose CDS is unknown]

Post-transcriptional regulation of gene expression
[Figure: mRNA (5'UTR, CDS, 3'UTR) whose UTR elements control stability, translation and subcellular localization]

Biological role of mRNA UTRs
- nuclear export
- polyadenylation status
- mRNA cellular localization
- control of mRNA stability
- control of mRNA translation
These functions are mediated by oligonucleotide patterns and by stem-loop structures.

- Patterns known to have regulatory activity (from experimental assays or phylogenetic footprinting)
- Pattern discovery in biosequences: novel candidate functional patterns (from statistical analyses of non-redundant databases)

UTRdb, release 14.0 (January 2001): functional element annotation (UTRsite database)

Detection of known patterns
- Consensus oligonucleotides
- Consensus secondary structures

Consensus oligonucleotides
- Regular expression strings (deterministic description of the pattern: YES/NO), e.g. the AAUAAA polyadenylation site, or C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H for the C2H2 zinc finger
- Consensus matrices (probabilistic description of the pattern: threshold value)

AU-rich elements (AREs)
AU-rich elements (AREs) present in the 3' untranslated region of mature lymphokine and cytokine mRNAs regulate mRNA stability and translational efficiency. Based on their sequence features and functional properties, AREs can be divided into three classes. Class II AREs direct asynchronous cytoplasmic deadenylation (processive kinetics), generating poly(A)-minus mRNAs. Examples of mRNAs with a class II ARE in their 3'UTR are those encoding GM-CSF and IL-2. At least four AUUUA motifs are needed to activate processive degradation, at least three of them in tandem. Furthermore, an AU-rich region 20-30 nt long immediately 5' to the cluster of AUUUA motifs can greatly enhance the degradation activity.
[Figure: mRNA (5'UTR, CDS, 3'UTR, poly(A) tail An) with an AU-rich region followed by the class II ARE cluster of auuua motifs]

Cytoplasmic polyadenylation
Cytoplasmic polyadenylation is an evolutionarily conserved mechanism that regulates the translational activation of a set of quiescent maternal mRNAs during early development. Cytoplasmic poly(A) elongation occurs in a wide range of species, from clam to mouse. Its relevance during late oogenesis and early embryogenesis has been shown by studies in mouse, Xenopus and Drosophila. In mouse and Xenopus, the only vertebrates examined so far, the critical regulatory sequences, called Cytoplasmic Polyadenylation Elements (CPEs), are AU-rich and located in the 3'UTR near the canonical nuclear polyadenylation element (AAUAAA), which is also required for proper poly(A) addition. The CPE has the general structure UUUUUUAU. However, the CPE is not identical in all mRNAs, and its position relative to AAUAAA varies (generally within 100 nucleotides). The minimal CPE able to stimulate poly(A) tail elongation appears to be UUUUAU, and recent experiments also show substantial context and position effects on CPE function.
[Figure: mRNA (5'UTR, CDS, 3'UTR, poly(A) tail An) with the CPE (uuuuuuau) upstream of the aauaaa signal]

Consensus structures
mRNA patterns, usually located in the 5' or 3'UTR, that fold into specific secondary structures recognized by specific RNA-binding proteins.

Histone 3'UTR mRNA element
Metazoan histone mRNAs lack a poly(A) tail. Their 3'UTR contains a highly conserved stem-loop structure with a six-base stem and a four-base loop. This stem-loop plays different roles in the nucleus and in the cytoplasm. In the nucleus it is involved in pre-mRNA processing and nucleocytoplasmic transport, whereas in the cytoplasm it enhances translation efficiency and regulates histone mRNA stability.
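The six-base stem and four-base loop described above define a consensus-structure rule. The Python sketch below scans for that geometry. It assumes perfectly paired arms, no bulges, and accepts G-U wobble pairs in addition to Watson-Crick pairs, which is the same pairing rule the PatSearch histone pattern in this section uses. `find_hairpins` and `PAIRS` are hypothetical names and are not part of PatSearch or UTRsite. The test sequence is the matched region of EST AA545838 from the PatSearch output reported in this section.

```python
# Base pairs accepted in the stem: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem=6, loop=4):
    """Return (start, 5' arm, loop, 3' arm) for every window that folds into a
    perfectly paired stem-loop with the given stem and loop lengths."""
    rna = seq.upper().replace("T", "U")
    span = 2 * stem + loop
    hits = []
    for i in range(len(rna) - span + 1):
        left = rna[i:i + stem]
        lp = rna[i + stem:i + stem + loop]
        right = rna[i + stem + loop:i + span]
        # The arms are antiparallel: left[k] must pair with right[stem - 1 - k].
        if all((left[k], right[stem - 1 - k]) in PAIRS for k in range(stem)):
            hits.append((i, left, lp, right))
    return hits


if __name__ == "__main__":
    # Matched region of EST AA545838 taken from the PatSearch output.
    print(find_hairpins("CCAACGGCCCTCTTTAGGGCCAC"))
```

On that sequence the scanner reports a single hairpin at offset 5, with 5' arm GGCCCU, loop CUUU and 3' arm AGGGCC. A real histone-element search also constrains the flanking bases, and the PatSearch pattern adds those constraints.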

In mammals, the trans-acting factor that interacts with the 3'UTR hairpin of histone mRNAs is a 31 kDa stem-loop binding protein (SLBP), present both in nuclei and on polyribosomes. Besides SLBP, mammalian histone mRNA processing requires at least one additional factor: the U7 snRNP, which binds a purine-rich element 10-20 nt downstream of the stem-loop (the Histone Downstream Element, HDE). The histone 3'UTR hairpin is unusual in that the bases of the stem are conserved, whereas in most functional hairpin motifs conserved bases occur only in the single-stranded loop regions. The sequences of the stem and of the flanking regions are critical for SLBP binding.

SElenoCysteine Insertion Sequence (SECIS)
Specific incorporation of selenocysteine into selenoproteins is directed by UGA codons within the coding sequence of the corresponding mRNAs. UGA is normally a termination codon. Translating it as selenocysteine requires a conserved stem-loop structure, the SElenoCysteine Insertion Sequence (SECIS), located in the 3'UTR of selenoprotein mRNAs. The consensus structure of the SECIS element has been determined by comparative analysis of several selenoprotein mRNAs, as well as by RNase and chemical probing.
[Figure: selenoprotein mRNA with an in-frame UGA codon in the CDS and the SECIS element in the 3'UTR]

nanos TCE
The 3'UTR of Drosophila nanos mRNA contains the essential signals for generating the Nanos gradient that emanates from the posterior pole of Drosophila embryos. This polarized distribution of Nanos is generated by localization-dependent translation of nanos mRNA in the pole plasm. The translation control element (TCE) is a 90-nt region of the nanos 3'UTR. It can fold into a bipartite secondary structure that is recognized by the Smaug repressor and at least one additional factor. A stem-loop carrying the CUGGC pentamer in its loop is required for the Smaug:TCE interaction. By contrast, both the sequence and the structure of the second stem-loop are critical for TCE function. Translational activation is mediated by the interaction of localization factors with a 540-nt sequence region that overlaps the TCE structural motif.
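Earlier in this section, the UTRsite material describes consensus oligonucleotides as deterministic, YES/NO patterns. Examples include regular-expression strings such as the C2H2 zinc-finger PROSITE pattern, and rules such as the class II ARE and CPE criteria. The Python sketch below implements all three using only the standard `re` module. All function names are hypothetical. Several choices here are assumptions:

- "In tandem" is read as AUUUA copies that share their terminal A (AUUUAUUUA...).
- The CPE is placed upstream of AAUAAA, as in the CPE schematic.
- The PROSITE translation handles only the element types that appear in the zinc-finger pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex. Only residues, x,
    [...] sets and (n) / (n,m) repeats are handled; this is not full PROSITE."""
    tokens = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)


def to_rna(seq):
    return seq.upper().replace("T", "U")


def auuua_starts(utr):
    """Start of every AUUUA pentamer; the lookahead also reports overlapping copies."""
    return [m.start() for m in re.finditer(r"(?=AUUUA)", to_rna(utr))]


def longest_tandem_run(starts):
    """Longest run of pentamers sharing their terminal A, i.e. with starts 4 nt apart."""
    best = run = 1 if starts else 0
    for prev, cur in zip(starts, starts[1:]):
        run = run + 1 if cur - prev == 4 else 1
        best = max(best, run)
    return best


def looks_like_class_ii_are(utr):
    """At least four AUUUA motifs, at least three of them in tandem."""
    starts = auuua_starts(utr)
    return len(starts) >= 4 and longest_tandem_run(starts) >= 3


def cpe_near_pas(utr, cpe="UUUUAU", pas="AAUAAA", max_gap=100):
    """(CPE start, AAUAAA start) pairs where the CPE ends at most max_gap nt upstream."""
    rna = to_rna(utr)
    cpe_starts = [m.start() for m in re.finditer(f"(?={cpe})", rna)]
    hits = []
    for m in re.finditer(f"(?={pas})", rna):
        p = m.start()
        hits += [(c, p) for c in cpe_starts if 0 <= p - (c + len(cpe)) <= max_gap]
    return hits


if __name__ == "__main__":
    print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
    print(looks_like_class_ii_are("GCAUUUAUUUAUUUAGCGCAUUUACC"))
    print(cpe_near_pas("GGUUUUUUAUCCCCCCCCCCAAUAAAGG"))
```

None of these functions returns a score: a sequence either satisfies the rule or it does not. This is the contrast with the consensus-matrix (threshold) descriptions.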

Localization of nanos RNA to the posterior pole of the embryo is essential for translation of Nanos protein.
[Figure: in situ hybridization to nanos RNA (top); antibody staining with anti-Nanos antibody (bottom)]

How to find known patterns in unknown sequences? PatSearch
PatSearch is a pattern matcher that searches protein or nucleotide (DNA, RNA, tRNA and so on) sequences for instances of a pattern given as input. In a given sequence it can find the kinds of loop structures that characterize tRNAs and rRNAs (hairpin loops, stem-loops with bulges or internal loops), as well as any other kind of pattern in DNA and protein sequences. PatSearch also allows non-standard pairings for reverse matching and tolerates a given number of mismatches and bulges.

PatSearch syntax
A pattern is a sequence of simple pattern units. A simple pattern unit is either a named pattern unit, a complementation rule pattern unit or a basic pattern unit. Upper and lower case can be used interchangeably in pattern definitions.
- Named pattern unit: Name=BasicPattern, e.g. p1=4...7 or p1=ACGTAGT
- Complementation rule pattern unit: Name={Complements}, e.g. r1={au,ua,gc,cg}
- BasicPattern is one of:
  1. A string of characters to be matched (including the standard nucleotide ambiguity codes and the character X for proteins), optionally followed by a match qualifier of the form [Mismatches,Deletions,Insertions]. Example: TATAA[1,0,0] matches TATAA, allowing 1 mismatch.
  2. A range pattern unit of the form Min...Max. Example: 0...5 matches from 0 to 5 arbitrary characters.
  3. A complement pattern unit, which matches the reverse complement of a previously defined named pattern unit. Example: r1~p2 matches the reverse complement of p2 using rules r1.
  4. A length-limit pattern unit, which puts an upper or lower bound on the sum of the lengths matched by previous named pattern units. Example: a bound of 9 on length(p1+p2+p3).
  5. A weight pattern unit, used to match nucleotide sequences against a weight matrix (log-odds scoring method or matrix similarity method).

Histone 3'UTR mRNA element, PatSearch pattern:
r1={au,ua,gc,cg,gu,ug} n mmm p1=ggyyy u hhuh a r1~p1 mm 0...3
(m = a/c; y = c/u; h = not g)

Internal Ribosome Entry Site (IRES)
[Figure: IRES element and the AUG start codon]
PatSearch syntax:
r1={au,ua,cg,gc,gu,ug} p1=5...6 0...6 p2=5...6 p3=0...2 p4=5...8 p5=3...8 r1~p4[1,0,0] p6=5...8 p7=3...5 r1~p6[1,0,0] p8=0...5 r1~p2[1,0,0] 0...6 r1~p1[1,0,0] 2...5 p9=5...6 p10=3...8 r1~p9[1,0,0] 3...10 $
PatSearch (:/bigarea.area.ba r.it:8000/EmbIT/Patsearch.html)

PatSearch output: search of the histone 3'UTR motif in EST sequences
Fields: database division:entry name, position, matched pattern
EM_EST1:AA545838 :420,442 :C CAAC GGCCC T CTTT A GGGCC AC
EM_EST1:AA545875 :369,391 :C CAAC GGCCC T CTTT A GGGCC AC
EM_EST1:AA545884 :419,441 :C CAAC GGCCC T CTTT A GGGCC AC
EM_EST1:AA563524 :422,444 :C CAAC GGCCC T CTTT A GGGCC AC
EM_EST1:AA610040 :403,425 :C CAAA GGCTC T TTTC A GAGCC AC
EM_EST2:AA701695 :155,177 :C CAAC GGCCC T CTTT A GGGCC AC

Only about one random hit of this pattern is expected in 100 Mb. The matched ESTs can therefore be reliably assigned to mRNAs encoding histone-related proteins.
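The "about one random hit in 100 Mb" figure can be sanity-checked. Take the histone PatSearch pattern above and multiply, unit by unit, the probability that a random position matches each unit. The sketch below makes three assumptions: bases are equiprobable and independent, only a single strand is searched, and overlapping hits are ignored. The helper names are hypothetical.

Under those assumptions the per-position probability is 729/2^35, about 2.1 × 10^-8. That gives roughly two expected hits per 100 Mb, the same order of magnitude as the estimate above. A real base composition will shift this number.

```python
from fractions import Fraction

# Degenerate symbols of the histone pattern (t and u treated as equivalent).
SYMBOLS = {"a": "a", "c": "c", "g": "g", "u": "u",
           "n": "acgu", "m": "ac", "y": "cu", "h": "acu"}
# Complementation rule r1={au,ua,gc,cg,gu,ug}: Watson-Crick pairs plus G-U wobble.
R1 = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}
QUARTER = Fraction(1, 4)  # assumption: equiprobable, independent bases


def p_string(unit):
    """Probability that a random segment matches a degenerate string unit."""
    p = Fraction(1)
    for sym in unit:
        p *= len(SYMBOLS[sym]) * QUARTER
    return p


def p_paired(unit):
    """Probability that a random segment pairs (rule r1) with an arm that matched `unit`.
    Each arm base is assumed equally likely among the bases its symbol allows."""
    p = Fraction(1)
    for sym in unit:
        bases = SYMBOLS[sym]
        partners = sum(1 for x in bases for y in "acgu" if (x, y) in R1)
        p *= Fraction(partners, len(bases)) * QUARTER
    return p


# Units of: n mmm p1=ggyyy u hhuh a r1~p1 mm 0...3 (the trailing 0...3 range always matches).
p_hit = (p_string("n") * p_string("mmm") * p_string("ggyyy") * p_string("u")
         * p_string("hhuh") * p_string("a") * p_paired("ggyyy") * p_string("mm"))
expected_per_100mb = float(p_hit) * 100_000_000  # one strand, edge effects ignored

if __name__ == "__main__":
    print(f"p per position: {float(p_hit):.3g}")
    print(f"expected random hits in 100 Mb: {expected_per_100mb:.1f}")
```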

Pattern discovery in biosequences: novel candidate functional patterns (from statistical analyses of non-redundant databases)

[Figure: monitoring gene expression using DNA microarrays, (a) SOM clustering, (b) hierarchical clustering]

Identification of shared patterns in functionally equivalent sequences
Suppose a group of genes shares a similar expression profile (e.g. genes activated at the same time in the cell cycle). A natural assumption is that this profile is, at least partly, caused by and reflected in a similar structure of the regions involved in transcriptional regulation. Research has therefore focused on detecting motifs (possibly representing transcription-factor binding sites) common to the promoter sequences of putatively coregulated genes.
[Figure: a significant pattern versus a random pattern]

How to decide if a pattern is significant?
Three criteria are used: occurrence, position and information content.

Occurrence
The number of sequences containing a given pattern should be significantly greater than expected (e.g. the WordUP algorithm).

PATTERN     OBSERVED  EXPECTED   CHI-SQUARE
AATAAA      1345      414.00035  2093.62225
AAATAA      834       414.00035  426.08588
ATAAAG      578       258.12928  396.37997
ATAAAA      744       414.00035  263.04270
CCCCCC      273       654.61047  222.46291
ATAAAT      584       321.22291  214.96537
GAAATA      443       239.14498  173.77269
TAAATA      496       285.44362  155.31610
TGTATTT     243       103.18333  189.45602
TGTATAT     154       56.34083   169.27891
ATATTTA     221       95.25445   165.99689
TTTATAT     218       103.87432  125.38875
TGTACAT     130       50.48650   125.22942
ATATATA     136       59.08119   100.14193
GCGGCCGC    38        5.41527    196.06842
ATATATTT    100       31.42024   149.68643
GGGTGGGG    92        31.82251   113.79774
TTTAAAAA    211       103.89444  110.41593
TACATTTT    92        33.40711   102.76638
TATTTATTT   94        16.60269   360.80574
TTTTTAAAA   139       40.25077   242.26645

Position
Functional patterns generally occur at conserved positions (e.g. within a specific distance range from the transcription start site).

Information content
The functional constraint varies from one position of the pattern to another, and some sites are absolutely conserved (Shannon's information content Ci, ranging between 0 and 1).

Test case: ten random sequences, 1 kb long, each containing the same 15-mer motif (up to 2 mismatches allowed).

Tools for the extraction of functional motifs
Two different approaches can be used to extract functional motifs from the regulatory regions of coregulated genes:
- alignment methods
- enumeration (exhaustive) methods

Alignment methods
These methods try to identify unknown signals through a significant local multiple alignment of all sequences, e.g.:
- Gibbs sampling
- the MEME system (expectation maximization)
They take a statistical approach that treats the start positions of the motifs in the sequences as unknown. A local optimization then determines which positions yield the "optimal" motifs.

Gibbs sampling: two approaches
- Site sampling: each sequence is assumed to contain exactly one element of each motif type.
- Motif sampling: each sequence may contain zero, one or several elements of each motif type; the total number of elements in the data set is specified instead.

[Figure: the MEME system]

Enumeration methods
- WordUP (:/bio- ba r.it:8000/BioWWW/wordupGCG.html)
- YEBIS (:/www-scc.jst.go.jp:8080/sankichi/MotifExtraction/)
- Pratt (:/ ii.uib.no/inge/Pratt.html/)
- Verbumculus (:/ /homes/stelo/Verbumculus/)
- RSA tools (:/copan.cifn.unam.mx/jvanheld/rsa-tools/)
Test dataset: 20 sequences, 100 nt long, with an 8-mer shared by 75% of them.

WordUP

          OCCURRENCES
WORDS     OBSERVED  EXPECTED  CHI-SQUARE
GGATCA    15        0.58      361.51
GATCAA    15        0.63      325.96
ATCAAA    15        0.65      316.96
AGGATC    7         0.57      72.70
TGGATC    6         0.48      63.10
TCAAAG    7         0.66      60.58
GTGGAT    4         0.45      28.05
GCAAGC    5         0.69      27.00
GTTTGG    4         0.49      25.09
CAAAGT    4         0.56      21.22

          OCCURRENCES
WORDS     OBSERVED  EXPECTED  CHI-SQUARE
AGGATC    7         0.57      72.70
GCAAGC    5         0.69      27.00
GTTTGG    4         0.49      25.09
TCAAAGT   4         0.16      92.97
GTGGATC   3         0.11      74.04
GGATCA
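The enumeration approach behind tables like these can be sketched in a few lines:

1. For every k-mer, count how many sequences contain it.
2. Compare that count with an expected count.
3. Rank the words by the chi-square term (O - E)^2 / E.

That formula reproduces the CHI-SQUARE column of the AATAAA table above from its OBSERVED and EXPECTED values; for example, 1345 against 414.00035 gives 2093.62. How WordUP itself computes EXPECTED is not stated here, so the equiprobable-base background used below is an assumption. For the test dataset (20 sequences of 100 nt) this background gives about 0.46 for any 6-mer, the same order as WordUP's EXPECTED column. The function names are hypothetical.

```python
from collections import Counter


def sequences_containing(seqs, k):
    """For every k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def expected_containing(n_seqs, seq_len, k):
    """Expected number of sequences containing a given k-mer under an equiprobable,
    independent-base background (windows treated as independent: an approximation)."""
    windows = seq_len - k + 1
    return n_seqs * (1 - (1 - 0.25 ** k) ** windows)


def chi_square(observed, expected):
    """One-cell chi-square term (O - E)^2 / E, as in the CHI-SQUARE columns."""
    return (observed - expected) ** 2 / expected


def rank_words(seqs, k, min_seqs=2):
    """All k-mers found in at least min_seqs sequences, ranked by chi-square."""
    mean_len = sum(len(s) for s in seqs) / len(seqs)
    exp = expected_containing(len(seqs), mean_len, k)
    rows = [(word, obs, exp, chi_square(obs, exp))
            for word, obs in sequences_containing(seqs, k).items() if obs >= min_seqs]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # First row of the AATAAA table, recomputed from its OBSERVED and EXPECTED values.
    print(round(chi_square(1345, 414.00035), 3))
    toy = ["AAGGATCAAT", "CGGATCAAGT", "TTGGATCAAC"]  # hypothetical toy sequences
    for word, obs, exp, chi in rank_words(toy, 6):
        print(word, obs, round(exp, 4), round(chi, 1))
```

Counting sequences rather than raw occurrences keeps a single repeat-rich sequence from dominating the ranking. This matches the criterion stated above: the number of sequences containing the pattern should exceed expectation.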

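The site-sampling variant of Gibbs sampling described earlier (exactly one site per sequence) can be illustrated on a synthetic data set. The data set is built like the test case above: random sequences, each carrying one copy of a 15-mer with up to 2 mismatches.

This is a simplified, hypothetical sketch rather than the published Gibbs sampler. It assumes a known motif width, a uniform background of 0.25 per base, +1 pseudocounts, and a single run with no phase-shift moves, so it can stall in a local optimum. It also reports the per-column information content C_i = (2 - H_i)/2. That normalization is one way to obtain the 0-1 range quoted for the information-content criterion; it is an assumption, not the slides' stated formula.

```python
import math
import random

BASES = "ACGT"


def plant_motif_dataset(n_seqs=10, length=1000, motif_len=15, max_mm=2, seed=0):
    """Hypothetical generator: random sequences, each carrying one motif copy with <= max_mm substitutions."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(BASES) for _ in range(motif_len))
    seqs, sites = [], []
    for _ in range(n_seqs):
        variant = list(motif)
        for pos in rng.sample(range(motif_len), rng.randint(0, max_mm)):
            variant[pos] = rng.choice([b for b in BASES if b != variant[pos]])
        seq = [rng.choice(BASES) for _ in range(length)]
        start = rng.randrange(length - motif_len + 1)
        seq[start:start + motif_len] = variant
        seqs.append("".join(seq))
        sites.append(start)
    return motif, seqs, sites


def profile(seqs, starts, w, skip):
    """Column frequencies (+1 pseudocount per base) from all current sites except seqs[skip]."""
    counts = [dict.fromkeys(BASES, 1.0) for _ in range(w)]
    for i, (s, st) in enumerate(zip(seqs, starts)):
        if i != skip:
            for j, base in enumerate(s[st:st + w]):
                counts[j][base] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]


def gibbs_site_sampler(seqs, w, iterations=1000, seed=0):
    """Site sampling: exactly one motif site of width w per (uppercase ACGT) sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        prof = profile(seqs, starts, w, skip=i)
        s = seqs[i]
        # Weight of each window: profile likelihood over uniform-background likelihood.
        weights = [math.prod(prof[j][s[p + j]] / 0.25 for j in range(w))
                   for p in range(len(s) - w + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def information_content(sites):
    """Normalised Shannon information per aligned column: C_i = (2 - H_i) / 2, in [0, 1]."""
    result = []
    for column in zip(*sites):
        freqs = [column.count(b) / len(column) for b in BASES]
        entropy = -sum(f * math.log2(f) for f in freqs if f > 0)
        result.append((2 - entropy) / 2)
    return result


if __name__ == "__main__":
    motif, seqs, true_sites = plant_motif_dataset(length=300)
    starts = gibbs_site_sampler(seqs, w=len(motif), iterations=500)
    found = [s[p:p + len(motif)] for s, p in zip(seqs, starts)]
    print("planted motif:", motif)
    print("sites recovered exactly:", sum(a == b for a, b in zip(starts, true_sites)), "/", len(seqs))
    print("information content:", [round(c, 2) for c in information_content(found)])
```

How many sites are recovered exactly depends on the seed and the number of iterations. Running the sampler several times from different random starts and keeping the best-scoring alignment is the usual remedy.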