Shanghai Center for Clinical Laboratory, Next-Generation Sequencing (NGS) Training Course, Session 8: Shuhua Xu, NGS Data Analysis and Detection of Genomic Structural Variation
NGS Data Analysis and Detection of Genomic Structural Variation

Shuhua Xu (徐書華), Professor; Leader, Max Planck Independent Research Group
School of Life Sciences, ShanghaiTech University
Institute of Computational Biology (CAS-MPG Partner Institute for Computational Biology), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences

Outline
1. Some basics of genomic variation
2. Evolution of CNV detection technologies
3. CNV analysis via WGS
4. CNV maps of Chinese populations

1 Some basics of genomic variation

Continuum of genomic variation (from the nucleotide level to the cytogenetic level)
- Single base-pair changes
  - Point mutations (1 per 800 bp)
- Small insertions/deletions
  - Frameshift, microsatellite, minisatellite
- Mobile elements
  - Retroelement insertions (300 bp - 10 kb)
- Large-scale genomic copy number variation (>10 kb)
  - Large-scale deletions
  - Segmental duplications
- Local rearrangements
- Chromosomal variation
  - Translocation, inversion, fusion

Classes of structural variants
Structural variants are genomic alterations involving a segment of DNA >1 kb.
- Quantitative (CNVs): deletions, duplications, insertions
- Positional (translocations)
- Orientational (inversions)
A copy number polymorphism (CNP) is a CNV that occurs in >1% of the population.

CNVs significantly overlap with known genes (Cooper et al. 2007 Nat Genet 39: s22)

Size distribution of CNV in a human genome
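The size and frequency thresholds quoted above can be written down as a few small predicates. This is only a sketch: the function names are hypothetical, and the label used for the 2-299 bp bin is an assumption made for illustration (the slides give explicit bounds only for retroelement insertions and large-scale CNV).

```python
def is_structural_variant(length_bp):
    """Structural variant: alteration involving a DNA segment longer than 1 kb."""
    return length_bp > 1_000


def is_copy_number_polymorphism(is_cnv, population_frequency):
    """CNP: a CNV that occurs in more than 1% of the population."""
    return is_cnv and population_frequency > 0.01


def size_class(length_bp):
    """Place a variant on the size continuum.

    The 300 bp, 10 kb bounds come from the list above; treating everything
    between 2 and 299 bp as a 'small insertion/deletion' is an assumption.
    """
    if length_bp == 1:
        return "single base-pair change"
    if length_bp < 300:
        return "small insertion/deletion"
    if length_bp <= 10_000:
        return "retroelement-sized insertion"
    return "large-scale copy number variation"


if __name__ == "__main__":
    for length in (1, 50, 6_000, 50_000):
        print(length, size_class(length), is_structural_variant(length))
```

Note that the two definitions overlap: a 6 kb event already counts as a structural variant (>1 kb) even though it is below the >10 kb bound used for large-scale CNV.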

Different classes of mutation operating in the human genome (Freeman et al., Genome Res. 2006)

2 Evolution of CNV detection technologies

Genotype known common CNVs using whole-genome arrays
- array-CGH: CNV only, test vs reference
  - NimbleGen: custom or whole-genome (up to 2.1M probes)
- Affy 6.0: >940,000 CNV non-polymorphic probes; high density in ~5,600 CNV regions in DGV, extended to whole-genome
- Illumina 1M: 36,000 CNV non-polymorphic probes covering ~4,000 CNV regions in DGV

Combine information across probes to identify new CNVs
- Birdseye: Affy 5.0, 6.0 (Korn et al. 2008 Nat Genet 40: 1253)
- PennCNV: Affymetrix and Illumina (Wang et al. 2007 Genome Res 17: 1665)

[Figure: probe intensity plot; axis label "Intensity of probe"]

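Identifying CNVs through genotyping errors
- Mendelian inconsistencies
- Failure of Hardy-Weinberg equilibrium

[Figure: genotype calls, including hemizygous genotypes such as A- and G-, illustrating both signals]

Conrad et al. 2006 Nat Genet 38: 75; McCarroll et al. 2006 Nat Genet 38: 86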
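Both signals listed above can be checked with a few lines of code. The sketch below is illustrative only: the function names and the toy genotypes are assumptions, not taken from the slides or from the cited papers. A hemizygous deletion is typically called as a homozygote by a SNP array, which is why a deletion carrier can produce an apparently impossible trio or an apparent deficit of heterozygotes.

```python
from itertools import product


def mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one allele of each parent.

    Genotypes are two-character strings such as "AG".
    """
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Father is truly A/- (called AA), mother is G/G, child is truly G/-
    # (called GG): the called genotypes look like a Mendelian error.
    print(mendelian_consistent("GG", "AA", "GG"))  # False
    print(mendelian_consistent("AG", "AA", "GG"))  # True
    print(hwe_chi_square(25, 50, 25))              # 0.0
    print(hwe_chi_square(50, 0, 50))               # 100.0
```

A SNP that repeatedly fails either check across families or samples is a candidate for lying inside a deletion; the statistic alone does not prove it, since ordinary genotyping error produces the same pattern.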

Sensitivity and throughput of various CNV detection techniques
[Figure]

3 CNV analysis via WGS

Sequencing types
- Single read
- Paired-end read
- Mate-pair read

Library types
- Many different library preps: DNA, mate-pair, mRNA, miRNA, ChIP
- Fragmentation
  - DNA: 300-500 nt
  - RNA: 150-200 nt
- Attachment of appropriate adapters
  - Complex: flow cell binding, F & R sequencing, BC
  - Custom: avoid if possible
- Removal of dimers/small inserts
- Amplification (or not)

Paired-end vs mate-pair
- Paired-end: sequencing from both fragment ends (< 1 kb)
- Mate-pair: longer (3-20 kb) molecules circularized via an internal adapter

Identifying CNVs via targeted or whole-genome sequencing (Korbel et al. 2007 Science 318: 420)

Platform comparison
[Figure]

High-throughput DNA sequencing based methods to detect CNVs/SVs
1. Paired ends
2. Read depth
3. Split read

[Figure: each method illustrated for a deletion, with sequenced reads or paired ends mapped to the reference genome; the read-depth panel plots read count, which drops toward the zero level across the deletion]
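The read-depth method in the overview above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm of any particular tool: the function names are hypothetical, the window size is arbitrary, and the median window count is assumed to represent the normal two-copy state. Real read-depth callers add steps such as GC correction and segmentation that are omitted here.

```python
from statistics import median


def window_counts(read_starts, region_length, window_size):
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (region_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < region_length:
            counts[pos // window_size] += 1
    return counts


def estimate_copy_number(counts, ploidy=2):
    """Scale each window's count by the median count (assumed diploid baseline)."""
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median read count is zero; cannot normalise")
    return [round(ploidy * c / baseline) for c in counts]


if __name__ == "__main__":
    # Toy data: 500 bp region, 100 bp windows. Window 2 has no reads
    # (homozygous deletion); window 4 has half the usual number of reads
    # (heterozygous deletion).
    starts = (
        list(range(0, 100, 5))
        + list(range(100, 200, 5))
        + list(range(300, 400, 5))
        + list(range(400, 500, 10))
    )
    counts = window_counts(starts, region_length=500, window_size=100)
    print(counts)                        # [20, 20, 0, 20, 10]
    print(estimate_copy_number(counts))  # [2, 2, 0, 2, 1]
```

The read count falling to the zero level corresponds to an estimated copy number of 0, and a count near half the baseline corresponds to a copy number of 1.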

NGS technical need for CNV/SV detection

Paired reads and insert size

Detecting SVs by paired-end mapping
- Insertion: false negative

Split reads mapping
- Insertion and deletion signatures

An example: CNV breakpoints determination
- CGH array vs NGS: a case of duplication
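The paired-end signatures described above come down to comparing how far apart the two reads of a pair map on the reference with the insert size expected from the library. The sketch below assumes a single expected insert size and a fixed tolerance; the function name, parameters, and labels are hypothetical. Real callers work from the empirical insert-size distribution and require several supporting pairs before reporting a variant.

```python
def classify_pair(mapped_span, expected_insert, tolerance, same_orientation=False):
    """Label one read pair by comparing its mapped span with the library insert size.

    mapped_span      distance between the outer ends of the two reads on the reference
    expected_insert  insert size expected from library preparation
    tolerance        allowed deviation before a pair is called discordant
    same_orientation True if both reads map to the same strand
    """
    if same_orientation:
        return "inversion candidate"
    if mapped_span > expected_insert + tolerance:
        # The sample lacks sequence present in the reference between the reads.
        return "deletion candidate"
    if mapped_span < expected_insert - tolerance:
        # The sample carries extra sequence that is absent from the reference.
        return "insertion candidate"
    return "concordant"


if __name__ == "__main__":
    for span in (400, 1400, 250):
        print(span, classify_pair(span, expected_insert=400, tolerance=50))
```

One plausible reading of the "Insertion: false negative" note is that an insertion longer than the library insert size cannot be spanned by any single pair, so this signature never appears for it; that interpretation is an assumption, since the slide gives only the heading.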

The Tibetan Plateau, known as "the roof of the world", has an average elevation of over 4,500 meters and is the highest plateau in the world.

High altitude adaptation of Tibetans

WinXPCNVer: Window-based Xross-Population Copy Number Variation searcher
Haiyi Lou, Ruiqing Fu
/PGG/resource.php

- A case of TSD homo-deletion
- A case of TSD hetero-deletion

A Tibetan-specific CNV (Lou et al., AJHG 2015)

[Figure: UCSC Genome Browser view (hg19), chr1, 2 kb scale bar, coordinate ticks from 56,129,000 to 56,134,000, with V1 and V2 marked. Tracks shown: UCSC Genes (AK127270), RefSeq Genes, Publications (sequences in scientific articles), Human mRNAs from GenBank, ENCODE layered H3K27Ac, ENCODE DNaseI hypersensitivity clusters, ENCODE transcription factor ChIP-seq, 100-vertebrate conservation (PhyloP and Multiz alignments), common SNPs (dbSNP 146), RepeatMasker, segmental duplications, self chain, simple tandem repeats (TRF)]

De novo assembly

Some available computer software
Variant types covered by the table: deletion, duplication, inversion, insertion.

Approach              Software
Read depth            CNVnator, readDepth
Read pair             Breakdancer
Split read            Pindel
Combined approaches   GenomeSTRiP, Lumpy
De novo assembly      Cortex

4 CNV maps of Chinese populations

The most significant gene found in studies of Tibetan high-altitude adaptation
A new method (WinXPCNVer) was developed and detected a key copy number variant that may regulate the function of EPAS1.

Genome-wide CNVs in Han Chinese (Lu et al. J Med Genet 2017)

Basic workflow of NGS data analysis

Basic NGS workflow
- Sample DNA
- Sonication (using the energy of sound): usually results in fragments of ~700 bp
- If a suitable fragment size is not achieved after shearing, gel size-selection can be used
- Library preparation and PCR: similar for DNA and RNA (= cDNA) sequencing
- A suboptimal size range gives suboptimal sequencing results (for Illumina sequencing, the ideal fragment size range is 300-700 bp)

Read length and pairing

TCGTACCGATATGCTGACTTAAGGCTGACTAGC

Short reads are problematic because short sequences do not map uniquely to the genome.
- Solution #1: Get longer reads.
- Solution #2: Get paired reads.
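The claim that short reads do not map uniquely can be checked directly on the example sequence from the slide. The sketch below counts how many substrings of a given read length occur exactly once; the function name is hypothetical, and a 33-base string stands in for a genome purely for illustration.

```python
from collections import Counter


def unique_fraction(genome, read_length):
    """Fraction of read-length substrings that occur exactly once in the genome."""
    kmers = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


if __name__ == "__main__":
    sequence = "TCGTACCGATATGCTGACTTAAGGCTGACTAGC"
    for length in (1, 3, 7, 12, len(sequence)):
        print(length, round(unique_fraction(sequence, length), 2))
```

In this sequence the 7-base stretch GCTGACT occurs twice, so a 7-base read drawn from it cannot be placed unambiguously, while a read spanning the whole sequence trivially can. Paired reads help for a similar reason: even when one read falls in a repeat, its mate at a known distance may anchor it.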

Next generation sequencing vocabulary
- Base pair: basic building block of double-stranded DNA; unit of DNA segment length (bp)
- Read: continuous sequence produced by the sequencer
- Coverage: the number of short reads that overlap each other within a specific genomic region (how many times a particular base or region is read)
- Consensus sequence: idealised sequence in which each position represents the base most often found when many sequences are compared
- Contig: set of overlapping segments (reads) of DNA sequence forming a continuous consensus sequence
- Assembly: aligning and merging fragments of DNA sequence (reads, contigs) in order to reconstruct the original sequence
- Scaffold: set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of roughly known length
- Single vs paired-end sequencing
- Directional vs undirectional libraries/reads

Framework for variation discovery and genotyping from NGS
Steps for converting raw NGS data into a final set of SNP or genotype calls
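Two of the vocabulary terms above, coverage and consensus sequence, underlie every later calling step and can be made concrete with a toy pileup. The sketch assumes reads that are already aligned without gaps and given as (start, sequence) pairs; the function names and the example reads are hypothetical.

```python
from collections import Counter


def pileup(reads, length):
    """Collect the bases covering each reference position.

    `reads` is a list of (start, sequence) pairs, aligned without gaps.
    """
    columns = [[] for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < length:
                columns[start + offset].append(base)
    return columns


def coverage(columns):
    """Per-position coverage: how many reads cover each base."""
    return [len(column) for column in columns]


def consensus(columns, missing="N"):
    """Most frequent base at each position; `missing` where nothing is covered."""
    return "".join(
        Counter(column).most_common(1)[0][0] if column else missing
        for column in columns
    )


def expected_coverage(n_reads, read_length, genome_length):
    """Average coverage implied by the amount of sequence produced."""
    return n_reads * read_length / genome_length


if __name__ == "__main__":
    reads = [(0, "ACGT"), (2, "GTAC"), (2, "GAAC"), (4, "ACGG")]
    columns = pileup(reads, length=8)
    print(coverage(columns))            # [1, 1, 3, 3, 3, 3, 1, 1]
    print(consensus(columns))           # ACGTACGG
    print(expected_coverage(4, 4, 8))   # 2.0
```

At position 3 one read disagrees (A instead of T); the consensus keeps the majority base, which is the simplest form of the evidence that a variant caller weighs.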

Pipeline for today's exercise
- raw.reads.mapping
- remove.duplicates
- local.realignment
- BQrecalibration
- haplotypecaller.gVCF.mode
- GenotypeGVCFs.joint.calling
- variant.quality.score.recal

File list in reference data

Shuhuas-MacBook-Pro:release.v1.02 xushua$ cd ref
Shuhuas-MacBook-Pro:ref xushua$ ls -l
total 696
-rw-r--r--@ 1 xushua  staff  248066 Mar 16 23:07 dbsnp_147.GRCh37p13.chrMT.vcf
-rw-r--r--@ 1 xushua  staff    2462 Mar 16 23:07 dbsnp_147.GRCh37p13.chrMT.vcf.idx
-rw-r--r--@ 1 xushua  staff   17929 Mar 16 23:07 hapmap_3.3.b37.mt.vcf
-rw-r--r--@ 1 xushua  staff     330 Mar 16 23:07 hapmap_3.3.b37.mt.vcf.idx
-rw-r--r--@ 1 xushua  staff     164 Mar 16 23:07 human_g1k_v37.mt.dict
-rw-r--r--@ 1 xushua  staff   16924 Mar 16 23:07 human_g1k_v37.mt.fasta
-rw-r--r--@ 1 xushua  staff      19 Mar 16 23:07 human_g1k_v37.mt.fasta.amb
-rw-r--r--@ 1 xushua  staff     100 Mar 16 23:07 human_g1k_v37.mt.fasta.ann
-rw-r--r--@ 1 xushua  staff   16648 Mar 16 23:07 human_g1k_v37.mt.fasta.bwt
-rw-r--r--@ 1 xushua  staff      18 Mar 16 23:07 human_g1k_v37.mt.fasta.fai
-rw-r--r--@ 1 xushua  staff    4144 Mar 16 23:07 human_g1k_v37.mt.fasta.pac
-rw-r--r--@ 1 xushua  staff    8336 Mar 16 23:07 human_g1k_v37.mt.fasta.sa

Software used for the pipeline

Shuhuas-MacBook-Pro:ref xushua$ cd ../software/
Shuhuas-MacBook-Pro:software xushua$ ls -l
total 45552
-rwxr-xr-x@  1 xushua  staff  13873912 Mar 16 23:07 GenomeAnalysisTK3.6.jar
-rwxr-xr-x@  1 xushua  staff   1009220 Mar 16 23:07 MarkDuplicates.jar
-rwxr-xr-x@  1 xushua  staff   2713272 Mar 16 23:07 bwa
-rwxr-xr-x@  1 xushua  staff    402692 Mar 17 13:07 bwa_mac
drwxr-xr-x@ 16 xushua  staff       512 Jun 22  2016 jdk1.8.0_101_mac
drwxr-xr-x@ 16 xushua  staff       512 Mar 16 21:36 jdk1.8.0_112
-rwxr-xr-x@  1 xushua  staff   4191194 Mar 16 23:07 samtools
-rwxr-xr-x@  1 xushua  staff   1114448 Mar 17 13:46 samtools_mac

Other files and scripts
  000.create.pipeline.sh --- generates the NGS scripts, from fastq to g.vcf
  100.joint.calling.sh --- generates the joint calling script, from g.vcf to vcf
  data/ ------ fastq files of mtDNA from 20 individuals
  g.vcf/ ------ pre-computed g.vcf files of mtDNA for the same 20 individuals

2. Run 000.create.pipeline.sh to generate the scripts
  sh 000.create.pipeline.sh
This creates 5 folders, one for each of the first five steps (fastq -> g.vcf). Run the five steps in order. Because the scripts use relative paths, cd into each folder before running its script.

3.1 raw.reads.mapping
  cd 001.raw.reads.mapping/
  sh 001.raw.reads.mapping.sh
Running this step produces 4 files. ind1.bam is the mapping result, and ind1.unmapped.bam contains the unmapped reads from it.

3.2 remove.duplicates
  cd ../002.rem
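Because each step script uses relative paths and has to be run from inside its own folder, the "cd in, then run" routine can be sketched as a small driver. Everything here is an assumption for illustration: the function names are hypothetical, and the sketch assumes that every numbered step folder contains a script named after the folder, which the material above confirms only for 001.raw.reads.mapping.

```python
import re
import subprocess
from pathlib import Path

# Step folders are assumed to start with a three-digit prefix, e.g. "001.".
STEP_DIR = re.compile(r"^\d{3}\..+")


def plan_steps(root):
    """Return (working_directory, command) pairs, one per step folder, in order."""
    plan = []
    step_dirs = sorted(
        p for p in Path(root).iterdir() if p.is_dir() and STEP_DIR.match(p.name)
    )
    for step in step_dirs:
        script = step / f"{step.name}.sh"  # assumed naming convention
        if script.is_file():
            plan.append((step, ["sh", script.name]))
    return plan


def run_plan(plan):
    """Run each step from inside its own folder, stopping at the first failure."""
    for workdir, command in plan:
        subprocess.run(command, cwd=workdir, check=True)


if __name__ == "__main__":
    for workdir, command in plan_steps("."):
        print(workdir.name, " ".join(command))
```

Passing `cwd=` to `subprocess.run` plays the role of the manual `cd`, so the relative paths inside each script resolve the same way as when the step is run by hand.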
