NGS Data Analysis and Genomic Structural Variation Detection

Shuhua Xu, Professor, Max Planck Junior Research Group Leader
School of Life Sciences, ShanghaiTech University
Institute of Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS-MPG Partner Institute for Computational Biology)

Outline
1. Some basics of genomic variation
2. Evolution of CNV detection technologies
3. CNV analysis via WGS
4. CNV maps of Chinese populations

1. Some basics of genomic variation

Continuum of genomic variation (from the nucleotide level to the cytogenetic level)
- Single base-pair changes
  - Point mutations (1 per 800 bp)
- Small insertions/deletions
  - Frameshift, microsatellite, minisatellite
- Mobile elements
  - Retroelement insertions (300 bp - 10 kb)
- Large-scale genomic copy number variation (>10 kb)
  - Large-scale deletions
  - Segmental duplications
- Local rearrangements
- Chromosomal variation
  - Translocation, inversion, fusion

Classes of structural variants
- Structural variants: genomic alterations involving a segment of DNA >1 kb
  - Quantitative (CNVs): deletions, duplications, insertions
  - Positional (translocations)
  - Orientational (inversions)
- A copy number polymorphism (CNP) is a CNV that occurs in >1% of the population

CNVs significantly overlap with known genes
Cooper et al 2007 Nat Genet 39: s22

Size distribution of CNV in a human genome

Different classes of mutation operating in the human genome
Freeman et al., Genome Res. 2006

2. Evolution of CNV detection technologies

Genotype known common CNVs using whole-genome arrays
- array-CGH: CNV only, test vs reference
  - NimbleGen: custom or whole-genome (up to 2.1M probes)
- Affy 6.0: >940,000 CNV non-polymorphic probes; high density in ~5,600 CNV regions in DGV, extended to whole-genome
- Illumina 1M: 36,000 CNV non-polymorphic probes covering ~4,000 CNV regions in DGV

Combine information across probes to identify new CNVs
- Birdseye: Affy 5.0, 6.0 (Korn et al 2008 Nat Genet 40: 1253)
- PennCNV: Affymetrix and Illumina (Wang et al 2007 Genome Res 17: 1665)
[Figure: probe intensity plot; surviving axis label: intensity of probe 2]

Identifying CNVs through genotyping errors
- Mendelian inconsistencies
  [Figure: pedigree genotypes; labels as extracted: GGGGA-AAG-GGA/G]
- Failure of Hardy-Weinberg equilibrium
Conrad et al 2006 Nat Genet 38: 75
McCarroll et al 2006 Nat Genet 38: 86

Sensitivity and throughput of various CNV detection techniques

3. CNV analysis via WGS

Sequencing types
- Single read
- Paired-end read
- Mate-pair read

Library types
- Many different library preps: DNA, mate-pair, mRNA, miRNA, ChIP
- Fragmentation
  - DNA: 300-500 nt
  - RNA: 150-200 nt
- Attachment of appropriate adapters
  - Complex: flow cell binding, F & R sequencing, BC
  - Custom: avoid if possible
- Removal of dimers/small inserts
- Amplification (or not)

Paired-end vs mate-pair
- Paired-end: sequencing from both fragment ends (< 1 kb)
- Mate-pair: longer (3-20 kb) molecules circularized via internal adapter

Identifying CNVs via targeted or whole-genome sequencing
Korbel et al 2007 Science 318: 420

Platform comparison

High-throughput DNA sequencing based methods to detect CNVs/SVs
1. Paired ends
2. Read depth
3. Split read
[Figure: the three approaches illustrated on a deletion relative to the reference genome; labels: sequenced paired-ends, mapping, reads, read count, zero level]

NGS technical need for CNV/SV detection
- Paired reads and insert size
- Detecting SVs by paired-end mapping
  - Insertion: false negative
- Split reads mapping
  - Insertion and deletion signatures

An example: CNV breakpoints determination
- CGH array vs NGS: a case of duplication
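The paired-end mapping idea outlined above can be sketched in Python. This is a minimal illustration and not the method of any tool named in these slides: the `ReadPair` record, its field names, the `classify_pair` function, and the 3-standard-deviation cutoff are all assumptions introduced for the example. It also assumes a standard paired-end library in which a concordant pair maps forward-then-reverse with a span close to the library insert size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One mapped read pair (hypothetical record, not a real file format)."""
    chrom1: str
    pos1: int      # leftmost mapped coordinate of mate 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost mapped coordinate of mate 2
    strand2: str
    read_len: int


def classify_pair(pair, mean_insert, sd_insert, k=3.0):
    """Label a pair by the SV signature it is consistent with.

    mean_insert / sd_insert describe the library insert-size distribution;
    k is an assumed cutoff in standard deviations.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"

    # Order the two mates by mapped position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    if left_strand == right_strand:
        # Both mates on the same strand: orientation is inconsistent.
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # Mates point away from each other (everted pair).
        return "tandem_duplication"

    # Mapped span from the start of the left mate to the end of the right mate.
    span = right_pos + pair.read_len - left_pos
    if span > mean_insert + k * sd_insert:
        # Mates map farther apart on the reference than in the sample.
        return "deletion"
    if span < mean_insert - k * sd_insert:
        # Mates map closer together on the reference than in the sample.
        return "insertion"
    return "concordant"
```

A span test like this can only flag insertions shorter than the library insert size, because a longer insertion cannot be bridged by any single pair. That limitation may be what the "Insertion: false negative" slide refers to. In practice a call would be made from a cluster of pairs supporting the same signature rather than from one pair.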
High-altitude adaptation of Tibetans
- The Tibetan Plateau, known as "the roof of the world", has an average elevation of over 4,500 meters and is the highest plateau in the world.

WinXPCNVer: Window-based Xross-Population Copy Number Variation searcher
Haiyi Lou, Ruiqing Fu
/PGG/resource.php
- A case of TSD homozygous deletion
- A case of TSD heterozygous deletion

A Tibetan-specific CNV
Lou et al., AJHG 2015
[Figure labels: V1, V2]
[Figure: UCSC Genome Browser view, hg19, chr1:56,129,000-56,134,000 (2 kb scale bar), with UCSC Genes (AK127270), RefSeq Genes, GenBank human mRNAs, ENCODE regulatory tracks (H3K27Ac, DNaseI hypersensitivity clusters, transcription factor ChIP-seq), 100-vertebrate conservation (PhyloP, Multiz), dbSNP 146 common SNPs, RepeatMasker, segmental duplications, self chain, and simple tandem repeats (TRF).]

De novo assembly

Some available computer software
Variant types scored on the slide: deletion, duplication, inversion, insertion (the per-tool marks did not survive extraction).

Approach             Software
Read depth           CNVnator, readDepth
Read pair            Breakdancer
Split read           Pindel
Combined approaches  GenomeSTRiP, Lumpy
De novo assembly     Cortex

4. CNV maps of Chinese populations

- The most significant gene identified in studies of Tibetan high-altitude adaptation
- A new method (WinXPCNVer) was developed and detected a key copy number variant that may regulate EPAS1 function

Genome-wide CNVs in Han Chinese
Lu et al. J Med Genet 2017

Basic workflow of NGS data analysis

Basic NGS workflow
- Sample DNA
- Sonication (using the energy of sound): usually results in fragments of ~700 bp
  - If a suitable fragment size is not achieved after shearing, gel size-selection can be used
- Library preparation
- PCR
- Similar for DNA and RNA (= cDNA) sequencing
- A suboptimal size range gives suboptimal sequencing results (Illumina sequencing: the ideal fragment size range is 300-700 bp)

Read length and pairing
TCGTACCGATATGCTGACTTAAGGCTGACTAGC
- Short reads are problematic, because short sequences do not map uniquely to the genome.
- Solution #1: Get longer reads.
- Solution #2: Get paired reads.

Next-generation sequencing vocabulary
- Base-pair (bp): basic building block of double-stranded DNA; unit of DNA segment length
- Read: continuous sequence produced by the sequencer
- Coverage: the number of short reads that overlap each other within a specific genomic region (how many times a particular base or region is read)
- Consensus sequence: idealised sequence in which each position represents the base most often found when many sequences are compared
- Contig: set of overlapping segments (reads) of DNA sequence forming a continuous consensus sequence
- Assembly: aligning and merging fragments of DNA sequence (reads, contigs) in order to reconstruct the original sequence
- Scaffold: set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of roughly known length
- Single vs paired-end sequencing
- Directional vs non-directional libraries/reads

Framework for variation discovery and genotyping from NGS
Steps for converting raw NGS data into a final set of SNP or genotype calls

Pipeline for today's exercise
1. raw.reads.mapping
2. remove.duplicates
3. local.realignment
4. BQrecalibration
5. haplotypecaller.gVCF.mode
6. GenotypeGVCFs.joint.calling
7. variant.quality.score.recal

File list in reference data
    Shuhuas-MacBook-Pro:release.v1.02 xushua$ cd ref
    Shuhuas-MacBook-Pro:ref xushua$ ls -l
    total 696
    -rw-r--r--@ 1 xushua staff 248066 Mar 16 23:07 dbsnp_147.GRCh37p13.chrMT.vcf
    -rw-r--r--@ 1 xushua staff   2462 Mar 16 23:07 dbsnp_147.GRCh37p13.chrMT.vcf.idx
    -rw-r--r--@ 1 xushua staff  17929 Mar 16 23:07 hapmap_3.3.b37.mt.vcf
    -rw-r--r--@ 1 xushua staff    330 Mar 16 23:07 hapmap_3.3.b37.mt.vcf.idx
    -rw-r--r--@ 1 xushua staff    164 Mar 16 23:07 human_g1k_v37.mt.dict
    -rw-r--r--@ 1 xushua staff  16924 Mar 16 23:07 human_g1k_v37.mt.fasta
    -rw-r--r--@ 1 xushua staff     19 Mar 16 23:07 human_g1k_v37.mt.fasta.amb
    -rw-r--r--@ 1 xushua staff    100 Mar 16 23:07 human_g1k_v37.mt.fasta.ann
    -rw-r--r--@ 1 xushua staff  16648 Mar 16 23:07 human_g1k_v37.mt.fasta.bwt
    -rw-r--r--@ 1 xushua staff     18 Mar 16 23:07 human_g1k_v37.mt.fasta.fai
    -rw-r--r--@ 1 xushua staff   4144 Mar 16 23:07 human_g1k_v37.mt.fasta.pac
    -rw-r--r--@ 1 xushua staff   8336 Mar 16 23:07 human_g1k_v37.mt.fasta.sa

Software used for the pipeline
    Shuhuas-MacBook-Pro:ref xushua$ cd ../software/
    Shuhuas-MacBook-Pro:software xushua$ ls -l
    total 45552
    -rwxr-xr-x@  1 xushua staff 13873912 Mar 16 23:07 GenomeAnalysisTK3.6.jar
    -rwxr-xr-x@  1 xushua staff  1009220 Mar 16 23:07 MarkDuplicates.jar
    -rwxr-xr-x@  1 xushua staff  2713272 Mar 16 23:07 bwa
    -rwxr-xr-x@  1 xushua staff   402692 Mar 17 13:07 bwa_mac
    drwxr-xr-x@ 16 xushua staff      512 Jun 22  2016 jdk1.8.0_101_mac
    drwxr-xr-x@ 16 xushua staff      512 Mar 16 21:36 jdk1.8.0_112
    -rwxr-xr-x@  1 xushua staff  4191194 Mar 16 23:07 samtools
    -rwxr-xr-x@  1 xushua staff  1114448 Mar 17 13:46 samtools_mac

Other files and scripts
- 000.create.pipeline.sh: generates the NGS scripts (fastq -> g.vcf)
- 100.joint.calling.sh: generates the joint-calling script (g.vcf -> vcf)
- data/: mtDNA fastq files for 20 individuals
- g.vcf/: pre-computed mtDNA g.vcf files for the same 20 individuals

2. Run 000.create.pipeline.sh to generate the scripts
    sh 000.create.pipeline.sh
This creates 5 folders, one for each of the first five steps (fastq -> g.vcf).

Run the five steps in order. The scripts use relative paths, so cd into each folder before running its script.

3.1 raw.reads.mapping
    cd 001.raw.reads.mapping/
    sh 001.raw.reads.mapping.sh
Running this step produces 4 files. ind1.bam is the mapping result, and ind1.unmapped.bam holds the reads that did not map.

3.2 remove.duplicates
    cd ../002.rem
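The requirement to cd into each step folder before running its script can also be met by a small driver. The following is a minimal Python sketch, not part of the course material: `run_steps` is a hypothetical helper, and the caller must supply the folder and script names, since only the first step's names appear in full in the text above.

```python
import subprocess
from pathlib import Path


def run_steps(root, steps):
    """Run each step script from inside its own folder, in the given order.

    `steps` is a list of (folder, script) pairs supplied by the caller,
    for example [("001.raw.reads.mapping", "001.raw.reads.mapping.sh")].
    Returns the list of folders whose scripts finished successfully.
    """
    finished = []
    for folder, script in steps:
        step_dir = Path(root) / folder
        # cwd=step_dir plays the role of `cd <folder>` before `sh <script>`,
        # so relative paths inside the script resolve from the step folder.
        # check=True raises CalledProcessError on a non-zero exit status,
        # which stops the run before later steps see missing inputs.
        subprocess.run(["sh", script], cwd=step_dir, check=True)
        finished.append(folder)
    return finished
```

Stopping at the first failure is a deliberate choice here: each step consumes the output of the previous one, so continuing after an error would only produce misleading downstream failures.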