Quantitative Genetics 12-13: Population Structure and Association Analysis

Population structure

Population structure means the "makeup" or composition of a population. By population structure, population geneticists mean that, instead of a single, simple population, populations are subdivided in some way. The overall "population of populations" is often called a metapopulation, while the individual component populations are often called subpopulations, local populations, or demes.

In fact, in many real populations there may not be any obvious individual populations or substructure at all, and the populations are continuous. However, even in effectively continuous populations, different areas can have different gene frequencies, because the whole metapopulation is not panmictic (randomly mating). For instance, among humans, Scotland, the North of England, and London have some quite major language differences, suggesting substructure, but you would be hard put to find an exact boundary where there is a changeover. Such populations are structured, but continuously, in space.

A very good definition of population structure is when populations have deviations from Hardy-Weinberg proportions, or deviations from panmixia. If there is inbreeding, or selection, or if migration is important, then populations can be said to be structured in some way.

Hardy-Weinberg law

Gene frequencies and genotype ratios in a randomly-breeding population remain constant from generation to generation.
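As a minimal sketch of how a deviation from Hardy-Weinberg proportions might be checked at a single biallelic locus, the Python function below compares observed genotype counts with the counts expected under p², 2pq and q². The function name and the example counts are illustrative assumptions and do not come from the slides.

```python
def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Return (p, expected_counts, chi2) for one biallelic locus.

    p is the estimated frequency of allele A; expected_counts are the
    genotype counts predicted by Hardy-Weinberg proportions.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")
    # Each AA individual carries two A alleles, each Aa individual one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    # Pearson chi-square statistic (1 degree of freedom for two alleles).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


# Made-up counts with too few heterozygotes.
p, expected, chi2 = hardy_weinberg_chi2(40, 20, 40)
print(p, expected, chi2)
```

With these made-up counts, heterozygotes are much rarer than expected (20 observed against 50 expected). A deficit of this kind is what inbreeding or the pooling of subpopulations with different allele frequencies would produce.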

Evolution involves changes in the gene pool. A population in Hardy-Weinberg equilibrium shows no change. If recessive alleles were continually tending to disappear, the population would soon become homozygous. Under Hardy-Weinberg conditions, genes that have no present selective value will nonetheless be retained.

When the Hardy-Weinberg Law Fails to Apply

Mutation

Gene Flow
Members of one population may breed with occasional immigrants from an adjacent population of the same species. This can introduce new genes or alter existing gene frequencies in the residents. In many plants and some animals, gene flow can occur between different (but still related) species (hybridization/introgression). In either case, gene flow increases the variability of the gene pool.

Genetic Drift
Allele frequencies change simply by chance. Not every member of the population will become a parent, and not every set of parents will produce the same number of offspring.

Nonrandom Mating
One of the cornerstones of the Hardy-Weinberg equilibrium is that mating in the population must be random. If individuals (usually females) are choosy in their selection of mates, the gene frequencies may become altered. Darwin called this sexual selection. Nonrandom mating seems to be quite common.

Method: testing for population structure

A standard approach involves sampling DNA from members of a number of potential source populations and using these samples to estimate allele frequencies in each population at a series of unlinked loci. Using the estimated allele frequencies, it is then possible to compute the likelihood that a given genotype originated from each population. Individuals of unknown origin can then be assigned to populations, on the basis of their genetic information, according to these likelihoods.

For example, when association mapping is used to find disease genes, the presence of undetected population structure can lead to spurious associations and thus invalidate standard tests.

Ancestry Models

Four main models for the ancestry of individuals:

No admixture model. Each individual comes purely from one of the K populations. The output reports the posterior probability that individual i is from population k. The prior probability for each population is 1/K. This model is appropriate for studying fully discrete populations and is often more powerful than the admixture model at detecting subtle structure.

Note on terminology: a prior probability describes a variable in the absence of some piece of evidence, whereas a posterior probability is the conditional probability after that evidence has been taken into account. A prior is often a purely subjective estimate made by an experienced expert. For example, for Obama's level of support p in a U.S. presidential election, a prior probability can express the uncertainty about p before any opinion poll has been conducted.
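The assignment idea behind the no-admixture model can be sketched in a few lines of Python. This is a simplified plug-in calculation, not what structure itself does: it assumes the allele frequencies of each population are known and nonzero, that loci are unlinked, and that each population is in Hardy-Weinberg proportions, whereas structure estimates frequencies and assignments jointly by MCMC. The function name and the frequencies below are made up for illustration.

```python
import math


def assignment_posterior(genotype, allele_freqs):
    """Posterior probability that one diploid individual comes from each population.

    genotype:     list of (allele1, allele2) tuples, one per locus
    allele_freqs: one entry per population; each entry is a list (one item
                  per locus) of dicts mapping allele code -> frequency
    A uniform prior of 1/K is placed on the K populations.
    """
    K = len(allele_freqs)
    log_post = []
    for pop in allele_freqs:
        total = math.log(1.0 / K)  # uniform prior
        for (a1, a2), freqs in zip(genotype, pop):
            # Hardy-Weinberg genotype probability within the population:
            # p^2 for homozygotes, 2*p1*p2 for heterozygotes.
            prob = freqs[a1] * freqs[a2] * (1 if a1 == a2 else 2)
            total += math.log(prob)
        log_post.append(total)
    # Normalise in log space to avoid underflow with many loci.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


# Two hypothetical populations, two loci, alleles coded 1 and 2.
freqs = [
    [{1: 0.9, 2: 0.1}, {1: 0.8, 2: 0.2}],  # population 1
    [{1: 0.2, 2: 0.8}, {1: 0.3, 2: 0.7}],  # population 2
]
posterior = assignment_posterior([(1, 1), (1, 2)], freqs)
print(posterior)
```

Working in log space keeps the calculation stable when many loci are multiplied together. The first population receives most of the posterior mass here because the individual is homozygous for an allele that is common there and rare in the second population.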

Admixture model. Each individual draws some fraction of his/her genome from each of the K populations; individuals may have mixed ancestry. This is modeled by saying that individual i has inherited some fraction of his/her genome from ancestors in population k. The output records the posterior mean estimates of these proportions. Conditional on the ancestry vector, $q^{(i)}$, the origin of each allele is independent. We recommend this model as a starting point for most analyses. It is a reasonably flexible model for dealing with many of the complexities of real populations. Admixture is a common feature of real data, and you probably won't find it if you use the no-admixture model.

Linkage model. This is essentially a generalization of the admixture model to deal with "admixture linkage disequilibrium", i.e., the correlations that arise between linked markers in recently admixed populations. The basic model is that, t generations in the past, there was an admixture event that mixed the K populations. If you consider an individual chromosome, it is composed of a series of "chunks" that are inherited as discrete units from ancestors at the time of the admixture. Admixture LD arises because linked alleles are often on the same chunk, and therefore come from the same ancestral population. The sizes of the chunks are assumed to be independent exponential random variables with mean length 1/t (in Morgans). In practice we estimate a "recombination rate" r from the data that corresponds to the rate of switching from the present chunk to a new chunk. Each chunk in individual i is derived independently from population k with probability $q^{(i)}_k$, where $q^{(i)}_k$ is the proportion of that individual's ancestry from population k.

Using prior population information. By default, structure uses only genetic information to learn about population structure. However, there is often other information that might be relevant to the clustering (e.g., physical characteristics of sampled individuals or geographic sampling location). At present, structure can use this information in two ways. First, the user might find that the pre-defined groups (e.g., sampling locations) correspond almost exactly to structure clusters. Second, prior information may be introduced through the use of learning samples: i.e., some individuals are of known origin, and are used to classify individuals of unknown origin. For example, Beaumont et al. (2001) wanted to learn about the ancestry of Scottish wildcats (many of which are hybridized with feral domestic cats). They had genetic data from a number of pet house cats, which were defined as being in one population, and they inferred Q for the wildcats (with K=2). Use of this sort of prior information will normally improve the accuracy of the inference.

Allele frequency models

Two basic models. One model assumes that the allele frequencies in each population are independent draws from a distribution that is specified by a parameter called $\lambda$; $\lambda = 1$ is the default setting. The other model uses correlated allele frequencies. This says that frequencies in the different populations are likely to be similar (probably due to migration or shared ancestry).

The independent model works well for many data sets. Roughly speaking, this prior says that we expect allele frequencies in different populations to be reasonably different from each other. The correlated frequencies model says that they may actually be quite similar. This often improves clustering for closely related populations, but may increase the risk of over-estimating K. If one population is quite divergent from the others, the correlated model can sometimes achieve better inference if that population is removed.

Estimating $\lambda$: Fixing $\lambda = 1$ is a good idea for most data, but in some situations (e.g., SNP data where most minor alleles are rare) smaller values may work better. For this reason, you can get the program to estimate $\lambda$ for your data. You may want to do this once, perhaps for K=1, and then fix $\lambda$ at the estimated value thereafter, because there seem to be some problems with non-identifiability when trying to estimate too many of the hyperparameters ($\lambda$, $\alpha$, F) at the same time.

Estimation of K (the number of populations)

Take care, for two reasons: (1) it is computationally difficult to obtain accurate estimates of Pr(X|K), and our method merely provides an ad hoc approximation, and (2) the biological interpretation of K may not be straightforward.

The procedure for estimating K generally works well in data sets with a small number of discrete populations. However, many real-world data sets do not conform precisely to the structure model (e.g., due to isolation by distance or inbreeding). In those cases there may not be a natural answer to what is the "correct" value of K. Perhaps for this kind of reason, it is not infrequent that in real data the value of our model choice criterion continues to increase with increasing K. Then it usually makes sense to focus on values of K that capture most of the structure in the data and that seem biologically sensible.

Steps in estimating K

1) (Command-line version) Set COMPUTEPROBS and INFERALPHA to 1 in the file extraparams. (Front End version) Make sure that $\alpha$ is allowed to vary.
2) Run the MCMC scheme for different values of MAXPOPS (K). At the end it will output a line "Estimated Ln Prob of Data". This is the estimate of ln Pr(X|K). You should run several independent runs for each K, in order to verify that the estimates are consistent across runs. If the variability across runs for a given K is substantial compared to the variability of estimates obtained for different K, you may need to use longer runs or a longer burn-in period. If ln Pr(X|K) appears to be bimodal or multimodal, the MCMC scheme may be finding different solutions. You can check for this by comparing the Q for different runs at a single K.
3) Compute posterior probabilities of K. For example, for Data Set 2A (where the true K was 2), we got

K    ln Pr(X|K)
1    -4356
2    -3983
3    -3982
4    -3983
5    -4006

We can start by assuming a uniform prior on K = 1, ..., 5. Then from Bayes' Rule, Pr(K=2) is given by

$$\Pr(K=2) = \frac{e^{-3983}}{e^{-4356} + e^{-3983} + e^{-3982} + e^{-3983} + e^{-4006}} = \frac{e^{-1}}{e^{-374} + e^{-1} + e^{0} + e^{-1} + e^{-24}} \approx 0.21$$

Mild departures from the model can lead to overestimating K

When there is real population structure, this leads to LD among unlinked loci and departures from Hardy-Weinberg proportions. But some departures from the model can also lead to Hardy-Weinberg or linkage disequilibrium. Beginning in Version 2, we have suggested that the correlated allele frequency model should be used as a default because it often achieves better performance on difficult problems, but the user should be aware that this may make it easier to overestimate K in such settings than under the independent frequencies model (Falush et al., 2003a).

How to decide whether inferred structure is real

Informal pointers for choosing K; is the structure real?

There are a couple of informal pointers which might be helpful in selecting K. The first is that it's often the situation that Pr(K) is very small for K less than the appropriate value (effectively zero), and then more-or-less plateaus for larger K, as in the example of Data Set 2A shown above. In this sort of situation, where several values of K give similar estimates of log Pr(X|K), it seems that the smallest of these is often "correct".

It is a bit difficult to provide a firm rule for what we mean by "more-or-less plateaus". For small data sets, this might mean that the values of log Pr(X|K) are within 5-10. In very big data sets, the difference between K=3 and K=4 may be 50, but if the difference between K=3 and K=2 is 5,000, then I would definitely choose K=3. We may not always be able to know the TRUE value of K, but we should aim for the smallest value of K that captures the major structure in the data.

The second pointer is that if there really are separate populations, there is typically a lot of information about the value of $\alpha$, and once the Markov chain converges, $\alpha$ will normally settle down to be relatively constant (often with a range of perhaps 0.2 or less). However, if there isn't any real structure, $\alpha$ will usually vary greatly during the course of the run.

Suppose that you have a situation with two clear populations, but you are trying to decide whether one of these is further subdivided (i.e., the value of Pr(X|K=3) is similar to, or perhaps a little larger than, Pr(X|K=2)). Then one thing you could try is to run structure using only the individuals in the population that you suspect might be subdivided, and see whether there is a strong signal as described above.

In summary, you should be skeptical about population structure inferred on the basis of small differences in Pr(K) if there is no clear biological interpretation for the assignments, and the assignments are roughly symmetric to all populations and no individuals are strongly assigned.

Isolation by distance data

Isolation by distance refers to the idea that individuals may be spatially distributed across some region, with local dispersal. In this situation, allele frequencies vary gradually across the region. The underlying structure model is not well suited to data from this kind of scenario. When this occurs, the inferred value of K, and the corresponding allele frequencies in each group, can be rather arbitrary. Depending on the sampling scheme, most individuals may have mixed membership in multiple groups. That is, the algorithm will attempt to model the allele frequencies across the region using weighted averages of K distinct components. In such situations, interpreting the results may be challenging.

When FST is significant, but structure finds no structure

We occasionally get the following sort of question: "I have genotype data for individuals sampled from n locations. Tests of allele frequency differences indicate small but significant FST
between at least some locations. However structure does not find any differences. How do I interpret these results?"

When the predefined populations correspond closely to genetic populations, testing for frequency differences between predefined groups can be more powerful than applying structure. This is because the basic structure models aim to solve a much harder statistical problem, i.e., identifying population clusters without being told the likely subgroups in advance. For this reason there is a part of parameter space where there is not quite enough data for structure to get the "right" answer, even though a test of FST using the predefined labels detects population differentiation.

Association mapping

Association mapping refers to significant association of a molecular marker with a phenotypic trait. Linkage disequilibrium (LD; special case: LA) refers to non-random association between two markers or two genes/QTLs or between a gene/QTL and a marker locus. Thus, association mapping is actually one of the several uses of LD. In a statistical sense, association refers to covariance of a marker polymorphism and a trait of interest, while LD represents covariance of polymorphisms exhibited by two molecular markers/genes.

How Is LD Measured?

A variety of statistics have been used to measure LD. The two most common statistics for measuring LD are $r^2$ and $D'$. Consider a pair of loci with alleles A and a at locus one, and B and b at locus two, with allele frequencies $\pi_A$, $\pi_a$, $\pi_B$ and $\pi_b$, respectively. The resulting haplotype frequencies are $\pi_{AB}$, $\pi_{Ab}$, $\pi_{aB}$ and $\pi_{ab}$. The basic component of all LD statistics is the difference between the observed and expected haplotype frequencies,

$$D = \pi_{AB} - \pi_A \pi_B$$

The two statistics differ in how this difference is scaled:

$$r^2 = \frac{D^2}{\pi_A \pi_a \pi_B \pi_b}, \qquad |D'| = \frac{|D|}{D_{\max}}$$

where $D_{\max} = \min(\pi_A \pi_b, \pi_a \pi_B)$ if $D > 0$ and $D_{\max} = \min(\pi_A \pi_B, \pi_a \pi_b)$ if $D < 0$.

Haplotype: a combination of alleles at multiple linked loci that are transmitted together.

D' is scaled based on the observed allele frequencies, so it will range between 0 and 1 even if allele frequencies differ between the loci.

[Figure: LD under four scenarios. (A) No recombination (mutations at two linked loci not separated in time); (B) Independent assortment (mutations at two loci not separated in time); (C) No recombination (only mutations separated in time); (D) Low recombination (mutations at two loci not separated in time).]

LD decay

Factors affecting LD

The factors which lead to an increase in LD include inbreeding, small population size, genetic isolation between lineages, population subdivision, low recombination rate, population admixture, natural and artificial selection, balancing selection, etc.

Admixture results in the introduction of chromosomes of different ancestry and allele frequencies. Often, the resulting LD extends to unlinked sites, even on different chromosomes, but breaks down rapidly with random mating.

A reduction in population size (bottleneck), with accompanying extreme genetic drift, also increases LD. During a bottleneck, only a few allelic combinations are passed on to future generations.

Some other factors, which lead to a decrease/disruption in LD, include outcrossing, high recombination rate, high mutation rate, etc. Generally, LD decays more rapidly in outcrossing species as compared to selfing species. This is because recombination is less effective in selfing species, where individuals are more likely to be homozygous, than in outcrossing species.

There are other factors which may lead to either an increase or a decrease in LD, or may increase LD between some pairs of alleles and decrease LD between other pairs. For instance, mutations will disrupt LD between pairs involving wild alleles, and will promote LD between pairs involving mutant alleles. Similarly, genomic rearrangements may disrupt LD between genes separated due to rearrangement, but LD may increase between new gene combinations in the vicinity of breakpoints due to suppression of local recombination.

Gene conversion is a nonreciprocal transfer of genetic information in DNA genetic recombination, which occurs during meiotic division. It is a process by which DNA sequence information is transferred from one DNA helix (which remains unchanged) to another DNA helix, whose sequence is altered. It is one of the ways a gene may be mutated. Gene conversion may lead to non-Mendelian inheritance and has often been recorded in fungal crosses.

Other factors affecting LD include population structure, epistasis, gene conversion and ascertainment bias. Ascertainment bias (AB) is the bias introduced by the criteria used to select individuals and/or loci in which genetic variation is assayed, so that it leads to
inaccurate estimates of LD. Ascertainment is the way individuals with a trait are selected or found for genetic studies, and bias is a difference between the estimated and true value of LD in a statistical sample.

Mutation provides the raw material for producing polymorphisms that will be in LD. Recombination is the main phenomenon that weakens intrachromosomal LD, whereas interchromosomal LD is broken down by independent assortment. Population size also plays an important role. In small populations, the effects of genetic drift result in the consistent loss of rare allelic combinations, which increases LD levels. When genetic drift and recombination are at equilibrium,

$$E(r^2) = \frac{1}{1 + 4Nc}$$

where N is the effective population size and c is the recombination fraction between sites.

In any organism, LD can be used for identifying genomic regions which have been the targets of natural selection (directional and balancing selection) during the evolutionary process.

Natural selection
The process in nature by which, according to Darwin's theory of evolution, only the organisms best adapted to their environment tend to survive and transmit their genetic characteristics in increasing numbers to succeeding generations, while those less adapted tend to be eliminated.

Positive natural selection is the force that drives the increase in prevalence of advantageous traits, and it has played a central role in our development as a species. It is the selective process that fixes alleles which raise individual fitness because they carry advantageous mutations. Positive selection (also called Darwinian selection or adaptive selection) is the process by which new advantageous genetic variants sweep a population.

Genetic hitchhiking is the process by which an evolutionarily neutral or in some cases deleterious allele or mutation may spread through the gene pool by virtue of being linked to a gene that is positively selected.

Directional selection occurs when natural selection favors a single phenotype and therefore allele frequency continuously shifts in one direction. Under directional selection, the advantageous allele will increase in frequency independently of its dominance relative to other alleles (i.e., even if the advantageous allele is recessive, it will eventually become fixed). Directional selection stands in contrast to balancing selection where selection may favor multiple alleles, and is the same as purifying selection which removes deleterious mutations from a population.

Purifying selection
Purifying selection refers to selection against nonsynonymous substitutions at the DNA level. In this case, the evolutionary distance based on synonymous substitutions is expected to be greater than the distance based on nonsynonymous substitutions.

Balancing selection refers to a number of selective processes by which multiple alleles (different versions of a gene) are actively maintained in the gene pool of a population at frequencies above that of gene mutation.

A non-synonymous mutation is under positive selection pressure when it first arises.

Structure 2.0 (population structure)
/structure.html

Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.

Department of Human Genetics, University of Chicago

The program structure implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers.

A Java Runtime Environment (JRE version > 1.5.0) by Sun Microsystems is required before structure installation. The compatible JRE for various operating systems can be downloaded free from /download.

Front End (analysis start screen)

The front end organizes data analysis into "projects". Each project is connected to a single data file. When creating a project, the user also provides information that specifies how to read the data file (number of loci, number of individuals, etc). These are characteristics of the data file, and are always the same within this project.

Parameters in file extraparams: used to refine the model in various ways.

Boolean (yes/no) options: type 1 for "Yes", or "Use this option"; 0 for "No" or "Don't use this option".

Program options

NOADMIX (Boolean) Assume the model without admixture (each individual is assumed to be completely from one of the K populations). In the output, instead of printing the average value of Q as in the admixture case, the program prints the posterior probability that each individual is from each population. 1 = no admixture; 0 = model with admixture.

LINKAGE (Boolean) Use the linkage model. RLOG10START sets the initial value of recombination rate r per unit distance. RLOG10MIN and RLOG10MAX set the minimum and maximum allowed values for log10 r. RLOG10PROPSD sets the size of the proposed changes to log10 r in each update.

Each project also contains one or more "parameter sets". These allow the user to specify the details of the MCMC runs, including the number of repetitions, burn-in length, etc, as well as specifying the model of analysis (e.g., whether to allow admixture, models of allele frequencies, etc). The user can then run the Markov chain at chosen values of K, for a given parameter set. The front end stores various summaries of the results, including a number of graphical plots, described below.

Building a project

First you need to construct an input file. Now, click on File -> New Project. This opens up a wizard to import the data (Figure 2). The data are copied from the specified input file into the work directory chosen for the project. The wizard consists of four frames:

1. Specify the project directory, project name, and input data file.
2. Specify the basic characteristics of the data file: number of individuals, ploidy of the data (enter '2' for diploid organisms), number of loci, and the value that is used to indicate missing data. Click on "Show data file format" to get a summary of the lengths and number of lines in the data file.

Format for the data file

Essentially, the entire data set is arranged as a matrix in a single file, in which the data for individuals are in rows, and the loci are in columns. For a diploid organism, data for each individual can be stored either as 2 consecutive rows, where each locus is in one column, or in one row, where each locus is in two consecutive columns. Unless you plan to use the linkage model (see below) the order of the alleles for a single individual does not matter.

Genotype Data (Required; integer): Each allele at a given locus should be coded by a unique integer (e.g., microsatellite repeat score). The front end requires returns at the ends of each row, and does not allow returns within rows; the command-line version of structure treats returns in the same way as spaces or tabs.

[Figure: annotated data file layout. Row labels: marker names; recessive alleles; inter-marker distances (in map order within linkage groups; -1 = unlinked); phase information. Column labels: individual ID; population of origin of the sample; population-information flag (use it or not).]

Sample data file: POPDATA=1, NUMINDS=7, NUMLOCI=5, and MISSING=-9. Also, POPFLAG=0, PHENOTYPE=0, EXTRACOLS=0. The second column shows the geographic sampling location of individuals. The label need not be written out.

3. (Rows) Specify which, if any, of the optional extra row data are present: row of marker names; row of inter-marker distances; and a row of phase data after each individual. Also tick the "single line" box if data for each individual are stored in a single row, instead of in the standard format of two rows per individual.
4. (Columns) Specify which of the optional column data are there: Individual ID (LABEL); Population of origin (POPDATA); USEPOPINFO flag, a flag that says to use the POPDATA information for certain individuals when using the prior population information model; phenotype data (for use in association mapping (Pritchard et al., 2000b)); other extra columns of data prior to the genotype data that should be ignored by structure.

When you've finished these steps, you'll get a summary of the data format; if this looks correct, click on 'proceed'. The program will now attempt to load the data file and create the new project.

Configuring a parameter set

Once you've successfully loaded a data file, you are ready to start running structure. You will create o
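To make the layout described under "Format for the data file" concrete, here is a minimal Python sketch that reads a simplified file with two rows per diploid individual, a label column and a POPDATA column. It is not part of structure. The function name, its parameters and the sample rows are assumptions made for illustration, and optional header rows (marker names, inter-marker distances) and extra columns are not handled.

```python
def parse_structure_data(text, num_loci, missing=-9, has_label=True, has_popdata=True):
    """Parse a simplified two-rows-per-individual genotype matrix.

    Returns a list of dicts with keys "label", "pop" and "genotypes";
    "genotypes" holds one (allele1, allele2) tuple per locus, with None
    standing for the missing-data code.
    """
    n_extra = int(has_label) + int(has_popdata)

    def alleles(row):
        # Genotype columns follow the optional label/POPDATA columns.
        return [None if int(x) == missing else int(x) for x in row[n_extra:]]

    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2:
        raise ValueError("expected two rows per diploid individual")

    individuals = []
    for top, bottom in zip(rows[0::2], rows[1::2]):
        for row in (top, bottom):
            if len(row) != n_extra + num_loci:
                raise ValueError(
                    f"row has {len(row)} fields, expected {n_extra + num_loci}"
                )
        label = top[0] if has_label else None
        pop = int(top[int(has_label)]) if has_popdata else None
        genotypes = list(zip(alleles(top), alleles(bottom)))
        individuals.append({"label": label, "pop": pop, "genotypes": genotypes})
    return individuals


# Made-up data: 2 individuals, 3 loci, -9 marks missing alleles.
sample = """\
ind1 1 -9 145 66
ind1 1 -9 -9 64
ind2 2 106 142 68
ind2 2 106 148 64
"""
data = parse_structure_data(sample, num_loci=3)
for ind in data:
    print(ind["label"], ind["pop"], ind["genotypes"])
```

Pairing the two rows of each individual into per-locus tuples mirrors the statement that allele order within an individual does not matter unless the linkage model is used.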
