計算機體系結(jié)構(gòu)課件_第1頁
計算機體系結(jié)構(gòu)課件_第2頁
計算機體系結(jié)構(gòu)課件_第3頁
計算機體系結(jié)構(gòu)課件_第4頁
計算機體系結(jié)構(gòu)課件_第5頁
已閱讀5頁,還剩56頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領

文檔簡介

BranchInstructions

Sima,FountainandKacsuk

Chapter8CSE33041BranchInstructions

Sima,FouMajorChapterGoalsTounderstandhowtominimisetheperformancedegradationofbranchesBasicapproachtobranchhandlingDelayedbranchingBranchprocessingMulti-waybranchingGuardedExecution2MajorChapterGoalsTounderstaTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bneta3TypesofbranchinstructionsUnTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bnetaBRtata:BSRtata:RETta:SUBLR1,1,R1BLTR1,taBNER1,tata:4TypesofbranchinstructionsUnWhyworryaboutdifferenttypesofbranches?BranchescausestallsofpipelinesDifferenttechniquesforminimizingbrancheffectsTakeadvantageofdifferenttypeofbranche.g.UnconditionalbranchALWAYSoccurs.Therefore,hardwarecanplanforthebranchinadvance.e.g.differencebetweenloopclosingandnormalconditional5WhyworryaboutdifferenttypeWaysofcheckingconditionConditionalbranchesneedtoevaluateapredicateTwomainapproachesResultstateIBM360,370,PDP-11,VAX-11,x86,Pentium,MC68000,Sparc,PowerPC.DirectCheckPDP-10,Cyber/70,PDP-8,CRAY,MIPS,HPPA,DecAlpha6WaysofcheckingconditionCondResultStateResultstateisdeclaredtoholdstatusinformationrelatedtoresultofoperationTypicalimplementationisconditioncodesorflagregisterswhichareupdatedaftereveryarithmeticresultisproducedConditionalbranchinstructionsinterrogatetheflagsinsubsequentinstructions7ResultStateResultstateisdeResultstatedisadvantagesThegenerationofresultstateisnotstraightforward;irregularstructureoccupiesadditionalchipareaMakespipelinelongerALUTest>0<0=01ClockCycle8ResultstatedisadvantagesTheResultstatedisadvantages...Sequentialinconcept.HowdowepackinstructionsinSuperscalarandVLIWmachine?addr1,r2,r3subr5,r6,r7bltta1bgtta2addr1,r2,r3subr5,r6,r7bltta1bgtta29Resultstatedisadvantages...DirectCheckNoresultstateisdeclaredSpecifiedconditionsaredirectlycheckedbyexplicitinstructionsConditionalbranchingcanberequestedifthespecifiedconditionsaremet.MayinvolveoneortwoinstructionsFitsbetterintosuperscalararchitecturesShorterclockcyclebecauseonlycheckwhennecessary10DirectCheckNoresultstateisComparisonofconditionalbranchesTwoinstructionImplementationadd r1,r2,r3cmpeq r7,r1,0bt r7,labeldiv r5,r4,r1OneinstructionImplementationadd r1,r2,r3beq r1,labeldiv r5,r4,r111ComparisonofconditionalbranTheeffectofbranchesNeedtounderstandwhetherbranchesaretakenornotThentailorthearchitecturetoperformfastestonmostcommoncase.Itturnsoutthatmostbranchesaretaken…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12TheeffectofbranchesNeedtoBranchstatistics...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranch~1/3~1/3~1/3Takenforfirstn-1iterationsTakenNotTaken~1/6~1/6Taken~5/613Branchstatistics...UnconditiBranchHandlingUtilizingbranchdelayslotsHandlingofunresolvedconditionalbranchesAvoidingcond.branchesDelayedBranchingPerformanceBlockingbranchproc.Speculativebranchproc.Multiwaybranchproc.BranchprocessingGuardedExecution14BranchHandlingUtilizingbrancBranchHandlingDelayedbranchingUtiliseotherwisewastedcyclesfollowingbranches.AchievedbyinsertinganinstructionbehindthebranchandexecutingitbeforethebranchUtilizedonEarlyandSubsequentRISCarchitectures15BranchHandlingDelayedbranchiDelayedbranchingFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDeadDeadaddbsubadd r1,r2,r3b anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r616DelayedbranchingFetchDec/RdALDelayedbranching...FetchDec/RdALUWriteFetchDec/RdALUWriteaddbdivadd r1,r2,r3bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6FetchDec/RdALUWriteFetchDec/RdALUWriteaddDelayedBranchFetchDec/RdALUWritesub17Delayedbranching...FetchDec/DelayedbranchingoptionsCanreducetosingledelayslotiftargetaddressavailableatendofdecodephaseCompilerplacesNOPsinslotsandmigratesinstructionsintoslotsusingcodemigrationCompilermustperformdataflowanalysis18DelayedbranchingoptionsCanrCodemigrationtofillslotsadd r8,r9,r10bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6NOPNOPadd r3,r2,r6bd anywherediv r5,r4,r1anywhere: sub ………..add r8,r9,r1019CodemigrationtofillslotsadDoesthisworkwithconditionalbranches/?Yes,butcodemigratedintodelayslotmustbeperformedunconditionallyadd r8,r9,r10beq r6,anywherediv r5,r4,r1add r3,r2,r6NOPNOPadd r3,r2,r6div r5,r4,r1add r8,r9,r10beq r6,anywhere20DoesthisworkwithconditionaOptimisations-AnnulmentAnnuldelayslotifbranchnottakenUsefulforbackwardconditionalbranchesMakesitpossibletomovealoopbodyinstructionintothedelayslotAnnuldelayslotifbranchistakenUsefulforforwardconditionalbranchesMakesitpossibletorelocateaninstructionfromsequentialpathintothedelayslot21Optimisations-AnnulmentAnnulAnnuldelayslotifbranchnottaken1234BrCDelayLoopbodyConditionalBranchwithnoAnnulmentLooprequires5instructions22AnnuldelayslotifbranchnotAnnuldelayslotifbranchnottaken...1234BrCDelayLoopbodyConditionalBranchwithAnnulment1Looprequires4instructions23AnnuldelayslotifbranchnotAnnuldelayslotifbranchistaken1ConditionalBranchwithnoAnnulment23BrCDelay4524AnnuldelayslotifbranchisAnnuldelayslotifbranchistaken1ConditionalBranchwithAnnulment23BrC4525AnnuldelayslotifbranchisFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteBranchdetectionIngeneral,theearlierabranchisdetected,thebetterthehandlingReducesnumberofdelayslotsrequiredEarlybranchdetectionInparallelLookaheadIntegratedwithfetchFetchDec/RdALUWrite26FetchDec/RdALUWriteFetchDec/RdEarlybranchdetectionInParallelBranchesaredetectedinparallelwithdecodeofotherinstructionsusingadedicatedbranchdecoderLookaheadBranchesaredetectedfromtheinstructionbufferbutaheadofgeneralinstructiondecodingIntegratedBranchesaredetectedduringinstructionfetch27EarlybranchdetectionInParalConditionalbranchesTheproblemwithconditionalbranchesisthatwemaynotknowuntillateintheinstructionwhetherwewishtotakeorrejectthecontroltransferTakebranch?FetchDec/RdALUWritebz r1,tagetbzreadr1cmp0WritePC28ConditionalbranchesTheprobleHandlingunresolvedconditionalbranchesBlockingbranchprocessingSpeculativebranchprocessingBranchpredictionExtentofspeculationRecoveryfrommis-predictionMulti-waybranching29HandlingunresolvedconditionaBlockingbranchprocessingTrivialapproachExecutionofconditionalissimplystalleduntilconditionisknownResultstateCanpossiblyorderinstructionstoreducedelayinwaitingforconditioncodestobesetDirectcheckingNeedtowaitforALUresultinthisinstructionSetCCBrCFetchDec/RdALUWrite30BlockingbranchprocessingTrivSpeculativebranchprocessingPipelinestallscanbeavoidedAfterdetectionofunresolvedconditionalbranchaguessismadeoftheoutcomeIfguessiscorrectSpeculationconfirmedandcontinuedexecutionIfguessisincorrectInstructionsdiscardedFetchrestarted31SpeculativebranchprocessingPBranchpredictionschemesFixedpredictionTruepredictionA“true”guessismadeStaticPredictionBasedonobjectcodeDynamicPredictionBasedonexecutionhistory32BranchpredictionschemesFixedFixedPrediction

(GuessNotTaken)DetectunresolvedconditionalbranchandguessasnottakenContinuewiththeexecutionofthesequentialpath,butstarttheexecutionoftakenpath(e.g.calculateBTA)Whenconditionalbecomesavailable,checktheguessIfcorrect,continuewithexecutiondeletetakenpathpre-processingIfincorrect,deletespeculativeexecutionandcontinuewithtakenpath33FixedPrediction

(GuessNotFixedPrediction

(GuessNotTaken)ConditionKnownEvaluateBTACORRECT34FixedPrediction

(GuessNotFixedPrediction

(GuessNotTaken)ConditionKnownEvaluateBTAINCORRECT35FixedPrediction

(GuessNotFixedPrediction

(GuessTaken)ConditionKnownEvaluateBTAINCORRECT36FixedPrediction

(GuessTakeStaticPredictionLikefixedprediction,butsomepropertiesoftheobjectcodeareusedtodecidewhethertousealwaystakenornottakenHaspossibilityofperformingbettersincesomeinformationisused.37StaticPredictionLikefixedprStaticPrediction...TypicaloptionsareOpcodebasedpredictionForcertainopcodesthebranchisassumedtobetaken,forothersnottaken.DisplacementbasedpredictionIfdisplacement<0taken,elsenottakenCompilerdirectedPredictbitininstructionsetbycompileranalysisortracedataOpBTASignbit38StaticPrediction...TypicaloDynamicpredictionPredictionismadeonbranchhistoryPhilosophyisthathistoryisgoodguidetofuturebehaviourGoodforloopsbecausetend

toiteratemultipletimesGoodforsomeconditionalbranchesdependingonbehaviourofdataGoodforexceptionconditioncheckingWhatHappenedLastTime?39DynamicpredictionPredictioniDynamicPredictionTechniquesExplicittechniqueBranchhistoryexplicitlystatedhistorybits1,2or3bitschemesNumberofbitsrepresentlikelihoodofbranchbeingtakeninfutureImplicittechniqueBranchbeing“recorded”isanindicationthatbranchwastakenRoughlyequivalentto1bitscheme40DynamicPredictionTechniquesEBranchHistoryBitsForeachbranch,recordastatusfield.OnebitstatusDidlastoccurrenceofbranchoccur?TwobitstatusStatetabledecides

whichstateThreebitschemeHistoryoflast3branchesMajoritydecisiononoutcome41BranchHistoryBitsForeachbrStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTPredictionTakenPredictionNotTaken42Statetransitionsin2bitschStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTStronglyNOTTakenANTWeaklyNOTTakenATWeaklyTakenATANTWeaklyNOTTakenATWeaklyTaken43Statetransitionsin2bitschImplicitDynamicSchemeTwomainschemesBranchTargetAccessCache(BTAC)BranchTargetInstructionCache(BTIC)BothschemesintroduceanextracacheEntriesareonlystoredfortakenbranchesNottakenbranchesarenotstoresIfanentryisincache,thennextbranchistakenBehaveslike1bit44ImplicitDynamicSchemeTwomaiBTACandBTICimplementationStoreBranchAddressPLUSThetargetaddressitself(BTAC)Theinstructionatthetarget(BTIC)10002000LoadR1,….100020001000LoadR1,….45BTACandBTICimplementationStBTACimplementation...LoadR1,….1000200010002000Don’ttakebranchSpeculativelyWhoops!AddthisbranchaddresstothecacheNexttimeTakebranchSpeculatively46BTACimplementation...LoadR1PerformancedifferencesinBTACandBTICBTACdeliversaddressofbranchintimefornextfetchBTICdeliversactualinstruction.CanbeslowerbecauseitdoesnotrequirefetchLookupBTACFetchDec/RdExeWriteFetchDec/RdExeWriteLookupBTIC47PerformancedifferencesinBTAImplementationofHistorybitsPlacementofhistorybitsI-cacheAlpha,UltraSparcBranchHistoryTable(BHT)PowerPC604,620,R10000BranchTargetAddressCache(BTAC)MC68060,Pentium,R8000I-cacheandBHTeffectivelythesame48ImplementationofHistorybitsI-CacheandBHT(PowerPC604)I-cache16KFourwaysetassociativeInstructionfetchaddressBHT128x4entriesPredictionLogic4instr./cycle2HistoryBitsTaken/NottakenBTAforatakenguessDecodeQueue4x1Instr.IssueQueue49I-CacheandBHT(PowerPC604)PredictionAccuracyP=fc*Pc+fm*Pmwherefc: Probabilityofcorrectlypredictingbranchesfm: Probabilityofmis-predictingbranchesPc: PenaltyofcorrectlypredictedbranchesPm: Penaltyofmis-predictedbranches50PredictionAccuracyP=fc*PcPredictionAccuracy….51PredictionAccuracy….51RecoveryfromMispredictionIfbranchpredictionhardwaremakeswrongguess,needtoreverttoalternativepathofexecution.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrite?52RecoveryfromMispredictionIfRecoveryfromMisprediction...Forpipelinedmachines,maybepossibletoabortregisterwritesStoreinstructionsarehardertorecoverConditioncodesmightalsoneedtoberestored.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteRegisterFile53RecoveryfromMisprediction..Schemestoshortenmis-predictionrecoveryBasicpriormeasuresforrecoveryIna“taken”guess,savesequentialaddressIna“nottaken”guess,pre-calculateandsavebranchtargetaddressRequirestwoaddressregistersperspeculatedconditionalbranchSavethisaddressSavethis

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論