




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領
文檔簡介
BranchInstructions
Sima,FountainandKacsuk
Chapter8CSE33041BranchInstructions
Sima,FouMajorChapterGoalsTounderstandhowtominimisetheperformancedegradationofbranchesBasicapproachtobranchhandlingDelayedbranchingBranchprocessingMulti-waybranchingGuardedExecution2MajorChapterGoalsTounderstaTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bneta3TypesofbranchinstructionsUnTypesofbranchinstructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranchAlpha: BRtaPowPC: bta BSRta blta RETta bclr BTLR1,ta bdnzta BNER1,ta bnetaBRtata:BSRtata:RETta:SUBLR1,1,R1BLTR1,taBNER1,tata:4TypesofbranchinstructionsUnWhyworryaboutdifferenttypesofbranches?BranchescausestallsofpipelinesDifferenttechniquesforminimizingbrancheffectsTakeadvantageofdifferenttypeofbranche.g.UnconditionalbranchALWAYSoccurs.Therefore,hardwarecanplanforthebranchinadvance.e.g.differencebetweenloopclosingandnormalconditional5WhyworryaboutdifferenttypeWaysofcheckingconditionConditionalbranchesneedtoevaluateapredicateTwomainapproachesResultstateIBM360,370,PDP-11,VAX-11,x86,Pentium,MC68000,Sparc,PowerPC.DirectCheckPDP-10,Cyber/70,PDP-8,CRAY,MIPS,HPPA,DecAlpha6WaysofcheckingconditionCondResultStateResultstateisdeclaredtoholdstatusinformationrelatedtoresultofoperationTypicalimplementationisconditioncodesorflagregisterswhichareupdatedaftereveryarithmeticresultisproducedConditionalbranchinstructionsinterrogatetheflagsinsubsequentinstructions7ResultStateResultstateisdeResultstatedisadvantagesThegenerationofresultstateisnotstraightforward;irregularstructureoccupiesadditionalchipareaMakespipelinelongerALUTest>0<0=01ClockCycle8ResultstatedisadvantagesTheResultstatedisadvantages...Sequentialinconcept.HowdowepackinstructionsinSuperscalarandVLIWmachine?addr1,r2,r3subr5,r6,r7bltta1bgtta2addr1,r2,r3subr5,r6,r7bltta1bgtta29Resultstatedisadvantages...DirectCheckNoresultstateisdeclaredSpecifiedconditionsaredirectlycheckedbyexplicitinstructionsConditionalbranchingcanberequestedifthespecifiedconditionsaremet.MayinvolveoneortwoinstructionsFitsbetterintosuperscalararchitecturesShorterclockcyclebecauseonlycheckwhennecessary10DirectCheckNoresultstateisComparisonofconditionalbranchesTwoinstructionImplementationadd r1,r2,r3cmpeq r7,r1,0bt r7,labeldiv r5,r4,r1OneinstructionImplementationadd r1,r2,r3beq r1,labeldiv r5,r4,r111ComparisonofconditionalbranTheeffectofbranchesNeedtounderstandwhetherbranchesaretakenornotThentailorthearchitecturetoperformfastestonmostcommoncase.Itturnsoutthatmostbranchesaretaken…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12TheeffectofbranchesNeedtoBranchstatistics...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranchtoSubroutineReturnfromSubroutineLoopclosingconditionalbranchOtherconditionalbranch~1/3~1/3~1/3Takenforfirstn-1iterationsTakenNotTaken~1/6~1/6Taken~5/613Branchstatistics...UnconditiBranchHandlingUtilizingbranchdelayslotsHandlingofunresolvedconditionalbranchesAvoidingcond.branchesDelayedBranchingPerformanceBlockingbranchproc.Speculativebranchproc.Multiwaybranchproc.BranchprocessingGuardedExecution14BranchHandlingUtilizingbrancBranchHandlingDelayedbranchingUtiliseotherwisewastedcyclesfollowingbranches.AchievedbyinsertinganinstructionbehindthebranchandexecutingitbeforethebranchUtilizedonEarlyandSubsequentRISCarchitectures15BranchHandlingDelayedbranchiDelayedbranchingFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDeadDeadaddbsubadd r1,r2,r3b anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r616DelayedbranchingFetchDec/RdALDelayedbranching...FetchDec/RdALUWriteFetchDec/RdALUWriteaddbdivadd r1,r2,r3bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6FetchDec/RdALUWriteFetchDec/RdALUWriteaddDelayedBranchFetchDec/RdALUWritesub17Delayedbranching...FetchDec/DelayedbranchingoptionsCanreducetosingledelayslotiftargetaddressavailableatendofdecodephaseCompilerplacesNOPsinslotsandmigratesinstructionsintoslotsusingcodemigrationCompilermustperformdataflowanalysis18DelayedbranchingoptionsCanrCodemigrationtofillslotsadd r8,r9,r10bd anywherediv r5,r4,r1anywhere: sub ………..add r3,r2,r6NOPNOPadd r3,r2,r6bd anywherediv r5,r4,r1anywhere: sub ………..add r8,r9,r1019CodemigrationtofillslotsadDoesthisworkwithconditionalbranches/?Yes,butcodemigratedintodelayslotmustbeperformedunconditionallyadd r8,r9,r10beq r6,anywherediv r5,r4,r1add r3,r2,r6NOPNOPadd r3,r2,r6div r5,r4,r1add r8,r9,r10beq r6,anywhere20DoesthisworkwithconditionaOptimisations-AnnulmentAnnuldelayslotifbranchnottakenUsefulforbackwardconditionalbranchesMakesitpossibletomovealoopbodyinstructionintothedelayslotAnnuldelayslotifbranchistakenUsefulforforwardconditionalbranchesMakesitpossibletorelocateaninstructionfromsequentialpathintothedelayslot21Optimisations-AnnulmentAnnulAnnuldelayslotifbranchnottaken1234BrCDelayLoopbodyConditionalBranchwithnoAnnulmentLooprequires5instructions22AnnuldelayslotifbranchnotAnnuldelayslotifbranchnottaken...1234BrCDelayLoopbodyConditionalBranchwithAnnulment1Looprequires4instructions23AnnuldelayslotifbranchnotAnnuldelayslotifbranchistaken1ConditionalBranchwithnoAnnulment23BrCDelay4524AnnuldelayslotifbranchisAnnuldelayslotifbranchistaken1ConditionalBranchwithAnnulment23BrC4525AnnuldelayslotifbranchisFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteBranchdetectionIngeneral,theearlierabranchisdetected,thebetterthehandlingReducesnumberofdelayslotsrequiredEarlybranchdetectionInparallelLookaheadIntegratedwithfetchFetchDec/RdALUWrite26FetchDec/RdALUWriteFetchDec/RdEarlybranchdetectionInParallelBranchesaredetectedinparallelwithdecodeofotherinstructionsusingadedicatedbranchdecoderLookaheadBranchesaredetectedfromtheinstructionbufferbutaheadofgeneralinstructiondecodingIntegratedBranchesaredetectedduringinstructionfetch27EarlybranchdetectionInParalConditionalbranchesTheproblemwithconditionalbranchesisthatwemaynotknowuntillateintheinstructionwhetherwewishtotakeorrejectthecontroltransferTakebranch?FetchDec/RdALUWritebz r1,tagetbzreadr1cmp0WritePC28ConditionalbranchesTheprobleHandlingunresolvedconditionalbranchesBlockingbranchprocessingSpeculativebranchprocessingBranchpredictionExtentofspeculationRecoveryfrommis-predictionMulti-waybranching29HandlingunresolvedconditionaBlockingbranchprocessingTrivialapproachExecutionofconditionalissimplystalleduntilconditionisknownResultstateCanpossiblyorderinstructionstoreducedelayinwaitingforconditioncodestobesetDirectcheckingNeedtowaitforALUresultinthisinstructionSetCCBrCFetchDec/RdALUWrite30BlockingbranchprocessingTrivSpeculativebranchprocessingPipelinestallscanbeavoidedAfterdetectionofunresolvedconditionalbranchaguessismadeoftheoutcomeIfguessiscorrectSpeculationconfirmedandcontinuedexecutionIfguessisincorrectInstructionsdiscardedFetchrestarted31SpeculativebranchprocessingPBranchpredictionschemesFixedpredictionTruepredictionA“true”guessismadeStaticPredictionBasedonobjectcodeDynamicPredictionBasedonexecutionhistory32BranchpredictionschemesFixedFixedPrediction
(GuessNotTaken)DetectunresolvedconditionalbranchandguessasnottakenContinuewiththeexecutionofthesequentialpath,butstarttheexecutionoftakenpath(e.g.calculateBTA)Whenconditionalbecomesavailable,checktheguessIfcorrect,continuewithexecutiondeletetakenpathpre-processingIfincorrect,deletespeculativeexecutionandcontinuewithtakenpath33FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTACORRECT34FixedPrediction
(GuessNotFixedPrediction
(GuessNotTaken)ConditionKnownEvaluateBTAINCORRECT35FixedPrediction
(GuessNotFixedPrediction
(GuessTaken)ConditionKnownEvaluateBTAINCORRECT36FixedPrediction
(GuessTakeStaticPredictionLikefixedprediction,butsomepropertiesoftheobjectcodeareusedtodecidewhethertousealwaystakenornottakenHaspossibilityofperformingbettersincesomeinformationisused.37StaticPredictionLikefixedprStaticPrediction...TypicaloptionsareOpcodebasedpredictionForcertainopcodesthebranchisassumedtobetaken,forothersnottaken.DisplacementbasedpredictionIfdisplacement<0taken,elsenottakenCompilerdirectedPredictbitininstructionsetbycompileranalysisortracedataOpBTASignbit38StaticPrediction...TypicaloDynamicpredictionPredictionismadeonbranchhistoryPhilosophyisthathistoryisgoodguidetofuturebehaviourGoodforloopsbecausetend
toiteratemultipletimesGoodforsomeconditionalbranchesdependingonbehaviourofdataGoodforexceptionconditioncheckingWhatHappenedLastTime?39DynamicpredictionPredictioniDynamicPredictionTechniquesExplicittechniqueBranchhistoryexplicitlystatedhistorybits1,2or3bitschemesNumberofbitsrepresentlikelihoodofbranchbeingtakeninfutureImplicittechniqueBranchbeing“recorded”isanindicationthatbranchwastakenRoughlyequivalentto1bitscheme40DynamicPredictionTechniquesEBranchHistoryBitsForeachbranch,recordastatusfield.OnebitstatusDidlastoccurrenceofbranchoccur?TwobitstatusStatetabledecides
whichstateThreebitschemeHistoryoflast3branchesMajoritydecisiononoutcome41BranchHistoryBitsForeachbrStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTPredictionTakenPredictionNotTaken42Statetransitionsin2bitschStatetransitionsin2bitschemeForeachbranchAT=ActuallytakenANT=ActuallynottakenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTStronglyNOTTakenANTWeaklyNOTTakenATWeaklyTakenATANTWeaklyNOTTakenATWeaklyTaken43Statetransitionsin2bitschImplicitDynamicSchemeTwomainschemesBranchTargetAccessCache(BTAC)BranchTargetInstructionCache(BTIC)BothschemesintroduceanextracacheEntriesareonlystoredfortakenbranchesNottakenbranchesarenotstoresIfanentryisincache,thennextbranchistakenBehaveslike1bit44ImplicitDynamicSchemeTwomaiBTACandBTICimplementationStoreBranchAddressPLUSThetargetaddressitself(BTAC)Theinstructionatthetarget(BTIC)10002000LoadR1,….100020001000LoadR1,….45BTACandBTICimplementationStBTACimplementation...LoadR1,….1000200010002000Don’ttakebranchSpeculativelyWhoops!AddthisbranchaddresstothecacheNexttimeTakebranchSpeculatively46BTACimplementation...LoadR1PerformancedifferencesinBTACandBTICBTACdeliversaddressofbranchintimefornextfetchBTICdeliversactualinstruction.CanbeslowerbecauseitdoesnotrequirefetchLookupBTACFetchDec/RdExeWriteFetchDec/RdExeWriteLookupBTIC47PerformancedifferencesinBTAImplementationofHistorybitsPlacementofhistorybitsI-cacheAlpha,UltraSparcBranchHistoryTable(BHT)PowerPC604,620,R10000BranchTargetAddressCache(BTAC)MC68060,Pentium,R8000I-cacheandBHTeffectivelythesame48ImplementationofHistorybitsI-CacheandBHT(PowerPC604)I-cache16KFourwaysetassociativeInstructionfetchaddressBHT128x4entriesPredictionLogic4instr./cycle2HistoryBitsTaken/NottakenBTAforatakenguessDecodeQueue4x1Instr.IssueQueue49I-CacheandBHT(PowerPC604)PredictionAccuracyP=fc*Pc+fm*Pmwherefc: Probabilityofcorrectlypredictingbranchesfm: Probabilityofmis-predictingbranchesPc: PenaltyofcorrectlypredictedbranchesPm: Penaltyofmis-predictedbranches50PredictionAccuracyP=fc*PcPredictionAccuracy….51PredictionAccuracy….51RecoveryfromMispredictionIfbranchpredictionhardwaremakeswrongguess,needtoreverttoalternativepathofexecution.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrite?52RecoveryfromMispredictionIfRecoveryfromMisprediction...Forpipelinedmachines,maybepossibletoabortregisterwritesStoreinstructionsarehardertorecoverConditioncodesmightalsoneedtoberestored.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteRegisterFile53RecoveryfromMisprediction..Schemestoshortenmis-predictionrecoveryBasicpriormeasuresforrecoveryIna“taken”guess,savesequentialaddressIna“nottaken”guess,pre-calculateandsavebranchtargetaddressRequirestwoaddressregistersperspeculatedconditionalbranchSavethisaddressSavethis
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 用數(shù)據(jù)說話科普匯報中的信息可視化技巧
- 二零二五年度二手房買賣補充協(xié)議范本
- 木材銷售回收合同范本
- 2025年度飼料行業(yè)國際市場拓展與合作伙伴協(xié)議
- 二零二五年度服裝店員工勞動合同及品牌形象維護協(xié)議
- 二零二五年度籃球球員轉(zhuǎn)會手續(xù)費合同
- 2025年度股份有限公司股權(quán)激勵計劃實施細則合同范本
- 2025年度美容院顧客信息與合同權(quán)益轉(zhuǎn)讓書
- 二零二五年度房屋租賃合同租賃登記與備案流程
- 2025年度車輛質(zhì)押融資租賃保證合同
- 中國古典文獻學(全套)
- WOMAC骨性關節(jié)炎指數(shù)評分表
- 年處理量48萬噸重整裝置芳烴精餾的工藝設計-二甲苯塔
- CRPS電源設計向?qū)?CRPS Design Guide r-2017
- 16防沖工題庫題庫(238道)
- SH/T 1627.1-1996工業(yè)用乙腈
- GB/T 5534-2008動植物油脂皂化值的測定
- GB/T 3452.2-2007液壓氣動用O形橡膠密封圈第2部分:外觀質(zhì)量檢驗規(guī)范
- GB/T 30797-2014食品用洗滌劑試驗方法總砷的測定
- GB/T 20057-2012滾動軸承圓柱滾子軸承平擋圈和套圈無擋邊端倒角尺寸
- GB/T 19808-2005塑料管材和管件公稱外徑大于或等于90mm的聚乙烯電熔組件的拉伸剝離試驗
評論
0/150
提交評論