計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題_第1頁(yè)
計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題_第2頁(yè)
計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題_第3頁(yè)
計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題_第4頁(yè)
計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題_第5頁(yè)
已閱讀5頁(yè),還剩6頁(yè)未讀 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

精選優(yōu)質(zhì)文檔傾情為你奉上精選優(yōu)質(zhì)文檔傾情為你奉上專心專注專業(yè)專心專注專業(yè)精選優(yōu)質(zhì)文檔傾情為你奉上專心專注專業(yè)計(jì)算機(jī)體系結(jié)構(gòu)課后習(xí)題1.1Threeenhancementswiththefollowingspeedupsareproposedforanewarchitecture:Speedup1=30Speedup2=20Speedup3=15Onlyoneenhancementisusableatatime.(1)Ifenhancements1and2areeachusablefor25%ofthetime,whatfractionofthetimemustenhancement3beusedtoachieveanoverallspeedupof10?(2)Assumetheenhancementscanbeused25%,35%and10%ofthetimeforenhancements1,2,and3,respectively.Forwhatfractionofthereducedexecutiontimeisnoenhancementinuse?(3)Assume,forsomebenchmark,thepossiblefractionofuseis15%foreachofenhancements1and2and70%forenhancement3.Wewanttomaximizeperformance.Ifonlyoneenhancementcanbeimplemented,whichshoulditbe?Iftwoenhancementscanbeimplemented,whichshouldbechosen?答:(1)Assume:thefractionofthetimeenhancement3mustbeusedtoachieveanoverallspeedupof10isx.Speedup10=11-25%-25%-x+(2)Assume:ThetotalexecutiontimebeforethethreeenhancementscanbeusedisTimebefore,TheexecutiontimefornoenhancementisTimeno.TimeThetotalexecutiontimeafterthethreeenhancementscanbeusedisTimeafterTimeSo,Time(3)BySpeedupIfonlyoneenhancementcanbeimplemented:SpeedupSpeedupSpeedupSo,wemustselectenhancement1and3tomaximizeperformance.SpeedupSpeedupSpeedupSpeedupSo,wemustselectenhancement1and3tomaximizeperformance.1.2Supposethereisagraphicsoperationthataccountsfor10%ofexecutiontimeinanapplication,andbyaddingspecialhardwarewecanspeedthisupbyafactorof18.Infurther,wecouldusetwiceasmuchhardware,andmakethegraphicsoperationrun36timesfaster.Givethereasonofwhetheritisworthexploringsuchanfurtherarchitecturalchange?答:SpeedupSpeedupSpeedupSo,Itisnotworthexploringsuchanfurtherarchitecturalchange.1.3Inmanypracticalapplicationsthatdemandareal-timeresponse,thecomputationalworkloadWisoftenfixed.Asthenumberofprocessorsincreasesinaparallelcomputer,thefixedworkloadisdistributedtomoreprocessorsforparallelexecution.Assume20percentofWmustbeexecutedsequentially,and80percentcanbeexecutedby4nodessimultaneously.Whatisafixed-loadspeedup?答:SpeedupSpeedupSo,afixed-loadspeedupis2.5.

2.1Thereisamodelmachinewithnineinstructions,whichfrequenciesareADD(0.3),SUB(0.24),JOM(0.06),STO(0.07),JMP(0.07),SHR(0.02),CIL(0.03),CLA(0.2),STP(0.01),respectively.ThereareseveralGPRsinthemachine.Memoryisbyteaddressable,withaccessedaddressesaligned.Andthememorywordwidthis16bit.Supposethenineinstructionswiththecharacteristicsasfollowing:TwooperandsinstructionsTwokindsofinstructionlengthExtendedcodingShorterinstructionoperandsformat:R(register)-R(register)Longerinstructionoperandsformat:R(register)-M(memory)WithdisplacementmemoryaddressingmodeA.EncodethenineinstructionswithHuffman-coding,andgivetheaveragecodelength.B.Designedthepracticalinstructioncodes,andgivetheaveragecodelength.C.Writethetwoinstructionwordformatsindetail.D.Whatisthemaximumoffsetforaccessingmemoryaddress?答: HuffmancodingbyHuffmantreeADD 30% 01SUB 24% 11CLA 20% 10JOM 6% 0001STO 7% 0011JMP 7% 0010SHR 2% CIL 3% 00001STP 1% So,theaveragecodelengthisi=(B)TwokindsofinstructionlengthextendedcodingADD 30% 01SUB 24% 11CLA 20% 10JOM 6% 11000STO 7% 11001JMP 7% 11010SHR 2% 11011CIL 3% 11100STP 1% 11101So,theaveragecodelengthis(C)Shorterinstructionformat:Opcode2bitsRegister3bitsRegister3bitsLongerinstructionformat:opcode5bitsRegister3bitsRegister3bitsoffset5bits(D)Themaximumoffsetforaccessingmemoryaddressis32bytes.

3.1Identifyallofthedatadependencesinthefollowingcode.Whichdependencesaredatahazardsthatwillberesolvedviaforwarding?ADD R2,R5,R4ADD R4,R2,R5SW R5,100(R2)ADD R3,R2,R4答:3.2Howcouldwemodifythefollowingcodetomakeuseofadelayedbranchslot?Loop:LWR2,100(R3)ADDIR3,R3,#4BEQR3,R4,Loop答: LW R2,100(R3) Loop: ADDI R3,R3,#4 BEQ R3,R4,Loop Delayedbranchslot LW R2,100(R3)3.3Considerthefollowingreservationtableforafour-stagepipelinewithaclockcyclet=20ns.A.Whataretheforbiddenlatenciesandtheinitialcollisionvector?B.Drawthestatetransitiondiagramforschedulingthepipeline.C.DeterminetheMALassociatedwiththeshortestgreedycycle.D.DeterminethepipelinemaximumthroughputcorrespondingtotheMALandgivent.s1s2s3s4 1 2 3s1s2s3s4×××××××答:A.theforbiddenlatenciesF={1,2,5}theinitialcollisionvectorC=(10011)B.thestatetransitiondiagramC.MAL(MinimalAverageLatency)=3clockcyclesD.ThepipelinemaximumthroughputHk=1/(3×20ns)3.4Usingthefollowingcodefragment:Loop: LW R1,0(R2); loadR1fromaddress0+R2 ADDI R1,R1,#1; R1=R1+1 SW 0(R2),R1; storeR1ataddress0+R2 ADDI R2,R2,#4; R2=R2+4 SUB R4,R3,R2; R4=R3-R2 BNEZ R4,Loop; BranchtoloopifR4!=0AssumethattheinitialvalueofR3isR2+396.ThroughoutthisexerciseusetheclassicRISCfive-stageintegerpipelineandassumeallmemoryaccesstake1clockcycle.A.ShowthetimingofthisinstructionsequencefortheRISCpipelinewithoutanyforwardingorbypassinghardwarebutassumingaregisterreadandawriteinthesameclockcycle“forwards”throughtheregisterfile.Assumethatthebranchishandledbyflushingthepipeline.Ifallmemoryreferencestake1cycle,howmanycyclesdoesthislooptaketoexecute?B.ShowthetimingofthisinstructionsequencefortheRISCpipelinewithnormalforwardingandbypassinghardware.Assumethatthebranchishandledbypredictingitasnottaken.Ifallmemoryreferencetake1cycle,howmanycyclesdoesthislooptaketoexecute?C.AssumetheRISCpipelinewithasingle-cycledelayedbranchandnormalforwardingandbypassinghardware.Scheduletheinstructionsintheloopincludingthebranchdelayslot.Youmayreorderinstructionsandmodifytheindividualinstructionoperands,butdonotundertakeotherlooptransformationsthatchangethenumberoropcodeoftheinstructionsintheloop.Showapipelinetimingdiagramandcomputethenumberofcyclesneededtoexecutetheentireloop.答:A.·Theloopiterates396/4=99times.·Gothroughonecompleteiterationoftheloopandthefirstinstructioninthenextiteration.·Totallength=thelengthofiterations0through97(Thefirst98iterationsshouldbeofthesamelength)+thelengthofthelastiteration.·WehaveassumedtheversionofDLXdescribedinFigure3.21(Page97)inthebook,whichresolvesbranchesinMEM.·FromthisFigure,theseconditerationbegin17clocksafterthefirstiterationandthelastiterationtakes18cyclestocomplete.·Totallength=17×98+18=1684clockcyclesB.·FromthisFigure,theseconditerationbegin10clocksafterthefirstiterationandthelastiterationtakes11cyclestocomplete.·Totallength=10×98+11=991clockcyclesC.Loop: LW R1,0(R2); loadR1fromaddress0+R2 ADDI R1,R1,#1; R1=R1+1 SW 0(R2),R1; storeR1ataddress0+R2 ADDI R2,R2,#4; R2=R2+4 SUB R4,R3,R2; R4=R3-R2 BNEZ R4,Loop; BranchtoloopifR4!=0Reorderinstructionsto:Loop:LW R1,0(R2); loadR1fromaddress0+R2 ADDI R2,R2,#4; R2=R2+4 SUB R4,R3,R2; R4=R3-R2 ADDI R1,R1,#1; R1=R1+1 BNEZ R4,Loop; BranchtoloopifR4!=0 SW -4(R2),R1; storeR1ataddress0+R2·FromFiguretheseconditerationbegin6clocksafterthefirstiterationandthelastiterationtakes10cyclestocomplete.·Totallength=6×98+10=598clockcyclesLoop:LW R1,0(R2); loadR1fromaddress0+R2 stall ADDI R1,R1,#1; R1=R1+1 SW 0(R2),R1; storeR1ataddress0+R2 ADDI R2,R2,#4; R2=R2+4 SUB R4,R3,R2; R4=R3-R2 stall BNEZ R4,Loop; BranchtoloopifR4!=0 stallLoop:LW R1,0(R2); loadR1fromaddress0+R2(stall) ADDI R2,R2,#4; R2=R2+4 ADDI R1,R1,#1; R1=R1+1 SW -4(R2),R1; storeR1ataddress0+R2 SUB R4,R3,R2; R4=R3-R2 stall BNEZ R4,Loop; BranchtoloopifR4!=0 stallLoop:LW R1,0(R2); loadR1fromaddress0+R2(stall) ADDI R2,R2,#4; R2=R2+4 SUB R4,R3,R2; R4=R3-R2(stall) ADDI R1,R1,#1; R1=R1+1 BNEZ R4,Loop; BranchtoloopifR4!=0(stall) SW -4(R2),R1; storeR1ataddress0+R23.5Considerthefollowingreservationtableforafour-stagepipeline.A.Whataretheforbiddenlatenciesandtheinitialcollisionvector?B.Drawthestatetransitiondiagramforschedulingthepipeline.C.DeterminetheMALassociatedwiththeshortestgreedycycle.D.DeterminethepipelinemaximumthroughputcorrespondingtotheMAL.E.Accordingtotheshortestgreedycycle,putsixtasksintothepipeline,determinethepipelineactualthroughput.1234567s1√√√s2√√s3√s4√√答:A.theforbiddenlatenciesare{2,4,6}theinitialcollisionvectorC=()B.thestatetransitiondiagram:C.theMALassociatedwiththeshortestgreedycycleis4cycles.schedulingAveragelatency(1,7)4(3,5)4(5,3)4(5)5(3,7)5(5,7)6(7)7D.thepipelinemaximumthroughputcorrespondingtotheMAL:Hk=1/(4clockcycles)E.Accordingtotheshortestgreedycycle,putsixtasksintothepipeline.Thebestschedulingisthegreedycycle(l,7).because:accordingto(1,7)scheduling:actualthroughputHk=6/(1+7+1+7+1+7)=6/(24cycles)accordingto(3,5)scheduling:actualthroughputHk=6/(3+5+3+5+3+7)=6/(26cycles)accordingto(5,3)scheduling:actualthroughputHk=6/(5+3+5+3+5+7)=6/(28cycles)

4.1ThefollowingCprogramisrun(withnooptimizations)onamachinewithacachethathasfour-word(16-byte)blocksandholds256bytesofdata: inti,j,c,stride,array[256]; … for(i=0;i<10000;i++) for(j=0;j<256;j=j+stride) c=array[j]+5;ifweconsideronlythecacheactivitygeneratedbyreferencestothearrayandweassumethatintegersarewords,whatistheexpectedmissratewhenthecacheisdirect-mappedandstride=132?Howaboutifstride=131?Wouldeitherofthesechangeifthecacheweretwo-waysetassociative?答:Ifstride=132andthecacheisdirect-mapped Page201、211·Theblocknumberofthecacheis256/16=16·Theblockaddressofarray[0]=0/16=0·Theblocknumberthatarray[0]mapstocache:0mod16=0·Theblockaddressofarray[132]=132×4/16=33·Theblocknumberthatarray[132]mapstocache:33mod16=1So,missrate=2/210000=1/10000Ifstride=131andthecacheisdirect-mapped Page201、211·Theblocknumberofthecacheis256/16=16·Theblockaddressofarray[0]=0/16=0·Theblocknumberthatarray[0]mapstocache:0mod16=0·Theblockaddressofarray[131]=131×4/16=32·Theblocknumberthatarray[131]mapstocache:32mod16=0So,missrate=210000/210000=1Ifstride=132andthecacheistwo-waysetassociative 227、211·Theblocknumberofthecacheis256/16=16·Thesetnumberofthecacheis16/2=8·Theblockaddressofarray[0]=0/

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論