Computer Architecture

Undergraduate Course

Weimin Wu (吳為民), School of Computer and Information Technology, Beijing Jiaotong University, Spring 2014

Contents
1. Fundamentals of Computer Architecture
2. Instruction Set
3. Pipelining
4. Memory Hierarchy
5. Input-Output Subsystem
6. Interconnection Networks
7. Parallel Computers

General Information About This Course
1. The course has 48 class hours (24 sessions): 32 hours of lectures (16 sessions) and 16 hours of lab work (8 sessions).
2. Coursework consists of attendance, in-class assignments, and computer lab assignments.
3. There is a final examination. It is open-book, and the exam paper is in English.
4. Assessment: coursework 40%, final examination 60%.
5. Try to read the original English text. Where you cannot follow it, consult the translated edition of the textbook or Zhang Chenxi's computer architecture textbook. You may also email me: wmwu@
Important: on every assignment and lab report, be sure to write your course section number (01 or 03), student ID, and name.

1. Fundamentals of Computer Architecture
1.1 Layers of Computer System
1.2 Computer Architecture and Implementation
1.3 The Task of a Computer Designer
1.4 Measuring and Reporting Performance
1.5 Quantitative Principles of Computer Design
1.6 Classification of Computer Architecture

1.1 Layers of Computer Systems
* M5: Application Language Machine
* M4: High-Level Language Machine
* M3: Assembly Language Machine
* M2: Operating System Machine
* M1: Conventional Machine
* M0: Microprogram Machine

Each layer performs a related subset of functions and relies on the next lower layer to perform more primitive functions. This decomposes the problem into more manageable subproblems. Layers M2 through M5 are virtual machines. The instructions of the conventional machine (arithmetic, logic, and so on) are implemented by a program at the microprogram level. That program acts as an interpreter that understands a small set of simple operations, called the microinstruction set.

1.2 Computer Architecture and Implementation
Computer Architecture

計算機(jī)系統(tǒng)結(jié)構(gòu)Referstothoseattributesofasystemvisibletoaprogrammer,

orthoseattributeshavedirectimpactonlogicalexecutionofprogram.

程序員可見,或者對程序執(zhí)行有直接影響的屬性Implementation實(shí)現(xiàn)Twocomponents:Organizationandhardware.*Organization(組織):includeshigh-levelaspectsofacomputer’sdesign,

suchas:memorysystem,busstructure,internalCPU.*Hardware(硬件):referstothespecificsofamachine,include:detailedlogicdesignandpackagingtechnology.計算機(jī)系統(tǒng)結(jié)構(gòu)和實(shí)現(xiàn)ArchitecturalAttributes系統(tǒng)結(jié)構(gòu)方面的屬性instructionset,指令集I/Omechanisms,I/O機(jī)制techniquesforaddressingmemory尋址技術(shù)

numberofbitsrepresentingvariousdatatype(numbers,characters)表示各種數(shù)據(jù)類型的位數(shù)(數(shù)值、字符)1.2ComputerArchitectureandImplementation,cont’dHardwareAttributes硬件方面的屬性packagingtechnology封裝技術(shù)power功耗cooling冷卻

OrganizationalAttributes組織方面的屬性Hardwaredetailstransparenttotheprogrammer.

對于程序員透明的硬件細(xì)節(jié)suchas:controlsignals控制信號computer/peripheralinterfaces計算機(jī)/外設(shè)接口

memorytechnology存儲技術(shù)1.2ComputerArchitectureandImplementation,cont’dArchitecturalDesignIssue系統(tǒng)結(jié)構(gòu)設(shè)計問題Whetheracomputerwillhaveamultiplyinstruction.是否要有一個乘法指令OrganizationalIssue組織設(shè)計問題Whethertheinstructionwillbeimplementedbyaspecialmultiplyunitorbyrepeateduseoftheaddunit.是采用乘法單元還是采用加法單元迭代使用Thedecisionmaybebasedontheanticipatedfrequencyofuseofthemultiplyinstruction,therelativespeedofthetwoapproaches,andthecostandphysicalsizeofaspecialmultiplyunit.決策取決于乘法指令使用頻率,兩種方法的相對速度,乘法單元的成本和大小1.2ComputerArchitectureandImplementation,cont’d1.3TheTaskofAComputerDesignerIsacomplexone:是一個復(fù)雜的問題

* Determine what attributes are important for a new machine.
* Design a machine to maximize performance while staying within cost and power constraints. This includes:
  * instruction set design
  * functional organization
  * logic design
  * implementation: IC design, packaging, cooling

Functional requirements: the typical features required or supported.

Supplementary Material: Milestones in the Development of the Integrated Circuit Industry
* 1947: Bardeen, Brattain, and Shockley of Bell Labs invented the transistor, for which they shared the 1956 Nobel Prize in Physics. The transistor is the cornerstone of the IC industry.
* 1952: Sony developed the first transistor-based radio.
* 1958: Kilby of TI invented the first integrated circuit (IC), for which he received the 2000 Nobel Prize in Physics. Noyce refined it and made it practical.
* 1965: Moore made his prediction about the development of ICs: Moore's Law.

Gordon Moore, Intel Co-Founder and Chairman Emeritus (image source: Intel Corporation)

History has proved Moore's Law correct so far. But will it continue to hold? There are physical limits and economic limits.
* Transistor density doubles every 18-24 months.
* Performance doubles every 18-24 months.
One example is the photolithography process: lithographic distortion arises and has to be corrected (optical proximity correction, OPC).

Milestones in the development of the integrated circuit industry (continued):
* 1968: Noyce and Moore founded Intel.
* 1970: Intel developed the 1K DRAM.
* 1971: Intel developed the 4-bit 4004 microprocessor (2250 transistors).
* 1976/81: Apple II / IBM PC.
* 1984: Xilinx invented the FPGA.
* 1985: Intel began to concentrate on microprocessor products.
* 1987: TSMC was founded. It is the world's largest dedicated chip-manufacturing service company.
* 1991: ARM developed its first embeddable RISC IP core (chipless IC design).
* 1996: Samsung developed the 1G DRAM.
* 1998: IBM developed an experimental 1 GHz microprocessor.
* 1999/earlier: System-on-Chip (SoC) applications.
* 2002/earlier: System-in-Package (SiP) technology.

1.4 Measuring and Reporting Performance
What does "fast" mean?
* The user may say a computer is faster when a program runs in less time.
* The computer center manager may say a computer is faster when it completes more jobs in an hour.
* The computer user is interested in reducing response time, the time between the start and the completion of an event, also referred to as execution time.
* The manager of a data processing center may be interested in increasing throughput, the total amount of work done in a given time.

Comparing design alternatives:
* "X is faster than Y" means that the response time is lower on X than on Y.
* "X is n times faster than Y" means:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

* Since execution time is the reciprocal of performance:

$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

Measuring Performance
Even execution time can be defined in different ways:
* Wall-clock time, response time, or elapsed time is the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead. This is the most straightforward definition.
* With multiprogramming, the CPU works on another program while waiting for I/O and may not necessarily minimize the elapsed time of one program. Hence we need a term that takes this activity into account.
* CPU time is the time the CPU is computing, not including the time spent waiting for I/O or running other programs.
* CPU time can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks requested by the program, called system CPU time.
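The distinction drawn above between elapsed time and CPU time can be observed directly on most systems. The following is a minimal sketch in Python, not part of the original slides: `measure` and `io_like_task` are illustrative names, and the 50 ms sleep is an assumed stand-in for time spent waiting on I/O.

```python
import os
import time


def measure(task):
    """Run task() once and return (elapsed, cpu, user, system) in seconds."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    times_start = os.times()           # user and system CPU time reported separately

    task()

    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    times_end = os.times()
    user = times_end.user - times_start.user
    system = times_end.system - times_start.system
    return elapsed, cpu, user, system


def io_like_task():
    """Hypothetical workload: a sleep stands in for an I/O wait, then a little computation."""
    time.sleep(0.05)
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    elapsed, cpu, user, system = measure(io_like_task)
    print(f"elapsed {elapsed:.3f}s  CPU {cpu:.3f}s  (user {user:.3f}s, system {system:.3f}s)")
```

Because the sleep does not consume CPU, the elapsed time should come out noticeably larger than the CPU time, which is the gap that the separate term "CPU time" is meant to capture.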

Choosing Programs to Evaluate Performance
Four levels of programs are listed below in decreasing order of accuracy of prediction.

1. Real applications
* Examples are compilers for C, text-processing software like Word, and other applications like Photoshop.
* Real applications have input, output, and options that a user can select when running the program.

2. Kernels
* Kernels are small, key pieces extracted from real programs and used to evaluate performance.
* Unlike real programs, no user would run kernel programs, for they exist solely to evaluate performance.
* Kernels are best used to isolate the performance of individual features of a machine, to explain the reasons for differences in the performance of real programs.

3. Toy benchmarks
* Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user already knows.
* Programs like Puzzle and Quicksort are popular because they are small, easy to type, and run on almost any computer.

4. Synthetic benchmarks
* Similar in philosophy to kernels, synthetic benchmarks try to match the average frequency of operations and operands of a large set of programs.
* No user runs synthetic benchmarks, because they don't compute anything a user could want.

Benchmark Suites
* Benchmark suites put together collections of benchmarks to measure the performance of processors with a variety of applications.
* A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks.
* Benchmark suites are made of collections of programs, some of which may be kernels, but many of which are typically real programs.

Reporting Performance Results
* The guiding principle of reporting performance measurements should be reproducibility.
* A report requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and optimized results.
* A report contains the actual performance times, shown both in tabular form and as a graph.

Comparing and Summarizing Performance
Battles are fought over what is the fair way to summarize the relative performance of a collection of programs. For example, two articles on summarizing performance in the same journal took opposing points of view. Figure 1.5, taken from one of the articles, is an example of the confusion that can arise. The following statements hold:
* A is 10 times faster than B for program P1.
* B is 10 times faster than A for program P2.
* A is 20 times faster than C for program P1.
* C is 50 times faster than A for program P2.
* B is 2 times faster than C for program P1.
* C is 5 times faster than B for program P2.
The relative performance of A, B, and C is unclear.

Total Execution Time: A Consistent Summary Measure
The simplest approach is to use the total execution time of P1 and P2:
* B is 9.1 times faster than A.
* C is 25 times faster than A.
* C is 2.75 times faster than B.
This summary tracks execution time, our final measure of performance. If the workload consisted of running programs P1 and P2 an equal number of times, the statements above would predict the relative execution times.

An average of the execution times is the arithmetic mean:

$$\frac{1}{n}\sum_{i=1}^{n}\text{Time}_i$$

where $\text{Time}_i$ is the execution time for the ith program of a total of n in the workload.

Weighted Execution Time
Are programs P1 and P2 in fact run equally in the workload? If not, the execution time must be computed differently. One approach is to assign a weighting factor $w_i$ to each program to indicate the relative frequency of the program in the workload. This is called the weighted arithmetic mean:

$$\sum_{i=1}^{n}\text{Weight}_i \times \text{Time}_i$$

where $\text{Weight}_i$ is the frequency of the ith program in the workload and $\text{Time}_i$ is the execution time of that program. Figure 1.6 shows the data from Figure 1.5 with three different weightings, each proportional to the execution time of a workload with a given mix.

Normalized Execution Time and the Pros and Cons of Geometric Means
A second approach to an unequal mixture of programs is to normalize execution times to a reference machine and then take the average of the normalized execution times. The performance of new programs can then be predicted by simply multiplying this number by its performance on the reference machine.

Average normalized execution time can be expressed as either an arithmetic or a geometric mean. The formula for the geometric mean is

$$\sqrt[n]{\prod_{i=1}^{n}\text{Execution time ratio}_i}$$

where $\text{Execution time ratio}_i$ is the execution time, normalized to the reference machine, for the ith program of a total of n in the workload.

Geometric means have a nice property for two samples $X_i$ and $Y_i$:

$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$

That is, the ratio of the geometric means equals the geometric mean of the ratios.

In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference. Hence, the arithmetic mean should not be used to average normalized execution times. Figure 1.7 shows some variations using both arithmetic and geometric means of the execution times from Figure 1.5, normalized to each machine.

The arithmetic mean performance varies depending on which machine is the reference:
* In column 2, B's execution time is five times longer than A's, although the reverse is true in column 4.
* In column 3, C is slowest, but in column 9, C is fastest.

The geometric means are independent of normalization:
* A and B have the same performance, and the execution time of C is 0.63 of A or B (1/1.58 is 0.63).
* Unfortunately, the total execution time of A is 10 times longer than that of B, and B in turn is about 3 times longer than C.
* As a point of interest, the relationship between the means of the same set of numbers is always: geometric mean ≤ arithmetic mean.

Advantage: the geometric mean is independent of the running times of the individual programs, and it doesn't matter which machine is used to normalize.
Drawback: geometric means violate our fundamental principle of performance measurement, because they do not predict execution time.

Make the Common Case Fast
Perhaps the most important and pervasive principle of computer design is to make the common case fast. In making a design tradeoff, favor the frequent case over the infrequent case. This principle also applies when determining how to spend resources.
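The summary measures of Section 1.4 (total execution time, weighted arithmetic mean, and arithmetic versus geometric means of normalized times) can be checked with a short script. The sketch below is illustrative only. Figure 1.5 is not reproduced in this text, so the execution times in `times` are an assumption: the text fixes only the ratios, and the absolute scale in seconds is chosen here for convenience. All function names are hypothetical.

```python
from math import prod

# Assumed execution times in seconds. They are consistent with every
# "n times faster" statement in the text, but the absolute scale is an assumption.
times = {
    "A": {"P1": 1.0, "P2": 1000.0},
    "B": {"P1": 10.0, "P2": 100.0},
    "C": {"P1": 20.0, "P2": 20.0},
}
programs = ["P1", "P2"]


def total_time(machine):
    """Total execution time of all programs on one machine."""
    return sum(times[machine][p] for p in programs)


def weighted_arithmetic_mean(machine, weights):
    """weights maps program name to its relative frequency (should sum to 1)."""
    return sum(weights[p] * times[machine][p] for p in programs)


def normalized(machine, reference):
    """Execution times of machine divided by those of the reference machine."""
    return [times[machine][p] / times[reference][p] for p in programs]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    for m in times:
        print(f"{m}: total {total_time(m):.0f} s")
    for ref in times:
        for m in times:
            ratios = normalized(m, ref)
            print(f"{m} normalized to {ref}: "
                  f"arithmetic {arithmetic_mean(ratios):.2f}, "
                  f"geometric {geometric_mean(ratios):.2f}")
```

Under these assumed times, the arithmetic mean of B normalized to A and of A normalized to B both come out at about 5, so each machine looks five times slower than the other depending on the reference. The geometric mean is 1 in both directions, yet the totals still differ by a factor of about 9.1, which is the drawback noted above.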

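1.5 Quantitative Principles of Computer Design

Amdahl's Law
The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law. Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Amdahl's Law defines the speedup that can be gained by using a particular feature. Speedup is the ratio

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$

or, equivalently,

$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$

Amdahl's Law gives us a quick way to find the speedup from some enhancement, $\text{Speedup}_{\text{overall}}$, which depends on two factors:
1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement: $\text{Fraction}_{\text{enhanced}} \le 1$.
2. The improvement gained by the enhanced execution mode: $\text{Speedup}_{\text{enhanced}} > 1$.

The new execution time is

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left((1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right)$$

The overall speedup is the ratio of the execution times:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

EXAMPLE: Suppose that we are considering an enhancement that runs 10 times faster than the original machine, but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?

ANSWER: With $\text{Fraction}_{\text{enhanced}} = 0.4$ and $\text{Speedup}_{\text{enhanced}} = 10$:

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$

Amdahl's Law expresses the law of diminishing returns: the incremental improvement in speedup gained by an additional improvement in just a portion of the computation diminishes as improvements are added. An important corollary of Amdahl's Law is that if an enhancement is only usable for a fraction of a task, we can't speed up the task by more than the reciprocal of 1 minus that fraction. The overall speedup therefore has an upper bound.

EXAMPLE: Implementations of floating-point square root (FPSQR) vary significantly in performance. Suppose FPSQR is responsible for 20% of the execution time of a critical benchmark. One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP instructions run two times faster with the same effort as required for the fast square root. Compare these two design alternatives.

ANSWER: We can compare the two alternatives by comparing the speedups:

$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{2.0}} = \frac{1}{0.75} \approx 1.33$$

Improving the performance of the FP operations overall is slightly better because of the higher frequency.

The CPU Performance Equation
Essentially all computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers refer to the time of a clock period by its duration (e.g., 1 ns) or by its rate (e.g., 1 GHz). CPU time for a program can then be expressed in two ways:

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

or

$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

We can also count the number of instructions executed: the instruction path length or instruction count (IC). If we know the number of clock cycles and the instruction count, we can calculate the average number of clock cycles per instruction (CPI):

$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$

This allows us to use CPI in the execution time formula:

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$

or

$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$

Expanding the first formula into its units of measurement:

$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$$

So, CPU performance is dependent upon clock cycle (or rate), CPI, and IC. But it is difficult to change one parameter in isolation from the others because the basic technologies involved are interdependent:
* Clock cycle time: hardware technology and organization
* CPI: organization and ISA
* Instruction count: ISA and compiler technology
Luckily, many improvement techniques primarily improve one component with small or predictable impacts on the other two.

Sometimes it is useful in designing the CPU to calculate the total number of CPU clock cycles as:

$$\text{CPU clock cycles} = \sum_{i=1}^{n}\text{IC}_i \times \text{CPI}_i$$

where $\text{IC}_i$ represents the number of times instruction i is executed in a program and $\text{CPI}_i$ represents the average number of clock cycles for instruction i. This form can be used to express CPU time as:

$$\text{CPU time} = \left(\sum_{i=1}^{n}\text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$$

and CPI as:

$$\text{CPI} = \frac{\sum_{i=1}^{n}\text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n}\frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$

EXAMPLE: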

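Suppose we have the following measurements:
* Frequency of FP operations = 25%
* Average CPI of FP operations = 4.0
* Average CPI of other instructions = 1.33
* Frequency of FPSQR = 2%
* CPI of FPSQR = 20

Assume that the two design alternatives are to reduce the CPI of FPSQR to 2 or to reduce the average CPI of all FP operations to 2. Compare these two design alternatives using the CPU performance equation.

ANSWER: First, observe that only the CPI changes; the clock rate and instruction count remain identical. The original CPI with neither enhancement is:

$$\text{CPI}_{\text{original}} = 25\% \times 4.0 + 75\% \times 1.33 = 2.0$$

We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:

$$\text{CPI}_{\text{with new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (20 - 2) = 2.0 - 0.36 = 1.64$$

We compute the CPI for the enhancement of all FP instructions by summing the FP and non-FP contributions:

$$\text{CPI}_{\text{new FP}} = 75\% \times 1.33 + 25\% \times 2.0 = 1.5$$

Since the CPI of the overall FP enhancement is lower, its performance will be better. Specifically, the speedup for the overall FP enhancement is:

$$\text{Speedup}_{\text{new FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new FP}}} = \frac{\text{IC} \times \text{Clock cycle time} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle time} \times \text{CPI}_{\text{new FP}}} = \frac{2.0}{1.5} = 1.33$$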
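The FP example can be worked both with the CPU performance equation and with Amdahl's Law, and the two routes should agree. The following sketch shows one way to check this. The function names `amdahl_speedup` and `average_cpi` are illustrative, not from the slides, and the code assumes, as the example states, that instruction count and clock cycle time are unchanged.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when fraction_enhanced of the original execution time
    runs speedup_enhanced times faster (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def average_cpi(mix):
    """mix is a list of (frequency, cpi) pairs whose frequencies sum to 1."""
    return sum(freq * cpi for freq, cpi in mix)


# Measurements quoted in the example.
FP_FREQ, FP_CPI = 0.25, 4.0
OTHER_CPI = 1.33
FPSQR_FREQ, FPSQR_CPI = 0.02, 20.0

cpi_original = average_cpi([(FP_FREQ, FP_CPI), (1 - FP_FREQ, OTHER_CPI)])

# Alternative 1: the CPI of FPSQR drops from 20 to 2, so subtract the cycles saved.
cpi_new_fpsqr = cpi_original - FPSQR_FREQ * (FPSQR_CPI - 2.0)

# Alternative 2: the average CPI of all FP operations drops to 2.
cpi_new_fp = average_cpi([(FP_FREQ, 2.0), (1 - FP_FREQ, OTHER_CPI)])

# IC and clock cycle time are unchanged, so each speedup is a ratio of CPIs.
speedup_fpsqr = cpi_original / cpi_new_fpsqr
speedup_fp = cpi_original / cpi_new_fp

if __name__ == "__main__":
    print(f"CPI original       = {cpi_original:.2f}")
    print(f"CPI with new FPSQR = {cpi_new_fpsqr:.2f}  (speedup {speedup_fpsqr:.2f})")
    print(f"CPI with new FP    = {cpi_new_fp:.2f}  (speedup {speedup_fp:.2f})")
    # Amdahl's Law with the time fractions from the earlier example
    # (FPSQR is 20% of execution time, all FP is 50%).
    print(f"Amdahl, FPSQR x10  = {amdahl_speedup(0.2, 10.0):.2f}")
    print(f"Amdahl, all FP x2  = {amdahl_speedup(0.5, 2.0):.2f}")
```

The two views line up because FP operations account for 25% × 4.0 / 2.0 = 50% of the cycles and FPSQR for 2% × 20 / 2.0 = 20%, which are the time fractions used in the Amdahl's Law example.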

Measuring the Components of CPU Performance
To use the CPU performance equation, we need measurements of the individual components.

To determine the clock cycle:
* For an existing CPU, this is easy.
* For a completed design, low-level tools called timing estimators or timing verifiers are used.
* For a design that is not completed, the clock cycle is estimated by examining critical paths.

Measuring the instruction count:
* The compiler can be used together with tools that measure the instruction set behavior.
* For a compiled version of a program, there are two major methods to obtain IC:
  * The first way is an instruction set simulator that interprets the instructions. It is slow but can measure almost any aspect of instruction set behavior accurately.
  * The second way uses execution-based monitoring: the binary program is modified to include instrumentation code. It is very fast, since the program is executed rather than interpreted.

Measuring the CPI is difficult:
* For simple processors, the CPI can be obtained from a table.
* Modern processors use techniques such as pipelining and memory hierarchies. Designers often use average CPI values, but these average CPIs are computed by measuring the effects of the pipeline and cache structure. It is often useful to separate the component arising from the memory system and the component determined by the pipeline. Thus, we can compute the CPI for instruction i as:

$$\text{CPI}_i = \text{Pipeline CPI}_i + \text{Memory system CPI}_i$$

Using the CPU Performance Equations: More Examples
EXAMPLE: We are considering two alternatives for our conditional branch instructions:

* CPU A: A condition code is set by a compare instruction and followed by a branch that tests the condition code.
* CPU B: A compare is included in the branch.

On both CPUs, the conditional branch instruction takes 2 cycles, and all other instructions take 1 clock cycle. On CPU A, 20% of all instructions executed are conditional branches. Since every branch needs a compare, another 20% of the instructions are compares. Because CPU A does not have the compare included in the branch, assume that its clock cycle time is 1.25 times faster than that of CPU B. Which CPU is faster? What if CPU A's clock cycle time were only 1.1 times faster?

ANSWER: We can use the CPU performance formula:

$$\text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2$$

$$\text{CPU time}_A = \text{IC}_A \times 1.2 \times \text{Clock cycle time}_A$$

$$\text{Clock cycle time}_B = 1.25 \times \text{Clock cycle time}_A$$

Compares are not executed in CPU B, so 20%/80% = 25% of its instructions are branches:

$$\text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25$$

Because CPU B does not execute compares, $\text{IC}_B = 0.8 \times \text{IC}_A$, so:

$$\begin{aligned}
\text{CPU time}_B &= \text{IC}_B \times 1.25 \times \text{Clock cycle time}_B \\
&= 0.8 \times \text{IC}_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) \\
&= 1.25 \times \text{IC}_A \times \text{Clock cycle time}_A > \text{CPU time}_A
\end{aligned}$$

So in this case CPU A is faster.

If CPU A were only 1.1 times faster, then $\text{Clock cycle time}_B = 1.10 \times \text{Clock cycle time}_A$ and the performance of CPU B is:

$$\begin{aligned}
\text{CPU time}_B &= \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B \\
&= 0.8 \times \text{IC}_A \times 1.25 \times (1.10 \times \text{Clock cycle time}_A) \\
&= 1.10 \times \text{IC}_A \times \text{Clock cycle time}_A < \text{CPU time}_A
\end{aligned}$$

So in this case CPU B is faster. In essence, this is a trade-off between clock cycle time and instruction count.
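The branch-design comparison can be reproduced numerically. The sketch below is illustrative: `cpu_time` and `compare_branch_designs` are hypothetical names, and times are expressed in units of IC_A × Clock cycle time_A, so `ic_a` and `cycle_a` default to 1.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time


def compare_branch_designs(cycle_ratio_b_over_a, ic_a=1.0, cycle_a=1.0):
    """Return (CPU time of A, CPU time of B) for the two branch designs.

    cycle_ratio_b_over_a is Clock cycle time_B / Clock cycle time_A
    (1.25 and 1.10 in the example).
    """
    # CPU A: 20% branches at 2 cycles, 80% other instructions (including compares) at 1 cycle.
    cpi_a = 0.20 * 2 + 0.80 * 1
    # CPU B: no separate compares, so 80% of A's instruction count, of which 25% are branches.
    ic_b = 0.8 * ic_a
    cpi_b = 0.25 * 2 + 0.75 * 1
    cycle_b = cycle_ratio_b_over_a * cycle_a
    return cpu_time(ic_a, cpi_a, cycle_a), cpu_time(ic_b, cpi_b, cycle_b)


if __name__ == "__main__":
    for ratio in (1.25, 1.10):
        time_a, time_b = compare_branch_designs(ratio)
        faster = "A" if time_a < time_b else "B"
        print(f"cycle ratio {ratio:.2f}: A = {time_a:.2f}, B = {time_b:.2f}, CPU {faster} is faster")
```

Setting the two CPU times equal gives a break-even cycle ratio of 1.2 / (0.8 × 1.25) = 1.2. CPU A wins only if its clock cycle is more than 1.2 times shorter than CPU B's, which matches the two cases worked above.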
