Execution Time (Latency) PPT Courseware

Performance Measurement

Execution time (latency): the time between the start and the completion of an event.

Performance is inversely proportional to execution time:

$$\text{Performance} = \frac{1}{\text{Execution time}}$$

Throughput (bandwidth): the total amount of work done in a given time.

"Machine X is n% faster than Machine Y" means:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} = 1 + \frac{n}{100}$$

Example: Machine A runs a program in 10 seconds, and Machine B runs the same program in 15 seconds. A is ___% faster than B.

Solution: $15/10 = 1.5 = 1 + 50/100$, so A is 50% faster than B.

Make the Common Case Fast

Perhaps the most important and pervasive principle of computer design is to make the common case fast: in making a design trade-off, favor the frequent case over the infrequent case.

Improving the frequent event, rather than the rare event, yields a clear performance gain. For example, when adding two numbers, the no-overflow case is far more frequent than the overflow case, so that is the case to make fast.

Amdahl's Law

Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

The speedup is defined as:

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$

The new execution time is:

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[(1 - f_E) + \frac{f_E}{s_E}\right]$$

where $f_E$ (fraction enhanced) is the fraction of the original execution time that can use the enhancement, and $s_E$ (speedup enhanced) is the improvement gained by the enhanced mode itself. Therefore:

$$\text{Speedup} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}$$

Example: An enhancement runs 10 times faster than the original machine, but it is usable only 40% of the time. What is the overall speedup?

Solution: $f_E = 0.4$, $s_E = 10$:

$$\text{Speedup} = \frac{1}{(1 - 0.4) + 0.4/10} = \frac{1}{0.64} \approx 1.56$$

Amdahl's Law can also be applied to compare two CPU design alternatives. For example, implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is simply to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. Compare these two design alternatives.
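
The performance-ratio definition and Amdahl's Law are easy to check numerically. Below is a minimal Python sketch, assuming the standard forms Execution time_Y / Execution time_X = 1 + n/100 and Speedup = 1 / ((1 − f_E) + f_E / s_E); `percent_faster` and `amdahl_speedup` are hypothetical helper names, not something defined in the slides. It reproduces the 10 s vs. 15 s example and the 40%/10× example, and evaluates the two FPSQR alternatives just described.

```python
def percent_faster(time_x: float, time_y: float) -> float:
    """Return n such that machine X is n% faster than machine Y.

    Based on Execution time_Y / Execution time_X = 1 + n / 100.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return (time_y / time_x - 1.0) * 100.0


def amdahl_speedup(f_e: float, s_e: float) -> float:
    """Overall speedup predicted by Amdahl's Law.

    f_e: fraction of the original execution time that can use the enhancement.
    s_e: speedup of the enhanced mode itself.
    """
    if not 0.0 <= f_e <= 1.0:
        raise ValueError("f_e must lie in [0, 1]")
    if s_e <= 0.0:
        raise ValueError("s_e must be positive")
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


if __name__ == "__main__":
    # Machine A: 10 s, machine B: 15 s for the same program.
    print(f"A is {percent_faster(10, 15):.0f}% faster than B")  # A is 50% faster than B

    # Enhancement that is 10x faster but usable 40% of the time.
    print(f"Speedup = {amdahl_speedup(0.4, 10):.2f}")  # Speedup = 1.56

    # FPSQR alternatives: FPSQR alone (20% of time, 10x) vs. all FP (50% of time, 1.6x).
    fpsqr = amdahl_speedup(0.2, 10)
    all_fp = amdahl_speedup(0.5, 1.6)
    print(f"FPSQR: {fpsqr:.2f}, all FP: {all_fp:.2f}")  # FPSQR: 1.22, all FP: 1.23
```

As `s_e` grows without bound, `amdahl_speedup(f_e, s_e)` approaches 1 / (1 − f_e); for the 40% example that ceiling is about 1.67, which is exactly the limit Amdahl's Law describes.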

Answer: compare the two alternatives by comparing their speedups:

$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + 0.2/10} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + 0.5/1.6} = \frac{1}{0.8125} \approx 1.23$$

Improving the performance of all FP operations is slightly better because of their higher frequency.

Extreme cases (with $f_E$ the fraction enhanced and $s_E$ the speedup of the enhanced mode): if $f_E = 0$, Speedup = 1; if $f_E = 1$, Speedup = $s_E$.

CPU Performance

Most computers are constructed using a clock running at a constant rate. The clock is referred to either by its cycle length (time), e.g., 10 ns, or by its rate, e.g., 100 MHz.

- $1\ \text{ms} = 10^{-3}$ s, $1\ \mu\text{s} = 10^{-6}$ s, $1\ \text{ns} = 10^{-9}$ s
- $1\ \text{Hz} = 1/\text{s}$, $1\ \text{kHz} = 10^{3}$ Hz, $1\ \text{MHz} = 10^{6}$ Hz, $1\ \text{GHz} = 10^{9}$ Hz

$$\text{Clock cycle time} = \frac{1}{\text{Clock rate}}$$

CPI (clock cycles per instruction):

$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count for the program}}$$

$$\text{CPU time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

Combining the two:

$$\text{CPU time} = \text{CPI} \times \text{Instruction count} \times \frac{1}{\text{Clock rate}}$$

But not every instruction takes the same number of clock cycles to execute, so CPI is taken as a weighted average:

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times f_i$$

where $n$ is the number of different instructions in the program, $\text{CPI}_i$ is the CPI of instruction $i$, and $f_i$ is the frequency of instruction $i$ in the program (its fraction of all instructions executed).

Example:

| Operation | Frequency | Clock cycles |
|-----------|-----------|--------------|
| ADD       | 60%       | 1            |
| LOAD      | 40%       | 2            |

$\text{CPI}_{\text{overall}} = 0.6 \times 1 + 0.4 \times 2 = 1.4$

Example: A given program consists of a 100-instruction loop that is executed 42 times. If it takes 16000 cycles to execute the program on a given system, what is that system's CPI for the program?

Solution: The total number of instructions executed is $100 \times 42 = 4200$, so the CPI is $16000 / 4200 \approx 3.81$.

Improving CPU Performance

How do we improve CPU performance, i.e., reduce CPU time? Again, CPU time = CPI × Instruction count × 1/(Clock rate). So we want to decrease CPI, decrease the instruction count (IC), and increase the clock rate (equivalently, decrease the clock cycle time).

The three factors are affected by:

- Clock rate: hardware technology, organization
- CPI: organization, instruction set architecture
- Instruction count: instruction set architecture, compiler technology

MIPS

MIPS (million instructions per second):

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$

Given the MIPS rating:

$$\text{Execution time} = \frac{\text{Instruction count}}{\text{MIPS} \times 10^{6}}$$

So, for a fixed instruction count, a higher MIPS rating means a shorter execution time and higher performance.

Advantage: MIPS is easy to understand (especially by customers).

Disadvantages:

- MIPS depends on the instruction set, so it is difficult to compare the MIPS of computers with different instruction sets.
- MIPS varies between programs on the same computer.
- MIPS can vary inversely with performance. For example, floating-point operations done in software execute many simple, fast instructions, which can yield a higher MIPS rating than hardware floating point even though the program takes longer.

Example: When running a particular program, computer A achieves 100 MIPS and computer B achieves 75 MIPS. However, computer A takes 60 s to execute the program, while computer B takes only 45 s. How is this possible?
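
The CPU-time equation, the weighted-average CPI, and the MIPS relation can be sketched in Python as follows. This is a minimal illustration built only on the formulas in this section; all function names are hypothetical, and the 1 GHz clock rate is an assumed value because the loop example does not give one. The final lines compute how many instructions each machine in the MIPS example must have executed, using Instruction count = MIPS × 10^6 × Execution time.

```python
from typing import Sequence, Tuple


def weighted_cpi(mix: Sequence[Tuple[float, float]]) -> float:
    """Average CPI = sum over i of CPI_i * f_i, given (f_i, CPI_i) pairs."""
    if abs(sum(f for f, _ in mix) - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(f * cpi for f, cpi in mix)


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time (s) = CPI x Instruction count x 1 / Clock rate."""
    return cpi * instruction_count / clock_rate_hz


def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def implied_instruction_count(mips_rating: float, exec_time_s: float) -> float:
    """Invert the MIPS formula: Instruction count = MIPS x 10^6 x Execution time."""
    return mips_rating * 1e6 * exec_time_s


if __name__ == "__main__":
    # ADD/LOAD example: ADD is 60% of instructions at 1 cycle, LOAD 40% at 2 cycles.
    print(f"CPI overall = {weighted_cpi([(0.6, 1), (0.4, 2)]):.1f}")  # CPI overall = 1.4

    # Loop example: a 100-instruction loop executed 42 times, 16000 cycles in total.
    loop_ic = 100 * 42
    loop_cpi = 16000 / loop_ic
    print(f"CPI = {loop_cpi:.2f}")  # CPI = 3.81

    # Hypothetical 1 GHz clock: the slides do not give a clock rate for this system.
    print(f"CPU time at 1 GHz = {cpu_time(loop_ic, loop_cpi, 1e9):.1e} s")

    # MIPS example: instruction counts implied by each machine's rating and run time.
    ic_a = implied_instruction_count(100, 60)  # computer A: 100 MIPS, 60 s
    ic_b = implied_instruction_count(75, 45)   # computer B: 75 MIPS, 45 s
    print(f"A executed {ic_a:.3e} instructions, B executed {ic_b:.3e}")
```

Under these formulas, computer A executes 6.0 × 10^9 instructions while computer B executes only about 3.4 × 10^9, so B can finish first despite its lower MIPS rating. Factors such as a different instruction set or compiler, the ones listed above as affecting instruction count, could plausibly account for the gap.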
