計(jì)算機(jī)結(jié)構(gòu)英文版課件:Chapter 4 Cache Memory_第1頁(yè)
計(jì)算機(jī)結(jié)構(gòu)英文版課件:Chapter 4 Cache Memory_第2頁(yè)
計(jì)算機(jī)結(jié)構(gòu)英文版課件:Chapter 4 Cache Memory_第3頁(yè)
計(jì)算機(jī)結(jié)構(gòu)英文版課件:Chapter 4 Cache Memory_第4頁(yè)
計(jì)算機(jī)結(jié)構(gòu)英文版課件:Chapter 4 Cache Memory_第5頁(yè)
已閱讀5頁(yè),還剩60頁(yè)未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

1、William Stallings Computer Organization and Architecture7th EditionChapter 4Cache Memory 高速緩沖存儲(chǔ)器1CHAPTER 4 CACHE MEMORY Key Terms (1)Cache; CACHE MEMORY 高速緩沖存儲(chǔ)器Access time; Cache hit; cache line;cache miss;cache setData cache;direct access;Hit ratio;instruction cache;L1 cache;L2 cache;L3 cache;local

2、ity;memory hierarchyLine;slot;2CHAPTER 4 CACHE MEMORY Key Terms (2)Multilevel cache;random access;sequential access;mapping; tag 標(biāo)簽;set-associative associative mappingspatial locality;temporal locality;split cache;unified cache Write back;write once;write through3CharacteristicsLocationCapacityUnit

3、of transferAccess methodPerformancePhysical typePhysical characteristicsOrganisation4LocationCPUInternalExternal5CapacityWord sizeThe natural unit of organisationNumber of wordsor Bytes6Unit of TransferInternalUsually governed by data bus widthExternalUsually a block which is much larger than a word

4、Addressable unitSmallest location which can be uniquely addressedWord internallyCluster on M$ disks7Access Methods (1)SequentialStart at the beginning and read through in orderAccess time depends on location of data and previous locatione.g. tapeDirectIndividual blocks have unique addressAccess is b

5、y jumping to vicinity plus sequential searchAccess time depends on location and previous locatione.g. disk Vicinity visinitin. 附近,鄰近8Access Methods (2)RandomIndividual addresses identify locations exactlyAccess time is independent of location or previous accesse.g. RAMAssociativeData is located by a

6、 comparison with contents of a portion of the storeAccess time is independent of location or previous accesse.g. cacheAssociative: 關(guān)聯(lián)的9Memory HierarchyRegistersIn CPUInternal or Main memoryMay include one or more levels of cache“RAM”External memoryBacking store (存儲(chǔ)器回填)10Memory Hierarchy - Diagram11P

7、erformanceAccess timeTime between presenting the address and getting the valid dataMemory Cycle timeTime may be required for the memory to “recover” before next accessCycle time is access + recoveryTransfer RateRate at which data can be moved12Physical TypesSemiconductorRAMMagneticDisk & TapeOptical

8、CD & DVDOthersBubble (膜泡 壓鑄及機(jī)械詞匯 )Hologram (【物】全息圖 )13Physical CharacteristicsDecay (衰減)Volatility (易變性)Erasable (可抹除的)Power consumption (消耗;用盡)14OrganisationPhysical arrangement of bits into wordsNot always obviouse.g. interleaved (交叉,交錯(cuò))15The Bottom LineHow much?CapacityHow fast?Time is moneyHow e

9、xpensive?16Hierarchy ListRegistersL1 CacheL2 CacheMain memoryDisk cacheDiskOpticalTape cache vt.貯藏 cache n.【電腦】快速緩沖貯存區(qū) 17So you want fast?It is possible to build a computer which uses only static RAM (see later)This would be very fastThis would need no cacheHow can you cache cache?This would cost a

10、very large amount18Locality of ReferenceDuring the course of the execution of a program, memory references tend to clustere.g. loopse.g. 【拉】例如Cluster 【電腦】群集器,簇19CacheSmall amount of fast memorySits between normal main memory and CPUMay be located on CPU chip or module20Cache and Main Memory21Cache/M

11、ain Memory Structure22Cache operation overviewCPU requests contents of memory locationCheck cache for this dataIf present, get from cache (fast)If not present, read required block from main memory to cacheThen deliver from cache to CPUCache includes tags to identify which block of main memory is in

12、each cache slot23Cache Read Operation - Flowchart24Cache DesignSizeMapping FunctionReplacement AlgorithmWrite PolicyBlock SizeNumber of Caches25Size does matterCostMore cache is expensiveSpeedMore cache is faster (up to a point)Checking cache for data takes time26Typical Cache Organization27Comparis

13、on of Cache SizesProcessorTypeYear of IntroductionL1 cacheL2 cacheL3 cacheIBM 360/85Mainframe196816 to 32 KBPDP-11/70Minicomputer19751 KBVAX 11/780Minicomputer197816 KBIBM 3033Mainframe197864 KBIBM 3090Mainframe1985128 to 256 KBIntel 80486PC19898 KBPentiumPC19938 KB/8 KB256 to 512 KBPowerPC 601PC199

14、332 KBPowerPC 620PC199632 KB/32 KBPowerPC G4PC/server199932 KB/32 KB256 KB to 1 MB2 MBIBM S/390 G4Mainframe199732 KB256 KB2 MBIBM S/390 G6Mainframe1999256 KB8 MBPentium 4PC/server20008 KB/8 KB256 KBIBM SPHigh-end server/ supercomputer200064 KB/32 KB8 MBCRAY MTAbSupercomputer20008 KB2 MBItaniumPC/ser

15、ver200116 KB/16 KB96 KB4 MBSGI Origin 2001High-end server200132 KB/32 KB4 MBItanium 2PC/server200232 KB256 KB6 MBIBM POWER5High-end server200364 KB1.9 MB36 MBCRAY XD-1Supercomputer200464 KB/64 KB1MB28Mapping FunctionCache of 64kByteCache block of 4 bytesi.e. cache is 16k (214) lines of 4 bytes16MByt

16、es main memory24 bit address (224=16M)29Direct MappingEach block of main memory maps to only one cache linei.e. if a block is in cache, it must be in one specific placeAddress is in two partsLeast Significant w bits identify unique wordMost Significant s bits specify one memory blockThe MSBs are spl

17、it into a cache line field r and a tag of s-r (most significant)30Direct MappingAddress StructureTag s-rLine or Slot rWord w814224 bit address2 bit word identifier (4 byte block)22 bit block identifier8 bit tag (=22-14)14 bit slot or lineNo two blocks in the same line have the same Tag fieldCheck co

18、ntents of cache by finding line and checking Tag31Direct Mapping Cache Line TableCache lineMain Memory blocks held00, m, 2m, 3m2s-m11,m+1, 2m+12s-m+1m-1m-1, 2m-1,3m-12s-1m=10 Cache line main memory blocks held 0 0,10,20,30. 1,11,21,319 9,19,29,3932Direct Mapping Cache Organization33Direct Mapping Ex

19、ample34Direct Mapping SummaryAddress length = (s + w) bitsNumber of addressable units = 2s+w words or bytesBlock size = line size = 2w words or bytesNumber of blocks in main memory = 2 s+ w /2w = 2sNumber of lines in cache = m = 2rSize of tag = (s r) bitsExercise:A direct cache consists of 64 lines.

20、 Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.35Direct Mapping pros & consSimpleInexpensiveFixed location for given blockIf a program accesses 2 blocks that map to the same line repeatedly, cache misses are very highpros & cons (贊成和反對(duì)的理由,表示兩個(gè)方面)miss vt.

21、& vi. 未擊中; 未抓住; 未達(dá)到; 36Associative MappingA main memory block can load into any line of cacheMemory address is interpreted as tag and wordTag uniquely identifies block of memoryEvery lines tag is examined for a matchCache searching gets expensive37Fully Associative Cache Organization38Associative Ma

22、pping Example39Tag 22 bitWord2 bitAssociative MappingAddress Structure22 bit tag stored with each 32 bit block of dataCompare tag field with tag entry in cache to check for hitLeast significant 2 bits of address identify which 16 bit word is required from 32 bit data blocke.g.AddressTagDataCache lin

23、eFFFFFCFFFFFC246824683FFF40Associative Mapping SummaryAddress length = (s + w) bitsNumber of addressable units = 2s+w words or bytesBlock size = line size = 2w words or bytesNumber of blocks in main memory = 2 s+w /2w = 2sNumber of lines in cache = undeterminedSize of tag = s bitsExerciseA associati

24、ve cache consists of 64 lines. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.41Set Associative MappingCache is divided into a number of setsEach set contains a number of linesA given block maps to any line in a given sete.g. Block B can be in any line of

25、set ie.g. 2 lines per set2 way associative mappingA given block can be in one of 2 lines in only one set42Set Associative MappingExample13 bit set numberBlock number in main memory is modulo 213 000000, 00A000, 00B000, 00C000 map to same set43Two Way Set Associative Cache Organization44Set Associati

26、ve MappingAddress StructureUse set field to determine cache set to look inCompare tag field to see if we have a hite.gAddressTagDataSet number1FF 7FFC1FF123456781FFF001 7FFC001112233441FFFTag 9 bitSet 13 bitWord2 bit45Two Way Set Associative Mapping Example46Set Associative Mapping SummaryAddress le

27、ngth = (s + w) bitsNumber of addressable units = 2 s+w words or bytesBlock size = line size = 2w words or bytesNumber of blocks in main memory = 2dNumber of lines in set = kNumber of sets = v = 2dNumber of lines in cache = kv = k * 2dSize of tag = (s d) bitsA set associative cache consists of 64 lin

28、es, or slots, divided into two-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.47Replacement Algorithms (1)Direct mappingNo choiceEach block only maps to one lineReplace that line48Replacement Algorithms (2)Associative & Set AssociativeHardware i

29、mplemented algorithm (speed)Least Recently used (LRU)e.g. in 2 way set associativeWhich of the 2 block is lru?First in first out (FIFO)replace block that has been in cache longestLeast frequently usedreplace block which has had fewest hitsRandom49Write PolicyMust not overwrite a cache block unless m

30、ain memory is up to date (最新式的)Multiple CPUs may have individual cachesI/O may address main memory directly50Write throughAll writes go to main memory as well as cacheMultiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to dateLots of trafficSlows down writesRemember bogus w

31、rite through caches!51Write backUpdates initially made in cache onlyUpdate bit for cache slot is set when update occursIf block is to be replaced, write to main memory only if update bit is setOther caches get out of syncI/O must access main memory through cacheN.B. 15% of memory references are writ

32、es52Pentium 4 Cache80386 no on chip cache80486 8k using 16 byte lines and four way set associative organizationPentium (all versions) two on chip L1 cachesData & instructionsPentium III L3 cache added off chipPentium 4L1 caches8k bytes64 byte linesfour way set associativeL2 cache Feeding both L1 cac

33、hes256k128 byte lines8 way set associativeL3 cache on chip53Intel Cache EvolutionProblemSolutionProcessor on which feature first appearsExternal memory slower than the system bus.Add external cache using faster memory technology.386Increased processor speed results in external bus becoming a bottlen

34、eck for cache access.Move external cache on-chip, operating at the same speed as the processor.486Internal cache is rather small, due to limited space on chipAdd external L2 cache using faster technology than main memory486Contention occurs when both the Instruction Prefetcher and the Execution Unit

35、 simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Units data access takes place.Create separate data and instruction caches.PentiumIncreased processor speed results in external bus becoming a bottleneck for L2 cache access.Create separate back-s

36、ide bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.Pentium ProMove L2 cache on to the processor chip.Pentium IISome applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too sm

37、all.Add external L3 cache.Pentium IIIMove L3 cache on-chip.Pentium 454Pentium 4 Block Diagram55Pentium 4 Core ProcessorFetch/Decode UnitFetches instructions from L2 cacheDecode into micro-opsStore micro-ops in L1 cacheOut of order execution logicSchedules micro-opsBased on data dependence and resour

38、cesMay speculatively executeExecution unitsExecute micro-opsData from L1 cacheResults in registersMemory subsystemL2 cache and systems bus56Pentium 4 Design ReasoningDecodes instructions into RISC like micro-ops before L1 cacheMicro-ops fixed lengthSuperscalar pipelining and schedulingPentium instru

39、ctions long & complexPerformance improved by separating decoding from scheduling & pipelining(More later ch14)Data cache is write backCan be configured to write throughL1 cache controlled by 2 bits in registerCD = cache disableNW = not write through2 instructions to invalidate (flush) cache and writ

40、e back then invalidateL2 and L3 8-way set-associative Line size 128 bytes57PowerPC Cache Organization601 single 32kb 8 way set associative603 16kb (2 x 8kb) two way set associative604 32kb620 64kbG3 & G464kb L1 cache8 way set associative256k, 512k or 1M L2 cachetwo way set associativeG532kB instruction cache64kB data cache58PowerPC G5 Block Diagram59Internet SourcesManufacturer

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論