




Contents

Chapter 1: Memory Management Mechanisms
1.1 Overview
1.2 Mechanism Versus Policy
1.3 Memory Hierarchy
1.4 Address Lines and Buses
1.5 Intel Pentium Architecture
1.5.1 Real Mode Operation
1.5.2 Protected Mode Operation
1.5.2.1 Protected Mode Paging
1.5.2.2 Paging as Protection
1.5.2.3 Addresses: Logical, Linear, and Physical
1.5.2.4 Page Frames and Pages
1.6 Closing Thoughts
1.7 References

Chapter 2: Memory Management Policies
2.1 Overview
2.2 Case Study: MS-DOS
2.2.1 DOS Segmentation and Paging
2.2.2 DOS Memory Map
2.2.3 Memory Usage
2.2.4 Example: A Simple Video Driver
2.2.5 Example: Usurping DOS
2.2.6 Jumping the 640KB Hurdle
2.3 Case Study: MMURTL
2.3.1 Background and Design Goals
2.3.2 MMURTL and Segmentation
2.3.3 Paging Variations
2.3.4 MMURTL and Paging
2.3.5 Memory Allocation
2.4 Case Study: Linux
2.4.1 History and MINIX
2.4.2 Design Goals and Features
2.4.3 Linux and Segmentation
2.4.4 Linux and Paging
2.4.5 Page Fault Handling
2.4.6 Memory Allocation
2.4.7 Memory Usage
2.4.8 Example: Siege Warfare
2.4.9 Example: Siege Warfare, More Treachery
2.5 Case Study: Windows
2.5.1 Historical Forces
2.5.2 Memory Map Overview
2.5.3 Windows and Segmentation
2.5.3.1 Special Weapons and Tactics
2.5.3.2 Crashing Windows with a Keystroke
2.5.3.3 Reverse Engineering the GDT
2.5.4 Windows and Paging
2.5.4.1 Linear Address Space Taxonomy
2.5.4.2 Musical Chairs for Pages
2.5.4.3 Memory Protection
2.5.4.4 Demand Paging
2.5.5 Memory Allocation
2.5.6 Memory Usage
2.5.7 Turning Off Paging
2.5.8 Example: Things That Go Thunk in the Night
2.6 Closing Thoughts
2.7 References
2.7.1 Books and Articles
2.7.2 Web Sites

Chapter 3: High-Level Services
3.1 View from 10,000 Feet
3.2 Compiler-Based Allocation
3.2.1 Data Section
3.2.2 Code Section
3.2.3 Stack
3.2.3.1 Activation Records
3.2.3.2 Scope
3.2.3.3 Static or Dynamic?
3.3 Heap Allocation
3.3.1 System Call Interface
3.3.2 The Heap
3.3.2.1 Manual Memory Management
3.3.2.2 Example: C Standard Library Calls
3.3.2.3 Automatic Memory Management
3.3.2.4 Example: The BDW Conservative Garbage Collector
3.3.2.5 Manual Versus Automatic?
3.4 The Evolution of Languages
3.4.1 Language Features
3.4.2 Virtual Machine Architecture
3.4.3 Java Memory Management
3.5 Memory Management: The Three-layer Cake
3.6 References

Chapter 4: Manual Memory Management
4.0 Replacements for malloc() and free()
4.1 System Call Interface and Porting Issues
4.2 Keep It Simple, Stupid!
4.3 Measuring Performance
4.3.1 The Ultimate Measure: Time
4.3.2 ANSI and Native Time Routines
4.3.3 Testing Methodology
4.4 Indexing: The General Approach
4.5 malloc() Version 1: Bitmapped Allocation
4.5.1 Theory
4.5.2 Implementation
4.5.2.1 tree.cpp
4.5.2.2 bitmap.cpp
4.5.2.3 memmgr.cpp
4.5.2.4 mallocV1.cpp
4.5.2.5 perform.cpp
4.5.2.6 driver.cpp
4.5.2.7 Tests
4.5.3 Trade-Offs
4.6 malloc() Version 2: Sequential Fit
4.6.1 Theory
4.6.2 Implementation
4.6.2.1 memmgr.cpp
4.6.2.2 mallocV2.cpp
4.6.2.3 driver.cpp
4.6.3 Tests
4.6.4 Trade-Offs
4.7 malloc() Version 3: Segregated Lists
4.7.1 Theory
4.7.2 Implementation
4.7.2.2 memmgr.cpp
4.7.2.3 mallocV3.cpp
4.7.3 Tests
4.7.4 Trade-Offs
4.8 Performance Comparison

Chapter 5: Automatic Memory Management
5.1 Garbage Collection Taxonomy
5.2 malloc() Version 4: Reference Counting
5.2.1 Theory
5.2.2 Implementation
5.2.2.1 driver.cpp
5.2.2.2 mallocV4.cpp
5.2.2.3 perform.cpp
5.2.2.4 memmgr.cpp
5.2.3 Tests
5.2.4 Trade-Offs
5.3 malloc() Version 5: Mark-Sweep
5.3.1 Theory
5.3.2 Implementation
5.3.2.1 driver.cpp
5.3.2.2 mallocV5.cpp
5.3.2.3 perform.cpp
5.3.2.4 memmgr.cpp
5.3.3 Tests
5.4 Performance Comparison
5.5 Potential Additions
5.5.1 Object Format Assumptions
5.5.2 Variable Heap Size
5.5.3 Indirect Addressing
5.5.4 Real-Time Behavior
5.5.5 Life Span Characteristics
5.5.6 Multithreaded Support

Chapter 6: Miscellaneous Topics
6.1 Suballocators
6.2 Monolithic Versus Microkernel Architectures
6.3 Closing Thoughts

Index

Organization
Chapter 1 - Memory Management Mechanisms
Chapter 2 - Memory Management Policies
Chapter 3 - High-Level Services
Chapter 4 - Manual Memory Management
Chapter 5 - Automatic Memory Management
Chapter 6 - Miscellaneous Topics

Chapter 1: Memory Management Mechanisms

1.1 Overview

"Everyone has a photographic memory. Some people just don't have film." - Mel Brooks

Note: In the text of this book, italics are used to define or emphasize a term. The Courier font is used to denote code, memory addresses, input/output, and filenames. For more information, see the section titled "Typographical Conventions" in the Introduction.

1.2 Mechanism Versus Policy

Accessing and manipulating memory involves a lot of accounting work. Measures have to be taken to ensure that memory being accessed is valid and that it corresponds to actual physical storage. If memory protection mechanisms are in place, checks will also need to be performed by the processor to ensure that an executing task does not access memory locations that it should not. Memory protection is the type of service that multiuser operating systems are built upon. If virtual memory is being used, a significant amount of bookkeeping will need to be maintained in order to track which disk sectors belong to which task. It is more effort than you think, and all the steps must be completed flawlessly.

Note: On the Intel platform, if the memory subsystem's data structures are set up incorrectly, the processor will perform what is known as a triple fault. A double fault occurs on Intel hardware when an exception occurs while the processor is already trying to handle an exception. A triple fault occurs when the double-fault handler fails and the machine is placed into the SHUTDOWN cycle.
Typically, an Intel machine will reset when it encounters this type of problem.

For the sake of execution speed, processor manufacturers give their chips the capacity to carry out advanced memory management chores. This allows operating system vendors to push most of the tedious, repetitive work down to the processor, where the various error checks can be performed relatively quickly. It also has the side effect of anchoring the operating system vendor to the hardware platform, to an extent.

The performance gains, however, are well worth the lost portability. If an operating system were completely responsible for implementing features like paging and segmentation, it would be noticeably slower than one that took advantage of the processor's built-in functionality. Imagine trying to play a graphics-intensive, real-time game like Quake 3 on an operating system that manually protected memory; the game would simply not be playable.

Note: You might ask if I can offer a quantitative measure of how much slower an operating system would be. I will admit I have been doing a little arm waving. According to a 1993 paper by Wahbe, Lucco, et al. (see the References section), they were able to isolate modules of code in an application using a technique they labeled sandboxing. This technique incurred a 4% increase in execution time. You can imagine what would happen if virtual memory and access privilege schemes were added on top of such a mechanism.

ASIDE: An arm-waving explanation is a proposition that has not been established using precise mathematical statements. Mathematical statements have the benefit of being completely unambiguous: they are either true or false. An arm-waving explanation tends to eschew logical rigor entirely in favor of arguments that appeal to intuition. Such reasoning is at best dubious, not only because intuition can often be incorrect, but also because intuitive arguments are ambiguous.
For example, people who argue that the world is flat tend to rely on arm-waving explanations.

The solution that favors speed always wins. I was told by a former Control Data engineer that when Seymour Cray was designing the 6600, he happened upon a new chip that was quicker than the one he was currently using. The problem was that it made occasional computational errors. Seymour implemented a few slick workarounds and went with the new chip. The execs wanted to stay out of Seymour's way and not disturb the maestro, as Seymour was probably the most valuable employee Control Data had. Unfortunately, they also had warehouses full of the original chips. They couldn't just throw out the old chips; they had to find a use for them. This problem gave birth to the CDC 3300, a slower and less expensive version of the 6600.

My point: Seymour went for the faster chip, even though it was less reliable. Speed rules.

The result of this tendency is that every commercial operating system in existence has its memory management services firmly rooted in data structures and protocols dictated by the hardware. Processors provide a collection of primitives for manipulating memory. They constitute the mechanism side of the equation. It is up to the operating system to decide if it will even use a processor's memory management mechanisms and, if so, how it will use them. Operating systems constitute the policy side of the equation.

In this chapter, I will examine computer hardware in terms of how it offers a mechanism to access and manipulate memory.

1.3 Memory Hierarchy

When someone uses the term "memory," they are typically referring to the data storage provided by dedicated chips located on the motherboard. The storage these chips provide is often referred to as Random Access Memory (RAM), main memory, and primary storage. Back in the iron age, when mainframes walked the earth, it was called the core.
The storage provided by these chips is volatile, which is to say that the data in the chips is lost when the power is switched off.

There are various types of RAM:

- DRAM
- SDRAM
- SRAM
- VRAM

Dynamic RAM (DRAM) has to be recharged thousands of times each second. Synchronous DRAM (SDRAM) is refreshed at the clock speed at which the processor runs most efficiently. Static RAM (SRAM) does not need to be refreshed like DRAM, and this makes it much faster. Unfortunately, SRAM is also much more expensive than DRAM and is used sparingly: SRAM tends to be used in processor caches, and DRAM tends to be used for wholesale memory. Finally, there's Video RAM (VRAM), a region of memory used by video hardware. In the next chapter, there is an example that demonstrates how to produce screen messages by manipulating VRAM.

Recent advances in technology, and special optimizations implemented by certain manufacturers, have led to a number of additional acronyms. Here are a couple of them:

- DDR SDRAM
- RDRAM
- ESDRAM

DDR SDRAM stands for Double Data Rate Synchronous Dynamic Random Access Memory. With DDR SDRAM, data is read on both the rising and the falling edge of the system clock tick, basically doubling the bandwidth normally available. RDRAM is short for Rambus DRAM, a high-performance version of DRAM sold by Rambus that can transfer data at 800 MHz. Enhanced Synchronous DRAM (ESDRAM), manufactured by Enhanced Memory Systems, provides a way to replace SRAM with cheaper SDRAM.

A bit is a single binary digit (i.e., a 1 or a 0). A bit in a RAM chip is basically a cell structure that is made up of, depending on the type of RAM, a certain configuration of transistors and capacitors. Each cell is a digital switch that can be either on or off (i.e., 1 or 0). These cells are grouped into 8-bit units called bytes. The byte is the fundamental unit for measuring the amount of memory provided by a storage device. In the early years, hardware vendors used to implement different byte sizes.
One vendor would use a 6-bit byte and another would use a 16-bit byte. The de facto standard that everyone abides by today, however, is the 8-bit byte.

There is a whole set of byte-based metrics for specifying the size of a memory region:

1 byte          = 8 bits
1 word          = 2 bytes
1 double word   = 4 bytes
1 quad word     = 8 bytes
1 octal word    = 8 bytes
1 paragraph     = 16 bytes
1 kilobyte (KB) = 1,024 bytes
1 megabyte (MB) = 1,024KB = 1,048,576 bytes
1 gigabyte (GB) = 1,024MB = 1,073,741,824 bytes
1 terabyte (TB) = 1,024GB = 1,099,511,627,776 bytes
1 petabyte (PB) = 1,024TB = 1,125,899,906,842,624 bytes

Note: In the 1980s, having a megabyte of DRAM was a big deal. Kids used to bug their parents for 16KB memory upgrades so their Atari 400s could play larger games. At the time, having only a megabyte wasn't a significant problem because engineers tended to program in assembly code and build very small programs. In fact, this 1981 quote is often attributed to Bill Gates: "640K ought to be enough for anybody."

Today, most development machines have at least 128MB of DRAM. In 2002, having 256MB seems to be the norm. Ten years from now, a gigabyte might be the standard amount of DRAM (if we are still using DRAM). Hopefully, no one will quote me on this.

RAM is not the only place to store data, and this is what leads us to the memory hierarchy. The range of different places that can be used to store information can be ordered according to their proximity to the processor. This ordering produces the following hierarchy:

1. Registers
2. Cache
3. RAM
4. Disk storage

The primary distinction between these storage areas is their memory latency, or lag time. Storage closer to the processor takes less time to access than storage that is further away. The latency experienced in accessing data on a hard drive is much greater than the latency that occurs when the processor accesses memory in its cache. For example, DRAM latency tends to be measured in nanoseconds.
Disk drive latency, however, tends to be measured in milliseconds! (See Figure 1.1 on the following page.)

Registers are small storage spaces that are located within the processor itself. Registers are a processor's favorite workspace. Most of the processor's day-to-day work is performed on data in the registers. Moving data from one register to another is the single most expedient way to move data.

Software engineers designing compilers will jump through all sorts of hoops just to keep variables and constants in the registers. Having a large number of registers allows more of a program's state to be stored within the processor itself, cutting down on memory latency. The MIPS64 processor has 32 general-purpose 64-bit registers for this very reason. The Itanium, Intel's next-generation 64-bit chip, goes a step further and has literally hundreds of registers.

The Intel Pentium processor has a varied set of registers (see Figure 1.2). There are six 16-bit segment registers (CS, DS, ES, FS, GS, SS). There are eight 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP). There is also a 32-bit error flag register (EFLAGS) to signal problems and a 32-bit instruction pointer (EIP).

Figure 1.2

Advanced memory management functions are facilitated by four system registers (GDTR, LDTR, IDTR, TR) and five mode control registers (CR0, CR1, CR2, CR3, CR4). The usage of these registers will be explained in the next few sections.

Note: It is interesting to note how the Pentium's collection of registers has been constrained by historical forces. The design requirement demanding backward compatibility has resulted in the Pentium having only a few more registers than the 8086.

A cache provides temporary storage that can be accessed more quickly than DRAM. By placing computationally intensive portions of a program in the cache, the processor can avoid the overhead of having to continually access DRAM. The savings can be dramatic. There are different types of caches.
An L1 cache is a storage space that is located on the processor itself. An L2 cache is typically an SRAM chip outside of the processor (for example, the Intel Pentium 4 ships with a 256KB or 512KB L2 Advanced Transfer Cache).

Note: If you are attempting to optimize code that executes in the cache, you should avoid unnecessary function calls. A call to a distant function requires the processor to execute code that lies outside the cache. This causes the cache to reload. This is one reason why certain C compilers offer you the option of generating inline functions. The other side of the coin is that a program that uses inline functions will be much larger than one that does not. The size-versus-speed trade-off is a balancing act that rears its head all over computer science.

Disk storage is the option of last resort. Traditionally, disk space has been used to create virtual memory. Virtual memory is memory that is simulated by using disk space. In other words, portions of memory normally stored in DRAM are written to disk so that the amount of memory the processor can access is greater than the actual amount of physical memory. For example, if you have 10MB of DRAM and you use 2MB of disk space to simulate memory, the processor can then access 12MB of virtual memory.

Note: A recurring point that I will make throughout this book is the high cost of disk input/output. As I mentioned previously, the latency for accessing disk storage is on the order of milliseconds. This is a long time from the processor's perspective.