Chapter 1: Fundamentals of Computer Design

Original slides created by:
David Patterson
Electrical Engineering and Computer Sciences
University of California, Berkeley
/~pattrsn
/~cs252

Outline
- Introduction
- Quantitative Principles of Computer Design
- Classes of Computers
- Computer Architecture
- Trends in Technology
- Power in Integrated Circuits
- Trends in Cost
- Dependability
- Performance
- Fallacies and Pitfalls

What is Computer Architecture?
- Functional operation of the individual HW units within a computer system, and the flow of information and control among them.
[Figure: Computer Architecture at the center, linked to Technology, Programming Language, Applications, OS, Hardware Organization, Interface Design (ISA), Measurement & Evaluation, and Parallelism]

Abstraction Layers in Modern Systems
- Application
- Algorithm
- Programming Language
- Operating System/Virtual Machine
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates/Register-Transfer Level (RTL)
- Circuits
- Devices
- Physics
Annotations on the layer stack:
- Original domain of the computer architect ('50s-'80s)
- Domain of recent computer architecture ('90s)
- Reliability, power, ...
- Parallel computing, security, ...
- Reinvigoration of computer architecture, mid-2000s onward

Computer Systems: Technology Trends
- 1988: Supercomputers; Massively Parallel Processors; Mini-supercomputers; Minicomputers; Workstations; PCs
- 2002: Powerful PCs and SMP Workstations; Network of SMP Workstations; Mainframes; Supercomputers; Embedded Computers

Crossroads: Conventional Wisdom in Comp. Arch
- Old Conventional Wisdom: Power is free, transistors expensive
- New Conventional Wisdom: "Power wall": power expensive, transistors free (can put more on chip than can afford to turn on)
- Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (out-of-order, speculation, ...)
- New CW: "ILP wall": law of diminishing returns on more HW for ILP
- Old CW: Multiplies are slow, memory access is fast
- New CW: "Memory wall": memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
- Old CW: Uniprocessor performance 2X / 1.5 yrs
- New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5(?) yrs
  - Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
  - More, simpler processors are more power efficient

Crossroads: Uniprocessor Performance
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present (less than 20%)
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006

Change in Chip Design
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
- A 125 mm² chip in 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm² at 65 nm
  - Caches via DRAM or 1-transistor SRAM?
  - Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
- Processor is the new transistor?

Taking Advantage of Parallelism
- Increasing throughput of a server computer via multiple processors or multiple disks
- Detailed HW design
  - Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
  - Multiple memory banks searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence
  - Not every instruction depends on its immediate predecessor => executing instructions completely or partially in parallel is possible
  - Classic 5-stage pipeline:
    1) Instruction Fetch (Ifetch),
    2) Register Read (Reg),
    3) Execute (ALU),
    4) Data Memory Access (Dmem),
    5) Register Write (Reg)

Pipelined Instruction Execution
[Figure: pipeline diagram of four overlapped instructions, each passing through Ifetch, Reg, ALU, DMem, Reg; axes: instruction order vs. time in clock cycles, Cycle 1 to Cycle 7]

Limits to pipelining
- Hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: attempt to use the same hardware to do two different things at once
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
[Figure: the same four-instruction pipeline diagram; axes: instruction order vs. time in clock cycles]

The Principle of Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
- Two different types of locality:
  - Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, HW has relied on locality for memory performance
[Figure: processor (P), cache ($), memory (MEM)]

Levels of the Memory Hierarchy

Level            | Capacity         | Access time             | Cost
CPU Registers    | 100s Bytes       | 300-500 ps (0.3-0.5 ns) |
L1 and L2 Cache  | 10s-100s KBytes  | ~1 ns to ~10 ns         | $1000s/GByte
Main Memory      | GBytes           | 80 ns to 200 ns         | ~$100/GByte
Disk             | 10s TBytes       | 10 ms (10,000,000 ns)   | ~$1/GByte
Tape             | infinite         | sec-min                 | ~$1/GByte

Staging / transfer unit between levels (upper levels are faster, lower levels are larger):

Between              | Unit              | Managed by     | Size
Registers / L1 Cache | Instr. operands   | prog./compiler | 1-8 bytes
L1 Cache / L2 Cache  | Blocks            | cache cntl     | 32-64 bytes
L2 Cache / Memory    | Blocks            | cache cntl     | 64-128 bytes
Memory / Disk        | Pages             | OS             | 4K-8K bytes
Disk / Tape          | Files             | user/operator  | Mbytes

What Computer Architecture brings to Table
- Other fields often borrow ideas from architecture
- Quantitative Principles of Design
  - Take Advantage of Parallelism
  - Principle of Locality
  - Focus on the Common Case
  - Amdahl's Law
  - The Processor Performance Equation
- Careful, quantitative comparisons
  - Define, quantify, and summarize relative performance
  - Define and quantify relative cost
  - Define and quantify dependability
  - Define and quantify power
- Culture of anticipating and exploiting advances in technology
- Culture of well-defined interfaces that are carefully implemented and thoroughly checked

Comp. Arch. is an Integrated Approach
- What really matters is the functioning of the complete system: hardware, runtime system, compiler, operating system, and application
  - In networking, this is called the "End to End argument"
- Computer architecture is not just about transistors, individual instructions, or particular implementations
  - E.g., the original RISC projects replaced complex instructions with a compiler + simple instructions

Computer Architecture is Design and Analysis
- Architecture is an iterative process:
  - Searching the space of possible designs
  - At all levels of computer systems
[Figure: creativity produces ideas; cost/performance analysis sorts them into good ideas, mediocre ideas, and bad ideas]

Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing the more common case of no overflow
  - This may slow down overflow, but overall performance is improved by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster? => Amdahl's Law
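The question of how much a faster frequent case buys overall can be put in numbers with Amdahl's Law. The sketch below is a minimal illustration rather than part of the original slides: the function names `amdahl_speedup` and `amdahl_limit` and the sample values (40% of execution time enhanced, 10x faster) are assumptions chosen for the example.

```python
import math


def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is accelerated.

    fraction_enhanced: share of the original execution time that benefits (0..1).
    speedup_enhanced:  how much faster that share runs (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # The untouched share still takes (1 - f) of the old time; the rest shrinks.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the speedup as the enhanced part becomes infinitely fast."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if fraction_enhanced == 1.0:
        return math.inf
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Illustrative values (assumed): 40% of the time is sped up by 10x.
    print(f"overall speedup: {amdahl_speedup(0.4, 10.0):.4f}")
    print(f"best case:       {amdahl_limit(0.4):.4f}")
```

Even an unbounded improvement to the enhanced part cannot push the overall speedup past `amdahl_limit`, which is why the share of time affected matters more than the headline factor.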
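Amdahl's Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Best you could ever hope to do:

$$\text{Speedup}_{maximum} = \frac{1}{1 - \text{Fraction}_{enhanced}}$$

Amdahl's Law example
- New CPU is 10X faster
- I/O-bound server, so 60% of the time is spent waiting for I/O

$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$

- Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective that it's just 1.6X faster

Processor performance equation

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

The three factors are the instruction count, the CPI, and the cycle time. What affects each:

              | Inst Count | CPI | Clock Rate
Program       | X          |     |
Compiler      | X          | (X) |
Inst. Set     | X          | X   |
Organization  |            | X   | X
Technology    |            |     | X

Relating Metrics
- CPU execution time
  - Measured time for a running program
  - Easy to measure
- Clock cycles
  - The number of clock pulses for a running program
  - Hard to measure
- Instruction count
  - The number of instructions executed by the program
  - Can be measured by using software tools that profile the execution or by using a simulator of the architecture
- CPI
  - Clock cycles per instruction
  - Computing the CPI needs the clock cycles and the instruction count for each instruction type

Clocks
- A digital circuit has a clock that runs at a constant rate (like a human pulse); the clock is used for signal synchronization
- Cycle time = time for one full cycle (seconds per cycle)
- Clock rate = cycles per second (Hertz or Hz), also known as clock frequency
- Scientific prefixes used with cycle time and clock rate:

Prefix | Symbol | Multiple
tera   | T      | 10^12
giga   | G      | 10^9
mega   | M      | 10^6
kilo   | k      | 10^3
milli  | m      | 10^-3
micro  | µ      | 10^-6
nano   | n      | 10^-9
pico   | p      | 10^-12

What's a Clock Cycle?
- Old days: 10 levels of gates
- Today: determined by numerous time-of-flight issues + gate delays
  - Clock propagation, wire lengths, drivers
[Figure: latch or register feeding combinational logic]

CPI (Clock cycles per instruction)
- The average number of clock cycles each instruction takes to execute
- A floating-point-intensive application might have a higher CPI
- CPU clock cycles = Instruction count x CPI
- CPU time = CPU clock cycles x Clock cycle time
- CPU time = Instruction count x CPI x Clock cycle time
- CPU time = (Instruction count x CPI) / Clock rate

CPI Example
Suppose we have two implementations of the same instruction set architecture (ISA).
For some program,
Machine A has a clock cycle time of 1 ns and a CPI of 2.0
Machine B has a clock cycle time of 2 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?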
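A comparison of this kind follows directly from CPU time = Instruction count x CPI x Clock cycle time. The sketch below is one hedged way to set it up. The `Machine` class, the `speedup` helper, and the two machines X and Y with their cycle times and CPIs are hypothetical, chosen only to illustrate the method, and are not the slide's machines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    cycle_time_ns: float  # clock cycle time in nanoseconds
    cpi: float            # average clock cycles per instruction

    def cpu_time_ns(self, instruction_count: int) -> float:
        """CPU time = instruction count x CPI x clock cycle time."""
        return instruction_count * self.cpi * self.cycle_time_ns


def speedup(faster: Machine, slower: Machine, instruction_count: int) -> float:
    """How many times faster `faster` runs the same program than `slower`."""
    return slower.cpu_time_ns(instruction_count) / faster.cpu_time_ns(instruction_count)


if __name__ == "__main__":
    # Hypothetical machines sharing one ISA, so the instruction count is identical.
    x = Machine("X", cycle_time_ns=0.25, cpi=2.0)
    y = Machine("Y", cycle_time_ns=0.5, cpi=1.2)
    n = 1_000_000  # assumed instruction count; it cancels out of the ratio
    print(f"{x.name}: {x.cpu_time_ns(n):.0f} ns, {y.name}: {y.cpu_time_ns(n):.0f} ns")
    print(f"{x.name} is {speedup(x, y, n):.2f}x faster than {y.name}")
```

Because both machines implement the same ISA, the instruction count cancels, and only the product CPI x cycle time decides the outcome. A lower CPI does not help if the cycle time grows by more.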
CPI Example: Answer
- Machine A: clock cycle = 1 ns, CPI = 2.0
- Machine B: clock cycle = 2 ns, CPI = 1.2
- CPU clock cycles A = Instruction count x 2.0
- CPU clock cycles B = Instruction count x 1.2
- CPU time A = CPU clock cycles A x clock cycle time = Instruction count x 2.0 x 1 ns = 2.0 x Instruction count ns
- CPU time B = Instruction count x 1.2 x 2 ns = 2.4 x Instruction count ns
- Performance A / Performance B = Execution time B / Execution time A = (2.4 x I) / (2.0 x I) = 1.2
- Thus, A is 1.2 times faster than B

Three main classes of computers
- Desktop: personal computer
- Server: web servers, file servers, database servers
- Embedded: handheld devices (phones, cameras), dedicated parallel computers

Feature                         | Desktop                                  | Server                                | Embedded
Price of system                 | $500-$5000                               | $5000-$5,000,000                      | $10-$100,000
Price of microprocessor module  | $50-$500                                 | $200-$10,000                          | $0.01-$100
Critical system design issues   | Price-performance, graphics performance  | Throughput, availability, scalability | Price, power consumption, application-specific performance

Instruction Set Architecture: Critical Interface
- Properties of a good abstraction
  - Lasts through many generations (portability)
  - Used in many different ways (generality)
  - Provides convenient functionality to higher levels
  - Permits an efficient implementation at lower levels
[Figure: the instruction set as the interface between software and hardware]

Example: MIPS architecture
- Programmable storage
  - 2^32 x bytes
  - 31 x 32-bit GPRs (R0 = 0)
  - 32 x 32-bit FP regs (paired DP)
  - HI, LO, PC
- Data types? Format? Addressing modes?
- Arithmetic logical
  - Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
  - AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
  - SLL, SRL, SRA, SLLV, SRLV, SRAV
- Memory Access
  - LB, LBU, LH, LHU, LW, LWL, LWR
  - SB, SH, SW, SWL, SWR
- Control
  - J, JAL, JR, JALR
  - BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
- 32-bit instructions on word boundary
[Figure: register file r0 to r31, PC, lo, hi]

MIPS architecture instruction set format
[Figure: instruction formats for register-to-register, transfer and branches, and jumps]

ISA vs. Computer Architecture
- Old definition of computer architecture = instruction set design
  - Other aspects of computer design were called implementation
  - Insinuates that implementation is uninteresting or less challenging
- Our view is computer architecture >> ISA
- The architect's job is much more than instruction set design; technical hurdles today are more challenging than those in instruction set design
- Since instruction set design is not where the action is, some conclude computer architecture (using the old definition) is not where the action is
  - We disagree with the conclusion
  - We agree that ISA is not where the action is (ISA is in an appendix of CA:AQA 4/e)

Moore's Law: 2X transistors / "year"
- "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
- # of transistors / cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)

Tracking Technology Performance Trends
- Drill down into 4 technologies: Disks, Memory, Network, Processors
- Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
  - Performance milestones in each technology
- Compare bandwidth vs. latency improvements in performance over time
- Bandwidth: number of events per unit time
  - E.g., Mbits/second over a network, Mbytes/second from disk
- Latency: elapsed time for a single event
  - E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)

                | CDC Wren I, 1983      | Seagate 373453, 2003
Rotation speed  | 3600 RPM              | 15000 RPM (4X)
Capacity        | 0.03 GBytes           | 73.4 GBytes (2500X)
Tracks/Inch     | 800                   | 64000 (80X)
Bits/Inch       | 9550                  | 533,000 (60X)
Platters        | Three 5.25-inch       | Four 2.5-inch (in 3.5-inch form factor)
Bandwidth       | 0.6 MBytes/sec        | 86 MBytes/sec (140X)
Latency         | 48.3 ms               | 5.7 ms (8X)
Cache           | none                  | 8 MBytes

Latency Lags Bandwidth (for last ~20 years)
- Performance milestones, with (latency improvement, bandwidth improvement):
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- (latency = simple operation w/o contention; BW = best-case)

Memory: Archaic (Nostalgic) v. Modern (Newfangled)

                | 1980 DRAM (asynchronous)           | 2000 Double Data Rate Synchr. (clocked) DRAM
Density         | 0.06 Mbits/chip                    | 256.00 Mbits/chip (4000X)
Transistors     | 64,000 xtors, 35 mm²               | 256,000,000 xtors, 204 mm²
Data bus, pins  | 16-bit per module, 16 pins/chip    | 64-bit per DIMM, 66 pins/chip (4X)
Bandwidth       | 13 Mbytes/sec                      | 1600 Mbytes/sec (120X)
Latency         | 225 ns                             | 52 ns (4X)
Block transfer  | none                               | Block transfers (page mode)

Latency Lags Bandwidth (last ~20 years)
- Performance milestones:
  - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- (latency = simple operation w/o contention; BW = best-case)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)

                  | Ethernet 802.3  | Ethernet 802.3ae
Year of standard  | 1978            | 2003
Link speed        | 10 Mbits/s      | 10,000 Mbits/s (1000X)
Latency           | 3000 µsec       | 190 µsec (15X)
Media             | Shared media    | Switched media
Cable             | Coaxial cable   | Category 5 copper wire

[Figure: coaxial cable cross-section: copper core, insulator, braided outer conductor, plastic covering]
[Figure: twisted pair: copper, 1 mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in a bundle]

Latency Lags Bandwidth (last ~20 years)
- Performance milestones:
  - Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
  - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- (latency = simple operation w/o contention; BW = best-case)

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)

                 | 1982 Intel 80286                             | 2001 Intel Pentium 4
Clock            | 12.5 MHz                                     | 1500 MHz (120X)
Peak rate        | 2 MIPS (peak)                                | 4500 MIPS (peak) (2250X)
Latency          | 320 ns                                       | 15 ns (20X)
Transistors      | 134,000 xtors, 47 mm²                        | 42,000,000 xtors, 217 mm²
Data bus, pins   | 16-bit data bus, 68 pins                     | 64-bit data bus, 423 pins
Organization     | Microcode interpreter, separate FPU chip     | 3-way superscalar, dynamic translation to RISC, superpipelined (22 stage), out-of-order execution
Caches           | (no caches)                                  | On-chip 8KB data cache, 96KB instr. trace cache, 256KB L2 cache

Latency Lags Bandwidth (last ~20 years)
- Performance milestones:
  - Processor: '286, '386, '486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
  - Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
  - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- CPU high, memory low ("Memory Wall")

Rule of Thumb for Latency Lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
- Stated alternatively: bandwidth improves by more than the square of the improvement in latency
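The rule of thumb can be checked against the (latency, bandwidth) improvement pairs quoted on the milestone slides. The sketch below is an illustration under one stated assumption: both metrics improved at constant exponential rates over the same period. The names `MILESTONES`, `latency_gain_per_bandwidth_doubling`, and `bandwidth_beats_latency_squared` are made up for this example.

```python
import math

# (latency improvement, bandwidth improvement) as quoted on the milestone slides.
MILESTONES = {
    "Disk": (8.0, 143.0),
    "Memory module": (4.0, 120.0),
    "Ethernet": (16.0, 1000.0),
    "Processor": (21.0, 2250.0),
}


def latency_gain_per_bandwidth_doubling(latency_x: float, bandwidth_x: float) -> float:
    """Latency improvement factor achieved in the time bandwidth doubles once.

    Assumes both metrics improved at constant exponential rates (an assumption
    of this sketch, not a claim made by the slides).
    """
    doublings = math.log2(bandwidth_x)  # how many times bandwidth doubled
    return latency_x ** (1.0 / doublings)


def bandwidth_beats_latency_squared(latency_x: float, bandwidth_x: float) -> bool:
    """True when bandwidth improved by more than the square of latency's gain."""
    return bandwidth_x > latency_x ** 2


if __name__ == "__main__":
    for name, (lat, bw) in MILESTONES.items():
        per_doubling = latency_gain_per_bandwidth_doubling(lat, bw)
        squared_ok = bandwidth_beats_latency_squared(lat, bw)
        print(f"{name:14s} latency gain per BW doubling: {per_doubling:.2f}, "
              f"BW > latency^2: {squared_ok}")
```

Under that assumption, each of the four technologies lands inside the 1.2 to 1.4 band, and in each case the bandwidth gain exceeds the square of the latency gain.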
6 Reasons Latency Lags Bandwidth
1. Moore's Law helps BW more than latency
   - Faster transistors, more transistors, and more pins help bandwidth
     - MPU transistors: 0.130 vs. 42 M xtors (300X)
     - DRAM transistors: 0.064 vs. 256 M xtors (4000X)
     - MPU pins: 68 vs. 423 pins (6X)
     - DRAM pins: 16 vs. 66 pins (4X)
   - Smaller, faster transistors, but they communicate over (relatively) longer lines: limits latency
     - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
     - MPU die size: 47 vs. 217 mm² (ratio sqrt => 2X)
     - DRAM die size: 35 vs. 204 mm² (ratio sqrt => 2X)
2. Distance limits latency
   - Size of DRAM block => long bit and word lines => most of DRAM access time
   - Speed of light and computers on a network
   - 1. & 2. explain linear latency vs. square BW?
3. Bandwidth is easier to sell ("bigger = better")
   - E.g., 10 Gbits/s Ethernet ("10 Gig") vs. 10 µsec latency Ethernet
   - 4400 MB/s DIMM ("PC4400") vs. 50 ns latency
   - Even if just marketing, customers are now trained
   - Since bandwidth sells, more resources are thrown at bandwidth, which further tips the balance
4. Latency helps BW, but not vice versa
   - Spinning a disk faster improves both bandwidth and rotational latency
     - 3600 RPM to 15000 RPM = 4.2X
     - Average rotational latency: 8.3 ms to 2.0 ms
     - Things being equal, also helps BW by 4.2X
   - Lower DRAM latency => more accesses/second (higher bandwidth)
   - Higher linear density helps disk BW (and capacity), but not disk latency
     - 9,550 BPI to 533,000 BPI => 60X in BW
5. Bandwidth hurts latency
   - Queues help bandwidth, hurt latency (queuing theory)
   - Adding chips to widen a memory module increases bandwidth, but higher fan-out on address lines may increase latency
6. Operating System overhead hurts latency more than bandwidth
   - Long messages amortize overhead; overhead is a bigger part of short messages

Define and quantify power (1/2)
- For CMOS chips, the traditional dominant energy consumption has been in switching transistors, called dynamic power:

$$\text{Power}_{dynamic} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$

- For mobile devices, energy is the better metric:

$$\text{Energy}_{dynamic} = \text{Capacitive load} \times \text{Voltage}^2$$

- For a fixed task, slowing the clock rate (frequency switched) reduces power, but not energy
- Capacitive load is a function of the number of transistors connected to an output and of the technology, which determines the capacitance of wires and transistors
- Dropping voltage helps both, so went from 5V to 1V
- To save energy and dynamic power, most CPUs now turn off the clock of inactive modules (e.g., Fl. Pt. Unit)

Example of quantifying power
- Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?

$$\frac{\text{Power}_{new}}{\text{Power}_{old}} = \frac{(0.85 \times \text{Voltage})^2 \times (0.85 \times \text{Frequency switched})}{\text{Voltage}^2 \times \text{Frequency switched}} = 0.85^3 \approx 0.61$$

Define and quantify power (2/2)
- Because leakage current flows even when a transistor is off, static power is now important too
- Leakage current increases in processors with smaller transistor sizes
- Increasing the number of transistors increases power even if they are turned off
- In 2006, the goal for leakage is 25% of total power consumption; high-performance designs are at 40%
- Very low power systems even gate the voltage to inactive modules to control loss due to leakage

Cost of Integrated Circuits
The cost depends on several factors:
- Time: the price drops with time as manufacturing moves along the learning curve
- Volume: the price drops as volume increases
- Commodities: many manufacturers produce the same product, and competition brings prices down

[Figure: the price of Intel Pentium 4 and Pentium M]
[Figure: AMD Opteron microprocessor die]
[Figure: a 300 mm silicon wafer containing 117 AMD Opteron microprocessor chips in a 90 nm process]

$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$

$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$

Example:
- Wafer diameter = 300 mm
- Die area = 1.5 cm x 1.5 cm = 2.25 cm^2
- Dies per wafer = 270

$$\text{Die yield} = \text{Wafer yield} \times \left( 1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha} \right)^{-\alpha}$$

- Wafer yield accounts for wafers that are completely bad
- α = 4
- Empirical formula; α corresponds to the masking levels in the manufacturing process

Example (defect density = 0.4 per cm^2):
- Die area = 1.5 cm x 1.5 cm = 2.25 cm^2: die yield = 0.44
- Die area = 1.0 cm x 1.0 cm = 1 cm^2: die yield = 0.68
- A smaller die area gives a higher die yield

Define and quantify dependability (1/3)
- How to decide when a system is operating properly?
- Infrastructure providers now offer Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable
- Systems alternate between 2 states of service with respect to an SLA:
  1. Service accomplishment, where the service is delivered as specified in the SLA
  2. Service interruption, where the delivered service is different from the SLA
- Failure = transition from state 1 to state 2
- Restoration = transition from state 2 to state 1

Define and quantify dependability (2/3)
- Module reliability = measure of continuous service accomplishment (or time to failure).
  2 metrics:
  1. Mean Time To Failure (MTTF) measures reliability
  2. Failures In Time (FIT) = 1/MTTF, the rate of failures
     - Traditionally reported as failures per billion hours of operation
- Mean Time To Repair (MTTR) measures service interruption
  - Mean Time Between Failures (MTBF) = MTTF + MTTR
- Module availability measures service as it alternates between the 2 states of accomplishment and interruption (a number between 0 and 1, e.g., 0.9)
- Module availability = MTTF / (MTTF + MTTR)

Example calculating reliability
- If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules
- Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):

$$\text{Failure rate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} \text{ per hour} = 17{,}000 \text{ FIT}$$

$$\text{MTTF} = \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000 \text{ hours}$$

How to Quantify Performance?

Plane             | Speed     | DC to Paris | Passengers | Throughput (pmph)
Boeing 747        | 610 mph   | 6.5 hours   | 470        | 286,700
BAC/Sud Concorde  | 1350 mph  | 3 hours     | 132        | 178,200

- Time to run the task (ExTime)
  - Execution time, response time, latency
- Tasks per day, hour, week, sec, ns ... (Performance)
  - Throughput, bandwidth

Definition: Performance
- Performance is in units of things per second
  - Bigger is better
- If we are primarily concerned with response time:

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

- "X is n times faster than Y" means:

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$

Performance: What to measure
- Usually rely on benchmarks vs. real workloads
- To increase predictability, collections of benchmark applications, called benchmark suites, are popular
- SPEC CPU: popular desktop benchmark suite
  - CPU only, split between integer and floating-point programs
  - SPECint2000 has 12 integer programs, SPECfp2000 has 14 floating-point programs
  - SPEC CPU2006 to be announced Spring 2006
  - SPECSFS (NFS file server) and SPECWeb (Web server) added as server benchmarks
- Transaction Processing Council measures server performance and cost-performance for databases
  - TPC-C: complex query for Online Transaction Processing
  - TPC-H: models ad hoc decision support
  - TPC-W: a transactional web benchmark
  - TPC-App: application server and web services benchmark

SPEC: System Performance Evaluation Cooperative
- First Round 1989
  - 10 programs yielding a single number ("SPECmarks")
- Second Round 1992
  - SPECInt92 (6 integer programs) and SPECfp92 (14 floating-point programs)
  - Compiler flags unlimited
- March 93 new set of programs
  - SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
  - "Benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95
- SPEC CPU2000 (12 integer benchmarks, CINT2000, and 14 floating-point benchmarks, CFP2000)

Normalized Execution Time
- Normalize execution time to a reference machine
- Two common methods
  - Arithmetic mean
  - Geometric mean
- Comparison
  - Arithmetic mean
    - Used to predict performance
    - May not be consistent
  - Geometric mean
    - Independent of the running times of the individual programs
    - Cannot be used to predict relative execution time for a workload

Normalized Execution Time - Example
Columns: Time on A | Time on B | Normalized to A (A, B) | Normalized to B (A, B)
Program 1: 1, 10, 1, 1...
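The contrast between the two means can be reproduced with a short script. The sketch below uses made-up execution times for two hypothetical machines A and B, not values from the slide's table, and the helper names `normalize`, `arithmetic_mean`, and `geometric_mean` are assumptions of this example.

```python
import math
from typing import Sequence


def normalize(times: Sequence[float], reference: Sequence[float]) -> list[float]:
    """Execution times divided, program by program, by the reference machine's."""
    return [t / r for t, r in zip(times, reference)]


def arithmetic_mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def geometric_mean(values: Sequence[float]) -> float:
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    # Hypothetical execution times (seconds) for two programs on each machine.
    time_on_a = [1.0, 1000.0]
    time_on_b = [10.0, 100.0]

    for ref_name, ref in (("A", time_on_a), ("B", time_on_b)):
        norm_a = normalize(time_on_a, ref)
        norm_b = normalize(time_on_b, ref)
        print(f"normalized to {ref_name}: "
              f"arithmetic A={arithmetic_mean(norm_a):.2f} B={arithmetic_mean(norm_b):.2f}, "
              f"geometric A={geometric_mean(norm_a):.2f} B={geometric_mean(norm_b):.2f}")
```

With these made-up times, the arithmetic mean of normalized times ranks whichever machine is chosen as the reference as the faster one, while the geometric mean gives the same ratio between A and B under either reference. That is the consistency property noted above, and it says nothing about total running time.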