Technical White Paper
H18597
Dell EMC PowerScale and NVIDIA DGX A100 Systems for Deep Learning
Abstract
This document demonstrates how the Dell EMC Isilon F800 all-flash scale-out NAS and NVIDIA DGX™ A100 systems with NVIDIA® A100 Tensor Core GPUs can be used to accelerate and scale deep learning training workloads.
November 2020
Revisions
| Date | Description |
|---|---|
| November 2020 | Initial release |
Acknowledgments
Author: Damien Mas
Other: NVIDIA
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable software license.
This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third party content is updated by the relevant third parties, this document will be revised accordingly.
Copyright © 2020–2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. [3/31/2021] [Technical White Paper] [H18597]
Table of contents
Revisions
Acknowledgments
Table of contents
Executive summary
Audience
1 Introduction
  1.1 Deep learning data flow
2 Solution architecture
  2.1 Overview
  2.2 Storage – Dell EMC Isilon F800
  2.3 Networking
    2.3.1 Dell EMC PowerSwitch data center switches
    2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage
    2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect
  2.4 Compute: NVIDIA DGX A100 system
  2.5 NVIDIA NGC
  2.6 Bill of materials
3 Deep learning training performance and analysis
  3.1 Benchmark methodology
  3.2 MLPerf Benchmark results
  3.3 NVIDIA collective communication library (NCCL)
  3.4 Storage-only performance using FIO
4 Solution sizing guidance
5 Conclusion
6 References
Executive summary
Deep learning (DL) techniques have enabled great successes in many fields such as computer vision, natural language processing (NLP), gaming and autonomous driving by enabling a model to learn from existing data and then to make corresponding predictions. The success is due to a combination of improved algorithms, access to larger datasets and increased computational power. To be effective at enterprise scale, the computational intensity of DL requires highly efficient parallel architectures. The choice and design of the system components, carefully selected and tuned for DL use cases, can have a big impact on the speed, accuracy and business value of implementing artificial intelligence (AI) techniques.
In such a demanding environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations fast-track their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across a broad portfolio which combines best-of-breed GPU accelerated compute, scale-out storage, and networking.
This paper focuses on how Dell EMC Isilon F800 all-flash scale-out NAS accelerates AI innovation by delivering the performance, scalability and I/O concurrency to complement the requirements of NVIDIA DGX A100 systems for high-performance AI workloads.
Audience
This document is intended for organizations interested in simplifying and accelerating DL solutions with advanced computing and scale-out data management solutions. Solution architects, system administrators and other interested readers within those organizations constitute the target audience.
1 Introduction
DL is an area of AI which uses artificial neural networks to enable accurate pattern recognition of complex real-world patterns by computers. These new levels of innovation have applicability across nearly every industry vertical. Some of the early adopters include advanced research, precision medicine, high tech manufacturing, advanced driver assistance systems (ADAS) and autonomous driving. Building on these initial successes, AI initiatives are springing up in various business units, such as manufacturing, customer support, life sciences, marketing, and sales.
Gartner predicts that AI augmentation will generate $2.9 trillion in business value by 2021 alone. Organizations are faced with a multitude of complex choices related to data, analytic skill sets, software stacks, analytic toolkits, and infrastructure components; each with significant implications on the time to market and the value associated with these initiatives.
In such a complex environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell Technologies and NVIDIA have established a strong partnership to help organizations accelerate their AI initiatives. Our partnership is built on the philosophy of offering flexibility and informed choice across an extensive portfolio. Together our technologies provide the foundation for successful AI solutions which drive the development of advanced DL software frameworks, deliver massively parallel compute in the form of NVIDIA GPUs for parallel model training and scale-out file systems to support the concurrency, performance, and capacity requirements of unstructured image and video datasets.
This document focuses on the latest step in the Dell Technologies and NVIDIA collaboration: a new AI reference architecture with Dell EMC Isilon F800 storage and DGX A100 systems for DL workloads. This new offer gives customers more flexibility in how they deploy scalable, high-performance DL infrastructure. The results of a standard image classification training benchmark using MLPerf 0.7, and of micro-benchmark utilities, are included.
1.1 Deep learning data flow
As visualized in Figure 1, DL usually consists of two distinct workflows: model development and inference.
Figure 1. Common DL workflows: model development and inference
Note: The Isilon storage and DGX A100 system architecture is optimized for the model development workflow, which consists of the model training and the batch inference validation steps. It is not intended for, nor was it benchmarked for, production inference.
The workflow steps are defined and detailed below.
1. Ingest Labeled Data: The labeled data (e.g. images and their labels which indicate whether the image contains a dog, cat, or horse) are ingested into the DL system.
2. Transform: Transformation includes all operations that are applied to the labeled data before they are passed to the DL algorithm. It is sometimes referred to as preprocessing. For images, this often includes file parsing, JPEG decoding, cropping, resizing, rotation, and color adjustments. Transformations can be performed on the entire dataset ahead of time, storing the transformed data on disk. Many transformations can also be applied in a training pipeline, avoiding the need to store the intermediate data.
3. Train Model: The model parameters (edge weights) are learned from the labeled data using the stochastic gradient descent optimization method. In the case of image classification, there are several prebuilt structures of neural networks that have been shown to work well.
4. Validate Model: Once the model training phase completes with a satisfactory accuracy, you’ll want to measure the accuracy of it on validation data (data that the model training process has not seen). This is done by using the trained model to make inferences from the validation data and comparing the result with the correct label. This is often referred to as inference, but keep in mind that this is a distinct step from production inference.
5. Production Inference: The trained and validated model is then often deployed to a system that can perform real-time inference. It will accept as input a single image and output the predicted class (dog, cat, horse). In some cases, inputs are batched for higher throughput but higher latency.
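The validation step (step 4) reduces to comparing the classes a trained model predicts with the correct labels. The following is a minimal sketch of that comparison, assuming the predictions are already available as a list of class names; the function name `validation_accuracy` and the sample data are hypothetical and are not taken from the benchmark code used in this paper.

```python
def validation_accuracy(predicted, expected):
    """Return the fraction of validation samples whose predicted class matches its label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("validation set is empty")
    # Count the positions where the predicted class equals the correct label.
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical validation data: four images that the training process has not seen.
labels = ["dog", "cat", "horse", "cat"]
predictions = ["dog", "cat", "cat", "cat"]

accuracy = validation_accuracy(predictions, labels)
print(f"validation accuracy: {accuracy:.2f}")
```

In a real pipeline the predictions would come from running the trained model over the held-out validation set; only the comparison logic is shown here.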
2 Solution architecture
2.1 Overview
Figure 2 illustrates the reference architecture showing the key components that made up the solution as it was tested and benchmarked. Note that in a customer deployment, the number of DGX A100 systems and F800 storage nodes will vary and can be scaled independently to meet the requirements of the specific DL workloads. Refer to Solution sizing guidance for details.
Figure 2. Reference architecture: (4) Dell EMC Isilon F800 nodes in (1) F800 chassis, (2) SN3700V switches, (4) DGX A100 systems, and (2) QM8700 InfiniBand switches. Link types shown: 40 GbE NFS, 100 GbE NFS, 200 Gb HDR IB, and 200 GbE ISL.
Note: Back-end 40 GbE switches for F800 are not shown.
Note: The QM8700 switches were interconnected with eight ISLs.
2.2 Storage – Dell EMC Isilon F800
Dell EMC Isilon F800 represents the sixth generation of hardware built to run the well-proven and massively scalable Dell EMC PowerScale OneFS operating system. Each Dell EMC Isilon F800 chassis, shown in Figure 3, contains four storage nodes, 60 high-performance solid state drives (SSDs) and eight 40 GbE network connections. OneFS combines up to 252 nodes in 63 chassis into a single high-performance file system designed to handle the most I/O-intense workloads such as DL. As performance and capacity demands increase, the platform can be scaled out simply and non-disruptively, allowing applications and users to continue working.
Figure 3. Dell EMC Isilon F800 chassis, containing four storage nodes
In the solution tested in this document, four F800 nodes, in one chassis, were used.
Dell EMC Isilon F800 has the following features.
• Low latency, high throughput, and massively parallel I/O for AI
  - Up to 250,000 file IOPS per chassis, up to 15.75 million IOPS per cluster
  - Up to 15 GB/s throughput per chassis, up to 945 GB/s per cluster
  - 96 TB to 924 TB raw flash capacity per chassis; up to 58 PB per cluster (all-flash)
This shortens time for training and testing analytical models for datasets from tens of TBs to tens of PBs on AI platforms such as RAPIDS, TensorFlow, SparkML, Caffe, or proprietary AI platforms.
• The ability to run AI in-place on data using multi-protocol access
  - Multi-protocol support such as SMB, NFS, HTTP, S3, and native HDFS to maximize operational flexibility
This eliminates the need to migrate/copy data and results over to a separate AI stack. Organizations can perform DL and run other IT apps on the same data already on Isilon by adding additional Isilon nodes to an existing cluster.
• Enterprise grade features out-of-box
  - Enterprise data protection and resiliency
  - Robust security options
This enables organizations to manage AI data lifecycle with minimal cost and risk, while protecting data and meeting regulatory requirements.
• Extreme scale
  - Seamlessly tier between All Flash, Hybrid, and Archive nodes via Dell EMC PowerScale SmartPools
  - Grow-as-you-go scalability with up to 58 PB flash capacity per cluster
  - New nodes can be added to a cluster simply by connecting power, back-end Ethernet and front-end Ethernet
  - As new nodes are added, storage capacity, throughput, IOPS, cache, and CPU grow
  - Up to 63 chassis (252 nodes) may be connected to form a single cluster with a single namespace and a single coherent cache
  - Up to 85% storage efficiency to reduce costs with Dell EMC PowerScale SmartDedupe software
  - Optional data de-dup and compression enabling up to a 3:1 data reduction
Organizations can achieve AI at scale in a cost-effective manner, enabling them to handle multi-petabyte datasets with high resolution content without re-architecture and/or performance degradation.
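The per-cluster maximums quoted above follow from multiplying the per-chassis figures by the 63-chassis cluster limit. The sketch below reproduces that arithmetic; the names `ChassisSpec`, `F800`, and `cluster_limits` are hypothetical, and the results are "up to" upper bounds under the assumption of linear scaling, not measured values.

```python
from dataclasses import dataclass

MAX_CHASSIS_PER_CLUSTER = 63  # stated cluster limit
NODES_PER_CHASSIS = 4         # each F800 chassis contains four storage nodes


@dataclass(frozen=True)
class ChassisSpec:
    """Per-chassis 'up to' figures quoted in the feature list."""
    file_iops: int
    throughput_gb_s: int
    raw_flash_tb_max: int


F800 = ChassisSpec(file_iops=250_000, throughput_gb_s=15, raw_flash_tb_max=924)


def cluster_limits(spec, chassis=MAX_CHASSIS_PER_CLUSTER):
    """Scale per-chassis figures linearly to a cluster of the given chassis count."""
    return {
        "nodes": chassis * NODES_PER_CHASSIS,
        "file_iops": chassis * spec.file_iops,
        "throughput_gb_s": chassis * spec.throughput_gb_s,
        "raw_flash_pb": chassis * spec.raw_flash_tb_max / 1000,  # SI units: 1 PB = 1000 TB
    }


limits = cluster_limits(F800)
for name, value in limits.items():
    print(f"{name}: {value}")
```

The same function can be called with a smaller chassis count, for example `cluster_limits(F800, chassis=1)` for the single-chassis configuration tested in this document.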
There are several key features of OneFS that make it an excellent storage system for DL workloads that require performance, concurrency, and scale. These features are listed below.
• Storage Tiering using Dell EMC PowerScale SmartPools software enables multiple levels of performance, protection, and storage density to co-exist within the same file system and unlocks the ability to aggregate and consolidate a wide range of applications within a single extensible, ubiquitous storage resource pool. This helps provide granular performance optimization, workflow isolation, higher utilization, and independent scalability, all with a single point of management. For more details, see Storage Tiering with Dell EMC Isilon SmartPools.
• OneFS caching infrastructure design is predicated on aggregating the cache present on each node in a cluster into one globally accessible pool of memory. This allows all the memory cache in a node to be available to every node in the cluster. OneFS can take advantage of prefetching of data based on heuristics used by the Isilon SmartRead component. This greatly improves sequential-read performance across all protocols and means that reads come directly from RAM within milliseconds. For high-sequential cases, SmartRead can very aggressively prefetch ahead, allowing reads of individual files at very high data rates. For more details, see OneFS SmartFlash.
• OneFS has a fully distributed lock manager that coordinates locks on data across all nodes in a storage cluster. Efficient locking is critical to support the efficient parallel I/O profile demanded by many iterative DL workloads, enabling concurrent file read access up into the millions. For more details, see the OneFS Technical Overview.
2.3 Networking
2.3.1 Dell EMC PowerSwitch data center switches
The benchmark testing in this brief was performed in NVIDIA’s partner facility and the networking materials mentioned represent the equipment they used during the testing. Dell Technologies offers top-of-rack switches built for building high-capacity network fabrics, and core/aggregation switches designed for building optimized data center leaf/spine fabrics of virtually any size. Dell EMC PowerSwitch S- and Z-Series are tested and proven in Dell Technologies’ performance labs, top ranked in industry tests (Tolly and IT Brand Pulse), and are currently deployed in customer data centers around the world.
Learn more about Dell EMC PowerSwitch S- and Z-Series
Dell EMC PowerSwitch Data Center Quick Reference Guide
2.3.2 NVIDIA Mellanox SN3700V Ethernet switch for Storage
The NVIDIA Mellanox SN3700V Ethernet switches provide the high speed “front-end” Ethernet connectivity between the Isilon F800 cluster nodes and NVIDIA DGX A100 systems. The F800 nodes connect with 25 GbE or 40 GbE connections, the DGX A100 systems connect with 100 GbE or 200 GbE connections, and the SN3700V switches automatically forward traffic across the different speed connections with minimal latency. Based on the NVIDIA Spectrum-2 switch ASIC and purpose built for the modern data center, the SN3000 switches combine high performance packet processing, rich data center features, cloud network scale and visibility. A flexible unified buffer ensures fair and predictable performance across any combination of ports and speeds from 10 Gb/s to 200 Gb/s, and an Open Ethernet design supports multiple network OS choices including NVIDIA Cumulus Linux, NVIDIA Onyx, and SONiC.
Learn more about the NVIDIA Mellanox Spectrum SN3000 series switches.
2.3.3 NVIDIA Mellanox QM8700 InfiniBand switch for GPU Interconnect
The NVIDIA Mellanox QM8700 InfiniBand switches provide high-throughput, low-latency networking between the DGX A100 systems. Designed for both EDR 100 Gb/s and HDR 200 Gb/s InfiniBand links, they minimize latency and maximize throughput for all GPU-to-GPU communication between systems. The QM8700 switches support Remote Direct Memory Access (RDMA) and in-network computing offloads for AI and data analytics to enable faster and more efficient data transfers. They support NVIDIA GPUDirect, Mellanox SHARP for network-based AI and analytics offloads (such as MPI AllReduce), and Mellanox SHIELD for maximum resiliency in a self-healing network.
Learn more about the NVIDIA Mellanox Quantum QM8700 InfiniBand switches.
2.4 Compute: NVIDIA DGX A100 system
The DGX A100 system (Figure 4) is a fully integrated, turnkey hardware and software system that is purpose-built for DL workflows. Each DGX A100 system is powered by eight NVIDIA A100 Tensor Core GPUs that are interconnected using NVIDIA NVSwitch™ technology, which provides an ultra-high bandwidth, low-latency fabric for inter-GPU communication. This topology is essential for multi-GPU training, eliminating the bottleneck that is associated with PCIe-based interconnects that cannot deliver linearity of performance as GPU count increases. The DGX A100 system is also equipped with eight single-port NVIDIA Mellanox ConnectX-6 VPI HDR InfiniBand adapters for clustering and two dual-port ConnectX-6 VPI Ethernet adapters for storage and networking, all capable of 200 Gb/s.
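As a rough illustration of what the adapter counts above imply, the sketch below adds up the nominal line rate of the ports in one DGX A100 system. It assumes every port runs at the full 200 Gb/s stated above; the function name `aggregate_gb_per_s` is hypothetical, and the result is a theoretical ceiling rather than a measured throughput. Links can also be run at lower speeds in a given deployment.

```python
PORT_RATE_GBIT_S = 200  # nominal rate per port, as stated for the ConnectX-6 adapters


def aggregate_gb_per_s(adapters, ports_per_adapter, port_rate_gbit_s=PORT_RATE_GBIT_S):
    """Return the summed nominal line rate in gigabytes per second (8 bits per byte)."""
    total_gbit_s = adapters * ports_per_adapter * port_rate_gbit_s
    return total_gbit_s / 8


# Eight single-port HDR InfiniBand adapters for clustering.
compute_fabric = aggregate_gb_per_s(adapters=8, ports_per_adapter=1)

# Two dual-port Ethernet adapters for storage and networking.
storage_fabric = aggregate_gb_per_s(adapters=2, ports_per_adapter=2)

print(f"compute fabric ceiling: {compute_fabric:.0f} GB/s per system")
print(f"storage fabric ceiling: {storage_fabric:.0f} GB/s per system")
```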
Figure 4. NVIDIA DGX A100 system with eight NVIDIA A100 Tensor Core GPUs
2.5 NVIDIA NGC
The NVIDIA NGC™ container registry provides researchers, data scientists and developers with simple access to a comprehensive catalog of GPU-accelerated software for AI, DL, machine learning (ML) and HPC that takes full advantage of NVIDIA DGX A100 systems. NGC provides containers for today’s most popular AI frameworks such as RAPIDS, Caffe2, TensorFlow, PyTorch, MXNet and TensorRT, which are optimized for NVIDIA GPUs. The containers integrate the framework or application, necessary drivers, libraries and communications primitives, and they are optimized across the stack by NVIDIA for maximum GPU-accelerated performance. NGC containers incorporate the NVIDIA CUDA® Toolkit, which provides the NVIDIA CUDA Basic Linear Algebra Subroutines Library (cuBLAS), the NVIDIA CUDA Deep Neural Network Library (cuDNN), and much more. The NGC containers also include the NVIDIA Collective Communications Library (NCCL) for multi-GPU and multi-node collective communication primitives, enabling topology awareness for DL training. NCCL enables communication between GPUs inside a single DGX A100 system and across multiple DGX A100 systems.
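To make the idea of a collective communication primitive concrete, the sketch below simulates the result of a sum all-reduce in plain Python: every participant contributes a buffer and every participant ends up with the element-wise sum. This is not the NCCL API and it does not model GPU transfers or topology awareness; the function name `all_reduce_sum` and the sample gradient values are hypothetical.

```python
def all_reduce_sum(per_gpu_buffers):
    """Simulate a sum all-reduce: return one identical summed buffer per participant."""
    if not per_gpu_buffers:
        raise ValueError("at least one buffer is required")
    length = len(per_gpu_buffers[0])
    if any(len(buf) != length for buf in per_gpu_buffers):
        raise ValueError("all buffers must have the same length")
    # Element-wise sum across all participants.
    total = [sum(values) for values in zip(*per_gpu_buffers)]
    # Every participant receives its own copy of the reduced result.
    return [list(total) for _ in per_gpu_buffers]


# Hypothetical gradients computed by four GPUs for a two-parameter model.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = all_reduce_sum(gradients)

# Data-parallel training typically averages the summed gradients.
averaged = [value / len(gradients) for value in reduced[0]]
print(reduced[0], averaged)
```

In data-parallel training, an operation of this kind is what keeps the model replicas on different GPUs synchronized after each step.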
2.6 Bill of materials

| Component | Purpose | Quantity |
|---|---|---|
| Dell EMC Isilon F800: 96 TB SSD; 1 TB RAM; four 1 GbE and eight 40 GbE interfaces | Shared storage | 1 chassis (4 nodes) |
| NVIDIA Mellanox SN3700V 200 Gb Ethernet Switch | Storage Fabric Switch | 2 |
| NVIDIA Mellanox QM8700 InfiniBand HDR Switch | Compute Fabric Switch | 2 |
| NVIDIA DGX A100 system: 8 NVIDIA A100 Tensor Core GPUs with 40 GB; two 64-core AMD EPYC 7742 @ 3.3 GHz; 1 TB RAM; 2x dual-port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s Ethernet; 8x single-port NVIDIA Mellanox ConnectX-6 VPI 200 Gb/s HDR InfiniBand | Compute Server | 4 |
Software versions that were tested for this document

| Component | Version |
|---|---|
| Dell EMC Isilon – OneFS | Patches: 8.2.1_KGA-RUP_2020-04_268538, 8.2.1_UGA-PATCH-INFRA_2019-11_263088, 8.2.1_UGA-RUP_2020-04_268536 |
| NVIDIA Mellanox SN3700V – NCLU Version | 1.0-cl4.2.1u1 |
| NVIDIA Mellanox SN3700V – Distribution Release | 4.2.1 |
| NVIDIA Mellanox QM8700 Product Release | 3.9.0606 |
| DGX A100 – Base OS | 4.99.11 |
| DGX A100 – Linux kernel | 5.3.0-59-generic |
| DGX A100 – NVIDIA Driver | 450.51.06 |
| DGX A100 – Ubuntu | 18.04.5 LTS |
| NVIDIA NGC MXNet Image | nvcr.io/nvidia/mxnet:20.06-py3 |
| MLPerf Benchmarks | /mlperf/training_results_v0.7/tree/master/NVIDIA/benchmarks/resnet/implementations/mxnet |
3 Deep learning training performance and analysis
3.1 Benchmark methodology
In order to measure the performance of the solution, the image classification benchmark from the MLPerf Benchmark Suite repository was executed. This benchmark performs training of an image classification convolutional neural network (CNN) on labeled images using MXNet. Essentially, the system learns whether an image contains a cat, dog, car, train, etc. The well-known ILSVRC2012 image dataset (often referred to as ImageNet) was used. This dataset contains 1,281,167 training images in 144.8 GB¹. All images are grouped into 1000 categories or classes. This dataset is commonly used by DL researchers for benchmarking and comparison studies.
The individual JPEG images in the ImageNet dataset were converted to RecordIO format. The dataset was not resized, not normalized and no preprocessing was performed on the raw ImageNet JPEG images. It maintains the image compression offered by the JPEG format and the total size of the dataset remained roughly the same (148 GB). The average image size was 115 KB.
The benchmark results in this section were obtained with four F800 nodes in the cluster. Each result is the average of five executions.
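The average image size quoted above can be checked from the dataset size and image count. The sketch below uses SI units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), as this document does; the variable and function names are hypothetical.

```python
TRAINING_IMAGES = 1_281_167   # ILSVRC2012 training images
RECORDIO_SIZE_GB = 148        # approximate dataset size after conversion to RecordIO


def average_image_kb(total_gb, image_count):
    """Return the average size per image in kilobytes (SI units)."""
    total_bytes = total_gb * 1_000_000_000
    return total_bytes / image_count / 1_000


avg_kb = average_image_kb(RECORDIO_SIZE_GB, TRAINING_IMAGES)
print(f"average image size: {avg_kb:.1f} KB")
```

The result is about 115.5 KB, which agrees with the 115 KB figure stated above given that the 148 GB total is itself approximate.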
3.2 MLPerf Benchmark results
There are a few conclusions that we can make from the benchmarks represented in Figure 5.
• Image throughput and therefore storage throughput scale linearly from 8 to 32 GPUs.
• The difference between Epoch 0 (when the data is pulled from storage and cached) and Overall is minor, so the storage is not a bottleneck.
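The link between image throughput and storage throughput in the first conclusion is a simple product: images per second multiplied by the average image size. The sketch below shows that conversion along with a scaling-efficiency check. The 115 KB average comes from the benchmark methodology; the function names and the sample rates are hypothetical placeholders for illustration, not measured results from this paper.

```python
AVG_IMAGE_KB = 115  # average image size stated in the benchmark methodology


def storage_throughput_gb_s(images_per_sec, avg_image_kb=AVG_IMAGE_KB):
    """Return the read throughput in GB/s needed to sustain an image rate (SI units)."""
    return images_per_sec * avg_image_kb * 1_000 / 1e9


def scaling_efficiency(base_gpus, base_rate, gpus, rate):
    """Return the measured rate divided by the ideal linearly scaled rate (1.0 is perfectly linear)."""
    ideal_rate = base_rate * gpus / base_gpus
    return rate / ideal_rate


# Hypothetical image rates, chosen only to illustrate the arithmetic.
example_rate_8_gpus = 10_000
example_rate_32_gpus = 40_000

print(f"{storage_throughput_gb_s(example_rate_8_gpus):.2f} GB/s needed at 8 GPUs")
print(f"scaling efficiency: {scaling_efficiency(8, example_rate_8_gpus, 32, example_rate_32_gpus):.2f}")
```

This estimate applies to reads that actually reach storage, such as Epoch 0; later epochs may be served partly from cache.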
¹ All unit prefixes in this document use the SI standard (base 10) where 1 GB is 1 billion bytes.
Figure 5. ResNet-50 training throughput in images/sec (vertical axis, 0 to 80,000) for 8, 16, and 32 GPUs (horizontal axis), with two series: Overall and Epoch 0.