Deep Learning / AI Lifecycle with Dell EMC and Bitfusion
Bhavesh Patel, Dell EMC Server Advanced Engineering

Abstract
This talk gives an overview of the end-to-end application life cycle of deep learning in the enterprise, along with numerous use cases, and summarizes studies done by Bitfusion and Dell on a high-performance heterogeneous elastic rack of Dell EMC PowerEdge C4130s with Nvidia GPUs. Use cases discussed in detail include the ability to bring on-demand GPU acceleration beyond the rack and across the enterprise with easily attachable elastic GPUs for deep learning development, as well as the creation of a cost-effective, software-defined, high-performance elastic multi-GPU system that combines multiple Dell EMC C4130 servers at runtime for deep learning training.
Deep Learning and AI are being adopted across a wide range of market segments

Industry/Function and AI revolution:
- ROBOTICS: Computer Vision & Speech, Drones, Droids
- ENTERTAINMENT: Interactive Virtual & Mixed Reality
- AUTOMOTIVE: Self-Driving Cars, Co-Pilot Advisor
- FINANCE: Predictive Price Analysis, Dynamic Decision Support
- PHARMA: Drug Discovery, Protein Simulation
- HEALTHCARE: Predictive Diagnosis, Wearable Intelligence
- ENERGY: Geo-Seismic Resource Discovery
- EDUCATION: Adaptive Learning Courses
- SALES: Adaptive Product Recommendations
- SUPPLY CHAIN: Dynamic Routing Optimization
- CUSTOMER SERVICE: Bots and Fully-Automated Service
- MAINTENANCE: Dynamic Risk Mitigation and Yield Optimization

...but few people have the time, knowledge, and resources to even get started.
PROBLEM 1: HARDWARE INFRASTRUCTURE LIMITATIONS
- Increased cost with dense servers
- TOR bottleneck, limited scalability
- Limited multi-tenancy on GPU servers (limited CPU and memory per user)
- Limited to 8-GPU applications
- Does not support GPU apps with high storage, CPU, and memory requirements
PROBLEM 2: SOFTWARE COMPLEXITY OVERLOAD
- Software Management: GPU driver management, framework & library installation, deep learning framework configuration, package manager, Jupyter server or IDE setup
- Data Management: data uploader, shared local file system, data volume management, data integrations & pipelining
- Model Management: code version management, hyperparameter optimization, experiment tracking, deployment automation, deployment continuous integration
- Workload Management: job scheduler, log management, user & group management, inference autoscaling
- Infrastructure Management: cloud or server orchestration, GPU hardware setup, GPU resource allocation, container orchestration, networking direct bypass, MPI/RDMA/RPI/gRPC, monitoring

Need to simplify and scale.
SOLUTION 1/2: CONVERGED RACK SOLUTION
- Composable compute bundle
- Up to 64 GPUs per application
- GPU applications with varied storage, memory, and CPU requirements
- 30-50% less cost per GPU
- More {cores, memory} per GPU
- Much greater intra-rack networking bandwidth
- Less inter-rack load
- Composable: add as you go
SOLUTION 2/2: COMPLETE, STREAMLINED AI DEVELOPMENT

1. Develop. Develop on pre-installed, quick-start deep learning containers. Get to work quickly with workspaces that provide optimized, pre-configured drivers, frameworks, libraries, and notebooks. Start with CPUs and attach Elastic GPUs on demand. All your code and data is saved automatically and is sharable with others.

2. Train. Transition from development to training with multiple GPUs. Seamlessly scale out to more GPUs on a shared training cluster to train larger models quickly and cost-effectively. Support and manage multiple users, teams, and projects. Train multiple models in parallel for massive productivity improvements.

3. Deploy. Push trained, finalized models into production. Deploy a trained neural network into production and perform real-time inference across different hardware. Manage multiple AI applications and inference endpoints corresponding to different trained models.
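The deck does not show the Bitfusion tooling itself, so as a minimal sketch of what "start with CPUs and attach Elastic GPUs on demand" means from the application side, the snippet below (using CuPy, an assumption not named in the deck) falls back to plain NumPy in a CPU-only workspace and shards work across however many CUDA devices happen to be visible once GPUs are attached. The same script runs unmodified in both cases.

```python
# Minimal sketch (not Bitfusion's API): the same script runs on a CPU-only
# development workspace and on a workspace with elastic GPUs attached.
# CuPy is assumed to be installed in the GPU containers.
import numpy as np

def visible_gpu_count():
    """Return the number of CUDA devices currently visible, 0 if none."""
    try:
        import cupy as cp
        return cp.cuda.runtime.getDeviceCount()
    except Exception:
        return 0

def forward(batch, weights):
    n_gpus = visible_gpu_count()
    if n_gpus == 0:
        # Development mode: plain NumPy on the CPU.
        return np.tanh(batch @ weights)
    # Training mode: shard the batch across every attached GPU.
    import cupy as cp
    shards = np.array_split(batch, n_gpus)
    outputs = []
    for dev_id, shard in enumerate(shards):
        with cp.cuda.Device(dev_id):
            y = cp.tanh(cp.asarray(shard) @ cp.asarray(weights))
            outputs.append(cp.asnumpy(y))
    return np.concatenate(outputs)

if __name__ == "__main__":
    data = np.random.rand(1024, 256).astype(np.float32)
    w = np.random.rand(256, 128).astype(np.float32)
    print("GPUs visible:", visible_gpu_count())
    print("output shape:", forward(data, w).shape)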
Dell EMC Deep Learning Optimized servers
Solution stack: vertical segment, applications, open source frameworks, optimized libraries, operating system, processor/accelerator, compute platform.
Compute platforms: C4130, R730, and C6320P in C6300; accelerators: GPU, KNL Phi (in the C6320P sled), and NVLink GPUs (in the C4130).
C4130 DEEP LEARNING Server
- Front: (optional) redundant power supplies, dual SSD boot drives
- Back: iDRAC NIC, 2x 1Gb NIC
- Front (internal): power supplies, GPU accelerators (4), CPU sockets (under heatsinks), 8 fans
GPU DEEP LEARNING RACK SOLUTION - Configuration Details

Feature        | R730                   | C4130
CPU            | E5-2669 v3 @ 2.1 GHz   | E5-2630 v3 @ 2.4 GHz
Memory         | 1 TB/node; 64 GB DIMMs | 1 TB/node; 64 GB DIMMs
Storage        | Intel PCIe NVMe        | Intel PCIe NVMe
Networking IO  | CX3 FDR InfiniBand     | CX3 FDR InfiniBand
GPU            | NA                     | M40-24GB
TOR switch     | Mellanox SX6036 FDR switch
Cables         | FDR 56G DCA cables
GPU DEEP LEARNING RACK SOLUTION
- Pre-built app containers
- GPU and workspace management
- Elastic GPUs across the datacenter
- Software-defined, scaled-out GPU servers
- End-to-end deep learning application life cycle: 1 Develop, 2 Train, 3 Deploy

Rack layout: four C4130 GPU nodes (#1-#4) and two R730 CPU nodes (#1-#2) connected through an InfiniBand switch.

...but wait, 'converged compute' requires network-attached GPUs...
BITFUSION CORE VIRTUALIZATION
GPU device virtualization: allows dynamic GPU attach on a per-application basis.

Features:
- APIs: CUDA, OpenCL
- Distribution: scale out to remote GPUs
- Pooling: oversubscribe GPUs
- Resource provisioning: fractional vGPUs
- High availability: automatic DMR
- Manageability: remote nvidia-smi
- Distributed CUDA Unified Memory
- Native support for IB, GPUDirect RDMA
- Feature complete with CUDA 8.0
PUTTING IT ALL TOGETHER
Architecture: a client server connected to multiple GPU servers.
- Bitfusion Flex, managed containers
- Bitfusion Service Daemon
- Bitfusion Client Library
NATIVE VS. REMOTE GPUs
Native: one CPU with GPU 0 and GPU 1 attached over PCIe.
Remote: the client CPU has GPU 0 and an HCA on PCIe, while a second node's CPU exposes GPU 1 through its own HCA over PCIe.
Completely transparent: all CUDA apps see local and remote GPUs as if they were directly connected.
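Because the attach is transparent at the CUDA level, an unmodified application just enumerates devices as usual and has no notion of local versus remote. The short sketch below (CuPy is assumed purely as a convenient CUDA runtime binding; the deck does not prescribe a language or library) lists whatever devices are currently visible.

```python
# Minimal sketch of the transparency claim: an unmodified CUDA application
# simply enumerates devices; it cannot tell whether a device index is backed
# by a local PCIe GPU or a network-attached one. CuPy is an assumption.
import cupy as cp

def list_devices():
    count = cp.cuda.runtime.getDeviceCount()
    for dev_id in range(count):
        props = cp.cuda.runtime.getDeviceProperties(dev_id)
        name = props["name"]
        if isinstance(name, bytes):
            name = name.decode()
        mem_gib = props["totalGlobalMem"] / 2**30
        print(f"device {dev_id}: {name}, {mem_gib:.1f} GiB")

if __name__ == "__main__":
    list_devices()
```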
Results

REMOTE GPUs - LATENCY AND BANDWIDTH
- Data movement overhead is the primary scaling limiter.
- Measurements were done at the application level, via cudaMemcpy.
- Local GPU copies are fast; intranode copies go over PCIe.

16-GPU virtual system, naive implementation with TCP/IP (four C4130 nodes):
- Fast local GPU copies and intranode copies via PCIe, but low-bandwidth, high-latency remote copies.
- OS bypass is needed to avoid the primary TCP/IP overheads.
- AI apps are very latency sensitive.
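The deck does not include the measurement code; as a rough sketch of an application-level cudaMemcpy-style measurement of the kind described (CuPy assumed, CUDA events for timing, not the original test harness), the snippet below times host-to-device copies at a small and a large size so that the per-copy latency floor and the bandwidth plateau can be read off separately.

```python
# Rough sketch of an application-level copy benchmark in the spirit of the
# slide's cudaMemcpy measurements (CuPy assumed; not the original test code).
# Small transfers expose per-copy latency; large transfers expose bandwidth.
import numpy as np
import cupy as cp

def time_h2d_copy(nbytes, iters=100):
    host = np.ones(nbytes, dtype=np.uint8)
    dev = cp.empty(nbytes, dtype=cp.uint8)
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    dev.set(host)                      # warm-up copy
    start.record()
    for _ in range(iters):
        dev.set(host)                  # host-to-device copy (cudaMemcpy under the hood)
    stop.record()
    stop.synchronize()
    ms = cp.cuda.get_elapsed_time(start, stop) / iters
    return ms, nbytes / (ms * 1e-3) / 1e9   # per-copy time (ms), GB/s

if __name__ == "__main__":
    for size in (4 * 1024, 256 * 1024 * 1024):      # 4 KiB vs. 256 MiB
        ms, gbps = time_h2d_copy(size)
        print(f"{size:>12} bytes: {ms:8.3f} ms/copy, {gbps:6.2f} GB/s")
```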
16-GPU virtual system: Bitfusion optimized transport and runtime
- Same FDR x4 transport, but drop IPoIB.
- Replace remote calls with native IB verbs.
- Runtime selection of intranode RDMA vs. cudaMemcpy.
- Multi-rail communications where available.
- Remote ≈ native local GPUs.
- Runtime optimizations: pipelining, speculative execution, distributed caching & event coalescing, minimal NUMA effects, ...
SLICE & DICE - MORE THAN ONE WAY TO GET 4 GPUs
- Workloads: Caffe GoogleNet and TensorFlow Pixel-CNN on R730 and C4130; run-time comparison (lower is better).
- Native GPU performance with network-attached GPUs.
- Multiple ways to create a virtual 4-GPU node, with native efficiency (seconds to train Caffe GoogleNet, batch size: 128).
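The slicing itself is done by the Bitfusion tooling, which the deck does not show. As a generic stand-in for the idea of handing a job exactly four GPUs out of a larger pool, the sketch below launches each job with CUDA_VISIBLE_DEVICES restricted to its slice; the command name train.py and its flags are placeholders, not anything from the deck.

```python
# Generic illustration of 'slice and dice' (not Bitfusion's mechanism): give
# each job a 4-GPU slice of a larger pool by restricting CUDA_VISIBLE_DEVICES.
# 'python train.py' is a placeholder training command.
import os
import subprocess

def launch_on_slice(gpu_ids, job_args):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    # Inside the child process the slice appears as devices 0..len(gpu_ids)-1.
    return subprocess.Popen(["python", "train.py", *job_args], env=env)

if __name__ == "__main__":
    # Two concurrent jobs, each seeing a different virtual 4-GPU node.
    job_a = launch_on_slice([0, 1, 2, 3], ["--batch-size", "128"])
    job_b = launch_on_slice([4, 5, 6, 7], ["--batch-size", "128"])
    job_a.wait()
    job_b.wait()
```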
TRAINING PERFORMANCE
- Continued strong scaling: Caffe GoogleNet.
- Weak scaling: accelerate hyperparameter optimization (Caffe GoogleNet, TensorFlow 1.0 with Pixel-CNN).
[Chart: native vs. remote scaling on R730 + C4130 at 1, 2, 4, 8, and 16 GPUs; efficiency labels 74%, 73%, 55%, 53%, 86%; PCIe host bridge limit.]
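The weak-scaling case, where extra GPUs accelerate hyperparameter search rather than a single model, needs no Bitfusion-specific code to sketch: each GPU trains its own independent model with a different hyperparameter value. The toy example below (CuPy assumed; a quadratic stand-in for the Caffe GoogleNet / Pixel-CNN training in the deck, and it assumes at least as many attached GPUs as learning rates) runs one process per GPU.

```python
# Sketch of weak scaling for hyperparameter search: each GPU trains its own
# copy of a toy model with a different learning rate, fully independently.
# The quadratic 'model' is only a stand-in for real network training.
import multiprocessing as mp

def train_one(gpu_id, lr, steps=200):
    import cupy as cp
    with cp.cuda.Device(gpu_id):
        rng = cp.random.default_rng(gpu_id)
        target = rng.standard_normal(128, dtype=cp.float32)
        w = cp.zeros(128, dtype=cp.float32)
        for _ in range(steps):
            grad = 2 * (w - target)        # gradient of ||w - target||^2
            w -= lr * grad                 # plain SGD step
        loss = float(cp.sum((w - target) ** 2))
    print(f"GPU {gpu_id}: lr={lr:g}, final loss={loss:.4f}")

if __name__ == "__main__":
    lrs = [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001, 0.00003]
    procs = [mp.Process(target=train_one, args=(gpu, lr))
             for gpu, lr in enumerate(lrs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```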
Other PCIe GPU configurations are available; currently testing Config 'G'.

Further reading:
/techcenter/high-performance-computing/b/general_hpc/archive/2016/11/11/deep-learning-performance-with-p100-gpus
http:///techcenter/high-performance-computing/b/general_hpc/archive/2017/03/22/deep-learning-inference-on-p40-gpus

NVLink configuration - Config 'K':
- 4x P100-16GB SXM2 GPUs (SXM2 #1-#4), 2 CPUs, PCIe switch, 1 PCIe slot - EDR IB

NVLink configuration - Config 'L':
- 4x P100-16GB SXM2 GPUs (SXM2 #1-#4), 2 CPUs, PCIe switch, 1 PCIe slot - EDR IB
- Memory: 256 GB with 16 GB DIMMs @ 2133; OS: Ubuntu 16.04; CUDA: 8.1
Software Solutions

Overview – Bright ML
Dell EMC has partnered with Bright Computing to offer their Bright ML package as the software stack on the Dell EMC Deep Learning hardware solution.

Bright ML Overview
Machine Learning in Seismic Imaging Using KNL + FPGA – Project #1
Bhavesh Patel – Server Advanced Engineering
Robert Dildy – Product Technologist Sr. Consultant, Engineering Solutions

Abstract
Solutions36AbstractThis
paper
is
focused
on
how
to
apply
Machine
Learning
to
seismic
imaging
with
the
use
of
FPGA
as
aco-accelerator.It
will
cover
2
hardware
technologies:
1)
Intel
KNL
Phi
2)
FPGA
and
also
address
how
to
use
Machine
learningforseismic
imaging.There
are
different
types
of
accelerators
like
GPU,
Intel
Phi
but
we
are
choosing
to
study
how
we
can
use
i-ABRAplatform
on
KNL
+
FPGA
to
train
the
neural
network
using
Seismic
Imaging
data
and
then
doing
the
inference.Machine
learning
in
a
broader
sense
can
be
divided
into
2
parts
namely
:
Training
and
Inference.37BackgroundSeismic
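The training/inference split the abstract refers to can be made concrete with a tiny example that has nothing to do with the i-ABRA platform or seismic data: a NumPy logistic-regression stub (purely illustrative) with a training phase that fits weights on labelled data and an inference phase that only applies the frozen weights to new inputs.

```python
# Toy illustration of the training vs. inference split (NumPy only; not the
# i-ABRA platform or the paper's actual model): training fits the weights,
# inference only applies the frozen weights to new data.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, lr=0.1, epochs=200):
    """Training phase: iteratively update weights from labelled examples."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        pred = sigmoid(x @ w)
        w -= lr * x.T @ (pred - y) / len(y)   # gradient step on logistic loss
    return w

def infer(x, w):
    """Inference phase: a single forward pass with fixed weights."""
    return (sigmoid(x @ w) > 0.5).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(512, 8))
    y = (x[:, 0] + x[:, 1] > 0).astype(float)   # synthetic labels
    w = train(x, y)
    print("accuracy:", (infer(x, w) == y).mean())
```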
Background
Seismic imaging is a standard data-processing technique used to create an image of subsurface structures of the Earth from measurements recorded at the surface via seismic wave propagation captured from various sound energy sources.

There are certain challenges with seismic data interpretation, such as 3D starting to replace 2D for seismic interpretation.

There has been rapid growth in the use of computer vision technology, and several companies are developing image recognition platforms. This technology is being used for automatic photo tagging and classification.