Dr. Ettikan Kandasamy Karuppiah
Director, Developers’ Ecosystem
2 November 2017

REAL WORLD PROBLEM SIMPLIFICATION USING DEEP LEARNING / AI

AI IS EVERYWHERE
“Find where I parked my car”
“Find the bag I just saw in this magazine”
“What movie should I watch next?”

TOUCHING OUR LIVES
Bringing grandmother closer to family by bridging language barrier
Predicting sick baby’s vitals like heart rate, blood pressure, survival rate
Enabling the blind to “see” their surrounding, read emotions on faces

AI FOR PUBLIC GOOD
Increasing public safety with smart video surveillance at airports & malls
Providing intelligent services in hotels, banks and stores
Separating weeds as it harvests, reduces chemical usage by 90%

HOW A DEEP NEURAL NETWORK SEES
Raw data, Low-level features, Mid-level features, High-level features

KNOW YOUR PROBLEM WELL

AI MOMENTUM (TODAY / BY 2020)
3000 AI startups
$47B spend on AI technologies
85% of all customer service interactions will be powered by AI bots
20% of companies will dedicate workers to monitor and guide neural networks

EVERY INDUSTRY HAS AWOKEN TO AI
Organizations engaged with NVIDIA on Deep Learning: 1,549 (2014), 19,439 (2016)
Higher Ed, Internet, Healthcare, Finance, Automotive, Government, Others, Developer Tools

RISE OF GPU COMPUTING
GPU-Computing perf: 1.5X per year, 1000X by 2025
[Chart: performance on a log scale (10^2 to 10^7) versus year (1980 to 2020); curves: GPU-Computing perf, Single-threaded perf]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten
New plot and data collected for 2010-2015 by K. Rupp
Trend labels: 1.5X per year, 1.1X per year
APPLICATIONS, SYSTEMS, ALGORITHMS, CUDA, ARCHITECTURE

CPU, GPU
Add GPUs: Accelerate Data Processing & Analytics
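
The growth rates quoted on the RISE OF GPU COMPUTING chart compound quickly, and a few lines of arithmetic make that concrete. The sketch below is illustrative only. It assumes the 1.5X-per-year label describes GPU-computing performance and the 1.1X-per-year label describes recent single-threaded performance. The function names are hypothetical and do not come from the slides.

```python
import math


def years_to_reach(target_factor: float, yearly_factor: float) -> float:
    """Years of compounding at yearly_factor needed to reach target_factor."""
    return math.log(target_factor) / math.log(yearly_factor)


def compounded(yearly_factor: float, years: float) -> float:
    """Total growth after compounding yearly_factor for the given years."""
    return yearly_factor ** years


if __name__ == "__main__":
    # Assumption: 1.5X per year is the GPU-computing trend on the chart.
    gpu_years = years_to_reach(1000, 1.5)
    print(f"1000X at 1.5X per year takes about {gpu_years:.1f} years")

    # Assumption: 1.1X per year is the recent single-threaded trend.
    cpu_growth = compounded(1.1, gpu_years)
    print(f"1.1X per year over the same span gives about {cpu_growth:.1f}X")
```

Under these assumptions, reaching 1000X takes roughly 17 years of 1.5X annual growth, while 1.1X per year yields only about 5X over the same period.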
NVIDIA 2013

NVIDIA IGNITES THE AI BIG BANG
Artificial intelligence is the use of computers to simulate human intelligence. AI amplifies our cognitive abilities, letting us solve problems where the complexity is too great, the information is incomplete, or the details are too subtle and require expert training. Learning from data, a computer’s version of life experience, is how AI evolves. GPU computing powers the computation required for deep neural networks to learn to recognize patterns from massive amounts of data. This new computing model sparked the AI era.

NVIDIA DEEP LEARNING SDK
High Performance GPU-Acceleration for Deep Learning
DEEP LEARNING FRAMEWORKS: Mocha.jl
VISION: Image Classification, Object Detection
SPEECH: Voice Recognition, Language Translation
BEHAVIOR: Recommendation Engines, Sentiment Analysis
DEEP LEARNING: cuDNN
MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
MULTI-GPU: NCCL

ANNOUNCING TESLA V100
GIANT LEAP FOR AI & HPC
VOLTA WITH NEW TENSOR CORE
21B xtors | TSMC 12nm FFN |
815 mm²
5,120 CUDA cores
7.5 FP64 TFLOPS | 15 FP32 TFLOPS
NEW 120 Tensor TFLOPS
20MB SM RF | 16MB Cache
16GB HBM2 @ 900 GB/s
300 GB/s NVLink

NEW TENSOR CORE
New CUDA TensorOp instructions & data formats
4x4 matrix processing array
D[FP32] = A[FP16] * B[FP16] + C[FP32]
Optimized for deep learning
[Figure labels: Activation Inputs, Weights Inputs, Output Results]

MODEL COMPLEXITY IS EXPLODING
2015: Microsoft ResNet, 7 ExaFLOPS, 60 Million Parameters
2016: Baidu Deep Speech 2, 20 ExaFLOPS, 300 Million Parameters
2017: NMT, 105 ExaFLOPS, 8.7 Billion Parameters

REVOLUTIONARY AI PERFORMANCE
3X Faster DL Training Performance
Over 80x DL Training Performance in 3 Years
[Chart: Googlenet Training Performance (Speedup vs K80), axis 0x to 100x. Bars: Q1 15, 1x K80, cuDNN2; Q3 15, 4x M40, cuDNN3; Q2 16, 8x P100, cuDNN6; Q2 17, 8x V100, cuDNN7]

85% Scale-Out Efficiency
Scales to 64 GPUs with Microsoft Cognitive Toolkit
[Chart: Multi-Node Training with NCCL 2.0 (ResNet-50), axis ticks 0, 5, 10, 15. Bars: 8X P100, 18 Hours; 8X V100, 7.4 Hours; 64X V100, 1 Hour]
ResNet50 Training for 90 Epochs with 1.28M images dataset | Using Caffe2 | V100 performance measured on pre-production hardware.

3X Reduction in Time to Train Over P100
[Chart: LSTM Training (Neural Machine Translation), axis ticks 0, 10, 20. Bars: 2X CPU, 15 Days; 1X P100, 18 Hours; 1X V100, 6 Hours]
Neural Machine Translation Training for 13 Epochs | German -> English, WMT15 subset | CPU = 2x Xeon E5 2699 V4 | V100 performance measured on pre-production hardware.

Deep Learning - Inferencing: TESLA V100 DELIVERS NEEDED RESPONSIVENESS WITH UP TO 99X MORE THROUGHPUT
Throughput (Images/Sec):
ResNet-50: CPU Server 47 i/s @ 21ms latency; Tesla V100 4,647 i/s @ 7ms latency
VGG-16: CPU Server 23 i/s @ 43ms latency; Tesla V100 1,658 i/s @ 7ms latency
GoogleNet: CPU Server 136 i/s @ 7ms latency; Tesla V100 3,270 i/s @ 7ms latency
GPU Servers: Dual Xeon E5-2690 v4 @ 2.6GHz with 16GB PCIe GPUs configs as shown
Ubuntu 14.04.5, CUDA 9.0.103, cuDNN 3;
NCCL 2.0.4, TensorRT pre-release, data set: ImageNet
GPU optimal batch size used to achieve 7ms latency; CPU batch size reduced to 1 if latency exceeds 7ms
CPU: Xeon E5-2690 V4

NVIDIA METROPOLIS: EDGE TO CLOUD AI CITY PLATFORM
EDGE AND ON-PREMISES: Inference
CLOUD: Training and Inference
Camera, Video recorder, Server, DGX Appliance
JETSON, TESLA/QUADRO
JETPACK, TENSOR RT, DEEPSTREAM

HPCG Performance Equivalency
Single GPU Server vs Multiple CPU-Only Servers
CPU Server: Dual Xeon E5-2690 v4 @ 2.6GHz, GPU Servers: same CPU server w/ V100s PCIe
CUDA Version: CUDA 9.0.103; Dataset: 256x256x256 local size
To arrive at CPU node equivalence, we use measured benchmark with up to 8 CPU nodes.
Then we use linear scaling to scale beyond 8 nodes.
HPCG Benchmark: Exercises computational and data access patterns that closely match a broad set of important HPC applications
VERSION: 3
ACCELERATED FEATURES: All
SCALABILITY: Multi-GPU and Multi-Node
MORE INFORMATION: http://www.hpcg-/index.html
[Chart: # of CPU Only Servers and speed up vs CPU server. 1 server with 2x V100 GPUs: 19 CPU servers (19x); 1 server with 4x V100 GPUs: 37 CPU servers (36x); 1 server with 8x V100 GPUs: 67 CPU servers (67x)]

REINVENTING OUR COMMUNITY

AUTOMOTIVE DEEP LEARNING
Deep Learning for Damage Estimation based on photo of the damage
Custom development for one of top 5 Insurance Companies
Scope of Work
Total Data Available: 90,000 Claims, ~380,000 car images
Data Processed to Date*: 1000 Claims, 4500 images
*Time lag in data migration, data clean-up and tagging images
Visualization of the activations of the convolutions over a car damage image
Visualization of the layers of a neural network and the back propagation process
Neural Network Demo

Business Model (1)
https://www.galacticar.ai/ -> Galaxy.ai : Artificial Intelligence Driven Sedan Damage Estimator
Software as a Service
Annual fee + fixed fee every time API is pinged
Integration with Insurance Mobile app.
Use cases verified by the industry and insurance clients:
Whether client should file a claim or not - Claims triaging
Claims estimation from damaged sedan vehicle images
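
The pricing on the business model slide (an annual fee plus a fixed fee every time the API is pinged) amounts to a simple linear cost formula. A minimal sketch follows. The fee amounts and call volume are made-up placeholders and the function name is hypothetical, since the slides give no figures.

```python
def annual_cost(annual_fee: float, fee_per_call: float, api_calls: int) -> float:
    """Yearly cost under an annual fee plus a fixed fee per API call."""
    if api_calls < 0:
        raise ValueError("api_calls must be non-negative")
    return annual_fee + fee_per_call * api_calls


if __name__ == "__main__":
    # Placeholder numbers for illustration only; none come from the slides.
    cost = annual_cost(annual_fee=10_000.0, fee_per_call=0.5, api_calls=90_000)
    print(f"Estimated yearly cost: {cost:,.2f}")
```

With this structure the per-call fee dominates once call volume is high, so an insurer's expected number of claim photos per year is the main input to the estimate.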
DEEP LEARNING IN FINANCE - TRADING
/gtc/2016/presentation/s6589-masahiko-todoriki-performance-improvement-algorithmic-trading.pdf/
20 Oct 2016

Generic Bigdata Operations
[Diagram labels: Data Analytics; Data Visualization; Scrambling & Protecting; Data Governance; Published Data Marts/Lake; Harmonization & Ontology Mapping; Data Harmonization; Data Quality & Integrity; Data Cleansing; Data Staging; Data Acquisition; Data Preparation; Data Dissemination; Data Warehousing/Storage (including Network Data); Virtualized Platform & Security Management; Harmonization; Terminologies; Check for different Representation and usage; Ontology; Correction & Assurance; Check for Correction; Exception Fields/Data; Data Anonymity & Data Protection; Structured Data Sources; Structured Data Access; Data Modeling; Sentiment Analytics; Data Reporting & Visualization; Data Interpretation; Social Network Understanding; Real-time Data Sources; Real-time Data Ingestion; Data Analytics; Network Analytics; Unstructured Data Sources; Unstructured Data Collection; Data ……….; Data Statistics; Mobile Sharing; Tablet Sharing; PC Sharing; Push & Pull Data Platform; Data Models & Schema]

GPU DATABASES ARE EVEN FASTER
1.1 Billion Taxi Ride Benchmarks
Time in Milliseconds
Query      MapD DGX-1   MapD 4 x P100   Redshift 6-node   Spark 11-node
Query 1    21           30              1560              10190
Query 2    80           99              1250              8134
Query 3    150          269             2250              19624
Query 4    372          696             2970              85942
Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS
@marklit82
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
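
To put the 1.1 Billion Taxi Ride Benchmarks chart in perspective, the query times can be turned into speedup ratios. The sketch below uses the millisecond values as they can best be read from the chart labels on that slide. The mapping of values to systems is an assumption, and the dictionary and function names are hypothetical.

```python
# Query times in milliseconds for Query 1 to Query 4, as read from the
# chart labels. Which value belongs to which system is an assumption.
TIMES_MS = {
    "MapD DGX-1": [21, 80, 150, 372],
    "MapD 4 x P100": [30, 99, 269, 696],
    "Redshift 6-node": [1560, 1250, 2250, 2970],
    "Spark 11-node": [10190, 8134, 19624, 85942],
}


def speedups(baseline: str, system: str, times=TIMES_MS) -> list:
    """Per-query ratio of baseline time to system time (higher is faster)."""
    return [b / s for b, s in zip(times[baseline], times[system])]


if __name__ == "__main__":
    for baseline in ("Redshift 6-node", "Spark 11-node"):
        ratios = speedups(baseline, "MapD DGX-1")
        formatted = ", ".join(f"{r:.0f}x" for r in ratios)
        print(f"MapD DGX-1 vs {baseline}: {formatted}")
```

Reporting a ratio per query, rather than one averaged figure, keeps the spread between queries visible.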
Automated Crop Management
Silicon Valley-based Blue River Technology has developed a deep learning solution called LettuceBot that rolls through a field photographing 5,000 young plants a minute, using algorithms and machine vision to identify each sprout as lettuce or a weed. The company trained their neural network with GPUs and the Caffe deep learning framework. Accurate within a quarter inch, the LettuceBot automatically pinpoints weeds, underdeveloped sprouts, and overplanted areas and then applies tiny doses of herbicide to maximize crop production.
Source : /artificial-intelligence-helping-to-ensure-humanitys-future-food-supply/
Artificial Intelligence Helps Identify Plant Species
Researchers from the Costa Rica Institute of Technology and French Agricultural Research Centre for International Development developed a deep learning algorithm to automatically identify plant specimens that have been pressed, dried and mounted on herbarium sheets. According to the researchers, this is the first attempt to use deep learning to tackle the difficult taxonomic task of identifying species in natural-history collections.
Source : /artificial-intelligence-helps-identify-plant-species-for-science/

How does it work?
Train: whole slide image, sample, training data (normal, tumor)
Test: whole slide image, overlapping image patches, tumor prob. map (scale 0.0, 0.5, 1.0)
Convolutional Neural Network, P(tumor)

SAFE AND SMART CITIES IS AN AI PROBLEM
1 billion installed security cameras WW (2020)
30 billion frames per day
Challenging real world conditions
Traditional video analytics not trustworthy
[Chart: installed security cameras, 0M to 1,000M, 2016 and 2020]
AI achieves super human results
[Chart: Image Classification accuracy, 2010 to 2016; series: Hand-coded CV, Deep Learning, Human; value labels: 74%, 9…]
AI driven intelligent video analytics

AI FOR VISUAL SEARCH IN MARKET PLACE

TESLA PLATFORM
Leading Data Center Platform for HPC and AI
TESLA GPU & SYSTEMS, NVIDIA SDK, INDUSTRY TOOLS, APPLICATIONS & SERVICES
ECOSYSTEM TOOLS & LIBRARIES
+400 More Applications
HPC
cuBLAS, cuDNN, TensorRT, DeepStream SDK
FRAMEWORKS, MODELS
Cognitive Services, Machine Learning Services
AI TRAINING & INFERENCE
ResNet, GoogleNet, AlexNet, DeepSpeech, Inception, BigLSTM
DEEP LEARNING SDK, NCCL, C/C++, COMPUTEWORKS
TESLA GPU, NVLINK, SYSTEM OEM, CLO