2024 Artificial Intelligence (AI) Technology Tutorial

Lectures (lecture, title, notes):

1. Course introduction: Overview and system/AI basics
2. Overview of AI systems (System perspective of System for AI)
   Topics: System for AI: a historic view; fundamentals of neural networks; fundamentals of System for AI
3. Fundamentals of deep neural network computation frameworks (Computation frameworks for DNN)
   Topics: backprop and AD, Tensor, DAG, execution graph
   Papers and systems: PyTorch, TensorFlow
4. Matrix computation and computer architecture (Computer architecture for matrix computation)
   Topics: matrix computation, CPU/SIMD, GPGPU, ASIC/TPU
   Papers and systems: BLAS, TPU
5. Distributed training algorithms
   Topics: data parallelism, model parallelism, distributed SGD
6. Distributed training systems
   Topics: MPI, parameter servers, all-reduce, RDMA
   Papers and systems: Horovod
7. Heterogeneous computing cluster scheduling and resource management systems
   Topics: running DNN jobs on a cluster: containers, resource allocation, scheduling
   Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD
8. Deep learning inference systems
   Topics: efficiency, latency, throughput, and deployment
9. Computation graph compilation and optimization
   Topics: IR, sub-graph pattern matching, matrix multiplication and memory optimization
   Papers and systems: XLA, MLIR, TVM, NNFusion
10. Model compression and sparsification (Efficiency via compression and sparsity)
   Topics: model compression, sparsity, pruning
11. Automated machine learning systems (AutoML systems)
   Topics: hyperparameter tuning, NAS
   Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI
12. Reinforcement learning systems
   Topics: theory of RL, systems for RL
   Papers and systems: A3C, RLlib, AlphaZero
13. Model security and privacy protection (Security and Privacy)
   Topics: federated learning, security, privacy
   Papers and systems: DeepFake
14. Optimizing computer systems with AI (AI for systems)
   Topics: AI for traditional systems problems, AI for system algorithms
   Papers and systems: Learned Indexes, Learned query path

Labs:

Lab 1 (weeks 1, 2): Getting-started examples for frameworks and tools
   A simple end-to-end AI example, walked through from a system perspective. Understand the systems from debugger info and system logs.
Lab 2 (week 3): Customize a new tensor operator (Customize operators)
   Design and implement a customized operator (both forward and backward) in Python.
Lab 3 (week 4): CUDA implementation and optimization
   Add a CUDA implementation for the customized operator.
Lab 4 (weeks 5, 6): AllReduce implementation and optimization
   Improve the implementation of one of the AllReduce operators in Horovod.
Lab 5 (weeks 7, 8): Configure containers for cloud training or inference (Configure containers for customized training and inference)
   Configure containers.
Lab 6: Learn to use a scheduling and resource management system
   Get familiar with OpenPAI or KubeFlow.
Lab 7: Distributed training exercise
   Try different kinds of all-reduce implementations.
Lab 8: AutoML system exercise
   Search for a new neural network structure for image/NLP tasks.
Lab 9: RL systems exercise
   Configure and get familiar with one of the following RL systems: RLlib, …
Deep learning is changing the world: self-driving, personal assistants, surveillance detection, translation, medical diagnostics, games, art, image recognition, speech recognition, natural language, generative models, reinforcement learning.

Figure: a neural network classifying images (cat, dog, raccoon, honey badger); the errors at the loss are propagated backward through layers 5 to 1.

What drives this: compute power (e.g., RDMA), massive (labeled) data (14M images), advances in deep learning algorithms, and languages and frameworks.
Deep learning + progress in systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems.
Data scale, e.g., for the image classification problem:
- MNIST: 60K samples, 10 categories
- ImageNet: 16M samples, 1000 categories
- Web images: billions of images, open categories

Test error rate (%) on image classification:
- LeNet (1998): convolution, max-pooling, softmax
- AlexNet (2012): 16.4%; ReLU, Dropout
- Inception (2015): 6.7%; batch normalization
- ResNet (2015): 3.57%; residual connections
- EfficientNet (2019): 3.1%; NAS

Figure: progress of deep learning in image recognition, speech recognition, natural language, and reinforcement learning, 1960 to 2019.

Figure: performance (Op/sec) over time. CPUs following Moore's law: 10^8x, from ENIAC (5 Kops) to Xeon E5 (~500 Gops). Dedicated hardware (GPU, TPU): 10^5x, with TPUv1 at 90 Tops, V100 at 125 Tops, and TPUv3 at 360 Tops.
Figure: evolution of the deep learning stack. Custom-purpose machine learning algorithms (Theano, DistBelief, Caffe) on algebra and linear algebra libraries for CPU and GPU; deep learning frameworks (MXNet, TensorFlow, CNTK, PyTorch) with a language front end (Swift for TensorFlow) and compiler back ends (TVM, TensorFlow XLA); an AI framework over a dense matmul engine on GPUs, FPGAs, and special AI accelerators (TPU, GraphCore, other ASICs).

Deep learning frameworks provide easier ways to leverage various libraries.
A full-featured programming language for ML: expressive and flexible (control flow, recursion, sparsity).
A powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting.
Machine learning language and compiler. Hardware features: SIMD, MIMD, sparsity support, control flow and dynamicity, associated memory.
Figure: the AI system stack, in three layers (Experience; Frameworks; Architecture, single node and cloud):
- End-to-end AI user experiences: model, algorithm, pipeline, experiment, tool, life cycle management
- Programming interfaces: computation graph, (auto) gradient calculation
- IR and compiler infrastructure
- Deep learning runtime: optimizer, planner, executor
- Resource management/scheduler
- Hardware APIs (GPU, CPU, FPGA, ASIC)
- Scalable network stack (RDMA, IB, NVLink)
These layers map to classes 3 through 8.

The broader AI system ecosystem:
- New machine learning paradigms (RL): class 12
- Automated machine learning (AutoML): class 11
- Security and privacy: class 13
- Model inference, compression, and optimization: class 10
- Deep learning algorithms and frameworks: efficient new general-purpose AI algorithms for broad use; support for and evolution of multiple deep learning frameworks; DNN compilation architecture and optimization
- Core system software and hardware: runtime and optimization environment for deep learning jobs; general resource management and scheduling systems; new hardware and the associated high-performance network and compute stack
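(1) Define the network structure; (2) start training.
- Fully connected: typically used as the last few layers for classification problems
- Convolutional neural network: typically used for images, speech, and other data with strong locality
- Recurrent neural network: typically used for sequences and structured data, such as text and knowledge graphs
- Transformer neural network: typically used for sequence data, such as text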
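To make the contrast between the first two layer types concrete, here is a minimal pure-Python sketch (not any framework's API; the function names and toy numbers are hypothetical). A fully connected layer lets every output read every input, so it needs len(x) × len(bias) weights; a 1-D convolution reads only a small local window and reuses the same kernel at every position, which is why it suits data with strong locality. A softmax then turns the final fully connected scores into class probabilities.

```python
import math
from typing import List


def fully_connected(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: every output reads every input (len(x) * len(bias) weights)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as most DL frameworks compute it).

    Each output reads only a local window of len(kernel) inputs, and the same
    kernel weights are reused at every position.
    """
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def softmax(logits: List[float]) -> List[float]:
    """Turn the last layer's scores into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy pipeline: local features from a convolution, then a 2-class dense head.
signal = [1.0, 2.0, 3.0, 4.0]
features = conv1d(signal, [1.0, -1.0])  # local differences between neighbors
logits = fully_connected(features, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.5])
print(softmax(logits))
```

The kernel here has 2 parameters no matter how long the input is, while a dense layer over the same input grows with both input and output width.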
# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# Word embedding model is used at the leaf node to map word
# index into high-dimensional semantic word representation.
# Map tree embedding to sentiment
# Get semantic representations for left and right children.
# A composition function is used to learn semantic
# representation for phrase at the internal node.

More diverse structures, more powerful modeling capability, more complex dependencies, finer-grained computation patterns.
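The recursive TreeBank model described by those comments can be sketched in plain Python. This is an illustration under stated assumptions, not the original JPL code: the vocabulary, the 2-dimensional embeddings, the tanh composition function, and all weights are hypothetical toy values, and training is omitted. The point it shows is that the computation follows the shape of each input tree, so the graph differs from sentence to sentence.

```python
import math
from typing import Dict, List, Tuple, Union

# A parse tree is either a word index (leaf) or a (left, right) pair.
Tree = Union[int, Tuple["Tree", "Tree"]]

# Hypothetical vocabulary and 2-dimensional word embeddings (toy values).
VOCAB: Dict[str, int] = {"not": 0, "bad": 1, "movie": 2}
EMBEDDINGS: List[List[float]] = [
    [-1.0, 0.5],   # "not"
    [-0.8, -0.2],  # "bad"
    [0.1, 0.3],    # "movie"
]

# Hypothetical composition weights: 2 outputs x 4 inputs ([left; right]).
W_COMPOSE: List[List[float]] = [
    [0.5, 0.1, 0.5, 0.1],
    [0.1, 0.5, 0.1, 0.5],
]
# Hypothetical sentiment weights mapping the root vector to one logit.
W_SENTIMENT: List[float] = [1.0, -1.0]


def compose(left: List[float], right: List[float]) -> List[float]:
    """Composition function for an internal node: tanh(W [left; right])."""
    concat = left + right
    return [math.tanh(sum(w * v for w, v in zip(row, concat))) for row in W_COMPOSE]


def embed(tree: Tree) -> List[float]:
    """Walk the tree recursively, producing a vector for every node."""
    if isinstance(tree, int):
        return EMBEDDINGS[tree]  # leaf: word index -> word vector
    left, right = tree
    return compose(embed(left), embed(right))  # internal node: phrase vector


def sentiment(tree: Tree) -> float:
    """Map the root embedding to a score in (0, 1) with a logistic function."""
    logit = sum(w * v for w, v in zip(W_SENTIMENT, embed(tree)))
    return 1.0 / (1.0 + math.exp(-logit))


# "not (bad movie)": the recursion, and hence the computation, follows the tree.
phrase = (VOCAB["not"], (VOCAB["bad"], VOCAB["movie"]))
print(round(sentiment(phrase), 3))
```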
Figure: a framework system architecture.
- Front-end language binding: Python, Lua, R, C++
- Graph definition (IR), e.g., y = x * w + b
- Optimization: batching, cache, overlap
- Execution runtime: CPU, GPU, RDMA devices

Figure: TensorFlow's data-flow graph (DFG) as the intermediate representation, for a = x * y, b = a + z, c = Σ b. Gradient backpropagation is added to the same DFG as new nodes (∂a, ∂b, ∂c, ∂x, ∂y, ∂z), and the extended graph is then compiled to CPU code and GPU code.
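The idea of adding gradient backpropagation to the data-flow graph can be sketched in plain Python on the graph a = x * y, b = a + z, c = Σ b. This is a minimal illustration of reverse-mode automatic differentiation, assuming elementwise vector ops; the `Node` class and the helper names are hypothetical and are not TensorFlow's API. Each op records how to map the gradient of its output to gradients of its inputs, and `backprop` visits the graph in reverse topological order.

```python
from typing import Callable, List, Optional, Sequence

Grads = List[List[float]]


class Node:
    """A vertex of the data-flow graph: an op's output value and its gradient."""

    def __init__(self, name: str, value: List[float], inputs: Sequence["Node"] = (),
                 backward: Optional[Callable[[List[float]], Grads]] = None):
        self.name = name
        self.value = value
        self.inputs = list(inputs)
        self.backward = backward  # maps dL/d(output) to dL/d(input), one per input
        self.grad = [0.0] * len(value)


def leaf(name: str, value: List[float]) -> Node:
    return Node(name, list(value))


def mul(name: str, p: Node, q: Node) -> Node:
    out = [u * v for u, v in zip(p.value, q.value)]
    # Elementwise: d(p*q)/dp = q and d(p*q)/dq = p.
    return Node(name, out, (p, q), lambda g: [[gi * v for gi, v in zip(g, q.value)],
                                              [gi * u for gi, u in zip(g, p.value)]])


def add(name: str, p: Node, q: Node) -> Node:
    out = [u + v for u, v in zip(p.value, q.value)]
    # Addition passes the incoming gradient unchanged to both inputs.
    return Node(name, out, (p, q), lambda g: [list(g), list(g)])


def reduce_sum(name: str, p: Node) -> Node:
    # Σ sends its scalar gradient back to every input element.
    return Node(name, [sum(p.value)], (p,), lambda g: [[g[0]] * len(p.value)])


def topo_order(root: Node) -> List[Node]:
    order: List[Node] = []
    seen = set()

    def visit(n: Node) -> None:
        if id(n) in seen:
            return
        seen.add(id(n))
        for i in n.inputs:
            visit(i)
        order.append(n)

    visit(root)
    return order


def backprop(root: Node) -> None:
    """Reverse-mode AD: walk the graph backward, accumulating gradients."""
    root.grad = [1.0] * len(root.value)
    for node in reversed(topo_order(root)):
        if node.backward is None:
            continue
        for inp, g in zip(node.inputs, node.backward(node.grad)):
            inp.grad = [u + v for u, v in zip(inp.grad, g)]


# The graph from the figure: a = x * y, b = a + z, c = Σ b.
x, y, z = leaf("x", [1.0, 2.0]), leaf("y", [3.0, 4.0]), leaf("z", [5.0, 6.0])
a = mul("a", x, y)
b = add("b", a, z)
c = reduce_sum("c", b)
backprop(c)
print(c.value, x.grad, y.grad, z.grad)
```

Because gradients are accumulated rather than overwritten, a node consumed by several ops (a common case in real graphs) receives the sum of all incoming contributions.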
Framework stack compared with a compiler toolchain (Experience, Frameworks, Architecture):
- IDE: programming with VSCode, Jupyter Notebook
- Language: integrated with mainstream programming languages (PyTorch and TensorFlow inside Python)
- Compiler (intermediate representation, compilation, optimization):
  - Basic data structure: Tensor; lexical analysis: Token
  - User controlled: mini-batch
  - Basic computation: DAG; parsing: AST
  - Data parallelism and model parallelism
  - Advanced features: control flow; semantic analysis: symbolic AD
  - Loop nest analysis: pipeline parallelism, control flow
  - General IRs: MLIR
  - Code optimization; data flow analysis: CSE, arithmetic, fusion
  - Code generation; hardware-dependent optimizations: matrix computation, layout
  - Resource allocation and scheduler: memory, recomputation
- Runtimes: single node: cuDNN; multi-node: parameter servers, all-reduce; computation cluster resource management and job scheduler
- Hardware: hardware accelerators: CPU/GPU/ASIC/FPGA; network accelerators: RDMA/IB/NVLink
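A classic data-flow optimization in this compiler layer is common subexpression elimination (CSE). The sketch below runs it over a tiny DAG IR; the list-of-tuples IR and the `cse` helper are hypothetical, the pass assumes every op is pure (no side effects), and it does not exploit commutativity (so `mul(x, y)` and `mul(y, x)` stay distinct).

```python
from typing import Dict, List, Tuple

# Hypothetical IR: node i is (op, input_ids); nodes are in topological order.
Node = Tuple[str, Tuple[int, ...]]


def cse(nodes: List[Node]) -> Tuple[List[Node], Dict[int, int]]:
    """Merge nodes that apply the same op to the same (already merged) inputs.

    Returns the deduplicated node list and a map from old to new node ids.
    """
    seen: Dict[Node, int] = {}
    remap: Dict[int, int] = {}
    out: List[Node] = []
    for old_id, (op, inputs) in enumerate(nodes):
        # Rewrite inputs to their merged ids first, so duplicates cascade.
        key = (op, tuple(remap[i] for i in inputs))
        if key in seen:
            remap[old_id] = seen[key]  # reuse the earlier identical node
        else:
            seen[key] = len(out)
            remap[old_id] = len(out)
            out.append(key)
    return out, remap


# (x * y) + (x * y): the second multiplication is a common subexpression.
graph = [
    ("input:x", ()),
    ("input:y", ()),
    ("mul", (0, 1)),
    ("mul", (0, 1)),
    ("add", (2, 3)),
]
optimized, mapping = cse(graph)
print(optimized)
```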
import "tensorflow/core/framework/to";
import "tensorflow/core/framework/op_to";
import "tensorflow/core/framework/tensor_to
Memory//
Syntactically
similar
to
LLVM:func
@testFunction(%arg0:
i32){%x
=
call
@thingToCall(%arg0)
:
(i32)->
i32br
^bb1^bb1:%y
=
addi
%x,
%x:i32return
%y
:
i32}深度學(xué)習(xí)高度依賴數(shù)據(jù)規(guī)模和模型規(guī)模提高訓(xùn)練速度可以加快深度學(xué)習(xí)模型的開發(fā)速度大規(guī)模部署深度學(xué)習(xí)模型需要更快和更高效的推演速度Inference
Inference performance: serving latency.

Model and data scale:
- Image, AlexNet (2012): 8 layers, 1.4 GFLOP, 16% error
- Image, ResNet (2015): 152 layers, 22.6 GFLOP, 3.5% error
- Speech, Deep Speech 1 (2014): 80 GFLOP, 7,000 hours of data, 8% error
- Speech, Deep Speech 2 (2015): 465 GFLOP, 12,000 hours of data, 5% error
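A back-of-the-envelope calculation shows why training speed matters as models grow. Only the GFLOP figures come from the list above; everything else is an assumption: treating ResNet's 22.6 GFLOP as the forward cost per image, the rule of thumb that the backward pass costs about twice the forward pass, a hypothetical run of 1.28M images for 90 epochs, and a device whose 125 Tops peak is treated as FLOP/s at 30% utilization.

```python
# GFLOP figures from the list above; all other inputs are assumptions.
MODELS = {
    "AlexNet (2012)": 1.4,
    "ResNet (2015)": 22.6,
    "Deep Speech 1 (2014)": 80.0,
    "Deep Speech 2 (2015)": 465.0,
}


def growth(old: float, new: float) -> float:
    """How many times more compute the newer model needs."""
    return new / old


def training_days(gflop_per_sample: float, samples: float, epochs: int,
                  device_tops: float, utilization: float,
                  backward_factor: float = 2.0) -> float:
    """Rough single-device training time.

    Assumes backward costs backward_factor times the forward pass, treats the
    device's peak ops/s as FLOP/s, and scales that peak by utilization.
    """
    total_flop = gflop_per_sample * 1e9 * (1.0 + backward_factor) * samples * epochs
    seconds = total_flop / (device_tops * 1e12 * utilization)
    return seconds / 86400.0


print(growth(MODELS["AlexNet (2012)"], MODELS["ResNet (2015)"]))
print(growth(MODELS["Deep Speech 1 (2014)"], MODELS["Deep Speech 2 (2015)"]))
# Hypothetical run: 1.28M images, 90 epochs, 125-Tops device at 30% utilization.
print(training_days(MODELS["ResNet (2015)"], 1.28e6, 90, 125.0, 0.3))
```

Under these assumptions, compute grows about 16x from AlexNet to ResNet and about 5.8x from Deep Speech 1 to Deep Speech 2, and the hypothetical ResNet run takes roughly 2.4 days on one device; halving utilization doubles that time.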
- Different architectures: CNN, RNN, Transformer, …
- High computation resource requirements: model size, …
- Different goals: latency, throughput, accuracy, …
- Apply transparently over heterogeneous hardware environments: scale-out, local efficiency, memory effectiveness
- Be transparent to various user requirements

Systems, algorithms, and hardware must be designed together.
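The tension between the latency and throughput goals can be illustrated with a toy batching cost model. The cost model (a fixed per-batch overhead plus a per-sample cost) and its numbers are hypothetical, not measurements, and queueing delay is ignored. Larger batches amortize the overhead and raise throughput, but every request waits for the whole batch, so a latency target caps the batch size.

```python
from dataclasses import dataclass


@dataclass
class BatchCostModel:
    """Toy model: one batch costs a fixed overhead plus a per-sample cost (ms).

    Both default values are hypothetical, chosen only to show the trade-off.
    """
    overhead_ms: float = 5.0
    per_sample_ms: float = 1.0

    def latency_ms(self, batch_size: int) -> float:
        # Every request in the batch waits for the whole batch to finish.
        return self.overhead_ms + self.per_sample_ms * batch_size

    def throughput_per_s(self, batch_size: int) -> float:
        return batch_size / self.latency_ms(batch_size) * 1000.0


def largest_batch_within(model: BatchCostModel, slo_ms: float, max_batch: int = 1024) -> int:
    """Largest batch size whose latency still meets the latency target (SLO)."""
    best = 0
    for batch in range(1, max_batch + 1):
        if model.latency_ms(batch) <= slo_ms:
            best = batch
    return best


model = BatchCostModel()
for batch in (1, 8, 32):
    print(batch, model.latency_ms(batch), round(model.throughput_per_s(batch), 1))
print(largest_batch_within(model, slo_ms=20.0))
```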