2024 AI Technology Tutorial

Lecture outline (name: notes)

1. Course introduction: overview and system/AI basics.
2. Overview of AI systems: a system perspective on System for AI; System for AI, a historic view; fundamentals of neural networks; fundamentals of System for AI.
3. Fundamentals of DNN computation frameworks: backprop and AD, tensor, DAG, execution graph. Papers and systems: PyTorch, TensorFlow.
4. Matrix computation and computer architecture: matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU.
5. Distributed training algorithms: data parallelism, model parallelism, distributed SGD.
6. Distributed training systems: MPI, parameter servers, all-reduce, RDMA (a small ring all-reduce simulation appears right after this outline). Papers and systems: Horovod.
7. Scheduling and resource management for heterogeneous clusters: running DNN jobs on a cluster, containers, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD.
8. Deep learning inference systems: efficiency, latency, throughput, and deployment.
9. Computation graph compilation and optimization: IR, sub-graph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion.
10. Model compression and sparsification: efficiency via compression and sparsity; model compression, sparsity, pruning.
11. AutoML systems: hyper-parameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI.
12. Reinforcement learning systems: theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero.
13. Model security and privacy protection: federated learning, security, privacy. Papers and systems: DeepFake.
14. Using AI to optimize computer systems (AI for systems): AI for traditional systems problems and for system algorithms. Papers and systems: Learned Indexes, learned query paths.
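Lecture 6 and Labs 4 and 7 revolve around all-reduce. The sketch below is a small NumPy simulation of ring all-reduce, the bandwidth-friendly variant used by systems such as Horovod; it only illustrates the algorithm, real implementations run it over MPI/NCCL and RDMA.

import numpy as np

def ring_allreduce(tensors):
    # Each worker's tensor is split into n chunks; a reduce-scatter pass sums the
    # chunks around the ring, then an all-gather pass circulates the finished sums.
    n = len(tensors)
    chunks = [list(np.array_split(t.astype(float), n)) for t in tensors]

    # Reduce-scatter: after n-1 steps worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy()) for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] += data

    # All-gather: pass each finished chunk around the ring so every worker has it.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy()) for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] = data

    return [np.concatenate(c) for c in chunks]

grads = [np.ones(8) * (w + 1) for w in range(4)]   # per-worker gradients
print(ring_allreduce(grads)[0])                    # every element is 1 + 2 + 3 + 4 = 10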

Labs (name: notes)

Lab 1 (for weeks 1 and 2). Getting started with frameworks and tools: a simple end-to-end AI example, from a system perspective; understand the system from debugger info and system logs.
Lab 2 (for week 3). Customize a new tensor operator: design and implement a customized operator (both forward and backward) in Python (a minimal sketch follows this list).
Lab 3 (for week 4). CUDA implementation and optimization: add a CUDA implementation for the customized operator.
Lab 4 (for weeks 5 and 6). AllReduce implementation and optimization: improve the implementation of one of the AllReduce operators in Horovod.
Lab 5 (for weeks 7 and 8). Configure containers for cloud training or inference: configure containers for customized training and inference.
Lab 6. Learn to use a scheduling and resource management system: get familiar with OpenPAI or KubeFlow.
Lab 7. Distributed training exercise: try different kinds of all-reduce implementations.
Lab 8. AutoML system exercise: search for a new neural network (NN) structure for image/NLP tasks.
Lab 9. Reinforcement learning system exercise: configure and get familiar with one of the following RL systems: RLlib, …
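Lab 2 asks for an operator with an explicit forward and backward pass. Below is a minimal sketch in PyTorch of what such an operator can look like; the ScaledAdd op is hypothetical and only for illustration, the lab's actual operator and framework hooks may differ.

import torch

# A toy "scaled add" operator: y = a * x1 + x2, with a hand-written backward.
class ScaledAdd(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x1, x2, a):
        ctx.a = a                         # stash the scalar for the backward pass
        return a * x1 + x2

    @staticmethod
    def backward(ctx, grad_out):
        # Gradients w.r.t. x1 and x2; a is a plain Python float, so it gets None.
        return ctx.a * grad_out, grad_out, None

x1 = torch.randn(4, requires_grad=True)
x2 = torch.randn(4, requires_grad=True)
y = ScaledAdd.apply(x1, x2, 2.0)
y.sum().backward()
print(x1.grad)   # all elements are 2.0
print(x2.grad)   # all elements are 1.0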

Deep Learning

Deep learning is changing the world: self-driving, personal assistants, surveillance detection, translation, medical diagnostics, games, art.

Application areas: image recognition, speech recognition, natural language, generative models, reinforcement learning.

A classic example is classifying images as cat, dog, raccoon, or honey badger.
(Figure: a five-layer network with weights w1 through w5; the loss at the output produces error signals that propagate backward through each layer.)

What made this possible: compute power (accelerators and fast interconnects such as RDMA), massive labeled data (on the order of 14M images), progress in deep learning algorithms, languages and frameworks, and progress at the intersection of deep learning and systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems.

Dataset scale for the image classification problem, for example:
MNIST: 60K samples, 10 categories.
ImageNet: 16M samples, 1,000 categories.
Web images: billions of images, open categories.
Image classification test error rate (%) milestones:
LeNet (convolution, max-pooling, softmax), 1998.
AlexNet (ReLU, Dropout), 2012: 16.4%.
Inception (batch normalization), 2015: 6.7%.
ResNet (residual connections), 2015: 3.57%.
EfficientNet (NAS), 2019: 3.1%.

The same rapid progress shows up in image recognition, speech recognition, natural language, and reinforcement learning.
(Chart: hardware performance in op/sec, 1960 to 2019. CPUs under Moore's law improved roughly 10^8x, from ENIAC at about 5 Kops to a Xeon E5 at about 500 Gops; dedicated hardware adds roughly another 10^5x, e.g. GPUs and TPUs: TPUv1 about 90 Tops, V100 about 125 Tops, TPUv3 about 360 Tops.)
The deep learning software stack has shifted. Early custom-purpose machine learning systems (Theano, DisBelief, Caffe) sat directly on algebra and linear libraries over CPU and GPU. Today's deep learning frameworks (MXNet, TensorFlow, CNTK, PyTorch) sit between language frontends such as Swift for TensorFlow and compiler backends such as TVM and TensorFlow XLA, driving dense matmul engines on GPU, FPGA, and special AI accelerators (TPU, GraphCore, other ASICs).
In other words, where the custom-purpose systems (Theano, DisBelief, Caffe) exposed the algebra and linear libraries on CPU and GPU more or less directly, an AI framework with a dense matmul engine provides easier ways to leverage those various libraries.
A machine learning language and compiler needs a full-featured programming language for ML, expressive and flexible: control flow, recursion, sparsity. It also needs a powerful compiler infrastructure: code optimization, sparsity optimization, and hardware targeting, covering SIMD and MIMD execution, sparsity support, control flow and dynamicity, and associated memory.
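The expressiveness asked for above includes data-dependent control flow and recursion. A toy illustration in eager PyTorch, where such dynamic behavior is natural but a static graph compiler needs special support:

import torch

def collatz_steps(x: torch.Tensor) -> int:
    # The branch taken and the recursion depth depend on the runtime value of x,
    # which is exactly the kind of dynamicity an ML language has to express.
    if x.item() == 1:
        return 0
    nxt = x // 2 if x.item() % 2 == 0 else 3 * x + 1
    return 1 + collatz_steps(nxt)

print(collatz_steps(torch.tensor(27)))   # 111 steps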

The AI system stack, bottom to top (classes 3 through 8 of this course):
Architecture (single node and cloud): scalable network stack (RDMA, IB, NVLink); hardware APIs (GPU, CPU, FPGA, ASIC); resource management and scheduler.
Runtime: deep learning runtime with optimizer, planner, and executor.
Frameworks: programming interfaces; computation graph and (auto) gradient calculation; IR and compiler infrastructure.
Experience: end-to-end AI user experiences covering model, algorithm, pipeline, experiment, tool, and life-cycle management.
The broader AI system ecosystem (classes 10 through 13):
New machine learning paradigms (RL), automated machine learning (AutoML), security and privacy, and model inference, compression, and optimization.
Deep learning algorithms and frameworks: efficient new general-purpose AI algorithms for broad use; support for and evolution of multiple deep learning frameworks; deep neural network compiler architecture and optimization.
Core system software and hardware: runtime and optimization environments for deep learning jobs; general-purpose resource management and scheduling systems; new hardware and the associated high-performance networking and compute stacks.
A typical workflow is to (1) define the network structure and (2) start training. Common building blocks (a minimal sketch follows this list):
Fully connected layers: usually the last few layers of a classification model.
Convolutional neural networks: usually for data with strong locality, such as images and speech.
Recurrent neural networks: usually for sequential and structured data, such as text and knowledge graphs.
Transformer networks: usually for sequential data, such as text.
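A minimal sketch of the two steps in PyTorch, using random stand-in data instead of a real dataset:

import torch
from torch import nn

# (1) Define the network structure: a small fully connected classifier.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# (2) Start training.
for step in range(100):
    x = torch.randn(32, 784)           # a mini-batch of 32 flattened images
    y = torch.randint(0, 10, (32,))    # stand-in labels
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()                    # autodiff over the computation graph
    optimizer.step()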

# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# Word embedding model is used at the leaf node to map word
# index into high-dimensional semantic word representation.
# Map tree embedding to sentiment
# Get semantic representations for left and right children.
# A composition function is used to learn semantic
# representation for phrase at the internal node.
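Only the comments of that example survive in this copy; the code itself was lost in extraction. Below is a small Python sketch of the model the comments describe; the array shapes, names, and the composition function are assumptions for illustration, not the original code.

import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, CLASSES = 1000, 64, 5
word_emb = rng.standard_normal((VOCAB, DIM)) * 0.1   # word embedding model (used at leaf nodes)
W_comp = rng.standard_normal((2 * DIM, DIM)) * 0.1   # composition function (used at internal nodes)
W_out = rng.standard_normal((DIM, CLASSES)) * 0.1    # maps the tree embedding to sentiment

def embed(node):
    # Walk the tree, accumulating embedding vecs.
    if isinstance(node, int):              # leaf: a word index
        return word_emb[node]
    left, right = node                     # internal node: (left subtree, right subtree)
    children = np.concatenate([embed(left), embed(right)])
    return np.tanh(children @ W_comp)      # learned representation of the phrase

def sentiment(tree):
    return int(np.argmax(embed(tree) @ W_out))

# A binary parse tree over word indices, e.g. ((w1 w2) (w3 w4)).
print(sentiment(((1, 2), (3, 4))))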

Such models bring more diverse structures, more powerful modeling capability, more complex dependencies, and finer-grained computation patterns.

A framework handles them in layers: graph definition (IR), e.g. y = x * w + b; front-end language bindings (Python, Lua, R, C++); optimization (batching, caching, overlap); and an execution runtime (CPU, GPU, RDMA devices).
TensorFlow example: with inputs x, y, z, the graph computes a = x * y, b = a + z, c = Σ b. The data-flow graph (DFG) serves as the intermediate representation.

Gradient backpropagation is added to the same data-flow graph: starting from ∂c, gradient nodes ∂b, ∂a, ∂x, ∂y, ∂z are wired backward through the + and * operators.

The augmented graph is then lowered operator by operator to device code (CPU code, GPU code).
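A small sketch of the same gradients, computed here with TensorFlow 2's GradientTape rather than the explicit graph-mode DFG the slide depicts; the chain of operations matches the diagram:

import tensorflow as tf

x = tf.Variable(2.0)
y = tf.Variable(3.0)
z = tf.Variable(4.0)

with tf.GradientTape() as tape:
    a = x * y
    b = a + z
    c = tf.reduce_sum(b)

# dc/dx = y, dc/dy = x, dc/dz = 1, i.e. [3.0, 2.0, 1.0]
print(tape.gradient(c, [x, y, z]))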

Seen layer by layer (operators, experience, frameworks, architecture), a deep learning framework works much like a compiler:
IDE: programming with VSCode, Jupyter Notebook.
Language: integrated with a mainstream PL, i.e. PyTorch and TensorFlow inside Python.
Compiler: intermediate representation, compilation, optimization, with deep learning counterparts of the classic stages:
  basic data structure: tensor; lexical analysis: token;
  user controlled: mini-batch; basic computation: DAG; parsing: AST;
  data parallelism and model parallelism; advanced features: control flow; semantic analysis: symbolic AD;
  loop-nest analysis: pipeline parallelism, control flow; general IRs: MLIR;
  code optimization; data-flow analysis: CSP, arithmetic, fusion;
  code generation; hardware-dependent optimizations: matrix computation, layout;
  resource allocation and scheduling: memory, recomputation.
Runtimes: single node: cuDNN; multi-node: parameter servers, all-reduce; computation-cluster resource management and job scheduler.
Hardware: hardware accelerators (CPU/GPU/ASIC/FPGA) and network accelerators (RDMA/IB/NVLink).
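As a concrete taste of the step where a program becomes a graph IR, here is a minimal sketch using torch.fx; it is just one graph-capture mechanism among several, and the comparison above is framework-neutral.

import torch
from torch import fx, nn

class TinyModel(nn.Module):
    def forward(self, x, w, b):
        return torch.relu(x @ w + b)

# Capture the forward pass as a DAG of operators, the framework-side analogue
# of parsing source code into an AST before optimization and code generation.
graph_module = fx.symbolic_trace(TinyModel())
print(graph_module.graph)   # prints the node-by-node IR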

To recap the stack: deep learning frameworks (MXNet, TensorFlow, CNTK, PyTorch) sit between language frontends (Swift for TensorFlow) and compiler backends (TVM, TensorFlow XLA); the AI framework drives a dense matmul engine on GPU, FPGA, and special AI accelerators (TPU, GraphCore, other ASICs).
TensorFlow describes its operators and tensors with protocol buffers:
import "tensorflow/core/framework/to";
import "tensorflow/core/framework/op_to";
import "tensorflow/core/framework/tensor_to

And, as above, a full-featured programming language for ML should be expressive and flexible (control flow, recursion, sparsity) and rest on a powerful compiler infrastructure (code optimization, sparsity optimization, hardware targeting): a machine learning language and compiler with SIMD and MIMD support, sparsity support, control flow and dynamicity, and associated memory.
// Syntactically similar to LLVM:
func @testFunction(%arg0: i32) {
  %x = call @thingToCall(%arg0) : (i32) -> i32
  br ^bb1
^bb1:
  %y = addi %x, %x : i32
  return %y : i32
}
Deep learning depends heavily on data scale and model scale. Raising training speed shortens the development cycle of deep learning models, and deploying them at scale demands faster and more efficient inference, so inference performance and serving latency both matter.
Model growth in a few years:
Image:  AlexNet (2012), 8 layers, 1.4 GFLOP, 16% error  ->  ResNet (2015), 152 layers, 22.6 GFLOP, 3.5% error.
Speech: Deep Speech 1 (2014), 80 GFLOP, 7,000 hrs of data, 8% error  ->  Deep Speech 2 (2015), 465 GFLOP, 12,000 hrs of data, 5% error.
Inference systems therefore face different architectures (CNN, RNN, Transformer, …), high computation resource requirements (model size, …), and different goals (latency, throughput, accuracy, …).
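A tiny benchmark sketch of the latency/throughput tension, with a toy MLP standing in for a served model; absolute numbers depend entirely on the hardware:

import time
import torch
from torch import nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Larger batches amortize per-call overhead (higher throughput), but every request
# in the batch waits for the whole batch to finish (higher latency).
for batch in (1, 8, 64):
    x = torch.randn(batch, 512)
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(50):
            model(x)
    latency = (time.perf_counter() - start) / 50
    print(f"batch={batch:3d}  latency={latency * 1e3:6.2f} ms  throughput={batch / latency:9.0f} samples/s")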

An inference system should apply transparently over heterogeneous hardware environments, balancing scale-out, local efficiency, and memory effectiveness, and it should stay transparent to the various user requirements. Systems, algorithms, and hardware must be designed together.