MACHINE LEARNING IN PYTHON

(PART 4):

DIFFUSION MODELS IN PYTORCH

LUKE SHENEMAN

GENERATIVE ARTIFICIAL INTELLIGENCE

Text to Image (Stable Diffusion)

Text to Video

Generative AI: Learn a latent representation of the distribution of our complex training data and then sample from it

Training Data

Deep Learning

Diffusion, etc.

Transformers, etc.

DIFFUSION MODELS

CONDITIONING IMAGE GENERATION

Provide natural language text prompts to guide the reverse diffusion process

Text-to-Image Diffusion Models are both:

Image Generation Models

Language Models

Stable Diffusion Architecture

OVERVIEW

Recap from Parts 1-3

Machine Learning Basics

Neural Networks

Tensors

Convolutional Neural Networks (CNNs)

GPUs and CUDA

PyTorch

Why use PyTorch?

Implementing a Diffusion Model in Python

Train and Test our Diffusion Model

REVIEW OF BASICS

Machine learning is a data-driven method for creating models for prediction, optimization, classification, generation, and more

Python and scikit-learn

MNIST

Artificial Neural Networks (ANNs)

MNIST

NEURAL NETWORK BASICS

Weights and Biases

FULLY-CONNECTED NEURAL NETWORKS

Images are tensors!
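
A quick illustration in PyTorch (a minimal sketch; the 64x64 size just matches the images used later in the talk): an RGB image is nothing more than a 3-D tensor of channels x height x width.

import torch

img = torch.rand(3, 64, 64)   # a fake RGB image: channels x height x width
print(img.shape)              # torch.Size([3, 64, 64])
print(img[0, 0, 0].item())    # one red-channel pixel value in [0, 1)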

FEATURE HIERARCHIES

We need image filters to help us extract features

EXAMPLE: SOBEL FILTER

Sobel kernels = $G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$
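
As a sketch of how these kernels get applied, here is one way to run both Sobel kernels over a grayscale image with PyTorch's F.conv2d. The function name sobel_edges and the random test image are illustrative, not from the slides.

import torch
import torch.nn.functional as F

# The two standard Sobel kernels: Gx responds to horizontal gradients,
# Gy (its transpose) to vertical ones.
gx = torch.tensor([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]])
gy = gx.t()

def sobel_edges(gray):                        # gray: (H, W) tensor
    img = gray[None, None]                    # -> (1, 1, H, W) for conv2d
    kernels = torch.stack([gx, gy])[:, None]  # -> (2, 1, 3, 3)
    g = F.conv2d(img, kernels, padding=1)     # -> (1, 2, H, W)
    return g.pow(2).sum(dim=1).sqrt()         # gradient magnitude, (1, H, W)

edges = sobel_edges(torch.rand(64, 64))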

CONVOLUTIONAL NEURAL NETWORK

GPU vs. CPU

“Moore’s Law for CPUs is Dead”

WHY GPUS EXACTLY?

CNNs are all about matrix and vector operations (multiplication, addition)

GPUs can perform many parallel multiplication and addition steps in each clock cycle.

Frameworks make GPUs easy
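
For example, in PyTorch moving the same computation onto a GPU is a one-line change (a minimal sketch; the matrix sizes are arbitrary):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# The same matrix multiply runs on CPU or GPU; only the device changes.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b   # on a GPU, the multiply-add steps execute massively in parallel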

DIFFUSION MODELS

FORWARD DIFFUSION

Define how many timesteps will be used (common to use hundreds or more)

Establish a noise schedule which describes the rate at which Gaussian noise is added

Linear

Cosine

T=0 → T=1 → T=2 → T=3 → … → T=n (the image grows progressively noisier with each timestep)

I used 100 timesteps. Larger models like Stable Diffusion use thousands of smaller steps. I used a cosine noise schedule.
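
A minimal sketch of both ideas, assuming the cosine schedule parameterizes the cumulative signal level alpha-bar. The s offset and the closed-form noising step follow the common DDPM-style formulation; the exact schedule used in the talk is not shown on the slides.

import torch

def cosine_schedule(timesteps, s=0.008):
    # Cumulative signal level alpha_bar_t along a squared-cosine curve.
    t = torch.linspace(0, timesteps, timesteps + 1)
    f = torch.cos(((t / timesteps) + s) / (1 + s) * torch.pi / 2) ** 2
    return f / f[0]                          # shape: (timesteps + 1,)

def add_gaussian_noise(x0, alpha_bar_t):
    # Closed-form forward diffusion: x_t = sqrt(ab)*x0 + sqrt(1 - ab)*noise
    noise = torch.randn_like(x0)
    return alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * noise

alpha_bar = cosine_schedule(100)             # 100 timesteps, as in the talk
x0 = torch.rand(3, 64, 64) * 2 - 1           # an image scaled to [-1, 1]
x30 = add_gaussian_noise(x0, alpha_bar[30])  # the noisy image at t = 30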

TIMESTEP ENCODING

[Figure: RGB Image + integer timestep (e.g. 30) = 4-Channel RGB+Timestep]

I encode the timestep as another band in the image in pixel space
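
A minimal sketch of that encoding; normalizing the integer timestep to [0, 1] before writing it into the band is my choice, not necessarily what the talk did.

import torch

def encode_timestep(img, t, timesteps=100):
    # img: (3, H, W) RGB tensor; append one constant band that holds t.
    _, h, w = img.shape
    t_band = torch.full((1, h, w), t / timesteps)  # normalized timestep
    return torch.cat([img, t_band], dim=0)         # -> (4, H, W)

x4 = encode_timestep(torch.rand(3, 64, 64), t=30)
print(x4.shape)   # torch.Size([4, 64, 64])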

U-Net Architecture

[Figure: U-Net Denoiser takes the 4-channel RGB+T image and outputs a 3-channel RGB image]
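
The slide only shows the channel counts, so here is a deliberately tiny U-Net sketch with that 4-in/3-out interface: two resolution levels with skip connections. The real model in the talk is presumably deeper; this is an illustration, not its implementation.

import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    # 4-channel (RGB+T) input -> 3-channel (RGB) output, as on the slide.
    def __init__(self):
        super().__init__()
        self.down1 = block(4, 32)
        self.down2 = block(32, 64)
        self.pool  = nn.MaxPool2d(2)
        self.mid   = block(64, 128)
        self.up2   = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2  = block(128, 64)             # 128 = 64 upsampled + 64 skip
        self.up1   = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1  = block(64, 32)              # 64 = 32 upsampled + 32 skip
        self.out   = nn.Conv2d(32, 3, 1)

    def forward(self, x):                       # x: (B, 4, 64, 64)
        d1 = self.down1(x)                      # (B, 32, 64, 64)
        d2 = self.down2(self.pool(d1))          # (B, 64, 32, 32)
        m  = self.mid(self.pool(d2))            # (B, 128, 16, 16)
        u2 = self.dec2(torch.cat([self.up2(m), d2], dim=1))
        u1 = self.dec1(torch.cat([self.up1(u2), d1], dim=1))
        return self.out(u1)                     # (B, 3, 64, 64)

denoised = TinyUNet()(torch.rand(1, 4, 64, 64))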

Training our Neural Network

Possible loss functions for our U-Net:

loss1 = MSE(pred_t, original)
loss2 = MSE(pred_t, noisy_(t-1))
loss3 = pred_t − noisy_t
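
One possible PyTorch reading of the three candidates; the dummy tensors stand in for real batches, and interpreting loss3 as a mean absolute residual is my assumption.

import torch
import torch.nn.functional as F

pred_t     = torch.rand(1, 3, 64, 64)   # U-Net output at step t (dummy)
original   = torch.rand(1, 3, 64, 64)   # the clean image x_0
noisy_prev = torch.rand(1, 3, 64, 64)   # the slightly-less-noisy x_(t-1)
noisy_t    = torch.rand(1, 3, 64, 64)   # the U-Net's noisy input x_t

loss1 = F.mse_loss(pred_t, original)      # aim straight at the clean image
loss2 = F.mse_loss(pred_t, noisy_prev)    # aim one denoising step back
loss3 = (pred_t - noisy_t).abs().mean()   # residual reading of pred_t - noisy_t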

Other Hyperparameters:

Epochs = 100
Timesteps = 100
Batch Size = 1250
Optimizer = Adam
Learning Rate = 0.001

Core Training Loop

schedule = cosine_schedule(TIMESTEPS)
for each Epoch:
    for each Batch b:
        for each Timestep t:
            img = add_gaussian_noise(img, schedule(t))
            predicted = UNet(img)
            loss = loss_function(img, predicted)
            backward propagation and optimization
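
A runnable reading of that loop, reusing cosine_schedule, add_gaussian_noise, encode_timestep, and TinyUNet from the sketches above. The dataloader of clean 64x64 images and the choice of loss1 (predict the clean image) are assumptions.

import torch

TIMESTEPS, EPOCHS, LR = 100, 100, 1e-3     # hyperparameters from the slide

model = TinyUNet()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
alpha_bar = cosine_schedule(TIMESTEPS)

for epoch in range(EPOCHS):
    for batch in dataloader:               # batch: (B, 3, 64, 64) clean images
        for t in range(1, TIMESTEPS + 1):
            noisy = add_gaussian_noise(batch, alpha_bar[t])
            x_in = torch.stack(
                [encode_timestep(img, t, TIMESTEPS) for img in noisy])
            predicted = model(x_in)
            loss = torch.nn.functional.mse_loss(predicted, batch)  # loss1
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()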

CELEBFACES ATTRIBUTES (CELEBA) DATASET

202,599 face images of various celebrities

10,177 unique identities, but names of identities are not given

40 binary attribute annotations per image

5 landmark locations

Images “in the wild” or Cropped/Aligned

SOME PRELIMINARY OUTPUT

Oh no!

Use a separate AI model for upsampling

[Figure: 64×64 output upsampled to 512×512 with SRResNet (/twtygqyy/pytorch-SRResNet)]

My Model

Might not be terrific, but…

It was trained on only 5000 images for a few hours on a single RTX 4090 GPU

Stable Diffusion was trained on 600 million captioned images

It took 256 NVIDIA A100 GPUs on Amazon Web Services a total of 150,000 GPU-hours, at a cost of $600,000

Stable Diffusion

Conditioning reverse diffusion on text prompts

PRE-PROCESSING CELEBA DATASET

Read the first 5000 annotations into a PANDAS dataframe (easy!)

For each image, get the heading names for the positive attributes

Convert the heading names into a text prompt:

e.g. “Photo of person <attribute_x>, <attribute_y>, <attribute_z>, …”

e.g. “Photo of person bushy eyebrows, beard, mouth slightly open, wearing hat.”

Crop the largest square from the image, then resize to a 64x64x3 numpy array

Use the OpenAI CLIP model to find the image embeddings and text embeddings for every image/prompt pair.

Create a 5000-element Python list of 4-tuples:

(filename, 64x64xRGB image_array, image_embedding, prompt_embedding)

Pickle the list to a file we can quickly load into memory when we train our model! (A sketch of this pipeline follows the list.)
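
A hedged sketch of the whole pipeline. The CSV filename, its column layout, the image directory, the centered crop, and storing embeddings as numpy arrays are all assumptions; the CLIP calls follow github.com/openai/CLIP.

import pickle
import clip
import numpy as np
import pandas as pd
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Assumed layout: one row per image, attribute-name columns holding 1 / -1.
df = pd.read_csv("list_attr_celeba.csv", index_col=0).head(5000)

records = []
for filename, row in df.iterrows():
    # Positive attributes become the text prompt, as on the slide.
    attrs = [c.replace("_", " ").lower() for c in df.columns if row[c] == 1]
    prompt = "Photo of person " + ", ".join(attrs) + "."

    # Crop the largest (here: centered) square, then resize to 64x64.
    img = Image.open(f"img_align_celeba/{filename}")
    side = min(img.size)
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((64, 64))

    with torch.no_grad():
        img_emb = model.encode_image(preprocess(img).unsqueeze(0).to(device))
        txt_emb = model.encode_text(
            clip.tokenize([prompt], truncate=True).to(device))

    records.append((filename, np.array(img),
                    img_emb.cpu().numpy(), txt_emb.cpu().numpy()))

with open("celeba_5000.pkl", "wb") as f:   # quick to reload at training time
    pickle.dump(records, f)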

OPENAI CLIP MODEL (CONTRASTIVE LANGUAGE–IMAGE PRE-TRAINING)

Open source/weights multi-modal AI model trained on (image, caption) pairs

Shared embedding space!

Use a transformer model (GPT-2) to create token embeddings from text

Use a vision transformer (ViT) to create token embeddings from images

CLIP Examples: /research/clip

USING CLIP IS TRIVIAL

/openai/CLIP

Zero-shot classifications! (see the example after this list)
Conditioning Generative AI (DALL-E)
Generating captions for images or video
Image similarity search
Content Moderation
Object Tracking
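
A minimal zero-shot classification sketch against github.com/openai/CLIP; the image file and candidate labels are placeholders.

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a person wearing a hat",
          "a photo of a person with a beard"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # Similarity between the image and each candidate caption.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))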

CLIP uses vectors with 512 dimensions
GPT-3 (Davinci) uses 12,288 dimensions

Vector embeddings capture the deeper semantic context of a word or text chunk… or image… or anything.

The semantics of an object are defined by its multi-dimensional and multi-scale co-occurrence and relationships with other objects in the training data

Semantic vector embeddings are learned from vast amounts of data.

400,000,000 (image, text) pairs

CLIP was trained on 256 large GPUs for 2 weeks.

Size of Embedding Vector

One way to train a multi-modal embedding layer

“A cute Welsh Corgi dog.”

Learning your semantic embeddings
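
That “one way” is a CLIP-style symmetric contrastive loss; here is a minimal sketch, assuming a batch of matched (image, caption) embedding pairs and a fixed temperature. The 512-dimensional embeddings match the size quoted above.

import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (B, D), where row i of each side is a matching pair.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarities
    targets = torch.arange(len(logits))            # matches on the diagonal
    # Pull matched pairs together, push mismatches apart, in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))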
