Distributed Machine Learning with Python
Accelerating model training and serving with distributed systems
Guanhua Wang
BIRMINGHAM—MUMBAI
Distributed Machine Learning with Python
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Ali Abidi
Senior Editors: Roshan Kumar, Nathanya Diaz
Content Development Editors: Tazeen Shaikh, Shreya Moharir
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Alishon Mendonca
Marketing Coordinators: Abeer Riyaz Dawe, Shifa Ansari
First published: May 2022
Production reference: 1040422
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80181-569-7
To my parents, Ying Han and Xin Wang.
To my girlfriend, Jing Yuan.
– Guanhua Wang
Contributors
About the author
Guanhua Wang is a final-year computer science Ph.D. student in the RISELab at UC Berkeley, advised by Professor Ion Stoica. His research lies primarily in the machine learning systems area, including fast collective communication, efficient in-parallel model training, and real-time model serving. His research has attracted significant attention from both academia and industry. He has been invited to give talks at top-tier universities (MIT, Stanford, CMU, and Princeton) and big tech companies (Facebook/Meta and Microsoft). He received his master's degree from HKUST and his bachelor's degree from Southeast University in China. He has also done some cool research on wireless networks. He likes playing soccer and has run multiple half-marathons in the Bay Area of California.
About the reviewers
Jamshaid Sohail is passionate about data science, machine learning, computer vision, and natural language processing, and has more than 2 years of experience in the industry. He previously worked as a data scientist at FunnelBeam, a Silicon Valley-based start-up whose founders are from Stanford University. Currently, he is working as a data scientist at Systems Limited. He has completed over 66 online courses from different platforms. He authored the book Data Wrangling with Python 3.X for Packt Publishing and has reviewed multiple books and courses. He is also developing a comprehensive course on data science at Educative and is in the process of writing books for multiple publishers.
Hitesh Hinduja is an ardent AI enthusiast working as a senior manager in AI at Ola Electric, where he leads a team of 20+ people in the areas of ML, statistics, CV, NLP, and reinforcement learning. He has filed 14+ patents in India and the US and has numerous research publications to his name. Hitesh has been involved in research roles at India's top business schools: the Indian School of Business, Hyderabad, and the Indian Institute of Management, Ahmedabad. He is also actively involved in training and mentoring and has been invited to be a guest speaker by various corporations and associations across the globe.
Table of Contents

Preface

Section 1 – Data Parallelism

Chapter 1: Splitting Input Data
  Single-node training is too slow
    The mismatch between data loading bandwidth and model training bandwidth
    Single-node training time on popular datasets
    Accelerating the training process with data parallelism
  Data parallelism – the high-level bits
    Stochastic gradient descent
    Model synchronization
  Hyperparameter tuning
    Global batch size
    Learning rate adjustment
    Model synchronization schemes
  Summary
Chapter 2: Parameter Server and All-Reduce
  Technical requirements
  Parameter server architecture
    Communication bottleneck in the parameter server architecture
    Sharding the model among parameter servers
  Implementing the parameter server
    Defining model layers
    Defining the parameter server
    Defining the worker
    Passing data between the parameter server and worker
  Issues with the parameter server
    The parameter server architecture introduces a high coding complexity for practitioners
  All-Reduce architecture
    Reduce
    All-Reduce
    Ring All-Reduce
  Collective communication
    Broadcast
    Gather
    All-Gather
  Summary
Chapter 3: Building a Data Parallel Training and Serving Pipeline
  Technical requirements
  The data parallel training pipeline in a nutshell
    Input pre-processing
    Input data partition
    Data loading
    Training
    Model synchronization
    Model update
  Single-machine multi-GPUs and multi-machine multi-GPUs
    Single-machine multi-GPU
    Multi-machine multi-GPU
  Checkpointing and fault tolerance
    Model checkpointing
    Load model checkpoints
  Model evaluation and hyperparameter tuning
  Model serving in data parallelism
  Summary

Chapter 4: Bottlenecks and Solutions
  Communication bottlenecks in data parallel training
    Analyzing the communication workloads
    Parameter server architecture
    The All-Reduce architecture
    The inefficiency of state-of-the-art communication schemes
  Leveraging idle links and host resources
    Tree All-Reduce
    Hybrid data transfer over PCIe and NVLink
  On-device memory bottlenecks
  Recomputation and quantization
    Recomputation
    Quantization
  Summary
Section 2 – Model Parallelism

Chapter 5: Splitting the Model
  Technical requirements
  Single-node training error – out of memory
    Fine-tuning BERT on a single GPU
    Trying to pack a giant model inside one state-of-the-art GPU
  ELMo, BERT, and GPT
    Basic concepts
    RNN
    ELMo
    BERT
    GPT
  Pre-training and fine-tuning
  State-of-the-art hardware
    P100, V100, and DGX-1
    NVLink
    A100 and DGX-2
    NVSwitch
  Summary

Chapter 6: Pipeline Input and Layer Split
  Vanilla model parallelism is inefficient
    Forward propagation
    Backward propagation
    GPU idle time between forward and backward propagation
  Pipeline input
  Pros and cons of pipeline parallelism
    Advantages of pipeline parallelism
    Disadvantages of pipeline parallelism
  Layer split
  Notes on intra-layer model parallelism
  Summary
Chapter 7: Implementing Model Parallel Training and Serving Workflows
  Technical requirements
  Wrapping up the whole model parallelism pipeline
    A model parallel training overview
    Implementing a model parallel training pipeline
    Specifying communication protocol among GPUs
    Model parallel serving
  Fine-tuning transformers
  Hyperparameter tuning in model parallelism
    Balancing the workload among GPUs
    Enabling/disabling pipeline parallelism
  NLP model serving
  Summary
Chapter 8: Achieving Higher Throughput and Lower Latency
  Technical requirements
  Freezing layers
    Freezing layers during forward propagation
    Reducing computation cost during forward propagation
    Freezing layers during backward propagation
  Exploring memory and storage resources
  Understanding model decomposition and distillation
    Model decomposition
    Model distillation
  Reducing bits in hardware
  Summary
Section 3 – Advanced Parallelism Paradigms

Chapter 9: A Hybrid of Data and Model Parallelism
  Technical requirements
  Case study of Megatron-LM
    Layer split for model parallelism
    Row-wise trial-and-error approach
    Column-wise trial-and-error approach
    Cross-machine for data parallelism
  Implementation of Megatron-LM
  Case study of Mesh-TensorFlow
  Implementation of Mesh-TensorFlow
  Pros and cons of Megatron-LM and Mesh-TensorFlow
  Summary
Chapter 10: Federated Learning and Edge Devices
  Technical requirements
  Sharing knowledge without sharing data
    Recapping the traditional data parallel model training paradigm
    No input sharing among workers
    Communicating gradients for collaborative learning
  Case study: TensorFlow Federated
  Running edge devices with TinyML
    Case study: TensorFlow Lite
  Summary
Chapter 11: Elastic Model Training and Serving
  Technical requirements
  Introducing adaptive model training
    Traditional data parallel training
    Adaptive model training in data parallelism
    Adaptive model training (AllReduce-based)
    Adaptive model training (parameter server-based)
    Traditional model-parallel model training paradigm
    Adaptive model training in model parallelism
  Implementing adaptive model training in the cloud
  Elasticity in model inference
  Serverless
  Summary
Chapter 12: Advanced Techniques for Further Speed-Ups
  Technical requirements
  Debugging and performance analytics
    General concepts in the profiling results
    Communication results analysis
    Computation results analysis
  Job migration and multiplexing
    Job migration
    Job multiplexing
  Model training in a heterogeneous environment
  Summary

Index

Other Books You May Enjoy
Preface
Reducing time costs in machine learning leads to a shorter waiting time for model training and a faster model updating cycle. Distributed machine learning enables machine learning practitioners to shorten model training and inference time by orders of magnitude. With the help of this practical guide, you'll be able to put your Python development knowledge to work to get up and running with the implementation of distributed machine learning, including multi-node machine learning systems, in no time.
You'll begin by exploring how distributed systems work in the machine learning area and how distributed machine learning is applied to state-of-the-art deep learning models. As you advance, you'll see how to use distributed systems to enhance machine learning model training and serving speed. You'll also get to grips with applying data parallel and model parallel approaches before optimizing the in-parallel model training and serving pipeline in local clusters or cloud environments.
By the end of this book, you'll have gained the knowledge and skills needed to build and deploy an efficient data processing pipeline for machine learning model training and inference in a distributed manner.
Who this book is for
This book is for data scientists, machine learning engineers, and machine learning practitioners in both academia and industry. A fundamental understanding of machine learning concepts and working knowledge of Python programming are assumed. Prior experience implementing machine learning/deep learning models with TensorFlow or PyTorch will be beneficial. You'll find this book useful if you are interested in using distributed systems to boost machine learning model training and serving speed.
What this book covers
Chapter 1, Splitting Input Data, shows how to distribute the machine learning training or serving workload on the input data dimension, which is called data parallelism.
Chapter 2, Parameter Server and All-Reduce, describes two widely adopted model synchronization schemes in the data parallel training process.
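To give a first taste of what these chapters cover, here is a minimal sketch of All-Reduce-style data parallel training using PyTorch's DistributedDataParallel wrapper. This is not an example from the book; the model, batch, and hyperparameters are placeholder assumptions.

# Minimal data parallel training sketch (illustrative placeholders only).
# Launch one process per GPU, for example with torchrun, which sets
# MASTER_ADDR/MASTER_PORT and the rank environment variables.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # Each worker joins the process group; NCCL is the usual GPU backend.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = nn.Linear(1024, 10).to(rank)
    # DDP performs All-Reduce-based gradient synchronization during backward().
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    # In real training, each rank would load its own partition of the dataset.
    inputs = torch.randn(32, 1024).to(rank)
    labels = torch.randint(0, 10, (32,)).to(rank)
    loss = nn.functional.cross_entropy(ddp_model(inputs), labels)
    loss.backward()  # gradients are averaged across all workers here
    optimizer.step()
    dist.destroy_process_group()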
Chapter 3, Building a Data Parallel Training and Serving Pipeline, illustrates how to implement data parallel training and the serving workflow.
Chapter 4, Bottlenecks and Solutions, describes how to improve data parallelism performance with advanced techniques, such as more efficient communication protocols and a reduced memory footprint.
Chapter 5, Splitting the Model, introduces the vanilla model parallel approach in general.
Chapter 6, Pipeline Input and Layer Split, shows how to improve system efficiency with pipeline parallelism.
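For a concrete picture of the vanilla model parallel approach before pipelining, the following is a minimal PyTorch sketch that places the two halves of a toy network on different GPUs. The layer sizes and device IDs are illustrative assumptions rather than code from the book.

# Vanilla model parallelism sketch: two halves on two GPUs (illustrative).
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Each half of the network lives on its own device.
        self.part1 = nn.Linear(1024, 512).to("cuda:0")
        self.part2 = nn.Linear(512, 10).to("cuda:1")

    def forward(self, x):
        # The activation produced on cuda:0 is copied to cuda:1, so the
        # forward pass crosses the GPU boundary exactly once.
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
output = model(torch.randn(32, 1024))  # output tensor lives on cuda:1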
Chapter 7, Implementing Model Parallel Training and Serving Workflows, discusses how to implement model parallel training and serving in detail.
Chapter 8, Achieving Higher Throughput and Lower Latency, covers advanced schemes to reduce computation and memory consumption in model parallelism.
Chapter 9, A Hybrid of Data and Model Parallelism, combines data and model parallelism together as an advanced in-parallel model training/serving scheme.
Chapter 10, Federated Learning and Edge Devices, talks about federated learning and how edge devices are involved in this process.
Chapter 11, Elastic Model Training and Serving, describes a more efficient scheme that can change the number of accelerators used on the fly.
Chapter 12, Advanced Techniques for Further Speed-Ups, summarizes several useful tools, such as a performance debugging tool, job multiplexing, and heterogeneous model training.
To get the most out of this book
You will need to install PyTorch/TensorFlow successfully on your system. For distributed workloads, we suggest you have at least four GPUs in hand.
We assume you have Linux/Ubuntu as your operating system. We assume you use NVIDIA GPUs and have installed the proper NVIDIA driver as well. We also assume you have basic knowledge about machine learning in general and are familiar with popular deep learning models.
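A short snippet along the following lines can verify this assumed setup, checking that PyTorch sees CUDA and counting the visible GPUs; the four-GPU threshold simply mirrors the suggestion above.

# Sanity check for the assumed environment: CUDA-enabled PyTorch
# and, ideally, at least four visible NVIDIA GPUs.
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU or driver found"
num_gpus = torch.cuda.device_count()
print(f"GPUs visible: {num_gpus}")
if num_gpus < 4:
    print("Fewer than 4 GPUs; multi-GPU examples may need adjusting")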
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Distributed-Machine-Learning-with-Python. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: /downloads/9781801815697_ColorImages.pdf
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Replace YOUR_API_KEY_HERE with the subscription key of your Cognitive Services resource. Leave the quotation marks!"
A block of code is set as follows:
# Connect to the API through the subscription key and endpoint
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

subscription_key = "<your-subscription-key>"
endpoint = "https://<your-cognitive-service>.cognitiveservices.azure.com/"

# Authenticate
credential = AzureKeyCredential(subscription_key)
cog_client = TextAnalyticsClient(endpoint=endpoint, credential=credential)
Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select Review + Create."
Tips or Important Notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.