




Optimizing Spark Application Performance: Accelerating Big Data Processing with a Cost-Effective Hierarchical Solution
Yucai, Yu (yucai.yu@)
BDT/STO/SSG
April, 2016
Intel Confidential, 4/23/2016

About me/us
- Me: Spark contributor; previously worked on virtualization, storage, OS, etc.
- Intel Spark team: works on Spark upstream development and x86 optimization, including core, Spark SQL, SparkR, GraphX, machine learning, etc.
- Top 3 contribution in 2015, 3 committers.
- Two publications.

Agenda
- General software tuning
- Bring up 3x performance with NVMe SSD
- NVMe SSD overview
- Use NVMe SSD to accelerate computing
- Why SSD is important to Spark

General software tuning
- Resource allocation
- Serialization
- Partition
- IO
- MISC

Resource Allocation - CPU
- spark.executor.cores: 5 cores per executor are recommended*
  - Too few cores (such as a single core per executor) adds JVM overhead, e.g., multiple broadcast copies.
  - Too many cores per executor makes such a big resource request hard to satisfy.
  - Chosen to achieve full write throughput onto HDFS.
- Number of executors per node: cores per node / 5 * (0.9 ~ 1)
* /blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

Resource Allocation - Memory
- spark.executor.memory: memory size per executor
  - Leave 10-15% of total memory for the OS caches: dcache, page cache, etc.
  - memory per node * (85-90)% / executors per node
  - 2-5 GB per core: 2-5 GB * spark.executor.cores
- spark.yarn.executor.memoryOverhead: the off-heap memory size; increase it to keep the executor from being killed by the YARN NodeManager.
  - The default value, max(384, .07 * spark.executor.memory), is sometimes too small; Netty may use off-heap memory heavily.
  - Each executor's YARN container is spark.yarn.executor.memoryOverhead + spark.executor.memory, so yarn.nodemanager.resource.memory-mb must cover that sum for every executor on the node.
- [Diagram: execution memory and storage memory]

Software Tuning - Serialization
- spark.serializer: the class used for serializing objects
  - Use Kryo when speed is necessary; it brings a 15-20% improvement.
  - [Profile: Java serialization occupies lots of time]
- spark.kryo.referenceTracking: disable it to avoid a Java performance bug.
  - The Kryo serializer has a mechanism for reducing repeated information in records by keeping a cache of previously seen objects. That cache can be surprisingly expensive because the JDK's identityHashCode has a performance issue: /view_bug.do?bug_id=6378256
  - Other people see the same issue: /blog/profiling-hadoop-jobs-with-riemann*
* It is actually a JDK bug, not related to the compiler.

Software Tuning - Partition
- The number of tasks is decided by the RDD's partition number.
- How to choose a proper partition number?
  - With fewer partitions than available cores, the tasks won't take advantage of all CPUs.
  - Fewer partitions mean more data per partition, and therefore more memory pressure, especially in join, cogroup, *ByKey, etc.
  - If the number is too large: more tasks, more iterations, more time.
  - Too many partitions also put more pressure on disk. Shuffle read then has to fetch more small segments, which is especially bad on HDDs.
  - Start with a big number so that the application runs successfully, then decrease it gradually to reach the best performance point; pay attention to GC.
  - Sometimes changing the partition number avoids data skew; check for skew in the Web UI.

Software Tuning - IO
- Storage level
  - MEMORY_ONLY gets the best performance most of the time.
  - MEMORY_ONLY_SER reduces memory consumption by serializing objects, at the cost of CPU.
  - MEMORY_AND_DISK, DISK_ONLY: use these two options if the data is large.
- Compression
  - The compress switches (three properties, truncated in the source to "spark.press, press, press"): a trade-off between CPU and disk.
  - spark.io.compression.codec: lz4, lzf, and snappy

Software Tuning - MISC
- spark.dynamicAllocation.enabled: scales the number of executors up or down on demand.
  - spark.shuffle.service.enabled must be turned on, otherwise some shuffle data will be lost.
  - Frees or occupies executor resources in a timely manner, e.g., for spark-shell and SQL.
- GC: GC easily leads to stragglers.
  - Measuring the impact of GC: Web UI (4040/18080), stage > "GC Time"
  - Easing GC pressure:
    - Reduce the parallelism number
    - Increase the partition number
    - Lower spark.memory.storageFraction
    - Increase memory
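
The CPU and memory rules of thumb from the Resource Allocation slides can be turned into a small calculator. This is only a sketch: the function name and its parameters are hypothetical, and the way the per-executor budget is split into heap and overhead (so that heap + max(384, 0.07 * heap) fits) is an assumption rather than something the slides spell out.

```python
def size_executors(cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, core_fraction=0.9, mem_fraction=0.85):
    """Apply the slides' rules of thumb to one worker node (sizes in MB)."""
    # Executors per node: cores per node / 5 * (0.9 ~ 1), rounded down.
    executors = max(1, int(cores_per_node / cores_per_executor * core_fraction))
    # Leave 10-15% of the node's memory to the OS, share the rest evenly.
    budget_mb = int(mem_per_node_gb * 1024 * mem_fraction / executors)
    # Assumption: heap + overhead must fit in the budget, with
    # overhead = max(384, 0.07 * heap). Integer math avoids float rounding.
    heap_mb = min(budget_mb - 384, budget_mb * 100 // 107)
    if heap_mb <= 0:
        raise ValueError("node memory is too small for even one executor")
    overhead_mb = max(384, -(-7 * heap_mb // 100))  # ceil(0.07 * heap)
    return {
        "executors_per_node": executors,
        "budget_mb": budget_mb,
        "heap_mb": heap_mb,            # spark.executor.memory
        "overhead_mb": overhead_mb,    # spark.yarn.executor.memoryOverhead
        # Compare against the slides' 2-5 GB per core guideline.
        "heap_gb_per_core": heap_mb / 1024 / cores_per_executor,
        # Lower bound for yarn.nodemanager.resource.memory-mb.
        "node_container_mb": executors * (heap_mb + overhead_mb),
    }


if __name__ == "__main__":
    for key, value in size_executors(cores_per_node=40, mem_per_node_gb=256).items():
        print(key, value)
```

If heap_gb_per_core falls outside the 2-5 GB range, adjusting mem_fraction or cores_per_executor is the obvious knob to try.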
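
The serialization and dynamic allocation advice can be collected into one set of properties. The sketch below only builds and checks a dictionary and renders it as "key value" lines in the style of spark-defaults.conf; it does not start Spark. The property names are real Spark settings, while the helper names are hypothetical. Note that turning off spark.kryo.referenceTracking is only safe when the serialized object graphs contain no cycles or shared references.

```python
ALLOWED_CODECS = ("lz4", "lzf", "snappy")


def tuning_properties(dynamic_allocation=True, codec="lz4"):
    """Collect the settings recommended on the tuning slides."""
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    props = {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.referenceTracking": "false",
        "spark.io.compression.codec": codec,
    }
    if dynamic_allocation:
        props["spark.dynamicAllocation.enabled"] = "true"
        # Required so shuffle files outlive the executors that wrote them.
        props["spark.shuffle.service.enabled"] = "true"
    return props


def validate(props):
    """Reject dynamic allocation without the external shuffle service."""
    dynamic = props.get("spark.dynamicAllocation.enabled") == "true"
    service = props.get("spark.shuffle.service.enabled") == "true"
    if dynamic and not service:
        raise ValueError("dynamic allocation needs spark.shuffle.service.enabled=true")


def render(props):
    """Render the properties as whitespace-separated 'key value' lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    properties = tuning_properties()
    validate(properties)
    print(render(properties))
```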
Motivation
- Build a hierarchical external storage system from NVMe SSDs and the hard disk drives already in the server: use the SSD […] data on external storage, which mainly includes shuffle-related data and […]
- The expectation is that the SSD's high bandwidth and high IOPS bring a performance improvement to Spark applications.
- The industry has long explored hierarchical storage as well, for example Tachyon (since renamed Alluxio) […] RDD […] cache shuffle data.

Implementation
- Spark's interface to external storage is the file. That is, when an in-memory data block needs to be put […] the data block is written into that file. So we only need to modify Spark Core's file […]
- YARN dynamic allocation is also supported.

Usage
1. Set the priority and threshold in spark-default.xml.
2. Configure the "ssd" location: just put a keyword such as "ssd" in the local dir. For example, in yarn-site.xml: […]

Benchmarking Workloads (real-world Spark adoptions)

| Workload | Category | Description | Rationale | Customer |
|---|---|---|---|---|
| NWeight | Machine Learning | To compute associations between two vertices that are n-hop away | Iterative graph-parallel algorithm, implemented with Bagel, GraphX | Youku |
| SparkSQL | SQL | Real analysis queries from Baidu NuoMi | Left Outer Join, Group By, UNION, etc. | Baidu |

NWeight
- To compute associations between two vertices that are n-hop away, e.g., friend-to-friend, or similarities between videos for recommendation.
- [Figure: an initial directed graph with weighted edges and its 2-hop associations, e.g. (e, 0.30) and (d, 0.6*0.1 + 0.3*0.2 = 0.12)]

Hierarchy store performance report
- [Chart: normalized execution speed for the HDD, SATA SSD, PCIe SSD, Tachyon and hierarchy store configurations; the bar values are not recoverable from the source]
- No extra overhead: the best case is the same as pure SSD (PCIe/SATA SSD), the worst case is the same as pure HDDs.
- Compared with 11 HDDs: at least x1.82 improvement (CPU limitation).
- Compared with Tachyon: still a x1.3 performance advantage, because it caches both RDD and shuffle data and needs no inter-process communication.
- [Tables: Spark Web UI stage lists (Stage Id, Tasks, Input, Shuffle Read, Shuffle Write); the cell values are not recoverable from the source. Annotation: run 8 instances simultaneously]

SSD v.s. HDDs: x1.7 end-to-end improvement
- x1.7 end-to-end improvement (7 mins v.s. 12 mins)
- x5 shuffle improvement (1 min v.s. 5 mins)
- x6 disk BW
- x4 network BW
- Significant IO bottleneck

Deep dive into a real customer case: NWeight
x2-3 improvement!!!

| Stage Id | Description | Data sizes (Input / Output / Shuffle Read / Shuffle Write, in source order) | Duration, 11 HDDs | Duration, PCIe SSD |
|---|---|---|---|---|
| 23 | saveAsTextFile at BagelNWeight.scala:102 | 50.1 GB, 27.6 GB | 27 s | 20 s |
| 17 | foreach at Bagel.scala:256 | 732.0 GB, 490.4 GB | 23 min | 7.5 min |
| 16 | flatMap at Bagel.scala:96 | 732.0 GB, 490.4 GB | 15 min | 13 min |
| 11 | foreach at Bagel.scala:256 | 590.2 GB, 379.5 GB | 25 min | 11 min |
| 10 | flatMap at Bagel.scala:96 | 590.2 GB, 379.6 GB | 12 min | 10 min |
| 6 | foreach at Bagel.scala:256 | 56.1 GB, 19.1 GB | 4.9 min | 3.7 min |
| 5 | flatMap at Bagel.scala:96 | 56.1 GB, 19.1 GB | 1.5 min | 1.5 min |
| 2 | foreach at Bagel.scala:256 | 15.3 GB | 38 s | 39 s |
| 1 | parallelize at BagelNWeight.scala:97 | | 38 s | 38 s |
| 0 | flatMap at BagelNWeight.scala:72 | 22.6 GB, 15.3 GB | 46 s | 46 s |

5 Main IO patterns

| | Map stage | Reduce stage |
|---|---|---|
| RDD | rdd_read_in_map | rdd_read_in_reduce, rdd_write_in_reduce |
| Shuffle | shuffle_write_in_map | shuffle_read_in_reduce |

How to do IO characterization?
- We use blktrace* to monitor each IO to disk, such as:
  - Start to write 560 sectors from address 52090704
  - Finish the previous write command (52090704 + 560)
  - Start to read 256 sectors from address 13637888
  - Finish the previous read command (13637888 + 256)
- We parse this raw information and generate 4 kinds of charts: IO size histogram, latency histogram, seek distance histogram and LBA timeline, from which we can tell whether the IO is sequential or random.
* blktrace is a kernel block layer IO tracing mechanism which provides detailed information about disk request queue operations up to user space.

RDD Read in Map: sequential
- [Charts: red is read, green is write]
- Sequential data distribution
- Big IO size
- Many requests with 0 SD (seek distance)
- Low latency
- A classic hard disk has a seek time of 8-9 ms and a spindle rate of 7200 rpm, so a random access to 1 sector needs roughly 13 ms.

RDD Read in Reduce
- [Charts: red is read, green is write]
- Big IO size
- Many requests with 0 SD
- Sequential data distribution
- Low latency

[RDD Write in Reduce]
- [Charts: red is read, green is write]
- Write IO size is big, but there are many small 4K read IOs.
- Those 4K reads are probably caused by spilling in cogroup; maybe a Spark issue.
- Sequential data location

Shuffle Write in Map: sequential
- [Charts: red is read, green is write]
- Sequential data distribution
- Big IO size
- Many requests with 0 SD

[Shuffle Read in Reduce]
- [Charts: red is read, green is write]
- Small IO size
- Few requests with 0 SD
- [High] latency
- Random data distribution

Conclusion
- RDD read/write and shuffle write are sequential.
- Shuffle read is very random.

Shuffle read from HDD leads to high IO wait
- [Charts: per-disk BW when shuffle reading from 11 HDDs; BW when shuffle reading from 1 NVMe SSD]
- Only 40 MB per disk at max
- x2 improvement for "shuffle read in reduce"
- x3 improvement in real shuffle
- x2 improvement in E2E testing

NVMe SSD HS kills IO bottleneck
- SSD is much better, especially in this stage (annotation: 11 HDDs sum).

| Description | Shuffle Read / Shuffle Write | SSD-RDD + HDD-Shuffle | 1 SSD |
|---|---|---|---|
| saveAsTextFile at BagelNWeight.scala | | 20 s | 20 s |
| foreach at Bagel.scala | 490.3 GB | 14 min | 7.5 min |
| flatMap at Bagel.scala | 490.4 GB | 12 min | 13 min |
| foreach at Bagel.scala | 379.5 GB | 13 min | 11 min |
| flatMap at Bagel.scala | 379.6 GB | 10 min | 10 min |
| foreach at Bagel.scala | 19.1 GB | 3.5 min | 3.7 min |
| flatMap at Bagel.scala | 19.1 GB | 1.5 min | 1.5 min |
| foreach at Bagel.scala | 15.3 GB | 38 s | 39 s |
| parallelize at BagelNWeight.scala | | 38 s | 38 s |
| flatMap at BagelNWeight.scala | 15.3 GB | 46 s | 46 s |

NWeight: what if the CPU is not the bottleneck?
- x3-5 improvement for shuffle
- x3 improvement in E2E testing

| Stage Id | Description | Data sizes (Input / Output / Shuffle Read / Shuffle Write, in source order) | Duration, 11 HDDs | Duration, PCIe SSD | Duration, HSW |
|---|---|---|---|---|---|
| 23 | saveAsTextFile at BagelNWeight.scala:102 | 50.1 GB, 27.6 GB | 27 s | 20 s | 26 s |
| 17 | foreach at Bagel.scala:256 | 732.0 GB, 490.4 GB | 23 min | 7.5 min | 4.6 min |
| 16 | flatMap at Bagel.scala:96 | 732.0 GB, 490.4 GB | 15 min | 13 min | 6.3 min |
| 11 | foreach at Bagel.scala:256 | 590.2 GB, 379.5 GB | 25 min | 11 min | 7.1 min |
| 10 | flatMap at Bagel.scala:96 | 590.2 GB, 379.6 GB | 12 min | 10 min | 5.3 min |
| 6 | foreach at Bagel.scala:256 | 56.1 GB, 19.1 GB | 4.9 min | 3.7 min | 2.8 min |
| 5 | flatMap at Bagel.scala:96 | 56.1 GB, 19.1 GB | 1.5 min | 1.5 min | 47 s |
| 2 | foreach at Bagel.scala:256 | 15.3 GB | 38 s | 39 s | 36 s |
| 1 | parallelize at BagelNWeight.scala:97 | | 38 s | 38 s | 35 s |
| 0 | flatMap at BagelNWeight.scala:72 | 22.6 GB, 15.3 GB | 46 s | 46 s | 43 s |

Hierarchy store summary
- IO is a common bottleneck of Spark applications, and shuffle read is what causes […] hard disks often deliver half the result for twice the effort.
- Using NVMe SSD as a cache thoroughly resolves Spark's IO bottleneck, bringing […]

SSD v.s. Memory: similar performance, lower price
- Same network BW
- Similar shuffle time
- Similar end-to-end performance (less than 5% difference)
- 128 GB Mem v.s. 192 GB Mem
- During shuffle, most of the data is already in memory, with few disk accesses.
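
The NWeight figure gives a 2-hop association as the sum, over all 2-hop paths, of the product of the edge weights, e.g. 0.6*0.1 + 0.3*0.2 = 0.12. The sketch below reproduces that arithmetic on a small hypothetical graph whose edges were chosen to match the two values quoted from the figure. It is plain Python for illustration and is not the Bagel or GraphX implementation used in the benchmark.

```python
from collections import defaultdict


def two_hop_associations(edges):
    """edges maps (src, dst) to a weight; returns the 2-hop weights.

    The weight of (src, dst) is the sum over every path src -> mid -> dst
    of weight(src, mid) * weight(mid, dst).
    """
    neighbours = defaultdict(dict)
    for (src, dst), weight in edges.items():
        neighbours[src][dst] = weight

    result = defaultdict(float)
    for src, first_hop in list(neighbours.items()):
        for mid, w1 in first_hop.items():
            # .get avoids creating empty entries for vertices without out-edges
            for dst, w2 in neighbours.get(mid, {}).items():
                result[(src, dst)] += w1 * w2
    return dict(result)


if __name__ == "__main__":
    # Hypothetical edge list, not the exact graph from the slide.
    graph = {
        ("a", "b"): 0.6, ("a", "c"): 0.3,
        ("b", "d"): 0.1, ("c", "d"): 0.2,
        ("b", "e"): 0.5,
    }
    for pair, weight in sorted(two_hop_associations(graph).items()):
        print(pair, round(weight, 2))
```

Each further hop repeats the same join of the current associations with the edge list, which is why the workload is shuffle heavy.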
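
The blktrace slide derives "sequential or random" from seek distances. As a rough illustration, the sketch below takes requests as (start sector, sector count) pairs, which is a simplified input format assumed here and not the raw blktrace output, and computes the distance between the end of one request and the start of the next. A seek distance of 0 means the next request continues exactly where the previous one finished.

```python
def seek_distances(requests):
    """requests: list of (start_sector, num_sectors) in issue order."""
    distances = []
    previous_end = None
    for start, length in requests:
        if previous_end is not None:
            distances.append(abs(start - previous_end))
        previous_end = start + length
    return distances


def zero_seek_ratio(requests):
    """Fraction of requests that start exactly where the previous one ended."""
    distances = seek_distances(requests)
    if not distances:
        return 1.0
    return sum(1 for d in distances if d == 0) / len(distances)


if __name__ == "__main__":
    # Addresses borrowed from the slide; the request sequences are made up.
    sequential = [(52090704, 560), (52091264, 560), (52091824, 560)]
    scattered = [(13637888, 256), (52090704, 560), (13637888, 256)]
    print("sequential:", zero_seek_ratio(sequential))
    print("scattered:", zero_seek_ratio(scattered))
```

A ratio close to 1 corresponds to the "many requests with 0 SD" charts; a ratio close to 0 corresponds to the shuffle read pattern.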
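
The hard disk latency figure quoted on the RDD Read in Map slide follows from simple arithmetic: a random access costs one seek plus, on average, half a rotation. The sketch below works this out for the 8-9 ms seek time and 7200 rpm spindle rate given in the slides; the function names are hypothetical, and transfer time for a single sector is ignored as negligible.

```python
def random_access_ms(seek_ms, rpm):
    """Average time for one random access: seek plus half a rotation."""
    rotation_ms = 60_000 / rpm          # one full revolution in milliseconds
    return seek_ms + rotation_ms / 2    # on average the sector is half a turn away


def random_iops(seek_ms, rpm):
    """Upper bound on random operations per second for a single disk."""
    return 1000 / random_access_ms(seek_ms, rpm)


if __name__ == "__main__":
    for seek in (8, 9):
        latency = random_access_ms(seek, 7200)
        print(f"seek {seek} ms: {latency:.2f} ms per access, "
              f"{random_iops(seek, 7200):.0f} IOPS")
```

With so few random operations per second, small random shuffle reads leave most of a hard disk's sequential bandwidth unused, which is the gap the SSD closes.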