




Introduction to cluster computing resources for NCN
Xufeng Wang
Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47906

Introduction
Welcome! This presentation is designed to help people get familiar with NCN computational cluster resources. You will learn what a cluster is, what its components are, and more.

Table of contents
- Prelude: understanding cluster computing through human thinking
- Cluster component #1: cluster computing nodes
- Cluster component #2: Portable Batch System (PBS)
- Cluster component #3: front-end machines
- NCN resources overview
- References

A simple problem
Problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?"

Critical elements of thinking
1. Describe the abstract problem with a model or tool that my brain can handle, for example, a mathematical expression.
2. Write the problem on a piece of paper: "3*10 + 4*2 = ?". The problem is thus stored on the paper.
3. My eyes read the problem; "3*10 + 4*2 = ?" is stored, or buffered, in my brain, ready to be computed.
4. My brain begins to compute: 3*10 + 4*2 = 38.
5. I got the answer! The result "38" is buffered in my brain.
6. My brain signals my hand to write down the result. The result is thus stored on the paper.
7. I can forget about the buffered result "38" in my brain now, as it is written down on the paper.
[Diagram: paper, problem, mathematical expression, memory power of brain, computing power of brain]

Critical elements of a computer's thinking
[Diagram: problem, MATLAB script, file stored on the hard drive, memory power of computer, computing power of computer]

Key characteristics
- Mathematical expression / MATLAB script [computer language]: both are intermediates that translate human abstract thinking into a language that is convenient for computation and readable by others.
- Paper / file stored on the hard drive [file storage system]: both are physical items that can record information.
- Memory power of brain / computer [random access memory]: both are also physical items that can record information, but they are much faster and more precious.
- Computing power of brain / computer [CPU]: both can compute, that is, process information. However, a CPU can only process information held in certain physical memory.

Components on a modern ASUS motherboard
[Figure labels: problem (MATLAB script), hard drive connector, RAM sockets (yellow & black), mounted CPU inside, NB, SB, USB]

Need for computer clusters
Here at NCN, we need computing resources that can:
- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.
Based on our understanding of a single-core computer, how do we expand it to suit our needs? The obvious answer is: if we simply get N single-core computer systems, we can allow up to N users to solve N problems at the same time! Let's look at a scenario in which 2 users are trying to solve 3 problems simultaneously.

2 users with 3 problems
Based on our previous idea, we now have three independent and identical computers solving 3 problems from 2 users. But is it efficient?
[Diagram: Problems 1 and 2 (P_1.m, P_2.m) each run on a computer with its own hard drive for user A, CPU, and RAM; Problem 3 (P_3.m) runs on a computer with a hard drive for user B, CPU, and RAM]

Hard drive storage explained
The "hard drive" and "random access memory" (RAM) both have the capability to store information. Why do we need two memory units? What is their difference?

                    Hard drive                     RAM
Usual size          On the order of GB or TB       8 GB to 128 GB
Read/write speed    Slow                           Fast
Structure           Platter with an arm "needle"   Solid-state transistors
Volatile?           No                             Yes
Price               Low                            High

A hard drive is thus ideal for storing:
- Large amounts of data (large size, low cost)
- Data with low read/write demand (slow I/O rate)
- Long-term data (non-volatile)

RAM storage explained
However, when doing intensive computation:
- Communication between memory and the CPU must be rapid; very fast I/O is needed.
- Only the variables in use are stored in memory, so the memory does not have to be large.
- Memory is temporary, so volatile memory is acceptable.
RAM is thus ideal for such situations, and that is why a computer has two forms of memory storage.

E Pluribus Unum
Memory storage can be shared among users, as long as the information is well managed so that users' files do not get mixed up.
[Diagram: Problems 1 to 3 on three CPUs sharing one hard drive (1 MB of 500 GB used) and one RAM (4 GB of 8 GB used)]

Additional problems without increasing the cost?
[Diagram: Problems 1 to 4 on three CPUs sharing the hard drive (1.5 MB of 500 GB used) and RAM (6 GB of 8 GB used)]
4 problems cannot be efficiently solved on 3 CPUs simultaneously. We can, however, solve 3 problems first and then the remaining one whenever a CPU becomes free. It is like dining at a busy restaurant: you need to place your order and wait to be seated.

When a single CPU takes multiple jobs
If a single CPU has multiple tasks at the same time (a common scenario on desktop computers), it simply processes one task for a very short moment, stops, processes the next task for a very short moment, and so on. This rapid processing of all tasks in succession gives the user the illusion that all tasks are being processed at the same time. As the number of jobs increases, more time is spent on CPU I/O communication, and jobs become slower because of longer waits to be served by the CPU and more I/O requests.
[Diagram: one CPU cycling through processes #1 to #5]

Solving 4 problems with 3 CPUs
[Diagram: Problems 1 to 4, three CPUs, shared hard drive (1.5 MB of 500 GB used) and RAM (6 GB of 8 GB used); PBS manages which job is submitted to the CPUs]
Scientific computation requires dedicating CPU(s) to one process. Thus, a management system is needed to ensure proper assignment of CPUs to each task. This is the concept of the Portable Batch System (PBS).

Cluster components
[Diagram: Problems 1 to 4 pass from the front-end machine through PBS to the CPUs of the clusters]
- Front-end machine: users write, edit, and manage files; it stores large amounts of files; scripts are prepared here for running.
- PBS: manages users' requests (number of CPUs, RAM size, etc.) and coordinates tasks with computational resources.
- Clusters: provide raw computational power.

Clusters explained
"Compute! Compute! Compute!"
In our definition, "clusters" are groups of RAM and CPUs, with their supporting components, that provide raw computational power.
[Diagram: three CPUs connected to PBS]
Our simple example here, 3 CPUs sharing 1 RAM, is far from enough to be a computational workhorse. How do we expand it into a huge cluster that can accommodate a large number of computational jobs?

A cluster node
RAM is capped at 8 GB max for our CPUs. The more CPUs attached to a RAM, the smaller the share of memory each CPU has on average. In addition, CPU manufacturers usually pack 2 (dual-core) or 4 (quad-core) CPUs per socket, with 1 to 2 sockets sharing 1 RAM.
[Diagram: a (Steele) cluster node: Quad Core #1 and Quad Core #2, 8 CPUs in total, sharing 16 GB of RAM; connected to PBS]

Forming a simple cluster with nodes
Our original goal:
- Solve a large number of problems at the same time.
- Serve a large number of users at the same time.
We achieved the goal by coupling CPUs with RAM to form nodes and expanding the number of nodes in service. In this small model cluster, we have 6 nodes with 8 CPUs per node = 48 CPUs in total, averaging 16 GB / 8 = 2 GB of RAM per CPU on each node. Roughly, 48 problems can be solved at the same time.
[Diagram: six nodes connected to PBS]

Exploiting the computational resources, in a good way
"OK, clusters seem to me to be just a bunch of computers sitting together. How can that give them a computational advantage over single-core computers?"
Answer: the real power of clusters comes from the coupling of CPUs within a node and among the nodes themselves.
Our original problem: "I have 3 red boxes with 10 pens in each of them and 4 black boxes with 2 pens in each of them. How many pens do I have in total?" Solve: 3*10 + 4*2 = ?

Solve 3*10 + 4*2 = ?
[Diagram: on a (Steele) cluster node with 16 GB of shared RAM, CPU #1 receives "3*10 + 4*2 = ?" and works through 3*10 = 30, 4*2 = 8, 30 + 8 = 38 alone, communicating with RAM between steps]

Solve 3*10 + 4*2 = ?
Uncoupled calculations can be done simultaneously to save time. This exploits parallelism, but not down to the machine level; that is, human post-processing is needed. This is called an "embarrassingly parallel" scheme.
[Diagram: Task #1 (3*10 = ?) and Task #2 (4*2 = ?) run on separate CPUs; Task #3 (30 + 8 = ?) is processed manually as a post-processing step once both results are available]

Solve 3*10 + 4*2 = ?
Parallel programming: master and slave configuration
[Diagram: master CPU #0 sends "3*10 = ?" to slave CPU #1 and "4*2 = ?" to slave CPU #2; the slaves send back 30 and 8; the master then has CPU #1 compute 30 + 8 = 38 and waits to receive the result]
Those "actions of collaboration" between CPUs cannot be expressed with the core features of traditional programming languages such as C, C++, and MATLAB alone.

Message Passing Interface (MPI)
The Message Passing Interface, commonly known as MPI, is introduced as additional libraries for several popular existing computer languages (C, C++, FORTRAN) to achieve script-level parallel programming. MPI allows the code writer to control the communication between CPUs. The "actions" mentioned previously can be achieved by writing specific MPI statements within the program. Examples:
- "send this variable from CPU #0 to CPU #1": MPI_Send
- "add the results from CPU #1 and CPU #2": MPI_Reduce with the MPI_SUM operation
Modern scientific codes with MPI can consume large numbers of CPUs for hours to solve complicated problems (OMEN, for example).

How can 10,000 CPUs work for 1 program?
Nodes need to communicate with each other so that CPUs on different nodes can talk via MPI, and this requires physical connections. Not every node needs to communicate with all the others, so a certain network configuration is needed. Interconnects are achieved through cables, and different types of cable networks yield different performance.
[Diagram: six nodes joined by a node interconnect network (Gigabit Ethernet, InfiniBand, etc.) and connected to PBS]

Interconnect network performance
Major factors in evaluating the performance of interconnect cables:
- Transfer rate: how much data can the cable transfer per second?
- Latency: how much delay does each transfer over the cable have?
Three kinds of cables are deployed on Purdue clusters:
- Gigabit Ethernet: 1 Gb/s with low latency (Steele, Pete, etc.)
- InfiniBand: 10 Gb/s with ultra-low latency (Steele, non-NCN nodes)
- 10 Gigabit Ethernet: 10 Gb/s with ultra-low latency (Coates)
Things worth mentioning:
- Serial programs do not benefit from these interconnect cables; MPI programs that need lots of I/O between CPUs will.
- Utilizing InfiniBand may require an extra compiling library.

Clusters summary
Casual users (short serial programs)
  Via office desktop/laptop: slows down your computer; unreliable.
  Via clusters: fast processors and large memory; does not slow down your computer.
Intermediate users (multiple, long-running serial programs)
  Via office desktop/laptop: run programs one by one; significantly slows down your computer.
  Via clusters: run your jobs in an embarrassingly parallel way; fast, and does not slow your PC down.
Advanced users (multiple, long-running, MPI-based parallel programs)
  Via office desktop/laptop: cannot do parallel runs on single-core computers.
  Via clusters: the program is designed to run on clusters with many CPUs.

The Steele cluster
Clusters have to meet the needs of various users, so they can be built with different kinds of nodes. NCN-owned nodes are all located in the sub-cluster "Steele-A". NCN also owns nodes on other clusters such as "Pete" and "Coates". Details will be discussed later.

Interlude
A more complete picture of the entire system

Front-end machine explained
The front-end machine is the gateway for all users. It provides storage and allows users to compose, compile, and manage their files. It is a rather complete computer itself, with its own CPUs and RAM. It is designed to serve a great number of users and store an extremely high volume of files.
[Diagram: Problems 1 to 3 stored on the front-end machine, which has its own front-end CPU and RAM]

[Figure: Steele's front-end machine]

Comparing the front-end machine to clusters
Columns: front-end machine | clusters
- CPU and RAM character: same as clusters | same as front-end machine
- Number: few | abundant
- User control: no control over CPU assignment or RAM size | total control over CPU assignment and RAM size via PBS
- Parallel computing: single-core programs only; can compile but should not run MPI programs | MPI programs can be compiled and run here
- Purpose: light-duty file editing, management, and compiling | heavy-duty computation
Thus, do NOT run computational programs (MATLAB, for example) on the front-end machine for heavy calculations. This even includes data post-processing. For serial jobs, allocate a single CPU from the clusters via PBS.

File storage solutions
Our model "shared hard drive" is in reality a "shared network storage" offered via the BlueArc system, with two tiers of storage offering 320 TB of space.
[Diagram: shared network storage; new and recent files live on Fibre Channel disk (fast & expensive), files that get old and unused move to SATA disk (slow & cheap), and they move back if called to be used]

Fortress DXUL system
The Fortress DXUL system provides a solution for long-term storage of large files.
- No active files shall be stored here.
- No large collections of small files shall be stored here. Compress them (via tarball or zip) first, and then store them.
[Diagram: shared network storage feeding the Fortress DXUL system: low-cost disks and tape/optical disks, with a primary and a secondary copy on tape cartridges; separate paths for files smaller than 0.5 MB and files larger than 0.5 MB]

Front-end machines summary
Columns: regular office workstation | front-end machine with BlueArc storage | Fortress DXUL system
- Primary storage size: depends (usually 100 GB to 500 GB) | large in total, but can be limited per person (1 GB to 10 GB) | huge, up to 5 TB per person
- Primary backup?: usually no | yes | yes
- Secondary storage size: depends (usually no second hard drive) | scratch drives (250 GB) | large
- Secondary backup?: usually no | yes
- Access speed: slow (SATA drive) | fast (Fibre disk) | very slow
- Software availability: limited | abundant | very few
- Purpose: daily usage | gateway to clusters | long-term storage
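The Fortress rule about small files (compress them into a tarball or zip before storing) can be sketched with Python's standard tarfile module. This is a minimal sketch under stated assumptions, not an NCN or Purdue tool: the helper name bundle_small_files and the example paths are hypothetical, and it assumes Python 3 is available on the front-end machine. The transfer to Fortress itself is not shown.

```python
import tarfile
from pathlib import Path


def bundle_small_files(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one gzip-compressed tarball.

    Hypothetical helper: turns a large collection of small files into a
    single archive before it is copied to long-term storage such as Fortress.
    archive_path should lie outside src_dir so the archive does not try to
    include itself. Returns the number of files added.
    """
    src = Path(src_dir).resolve()
    count = 0
    # Mode "w:gz" writes a new tar archive compressed with gzip.
    with tarfile.open(archive_path, "w:gz") as tar:
        # Walk the directory tree in sorted order so the archive is deterministic.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store names relative to src's parent so the archive unpacks
                # into one top-level directory named after src_dir.
                arcname = path.relative_to(src.parent).as_posix()
                tar.add(str(path), arcname=arcname)
                count += 1
    return count
```

For example, bundle_small_files("results", "/tmp/results.tar.gz") would pack everything under results/ into one file, which then becomes the single object to store on Fortress. A single archive works better than thousands of tiny files because the storage system moves it between disk and tape as one unit.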