




Serengeti - Virtualize Your Big Data Applications
© 2009 VMware Inc. All rights reserved.

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
[Diagram: ETL; unstructured data (HDFS); structured database (Big SQL); data-parallel batch processing; real-time streams with real-time processing (S4, Storm); analytics.]

Agenda: Why virtualize Hadoop?

Challenges to using Hadoop on physical infrastructure
- Deployment: difficult to deploy, taking several people several days or even months; cluster performance is difficult to tune.
- Low efficiency: Hadoop clusters are typically not 100% utilized across all hardware resources, and it is difficult to share resources safely between different workloads.
- Single point of failure: the NameNode and JobTracker are single points of failure, and there is no HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
- 1/1000 of the human effort, with the least Hadoop operations knowledge required.
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch.
[Diagram: server preparation, OS installation, network configuration, and Hadoop installation and configuration take days as a manual process; Serengeti automates them on vSphere with best practices.]

Why Virtualize Hadoop? - Consolidate sprawling clusters
- Clusters share servers with strong isolation.
- Single hardware infrastructure.
- Unified operations.
- Optimize: shared resources mean higher utilization; elastic resources mean faster on-demand access.
[Diagram: single-purpose clusters for various business applications (Hadoop Dev, Hadoop Prod, HBase) lead to cluster sprawl; consolidating them (Finance, Hadoop Dev, Hadoop Prod, HBase, Hadoop portals) onto one Hadoop virtualization platform simplifies operations and cuts CAPEX by 30%.]

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem
- 50%+ of resources sit idle while a high-priority job is burning up its own cluster.
- Utilize all resources from the pool on demand, with dynamic elastic scaling on a shared resource pool.
- 3X faster to get analytic results.

vSphere High Availability (HA): protection against unplanned downtime
Overview
- Protection against host and VM failures.
- Automatic failure detection (host, guest OS).
- Automatic virtual machine restart in minutes, on any available host in the cluster.
- OS- and application-independent; does not require complex configuration changes.

High Availability for the Hadoop Stack
[Diagram: HA covers the Hadoop stack: HDFS (Hadoop Distributed File System) with the NameNode, MapReduce (job scheduling/execution system) with the JobTracker, HBase (key-value store), Pig (data flow), Hive (SQL) with its metastore DB, HCatalog with its MDB server, ZooKeeper (coordination), and the management server, alongside BI reporting, ETL tools, and RDBMS.]
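To make the "automatic virtual machine restart on any available host" bullet concrete, here is a toy Python model: when one host fails, each of its VMs is restarted on the first surviving host with enough free memory. The host and VM names, memory sizes, and the first-fit rule are hypothetical simplifications; vSphere HA's actual admission control and restart placement are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity_gb: int
    vms: dict = field(default_factory=dict)  # VM name -> memory in GB
    alive: bool = True

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def restart_failed_vms(hosts: list, failed_name: str) -> dict:
    """Mark one host as failed and restart its VMs on surviving hosts.

    Hypothetical first-fit placement: each VM goes to the first live host
    with enough free memory. Returns VM name -> new host name, or None if
    no surviving host has room for it.
    """
    failed = next((h for h in hosts if h.name == failed_name), None)
    if failed is None:
        raise ValueError(f"unknown host: {failed_name}")
    failed.alive = False
    orphans, failed.vms = failed.vms, {}  # the failed host no longer runs anything
    placement = {}
    for vm, mem in sorted(orphans.items()):
        target = next((h for h in hosts if h.alive and h.free_gb() >= mem), None)
        if target is not None:
            target.vms[vm] = mem
        placement[vm] = target.name if target else None
    return placement


if __name__ == "__main__":
    cluster = [
        Host("esx1", 64, {"namenode": 16, "jobtracker": 16}),
        Host("esx2", 64, {"worker1": 24}),
        Host("esx3", 64, {"worker2": 24}),
    ]
    print(restart_failed_vms(cluster, "esx1"))
    # prints {'jobtracker': 'esx2', 'namenode': 'esx2'}
```

The point of the sketch is only that restart is independent of what runs inside the VM: the model moves whole VMs and never looks at the Hadoop process in them.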
vSphere Fault Tolerance (FT) provides continuous protection
Overview
- A single identical VM runs in lockstep on a separate host.
- Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failure.
- Integrated with VMware HA/DRS.
- No complex clustering or specialized hardware required.
- A single common mechanism for all applications and operating systems.
- Zero downtime for the NameNode, JobTracker, and other components in Hadoop clusters.

Agenda: Serengeti introduction

Serengeti: easy and rapid deployment and management
- Open source project launched in June 2012; 0.8 was released in April, and 0.9 will be released in June.
- A toolkit that leverages virtualization to simplify Hadoop deployment and operations.
- Deploy a cluster in 10 minutes, fully automated.
- Customize Hadoop and HBase clusters.
- Automated cluster operation.
- Comes with ecosystem components.
- Supports all popular Hadoop distributions.

Demo: 10 minutes to a Hadoop cluster with Serengeti

Agenda: Common questions about virtualization

Common questions about virtualization
- Local disk: can local disks be used in a virtualized environment?
- Flexibility and scalability: how can resources be scheduled flexibly between clusters and different applications, as mentioned above?
- Data stability: in a virtual environment, how is data distributed across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
- Performance: how does Hadoop perform in a virtual environment?

Agenda: Serengeti solution

Can I use local disk easily?
Serengeti extends the virtual storage architecture to include local disk.
- Shared storage (SAN or NAS): easy to provision; automated cluster rebalancing.
- Hybrid storage: SAN for boot images and other workloads; local disk for Hadoop and HDFS.
[Diagram: Hadoop VMs and other VMs sharing the same hosts.]

How to scale in/scale out flexibly? How to schedule resources flexibly between clusters and different applications? - Compute
Evolution of Hadoop on VMs: data/compute separation
- Current Hadoop: combined storage/compute on physical hosts.
- Hadoop in VM: the VM lifecycle is determined by the DataNode; limited elasticity.
- Separate storage: separates compute from data; removes the elasticity constraint imposed by the DataNode; elastic compute; raises utilization.
- Separate compute clusters: separate virtual compute; one compute cluster per tenant; stronger VM-grade security and resource isolation.

Serengeti node scale out / scale in
[Diagram: NameNode and JobTracker hosts; each worker host runs one data node (D) and several compute nodes (C), and the compute nodes scale out or in.]

Serengeti ballooning enhancement for Java applications
[Diagram: JVMs running in guest OSes on a host.]

How to keep data stable? How to access data locally if the data node and compute node are located in different VMs?
Distributed and data/compute-associated VM placement
[Diagram: a combined cluster, with a DataNode and TaskTracker on each worker host, compared with a data/compute-separated cluster, in which an HDFS cluster (NameNode, data nodes) and compute-only clusters 1 and 2 (JobTracker, TaskTrackers) share hosts spread across Rack1 and Rack2.]

Hadoop topology changes for virtualization
[Diagram: Hadoop topology awareness uses a tree of data centers (D1, D2), racks (R1-R4), and hosts (H1-H12); Serengeti HVE adds a node-group level (N1-N8), representing the physical host shared by a group of VMs, between racks and hosts.]
Hadoop Virtualization Extensions (HVE) for topology, a Hadoop network topology extension (umbrella JIRA HADOOP-8468; related JIRAs HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310), spanning Hadoop Common, HDFS, and MapReduce:
- Task scheduling policy extension
- Balancer policy extension
- Replica choosing policy extension
- Replica placement policy extension
- Replica removal policy extension

Is there significant performance degradation in a virtualized environment? Is there any performance data?
[Chart: Virtualized Hadoop performance, native versus virtual platforms, 32 hosts, 16 disks/host. Source: http:/]

Agenda: Deep insight into Serengeti

Serengeti architecture diagram
[Diagram components: UI clients (Flex UI; CLI client built on Spring Shell); the Serengeti web service with a REST API; a Spring Batch workflow with update-MetaDB, VM placement calculation, VM provisioning, software management, and VHM steps; Hibernate/DAO over vPostgres; a VC adapter to vCenter; an Ironfan service exposed as a Thrift service, with progress reporting, and a Chef server (REST API, cookbooks); RabbitMQ and a VM runtime manager; Hadoop nodes running a Chef client and HA kit, with a package repository; all on the hosts of the virtualization platform.]

Customizing your Hadoop/HBase cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    {
      "distro": "apache",
      "groups": [
        {
          "name": "master",
          "roles": ["hadoop_namenode", "hadoop_jobtracker"],
          "storage": {"type": "SHARED", "sizeGB": 20},
          "instance_type": "MEDIUM",
          "instance_num": 1,
          "ha": true
        },
        {
          "name": "worker",
          "roles": ["hadoop_datanode", "hadoop_tasktracker"],
          "instance_type": "SMALL",
          "instance_num": 5,
          "ha": false
        }
      ]
    }

One command to scale out your cluster with Serengeti

    cluster resize --name <name> --nodeGroup worker --instanceNum <instanceNum>

Configure/reconfigure Hadoop with ease with Serengeti
Modify the Hadoop cluster configuration from Serengeti:
- Use the "configuration" section of the JSON spec file.
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, …perties.
- Apply the new Hadoop configuration using the edited spec file.

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at http:/hadoop./common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": 300
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
        }
      }
    }

    cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json

Freedom of choice and open source
- Community projects
- Distributions: flexibility to choose from major distributions, e.g. cluster create --name myHadoop --distro apache
- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with NameNode Federation and HA
Deploy a CDH4 Hadoop cluster:
- NameNode federation
- NameNode HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
CDH4 configurations: scale out, elasticity, JobTracker HA/FT
[Diagram: NameNode groups, each with an active NameNode and a standby NameNode coordinated through a ZooKeeper group (ZK, ZK, ZK).]
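The HVE topology slide adds a node-group layer (N, the VMs sharing one physical host) between the rack (R) and host (H) levels of Hadoop's network locations. A minimal sketch, assuming locations are written as slash-separated paths such as /D1/R1/N1/H1, computes the tree distance the way Hadoop counts it (edges from each node up to their closest common ancestor) and checks the HVE rule that no two replicas of a block land in the same node group. The path format and the helper names are illustrative only; they are not Hadoop's NetworkTopology API.

```python
def parts(location: str) -> list:
    """Split a network location such as '/D1/R1/N1/H1' into its levels."""
    return [p for p in location.split("/") if p]


def distance(a: str, b: str) -> int:
    """Edges from each node up to their closest common ancestor."""
    pa, pb = parts(a), parts(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def node_group(location: str) -> str:
    """Parent of the leaf, e.g. '/D1/R1/N1' for '/D1/R1/N1/H1'.

    Assumes four-level HVE paths (data center / rack / node group / host).
    """
    return "/" + "/".join(parts(location)[:-1])


def replicas_respect_node_groups(locations: list) -> bool:
    """HVE-style rule (sketch): no two replicas may share a node group."""
    groups = [node_group(loc) for loc in locations]
    return len(groups) == len(set(groups))


if __name__ == "__main__":
    h1 = "/D1/R1/N1/H1"
    print(distance(h1, "/D1/R1/N1/H2"))  # same node group (same physical host): 2
    print(distance(h1, "/D1/R1/N2/H3"))  # same rack, different node group: 4
    print(distance(h1, "/D1/R2/N3/H5"))  # different rack: 6
    print(replicas_respect_node_groups([h1, "/D1/R1/N1/H2", "/D1/R2/N3/H5"]))  # False
```

Counting edges to the closest common ancestor is what makes the extra layer matter: two VMs on the same physical host come out closer (2) than two hosts on the same rack (4), so locality-aware scheduling and replica placement can tell them apart.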
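The cluster spec on the "Customizing your Hadoop/HBase cluster" slide is plain JSON, so it can be generated and sanity-checked before it is handed to Serengeti. The sketch below rebuilds that spec as a Python dict using exactly the keys shown on the slide and writes it out with the standard json module. The check_spec and write_spec helpers and their rules (every group needs a unique name, at least one role, and a positive instance_num) are hypothetical illustrations, not Serengeti's own validation.

```python
import json
import tempfile
from pathlib import Path

# Node groups as shown on the "Customizing your Hadoop/HBase cluster" slide.
SPEC = {
    "distro": "apache",
    "groups": [
        {
            "name": "master",
            "roles": ["hadoop_namenode", "hadoop_jobtracker"],
            "storage": {"type": "SHARED", "sizeGB": 20},
            "instance_type": "MEDIUM",
            "instance_num": 1,
            "ha": True,
        },
        {
            "name": "worker",
            "roles": ["hadoop_datanode", "hadoop_tasktracker"],
            "instance_type": "SMALL",
            "instance_num": 5,
            "ha": False,
        },
    ],
}


def check_spec(spec: dict) -> list:
    """Return the problems found, using hypothetical sanity rules."""
    problems = []
    names = set()
    for group in spec.get("groups", []):
        name = group.get("name")
        if not name:
            problems.append("group without a name")
        elif name in names:
            problems.append(f"duplicate group name: {name}")
        names.add(name)
        if not group.get("roles"):
            problems.append(f"{name}: no roles")
        if group.get("instance_num", 0) < 1:
            problems.append(f"{name}: instance_num must be >= 1")
    return problems


def write_spec(spec: dict, path: Path) -> None:
    """Refuse to write a spec that fails the checks above."""
    problems = check_spec(spec)
    if problems:
        raise ValueError("; ".join(problems))
    path.write_text(json.dumps(spec, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "myHadoop.json"
        write_spec(SPEC, out)
        print(out.read_text())
```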
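The reconfiguration slide edits the "configuration" section of the spec file and re-applies it with cluster config. A minimal sketch of the editing half, assuming the spec is strict JSON on disk (Python's json module rejects the // comment lines shown on the slide, so they must be stripped first), merges attributes such as io.sort.mb into configuration.hadoop["mapred-site.xml"] and returns the command from the slide for an operator to run. The helper name set_hadoop_config and its parameters are hypothetical; it does not contact Serengeti.

```python
import json
import shlex
import tempfile
from pathlib import Path


def set_hadoop_config(spec_path: Path, file_name: str, settings: dict,
                      cluster_name: str) -> str:
    """Merge settings into configuration.hadoop[file_name] of a spec file.

    Hypothetical helper: it only rewrites the JSON file on disk and returns
    the `cluster config` command shown on the slide.
    """
    spec = json.loads(spec_path.read_text())
    hadoop = spec.setdefault("configuration", {}).setdefault("hadoop", {})
    # update() keeps any attributes already present for this file.
    hadoop.setdefault(file_name, {}).update(settings)
    spec_path.write_text(json.dumps(spec, indent=2))
    return (f"cluster config --name {shlex.quote(cluster_name)} "
            f"--specFile {shlex.quote(str(spec_path))}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "myHadoop.json"
        path.write_text(json.dumps({"distro": "apache", "groups": []}))
        print(set_hadoop_config(path, "mapred-site.xml", {"io.sort.mb": 300}, "myHadoop"))
        print(path.read_text())
```

Merging rather than overwriting means repeated reconfigurations only touch the keys passed in; settings already in the section survive.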