© 2009 VMware Inc. All rights reserved

Serengeti - Virtualize Your Big Data Applications
藺永華 (Lin Yonghua), VMware, Inc.

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System
[Diagram: building blocks of a big data system]
• ETL
• Unstructured data (HDFS)
• Real-time structured database
• Big SQL
• Data-parallel batch processing
• Real-time streams
• Real-time processing (S4, Storm)
• Analytics

Challenges of Running Hadoop on Physical Infrastructure

Deployment
• Difficult to deploy: it takes several people several days, or even months
• Difficult to tune cluster performance

Low Efficiency
• Hadoop clusters typically do not fully utilize all of their hardware resources
• Difficult to share resources safely between different workloads

Single Point of Failure
• The NameNode and JobTracker are single points of failure
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get Your Hadoop Cluster in Minutes
• 1/1000 of the human effort, with minimal Hadoop operations knowledge required
• Manual process (takes days): server preparation, OS installation, network configuration, Hadoop installation and configuration
• Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch, automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? - Consolidate Sprawling Clusters
Cluster sprawl
• Single-purpose clusters for various business applications (Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop) lead to cluster sprawl
Cluster consolidation
• Clusters share servers on one virtualization platform, with strong isolation
Simplify
• Single hardware infrastructure
• Unified operations
Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
• 30% CAPEX down

Why Virtualize Hadoop? - Utilize All Your Resources to Solve the Priority Problem
• 50%+ of resources sit idle while a high-priority job is burning up its own cluster
• Utilize all resources from the pool on demand
• Dynamic elastic scaling on a shared resource pool
• 3X faster to get analytic results

vSphere High Availability (HA) - Protection Against Unplanned Downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram: Hadoop stack components running as App/OS virtual machines on VMware ESX hosts, protected by HA]
• ZooKeeper (coordination)
• Management server
• HBase (key-value store)
• HDFS (Hadoop Distributed File System)
• MapReduce (job scheduling/execution system)
• Pig (data flow)
• Hive (SQL)
• BI reporting, ETL tools, RDBMS
• JobTracker, NameNode
• Hive MetaDB, HCatalog, HCatalog MDB, Server

vSphere Fault Tolerance (FT) Provides Continuous Protection
Overview
• Single identical VMs running in lockstep on separate hosts
• Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
Zero downtime for the NameNode, JobTracker, and other components in Hadoop clusters

Serengeti: Easy and Rapid Deployment and Management
• Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June
• A toolkit that leverages virtualization to simplify Hadoop deployment and operations
• Deploy a cluster in 10 minutes, fully automated
• Customize Hadoop and HBase clusters
• Automated cluster operation
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Demo: 10 Minutes to a Hadoop Cluster with Serengeti

Common Questions About Virtualization

Local disk
• Can local disks be used in a virtualized environment?

Flexibility and scalability
• How can resources be scheduled flexibly between clusters and different applications, as mentioned above?

Data stability
• In a virtual environment, how can data be distributed across hosts and racks?

Data locality
• Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?

Performance
• How does Hadoop perform in a virtual environment?

Can I Use Local Disk Easily?
Serengeti extends the virtual storage architecture to include local disk.
[Diagram: hosts running Hadoop VMs alongside other VMs]
Shared storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
Hybrid storage
• SAN for boot images and other workloads
• Local disk for Hadoop and HDFS

How Can Clusters Scale In and Scale Out Flexibly?
How can resources be scheduled flexibly between clusters and different applications?

Evolution of Hadoop on VMs
[Diagram: compute (T1, T2) and storage VMs at each stage]
Current Hadoop: combined storage/compute
Hadoop in VM
• VM lifecycle determined by the DataNode
• Limited elasticity
Separate storage
• Separate compute from data
• Remove the elasticity constraint imposed by the DataNode
• Elastic compute
• Raise utilization
Separate compute clusters
• Separate virtual compute
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In
Data/compute separation
[Diagram: a NameNode and a JobTracker, plus slave-node hosts that each run one data node (D) and several compute nodes (C)]

Serengeti Ballooning Enhancement for Java Applications
[Diagram: JVM, guest OS, and host memory layers]

How Is Data Stability Maintained?
How can data be accessed locally if the data node and the compute node are located in different VMs?

Distributed and Data/Compute Associated VM Placement
[Diagram: VM placement across Rack1 and Rack2 for two layouts]
• Combined DataNode and TaskTracker cluster: master hosts run the NameNode and JobTracker; each worker host runs a data node together with a TaskTracker
• Data/compute separated cluster: an HDFS cluster of data nodes, with the TaskTrackers of compute-only cluster 1 and compute-only cluster 2 associated with those data nodes

Hadoop Topology Changes for Virtualization
Hadoop topology awareness versus Serengeti HVE
[Diagram: two topology trees rooted at /, one with layers D1-D2, R1-R4, and H1-H12, the other with an additional N1-N8 layer]

Hadoop Virtualization Extensions (HVE) for Topology
• Hadoop Common: Hadoop network topology extension
• HDFS: replica placement policy extension, replica choosing policy extension, replica removal policy extension, balancer policy extension
• MapReduce: task scheduling policy extension
• JIRA issues: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310
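To make the effect of the extra topology layer concrete, here is a minimal Python sketch of tree distance between two nodes. It is an illustration only, not Hadoop's implementation (which is written in Java). The path layout `/D/R/N/H` and the reading of the N layer as a "node group" (VMs sharing one physical host) are assumptions based on the diagram labels and the HVE JIRA; the functions `distance` and `closest_replica` are hypothetical names.

```python
def distance(path_a, path_b):
    """Number of tree hops between two leaves given as '/'-separated paths.

    Each layer that differs costs two hops: one up and one down.
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    if len(a) != len(b):
        raise ValueError("paths must have the same number of layers")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) * 2


def closest_replica(reader, replicas):
    """Pick the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda r: distance(reader, r))


if __name__ == "__main__":
    # Three layers (assumed /D/R/H): every other node in the rack looks alike.
    print(distance("/D1/R1/H1", "/D1/R1/H2"))
    print(distance("/D1/R1/H1", "/D1/R1/H3"))

    # Four layers (assumed /D/R/N/H): nodes under the same N are closer.
    print(distance("/D1/R1/N1/H1", "/D1/R1/N1/H2"))
    print(distance("/D1/R1/N1/H1", "/D1/R1/N2/H3"))
    print(distance("/D1/R1/N1/H1", "/D1/R2/N3/H5"))
```

With only three layers, two VMs on the same physical host are indistinguishable from two VMs elsewhere in the rack. The extra layer lets a scheduler or replica chooser prefer the co-located VM, which is the property data/compute separation depends on.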

Is There Significant Performance Degradation in a Virtualized Environment?
Is there any performance data?

Virtualized Hadoop Performance
[Chart: native versus virtual platforms, 32 hosts, 16 disks per host]

Serengeti Architecture Diagram
[Diagram: Serengeti components]
• Clients: UI client (Flex UI), CLI client (Spring Shell)
• Serengeti web service: REST API, Spring Batch, Hibernate/DAO, vPostgres
• Spring Batch steps: update MetaDB step, VM placement calculation, VM provision step, software management step, VHM step
• VC adapter, Ironfan service, Thrift service, Ironfan progress report
• Chef server (REST API, cookbooks), package repository
• RabbitMQ, VM runtime manager
• vCenter, virtualization platform, hosts
• Hadoop nodes (Chef client, HA kit)

Customizing Your Hadoop/HBase Cluster with Serengeti
• Choice of distros
• Storage configuration: choice of shared storage or local disk
• Resource configuration
• High availability option
• Number of nodes
• …

    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": [
          "hadoop_namenode",
          "hadoop_jobtracker"
        ],
        "storage": {
          "type": "SHARED",
          "sizeGB": 20
        },
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": [
          "hadoop_datanode",
          "hadoop_tasktracker"
        ],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …
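The spec fragment above can also be generated and sanity-checked from a script before it is handed to Serengeti. The following Python sketch assumes only the field names visible on the slide; the real spec file may contain more fields than the fragment shows, and `check_spec` and `total_instances` are hypothetical helpers, not part of Serengeti.

```python
import json

# Field names are copied from the slide's fragment. Anything the slide
# elides with "…" is deliberately left out rather than guessed.
spec = {
    "distro": "apache",
    "groups": [
        {
            "name": "master",
            "roles": ["hadoop_namenode", "hadoop_jobtracker"],
            "storage": {"type": "SHARED", "sizeGB": 20},
            "instance_type": "MEDIUM",
            "instance_num": 1,
            "ha": True,
        },
        {
            "name": "worker",
            "roles": ["hadoop_datanode", "hadoop_tasktracker"],
            "instance_type": "SMALL",
            "instance_num": 5,
            "ha": False,
        },
    ],
}


def check_spec(spec):
    """Hypothetical sanity checks; this is not Serengeti's own validation."""
    problems = []
    names = [g["name"] for g in spec["groups"]]
    if len(names) != len(set(names)):
        problems.append("duplicate group names")
    for g in spec["groups"]:
        if g["instance_num"] < 1:
            problems.append(f"{g['name']}: instance_num must be >= 1")
        if not g["roles"]:
            problems.append(f"{g['name']}: no roles")
    return problems


def total_instances(spec):
    """Total number of VMs the spec asks for across all groups."""
    return sum(g["instance_num"] for g in spec["groups"])


if __name__ == "__main__":
    print(check_spec(spec))
    print(total_instances(spec))
    print(json.dumps(spec, indent=2))
```

Building the spec as a dictionary and serializing it with `json.dumps` avoids hand-editing mistakes such as unquoted values or mismatched quotation marks.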

One Command to Scale Out Your Cluster with Serengeti

    > cluster resize --name <clustername> --nodegroup worker --instanceNum <#>

Configure and Reconfigure Hadoop with Ease Using Serengeti
Modify the Hadoop cluster configuration from Serengeti:
• Use the "configuration" section of the JSON spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, and perties
• Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
    …

    > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
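Editing the "configuration" section by hand is error-prone, so the edit can be scripted. The Python sketch below sets one attribute under the nesting shown on the slide (configuration, then hadoop, then the config file name) and writes the spec to disk. The helpers `set_hadoop_option` and `write_spec` are hypothetical, the starting spec is a stand-in, and the output path is a temporary file rather than the path on the slide.

```python
import json
import os
import tempfile


def set_hadoop_option(spec, config_file, key, value):
    """Set one attribute under configuration -> hadoop -> <config_file>.

    The nesting mirrors the slide's fragment; intermediate dictionaries
    are created when missing, and existing settings are preserved.
    """
    hadoop = spec.setdefault("configuration", {}).setdefault("hadoop", {})
    hadoop.setdefault(config_file, {})[key] = value
    return spec


def write_spec(spec, path):
    """Write the spec as plain JSON (the // hints on the slide are omitted)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, indent=2)


if __name__ == "__main__":
    spec = {"distro": "apache"}  # stand-in for the rest of the spec file
    set_hadoop_option(spec, "mapred-site.xml", "io.sort.mb", "300")
    path = os.path.join(tempfile.mkdtemp(), "myHadoop.json")
    write_spec(spec, path)
    # Command shape taken from the slide; it is run inside the Serengeti CLI.
    print(f"cluster config --name myHadoop --specFile {path}")
```

The value is kept as the string "300" because that is how the slide writes it.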

Freedom of Choice and Open Source
[Logos: distributions and community projects]
• Flexibility to choose from major distributions:
      cluster create --name myHadoop --distro apache
• Support for multiple projects
• Open architecture to welcome industry participation
• Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with NameNode Federation and HA
Deploy a CDH4 Hadoop cluster
• NameNode federation
• NameNode HA
• MapReduce v1
• HBase, Pig, Hive, and Hive Server
CDH4 configurations
• Scale out
• Elasticity
• JobTracker HA/FT
[Diagram: active NameNode and standby NameNode pairs]
