© 2009 VMware Inc. All rights reserved

Serengeti - Virtualize Your Big Data Applications
Lin Yonghua (藺永華), VMware, Inc.

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System
[Diagram components: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data Parallel Batch Processing; Real-Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
• Difficult to deploy: takes several people several days, or even months
• Difficult to tune cluster performance
Low Efficiency
• Hadoop clusters typically do not utilize 100% of their hardware resources
• Difficult to share resources safely between different workloads
Single Point of Failure
• The NameNode and JobTracker are single points of failure
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get Your Hadoop Cluster in Minutes
Manual process, costs days:
• Server preparation
• OS installation
• Network configuration
• Hadoop installation and configuration
Automated by Serengeti on vSphere with best practices:
• Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
• 1/1000 of the human effort, minimal Hadoop operations knowledge required

Why Virtualize Hadoop? - Consolidate Sprawling Clusters
Cluster sprawl: single-purpose clusters for various business applications (Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop) lead to cluster sprawl.
Cluster consolidation: clusters share servers with strong isolation on one virtualization platform.
Simplify
• Single hardware infrastructure
• Unified operations
Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
Result: 30% CAPEX down

Why Virtualize Hadoop? - Utilize All Your Resources to Solve the Priority Problem
• 50%+ of resources are sitting idle while a high-priority job is burning up its cluster
• Utilize all resources from the pool on demand
• Dynamic elastic scaling on a shared resource pool
• 3X faster to get analytic results

vSphere High Availability (HA) - Protection Against Unplanned Downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram components: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; Server]

vSphere Fault Tolerance (FT) Provides Continuous Protection
Overview
• Single identical VMs running in lockstep on separate hosts
• Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker and other components in Hadoop clusters
[Diagram: App/OS virtual machines on VMware ESX hosts, protected by HA and FT]

Serengeti
Easy and rapid deployment and management
• Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June
• Toolkit that leverages virtualization to simplify Hadoop deployment and operations
  - Deploy a cluster in 10 minutes, fully automated
  - Customize Hadoop and HBase clusters
  - Automated cluster operations
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Demo: 10 Minutes to a Hadoop Cluster with Serengeti

Common Questions About Virtualization
Local disk
• Can local disks be used in a virtualized environment?
Flexibility and scalability
• How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
Data stability
• In a virtual environment, how can we distribute data across hosts and racks?
Data locality
• Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
Performance
• How does Hadoop perform in a virtual environment?

Can I use local disk easily?

Serengeti - Extend Virtual Storage Architecture to Include Local Disk
Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
Hybrid Storage
• SAN for boot images, other workloads
• Local disk for Hadoop & HDFS
[Diagram: hosts running Hadoop VMs alongside other VMs]

How to scale in/scale out flexibly? How to schedule resources flexibly between clusters and different applications?

Evolution of Hadoop on VMs: Data/Compute Separation
1. Current Hadoop in a VM: combined storage/compute in each slave node
   • VM lifecycle determined by the DataNode
   • Limited elasticity
2. Separate storage
   • Separate compute from data
   • Remove the elasticity constraint imposed by the DataNode
   • Elastic compute
   • Raise utilization
3. Separate compute clusters
   • Separate virtual compute
   • Compute cluster per tenant (T1, T2)
   • Stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In
[Diagram: NameNode and JobTracker, plus hosts each running one D node and several C nodes]

Serengeti Ballooning Enhancement for Java Applications
[Diagram: JVM running inside a guest OS on a host]

How to keep data stability? How to access data locally if the data node and compute node are located in different VMs?

Distributed and Data/Compute Associated VM Placement
Datanode and tasktracker combined cluster
• Master host and worker hosts spread across Rack 1 and Rack 2
• Each worker host runs a data node together with a tasktracker
Data/compute separated cluster
• HDFS cluster: name node and data nodes spread across Rack 1 and Rack 2
• Compute-only cluster 1 and compute-only cluster 2: each has its own job tracker, with tasktrackers placed on the hosts that run data nodes

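The placement idea on this slide, spreading data nodes across hosts and racks and keeping each compute VM on a host that also runs a data node, can be sketched in Python. This is a minimal illustration, not Serengeti's placement algorithm: the host inventory, the `place_data_nodes` and `place_compute_nodes` functions, and the round-robin policy are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical inventory: host name -> rack name.
HOSTS = {"host1": "rack1", "host2": "rack1", "host3": "rack2", "host4": "rack2"}


def place_data_nodes(hosts, count):
    """Pick `count` distinct hosts for data node VMs, alternating between racks."""
    if count > len(hosts):
        raise ValueError("more data nodes requested than hosts available")
    by_rack = defaultdict(list)
    for host, rack in sorted(hosts.items()):
        by_rack[rack].append(host)
    placement = []
    for rack in cycle(sorted(by_rack)):
        if len(placement) == count:
            break
        if by_rack[rack]:
            placement.append(by_rack[rack].pop(0))
    return placement


def place_compute_nodes(data_hosts, count):
    """Round-robin compute VMs over hosts that already run a data node."""
    if not data_hosts:
        raise ValueError("no data node hosts to associate with")
    return [data_hosts[i % len(data_hosts)] for i in range(count)]


data_hosts = place_data_nodes(HOSTS, 3)
compute_hosts = place_compute_nodes(data_hosts, 5)
print(data_hosts)
print(compute_hosts)
```

Because every compute VM lands on a host that holds a data node, a task scheduled there can read at least some blocks without crossing the physical network, which is the locality property the slide asks about.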
Hadoop Topology Changes for Virtualization
Hadoop Topology Awareness - Serengeti HVE
[Diagram: two topology trees. The standard tree has three layers under the root (/): data centers (D1-D2), racks (R1-R4), and Hadoop nodes (H1-H12). The HVE tree adds a node group layer (N1-N8) between the racks and the nodes.]

Hadoop Virtualization Extensions for Topology (HVE)
HADOOP-8468 (Umbrella JIRA)
Hadoop Common (HADOOP-8469, HADOOP-8470, HADOOP-8472)
• Hadoop Network Topology Extension
HDFS (HDFS-3495, HDFS-3498)
• Balancer Policy Extension
• Replica Choosing Policy Extension
• Replica Placement Policy Extension
• Replica Removal Policy Extension
MapReduce (MAPREDUCE-4309, MAPREDUCE-4310)
• Task Scheduling Policy Extension

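To illustrate why the extra topology layer matters, the sketch below computes a Hadoop-style network distance: the number of hops from each node up to their closest common ancestor, added together. It assumes four-level paths of the form /datacenter/rack/nodegroup/node, mirroring the HVE tree. The paths and the `distance` helper are hypothetical and are not code from the HVE patches.

```python
def distance(path_a: str, path_b: str) -> int:
    """Hops from both nodes up to their closest common ancestor."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


reader = "/d1/r1/n1/h1"
replicas = ["/d1/r2/n5/h9", "/d1/r1/n2/h3", "/d1/r1/n1/h2"]

# Prefer the replica that is closest to the reader.
for replica in sorted(replicas, key=lambda r: distance(reader, r)):
    print(replica, distance(reader, replica))
```

With the node group layer, a replica in the same node group (distance 2) ranks ahead of one that is merely in the same rack (distance 4). In a three-level tree both would look equally close.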
Is there significant performance degradation in a virtualized environment? Is there any performance data?

Virtualized Hadoop Performance

Native versus Virtual Platforms, 32 hosts, 16 disks/host

Serengeti Architecture Diagram
[Diagram components:]
• Clients: UI client (Flex UI), CLI client (Spring Shell)
• Serengeti Web Service (REST API)
• Spring Batch steps: update meta DB, VM placement calculation, VM provision, software management, VHM
• Hibernate/DAO, vPostgres
• VC adapter, vCenter
• Ironfan service (Thrift service), Ironfan progress report
• Chef server (REST API, cookbooks), package repository
• RabbitMQ, VM runtime manager
• Virtualization platform: hosts running Hadoop nodes (Chef client, HA kit)

Customizing your Hadoop/HBase cluster with Serengeti
• Choice of distros
• Storage configuration: choice of shared storage or local disk
• Resource configuration
• High availability option
• Number of nodes

    …
    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": [
          "hadoop_namenode",
          "hadoop_jobtracker"
        ],
        "storage": {
          "type": "SHARED",
          "sizeGB": 20
        },
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": [
          "hadoop_datanode",
          "hadoop_tasktracker"
        ],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …

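A spec like the one on this slide can be sanity-checked before it is handed to the CLI. The Python sketch below loads a completed version of the slide's fragment and runs a few checks. The field names follow the slide; the outer braces, the closing of the elided parts, and the rules in `check_spec` are assumptions for illustration, not Serengeti's own validation.

```python
import json

# The slide's fragment, closed off so that it parses as JSON (assumption).
SPEC_TEXT = """
{
  "distro": "apache",
  "groups": [
    {"name": "master",
     "roles": ["hadoop_namenode", "hadoop_jobtracker"],
     "storage": {"type": "SHARED", "sizeGB": 20},
     "instance_type": "MEDIUM", "instance_num": 1, "ha": true},
    {"name": "worker",
     "roles": ["hadoop_datanode", "hadoop_tasktracker"],
     "instance_type": "SMALL", "instance_num": 5, "ha": false}
  ]
}
"""


def check_spec(spec):
    """Return a list of problems found in the spec (empty if none)."""
    problems = []
    groups = spec.get("groups", [])
    names = [g.get("name") for g in groups]
    if len(names) != len(set(names)):
        problems.append("duplicate group names")
    for group in groups:
        if not group.get("roles"):
            problems.append(f"{group.get('name')}: no roles")
        num = group.get("instance_num")
        if not isinstance(num, int) or num < 1:
            problems.append(f"{group.get('name')}: instance_num must be >= 1")
    all_roles = {role for g in groups for role in g.get("roles", [])}
    if "hadoop_namenode" not in all_roles:
        problems.append("no group has the hadoop_namenode role")
    return problems


spec = json.loads(SPEC_TEXT)
total_nodes = sum(g["instance_num"] for g in spec["groups"])
print(check_spec(spec), total_nodes)
```

Parsing the text with `json.loads` also catches the kind of slip that slides invite, such as a typographic quote or an unquoted value like MEDIUM.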
One command to scale out your cluster with Serengeti

    > cluster resize --name <clustername> --nodegroup worker --instanceNum <#>

Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
• Use the "configuration" section of the JSON spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, […]perties
• Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
    …

    > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json

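Each *-site.xml entry in the "configuration" section is a map from Hadoop property names to values. As an illustration of what such a map corresponds to on a node, the Python sketch below renders one into the standard Hadoop configuration/property XML layout. The slides do not describe how Serengeti itself generates these files, so the `render_site_xml` function is a hypothetical helper, not part of Serengeti.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict) -> str:
    """Render a name -> value map as Hadoop <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# The one concrete setting shown on the slide.
configuration = {"hadoop": {"mapred-site.xml": {"io.sort.mb": "300"}}}

xml_text = render_site_xml(configuration["hadoop"]["mapred-site.xml"])
print(xml_text)
```

Keeping the settings as plain key/value maps in the spec file means one JSON document can describe every Hadoop configuration file for the cluster, and a reconfiguration is just an edit to that document followed by `cluster config`.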
Freedom of Choice and Open Source
Distributions
• Flexibility to choose from major distributions

    cluster create --name myHadoop --distro apache

Community Projects
• Support for multiple projects
• Open architecture to welcome industry participation
• Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with Namenode Federation and HA
Deploy a CDH4 Hadoop cluster
• NameNode Federation
• NameNode HA
• MapReduce v1
• HBase, Pig, Hive, and Hive Server
Also shown: CDH4 configurations, scale out, elasticity, JobTracker HA/FT
[Diagram: two NameNode groups, each with an active and a standby NameNode, coordinated by a ZooKeeper group (three ZK nodes) and backed by a quorum-based metadata store; the DataNodes send block reports to the NameNodes.]

Proactive Monitoring and Tuning with VCOPs
• Proactive monitoring through VCOPs
• Gain comprehensive visibility
• Eliminate manual processes with intelligent automation
• Proactively manage operations

VMware Brings Agility, Efficiency, and Elasticity to Big Data
Agility
• Deploy, configure and monitor Hadoop clusters on the fly
• Dynamically reconfigure Hadoop to meet changing business demands
Efficiency
• Consolidate Hadoop to achieve higher utilization
• Pool resources to allow for increased performance and priority job processing
Elasticity
• Enable full elasticity through separation of data and compute
• Scale Hadoop in and out under resource constraints

Serengeti Resources
Download and try Serengeti
• VMware Hadoop site
• /hadoop