© 2009 VMware Inc. All rights reserved

Serengeti - Virtualize Your Big Data Applications
藺永華
VMware, Inc.

Agenda
- Today’s big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today’s Big Data System
[Diagram labels: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data Parallel Batch Processing; Real-Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure

Deployment
- Difficult to deploy: takes several people several days, or even months
- Difficult to tune cluster performance

Low Efficiency
- Hadoop clusters typically do not fully utilize all hardware resources
- Difficult to share resources safely between different workloads

Single Point of Failure
- The NameNode and JobTracker are single points of failure
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
- 1/1000 of the human effort, with minimal Hadoop operations knowledge
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
- Steps: server preparation, OS installation, network configuration, Hadoop installation and configuration
- Manual process: costs days
- Automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? - Consolidate sprawling clusters
- Clusters share servers with strong isolation

Cluster Sprawling
- Single-purpose clusters for various business applications lead to cluster sprawl.

Cluster Consolidation
Simplify
- Single hardware infrastructure
- Unified operations
Optimize
- Shared resources = higher utilization
- Elastic resources = faster on-demand access
- 30% CAPEX down
[Diagram labels: Finance Hadoop, Portal Hadoop, Hadoop Dev, Hadoop Prod, HBase, ..., Virtualization Platform]

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem
- 50%+ of resources sit idle while a high-priority job is burning up its cluster.
- Utilize all resources from the pool on demand.
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram labels: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; Jobtracker; Namenode; Hive MetaDB; HCatalog; HCatalog MDB; Server; HA; App/OS virtual machines on VMware ESX hosts]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker, and other components in Hadoop clusters

Serengeti
Easy and rapid deployment and management
- Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is due in June
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization

Local disk
- Can local disks be used in a virtualized environment?

Flexibility and scalability
- How can resources be scheduled flexibly between clusters and different applications, as mentioned above?

Data stability
- In a virtual environment, how can we distribute data across hosts and racks?

Data locality
- Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?

Performance
- How does Hadoop perform in a virtual environment?

Can I use local disk easily?

Extend Virtual Storage Architecture to Include Local Disk
[Diagram labels: Other VM (x8), Hadoop (x10), Serengeti, Host (x6)]

Shared Storage: SAN or NAS
- Easy to provision
- Automated cluster rebalancing

Hybrid Storage
- SAN for boot images, other workloads
- Local disk for Hadoop & HDFS

How to flexibly scale in/scale out?
How to flexibly schedule resources between clusters and different applications?

Evolution of Hadoop on VMs - Data/Compute separation

Current Hadoop: Combined Storage/Compute (Hadoop in VM)
- VM lifecycle determined by Datanode
- Limited elasticity

Separate Storage
- Separate compute from data
- Remove the elasticity constraint imposed by the Datanode
- Elastic compute
- Raise utilization

Separate Compute Clusters
- Separate virtual compute
- Compute cluster per tenant
- Stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In
[Diagram labels: NameNode, JobTracker, Slave Node; hosts each carrying one D node and several C nodes]

Serengeti Ballooning Enhancement for Java Application
[Diagram labels: JVM, Guest OS, Host (x3)]

How to keep data stability?
How to access data locally if the data node and compute node are located in different VMs?

Distributed and Data/Compute Associated VM Placement
[Diagram labels: Datanode and tasktracker combined cluster (master, worker); Data/Compute separated cluster (master, Data node, Tasktracker); Compute only cluster 1; Compute only cluster 2; HDFS cluster; Compute Only cluster; Rack1; Rack2; Job tracker; Name node; Host]

Hadoop Topology Changes for Virtualization
Hadoop Topology Awareness - Serengeti HVE
[Diagram: network topology trees rooted at /, with nodes D1-D2, R1-R4, N1-N8, and H1-H12]

Hadoop Virtualization Extensions for Topology (HVE)
- HADOOP-8468 (Umbrella JIRA)
- Hadoop Common: Hadoop Network Topology Extension
- HDFS: Replica Placement Policy Extension, Replica Choosing Policy Extension, Replica Removal Policy Extension, Balancer Policy Extension
- MapReduce: Task Scheduling Policy Extension
- Related JIRAs: HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310

Is there significant performance degradation in a virtualized environment?
Is there any performance data?

Virtualized Hadoop Performance
Native versus Virtual Platforms, 32 hosts, 16 disks/host
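
The HVE topology slide above shows a tree with D, R, N, and H levels. A minimal sketch of how such a tree can rank nodes by closeness follows, assuming the N level is a group of VMs between the rack (R) and the host entry (H), and that closeness is measured as hops through the closest common ancestor. The function names `split_path`, `distance`, and `nearest_first` are hypothetical, and this is an illustration of the idea rather than Hadoop's or Serengeti's actual implementation.

```python
def split_path(path):
    """Split a topology path such as '/D1/R1/N1/H1' into its levels."""
    return [part for part in path.split("/") if part]


def distance(path_a, path_b):
    """Count tree hops between two leaves via their closest common ancestor."""
    a, b = split_path(path_a), split_path(path_b)
    if len(a) != len(b):
        raise ValueError("both paths must have the same number of levels")
    shared = 0
    for left, right in zip(a, b):
        if left != right:
            break
        shared += 1
    # Each side climbs from its leaf up to the common ancestor.
    return (len(a) - shared) + (len(b) - shared)


def nearest_first(reader, replicas):
    """Order replica locations from closest to farthest from the reader."""
    return sorted(replicas, key=lambda replica: distance(reader, replica))


if __name__ == "__main__":
    reader = "/D1/R1/N1/H1"
    replicas = ["/D1/R2/N3/H5", "/D1/R1/N2/H3", "/D1/R1/N1/H2"]
    for replica in nearest_first(reader, replicas):
        print(replica, distance(reader, replica))
```

With the extra N level, two entries under the same N are 2 hops apart, entries in the same rack but different N groups are 4 hops apart, and entries in different racks are 6 hops apart, so a scheduler or replica chooser that prefers small distances can tell these cases apart.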

Serengeti architecture diagram
[Diagram labels: UI Client (Flex UI); CLI Client (Spring Shell); Rest API; Serengeti Web Service; Spring Batch steps (Update MetaDB step, VM Placement calculation, VM Provision step, Software Mgmt step, VHM step); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift Service; Progress report; Ironfan; Chef server (Rest API, Cookbook); RabbitMQ; VM runtime Manager; Package repository; vCenter; Virtualization Platform with hosts running Hadoop Nodes, each with Chef Client and HA kit]

Customizing your Hadoop/HBase cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- # of nodes

    …
    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": [
          "hadoop_namenode",
          "hadoop_jobtracker"
        ],
        "storage": {
          "type": "SHARED",
          "sizeGB": 20
        },
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": [
          "hadoop_datanode",
          "hadoop_tasktracker"
        ],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …

One command to scale out your cluster with Serengeti

    > cluster resize --name <clustername> --nodegroup worker --instanceNum <#>

Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, …perties
- Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
    …

    > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json

Freedom of Choice and Open Source

Distributions
- Flexibility to choose from major distributions

    cluster create --name myHadoop --distro apache

Community Projects
- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with Namenode Federation and HA
Deploy CDH4 Hadoop cluster
- Name Node Federation
- Name Node HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
CDH4 configurations
[Diagram labels: Scale out, Elasticity, JobTracker HA/FT, Active Namenode, Standby Namenode, Active Namenode, Standb…