Big Data Technology Exchange

Big Data Platform Technical Exchange
Wu Minda, Senior Technical Consultant

What is big data
Big data technology is the ability to quickly extract valuable information from very large volumes of data of many different types.
  • Variety: managing complex, multi-faceted relational and non-relational data. (Are you leaving unstructured data out of your decisions?)
  • Velocity: streaming data, or large volumes of data in motion. (Do you want real-time operations to deliver better results?)
  • Volume: data volumes ranging from terabytes to zettabytes. (Have you collected all of your data, and are you actually using it?)
  • Veracity: one in three leaders do not trust the information they get when making business decisions.

Big data reference architecture: beyond the traditional data warehouse
[Architecture diagram] Three tiers: stream computing, Internet scale, and the traditional data warehouse.
Diagram labels: In-Motion Analytics; Data Analytics, Data Operations & Model Building; Results; Internet Scale; Database & Warehouse; At-Rest Data Analytics; Results; Ultra Low Latency Results; InfoSphere BigInsights.
Inputs: traditional/relational data sources and non-traditional/non-relational data sources.

IBM big data platform and application framework
Cloud | Mobile | Security
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform:
  • Visualization & Discovery: collect, extract, and explore data visually
  • Applications & Development
  • Systems Management
  • Accelerators: application accelerators that speed up development and shorten the time to analytic value
  • Hadoop System: analyze petabytes of structured and unstructured data at low cost
  • Stream Computing: analyze streaming data and gain real-time insight from big data
  • Data Warehouse: in-database analytics on operational or historical data
  • Information Integration & Governance: information integration and data governance (data quality, lifecycle, ...)
  • Contextual Discovery: context-aware analysis over indexed and federated sources

Agenda
  • IBM Hadoop platform: BigInsights
  • IBM stream computing: Streams
  • IBM data warehouse platform: PureData
  • Data analysis on the big data platform: Data Explorer
  • Summary of IBM's big data strengths

The Forrester Wave report on big data

BigInsights Enterprise Edition
[Component diagram; labels listed in extraction order]
  • Connectivity and integration: Streams, Netezza
  • Text processing engine and library

  • JDBC, Flume
  • Infrastructure: Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler
  • Optional IBM products
  • DB2
  • Analytics and discovery applications: BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, ...
  • Administration and development tools
      Administration console: Monitor cluster health, jobs, etc.; Add / remove nodes; Start / stop services; Inspect job status; Inspect workflow status; Deploy applications; Launch apps / jobs; Work with distrib file system; Work with spreadsheet interface; Support REST-based API; ...
      R
      Eclipse development tools: Text analytics; MapReduce programming; Jaql, Hive, Pig development; BigSheets plug-in development; Oozie workflow generation
  • Integrated installer
  • Legend: Open Source, IBM
  • Other components shown: IBM Cognos BI, Big SQL, Accelerator for machine data analysis, Accelerator for social data analysis, Guardium, DataStage, Data Explorer, Sqoop, HCatalog, GPFS FPO

BigInsights advantages
Enterprise-grade infrastructure
  • High performance & availability: GPFS-FPO; at least 2x faster than open source Hadoop; 17x throughput speedup for document index lookups; fault resistance for real-time data; POSIX
  • Adaptive MapReduce
  • SQL interface (Big SQL)
  • Integrated install & management consoles
  • Security: LDAP+
  • High-speed LZO compression
  • Development tooling: environment, testing, and optimization
  • Warehouse RDBMS & Streams integration
Enterprise-grade analytics
  • SystemT text analytics: blazing fast; works on unstructured data without requiring it to be structured first (MapReduce); customized annotators
  • BigSheets: insight engine for analytics on massive amounts of data in BigInsights; puts the power of Map/Reduce within reach of the business professional through a familiar spreadsheet-like environment; built-in visualizations
  • SystemML machine learning (Watson): ML algorithms implemented directly on MapReduce; deep statistical / mining capabilities embedded into the BigInsights platform
  • BigIndex: distributed indexing and search; parallel indexing and search

Comparison of GPFS-FPO and HDFS
Metric | BigInsights GPFS-FPO | Open source HDFS or other solutions
Robustness | No single point of failure; 99.99% | NameNode is a single point of failure
Data consistency | High | Data may be lost
Scalability | Thousands of nodes; 4,000+ tested | Thousands of nodes
POSIX compatibility | Fully compatible | Limited
Data management | Security, backup, snapshots, caching, replication | Limited
Traditional application performance | Good; balances read and write performance | Poor random read/write performance
Security | ACLs, capacity quotas, security authentication | Not supported

IBM Adaptive MapReduce
IBM Adaptive MapReduce provides enterprise-grade management for running distributed applications and big data analytics on a scalable, shared grid. It accelerates dozens of parallel applications, delivering results sooner and making better use of all available resources.
[Benchmark chart labels: TeraSort; Throughput; SWIM; 10 times fewer CPU cores; 6 times faster; 60 times faster]
  • Berkeley SWIM is a workload benchmark developed at the University of California at Berkeley.
  • Measure core scheduling efficiency of MapReduce workloads at Hadoop World 2011
  • Multi-tenant resource management
  • 10x less hardware for the fastest TeraSort score

Big SQL: native SQL support for Hadoop
  • Native SQL support in BigInsights: ANSI SQL 92+; standard syntax support (joins, data types, ...)
  • Real JDBC/ODBC: prepared statements; cancel support; database metadata API support; secure socket connections (SSL)
  • Optimization: leverages MapReduce parallelism, or uses direct access for low-latency queries
  • Multiple data sources: HBase (including secondary indexes); CSV, delimited files, sequence files; JSON; Hive tables
[Diagram labels: Application; JDBC / ODBC Driver; JDBC / ODBC Server; Big SQL Engine; SQL; BigInsights data sources: Hive tables, HBase tables, CSV files]

Using reporting tools
  • Cognos BI Server can push computation down to BigInsights
  • Faster response times
  • None of Hive's limitations
[Diagram labels: Cognos BI Server (Explore & Analyze, Report & Act); SQL Interface via JDBC; InfoSphere BigInsights: Application (Map-Reduce), Storage (HBase, HDFS)]

Existing tools still work: SQuirreL SQL
  • Using existing SQL tooling against big data
  • Support for “standard” authentication (not supported for Hive, but supported by Big SQL)

Existing tools still work: Eclipse
  • Using existing SQL tooling against big data
  • Same setup as for existing SQL sources
  • Support for “standard” authentication

Integrated web-based installation
  • Seamless installation in single-node or cluster mode
  • Installs both open source and IBM components
  • Verification checks confirm that the system is running correctly

Web-based administration console
  • Job and workflow management
  • System health monitoring
  • Cluster and file system management
  • Form-based analytics dashboards

The shift in computing models
  • Stream computing model: real-time analysis of data in motion. Streaming data is a flow of structured or unstructured data in motion, and stream computing analyzes it in real time.
  • Traditional computing model: historical analysis of data at rest. Batch mode; query-driven, with queries submitted against static data; relies on databases and data warehouses.
[Diagram labels: Queries, Memory, Disk, Updates; Memory, Disk, Event Data, Queries, Alerts, Actions]

IBM InfoSphere Streams: a platform for real-time analytics on big data
  • A low-latency platform for processing streaming data: end-to-end latency in milliseconds, or even microseconds
  • A highly scalable, high-performance platform for real-time analytics: near-linear growth in processing capacity as hardware is added horizontally; scales up to 125 nodes
  • A flexible, dynamic platform: Streams applications can be deployed flexibly, and new analytic applications can be deployed dynamically
[Diagram labels: Millions of events per second; Microsecond latency; Traditional / non-traditional data sources; Real-time decisions; Powerful analytics]
Use cases shown: algo trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring

Continuous ingestion, continuous analysis
  • Achieving scale: distribute the application across multiple compute nodes; distribute work across stream-connected hardware nodes
  • Services provided by the Streams infrastructure: scheduling analytics across hardware/software nodes; establishing streaming connectivity
  • Operations shown: transform, filter / sample, classify, correlate, annotate
  • Where appropriate, processing elements can be "fused" together to eliminate communication latency

Streams Toolkit (commonly used operators)
Join, Functor, Aggregate, Punctor, Sort, Filter, DirectoryScan, FileSource, FileSink, UDPSource, UDPSink, TCPSource, TCPSink, Export, Import, ODBCSource, ODBCEnrich, solidDBEnrich, InetSource, ODBCAppend

Highly parallel scalability
  • Use tens or hundreds of machines at once for real-time stream processing
[Diagram labels: X86 Box, X86 Blade, Cell Blade, Blue Gene, FPGA Blade; Transport; Streams Data Fabric]

High availability and cluster deployment
[Diagram labels: Processing Element Containers]
  • An optimizing scheduler assigns operators to different nodes and continuously monitors resource usage
  • Adapts to changes in resources, workload, and data rates
  • Runs on low-cost hardware
  • From single-node PCs to blade servers to multi-array clusters

Reference performance data: throughput and latency
[Diagram labels: 1,975 streams; 2,133 streams; 163 streams; 24 channels; 163 Decision Engines; 356 Blue Gene Nodes; 356 Processing Elements; 4,274 streams; Data Feed]
  • 5 million records per second
  • Average latency: 150 microseconds
  • Minimum latency: 50 microseconds
  • 49 of 65K records: 2 ms latency

InfoSphere Streams summary
  • Real-time analytics platform for big data: analyzes diverse data sources, millions of events per second
  • Ease of use: graphical application development model

  • Ease of use (continued): easy to manage and monitor
  • Integration: XML, MQ, DataStage, HDFS, and others
  • Advanced toolkits and accelerators: toolkits for event sequences, geospatial data, and databases, plus CEP processing; customizable telecom and media analytics accelerators help deploy applications quickly
InfoSphere Streams

Expert integrated data systems (PureData)
Data Platform: Delivering Data Services
  • for Transactions: a system optimized for transactional data services
  • for Operational Analytics: a system optimized for operational analytics data services
  • for Analytics: a system optimized for analytical data services

IBM InfoSphere Data Explorer (name change and new release)
  • Brings together all kinds of enterprise data, including the results of big data analytics
  • Gives end users integrated, enterprise-wide information discovery
[Diagram labels]
  • Application/Users
  • Sources: File Systems, Relational Data, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, External Sources, Cloud, Custom Sources
  • Velocity Platform
  • IBM Big Data Platform: Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse
  • Collaboration: Commenting, Rating, Shared Folders, Tagging
  • IDE
  • Application Framework
  • Real Time Analytics, Internet Scale Analytics, In-Database Analytics
  • Federated Discovery
  • Navigation and Visualization
  • Enterprise Data Connectors
  • I...
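
The Data Explorer material describes federated discovery: one query is sent to many enterprise sources (file systems, email, CRM, and so on) and the hits come back as a single ranked view. The following is only a rough sketch of that idea in plain Python. It does not use any Data Explorer or Velocity API; the `Hit` type, `make_connector`, `federated_search`, the in-memory document stores, and the term-count scoring are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Hit(NamedTuple):
    source: str   # which connector produced the hit
    title: str    # document title inside that source
    score: float  # toy relevance score (term frequency)


# A connector is any callable that takes a query string and returns hits.
Connector = Callable[[str], List[Hit]]


def make_connector(name: str, documents: Dict[str, str]) -> Connector:
    """Build a toy connector over an in-memory {title: text} store (hypothetical)."""
    def search(query: str) -> List[Hit]:
        term = query.lower()
        hits = []
        for title, text in documents.items():
            count = text.lower().count(term)
            if count:
                hits.append(Hit(name, title, float(count)))
        return hits
    return search


def federated_search(query: str, connectors: List[Connector], limit: int = 10) -> List[Hit]:
    """Send the query to every connector and merge the hits into one ranked list."""
    merged: List[Hit] = []
    for connector in connectors:
        merged.extend(connector(query))
    # Highest score first; source and title break ties deterministically.
    merged.sort(key=lambda h: (-h.score, h.source, h.title))
    return merged[:limit]


crm = make_connector("crm", {"Account 42": "churn risk high, churn flagged twice"})
mail = make_connector("email", {"Re: renewal": "customer mentioned churn once",
                                "Lunch": "no relevant text"})
results = federated_search("churn", [crm, mail])
for hit in results:
    print(hit.source, hit.title, hit.score)
```

A real deployment would also have to normalize scores across sources and enforce each source's access controls, which this sketch leaves out.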
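
The InfoSphere Streams material lists operators such as Filter, Functor, and Aggregate that are applied continuously to data in motion. The sketch below imitates that dataflow style with plain Python generators. It is not SPL and does not call any Streams API; the function names, the event fields (`symbol`, `price`), and the tumbling window of two events are assumptions chosen only to make the example concrete.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]  # hypothetical tuple type: one dict per event


def filter_op(events: Iterable[Event], keep: Callable[[Event], bool]) -> Iterator[Event]:
    """Filter-style operator: pass through only events that satisfy `keep`."""
    for event in events:
        if keep(event):
            yield event


def functor_op(events: Iterable[Event], transform: Callable[[Event], Event]) -> Iterator[Event]:
    """Functor-style operator: turn each event into a new event."""
    for event in events:
        yield transform(event)


def aggregate_op(events: Iterable[Event], window: int, field: str) -> Iterator[float]:
    """Aggregate-style operator: average `field` over tumbling windows of `window` events."""
    buffer: List[float] = []
    for event in events:
        buffer.append(event[field])
        if len(buffer) == window:
            yield sum(buffer) / window
            buffer.clear()  # an incomplete trailing window is dropped


def run_pipeline(source: Iterable[Event]) -> List[float]:
    """Chain the operators; each event flows through the chain as it arrives."""
    trades = filter_op(source, lambda e: e["symbol"] == "IBM")
    in_cents = functor_op(trades, lambda e: {**e, "price": e["price"] * 100})
    return list(aggregate_op(in_cents, window=2, field="price"))


sample = [
    {"symbol": "IBM", "price": 1.0},
    {"symbol": "XYZ", "price": 9.0},
    {"symbol": "IBM", "price": 3.0},
    {"symbol": "IBM", "price": 5.0},
]
averages = run_pipeline(sample)
print(averages)
```

Because generators are lazy, no stage waits for the whole input before starting, which loosely mirrors the query-over-moving-data model the slides contrast with batch queries over stored data. Distribution across nodes and operator fusion are outside the scope of this single-process sketch.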
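
Much of the BigInsights material (Adaptive MapReduce, SystemML, BigSheets, Big SQL) is described as running on top of MapReduce. As a reminder of what that programming model looks like, here is a minimal single-process word count split into map, shuffle, and reduce phases. It is a teaching sketch, not Hadoop or BigInsights code; the function names and sample input are assumptions, and a real cluster would run the phases in parallel across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["big data big insights", "Big SQL"])
print(counts)
```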
