A Big Data Indexing Approach for Interactive Analysis

CarbonData: An Indexed File Format for Interactive Analysis

Big Data
- Network: 54B records per day, 750 TB per month, complex correlated data
- Consumer: 100K sensors, 2 million events per second, time-series and geospatial data
- Enterprise: 100 GB to TBs per day, data across different domains
Enterprise data is high-volume, high-dimensional, structurally complex, and growing fast.

Typical Scenario
- Big tables (e.g. CDR, transaction, web log), small tables, and unstructured data
- Workloads: report & dashboard, OLAP & ad-hoc, batch processing, machine learning, realtime analytics
Enterprises run many kinds of data applications, from business intelligence to batch processing to machine learning.

Analytic Examples: Tracing and Record Query for Operation Engineers
- Over the past day, which terminals used WhatsApp, ranked by traffic?
- Over the past day, what are the network congestion statistics for each cell in Shanghai?

Challenge - Data
- Data size: a single table holds 10B+ records and grows fast
- Multi-dimensional: every record carries ~100 dimensions; new dimensions are added occasionally
- Rich detail: billion-level high cardinality, e.g. 1B terminals * 200K cells * 1440 minutes = 2.88 * 10^17 combinations
In short: tens of billions of rows, many dimensions, fine granularity.

Challenge - Application
- Enterprise integration: SQL 2003 standard syntax, BI integration, JDBC/ODBC
- Flexible queries with no fixed pattern: any combination of dimensions, OLAP vs. detail records, full scan vs. small scan, precise search vs. fuzzy search
- Query classes to support: small scan, full scan, and multi-dimensional OLAP queries

How to Choose Storage? How Should the Data Platform Be Built?

Option 1: NoSQL database
- Key-value store: low latency, ~5 ms
- Access by key only, one value per key
- Good for serving real-time applications; not suitable for analytical applications

Option 2: Parallel database
- Parallel scan + fast compute
- Fine-grained control over parallel computation suits small- and medium-scale analytics (data marts)
- Questionable scalability and fault tolerance: cluster size tops out around 100 data nodes, intra-query fault tolerance is weak, and big batch jobs do not fit; not suitable for massive-scale analytics (enterprise data warehouse)

Option 3: Search engine
- All columns indexed; fast search; simple aggregation
- Designed for search rather than OLAP: no TopN, join, or multi-level aggregation, so complex computation is out of reach
- 3-4x data expansion in size; no SQL support, and the proprietary syntax is hard to migrate away from
- Best suited to multi-condition filtering and text analysis

Option 4: SQL on Hadoop
- Modern distributed architecture that scales well in computation: parallel scan + parallel compute suits massive-scale data
- Pipeline based: Impala, Drill, Flink; BSP based: Hive, SparkSQL
- BUT it still uses file formats designed for batch jobs: scan-focused, with no index support, so point queries and small scans suffer and the usable scenarios stay limited

Capability Matrix

    Option                    | Example stores
    --------------------------|--------------------
    KV store                  | HBase, Cassandra
    Parallel database         | Greenplum, Vertica
    Search engine             | Solr, ElasticSearch
    SQL on Hadoop - Pipeline  | Impala, HAWQ, Drill
    SQL on Hadoop - BSP       | Hive, SparkSQL

Judged against small scan queries, full scan queries, and multi-dimensional OLAP queries, each option is designed for a single scenario and solves only part of the problem.

Architect's Choice
- Choice 1: Compromise - satisfy only part of the applications (App1, App2, App3 share one store)
- Choice 2: Replicate - load and replicate the data several times so that every application is served

Motivation: CarbonData as Unified Storage
One copy of data serves every analysis scenario - small scan queries, full scan queries, and multi-dimensional OLAP: detail-record filtering, massive data warehouses, and data marts.

Spark + CarbonData: building an interactive analysis engine for big data.

The Apache CarbonData Community
- CarbonData entered the Apache Incubator in June 2016 with a unanimous vote
- Goals: easier to use, with one copy of storage covering more scenarios; higher analytic performance, offering users interactive analysis
- Three stable Apache releases published so far
- Mailing-list subscriptions and contributions welcome:
  Code: /apache/incubator-carbondata
  JIRA: /jira/browse/CARBONDATA
  Maillist: dev
- Contributors come from Huawei, Talend, Intel, eBay, Inmobi, Meituan, Alibaba, LeEco, and Hulu

Carbon-Spark Integration
- Built-in Spark integration: Spark 1.5, 1.6, 2.0
- Interfaces: SQL and the DataFrame API
- Data management: bulk load / incremental load, delete load, compaction
- Architecture: Spark provides compute; a reader/writer layer plus data management and query optimization sit between Spark and the Carbon files in storage

Integration with Spark: Querying a CarbonData Table

DataFrame API:

    carbonContext.read
      .format("carbondata")
      .option("tableName", "table1")
      .load()

    sqlContext.read.format("carbondata").load("path_to_carbon_file")

SQL:

    CREATE TABLE IF NOT EXISTS T1 (name String, PhoneNumber String)
    STORED BY 'carbondata'

    LOAD DATA LOCAL INPATH 'path/to/data' INTO TABLE T1

Queries run with the late decode optimization and carbon-specific SQL command support.
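Taken together, a minimal end-to-end session might look like the sketch below. It targets the CarbonData 1.x / Spark 1.x API shown on this slide; the store path, table name, and CSV location are hypothetical placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.CarbonContext

    object CarbonQuickStart {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("carbon-quickstart"))
        // CarbonContext takes the SparkContext plus a store path for table data
        val cc = new CarbonContext(sc, "hdfs://ns1/carbon/store") // hypothetical path

        // DDL and load, as on the slide
        cc.sql(
          """CREATE TABLE IF NOT EXISTS t1 (name String, phoneNumber String)
            |STORED BY 'carbondata'""".stripMargin)
        cc.sql("LOAD DATA LOCAL INPATH '/tmp/t1.csv' INTO TABLE t1") // hypothetical file

        // Query through SQL or the DataFrame reader
        cc.sql("SELECT name, count(*) FROM t1 GROUP BY name").show()
        cc.read.format("carbondata").option("tableName", "t1").load().show()
      }
    }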

Data Ingestion
- Bulk data ingestion: load from a CSV file, or load from another table:

    LOAD DATA LOCAL INPATH 'folder_path' OVERWRITE INTO TABLE tablename
    OPTIONS(property_name=property_value, ...)

    INSERT INTO TABLE tablename select_statement1 FROM table1;

- Save a Spark DataFrame as a Carbon data file:

    df.write
      .format("carbondata")
      .option("tableName", "tbl1")
      .mode(SaveMode.Overwrite)
      .save()
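For the DataFrame path, a complete write could look like the following sketch; the CSV source, its reader package (the spark-csv library commonly used with Spark 1.x), and the input path are assumptions:

    import org.apache.spark.sql.SaveMode

    // cc is the CarbonContext from the earlier sketch
    val df = cc.read
      .format("com.databricks.spark.csv") // assumed CSV reader for Spark 1.x
      .option("header", "true")
      .load("/tmp/events.csv")            // hypothetical input

    df.write
      .format("carbondata")
      .option("tableName", "tbl1")
      .mode(SaveMode.Overwrite)           // replaces previous contents of tbl1
      .save()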

Segment Introduction
- Every data load becomes one segment in the CarbonData table, and data is sorted within each segment
- A ZooKeeper-based Segment Manager coordinates the loading JDBCServer and the querying JDBCServer over the shared Carbon table, whose segments are sets of Carbon files

CarbonData Table Organization
- Data: Carbon files (data + footer) under /tableName/fact/segmentId on HDFS; this path is append-only, and the index is stored in the footer of each data file
- Metadata under /tableName/meta, rewritten on change:
  - Dictionary file: the dictionary map
  - Index file: all file footers, which Spark loads into an in-memory B-tree index
  - Schema file: the latest schema

Data Compaction
- Compaction merges small files, re-clustering data across loads for better performance
- Two types of compaction are supported, both issued with the statement below (see also the sketch that follows):
  - Minor compaction: compacts adjacent segments based on the number of segments; triggered automatically
  - Major compaction: compacts segments based on size; triggered manually

    ALTER TABLE db_name.table_name COMPACT 'MINOR'/'MAJOR'
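From the Spark side, triggering a compaction is a one-line SQL call; the table name here is a hypothetical example:

    // cc is the CarbonContext from the earlier sketches
    cc.sql("ALTER TABLE default.tbl1 COMPACT 'MINOR'") // merge adjacent recent segments
    cc.sql("ALTER TABLE default.tbl1 COMPACT 'MAJOR'") // size-based merge, typically run off-peak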

CarbonData File Structure
- Built-in columnar layout & index: index and data live in the same file, co-located in HDFS, balancing batch and point queries
- Index support: multi-dimensional index (B+ tree), min/max index, inverted index
- Encoding: dictionary, RLE, delta; Snappy for compression
- Data types: primitive and nested types
- Schema evolution: add, remove, and rename columns
- A CarbonData file is a sequence of blocklets followed by a footer

Index Introduction
The indexes are split between the Spark driver and the executors: the driver holds a table-level index that Catalyst consults during planning, while each executor holds file-level indexes and scanners built from the Carbon file footers.

Multi-level indexes (the table level is illustrated in the sketch below):
- Table-level index: a global B+ tree index over the files (blocks), used to filter blocks
- File-level index: a local B+ tree index in each file footer, used to filter blocklets
- Column-level index: an inverted index within each column chunk
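To illustrate what driver-side block filtering buys, the sketch keeps one (startKey, endKey) range per block and skips every block whose range cannot contain the query key. The classes are simplified stand-ins with scalar keys, not CarbonData's actual structures.

    // Simplified stand-in for one table-level index entry: a key range per block (file).
    final case class BlockRange(path: String, startKey: Int, endKey: Int)

    object TableLevelIndex {
      // Keep only the blocks whose [startKey, endKey] range can contain the query key.
      def prune(blocks: Vector[BlockRange], queryKey: Int): Vector[BlockRange] =
        blocks.filter(b => b.startKey <= queryKey && queryKey <= b.endKey)

      def main(args: Array[String]): Unit = {
        val index = Vector(
          BlockRange("part-0001", 1, 100),
          BlockRange("part-0002", 101, 250),
          BlockRange("part-0003", 251, 400))
        // A point query for key 180 only needs to open part-0002.
        println(prune(index, 180).map(_.path)) // Vector(part-0002)
      }
    }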

Encoding Example
- Data is sorted along the MDK (multi-dimensional key)
- Data is stored as an index, in columnar format

With dictionary encoding, each dimension value becomes a small integer surrogate, so every row is a key tuple plus its measures:

    1,1,1,1,1 : 142,11432
    1,1,1,3,2 : 541,54702
    1,1,1,1,3 : 443,44622
    1,1,2,1,4 : 545,58871
    1,1,2,1,5 : 675,56181
    1,1,3,3,6 :  52,9749
    1,1,3,1,7 : 570,51018
    1,1,3,2,8 : 561,55245

After the sort (MDK index), the blocklet's logical view is:

    1,1,1,1,1 : 142,11432
    1,1,1,1,3 : 443,44622
    1,1,1,3,2 : 541,54702
    1,1,2,1,4 : 545,58871
    1,1,2,1,5 : 675,56181
    1,1,3,1,7 : 570,51018
    1,1,3,2,8 : 561,55245
    1,1,3,3,6 :  52,9749

The sorted rows are then split by column (C1..C7) and stored columnar. The source rows behind the example (a sketch of the encode-and-sort step follows):

    Year  Quarter  Month  Territory  Country    Quantity  Sales
    2003  QTR1     Jan    EMEA       Germany    142       11,432
    2003  QTR1     Jan    APAC       China      541       54,702
    2003  QTR1     Jan    EMEA       Spain      443       44,622
    2003  QTR1     Feb    EMEA       Denmark    545       58,871
    2003  QTR1     Feb    EMEA       Italy      675       56,181
    2003  QTR1     Mar    APAC       India      52        9,749
    2003  QTR1     Mar    EMEA       UK         570       51,018
    2003  QTR1     Mar    Japan      Japan      561       55,245
    2003  QTR2     Apr    APAC       Australia  525       50,398
    2003  QTR2     Apr    EMEA       Germany    144       11,532
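A minimal sketch of dictionary encoding plus the MDK sort, in plain Scala over three of the example rows. The dictionary assignment here is first-seen order, a simplification of the real global dictionary.

    object MdkEncodingSketch {
      // (Year, Quarter, Month, Territory, Country) dimensions plus (Quantity, Sales) measures
      type Row = (Vector[String], Vector[Long])

      // Lexicographic comparison of two equal-length multi-dimensional keys.
      private def mdkLess(a: Vector[Int], b: Vector[Int]): Boolean =
        a.zip(b).collectFirst { case (x, y) if x != y => x < y }.getOrElse(false)

      def main(args: Array[String]): Unit = {
        val rows: Vector[Row] = Vector(
          (Vector("2003", "QTR1", "Jan", "EMEA", "Germany"), Vector(142L, 11432L)),
          (Vector("2003", "QTR1", "Jan", "APAC", "China"),   Vector(541L, 54702L)),
          (Vector("2003", "QTR1", "Jan", "EMEA", "Spain"),   Vector(443L, 44622L)))

        // One dictionary per dimension: value -> integer surrogate key
        val nDims = rows.head._1.size
        val dicts = Vector.tabulate(nDims) { d =>
          rows.map(_._1(d)).distinct.zipWithIndex.map { case (v, i) => v -> (i + 1) }.toMap
        }

        // Encode every row to its MDK, then sort the blocklet by that key
        val encoded = rows.map { case (dims, ms) =>
          (dims.zipWithIndex.map { case (v, d) => dicts(d)(v) }, ms)
        }
        val sorted = encoded.sortWith((l, r) => mdkLess(l._1, r._1))

        sorted.foreach { case (k, m) => println(k.mkString(",") + " : " + m.mkString(",")) }
      }
    }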

The sorted, encoded rows are packed into blocklets (Blocklet 1..4 in the example), each covering a contiguous MDK range.

File-Level Blocklet Index
- The file footer carries one index entry per blocklet: the blocklet's start key and end key, plus min/max values for every column (C1(Min,Max) .. C7(Min,Max))
- From these entries an in-memory, file-level MDK index tree is built for filtering - a major optimization for efficient scans

Block Pruning
- On the Spark driver side, the table-level index prunes whole blocks in HDFS before their footers are even opened; within a block, the footer's blocklet index prunes blocklets

Query Optimization
- Predicate push-down leverages the multi-level indexes (table level, file level, and the column-level inverted index); blocklet-level pruning is sketched below
- Column pruning reads only the column chunks a query touches
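The blocklet-level half of that pruning can be made concrete with a small self-contained sketch that drops every blocklet whose min/max range cannot satisfy an equality predicate; the case class is an illustrative stand-in, not a CarbonData class.

    // Illustrative stand-in for one footer entry: per-column min/max of a blocklet.
    final case class BlockletStats(id: Int, min: Map[String, Long], max: Map[String, Long])

    object MinMaxPruning {
      // Keep only blocklets whose [min, max] range can satisfy "column = value".
      def prune(stats: Seq[BlockletStats], column: String, value: Long): Seq[BlockletStats] =
        stats.filter(s => s.min(column) <= value && value <= s.max(column))

      def main(args: Array[String]): Unit = {
        val footer = Seq(
          BlockletStats(1, Map("quantity" -> 52L),  Map("quantity" -> 443L)),
          BlockletStats(2, Map("quantity" -> 525L), Map("quantity" -> 675L)))
        // quantity = 570 can only appear in blocklet 2; blocklet 1 is never read.
        println(prune(footer, "quantity", 570L).map(_.id)) // List(2)
      }
    }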

Column-Level Inverted Index
- Within a blocklet, a column chunk can optionally be stored as an inverted index: the column is sorted within the chunk and kept as value -> row-id postings, which are then run-length encoded and compressed (sketched after this list)
- In the example's physical view, the dimension chunks become postings such as:

    Dim1: 1(1-10)
    Dim2: 1(1-8), 2(9-10)
    Dim3: 1(1-3), 2(4-5), 3(6-8), 4(9-10)
    Dim4: 1(1-2,4-6,9), 2(7), 3(3,8,10)
    Dim5: 1(1,9), 2(3), 3(2), 4(4), 5(5), 6(8), 7(6), 8(7), 9(10)

  while the measure chunks (Measure1, Measure2) keep the plain values (142:11432, 443:44622, ...)
- Suitable for low-cardinality columns: better compression and fast predicate filtering
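The following plain-Scala sketch builds such postings for one chunk and prints them in the run notation used above; it is a simplification of the real chunk format.

    object InvertedIndexSketch {
      // Build value -> sorted row-id postings for one column chunk.
      def invert(column: Seq[Int]): Map[Int, Seq[Int]] =
        column.zipWithIndex
          .groupBy(_._1)                                        // group (value, rowId) by value
          .map { case (v, pairs) => v -> pairs.map(_._2 + 1) }  // 1-based row ids

      // Render row ids as runs, e.g. Seq(1,2,3,6) => "1-3,6"
      def runs(ids: Seq[Int]): String =
        ids.sorted.foldLeft(List.empty[(Int, Int)]) {
          case ((s, e) :: rest, id) if id == e + 1 => (s, id) :: rest
          case (acc, id)                           => (id, id) :: acc
        }.reverse.map { case (s, e) => if (s == e) s.toString else s"$s-$e" }.mkString(",")

      def main(args: Array[String]): Unit = {
        val dim2 = Seq(1, 1, 1, 1, 1, 1, 1, 1, 2, 2) // the example's Dim2 chunk
        invert(dim2).toSeq.sortBy(_._1).foreach { case (v, ids) =>
          println(s"$v(${runs(ids)})")               // prints 1(1-8) and 2(9-10)
        }
      }
    }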

Column Group
- Multiple columns may form a column group, stored as a single column chunk in row-based format
- Suitable for sets of columns that are frequently fetched together, saving the stitching cost of reconstructing rows
- In the example blocklet, C1..C4 remain individual column chunks while C5 and C6 share one column group chunk

Nested Data Type Representation
Arrays are represented as a composite of two columns: one for the element values, and one for the start index and length of each array (a sketch of this flattening follows the tables below):

    Name  Array            Name  Array(start,len)  Ph_Number
    John  192,191          John  0,2               192
    Sam   121,345,333  =>  Sam   2,3               191
    Bob   198,787          Bob   5,2               121
                                                   345
                                                   333
                                                   198
                                                   787

Structs are represented as a composite of a finite number of columns, one per struct element:

    Name  Info(Struct)     Name  Info.age  Info.gender
    John  31,M             John  31        M
    Sam   45,F         =>  Sam   45        F
    Bob   16,M             Bob   16        M
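A plain-Scala sketch of that array flattening, producing the (start, len) column and the flat element column shown above:

    object ArrayColumnSketch {
      // Flatten an array column into (start, len) offsets plus one flat value column.
      def flatten(arrays: Seq[Seq[String]]): (Seq[(Int, Int)], Seq[String]) = {
        val offsets = arrays.scanLeft(0)(_ + _.size).zip(arrays.map(_.size))
        (offsets, arrays.flatten)
      }

      def main(args: Array[String]): Unit = {
        val phNumbers = Seq(
          Seq("192", "191"),        // John
          Seq("121", "345", "333"), // Sam
          Seq("198", "787"))        // Bob
        val (offsets, values) = flatten(phNumbers)
        println(offsets) // List((0,2), (2,3), (5,2))
        println(values)  // List(192, 191, 121, 345, 333, 198, 787)

        // Reconstructing Sam's row needs only its (start, len) slice:
        val (start, len) = offsets(1)
        println(values.slice(start, start + len)) // List(121, 345, 333)
      }
    }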

Encoding & Compression
- Efficient encoding schemes supported: DELTA, RLE, BIT_PACKED
- Dictionary: a table-level global dictionary, which enables the Lazy Decode optimization
- Compression: Snappy on column data, plus adaptive data-type compression
- Payoff: faster aggregation, reduced run-time memory footprint, and fast distinct-count

Big Win: Lazy Decode
In the original plan, Scan -> Filter -> Aggregation operates on decoded values. The optimized plan appends a DictionaryDecode step instead: the group-by runs on compact dictionary keys, and the keys are translated back to values only once, on the final, much smaller result.
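The effect can be sketched in a few lines of plain Scala; the dictionary and data are toy stand-ins, and the two pipelines differ only in where the decode happens:

    object LazyDecodeSketch {
      val dict = Map(1 -> "Germany", 2 -> "Japan", 3 -> "China")      // global dictionary
      val countryKeys = Seq(1, 3, 1, 2, 3, 1)                         // encoded column
      val sales = Seq(11432L, 54702L, 11532L, 55245L, 54000L, 12000L) // measure column

      def main(args: Array[String]): Unit = {
        // Original plan: decode every row first, then group by the decoded string.
        val eager = countryKeys.map(dict).zip(sales)
          .groupBy(_._1).map { case (c, rows) => c -> rows.map(_._2).sum }

        // Lazy decode: group by the integer key, decode only the result rows.
        val lzy = countryKeys.zip(sales)
          .groupBy(_._1).map { case (k, rows) => dict(k) -> rows.map(_._2).sum }

        println(eager == lzy) // true: same answer, far fewer dictionary lookups
      }
    }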

CarbonData Performance vs. Spark + Parquet: 1.45x to 131x speedups
- Test environment: 3 workers + 1 master, 40 cores, 384 GB memory, 10G network; Hadoop 2.7.2, Spark 1.5.2
- Data: 1 billion records, 300 columns, 1.9 TB raw
- Query mix: point query (filter on the primary key), small scan (filters on several columns), full scan (complex aggregation and joins, no filter), OLAP query (filter and aggregation together)

CarbonData Performance vs. Impala
- For small result sets, Impala relies on full table scans while Carbon uses its indexes
- Test environment: 3 workers + 1 master, 40 cores, 384 GB memory, 10G network; Hadoop 2.7.2, Spark 1.5.2, Impala 2.6
- Data: 1 billion records, 300 columns, 830 GB raw
- Query mix: multi-dimensional filters; multiple joins against dimension tables

Success Case: Telecom CDR Analysis
The incumbent open-source stack could not meet the business's performance and stability requirements:
- Tens to hundreds of billions of rows needing second-level response, yet a half-year query took 10 seconds, a one-year query took 700 seconds, and queries failed easily
- Interactive queries were unstable (Impala could hang)
- Impala resources could not be centrally managed or shared

Solution: Hive for batch processing; SparkSQL + CarbonData for interactive analysis. Data sources feed batch import jobs on a Yarn-managed HDFS data platform, with network data landing every 5 minutes at 2 million rows per second.

Customer value:
- Resources centrally managed by Yarn, configurable and tunable per user
- Second-level response over tens to hundreds of billions of rows, filtering on any combination of dimensions
