超人學(xué)院 Hadoop Employment Training, Session 2 (valued at ¥8000): Hive in Depth (深入淺出 Hive)
Chapter 1: Getting Started with Hive

Hive is a data warehouse built on top of Hadoop. Because Hive looks and feels much like a database, it is very easy to get started with. Its metastore can use Derby or MySQL; the version used here is 0.9.0.

To install, enter $HIVE_HOME/conf and first remove the .template suffix from the four template files there. The changes we make later are added to hive-site.xml. Finally, edit $HIVE_HOME/bin/hive-config.sh and add three lines of the form:

export JAVA_HOME=/usr/local/jdk
export ...
export ...

(the last two export lines did not survive in this copy). Then change into hive's bin directory and run the hive script to enter Hive's command-line (CLI) mode.

The basic syntax in this section is meant to get you productive quickly and to prepare for the systematic study that follows. To insert data into table t1, first create a file t1_data under /root/Downloads on the Linux disk whose content is the digits 1, 2, 3, 4, 5, one per line. Then run a LOAD DATA command in hive pointing at that Linux data file; the rows end up in table t1 and can be read back with SELECT * FROM t1.

Real tables usually have many columns. Suppose a student table t2 has two columns, student id and student name. Create it with:

CREATE TABLE t2 (id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Table t2 has two fields: id, of int type, and name, of string type. The trailing ROW FORMAT DELIMITED clause announces that a delimiter specification follows; the keyword FIELDS TERMINATED BY names the field separator, and the final '\t' says a tab character separates the fields. Why declare the field separator in the table definition at all? Because data enters a Hive table by loading files, and Hive has no way of knowing by itself whether the columns in a file are separated by commas, tabs, or semicolons. Now load the data: create a file t2_data under /root/Downloads on the Linux disk, with id and name separated by tabs, and load it from Linux into t2 in hive.
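To make the delimiter discussion concrete, here is a minimal sketch in plain Python (not Hive itself) of how a loader splits a text file into columns the way ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' declares; the sample rows are assumptions, not data from the course.

```python
def parse_rows(lines, field_sep="\t"):
    """Split each line into fields on the declared separator,
    the way a text-format loader would split a loaded file."""
    return [line.rstrip("\n").split(field_sep) for line in lines]

# A hypothetical t2_data file: id and name separated by a tab.
t2_data = ["1\tzhangsan\n", "2\tlisi\n"]
print(parse_rows(t2_data))  # [['1', 'zhangsan'], ['2', 'lisi']]
```

Changing the separator in the declaration (comma, semicolon, and so on) changes how the same bytes are cut into columns, which is exactly why the table definition must state it.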
Query the table with SELECT * FROM t2, and inspect its schema with DESCRIBE t2.

Now suppose the content of t1_data had been strings rather than numbers, while t1's column is of INT type: the query would display null for those values. Does Hive really not validate its data files? Compare with MySQL: MySQL validates at insert time, and non-conforming data cannot enter the table at all. Hive does not validate data at load time; it validates at query time, and non-conforming data is displayed as null. MySQL's insert-time validation is called schema on write; Hive's query-time validation is called schema on read.

Looking at hive's log lets you locate error content quickly and troubleshoot conveniently. Hive uses log4j; the default log level is WARN, and the default output destination is a file. You can also override hive's session parameters when launching it. For example, to change the log output level and destination, run bin/hive with the following command:

hive --hiveconf hive.root.logger=DEBUG,console

hive.root.logger is one of hive's internal parameter values; DEBUG is a log-record level meaning the most detailed logging, and console means the log is written to the console. A sample session:

[root@hadoop0 hivedata]# hive --hiveconf hive.root.logger=DEBUG,console
14/05/18 21:55:45 DEBUG conf.HiveConf: Using hive-site.xml found on CLASSPATH at /usr/local/hive/conf/hive-site.xml
14/05/18 21:55:45 DEBUG conf.HiveConf: Overriding Hadoop conf property fs.har.impl='org.apache.hadoop.fs.HarFileSystem' with Hive default value
14/05/18 21:55:45 DEBUG conf.HiveConf: Overriding Hadoop conf property mapred.min.split.size='0' with Hive default value
14/05/18 21:55:45 DEBUG conf.HiveConf: Overriding Hadoop conf property mapred.reduce.tasks='1' with Hive default value '-1'
14/05/18 21:55:45 DEBUG conf.HiveConf: Overriding Hadoop conf property mapred.reduce.tasks.speculative.execution='true' with Hive default value 'true'
Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j.properties
14/05/18 21:55:46 INFO SessionState: Logging initialized using configuration in file
Hive history file=/tmp/root/hive_job_log_root_201405182155_...
14/05/18 21:55:46 DEBUG security.Groups: Creating new Groups object
14/05/18 21:55:46 DEBUG security.UserGroupInformation: hadoop login
14/05/18 21:55:46 DEBUG security.UserGroupInformation: hadoop login commit
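The schema-on-read behaviour can be sketched in a few lines of Python (an illustration, not Hive code): values are accepted verbatim at load time and only coerced when read, with failures surfacing as NULL.

```python
def read_as_int(raw_values):
    """Coerce raw file values to the column type at query time;
    values that cannot be cast surface as NULL (None), as Hive does."""
    out = []
    for v in raw_values:
        try:
            out.append(int(v))
        except ValueError:
            out.append(None)  # schema on read: bad data shows as NULL
    return out

print(read_as_int(["1", "2", "abc"]))  # [1, 2, None]
```

Under schema on write the third value would have been rejected at load time; under schema on read the load always succeeds and the cost is paid at query time.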
user: UnixPrincipal: root
14/05/18 21:55:46 DEBUG security.UserGroupInformation: UGI loginUser: root
14/05/18 21:55:46 DEBUG security.Groups: Returning fetched groups for 'root'
14/05/18 21:55:46 DEBUG security.Groups: Returning cached groups for 'root'

If you want this behavior permanently, edit ${HIVE_HOME}/conf/hive-log4j.properties (the per-job query log location is governed by hive.querylog.location). Still, using client-side settings is recommended over modifying the configuration file, because once the configuration file is changed, hive produces a large volume of log output for every session.

Hive makes data computation and analysis very convenient, and HQL is one of Hive's highlights. Hive is a data warehouse whose data sits in HDFS, and an HQL statement executes by being translated into MapReduce jobs that operate on the data in HDFS. Hive therefore depends on Hadoop: without Hadoop there is no Hive, and Hive turns HQL into operations on Hadoop. Externally, Hive offers a command line (CLI), JDBC/ODBC interfaces, and a Web interface for submitting HQL commands; internally, the Driver component connects HQL execution with Hadoop.

How, then, are the databases, data tables, and fields of HQL translated into operations on HDFS? There is a mapping mechanism between the two, and the metastore is what completes the mapping. The metastore uses a relational database as the store for the mapping information; the default is a Derby database, which you can change to MySQL yourself. To use MySQL, configure the connection in Hive's hive-site.xml and put MySQL's jdbc driver package into Hive's lib directory. Where Hive keeps its data in HDFS is set by the parameter hive.metastore.warehouse.dir in hive-site.xml.

Hive's basic numeric types include a 1-byte type (-128 to 127), a 2-byte type (-32,768 to 32,767), a 4-byte type, and an 8-byte type. The complex types are ARRAY, MAP, and STRUCT:

ARRAY: an ARRAY is composed of a series of elements of the same data type, and these elements are accessed by subscript. For example, if an ARRAY named fruits is composed of ['apple','orange','mango'], then fruits[1] fetches the element orange; ARRAY subscripts start at 0.

MAP: a MAP holds key->value pairs, and elements are fetched by key. For example, "userlist" could be a map in which username is the key and password is the value, so we can fetch a password through its username.

STRUCT: a STRUCT can contain elements of different data types, and those elements are accessed with "dot syntax". For example, if user is a STRUCT, its address element is user.address.

A table using all of these might be declared as:

CREATE TABLE ... (
  name STRING,
  favors ARRAY<STRING>,
  ...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Here name is a basic type; favors is an array type that can hold many values; scores is a map type that can hold the grades of multiple courses; address is a struct type that can hold address information. The clauses after the column definitions deserve special attention. ROW FORMAT DELIMITED announces that what follows are the column and element separators. FIELDS TERMINATED BY is the field separator. COLLECTION ITEMS TERMINATED BY is the element separator (the items of an ARRAY, the members of a STRUCT, and the key/value pairs of a MAP). MAP KEYS TERMINATED BY ('\003' here) separates a map key from its value, though by Hive version its support varies. LINES TERMINATED BY ends the record. The default separators are deliberately unprintable characters, so that they cannot collide with user data.

A second table can use printable separators instead:

CREATE TABLE ... (
  name STRING,
  scores MAP<STRING, ...>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ...
MAP KEYS TERMINATED BY ',';

With complex data types, what would take several related tables in Hive can be handled by one table. In other words, complex types fold a multi-table relationship into a single hive table, and the single-table benefit is that there are no table joins, so queries are fast.

When designing tables in a relational database, partitioning is used to speed up queries; in Hive, a partitioned table likewise stores its data in per-partition folders, with the data files inside these folders. Hive's default database is called default, and its position in HDFS is decided by the value of hive.metastore.warehouse.dir in the configuration file hive-site.xml. If we modify that value, default moves with it in HDFS; a LOCATION clause can also be given as the position where the default database resides.

Hive has several kinds of tables: managed (internal) tables; external tables, for which Hive manages the table definition but does not manage the data in the table; partition tables, which store data of the same type together; and bucket tables, which classify data by hash value.

As a baseline, put two files on the Linux disk, data_1 and data_2, each with 5 rows, and load both into an ordinary table. Run:

SELECT id FROM common_table WHERE id=1;

and check the jobhistory: Launched map tasks is 2 and Map input records is 10. Each of the 2 Map tasks scanned one file, and the 2 Map tasks together read 10 rows of records.

For a partition table:

LOAD DATA LOCAL INPATH 'data_1' INTO TABLE partition_managed PARTITION (day=1);
LOAD DATA LOCAL INPATH 'data_2' INTO TABLE partition_managed PARTITION (day=2);
SELECT id FROM partition_managed WHERE day=1;

The create-table statement preceding these commands (lost in this copy) differs from the earlier one only by an extra clause setting the partition field. In HDFS each partition becomes its own subdirectory. The query on day=1 launches only 1 Map task, which reads 5 rows in total. A query such as SELECT id FROM partition_managed WHERE id=... that does not filter on the partition field launches 2 Map tasks again: querying without the partition field brings no optimization. DESCRIBE EXTENDED shows the table's details; to see which directory in HDFS a partition field sits under, use the following command: ...

For a bucket table: because hash values divide data nearly evenly, the buckets carved out by the hash are almost all the same size.

CREATE TABLE bucket_managed (id int) CLUSTERED BY (id) INTO 3 BUCKETS;
SET hive.enforce.bucketing = true;
INSERT INTO TABLE bucket_managed SELECT id FROM common_table;

Hive by default does not support populating bucket tables; the SET command enables bucketed inserts. The INSERT runs as a MapReduce job whose number of reduce tasks equals the bucket count, so HDFS ends up with 3 files, each produced by one Reduce task. Each Reduce task's data input is decided by the Partitioner; this job uses org.apache.hadoop.hive.ql.io.DefaultHivePartitioner, whose bucket-choosing code is (the method body, truncated in the original, computes):

public int getBucket(K2 key, V2 value, int numBuckets) {
  return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
}

So a row lands in the bucket given by the hash of its key.

For an external table, the data is not managed by Hive; Hive only has the right to use it, so you can analyze data outside the hive warehouse. If one project team is responsible for producing data and another project team analyzes it, the analyzing team can use external tables. The external-table idea can also be added to partition tables, forming external partition tables. Given a directory structure such as ...2013-03-12..., the table is declared as:

CREATE EXTERNAL TABLE external_partition (id int) PARTITIONED BY (day string) LOCATION '...';
Since Hive's data lives in HDFS, you can place files into a table's directory directly with the HDFS put command, or use LOAD DATA. Its general form is:

LOAD DATA [LOCAL] INPATH '....' [OVERWRITE] INTO TABLE t1 [PARTITION (...)];

LOCAL means the data is loaded from the local Linux file system, and the data is copied into Hive's warehouse in HDFS; without LOCAL, the data is moved from an HDFS path into the Hive warehouse. OVERWRITE replaces whatever data the table already holds; without OVERWRITE, loading a batch of data that has already been imported duplicates it.

Data can also be inserted from queries. INSERT OVERWRITE TABLE t2 PARTITION (day=2) SELECT * FROM t1 writes t1's rows into one partition of t2. A multi-insert scans the source once and feeds several destinations:

FROM t1
INSERT OVERWRITE TABLE t2 PARTITION (day=2) SELECT id WHERE day=2
INSERT OVERWRITE TABLE t2 PARTITION (day=3) SELECT id WHERE day=3
INSERT OVERWRITE TABLE t4 SELECT id WHERE day=4;

By default, Hive also supports dynamic partitions:

INSERT OVERWRITE TABLE t2 PARTITION (province, city) SELECT ......., a.province, a.city FROM a;

Static and dynamic partition columns can be mixed:

INSERT OVERWRITE TABLE t2 PARTITION (province='beijing', city) SELECT ..., a.city FROM a;

In strict mode at least one partition column must be static, and there are limits on how many dynamic partitions each mapper or reducer, and the job as a whole, may create in HDFS.

CREATE TABLE ... AS builds a table from a query, as in CREATE TABLE t4 AS SELECT id FROM t1 WHERE .... For export, you can copy a table's folder out with HDFS commands, which needs no introduction here, or write query results to a local directory:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t1' SELECT id FROM t1;

The most classic query statement is SELECT .... FROM ..., optionally with a table alias as in SELECT e.id FROM t1 e; note that column aliases cannot be referenced in the WHERE clause. In a Hive JOIN, the join condition belongs in the ON clause rather than the WHERE clause. Take a user table and a job table with a few sample rows each (the original data listing is garbled here):

CREATE TABLE IF NOT EXISTS user (id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE IF NOT EXISTS job (id int, job string, user_id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

select * from user join job on user.id = job.user_id;
select * from user left outer join job on user.id = job.user_id;

left join is shorthand for left outer join. Similarly:

select * from user right outer join job on user.id = job.user_id;
select * from user full outer join job on user.id = job.user_id;
select * from user left semi join job on user.id = job.user_id;

The left semi join answers the same question as select * from user where id in (select user_id from job). But hive does not support in subqueries, so left semi join is the substitute. A join with no condition produces a Cartesian product: select * from user join job;

On ordering, Hive has both ORDER BY and SORT BY. With multiple reduce tasks, ORDER BY sorts everything globally inside a single reduce task; SORT BY lets each reduce task sort its own output, regardless of whether the whole result is globally ordered. DISTRIBUTE BY is used together with SORT BY: it puts rows of the same category into the same reduce task, where SORT BY then sorts them. CLUSTER BY is equivalent to DISTRIBUTE BY plus SORT BY on the same column.

On data skew: the job Counters reveal when map tasks process wildly different data volumes, which lowers the value that averages can represent. Hive's execution is staged, and the data volume each map handles depends on the reduce output of the previous stage. Skew in a reduce typically traces back to a few hot keys: a field full of 0 values or empty values piling onto one reduce, a group by on a low-cardinality key, or a Count over many identical special values.
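The join variants can be checked against tiny in-memory tables. The Python below is a sketch with assumed sample rows for user(id, name) and job(id, job, user_id), not data from the course.

```python
# Assumed sample rows.
user = [(1, "zhangsan"), (2, "lisi"), (3, "wangwu")]
job = [(1, "dev", 1), (2, "qa", 1), (3, "ops", 2), (4, "dba", 4)]

# join ... on user.id = job.user_id: one output row per matching pair.
inner = [(u, j) for u in user for j in job if u[0] == j[2]]

# left semi join ~ "where id in (select user_id from job)":
# each user appears at most once, with no columns from job.
job_user_ids = {j[2] for j in job}
left_semi = [u for u in user if u[0] in job_user_ids]

# join with no ON clause: Cartesian product.
cartesian = [(u, j) for u in user for j in job]

print(len(inner), len(left_semi), len(cartesian))  # 3 2 12
```

Note how the inner join emits a row per match (user 1 matches two jobs), while the left semi join only tests existence, which is exactly the IN-subquery semantics it replaces.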
When one reduce task receives far more of a key than the others, the SQL sticks at 99% or 100% of the reduce phase. Two parameters help. hive.map.aggr=true performs partial aggregation on the Map side, like a MapReduce Combiner. hive.groupby.skewindata=true enables load balancing when the data is skewed: when it is set to true, the query plan generates two MR Jobs. In the first MR Job, the Map output result set is distributed randomly to the Reduces; each Reduce does a partial aggregation operation and outputs its result, so rows with the same Group By Key can be distributed to different Reduces. The second MR Job then distributes by Group By Key to the Reduces (this process guarantees the same Group By Key is distributed to the same Reduce) and finally completes the final aggregation operation.

SQL statement tuning:
- Large-table join: filter on the join key before the join, so the data volume entering the join is already smaller.
- Small-table join: let mapjoin load the small dimension table (on the order of 1,000 records or fewer) into memory on the map side.
- count distinct: when a key column holds a large number of identical special values, complete the computation with group by plus sum/count instead of count(distinct).
- union: restructure queries that union skewed inputs.

A typical case is joining a log table to a users table on user_id when many log rows have an empty user_id. Solution 1: the rows whose user_id is empty do not participate in the join (the modification is in the union):

select * from log a join users b on a.user_id is not null and a.user_id = b.user_id
union all
select * from log a where a.user_id is null;

Solution 2: give the empty keys new, random values:

select * from log a left outer join users b
on case when a.user_id is null then concat('hive', rand()) else a.user_id end = b.user_id;

Solution 2 is better than solution 1: less io and fewer jobs. Solution 1 reads log twice and runs 2 jobs; solution 2 is 1 job. This approach handles the skew produced by invalid ids (-99, '', null, and so on): turning the empty key into a string plus a random number distributes the skewed data onto different reduces, solving the data skew problem.

Another case is joining columns of different types: users.user_id is int, while log.user_id holds both string and int values. When string ids are joined against int ids, the mismatched records all end up in one Reducer. The fix is an explicit cast:

select * from users a left outer join logs b on a.usr_id = cast(b.user_id as string);

Using mapjoin to solve the skew of joining a small table (few records) against a large one is the method used with the highest frequency, but when the small table is large enough, mapjoin hits bugs or exceptions and needs special handling. For example:

select * from log a left outer join users b on a.user_id = b.user_id;

Here users has 6,000,000+ rows. Loading that into map memory is not supported, since mapjoin cannot take a "small" table that large, yet an ordinary join runs into the data skew problem. The workaround joins only the user_ids that actually appear in log:

select /*+mapjoin(x)*/ * from log a
left outer join (
  select /*+mapjoin(c)*/ d.*
  from (select distinct user_id from log) c
  join users d on c.user_id = d.user_id
) x
on a.user_id = x.user_id;

If log's distinct user_ids (its uv) are few, the inner mapjoin stays small and this works well. More generally, mapreduce depends on hashing, and hashing keys will always cause some data skew; long experience shows serious skew is usually in the data itself. A generic plan:

1. Extract the skewed keys from log into a table tmp1. To the computation framework, all the data is opaque, so the skewed keys must be found dynamically.
2. The hot keys are never many (like the wealthy in a society), so tmp1 is small. mapjoin tmp1 with users to generate tmp2, and distribute tmp2 to the file cache; this is a map-only process.
3. A map stage reads both users and log. For a record coming from log, check whether its user_id is in tmp2; if it is, output it to local file a; otherwise generate a <user_id, value> key/value pair. Records coming from the users side likewise generate <user_id, value> pairs and go through the reduce.
4. File a is merged with the reduce output of Stage 3 onto hdfs.

In summary: 1) mapjoin is effective where it applies; 2) for group by and distinct skew, set hive.groupby.skewindata=true; 3) otherwise, optimize with SQL statement tuning.

Views: CREATE VIEW sub_user AS select * from user where ...;

UDFs: extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate() method whose formal parameters match the call site. Package the class as a jar, and in the hive command line execute the command ADD
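The null-key salting fix can be simulated to show why it works. This is a sketch: Python's built-in hash stands in for the shuffle partitioner, and the reducer count is an assumption.

```python
import random

NUM_REDUCERS = 4  # assumed reducer count

def reducer_for(key, salt_nulls=False):
    """Pick the reduce task a row is shuffled to, optionally replacing
    a NULL key with 'hive' + a random number, as in the fix above."""
    if key is None:
        key = "hive%f" % random.random() if salt_nulls else "NULL"
    return hash(key) % NUM_REDUCERS

random.seed(0)
null_keys = [None] * 1000  # the skewed rows

unsalted = {reducer_for(k) for k in null_keys}
salted = {reducer_for(k, salt_nulls=True) for k in null_keys}
print(len(unsalted), len(salted))  # 1 reducer vs spread across several
```

Correctness is preserved because in a left outer join a NULL key never matches the other table anyway; the salted keys still match nothing, they just stop piling onto one reduce task.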
Then, in hive, register the function:

CREATE TEMPORARY FUNCTION xxx AS '<the function's fully qualified class name>';

and, in hive, drop it when finished:

DROP TEMPORARY FUNCTION IF EXISTS xxx;

The remainder of these notes is a reference list of Hive configuration properties (entries whose names or values did not survive are marked "…"):

- Default Value: mr; Added In: Hive 0.13.0 with HIVE-6103 and HIVE-…. Chooses mr (Map reduce, default) or tez (Tez execution for Hadoop 2). See Hive on Tez for more information, and see the Tez section below for Tez configuration.
- Default Value: -1; Added In: Hive …. hive uses -1 as the default value; with -1, hive determines the number of reduce tasks itself.
- Default Value: …; Added In: Hive …. Controls the reduce tasks per job.
- Added In: Hive 0.2.0; Default Value: …. The default file format, e.g. ORC. Users can choose the format at table-creation time with CREATE TABLE ... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC.
- Default Value: …; Added In: Hive 0.7.0 with HIVE-…. The file format used for a query's intermediate results. Options are TextFile, SequenceFile, and RCfile. Set to SequenceFile if any columns are string type and contain new-line characters (HIVE-1608).
- Several entries added in Hive 0.4.0 with HIVE-… whose descriptions did not survive.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-4660 and HIVE-…. If turned on, splits generated by ORC will include metadata about the stripes in the file. This data is read remotely (from the client or HiveServer2 machine) and sent to all the tasks.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-4660 and HIVE-…. Cache size for keeping meta information about ORC splits cached in the client.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-4660 and HIVE-6128. How many threads ORC should use to create splits in parallel.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-6347 and HIVE-…. Use zerocopy reads with ORC. (This requires Hadoop 2.3 or later.)
- Default Value: …; Added In: Hive 0.6.0.
- Default Value: true in Hive 0.3 and later; false in Hive …; Added In: Hive …. Whether to use map-side aggregation in group by.
- Default Value: …; Added In: Hive …. Whether there is skew in data to optimize group by queries.
- Default Value: …; Added In: Hive …. Number of rows after which size of the grouping keys/aggregation classes is checked.
- Default Value: …; Added In: Hive 0.11.0 with HIVE-…. Whether a new map-reduce job should be launched for grouping sets. For a query like "select a, b, c, count(1) from T group by a, b, c with rollup;" four rows are created per row: (a, b, c), (a, b, null), (a, null, null), (null, null, null). This can lead to explosion across the map-reduce boundary if the cardinality of T is very high, and map-side aggregation does not do a very good job. This parameter decides if Hive should add an additional map-reduce job. If the grouping set cardinality (4 in the example above) is more than this value, a new MR job is added under the assumption that the original "group by" will reduce the data size.
- Default Value: …; Added In: Hive …. For local mode, memory of the mappers/reducers.
- Default Value: …; Added In: Hive …. The max memory to be used by map-side group aggregation hash table; if the memory usage is higher than this number, flush data.
- Default Value: …; Added In: Hive …. Portion of total memory to be used by map-side group aggregation hash table.
- Default Value: …; Added In: Hive …. Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.
- Default Value: …; Added In: Hive …. Whether to enable the bucketed group by from bucketed partitions/tables.
- Default Value: …; Added In: Hive 0.8.0 with HIVE-…; Removed in Hive 0.9.0 by HIVE-2621. Whether to optimize a multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job.
- Default Value: …; Added In: Hive 0.9.0 with HIVE-…. Whether to optimize a multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job.
- Default Value: …; Added In: Hive …; Opt out: Hive 0.13.0 with HIVE-…. Whether to enable the column pruner. (This configuration property was removed in a later release.)
- Default Value: …; Added In: Hive 0.8.0 with HIVE-…. Whether to enable automatic use of indexes.
- See Indexing for more configuration properties related to Hive indexes.
- Default Value: …; Added In: Hive …. Whether to enable predicate pushdown.
- Default Value: …; Added In: Hive …. Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.
- Default Value: …; Added In: Hive …. Whether to transitively replicate predicate filters over equijoin conditions.
- Default Value: …; Added In: Hive …. How many rows in the right-most join operand Hive should buffer before emitting the join result.
- Default Value: …; Added In: Hive …. How many rows in the joining tables (except the streaming table) should be cached in memory.
- Default Value: …; Added In: Hive 0.5.0 (replaced by hive.smbjoin.cache.rows in Hive 0.12.0). How many values in each key in the map-joined table should be cached in memory.
- Default Value: …; Added In: Hive …. Portion of total memory to be used by map-side group aggregation hash table, when this group by is followed by map join.
- Default Value: …; Added In: Hive 0.7.0 with HIVE-1642 as hive.smalltable.filesize (replaced by hive.mapjoin.smalltable.filesize in Hive 0.8.1 with HIVE-2499). The threshold for the input file size of the small tables; if the file size is smaller than this threshold, it will try to convert the common join into map join.
- Default Value: …; Added In: Hive 0.7.0 with HIVE-1808 and HIVE-…. This number means how much memory the local task can take to hold the key/value into an in-memory hash table; if the local task's memory usage is more than this number, the local task will be aborted. It means the data of the small table is too large to be held in memory.
- Default Value: …; Added In: Hive …. This number means how much memory the local task can take to hold the key/value into an in-memory hash table when this map join is followed by a group by; if the local task's memory usage is more than this number, the local task will be aborted. It means the data of the small table is too large to be held in the memory.
- Default Value: …; Added In: Hive 0.7.0 with HIVE-1808 and HIVE-…. The number means after how many rows processed it needs to check the memory usage.
- Default Value: …
- (replaced by …) How many rows with the same key value should be cached in memory per sort-merge-bucket joined table.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-6429 and HIVE-…. Whether a MapJoin hashtable should use optimized (size-wise) keys, allowing the table to take less memory. Depending on the key, memory savings for the entire table can be 5-15% or so.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-6418 and HIVE-…. Whether a MapJoin hashtable should deserialize values on demand. Depending on how many values in the table the join will actually touch, it can save a lot of memory by not creating objects for rows that are not needed. If all rows are needed, obviously there's no benefit.
- Default Value: …; Added In: Hive ….
- Default Value: …; Added In: Hive …. Determine if we get a skew key in join. If we see more than the specified number of rows with the same key in the join operator, we think the key is a skew join key.
- Default Value: …; Added In: Hive …. Determine the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split to perform a fine grained control.
- Default Value: …; Added In: Hive …. Determine the number of map tasks at most used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks to perform a fine grained control.
- Default Value: …; Added In: Hive …. Whether to create a separate plan for skewed keys for the tables in the join. This is based on the skewed keys stored in the metadata. At compile time, the plan is broken into different joins: one for the skewed keys, and the other for the remaining keys. And then, a union is performed for the two joins generated above. So unless the same skewed key is present in both the joined tables, the join for the skewed key will be performed as a map-side join. The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect. Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.) If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.
- Default Value: …; Added In: Hive 0.10.0 with HIVE-…. Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted. The merge is triggered if either of hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea was that the number of reducers are few, so the number of files anyway is small. However, with this optimization, we are increasing the number of files possibly by a big margin. So, we merge aggressively.
- Default Value: …; Added In: Hive 0.10.0 with HIVE-…. Whether the version of Hadoop which is running supports sub-directories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports sub-directories for tables/partitions. This support was added by MAPREDUCE-1501.
- Default Value: …; Added In: Hive …. The mode in which the Hive operations are being performed. In strict mode, some risky queries are not allowed to run.
- Default Value: …; Added In: Hive …. Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling logs partitions to capacity.
- Default Value: …; Added In: Hive …. When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input.
- Default Value: …; Added In: Hive …. Name of the environment variable that holds the unique script operator ID in the transform function (the custom mapper/reducer that the user has specified in the query).
- Default Value: …; Added In: Hive …. This controls whether the final outputs of a query (to a local/hdfs file or a Hive table) is compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.
- Default Value: …; Added In: Hive …. This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from Hadoop configuration variables ….
- Default Value: …; Added In: Hive 0.5.0. Whether to execute jobs in parallel.
- Default Value: …; Added In: Hive …. How many jobs at most can be executed in parallel.
- Default Value: …; Added In: Hive …. Whether to provide the row offset virtual column.
- Default Value: …; Added In: Hive …; Removed in: Hive 0.13.0 with HIVE-…. Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with hive.exec.dynamic.partition set to true.
- Default Value: …; Added In: Hive 0.13.0 with HIVE-…. Counter group name for counters used during query execution. The counter group is used for internal Hive variables (CREATED_FILE, FATAL_ERROR, and so on).
- Default Value: …; Added In: Hive …. Comma-separated list of pre-execution hooks to be invoked for each statement. A pre-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
- Default Value: …; Added In: Hive …. Comma-separated list of post-execution hooks to be invoked for each statement. A post-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
- Default Value: …; Added In: Hive …. Comma-separated list of on-failure hooks to be invoked for each statement. An on-failure hook is specified as the name of a Java class which implements the … interface.
- Default Value: …; Added In: …. Merge small files at the end of a map-only job.
- Default Value: …; Added In: …. Merge small files at the end of a map-reduce job.
- Default Value: …; Added In: …. Try to generate a map-only job for merging files if CombineHiveInputFormat is supported.
- Default Value: …; Added In: …. Size of merged files at the end of the job.
- Default Value: …; Added In: …. When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
- Default Value: …; Added In: …. Send a heartbeat after this interval – used by mapjoin and filter operators.
- Default Value: false in 0.10.0; true in 0.11.0 and later (HIVE-…); Added In: …. Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. (Note that hive-default.xml.template incorrectly gives the default as false in Hive 0.11.0 through 0.13.1.)
- Default Value: …; Added In: 0.11.0 with HIVE-3784 (default changed to true with HIVE-…). Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. If this parameter is on, and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than the size specified by hive.auto.convert.join.noconditionaltask.size, the join is directly converted to a mapjoin (there is no conditional task).
- Default Value: …; Added In: 0.11.0 with HIVE-…. If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect. However, if it is on, and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than this size, the join is directly converted to a mapjoin (there is no conditional task). The default is 10MB.
- Default Value: …; Added In: 0.13.0 with HIVE-…. For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently, this is not working with vectorization or the Tez execution engine.
- Default Value: …; Added In: …. Whether the Hive Transform/Map/Reduce Clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need of periodically producing stderr messages, but users should be cautious because this may prevent infinite loops in the scripts to be killed by TaskTracker.
- Default Value: …; Added In: …. The default SerDe for transmitting input data to and reading output data from the user scripts.
- Default Value: …; Added In: …. The default record reader for reading data from the user scripts.
- Default Value: …; Added In: …. The default record writer for writing data to the user scripts.
- Default Value: …
- Added In: …. The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat.
- Default Value: …; Added In: …. Whether Hive should automatically send progress information to TaskTracker when using UDTF's to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent TaskTracker from killing tasks with infinite loops.
- Default Value: …; Added In: …. Whether speculative execution for reducers should be turned on.
- Default Value: …; Added In: …. The interval with which to poll the JobTracker for the counters of the running job. The smaller it is the more load there will be on the jobtracker, the higher it is the less granular the data caught will be.
- Default Value: …; Added In: …. Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced.
- Default Value: …; Added In: …. Whether sorting is enforced. If true, while inserting into the table, sorting is enforced.
- Default Value: …; Added In: …. Remove extra map-reduce jobs if the data is already clustered by the same key which needs to be used again. This should always be set to true. Since it is a new feature, it has been made configurable.
- Default Value: …; Added In: …. Whether or not to allow dynamic partitions in DML/DDL.
- Default Value: …; Added In: …. In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.
- Default Value: …; Added In: …. Maximum number of dynamic partitions allowed to be created in total.
- Default Value: …; Added In: …. Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.
- Default Value: …; Added In: …. Maximum number of HDFS files created by all mappers/reducers in a MapReduce job.
- Default Value: …; Added In: …. The default partition name in case the dynamic partition column value is null/empty string or any other values that cannot be escaped. This value must not contain any special character used in HDFS URI (e.g., ':', '%', '/' etc). The user has to be aware that the dynamic partition value should not contain this value to avoid confusions.
- Default Value: …; Added In: …. The SerDe used by FetchTask to serialize the fetch output.
- Default Value: …; Added In: …. Let Hive determine whether to run in local mode automatically.
- Default Value: …; Added In: …. Do not report an error if DROP TABLE/VIEW specifies a non-existent table/view.
- Default Value: …; Added In: …. If a job fails, whether to provide a link in the CLI to the task with the most failures, along with debugging hints if applicable.
- Default Value: …; Added In: Hive …. How long to run the autoprogressor for the script/UDTF operators (in seconds). Set to 0 for forever.
- Default Value: …; Added In: Hive …. Default property values for newly created tables.
- Default Value: …; Added In: Hive …. This enables substitution using syntax like ${var}, ${system:var} and ${env:var}.
- Default Value: …; Added In: Hive …. Whether to throw an exception if a dynamic partition insert generates empty results.
- Default Value: …; Added In: Hive …. A comma separated list of acceptable URI schemes for import and export.
- Default Value: …; Added In: Hive …. When trying a smaller subset of data for simple LIMIT, how much size we need to guarantee each row to have at least.
- Default Value: …; Added In: Hive …. When trying a smaller subset of data for simple LIMIT, the maximum number of files we can sample.
- Default Value: …; Added In: …. Whether to enable the optimization of trying a smaller subset of data for simple LIMIT first.
- Default Value: …; Added In: Hive …. Maximum number of rows allowed for a smaller subset of data for simple LIMIT, if it is a fetch query. Insert queries are not restricted by this limit.
- Default Value: …; Added In: Hive …. Should rework the mapred work or not. This is first introduced by SymlinkTextInputFormat to replace symlink files with real paths at compile time.
- Default Value: …; Added In: Hive …. A number used for percentage sampling. By changing this number, the user will change the subsets of data sampled.
- Default Value: …; Added In: Hive …. A list of I/O exception handler class names. This is used to construct a list of exception handlers to handle exceptions thrown by record readers.
- Default Value: …; Added In: Hive …. String used as a prefix when auto generating column alias. By default the prefix label will be appended with a column position number to form the column alias. Autogeneration would happen if an aggregate function is used in a select clause without an explicit alias.
- Default Value: …; Added In: Hive …. Whether to include the function name in the column alias auto generated by Hive.
- Default Value: …; Added In: Hive …. The class responsible for logging client side performance metrics. Must be a subclass of ….
- Default Value: …; Added In: Hive …. To clean up the Hive scratch directory while starting the Hive server.
- Default Value: …; Added In: Hive …. String used as a file extension for output files. If not set, defaults to the codec extension for text files (e.g. ".gz"), or no extension otherwise.
- Default Value: …; Added In: Hive …. Where to insert into multilevel directories like "insert directory '/HIVEFT25686/chinna/' from table".
- Default Value: …; Added In: Hive 0.13.…

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論