Basic Hive Operations

Generally, the data files Hive operates on already exist (they can also be imported from outside); common web log file formats include JSON. Note: every database and every table Hive creates is a directory in HDFS. For a database, the HDFS path is /user/hive/warehouse/&lt;database name&gt;.db; for a table, it is /user/hive/warehouse/&lt;database name&gt;.db/&lt;table name&gt;. Hive ships with one built-in database named default; if you create a table without first creating a database, the table lands in default.

hive> CREATE SCHEMA laserdb;
OK
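The warehouse-path rule just described can be captured in a few lines. This is an illustrative sketch only: it assumes the default warehouse root /user/hive/warehouse, plus the convention (visible in the HDFS paths later in these notes) that tables in the default database sit directly under that root.

```python
def warehouse_path(table=None, db="default"):
    """Return the HDFS directory Hive uses for a database or a table,
    assuming the default warehouse root /user/hive/warehouse."""
    root = "/user/hive/warehouse"
    base = root if db == "default" else f"{root}/{db}.db"
    return base if table is None else f"{base}/{table}"

print(warehouse_path(db="laserdb"))   # /user/hive/warehouse/laserdb.db
print(warehouse_path("t_hive"))       # /user/hive/warehouse/t_hive
```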

hive> SHOW DATABASES;
OK
default
userdb
hive> use default;
OK
Time taken: 0.034 seconds
hive> show tables;
OK
user_movie
user_movie1
user_movie2
user_movie3
Time taken: 0.028 seconds, Fetched: 4 row(s)

1. Basic Hive operations

(1) Creating a database:
CREATE SCHEMA <database name>;

(2) Creating a table:
CREATE TABLE <table name> (<column name> <type>);
For example: CREATE TABLE tuoguan_tbl (flied string);
A table's contents are really just a file in HDFS; the file has to be parsed into the table's row format.

(3) Creating an ordinary table whose fields are separated by commas on each line:
create table web_log(id int, name string, address string) row format delimited fields terminated by ',';

(4) Listing tables:
show tables;

(5) Querying a table's rows (a plain SELECT * needs no conversion to MapReduce):
select * from tuoguan_tbl;

(6) Showing a table's structure:
desc <table name>;

(7) Example: the Linux filesystem directory /home/oracle holds a file t_hive.txt whose fields are tab-separated.

# Look at the data file's contents (fields separated by tabs)
vi t_hive.txt
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34

# Create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds

# Load t_hive.txt into the t_hive table
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.397 seconds

# List tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds

# Query the table's data
hive> select * from t_hive;
OK
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34
Time taken: 0.264 seconds

# Show the table structure
hive> desc t_hive;
OK
a	int
b	int
c	int
Time taken: 0.1 seconds

# Alter the table: add a column
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a	int
b	int
c	int
new_col	string
Time taken: 0.086 seconds

# Rename the table
hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds

# Drop the table
hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds

# List tables
hive> show tables;
OK
Time taken: 0.064 seconds

(8) If you do not want the file in HDFS to be moved, create an external table instead:
create external table web_log2 (id int, name string, address string) location '/user/weblog/';

2. Loading the JSON-format web log file user_movie.json into a Hive table

Method 1: use a third-party jar.
(1) Take the third-party jar json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar (provided by the instructor) and copy it into HIVE_HOME/lib.
(2) Create the table user_movie:
create table user_movie(custid string, sno string, genreid string, movieid string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE;
(3) Load the file user_movie.json into the table user_movie. First upload the JSON file to the Linux directory /home/oracle, then run the following at the Linux prompt (this puts the local file straight into the table's HDFS directory):

hadoop fs -put /home/oracle/user_movie.json /user/hive/warehouse/user_movie

Or run a LOAD DATA command inside hive (with LOCAL the file is copied from the local filesystem; without LOCAL the path names a file already in HDFS, which is moved into the table's directory):

hive> load data local inpath '/home/oracle/user_movie.json' into table user_movie3;
Loading data to table default.user_movie3
Table default.user_movie3 stats: [numFiles=1, totalSize=821779]
OK
Time taken: 0.245 seconds

Method 2: use the jar that ships with Hive, hive-hcatalog-core-1.2.1.jar. Copy HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar into HIVE_HOME/lib, then create the table in hive as follows:
create table user_movie2(custid string, sno string, genreid string, movieid string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE;
Everything else is the same.
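Line-by-line JSON decoding is essentially what either SerDe does: each line of the file is one JSON object whose keys are matched to column names. A sketch follows; the sample record is made up, and only the column names come from the table definitions above:

```python
import json

# One hypothetical line of user_movie.json: a single JSON object whose
# keys correspond to the columns custid, sno, genreid, movieid.
line = '{"custid": "C001", "sno": "1", "genreid": "G05", "movieid": "M1093"}'

columns = ("custid", "sno", "genreid", "movieid")
record = json.loads(line)
row = tuple(record[c] for c in columns)   # the row a SerDe would hand to Hive
print(row)                                # ('C001', '1', 'G05', 'M1093')
```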

3. Importing data

Again using t_hive as the example.

# Create the table structure
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# Load data from the local filesystem (LOCAL)
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.612 seconds

# Find the newly imported data in HDFS
hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34

# Load data from HDFS: first create the table t_hive2
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# Load from HDFS
hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;

Loading data to table default.t_hive2
Deleted hdfs://:9000/user/hive/warehouse/t_hive2
OK
Time taken: 0.325 seconds

# Check the data
hive> select * from t_hive2;
OK
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34
Time taken: 0.287 seconds

# Import data from another table
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = :50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:32:41,979 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:32:48,034 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:49,050 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:50,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:51,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:52,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:53,102 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:54,112 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.03 sec   HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds
hive> select * from t_hive2;
OK
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34
Time taken: 0.134 seconds
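As the "Deleted hdfs://..." line in the log shows, INSERT OVERWRITE first removes the target table's existing contents and then writes the query result; INSERT INTO would append instead. A toy model of the two semantics, using plain lists:

```python
# Source rows and pre-existing rows in the target table.
t_hive = [(16, 2, 3), (61, 12, 13)]
t_hive2 = [(99, 99, 99)]

# INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive;
t_hive2 = list(t_hive)          # previous contents are discarded first
overwritten = list(t_hive2)

# INSERT INTO TABLE t_hive2 SELECT * FROM t_hive;  (append, for contrast)
t_hive2.extend(t_hive)

print(overwritten)    # [(16, 2, 3), (61, 12, 13)]
print(len(t_hive2))   # 4
```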

# Create a table and fill it from another table
# First drop the old table
hive> DROP TABLE t_hive;

# Create the table from a query
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = :50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:36:48,612 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:36:54,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:55,657 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:56,666 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:57,673 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:58,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:59,691 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.13 sec
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.13 sec   HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds
hive> select * from t_hive;
OK
16	2	3
61	12	13
41	2	31
17	21	3
71	2	31
1	12	34
11	2	34
Time taken: 0.109 seconds
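CREATE TABLE ... AS SELECT derives the new table's schema from the query and fills it with the result rows in one step, while CREATE TABLE ... LIKE (used just below) copies only the schema. A toy model of the contrast, not Hive itself:

```python
# Source table: its schema and its rows.
t_hive2_schema = ("a", "b", "c")
t_hive2 = [(16, 2, 3), (61, 12, 13), (41, 2, 31)]

# CREATE TABLE t_hive AS SELECT * FROM t_hive2;  -- schema and data
t_hive_schema, t_hive = t_hive2_schema, list(t_hive2)

# CREATE TABLE t_hive3 LIKE t_hive;              -- schema only, no rows
t_hive3_schema, t_hive3 = t_hive_schema, []

print(len(t_hive), len(t_hive3))   # 3 0
```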

# Copy only the table structure, without importing data
hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds

4. Exporting data

# Copy within HDFS to another HDFS location
hadoop fs -cp /user/hive/warehouse/t_hive /
hadoop fs -ls /t_hive
Found 1 items
-rw-r--r--   1 cos supergroup         56 2013-07-16 10:41 /t_hive/000000_0
hadoop fs -cat /t_hive/000000_0
1623
611213
41231
17213
71231
11234
11234

# Export through Hive to the local filesystem
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = :50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:46:24,774 Stage-1 map = 0%, reduce = 0%
2013-07-16 10:46:30,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:31,833 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:32,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:33,856 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:34,865 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:35,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:36,884 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds
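The file Hive has just written under /tmp/t_hive uses Hive's default text-format field separator, the control character \x01 (Ctrl-A), which is why cat shows the digits of each row run together. A sketch of that serialization over the seven rows of t_hive:

```python
# Hive's default text-format field separator is the control character \x01
# (Ctrl-A); rows end with a newline. Serializing the seven rows of t_hive
# the way Hive writes the file 000000_0:
rows = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
        (71, 2, 31), (1, 12, 34), (11, 2, 34)]
payload = "".join("\x01".join(str(v) for v in r) + "\n" for r in rows)

print(len(payload))        # 56 bytes, matching "HDFS Write: 56" in the job log
print(repr(payload[:7]))   # '16\x012\x013\n': cat renders it as "1623"
```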

# Check the file on the local OS (the "!" prefix runs a shell command inside hive)
hive> ! cat /tmp/t_hive/000000_0;
1623
611213
41231
17213
71231
11234
11234

5. Hive queries: HiveQL

Note: the map/reduce log output is omitted from the examples below.

An ordinary query: ordering, column aliases, and a nested subquery:
hive> FROM (
    >   SELECT b, c AS c2 FROM t_hive
    > ) t
    > SELECT t.b, t.c2
    > WHERE b > 2
    > LIMIT 2;
12	13
21	3

A join query: JOIN
hive> SELECT t1.a, t1.b, t2.a, t2.b
    > FROM t_hive t1 JOIN t_hive2 t2 ON t1.a = t2.a
    > WHERE t1.c > 10;
1	12	1	12
11	2	11	2
41	2	41	2
61	12	61	12
71	2	71	2

Aggregate query 1: count, avg
hive> SELECT count(*), avg(a) FROM t_hive;
7	31.142857142857142

Aggregate query 2: count, distinct
hive> SELECT count(DISTINCT b) FROM t_hive;
3

Aggregate query 3: GROUP BY, HAVING
# GROUP BY
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c;
16.0	2	3
56.0	2	62
11.0	2	34
61.0	12	13
1.0	12	34
17.0	21	3

# HAVING
hive> SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c HAVING sum(c) > 30;
56.0	2	62
11.0	2	34
1.0	12	34

6. Hive views
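The GROUP BY and HAVING results above can be cross-checked by replaying the aggregation over the seven rows of t_hive:

```python
from collections import defaultdict

# The seven rows of t_hive: columns (a, b, c).
t_hive = [(16, 2, 3), (61, 12, 13), (41, 2, 31), (17, 21, 3),
          (71, 2, 31), (1, 12, 34), (11, 2, 34)]

# SELECT avg(a), b, sum(c) FROM t_hive GROUP BY b, c;
groups = defaultdict(list)
for a, b, c in t_hive:
    groups[(b, c)].append(a)
grouped = {(b, c): (sum(avals) / len(avals), b, c * len(avals))
           for (b, c), avals in groups.items()}
print(grouped[(2, 31)])   # (56.0, 2, 62): the a values 41 and 71 share this group

# ... HAVING sum(c) > 30;
having = sorted(row for row in grouped.values() if row[2] > 30)
print(having)             # [(1.0, 12, 34), (11.0, 2, 34), (56.0, 2, 62)]
```

The three groups surviving the HAVING filter are exactly the three rows the hive transcript shows.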

A Hive view is the same concept as a view in a relational database. Using t_hive as the example again:

hive> CREATE VIEW v_hive AS SELECT a, b FROM t_hive WHERE c > 30;
hive> select * from v_hive;
41	2
71	2
1	12
11	2

Dropping a view:
hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds

7. Hive partitioned tables

Partitioned tables are a basic database concept, but much of the time the data volume is small and partitions are simply not needed. Hive is OLAP data-warehouse software and the data volumes involved are very large, so partitioned tables matter a great deal in this setting. Below we define a new table structure, t_hft.

Create the data:
vi /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45
vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

Create the table:
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft (
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
