teradata數(shù)據(jù)倉庫編程實用教程(Teradata)解析課件_第1頁
teradata數(shù)據(jù)倉庫編程實用教程(Teradata)解析課件_第2頁
teradata數(shù)據(jù)倉庫編程實用教程(Teradata)解析課件_第3頁
teradata數(shù)據(jù)倉庫編程實用教程(Teradata)解析課件_第4頁
teradata數(shù)據(jù)倉庫編程實用教程(Teradata)解析課件_第5頁
已閱讀5頁,還剩76頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

1、實用教程(Teradata)陸世潮2008年9月第1頁,共81頁。問題總結(jié)常見問題分類:表屬性不對: Set / Multiset問題:INSERT操作慢主索引(PI)設(shè)置不合理問題1:數(shù)據(jù)傾斜度大,空間爆滿。問題2:JOIN操作,數(shù)據(jù)需要重分布。 分區(qū)索引(PPI)設(shè)置不合理問題:全表掃描連接條件過于復(fù)雜問題:系統(tǒng)無法優(yōu)化執(zhí)行計劃缺乏統(tǒng)計信息問題:系統(tǒng)無法找到最優(yōu)化的執(zhí)行計劃SQL跑得慢哈!第2頁,共81頁。提綱Teradata架構(gòu)常見問題,及解決方法Teradata工具實用小技巧JOIN的實現(xiàn)機(jī)制JOIN的優(yōu)化第3頁,共81頁。Teradata 體系架構(gòu)Teradata and MPP S

2、ystemsRDBMS ARCH第4頁,共81頁。Logical Example of NPPI versus PPI4 AMPs with Orders Table defined with PPI on O_Date.RHO_#O_DateRHO_#O_DateRHO_#O_DateRHO_#O_Date35100706/0126100206/0124100406/0120100506/0139101106/0136101206/0132100306/0143101006/0114100106/0106100906/0104100806/0108100606/0103101606/0217

3、101306/0248102306/0207101706/0216102106/0245101506/0209101806/0227101406/0244102206/0202102406/0211101906/0222102006/0201102806/0312103106/0328103206/0310103406/0329103306/0334102906/0319102506/0340103506/0347102706/0325103606/0331102606/0346103006/0323104006/0430103806/0442104706/0413103706/0421104

4、506/0436104306/0405104806/0415104206/0433103906/0418104106/0438104606/0441104406/044 AMPs with Orders Table defined with NPPI.01102806/0312103106/0328103206/0310103406/0329103306/0334102906/0319102506/0340103506/0347102706/0325103606/0331102606/0346103006/0303101606/0217101306/0248102306/0207101706/

5、0216102106/0245101506/0209101806/0227101406/0244102206/0202102406/0211101906/0222102006/0214100106/0135100706/0139101106/0106100906/0126100206/0136101206/0104100806/0124100406/0132100306/0108100606/0120100506/0143101006/0123104006/0430103806/0442104706/0413103706/0421104506/0436104306/0405104806/041

6、5104206/0433103906/0418104106/0438104606/0441104406/04RHO_#O_DateRHO_#O_DateRHO_#O_DateRHO_#O_Date第5頁,共81頁。提綱Teradata架構(gòu)常見問題,及解決方法Teradata工具實用小技巧JOIN的實現(xiàn)機(jī)制JOIN的優(yōu)化第6頁,共81頁。表屬性:Set & MultiSetSet Table不允許記錄重復(fù)MultiSet Table允許記錄重復(fù)默認(rèn)值:Set TableCreate Table. AS . 生成的目標(biāo)表屬性默認(rèn)為Set Table對SET Table進(jìn)行INSERT操作,需要檢查

7、是否存在重復(fù)記錄相當(dāng)?shù)暮馁Y源若真要限定唯一性,可以通過UPI或USI實現(xiàn)CREATE SET TABLE pmart.RPT_NM_GRP_PRE_WARN_MON,(CAL_Month INTEGER TITLE 統(tǒng)計月份 ,ORG_NUM CHAR(12) TITLE 集團(tuán)編號 ,City_ID CHAR(3) TITLE 地市標(biāo)識 ,ORG_SUBS_GRP_NUM CHAR(10) TITLE 集團(tuán)用戶群編號 ,ORG_Title VARCHAR(200) TITLE 集團(tuán)名稱 ,ORG_Level CHAR(2) TITLE 集團(tuán)級別 ,STAT_Item_Code CHAR(2)

8、TITLE 統(tǒng)計項 ,STAT_Value DECIMAL(18,2) TITLE 統(tǒng)計值 ) PRIMARY INDEX (ORG_NUM);例子:pmart.RPT_NM_GRP_PRE_WARN_MON 內(nèi)蒙移動集團(tuán)客戶預(yù)警指標(biāo)月報表假設(shè)原有1286449條記錄插入:152853條記錄耗時:15秒第7頁,共81頁。表屬性:Set & MultiSet (cont.)CREATE MULTISET TABLE pmart.RPT_NM_GRP_PRE_WARN_MON(CAL_Month INTEGER TITLE 統(tǒng)計月份 ,ORG_NUM CHAR(12) TITLE 集團(tuán)編號 ,Ci

9、ty_ID CHAR(3) TITLE 地市標(biāo)識 ,ORG_SUBS_GRP_NUM CHAR(10) TITLE 集團(tuán)用戶群編號 ,ORG_Title VARCHAR(200) TITLE 集團(tuán)名稱 ,ORG_Level CHAR(2) TITLE 集團(tuán)級別 ,STAT_Item_Code CHAR(2) TITLE 統(tǒng)計項 ,STAT_Value DECIMAL(18,2) TITLE 統(tǒng)計值 ) PRIMARY INDEX (ORG_NUM);例子:pmart.RPT_NM_GRP_PRE_WARN_MON 內(nèi)蒙移動集團(tuán)客戶預(yù)警指標(biāo)月報表建議:Teradata中都用 MultiSet假設(shè)

10、原有1286449條記錄插入:152853條記錄耗時:1秒例子: CREATE MULTISET TABLE tttemp.VT_SUBS_VIOC_QUAN as ( SELECT * FROM tttemp.MID_SUBS_VIOC_QUAN WHERE CAL_MONTH = 200802 AND * )WITH DATA PRIMARY INDEX ( subs_id);臨時表, 默認(rèn)為: Set需要指定為: Multiset字段越多,記錄越多差別越明顯第8頁,共81頁。PI(Primary Index 主索引)的選擇PI影響數(shù)據(jù)的存儲與訪問,其選擇標(biāo)準(zhǔn):不同值盡量多的字段(More

11、 Unique Values)使用頻繁的字段:包括值訪問和連接訪問少更新PI字段不宜太多最好是手動指定PI 例子:用戶語音業(yè)務(wù)量中間表CREATE MULTISET TABLE tttemp.MID_SUBS_VIOC_QUAN ( CAL_Month INTEGER TITLE 統(tǒng)計月份, City_ID CHAR(4) TITLE 地市標(biāo)識, Channel_ID CHAR(8) TITLE 渠道標(biāo)識, Subs_id CHAR(12) TITLE 用戶標(biāo)識, 。) PRIMARY INDEX ( subs_id);例子:用戶語音業(yè)務(wù)量臨時表CREATE MULTISET TABLE tt

12、temp.VT_SUBS_VIOC_QUAN as ( SELECT * FROM tttemp.MID_SUBS_VIOC_QUAN WHERE CAL_MONTH = 200802 AND * )WITH DATA PRIMARY INDEX ( subs_id);Subs_ID: 頻繁使用 Unique Value多如果不指定PI,系統(tǒng)默認(rèn)為:Cal_Month第9頁,共81頁。PI(Primary Index 主索引)的選擇(cont.)例子:夢網(wǎng)客戶活躍客戶分析CREATE MULTISET TABLE PMART.FCT_DATA_MONNET_ACTIVE_MON ( CAL_M

13、onth INTEGER TITLE 統(tǒng)計月份, City_ID CHAR(4) TITLE 地市標(biāo)識, Channel_ID CHAR(8) TITLE 渠道標(biāo)識, Mont_SVC_Type_Cod CHAR(3) TITLE 夢網(wǎng)業(yè)務(wù)類型編碼, Mont_SVC_CAT_MicroCls_Cod CHAR(3) TITLE 夢網(wǎng)業(yè)務(wù)分類小類編碼, Mont_SVC_CHRG_Type_Cod CHAR(2) TITLE 夢網(wǎng)業(yè)務(wù)計費類型編碼, THR_Brand_Cod CHAR(1) TITLE 三大品牌編碼, Mont_Consume_Level_Cod CHAR(2) TITLE

14、夢網(wǎng)消費層次編碼, Consume_Level_Cod CHAR(2) TITLE 消費層次編碼, 。)PRIMARY INDEX ( CAL_Month ,City_ID ,Channel_ID ,Mont_SVC_Type_Cod ,Mont_SVC_CAT_MicroCls_Cod ,Mont_SVC_CHRG_Type_Cod,THR_Brand_Cod ,Mont_Consume_Level_Cod ,Consume_Level_Cod );PI:9字段 2字段: City_ID ,Channel_ID 調(diào)整PI后,在右邊的SQL中,PI是否起作用?以下SQL,PI是否起作用?:1.

15、值訪問Select *From FCT_DATA_MONNET_ACTIVE_MONWhere City_ID = 070010 and Channel_ID= 0100 and cal_month = 2007072.連接訪問Select *From FCT_DATA_MONNET_ACTIVE_MON ALEFT JOIN MID_CHANNEL_INFO_DAILY B ON A. Channel_ID = B. Channel_ID and A. City_ID = b. City_ID LEFT JOIN VW_CDE_REGION_TYPE C ON A. City_ID = C.

16、 City_ID 3、值訪問連接訪問Select *From FCT_DATA_MONNET_ACTIVE_MON A, VT_INFO BWHERE A. Channel_ID = B. Channel_ID AND A. City_ID = B. City_ID AND A.CAL_MONTH = 200707 AND A. Consume_Level_Cod=B. Consume_Level_Cod第10頁,共81頁。PPI的使用PPI(Partition Primary Index,分區(qū)索引),把具有相同分區(qū)值的數(shù)據(jù)聚簇存放在一起;類似于SQL Server的聚簇索引(Cluster

17、Index),Oracle的聚簇表(Cluster Table)。利用PPI,可以快速插入/訪問同一個Partition(分區(qū))的數(shù)據(jù)。CREATE MULTISET TABLE qdata.TB_DQC_KPI_CHECK_RESULT ( TX_DATE DATE FORMAT YYYYMMDD TITLE 數(shù)據(jù)日期 NOT NULL, KPI_CODE INTEGER TITLE 指標(biāo)代碼 NOT NULL, 。 )PRIMARY INDEX ( KPI_CODE )PARTITION BY RANGE_N(TX_DATE BETWEEN CAST(20030101) AS DATE F

18、ORMAT YYYYMMDD) AND CAST(20191231) AS DATE FORMAT YYYYMMDD) EACH INTERVAL 1 DAY , NO RANGE OR UNKNOWN);Select *From TB_DQC_KPI_CHECK_RESULTWhere tx_date = 20070701;或Where tx_date between 20070701 and 20070731;或Where tx_date 20070701;但Where tx_date like 200707%;不起作用第11頁,共81頁。PPI的使用(cont.)Partition上不要

19、使用表達(dá)式,否則Partition不能被正確使用。T1. tx_date/100=CAST(20070917AS DATE FORMAT YYYYMMDD)/100Substring(T1. tx_date from 1 for 6) =200709應(yīng)該修改為 T1. tx_date=CAST(20070901 AS DATE FORMAT YYYYMMDD)第12頁,共81頁。PPI的使用(cont.) 腳本:tb_030040270.pl/* 刪除當(dāng)月 */ 2小時 del BASS1.tb_03004 where proc_dt = 200709;insert into BASS1.tb

20、_03004 7小時。 sel . from pview.vw_evt_cust_so cust where acpt_date=cast(200710|01 as date) cast(200710|01 as date)寫法錯誤,PPI不起作用日期的正確寫法:Cast(20071001 as date format YYYYMMDD)在proc_dt建立PPIPPI字段從Load_Date調(diào)整為acpt_date第13頁,共81頁。創(chuàng)建可變臨時表它僅存活于同一個Session之內(nèi)注意指定可變臨時表為multiset(通常也要指定PI)可變臨時表不能帶有PPI例子1:create volat

21、ile multiset table vt_RETAIN_ANLY_MON as ( select col1,col2, from where group by . )with data PRIMARY INDEX (PI_Cols) ON COMMIT PRESERVE ROWS;例子2: create volatile multiset table vt_RETAIN_ANLY_MON ( col1 char(2), col2 varchar(12) NOT NULL)PRIMARY INDEX (PI_Cols) ON COMMIT PRESERVE ROWS;第14頁,共81頁。創(chuàng)建可

22、變臨時表(cont.) 例子3:create volatile multiset table vt_RETAIN_ANLY_MON as ( select col1, cast(adc as varchar(12) col2 from where )with no data PRIMARY INDEX (col1) ON COMMIT PRESERVE ROWS; 例子4:create volatile multiset table vt_net_gsm_nl as pdata.tb_net_gsm_nl with no data ON COMMIT PRESERVE ROWS;字段col2將

23、用unicode字符集;當(dāng)跟普通字段(latin字符集)join時,需要進(jìn)行數(shù)據(jù)重新分布。不建議失?。阂驗閜data.tb_net_gsm_nl 有PPI而可變臨時表不允許有PPI第15頁,共81頁。固化臨時表固化臨時表,就是把查詢結(jié)果存放到一張物理表。共下次分析或他人使用Session斷開之后,仍然可以使用。示例1: CREATE MULTISET TABLE tttemp.TMP_BOSS_VOIC as ( select * from pview.vw_net_gsm_nl) WITH no DATA PRIMARY INDEX (subs_id);INSERT INTO tttemp.

24、TMP_BOSS_VOICSELECT * FROM pview.vw_net_gsm_nl WHERE *;示例2:CREATE MULTISET TABLE tttemp.TMP_BOSS_VOIC as ( select * from pview.vw_net_gsm_nl WHERE *) WITH DATA PRIMARY INDEX (subs_id);示例3:(復(fù)制表,數(shù)據(jù)備份)CREATE MULTISET TABLE tttemp.TMP_BOSS_VOIC AS pdata.tb_net_gsm_nl WITH DATA ;第16頁,共81頁。數(shù)據(jù)類型注意非日期字段與日期字

25、段char & date的轉(zhuǎn)換與關(guān)聯(lián):如果數(shù)據(jù)類型一致可以直接使用;在CASE WHEN or COALESCE一定要使用顯式的類型轉(zhuǎn)換(CAST)CASE WHEN A = B THEN DATE1 ELSE 20061031 END應(yīng)寫成CASE WHEN A = B THEN DATE1 ELSE CAST(20061031 AS DATE) END數(shù)值運算時,確保運算過程中不丟失計算精度。CAST(100/3 AS DEC(5,2)應(yīng)該寫成CAST(100/3.00 AS DEC(5,2)第17頁,共81頁。字符(串)與數(shù)字相比較比較規(guī)則:1) 比較兩個值(字段),它們的類型必須一樣!

26、2) 當(dāng)字符(串)與數(shù)字相比較時,先把字符(串)轉(zhuǎn)換成數(shù)字,再進(jìn)行比較。3) 經(jīng)分系統(tǒng)中容易出錯的,有Cal_Month字段Case 1Table 1CREATE TABLE Emp1 (Emp_noCHAR(6), Emp_nameCHAR(20)PRIMARY INDEX (Emp_no);Statement 1SELECT *FROMEmp1WHEREEmp_no = 1234;Statement 2SELECT *FROMEmp1WHEREEmp_no = 1234;Table 1CREATE TABLE Emp2 (Emp_noINTEGER, Emp_nameCHAR(20)PRI

27、MARY INDEX (Emp_no);Statement 1SELECT *FROMEmp2WHEREEmp_no = 1234;Statement 2SELECT *FROMEmp2WHEREEmp_no = 1234;Case 2Results in Full Table ScanResults in unnecessary conversion 第18頁,共81頁。目標(biāo)列的選擇減少目標(biāo)列,可以少消耗SPOOL空間,從而提高SQL的效率當(dāng)系統(tǒng)任務(wù)繁忙,系統(tǒng)內(nèi)存少的時候,效果尤為明顯。舉例:GSM語言話單表,PDATA.TB_NET_GSM_NL共有73字段,以下SQL供返回1.6億條記錄

28、左邊的SQL,記錄最長為:698字節(jié),平均399字節(jié)右邊的SQL,記錄最長為:59字節(jié), 平均30字節(jié)兩者相差400多GB的SPOOL空間,IO次數(shù)也隨著相差甚大!SPOOL空間估計:497 GBSPOOL空間估計:42 GBSELECT SUBS_ID ,MSISDN ,Begin_Date ,Begin_Time ,Call_DUR ,CHRG_DURFROM PDATA.TB_NET_GSM_NLWHERE PROC_DATE BETWEEN 20070701 AND 20070731 SELECT * FROM PDATA.TB_NET_GSM_NLWHERE PROC_DATE BE

29、TWEEN 20070701 AND 20070731第19頁,共81頁。Where條件的限定根據(jù)Where條件先進(jìn)行過濾數(shù)據(jù)集,再進(jìn)行連接(JOIN)等操作這樣,可以減少參與連接操作的數(shù)據(jù)集大小,從而提高效率好的查詢引擎,可以自動優(yōu)化;但有些復(fù)雜SQL,查詢引擎優(yōu)化得并不好。注意:系統(tǒng)的SQL優(yōu)化,只是避免最差的,選擇相對優(yōu)的,未必能夠得到最好的優(yōu)化結(jié)果。SELECT A.TX_DATE, A.KPI_CODE ,B.SRC_NAME,A.KPI_VALUEFROM ( select * from qdata.tb_dqc_kpi_check_result where TX_DATE = 2

30、0070701 AND KPI_CODE = 65 ) ALEFT JOIN ( SELECT * FROM qdata.tb_dqc_kpi_def where KPI_CODE = 65 and N_TYPE = M) BON A.KPI_CODE = B.KPI_CODE SELECT A.TX_DATE, A.KPI_CODE ,coalesce(B.SRC_NAME, no name) ,A.KPI_VALUEFROM qdata.tb_dqc_kpi_check_result ALEFT JOIN qdata.tb_dqc_kpi_def BON A.KPI_CODE = B.KP

31、I_CODE WHERE A. TX_DATE = 20070701 AND A.KPI_CODE = 65 AND B.N_TYPE = M rewrite第20頁,共81頁。用Case When替代UNION sel city_id,channel_id,cust_brand_id,sum(stat_values) as stat_valuesfrom ( select t.city_id 語音雜志計費量 ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum(case when SMS_SVC_Ty

32、pe_Level_SECND = 017 and Call_Type_Code in (00,10,01,11) then sms_quan else 0 END) as stat_values from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs_id=v.subs_id left join PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914 group by 1,2,3 union all sele

33、ct t.city_id 夢網(wǎng)短信計費量 ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum(sms_quan) as stat_values from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs_id=v.subs_id left join PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914 and SMS_

34、SVC_Type_Level_SECND like 02% and SMS_SVC_Type_Level_SECND not in (021,022) group by 1,2,3 )tmpGroup by 1,2,3兩個子查詢的表連接部分完全一樣兩個子查詢除了取數(shù)據(jù)條件,其它都一樣。Union all是多余的,它需要重復(fù)掃描數(shù)據(jù),進(jìn)行重復(fù)的JOIN可以用Case when替代union 作業(yè):KPI_NWR_SMS_BILL_QUAN描述:點對點短信計費量腳本: kpi_nwr_sms_bill_quan0600.pl第21頁,共81頁。用Case When替代UNION (cont.)se

35、l city_id,channel_id,cust_brand_id,sum(stat_values) as stat_valuesfrom ( select t.city_id ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum(CASE WHEN SMS_SVC_Type_Level_SECND = 017 and Call_Type_Code in (00,10,01,11) THEN sms_quan 語音雜志計費量 WHEN SMS_SVC_Type_Level_SECND like 02%

36、 and SMS_SVC_Type_Level_SECND not in (021,022) THEN sms_quan 夢網(wǎng)短信計費量 ELSE 0 END ) as stat_values from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs_id=v.subs_id left join PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914)tmpGroup by 1,2,3SQL優(yōu)化重寫第22頁,共

37、81頁。用OR替代UNION Select city_id , channel_id, cust_brand_id, sum(sms_quan ) stat_valuesfrom( select t.city_id 語音雜志計費量 ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum(sms_quan ) stat_values from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs_id=v.subs_id left j

38、oin PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914 and SMS_SVC_Type_Level_SECND = 017 and Call_Type_Code in (00,10,01,11) group by 1,2,3 union all select t.city_id 夢網(wǎng)短信計費量 ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum(sms_quan) as stat_val

39、ues from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs_id=v.subs_id left join PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914 and SMS_SVC_Type_Level_SECND like 02% and SMS_SVC_Type_Level_SECND not in (021,022) group by 1,2,3 )T Group by 1,2,3兩個子查詢的表

40、連接部分完全一樣兩個子查詢除了取數(shù)據(jù)條件,其它都一樣。Union all是多余的,它需要重復(fù)掃描數(shù)據(jù),進(jìn)行重復(fù)的JOIN可以用OR替代union 此類的問題,在腳本中經(jīng)常見到。第23頁,共81頁。用OR替代UNION (cont.)select t.city_id ,coalesce(v.channel_id,b.channel_id,-) as channel_id ,cust_brand_id ,sum( sms_quan) as stat_values from PVIEW.vw_mid_sms_svc_quan_daily t left join VT_SUBS v on t.subs

41、_id=v.subs_id left join PVIEW.vw_FCT_CDE_BUSN_CITY_TYPE b on t.city_id=b.City_ID where cal_date=20070914 and ( SMS_SVC_Type_Level_SECND = 017 語音雜志計費量 and Call_Type_Code in (00,10,01,11) ) OR (SMS_SVC_Type_Level_SECND like 02% 夢網(wǎng)短信計費量 and SMS_SVC_Type_Level_SECND not in (021,022) ) )Group by 1,2,3SQL

42、優(yōu)化重寫第24頁,共81頁。去掉多余的Distinct與Group by sel t.operator ,t.acpt_channel_id ,t.acpt_city_id ,t.subs_id ,t.acpt_date as evt_date From ( sel operator, ACPT_Channel_ID, acpt_city_id,subs_id, acpt_date from pview.vw_evt_cust_so cust where acpt_date =20071007 and so_meth_code in(0,1,2) and PROC_STS_Code =-1 g

43、roup by 1,2,3,4,5union all sel operator_num as operator, ACPT_Channel_ID, acpt_city_id, subs.subs_id, charge_date as acpt_date from pview.vw_fin_busi_rec bus join crmmart.subs_day_info_daily subs on subs.msisdn=bus.msisdn where charge_date =20071007 group by 1,2,3,4,5 )t group by 1,2,3,4,5;既然t查詢外層有g(shù)

44、roup by操作去重,那么子查詢內(nèi)的Group by去重是多余的。而且,兩個子查詢group by后再用union all,就可能再產(chǎn)生重復(fù)記錄,那么group by也失去意義了。解決方法: 把t查詢內(nèi)部的兩個group by去掉即可 類似的Distinct問題,可效仿解決。去重去重去重第25頁,共81頁。Group by vs. DistinctDistinct是去除重復(fù)的操作Group by是聚集操作某些情況下,兩者可以起到相同的作用。兩者的執(zhí)行計劃不一樣,效率也不一樣建議:使用Group byselect subs_id ,acct_idfrom PVIEW. VW_FIN_ACCT_

45、SUBS_HISwhere efct_date 20070701 group by 1,2select DISTINCT subs_id ,acct_idfrom PVIEW. VW_FIN_ACCT_SUBS_HISwhere efct_date 20070701 第26頁,共81頁。Union vs. Union all Union與Union all的作用是將多個SQL的結(jié)果進(jìn)行合并。Union將自動剔除集合操作中的重復(fù)記錄;需要耗更多資源。Union all則保留重復(fù)記錄,一般建議使用Union all。第一個SELECT語句,決定輸出的字段名稱,標(biāo)題,格式等要求所有的SELECT語句

46、: 1) 必須要有同樣多的表達(dá)式數(shù)目; 2) 相關(guān)表達(dá)式的域必須兼容select *from (select a) T1(col1)unionselect *from (select bc)T2(col2)select *from (select bc)T3(col3)union allselect *from (select a) T1(col1)union allselect *from (select bc)T2(col2)col3abcbccol1ab第27頁,共81頁。先Group by再join腳本:rpt_mart_new_comm_mon0400.pl 11小時Select c

47、ase when b.CUST_Brand_ID is null then 5020 when b.CUST_Brand_ID in(2000,5010) then 5020 else b.CUST_Brand_ID end ,sum(COALESCE(b.Bas_CHRG_DUR_Unit,0) as Thsy_Accum_New_SUBS_CHRG_DUR , sum(case when b.call_type_code =20 then b.Bas_CHRG_DUR_Unit else 0 END) from VTNEW_SUBS_THISYEAR t inner join VTDUR_

48、MON b on t.Subs_ID=b.Subs_ID left join PVIEW.vw_MID_CDE_LONG_CALL_TYPE_LVL c on b.Long_Type_Level_SECND= c.Long_Type_Level_SECND left join PVIEW.vw_MID_CDE_ROAM_TYPE_LVL d on b.Roam_Type_Level_SECND= d.Roam_Type_Level_SECND group by 1;記錄數(shù)情況:t: 580萬,b: 9400萬, c:8, d:8 主要問題:假如連接順序為: ( (b join c) join

49、d) join t)則是( (9400萬 join 8) join 8) join 580萬)數(shù)據(jù)分布時間長(IO多),連接次數(shù)多解決方法: 先執(zhí)行(t join b),然后groupby,再join c,d第28頁,共81頁。先Group by再join (cont.)腳本:rpt_mart_new_comm_mon0400.pl40秒Select case when b.CUST_Brand_ID is null then 5020 when b.CUST_Brand_ID in(2000,5010) then 5020 else b.CUST_Brand_ID end ,sum(COAL

50、ESCE(b.Bas_CHRG_DUR_Unit,0) as Thsy_Accum_New_SUBS_CHRG_DUR , sum(case when b.call_type_code =20 then b.Bas_CHRG_DUR_Unit else 0 END) from (select CUST_Brand_ID, call_type_code, Long_Type_Level_SECND, Roam_Type_Level_SECND, sum(Bas_CHRG_DUR_Unit) Bas_CHRG_DUR_Unit, count(*) quan from VTDUR_MON where

51、 subs_id in (select subs_id from VTNEW_SUBS_THISYEAR) group by 1,2,3,4 )b left join PVIEW.vw_MID_CDE_LONG_CALL_TYPE_LVL c on b.Long_Type_Level_SECND= c.Long_Type_Level_SECND left join PVIEW.vw_MID_CDE_ROAM_TYPE_LVL d on b.Roam_Type_Level_SECND= d.Roam_Type_Level_SECND group by 1;記錄數(shù)情況:t: 580萬,b: 940

52、0萬, c:8, d:8 處理過程: 先執(zhí)行(t join b),然后groupby,再join c,d結(jié)果:1、 VTDUR_MON join VTNEW_SUBS_THISYEAR PI相同,merge join,只需10秒2、經(jīng)過group by,b表只有332記錄3、b join c join d, 就是: 332 8 84、最終結(jié)果:5記錄,共40秒第29頁,共81頁。先Group by再join(cont.)先匯總再連接,可以減少參與連接的數(shù)據(jù)集大小,減少比較次數(shù),從而提高效率。以下面SQL為例,假設(shè)歷史表( History )有1億條記錄左邊的SQL,需要進(jìn)行 1億 90次比較右

53、邊的SQL,則只需要 1億 1 次比較SELECT H.product_id ,sum(H.account_num)FROM History H , Calendar DTWHERE H.sale_date = DT.calendar_date AND DT.quarter = 3 GROUP BY 1;SELECT H.product_id, SUM(H.account_num)FROM History H , (SELECT min(calendar_date) min_date ,max(calendar_date) max_date FROM Calendar WHERE quarte

54、r = 3 ) DT WHERE H.sale_date BETWEEN DT.min_date and DT.max_dateGROUP BY 1;第30頁,共81頁。提取公共SQL形成臨時表腳本:rpt_nmmart_comm_subs_mon0403.pl出現(xiàn)以下SQL代碼段,共5次,平均每次執(zhí)行需10分鐘 。 FROM PVIEW.VW_MID_VOIC_SVC_QUAN_MON a ,PVIEW.VW_MID_CDE_SUBS_BRAND_LVL b ,vt_subs c WHERE a.CUST_Brand_ID=b.SUBS_Brand_Level_Third AND a.CAL

55、_Month=200708 AND a.SUBS_ID=c.SUBS_ID 。整個腳本需要掃描以下SQL 14次,平均每次執(zhí)行需3分鐘PVIEW.VW_MID_VOIC_SVC_QUAN_MON where CAL_Month=200708提取公共SQL,形成臨時表,較少掃描(IO)次數(shù)。該腳本,經(jīng)過優(yōu)化之后,從50分鐘縮減至10分鐘第31頁,共81頁。關(guān)聯(lián)條件 (1)Select A.a2, B.b2 from A join Bon substring(A.a1 from 1 for 7) = B.b1應(yīng)該寫為Select A.a2, B.b2 from (select substring(

56、a1 from 1 for 7) as a1_new,a2 from A ) A_newjoin Bon a1_new = b1第32頁,共81頁。關(guān)聯(lián)條件 (2)Select A.a2, B.b2 from A join Bon TRIM(A.a1 ) = TRIM(B.b1)應(yīng)該寫為Select A.a2, B.b2 from A join Bon A.a1 = B.b1第33頁,共81頁。SQL書寫不當(dāng)可能會引起笛卡兒積以下面兩個SQL為例,它們將進(jìn)行笛卡兒積操作。例子1:Select employee.emp_no , employee.emp_nameFrom employee A例

57、子2:SELECT A.EMP_Name, B.Dept_NameFROM employee A, Department BWhere a.dept_no = b.dept_no;表Employee與表A進(jìn)行笛卡兒積表A與表B進(jìn)行笛卡兒積表A與表B進(jìn)行Inner Join第34頁,共81頁。修改表定義常見的表定義修改操作:增加字段修改字段長度建議的操作流程Rename table db.tablex as db.tabley;通過Show table語句獲得原表db.tablex的定義定義新表: db.tablexInsert into db.tablex(。) select 。 From d

58、b.tabley;Drop table db.tabley;Teradata提供ALTER TABLE語句,可進(jìn)行修改表定義但,不建議采用ALTER TABLE方式。第35頁,共81頁。插入/更新/刪除記錄時,盡量不要Abort當(dāng)目標(biāo)表有數(shù)據(jù)時,插入和更新操作,以及部分刪除,都產(chǎn)生TJ如果此時abort該操作,系統(tǒng)將會回滾 Delete BASS1.tb_03004 where proc_dt = 200709 ;UPDATE Customer SET Credit_Limit = Credit_Limit * 1.20 ; DELETE FROM Trans WHERE Trans_Date

59、 981231;DROP TABLE Trans;RENAME TABLE Trans_N TO Trans;先建立空表,通過insert / select 方式插入數(shù)據(jù)這是非??斓牟僮?!先備份,然后做變更操作,更加安全!對于大表進(jìn)行Update/DELETE操作,將耗費相當(dāng)多的資源與相當(dāng)長的時間。Update/Delete操作,需要事務(wù)日志TJ(Transient Journal)以防意外中斷導(dǎo)致數(shù)據(jù)受到破壞在Update/Delete操作中途被Cancel,系統(tǒng)則需回滾,這將耗更多的資源與時間!在經(jīng)分系統(tǒng)中,應(yīng)嚴(yán)防此類事件發(fā)生!DELETE FROM Trans WHERE Trans_D

60、ate 990101; 第37頁,共81頁。經(jīng)分系統(tǒng)的實體命名規(guī)范實體的命名,最長不超過30個字母;通常要求都是大寫。實體的命名:_后綴前綴:表:基礎(chǔ)表以TB_開頭中間表以MID_開頭應(yīng)用模塊的表以相應(yīng)的主體縮寫開頭視圖:一般地,視圖名稱與表名稱一一對應(yīng)。以VW_開頭。對于TB_開頭的表,把TB_替換成VW;對于其他表,加上VW_即可。宏:以M_或者M(jìn)acro_開頭后綴:歷史表:_HIS月表:_MON日表:_DAILY第38頁,共81頁。實體的命名規(guī)范示例TB_OFR_SUBS_HIS 用戶歷史Efct_date, End_date示例:Select *From pview.vw_ofr_su

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論