Chapter 6: Creating Data Sets

Uploaded: 2021-05-10. Format: PPT. Pages: 89.


Cai Deli (蔡德利, tsaid)
College of Plant Science and Technology, Heilongjiang Bayi Agricultural University
Statistical Analysis System

Contents of this chapter
- Overview
- DATA step
- Processing data sets
- PROC step

Overview

How SAS manages data. SAS generally works with two kinds of objects:
- SAS data sets (data set)
- SAS data views (data view)

- A SAS data view has only a descriptor portion and no data portion, but the descriptor holds enough information to locate the data stored in other files.
- Data views reduce maintenance cost.
- When the source data changes, the view changes with it.
- Views can be created by SQL, ACCESS, and the DATA step.

The libname statement
- Purpose: assigns a library reference (libref).
- General form:
    libname libref 'folder-location' [options];
- Example: assign the directory "g:\sas統計分析\sas" to the libref mysaslib:
    libname mysaslib 'g:\sas統計分析\sas';

SAS is a special-purpose language for data management and analysis, with strong data-manipulation capabilities:
- It can read input data in almost arbitrarily complex layouts, and can compute on, subset, update, merge, and split that data.
- SAS also provides interfaces to other database systems such as Sybase and Oracle, interfaces and wizards for desktop database files such as FoxPro and Excel, and an SQL procedure that implements the SQL query language.

- User-programmed computation in SAS is done mainly in the DATA step.
- A DATA step behaves like a stand-alone program.
- Because SAS is a special-purpose data-processing language, however, the DATA step has features that other languages lack.

For example:

/* sasprog0601.sas */
data a;
  put x= y= z=;
  input x y;
  z=x+y;
  put x= y= z=;
cards;
10 20
100 200
;
run;

After the program runs, the log window shows:

    x=. y=. z=.
    x=10 y=20 z=30
    x=. y=. z=.
    x=100 y=200 z=300
    x=. y=. z=.
    NOTE: The data set WORK.A has 2 observations and 3 variables.

How the program runs:
1. The data statement marks the start of the DATA step and names the data set to be created when the step ends: a (actually work.a).
2. The first put statement writes the values of x, y, and z. None of them has been assigned yet, so three missing values are shown.
3. The input statement reads x=10 and y=20 from the first data line after the cards statement.
4. The assignment statement computes z=30, so the second line in the log shows 10, 20, and 30.
5. The lines from the cards statement through the line holding the lone semicolon are not executable. Control reaches the run statement, the last statement of the step. In a general-purpose language the program would end here, and the second data line (100 200) would never be read. SAS instead writes the observation just read (observation 1) to the output data set when it reaches the end of the step.
6. Control then returns to the first executable statement after the data statement, and all variables are first reset to missing. That is why the first put statement shows three missing values rather than the 10, 20, 30 of the previous pass.
7. The input statement reads the next observation from the data lines, setting x and y to 100 and 200. A data pointer maintained at run time tracks the read position. z is then computed as 300, so the put statement writes 100, 200, and 300.
8. Control skips from the cards statement to the null statement and reaches the end of the step, where observation 2 is written to the data set.
9. Control returns to the top of the step again and the variables are reset to missing, so the first put statement shows three missing values.
10. At the input statement SAS should read the next observation, but the data pointer shows that all data has been read. The DATA step therefore ends, leaving two observations in work.a.

This example shows a major difference between a SAS DATA step and an ordinary program:
- When a DATA step reads data, for example with the input, set, merge, update, or modify statement, it contains an implicit loop. Because of that loop, SAS provides the automatic variable _n_, which holds the iteration count of the DATA step and tells you which pass is currently executing.
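To make the implicit loop visible, the following minimal sketch prints _n_ on every pass. It reuses the two data lines of sasprog0601.sas; the data set name loopdemo is an illustrative assumption, and datalines is used as a synonym of cards.

```sas
/* Sketch: watch the implicit DATA step loop through _n_ */
data loopdemo;              /* hypothetical data set name */
  input x y;                /* reads one data line per pass */
  z = x + y;
  put _n_= x= y= z=;        /* _n_ counts passes; it is not stored in the data set */
datalines;
10 20
100 200
;
run;
```

With these two data lines the log should report _N_=1 for the first observation and _N_=2 for the second. The step does begin a third pass, but it stops at the input statement because no data lines remain, so the put statement does not run a third time.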

DATA step flow: [flow chart not included]

DATA step

First form:
    data data-set-name;
    input variable-list;
    cards;
The three keywords data, input, and cards are all required. Note that the cards statement comes after the input statement.

The data statement
- Marks the start of a DATA step.
- Names the SAS data set to be created.

The cards statement
- Used to enter data in-stream; it marks the start of the data block.

Entering data directly:

/* sasprog0602.sas */
data temp;
  input x y @@;
cards;
34 56 78 90 35 67
89 10 23 65 77 45
;
run;

By default, list input reads one observation per data line. There are only two variables here, so one pair per line would make the data block very long. The trailing @@ avoids this: after reading values for all the listed variables, SAS does not move to the next line but keeps reading from the current one until the line is exhausted, and the data block ends at the semicolon line. The result is the data set work.temp with variables x and y. In short, @@ holds the line so that input continues on it.

Second form:
    data data-set-name;
    infile 'file-name';
    input variable-list;
The infile statement names an external data file that holds all the input data, replacing the cards statement and the data lines of the first form. With a lot of data, the second form keeps the program shorter. Note that the infile statement comes before the input statement.

The infile statement
- Purpose: identifies an external text file containing raw data.
- General form:
    infile 'external-file' [options];
- Options let you read only some of the records in the file:
  - firstobs=n1 starts reading at record n1.
  - obs=n2 stops reading after record n2, so n2 records are read in total only when reading starts at the first record.
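As a hedged illustration of how firstobs= and obs= combine, the sketch below reads records 3 through 10 of an external file. The path c:\mydata\scores.dat and the data set name subset are hypothetical, and the file is assumed to hold two blank-separated numbers per record.

```sas
/* Sketch: read only records 3 to 10 of an external file */
data subset;                                         /* hypothetical name */
  infile 'c:\mydata\scores.dat' firstobs=3 obs=10;   /* hypothetical path */
  input x y;                                         /* list input, two numbers per record */
run;
```

Under these assumptions the step reads 10 - 3 + 1 = 8 records, because obs= gives the number of the last record to read rather than a count.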

Example of reading data from an external file:

/* sasprog0603.sas */
data temp;
  infile 'g:\sas統計分析\sas\data\temp.dat';
  input x y;
run;

The input statement
- Purpose:
  - Reads the data fields that the statement specifies.
  - Assigns variable names to those fields.
  - Determines the input mode for each variable.
- General form:
    input specification-1 [... specification-n] [@ | @@];

Input modes:
    list        variable [$] [: informat]
    column      variable [$] start-column - end-column
    formatted   [pointer] variable informat      (pointer: @n | +n)
    named       variable= [$] [start-column - end-column]
    @           holds the current line for the next input statement
    @@          holds the current line even after the observation is output

The input statement: list input
    input variable [$] [: informat] [variable [$] [: informat] ...];

/* sasprog0604.sas */
data c9501;
  input name $ sex $ math chinese;
cards;
李明 男 92 98
張紅藝 女 89 106
王思明 男 86 90
張聰 男 98 109
劉穎 女 80 110
;
run;

List the variables in the order in which their values appear in each observation, separated by blanks. A character variable needs a $ after its name; the $ may follow the name directly or be separated from it by a blank.

/* sasprog0605.sas */
data a;
  input date yymmdd8. sales;
  format date yymmdd10.;
cards;
56-6-13 1100
67.12.15 1200
78 10 2 1300
891001  1400
19960101 1500
20020901 1600
;
run;
proc print; run;

When a value must be read with a particular informat, put the informat name after the variable name; the most common case is reading dates. In this example the date field occupies 8 columns, padded with blanks when shorter. SAS stores dates as numbers, so displaying a date value requires an output format, set here with the format statement.

/* sasprog0606.sas */
data b;
  input date yymmdd10. sales;
  format date yymmdd10.;
cards;
56-6-13    1100
67.12.15   1200
78 10 2    1300
891001     1400
19960101   1500
20020901   1600
1956-6-13  1100
1967.12.15 1200
1978 10 2  1300
19891001   1400
19960101   1500
20020901   1600
;
run;
proc print; run;

- yymmdd: six-digit dates (two-digit year).
- yyyymmdd: dates with a four-digit year (with century).
- yymmdd10.: reads dates with century, with or without separators.

/* sasprog0607.sas */
data b;
  input sales date : yymmdd10.;
  format date yymmdd10.;
  put date=;
cards;
1100 56-6-13
1200 67.12.15
;
run;

If the date variable is not the first item and is separated from the preceding item by blanks, put a colon before the informat name yymmdd10. to allow blanks before the date value. "variable : informat" means: read the value that starts at the next nonblank column and convert it with the specified informat.

The input statement: column input
    input variable [$] start-column - end-column [variable [$] start-column - end-column ...];

Contents of a typical raw data file (overseas.dat): [figure not included]

Exercise: read overseas.dat with column input, create the SAS data set mylib.column, and then view its contents.

/* sasprog0608.sas */
data mylib.column;
  infile 'e:\sas統計分析\sas\data\overseas.dat';
  input date $ 1-9 dest $ 10-12 boarded 13-15;
run;
proc print data=mylib.column; run;

/* sasprog0609.sas */
data c9502;
  input name $ 1-7 sex $ 8-9 math 11-12 chinese 14-16;
cards;
李明   男 92  98
張紅藝 女 89 106
王思明 男 86  90
張聰   男 98 109
劉穎   女 80 110
;
run;

This rewrites the sasprog0604 example with the data aligned in columns and reads it with column input. (The column positions assume a double-byte encoding in which each Chinese character occupies two columns.)

/* sasprog0610.sas */
data pids;
  input year 7-10 mon 11-12 day 13-14;
cards;
110101196902150059
;
run;

Column input does not require separators between fields, so it is often used for packed data, for example reading a batch of ID card numbers and extracting the year, month, and day of birth.

The input statement: formatted input
    input [pointer-control] variable [$] informat [[pointer-control] variable [$] informat ...];
- Suited to files that contain nonstandard data.
- Formatted input lets you:
  - move the input pointer to the start of a data field;
  - name the variable;
  - specify the informat.

Pointer controls:
- @n moves the input pointer to column n.
- +n moves the input pointer forward n columns.
- Example:
    input lname $15. @21 fname +2 sex $1.;

Exercise: read overseas.dat with formatted input. The resulting SAS data set is mylib.format.

/* sasprog0611.sas */
data mylib.format;
  infile 'd:\overseas.dat';
  input @1 date $9. @10 dest $3. @13 boarded 3.;
run;
proc print data=mylib.format; run;

    length variable [$] length;
    informat variable informat;
    format variable format;
    label variable='string';

[Figure: numeric informats and formats. Input data is converted by an informat (8.2, comma8.2, dollar8.2) into the stored value 12234.12, and the stored value is displayed through a format (8.2, comma8.2, dollar8.2).]

[Figure: date informats and formats. Input data such as 20oct97 (informat date7) or 20/10/97 (informat ddmmyy8) is stored as the number 13807; the formats ddmmyy8 and mmddyy8 display that value as 20/10/97 and 10/20/97.]

SAS date values count days from 1 January 1960:

    Date          Stored value
    1960-01-01    0
    1960-01-02    1
    1960-02-01    31
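The relationship between a stored date value and its display formats can be checked with a short sketch like the one below. It is not taken from the slides: it uses a date literal and data _null_, so nothing is created and only log lines are written.

```sas
/* Sketch: one stored date value shown through several formats */
data _null_;
  d = '20OCT1997'd;      /* date literal for 20 October 1997 */
  put d=;                /* raw stored value: days since 1 January 1960 */
  put d= date7.;
  put d= ddmmyy8.;
  put d= mmddyy8.;
run;
```

The first log line should read d=13807; the remaining lines show the same stored number through the date7., ddmmyy8., and mmddyy8. formats.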

Variable attributes:
(1) Character or numeric: when input reads a character variable, a $ must follow the variable name.
(2) Label (label): a variable can carry a label of up to 40 characters (20 Chinese characters) that can be used in reports. This limit applies to SAS 6; later releases allow labels of up to 256 characters.
(3) Storage length (length): numeric variables normally occupy 8 bytes, and a shorter length can be set for variables with small values to save storage. The default length of a character variable read with list input is 8 bytes.

(4) Output format (format): specifies how the variable's values are displayed.
(5) Input format (informat): specifies how external data is converted to SAS values.

- The attrib statement in a DATA step can set these attributes. Its form is:
    attrib variable attribute=value [attribute=value ...];

/* sasprog0612.sas */
data sales;
  attrib name   label='姓名' length=$10
         date   label='日期' format=yymmdd10. informat=mmddyy10.
         amount label='金額' format=10.2;
  input name $ 1-10 date amount;
cards;
張鵬      10/15/1998 2000
李志明    1/3/99 1500
王敏      11/5/99 3000
;
run;
proc print noobs label; run;

One attrib statement can set attributes for several variables, and several attributes for one variable. (The labels mean name, date, and amount. The name field assumes a double-byte encoding so that it fills columns 1-10.)

Processing data sets

- Copying a data set with the set statement

/* sasprog0613.sas */
data c9501;
  set mylib.youth;
run;

This copies mylib.youth to work.c9501. The step again contains an implicit loop, in which the set statement reads the observations.

- Modifying a data set

/* sasprog0614.sas */
data c9501;
  set mylib.youth;
  if height>170 then height=170;
run;

In the original data set 趙大可 has a height of 172; after the change it is 170.

- The keep statement names the variables to keep in the output data set; the drop statement names the variables to discard.

/* sasprog0615.sas */
data c9501;
  set mylib.youth;
  keep name height weight;
run;

This creates a new data set that keeps only name, height, and weight from the original.

/* sasprog0616.sas */
data c9501;
  set mylib.youth;
  drop age sex;
run;

This discards age and sex, with the same effect as the previous example. Either way selects a subset of the columns of a data set.

/* sasprog0617.sas */
data c9501a;
  set c9501;
  if height>150 and weight>50;
run;

A subset of the rows can be selected as well, here the observations with height above 150 cm and weight above 50 kg. This if statement differs from a branching if: it has no then part, and it keeps only the rows that satisfy the condition.

Filtering a data set with if-then and delete:

/* sasprog0618.sas */
data c9501;
  set c9501;
  if height<150 then delete;
run;

- After the program runs, every record with height below 150 has been deleted.

- A data set named in a set statement can carry data set options, in the form data-set-name(data-set-options). The options include:
  - keep=: bring in only the listed variables;
  - drop=: do not bring in the listed variables;
  - obs=: stop reading at the given observation number;
  - firstobs=: start reading at the given observation number, skipping the observations before it.

/* sasprog0619.sas */
data huge;
  array x(10);
  do i=1 to 1000000;
    do j=1 to 10;
      x(j) = normal(0);
    end;
    output;
  end;
  drop i j;
run;
data new;
  set huge(obs=100 keep=x1 x2);
run;

The first step creates work.huge with 10 variables and one million observations; the second copies its first 100 rows and two of its variables into the new data set work.new.

- Splitting a data set

/* sasprog0620.sas */
data datam dataf;
  set mylib.youth;
  select(sex);
    when('男') output datam;
    when('女') output dataf;
    otherwise put sex= '有錯';
  end;
  drop sex;
run;
proc print data=datam; run;
proc print data=dataf; run;

- Male students ('男') in mylib.youth go to work.datam and female students ('女') go to work.dataf; any other value is written to the log followed by '有錯' (error).
- output is an executable statement that forces the current observation to be written to the data set it names.

- Concatenating data sets
  - Several data sets with the same structure are stacked one after another.
  - For example:
    data classes;
      set class1 class2 class3 class4;
    run;
  [Figure: classes is built from class1, class2, class3, and class4 stacked in order]

/* sasprog0621.sas */
data new;
  set datam(in=male) dataf(in=female);
  if male=1 then sex='男';
  if female=1 then sex='女';
run;

This recombines the male and female data sets split in the previous example. To show which data set an observation came from, add "in=variable" in parentheses after the data set name in the set statement; the variable is 1 when the observation comes from that data set and 0 otherwise.

- Merging data sets side by side
  - When two or more data sets hold different attributes (variables) of the same observations, and their observations correspond one to one in order, a merge statement can combine them into a new data set.
  - For example:
    data new;
      merge c9501u c9501v c9501w;
    run;
  - If the order does not correspond, the result is wrong, so side-by-side merging is normally done by key.
  - That is, sort every data set by the same variable or variables that uniquely identify each observation, then use the by and merge statements together. Data sets whose observations are in different orders, or differ in number, are then merged correctly.
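The key-based merge described above can be combined with the in= option to handle inputs that differ in order and in number of observations. The sketch below is an illustration only: the data sets heights and scores, their values, and the flag names inh and ins are all hypothetical.

```sas
/* Sketch: match-merge by key, keeping only names found in both inputs */
data heights;                       /* hypothetical data set */
  input name $ height;
datalines;
Ann 160
Bob 172
Cid 168
;
run;

data scores;                        /* hypothetical data set: other order, fewer rows */
  input name $ math;
datalines;
Cid 86
Ann 92
;
run;

proc sort data=heights; by name; run;
proc sort data=scores;  by name; run;

data both;
  merge heights(in=inh) scores(in=ins);
  by name;                          /* key that identifies each observation */
  if inh and ins;                   /* subsetting if: drop names missing from either input */
run;

proc print data=both; run;
```

Here Bob appears only in heights, so the subsetting if removes him and both should hold the rows for Ann and Cid. Dropping the if statement would keep Bob with a missing math value instead.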

/* sasprog0622.sas */
data c9501x;
  set mylib.youth;
  keep name sex;
run;
data c9501y;
  set mylib.youth;
  keep name height weight;
run;
proc sort data=c9501x; by name; run;
proc sort data=c9501y; by name; run;
data new;
  merge c9501x c9501y;
  by name;
run;
proc print; run;

This splits mylib.youth into c9501x (name, sex) and c9501y (name, height, weight), then merges the two side by side by key. sort is the sorting procedure.

PROC step

- A PROC step in a SAS program represents one processing task, such as sorting, a t test, or analysis of variance.
- A PROC step begins with the keyword proc, followed immediately by the procedure name that distinguishes it from other steps, and ends with the keyword run. The general form is:
    proc procedure-name [options];
      [other statements;]
      [other statements;]
    run;

- SAS provides several hundred procedures.
- Several dozen of them are in common use for statistics.
- The table below lists the most frequently used procedures and what they do.

    Procedure   Function
    sort        Sorts a data set by the specified variables
    print       Lists the data in a data set
    tabulate    Summarizes data in tables by the specified classification variables
    means       Simple descriptive statistics for numeric variables
    freq        Simple descriptive statistics for categorical variables
    ttest       t tests
    anova       Analysis of variance
    npar1way    Nonparametric tests
    reg         Regression analysis
    corr        Correlation analysis
    discrim     Discriminant analysis
    cluster     Cluster analysis
    chart       Low-resolution statistical charts

The datasets procedure, for operating on SAS files
- Functions:
  - copy SAS files from one library to another;
  - rename SAS files;
  - repair damaged SAS files;
  - delete SAS files;
  - list all SAS files in a library;
  - list the attributes of a SAS data set, such as its last modification time and whether it is compressed or indexed;
  - set passwords on SAS files;
  - append records to a SAS data set;
  - modify the attributes of a data set and of its variables;
  - create or delete indexes on a data set;
  - create and manage audit files for a data set;
  - create or delete integrity constraints on a data set.

- General form of datasets:
    proc datasets [options];
      age current-name related-names;
      append base=data-set [options];
      audit file-name;
        initiate;
      change old-name-1=new-name-1 [...];
      contents [options];
      copy out=libref;
        exclude file-names;   (only after a copy statement; cannot be combined with select)
        select file-names;    (only after a copy statement; cannot be combined with exclude)
      delete file-names;
      exchange name-1=other-name-1 [...];
      modify file-name [options];
      repair file-names;
      save file-names;
    run;

Notes on commonly used statements:
- The age statement renames a group of files in one go. Following the order of the current file and the related files, each file takes the name of the one after it; the last file is deleted and the current file name becomes free.
    proc datasets library=daily;
      age today day1-day7;
    run;
- This program renames the data set today in library daily to day1, day1 to day2, day2 to day3, and so on; the original last file, day7, is deleted.

- The append statement adds records to a data set. The option base=data-set names the data set that receives the records; data=data-set names the data set that supplies them and, when omitted, defaults to the current (most recently used) data set.
    proc datasets;
      append base=d1 data=d2 force;
    run;
- This program appends the data in d2 to the end of d1. If d2 contains variables that do not exist in d1, their values are dropped.

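- The audit statement audits a file: it creates the audit file and manages it.
- The change statement replaces an old file name with a new one.
- The contents statement displays the attributes of the specified data set or of the current data set.
- The copy statement copies files from the current library to another library; the option out=libref names the destination library.
- The delete statement deletes the specified files.
- The exchange statement swaps the names of the two files on either side of the equal sign.
- The modify statement changes the attributes of a file.
- The repair statement repairs a specified file that has been damaged, restoring it to a usable state.
- The save statement keeps the files it names and deletes all other files in the current library.

Options of the proc datasets statement (option: meaning and use):
- alter=alter-password: when a SAS file has an alter password, verifies that the operation is permitted; the code runs only if the password is correct.
- details / nodetails: controls whether detailed information about the SAS files is shown; details shows it, nodetails does not, and nodetails is the default.
- force: has two effects: (1) forces the step to keep executing even when its statements contain errors; (2) in the append statement, forces the append even when the two data sets do not have identical variables.
- gennum=: controls how generation data sets are handled; the value can be all, hist, revert, or an integer.
- kill: deletes all files in the library being processed; use with caution.
- library=libref: names the library to process.
- memtype=member-type: names the member type (file type) to process; the default is all (every type).
- nolist: suppresses the listing of the processed directory in the log.
- nowarn: suppresses error messages, and lets the step continue, in cases such as a file named in a statement not existing.
- pw=password: when a SAS file has a password, verifies that the operation is permitted (covers read-, write-, and alter-protected files).
- read=read-password: when a SAS file has a read password, verifies that the operation is permitted.

The sort procedure, for sorting the records of a data set
- General form:
    proc sort [options];
      by [descending] variable-1 [...];
    run;
- The by statement names the variables to sort by; they can be numeric or character.
- With several variables, records are sorted by the earlier variable first and then by the later ones.
- Each variable in the by statement can be preceded by the descending keyword to sort it in descending order; the default is ascending order.
- The options of the proc sort statement are listed below.

Options of the proc sort statement (option: meaning and use):
- data=data-set: names the data set to sort; when omitted, the most recently created or processed data set is used.
- datecopy: sorts the file without changing its creation and modification dates.
- out=data-set: stores the sorted data under the given name and leaves the original untouched; without this option the original is overwritten.
- sortseq=collating-sequence: names the collating sequence used when sorting character variables.
- reverse / equals / noequals: control ordering in the output. reverse reverses the order of character variables; equals keeps the original relative order of records within each level of the sort variables; noequals allows that order to change.
- nodupkey / noduprecs: control how duplicates are removed. nodupkey removes records whose sort-variable values are duplicated; noduprecs removes records in which the values of all variables are duplicated.
- sortsize=: sets the maximum memory available, given as a number and a unit, for example 10m.
- force: forces a redundant sort, that is, sorting a file that already has an index.
- tagsort: stores only the sort variables and record numbers in temporary files, reducing disk usage.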
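A few of these options can be tried together in a short sketch. It assumes the sample data set sashelp.class that ships with SAS (it is not used in the slides), and the output name work.class_by_age is hypothetical.

```sas
/* Sketch: sort into a new data set, descending, keeping one record per key */
proc sort data=sashelp.class       /* sample data set shipped with SAS */
          out=work.class_by_age    /* hypothetical output name; the input is left unchanged */
          nodupkey;                /* keep one record per distinct by-value */
  by descending age;
run;

proc print data=work.class_by_age noobs;
run;
```

Because out= is given, the source data set is not overwritten; with nodupkey the result should contain one row per distinct age, oldest first.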

The print procedure, for displaying a data set
- Functions:
  - writes the records of a SAS data set to the output device (the screen) in a chosen layout;
  - can show all of the variables or only some of them;
  - can produce a range of reports, from simple listings to reports with data summaries.

- General form of print:
    proc print [options];
      by [descending] var-1 [...];
      pageby variable;
      sumby variable;
      id variable(s);
      sum variable(s);
      var variable(s);
    run;

- The by statement works the same way in every procedure: it splits the data set into smaller groups that are processed separately.
- The pageby statement controls page breaks: output starts on a new page whenever the value of the named by variable changes, so records that share a value are not mixed with other groups on one page.
- The sumby statement works like pageby, except that the action is summing instead of a page break: totals are computed for each value of the named variable.
- The id statement identifies each record by the values of the named variables instead of by the record number.
- The sum statement names the variables to be totaled in the report.
- The var statement names the variables to show in the report.

Options of the proc print statement (option: meaning and use):
- contents=text: sets the text of the link to the output in the HTML contents file; any text is allowed.
- data=data-set: names the data set to process.
- double: inserts a blank line between adjacent records.
- n=string: prints the number of records at the end of the report or of each by group, labeled with the given string.
- noobs: suppresses the record-number column in the report.
- obs=column-heading: sets the heading of the record-number column.
- round: rounds unformatted numeric variables to two decimal places.
- rows=page: sets the page layout. page is currently the only value; it prints only one row of variables for each record on a page, so each page holds as many records as possible.
- width=column-width: sets the column width; possible values include full, minimum, uniform, and uniformby.
- heading=direction: v (vertical) or h (horizontal), the direction in which column headings are printed.
- label: uses variable labels as column headings; otherwise variable names are used.
- split=character: uses variable labels as column headings and treats the given character as the line-break marker within a heading.
- style=style-element: names the style element applied to particular locations in the report (this covers a lot of ground; details omitted).

Common PROC step statements

- The var statement
  - Many procedures use the var statement to name the analysis variables.
  - List the variables after var:
    var variable-1 variable-2 ... variable-n;
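The id, var, and sum statements and a few proc print options can be seen together in a short sketch. It assumes the sample data set sashelp.class that ships with SAS, and the label texts are illustrative assumptions rather than anything taken from the slides.

```sas
/* Sketch: a small report using id, var, sum, and labels */
proc print data=sashelp.class label n;   /* sample data set shipped with SAS */
  id name;                               /* identify rows by name instead of Obs */
  var sex age height weight;             /* columns to show, in this order */
  sum height weight;                     /* totals printed at the end of the report */
  label height='Height (in)'             /* illustrative label texts */
        weight='Weight (lb)';
run;
```

The label option makes the column headings use the labels, and n prints the number of records at the end of the report.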
