




Hunan University Course Lab Report

Course name: Computer Organization and Architecture
Lab project name: perflab
Major and class:
Name:
Student ID:
Instructor:
Completion date: May 22, 2015
Department of Computer Science and Engineering

Lab title: Program performance tuning

Lab objective: The file kernel.c contains two functions that need to be optimized, rotate and smooth. It also provides basic implementations of both, naive_rotate and naive_smooth, as the baseline against which your improved code is compared. You need to understand rotate and smooth and then optimize them. Each time you write a new optimized version of rotate or smooth, you can register it and test it with driver, which reports the corresponding CPE and speedup. This lab requires each person to write at least 3 optimized versions of each function and to analyze their performance based on the results reported by driver.

Lab environment: VMware virtual machine, Ubuntu 12.04, Linux terminal

Lab steps and result analysis:

Function source code:

rotate function:

    void naive_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

The rotate function rotates a bitmap image by 90 degrees by moving every pixel to a new row and column position. Here RIDX(i,j,n) is (i)*(n)+(j). The weakness of this function is poor locality and too many loop iterations. It can be blocked to improve spatial locality, and its loops can also be unrolled.

smooth function:

    void naive_smooth(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    }

The smooth function blurs the image by replacing each pixel with the average of the pixels around it. Its weakness is too many loop iterations and frequent calls to avg, which in turn calls several other functions. The number of avg calls should be reduced and the loops unrolled.

First version:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;
        for (ii = 0; ii < dim; ii += 4)
            for (jj = 0; jj < dim; jj += 4)
                for (i = ii; i < ii+4; i++)
                    for (j = jj; j < jj+4; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

Two more for loops are added, which split the iteration space into 4*4 blocks. When the cache is not large enough, blocking the loops raises the cache hit rate and so improves spatial locality. The measured CPE shows this as well: at dim 64 the original code and this code have nearly the same CPE, but as dim grows the CPE of this code rises only slightly while the CPE of the original code rises sharply, because the original is limited by cache capacity.

smooth function:

    void smooth(int dim, pixel *src, pixel *dst)
    {
        pixel_sum rowsum[530][530];
        int i, j, snum;
        for (i = 0; i < dim; i++) {
            rowsum[i][0].red = (src[RIDX(i, 0, dim)].red+src[RIDX(i, 1, dim)].red);
            rowsum[i][0].blue = (src[RIDX(i, 0, dim)].blue+src[RIDX(i, 1, dim)].blue);
            rowsum[i][0].green = (src[RIDX(i, 0, dim)].green+src[RIDX(i, 1, dim)].green);
            rowsum[i][0].num = 2;
            for (j = 1; j < dim-1; j++) {
                rowsum[i][j].red = (src[RIDX(i, j-1, dim)].red+src[RIDX(i, j, dim)].red+src[RIDX(i, j+1, dim)].red);
                rowsum[i][j].blue = (src[RIDX(i, j-1, dim)].blue+src[RIDX(i, j, dim)].blue+src[RIDX(i, j+1, dim)].blue);
                rowsum[i][j].green = (src[RIDX(i, j-1, dim)].green+src[RIDX(i, j, dim)].green+src[RIDX(i, j+1, dim)].green);
                rowsum[i][j].num = 3;
            }
            rowsum[i][dim-1].red = (src[RIDX(i, dim-2, dim)].red+src[RIDX(i, dim-1, dim)].red);
            rowsum[i][dim-1].blue = (src[RIDX(i, dim-2, dim)].blue+src[RIDX(i, dim-1, dim)].blue);
            rowsum[i][dim-1].green = (src[RIDX(i, dim-2, dim)].green+src[RIDX(i, dim-1, dim)].green);
            rowsum[i][dim-1].num = 2;
        }
        for (j = 0; j < dim; j++) {
            snum = rowsum[0][j].num+rowsum[1][j].num;
            dst[RIDX(0, j, dim)].red = (unsigned short)((rowsum[0][j].red+rowsum[1][j].red)/snum);
            dst[RIDX(0, j, dim)].blue = (unsigned short)((rowsum[0][j].blue+rowsum[1][j].blue)/snum);
            dst[RIDX(0, j, dim)].green = (unsigned short)((rowsum[0][j].green+rowsum[1][j].green)/snum);
        }
        for (i = 1; i [...]

[...] 512, the array size set in the code is exceeded and the program reports an error.

Second version:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int temp;
        int it, jt;
        int im, jm;
        for (jt = 0; jt < dim; jt += 32) {
            jm = jt+32;
            for (it = 0; it < dim; it += 32) {
                im = it+32;
                for (j = jt; j < jm; j++) {
                    temp = dim-1-j;
                    for (i = it; i [...]

[...]

        dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        for (i = 1; i [...] {
            dst->red = (P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red)/6;
            dst->green = (P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green)/6;
            dst->blue = (P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue)/6;
            dst++;
            P1++;
            P2++;
        }
        dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        P1 = src;
        P2 = P1+dim0;
        P3 = P2+dim0;
        for (i = 1; i [...] {
            dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red+P3->red+(P3+1)->red)/6;
            dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green+P3->green+(P3+1)->green)/6;
            dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue+P3->blue+(P3+1)->blue)/6;
            dst++;
            dst1 = dst+1;
            for (j = 1; j [...] {
                dst->red = (P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red+P3->red+(P3+1)->red+(P3+2)->red)/9;
                dst->green = (P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green+P3->green+(P3+1)->green+(P3+2)->green)/9;
                dst->blue = (P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue+P3->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst1->red = ((P1+3)->red+(P1+1)->red+(P1+2)->red+(P2+3)->red+(P2+1)->red+(P2+2)->red+(P3+3)->red+(P3+1)->red+(P3+2)->red)/9;
                dst1->green = ((P1+3)->green+(P1+1)->green+(P1+2)->green+(P2+3)->green+(P2+1)->green+(P2+2)->green+(P3+3)->green+(P3+1)->green+(P3+2)->green)/9;
                dst1->blue = ((P1+3)->blue+(P1+1)->blue+(P1+2)->blue+(P2+3)->blue+(P2+1)->blue+(P2+2)->blue+(P3+3)->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst += 2;
                dst1 += 2;
                P1 += 2;
                P2 += 2;
                P3 += 2;
            }
            for (; j [...] {
                dst->red = (P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red+P3->red+(P3+1)->red+(P3+2)->red)/9;
                dst->green = (P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green+P3->green+(P3+1)->green+(P3+2)->green)/9;
                dst->blue = (P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue+P3->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst++;
                P1++;
                P2++;
                P3++;
            }
            dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red+P3->red+(P3+1)->red)/6;
            dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green+P3->green+(P3+1)->green)/6;
            dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue+P3->blue+(P3+1)->blue)/6;
            dst++;
            P1 += 2;
            P2 += 2;
            P3 += 2;
        }
        dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        for (i = 1; i [...] {
            dst->red = (P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red)/6;
            dst->green = (P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green)/6;
            dst->blue = (P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue)/6;
            dst++;
            P1++;
            P2++;
        }
        dst->red = (P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green = (P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue = (P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
    }

This code also speeds up the program by not calling avg. The smooth computation is split into three kinds of region: the interior, where each pixel is the average of 9 points; the 4 corners, where each pixel is the average of 4 points; and the four edges, where each pixel is the average of 6 points. Processing starts at the top corner of the image, continues along the top edge, and then proceeds in order down the image. For each row, the left-edge pixel is handled first and a for loop then processes the interior pixels of that row. That is the code shown above.

Third version:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int dst_base = (dim-1)*dim;
        dst += dst_base;
        for (i = 0; i < dim; i += 32) {
            for (j = 0; j < dim; j++) {
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src;
                src++;
                src -= (dim<<5)-dim;
                dst -= 31+dim;
            }
            dst += dst_base+dim;
            dst += 32;
            src += (dim<<5)-dim;
        }
    }

[...]

        dst[0].blue = (src[0].blue+src[1].blue+src[dim].blue+src[dim+1].blue)>>2;
        dst[0].green = (src[0].green+src[1].green+src[dim].green+src[dim+1].green)>>2;
        dst[dim-1].red = (src[dim-1].red+src[dim-2].red+src[dim*2-1].red+src[dim*2-2].red)>>2;
        dst[dim-1].blue = (src[dim-1].blue+src[dim-2].blue+src[dim*2-1].blue+src[dim*2-2].blue)>>2;
        dst[dim-1].green = (src[dim-1].green+src[dim-2].green+src[dim*2-1].green+src[dim*2-2].green)>>2;
        dst[dim*(dim-1)].red = (src[dim*(dim-1)].red+src[dim*(dim-1)+1].red+src[dim*(dim-2)].red+src[dim*(dim-2)+1].red)>>2;
        dst[dim*(dim-1)].blue = (src[dim*(dim-1)].blue+src[dim*(dim-1)+1].blue+src[dim*(dim-2)].blue+src[dim*(dim-2)+1].blue)>>2;
        dst[dim*(dim-1)].green = (src[dim*(dim-1)].green+src[dim*(dim-1)+1].green+src[dim*(dim-2)].green+src[dim*(dim-2)+1].green)>>2;
        dst[dim*dim-1].red = (src[dim*dim-1].red+src[dim*dim-2].red+src[dim*(dim-1)-1].red+src[dim*(dim-1)-2].red)>>2;
        dst[dim*dim-1].blue = (src[dim*dim-1].blue+src[dim*dim-2].blue+src[dim*(dim-1)-1].blue+src[dim*(dim-1)-2].blue)>>2;
        dst[dim*dim-1].green = (src[dim*dim-1].green+src[dim*dim-2].green+src[dim*(dim-1)-1].green+src[dim*(dim-1)-2].green)>>2;
        for (j = 1; j < dim-1; j++) {
            dst[j].red = (src[j].red+src[j-1].red+src[j+1].red+src[j+dim].red+src[j+1+dim].red+src[j-1+dim].red)/6;
            dst[j].green = (src[j].green+src[j-1].green+src[j+1].green+src[j+dim].green+src[j+1+dim].green+src[j-1+dim].green)/6;
            dst[j].blue = (src[j].blue+src[j-1].blue+src[j+1].blue+src[j+dim].blue+src[j+1+dim].blue+src[j-1+dim].blue)/6;
        }
        for (j = dim*(dim-1)+1; j < dim*dim-1; j++) {
            dst[j].red = (src[j].red+src[j-1].red+src[j+1].red+src[j-dim].red+src[j+1-dim].red+src[j-1-dim].red)/6;
            dst[j].green = (src[j].green+src[j-1].green+src[j+1].green+src[j-dim].green+src[j+1-dim].green+src[j-1-dim].green)/6;
            dst[j].blue = (src[j].blue+src[j-1].blue+src[j+1].blue+src[j-dim].blue+src[j+1-dim].blue+src[j-1-dim].blue)/6;
        }
        for (j = dim; j < dim*(dim-1); j += dim) {
            dst[j].red = (src[j].red+src[j-dim].red+src[j+1].red+src[j+dim].red+src[j+1+dim].red+src[j-dim+1].red)/6;
            dst[j].green = (src[j].green+src[j-dim].green+src[j+1].green+src[j+dim].green+src[j+1+dim].green+src[j-dim+1].green)/6;
            dst[j].blue = (src[j].blue+src[j-dim].blue+src[j+1].blue+src[j+dim].blue+src[j+1+dim].blue+src[j-dim+1].blue)/6;
        }
        for (j = dim+dim-1; j < dim*dim-1; j+=d
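
The loop blocking described for the first rotate version can be checked against the baseline with a small standalone sketch. It is not the report's code: the pixel layout (three unsigned short fields), the block size BLOCK, and the helper blocked_matches_naive are assumptions made for illustration, and the inner bounds are clamped so that dim does not have to be a multiple of the block size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed pixel layout: only the field names come from the report. */
typedef struct {
    unsigned short red, green, blue;
} pixel;

/* Row-major index, as given in the report: (i)*(n)+(j). */
#define RIDX(i, j, n) ((i) * (n) + (j))

/* Hypothetical block size; the report tries 4 and 32. */
#define BLOCK 16

/* Baseline rotation, same loop nest as naive_rotate in the report. */
static void naive_rotate(int dim, const pixel *src, pixel *dst)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++)
            dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Blocked rotation: visit the image in BLOCK x BLOCK tiles so the
 * source rows and destination rows of one tile stay in cache together. */
static void blocked_rotate(int dim, const pixel *src, pixel *dst)
{
    for (int ii = 0; ii < dim; ii += BLOCK) {
        int imax = ii + BLOCK < dim ? ii + BLOCK : dim;
        for (int jj = 0; jj < dim; jj += BLOCK) {
            int jmax = jj + BLOCK < dim ? jj + BLOCK : dim;
            for (int i = ii; i < imax; i++)
                for (int j = jj; j < jmax; j++)
                    dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
        }
    }
}

/* Returns 1 when both versions produce the same dim x dim image. */
static int blocked_matches_naive(int dim)
{
    assert(dim > 0);
    size_t n = (size_t)dim * (size_t)dim;
    pixel *src = malloc(n * sizeof *src);
    pixel *a = malloc(n * sizeof *a);
    pixel *b = malloc(n * sizeof *b);
    int ok = 0;
    if (src && a && b) {
        /* Deterministic test pattern so every pixel is distinguishable. */
        for (size_t k = 0; k < n; k++) {
            src[k].red = (unsigned short)(k & 0xFFFF);
            src[k].green = (unsigned short)((k * 7) & 0xFFFF);
            src[k].blue = (unsigned short)((k * 13) & 0xFFFF);
        }
        naive_rotate(dim, src, a);
        blocked_rotate(dim, src, b);
        ok = memcmp(a, b, n * sizeof *a) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

Calling blocked_matches_naive(64) from a main function should return 1. The sketch only checks correctness; the CPE and speedup still have to be measured with driver.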
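
The row-sum idea behind the first smooth version (add horizontal neighbours first, then combine up to three rows of sums) can be sketched as follows. This is an illustration under assumptions, not the report's implementation: the pixel layout and the helpers reference_smooth and rowsum_matches_reference are hypothetical, and the row sums are heap-allocated rather than held in a fixed 530*530 array, so the size is not capped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layouts: only the field names come from the report. */
typedef struct { unsigned short red, green, blue; } pixel;
typedef struct { int red, green, blue, num; } pixel_sum;

#define RIDX(i, j, n) ((i) * (n) + (j))

/* Reference: average over the 3x3 neighbourhood clipped to the image. */
static void reference_smooth(int dim, const pixel *src, pixel *dst)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++) {
            int r = 0, g = 0, b = 0, n = 0;
            for (int ii = i - 1; ii <= i + 1; ii++)
                for (int jj = j - 1; jj <= j + 1; jj++) {
                    if (ii < 0 || ii >= dim || jj < 0 || jj >= dim)
                        continue;
                    const pixel *p = &src[RIDX(ii, jj, dim)];
                    r += p->red; g += p->green; b += p->blue; n++;
                }
            pixel *q = &dst[RIDX(i, j, dim)];
            q->red = (unsigned short)(r / n);
            q->green = (unsigned short)(g / n);
            q->blue = (unsigned short)(b / n);
        }
}

/* Two passes: horizontal sums per pixel, then sums of up to three rows.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int rowsum_smooth(int dim, const pixel *src, pixel *dst)
{
    pixel_sum *rs = malloc((size_t)dim * (size_t)dim * sizeof *rs);
    if (!rs)
        return -1;
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++) {
            pixel_sum s = {0, 0, 0, 0};
            int lo = j > 0 ? j - 1 : 0;
            int hi = j < dim - 1 ? j + 1 : dim - 1;
            for (int k = lo; k <= hi; k++) {
                const pixel *p = &src[RIDX(i, k, dim)];
                s.red += p->red; s.green += p->green; s.blue += p->blue;
                s.num++;
            }
            rs[RIDX(i, j, dim)] = s;
        }
    for (int i = 0; i < dim; i++) {
        int lo = i > 0 ? i - 1 : 0;
        int hi = i < dim - 1 ? i + 1 : dim - 1;
        for (int j = 0; j < dim; j++) {
            int r = 0, g = 0, b = 0, n = 0;
            for (int k = lo; k <= hi; k++) {
                const pixel_sum *s = &rs[RIDX(k, j, dim)];
                r += s->red; g += s->green; b += s->blue; n += s->num;
            }
            pixel *q = &dst[RIDX(i, j, dim)];
            q->red = (unsigned short)(r / n);
            q->green = (unsigned short)(g / n);
            q->blue = (unsigned short)(b / n);
        }
    }
    free(rs);
    return 0;
}

/* Returns 1 when the two-pass result equals the reference result. */
static int rowsum_matches_reference(int dim)
{
    assert(dim > 0);
    size_t n = (size_t)dim * (size_t)dim;
    pixel *src = malloc(n * sizeof *src);
    pixel *a = malloc(n * sizeof *a);
    pixel *b = malloc(n * sizeof *b);
    int ok = 0;
    if (src && a && b) {
        for (size_t k = 0; k < n; k++) {
            src[k].red = (unsigned short)((k * 31 + 7) & 0xFFFF);
            src[k].green = (unsigned short)((k * 17 + 3) & 0xFFFF);
            src[k].blue = (unsigned short)((k * 5 + 1) & 0xFFFF);
        }
        reference_smooth(dim, src, a);
        ok = rowsum_smooth(dim, src, b) == 0 &&
             memcmp(a, b, n * sizeof *a) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

Because each 3x3 sum is built from at most three precomputed row sums, every source pixel is read far fewer times than in the avg-based baseline. The results match the reference exactly because both versions divide the same integer totals by the same counts.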