




Hunan University Course Lab Report

Course: Computer Organization and Architecture
Lab project: perflab
Major and class:
Name:
Student ID:
Instructor:
Completion date: May 22, 2015
Department of Computer Science and Engineering

Lab title: Program performance tuning

Lab objective: kernel.c contains two functions to be optimized, rotate and smooth. The baseline implementations naive_rotate and naive_smooth are provided as the reference against which your improved versions are compared. You need to understand rotate and smooth and then optimize them. Each new optimized version of rotate or smooth can be tested with driver once it has been registered, which reports the corresponding CPE and speedup. This lab requires each student to write at least 3 optimized versions of each function and to analyze their performance based on the results reported by driver.

Lab environment: VMware virtual machine, Ubuntu 12.04, Linux terminal

Lab procedure and results analysis:

Function source code:

rotate function:

    void naive_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

rotate turns a bitmap image by 90 degrees by moving every pixel to its new row and column position. RIDX(i,j,n) expands to (i)*(n)+(j). The weakness of this function is poor locality and a large number of loop iterations. It can be blocked to improve spatial locality, and its loops can be unrolled.

smooth function:

    void naive_smooth(int dim, pixel *src, pixel *dst)
    {
        int i, j;

        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    }

smooth blurs the image by replacing each pixel with the average of that pixel and its neighbors. The weakness of this function is the large number of loop iterations and the frequent calls to avg, which itself calls several other functions. The number of avg calls should be reduced and the loops unrolled.

Version 1:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;

        for (ii = 0; ii < dim; ii += 4)
            for (jj = 0; jj < dim; jj += 4)
                for (i = ii; i < ii+4; i++)
                    for (j = jj; j < jj+4; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

Two extra for loops split the iteration space into 4*4 blocks. When the cache is not large enough to hold the working set, blocking the loops raises the cache hit rate and thereby improves spatial locality. The measured CPE shows the same thing: at dim = 64 the original code and this version differ very little, but as dim grows the CPE of this version rises only slightly while the CPE of the original rises sharply, because the original is limited by cache capacity.

smooth function:

    void smooth(int dim, pixel *src, pixel *dst)
    {
        pixel_sum rowsum[530][530];
        int i, j, snum;

        /* Horizontal sums: each entry covers a pixel and its left/right neighbors. */
        for (i = 0; i < dim; i++) {
            rowsum[i][0].red = (src[RIDX(i, 0, dim)].red+src[RIDX(i, 1, dim)].red);
            rowsum[i][0].blue = (src[RIDX(i, 0, dim)].blue+src[RIDX(i, 1, dim)].blue);
            rowsum[i][0].green = (src[RIDX(i, 0, dim)].green+src[RIDX(i, 1, dim)].green);
            rowsum[i][0].num = 2;
            for (j = 1; j < dim-1; j++) {
                rowsum[i][j].red = (src[RIDX(i, j-1, dim)].red+src[RIDX(i, j, dim)].red+src[RIDX(i, j+1, dim)].red);
                rowsum[i][j].blue = (src[RIDX(i, j-1, dim)].blue+src[RIDX(i, j, dim)].blue+src[RIDX(i, j+1, dim)].blue);
                rowsum[i][j].green = (src[RIDX(i, j-1, dim)].green+src[RIDX(i, j, dim)].green+src[RIDX(i, j+1, dim)].green);
                rowsum[i][j].num = 3;
            }
            rowsum[i][dim-1].red = (src[RIDX(i, dim-2, dim)].red+src[RIDX(i, dim-1, dim)].red);
            rowsum[i][dim-1].blue = (src[RIDX(i, dim-2, dim)].blue+src[RIDX(i, dim-1, dim)].blue);
            rowsum[i][dim-1].green = (src[RIDX(i, dim-2, dim)].green+src[RIDX(i, dim-1, dim)].green);
            rowsum[i][dim-1].num = 2;
        }

        /* First output row: combine the row sums of rows 0 and 1. */
        for (j = 0; j < dim; j++) {
            snum = rowsum[0][j].num+rowsum[1][j].num;
            dst[RIDX(0, j, dim)].red = (unsigned short)((rowsum[0][j].red+rowsum[1][j].red)/snum);
            dst[RIDX(0, j, dim)].blue = (unsigned short)((rowsum[0][j].blue+rowsum[1][j].blue)/snum);
            dst[RIDX(0, j, dim)].green = (unsigned short)((rowsum[0][j].green+rowsum[1][j].green)/snum);
        }
        for (i = 1; i < ...

[...] is greater than 512, the declared array size is exceeded and an error is reported.

Version 2:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int temp;
        int it, jt;
        int im, jm;

        for (jt = 0; jt < dim; jt += 32) {
            jm = jt+32;
            for (it = 0; it < dim; it += 32) {
                im = it+32;
                for (j = jt; j < jm; j++) {
                    temp = dim-1-j;
                    for (i = it; i < ...

[...]

        /* top-left corner: 4-point average */
        dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        /* top edge: 6-point averages */
        for (i = 1; i < ...) {
            dst->red=(P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red)/6;
            dst->green=(P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green)/6;
            dst->blue=(P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue)/6;
            dst++;
            P1++;
            P2++;
        }
        /* top-right corner: 4-point average */
        dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        P1=src;
        P2=P1+dim0;
        P3=P2+dim0;
        for (i = 1; i < ...) {
            /* left edge: 6-point average */
            dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red+P3->red+(P3+1)->red)/6;
            dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green+P3->green+(P3+1)->green)/6;
            dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue+P3->blue+(P3+1)->blue)/6;
            dst++;
            dst1=dst+1;
            /* interior, two pixels per iteration: 9-point averages */
            for (j = 1; j < ...) {
                dst->red=(P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red+P3->red+(P3+1)->red+(P3+2)->red)/9;
                dst->green=(P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green+P3->green+(P3+1)->green+(P3+2)->green)/9;
                dst->blue=(P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue+P3->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst1->red=((P1+3)->red+(P1+1)->red+(P1+2)->red+(P2+3)->red+(P2+1)->red+(P2+2)->red+(P3+3)->red+(P3+1)->red+(P3+2)->red)/9;
                dst1->green=((P1+3)->green+(P1+1)->green+(P1+2)->green+(P2+3)->green+(P2+1)->green+(P2+2)->green+(P3+3)->green+(P3+1)->green+(P3+2)->green)/9;
                dst1->blue=((P1+3)->blue+(P1+1)->blue+(P1+2)->blue+(P2+3)->blue+(P2+1)->blue+(P2+2)->blue+(P3+3)->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst+=2;
                dst1+=2;
                P1+=2;
                P2+=2;
                P3+=2;
            }
            /* remaining interior pixels of this row, one at a time */
            for (; j < ...) {
                dst->red=(P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red+P3->red+(P3+1)->red+(P3+2)->red)/9;
                dst->green=(P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green+P3->green+(P3+1)->green+(P3+2)->green)/9;
                dst->blue=(P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue+P3->blue+(P3+1)->blue+(P3+2)->blue)/9;
                dst++;
                P1++;
                P2++;
                P3++;
            }
            /* right edge: 6-point average */
            dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red+P3->red+(P3+1)->red)/6;
            dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green+P3->green+(P3+1)->green)/6;
            dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue+P3->blue+(P3+1)->blue)/6;
            dst++;
            P1+=2;
            P2+=2;
            P3+=2;
        }
        /* bottom-left corner: 4-point average */
        dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
        dst++;
        /* bottom edge: 6-point averages */
        for (i = 1; i < ...) {
            dst->red=(P1->red+(P1+1)->red+(P1+2)->red+P2->red+(P2+1)->red+(P2+2)->red)/6;
            dst->green=(P1->green+(P1+1)->green+(P1+2)->green+P2->green+(P2+1)->green+(P2+2)->green)/6;
            dst->blue=(P1->blue+(P1+1)->blue+(P1+2)->blue+P2->blue+(P2+1)->blue+(P2+2)->blue)/6;
            dst++;
            P1++;
            P2++;
        }
        /* bottom-right corner: 4-point average */
        dst->red=(P1->red+(P1+1)->red+P2->red+(P2+1)->red)>>2;
        dst->green=(P1->green+(P1+1)->green+P2->green+(P2+1)->green)>>2;
        dst->blue=(P1->blue+(P1+1)->blue+P2->blue+(P2+1)->blue)>>2;
    }

This code also speeds up the program by not calling avg. The work of smooth is split into three cases: first, the interior of the image, where each pixel is the average of 9 points; second, the four corners, where each pixel is the average of 4 points; third, the four edges, where each pixel is the average of 6 points. Processing starts at the top of the image, continues along the top edge, and then proceeds row by row. For each row, after the left-edge pixel is handled, a for loop processes the interior part of that row. That is the code shown above.

Version 3:

CPE analysis:

rotate function:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int dst_base=(dim-1)*dim;

        dst+=dst_base;
        for (i = 0; i < dim; i += 32) {
            for (j = 0; j < dim; j++) {
                /* copy 32 source pixels, stepping down one source row each time */
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;  *dst=*src; src+=dim; dst++;
                *dst=*src; src+=dim; dst++;
                *dst=*src;
                src++;
                src-=(dim<<5)-dim;
                dst-=31+dim;
            }
            dst+=dst_base+dim;
            dst+=32;
            src+=(dim ...

[...]

        ... >>2;
        dst[0].blue = (src[0].blue+src[1].blue+src[dim].blue+src[dim+1].blue)>>2;
        dst[0].green = (src[0].green+src[1].green+src[dim].green+src[dim+1].green)>>2;
        /* top-right corner */
        dst[dim-1].red = (src[dim-1].red+src[dim-2].red+src[dim*2-1].red+src[dim*2-2].red)>>2;
        dst[dim-1].blue = (src[dim-1].blue+src[dim-2].blue+src[dim*2-1].blue+src[dim*2-2].blue)>>2;
        dst[dim-1].green = (src[dim-1].green+src[dim-2].green+src[dim*2-1].green+src[dim*2-2].green)>>2;
        /* bottom-left corner */
        dst[dim*(dim-1)].red = (src[dim*(dim-1)].red+src[dim*(dim-1)+1].red+src[dim*(dim-2)].red+src[dim*(dim-2)+1].red)>>2;
        dst[dim*(dim-1)].blue = (src[dim*(dim-1)].blue+src[dim*(dim-1)+1].blue+src[dim*(dim-2)].blue+src[dim*(dim-2)+1].blue)>>2;
        dst[dim*(dim-1)].green = (src[dim*(dim-1)].green+src[dim*(dim-1)+1].green+src[dim*(dim-2)].green+src[dim*(dim-2)+1].green)>>2;
        /* bottom-right corner */
        dst[dim*dim-1].red = (src[dim*dim-1].red+src[dim*dim-2].red+src[dim*(dim-1)-1].red+src[dim*(dim-1)-2].red)>>2;
        dst[dim*dim-1].blue = (src[dim*dim-1].blue+src[dim*dim-2].blue+src[dim*(dim-1)-1].blue+src[dim*(dim-1)-2].blue)>>2;
        dst[dim*dim-1].green = (src[dim*dim-1].green+src[dim*dim-2].green+src[dim*(dim-1)-1].green+src[dim*(dim-1)-2].green)>>2;
        /* top edge */
        for (j = 1; j < dim-1; j++) {
            dst[j].red = (src[j].red+src[j-1].red+src[j+1].red+src[j+dim].red+src[j+1+dim].red+src[j-1+dim].red)/6;
            dst[j].green = (src[j].green+src[j-1].green+src[j+1].green+src[j+dim].green+src[j+1+dim].green+src[j-1+dim].green)/6;
            dst[j].blue = (src[j].blue+src[j-1].blue+src[j+1].blue+src[j+dim].blue+src[j+1+dim].blue+src[j-1+dim].blue)/6;
        }
        /* bottom edge */
        for (j = dim*(dim-1)+1; j < dim*dim-1; j++) {
            dst[j].red = (src[j].red+src[j-1].red+src[j+1].red+src[j-dim].red+src[j+1-dim].red+src[j-1-dim].red)/6;
            dst[j].green = (src[j].green+src[j-1].green+src[j+1].green+src[j-dim].green+src[j+1-dim].green+src[j-1-dim].green)/6;
            dst[j].blue = (src[j].blue+src[j-1].blue+src[j+1].blue+src[j-dim].blue+src[j+1-dim].blue+src[j-1-dim].blue)/6;
        }
        /* left edge */
        for (j = dim; j < dim*(dim-1); j+=dim) {
            dst[j].red = (src[j].red+src[j-dim].red+src[j+1].red+src[j+dim].red+src[j+1+dim].red+src[j-dim+1].red)/6;
            dst[j].green = (src[j].green+src[j-dim].green+src[j+1].green+src[j+dim].green+src[j+1+dim].green+src[j-dim+1].green)/6;
            dst[j].blue = (src[j].blue+src[j-dim].blue+src[j+1].blue+src[j+dim].blue+src[j+1+dim].blue+src[j-dim+1].blue)/6;
        }
        for (j = dim+dim-1; j < dim*dim-1; j+=d