
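Automation Major English Literature Translation 2

Southwest Jiaotong University Graduation Project (English Translation)
Design of Measurement and Control Nodes for an Energy-Saving Remote Temperature and Humidity Control System for Machine Rooms
June 2011

MOSIX

MOSIX modifies the BSDI BSD/OS to provide dynamic load balancing across a networked group of computers, so that a cluster can be used much like a scalable SMP.

mpf97, which includes a few comments about MMFP. Apparently, MMFP will support packing two 32-bit floating-point numbers into a 64-bit MMX register; combined with two MMFP pipelines, this will yield four single-precision FLOPs per clock.

SIMD or vector-style parallelism. The same operation is applied to all fields simultaneously. There are ways to nullify the effects on selected fields (equivalent to SIMD enable masking), but they complicate coding and hurt performance.

Localized, regular (preferably packed) memory reference patterns. SWAR in general, and MMX in particular, are terrible at randomly ordered accesses; gathering a vector x[y] (where y is an index array) is prohibitively expensive.

These are serious restrictions, but this type of parallelism occurs in many parallel algorithms, not just multimedia applications. For the right type of algorithm, SWAR is more effective than SMP or cluster parallelism, and it does not cost anything to use.

4.2 Introduction To SWAR Programming

The basic concept of SWAR, SIMD Within A Register, is that operations on word-length registers can be used to speed up computations by performing SIMD parallel operations on n k/n-bit field values. However, using SWAR technology can be awkward, and some SWAR operations are actually more expensive than the corresponding serial sequences because they need additional instructions to enforce the field partitioning.

To illustrate this point, consider a greatly simplified SWAR mechanism that manages four 8-bit fields within each 32-bit register. The values in two registers can be represented as:

             PE3     PE2     PE1     PE0
          +-------+-------+-------+-------+
    Reg0  | D 7:0 | C 7:0 | B 7:0 | A 7:0 |
          +-------+-------+-------+-------+
    Reg1  | H 7:0 | G 7:0 | F 7:0 | E 7:0 |
          +-------+-------+-------+-------+

This model indicates that each register is essentially viewed as a vector of four independent 8-bit integer values. Alternatively, think of A and E as the values in Reg0 and Reg1 of processing element 0 (PE0), B and F as the values in PE1's registers, and so on.

The remainder of this document briefly reviews the basic classes of SIMD parallel operations on these integer vectors and how these functions can be implemented.

Polymorphic Operations

Some SWAR operations can be performed trivially using ordinary 32-bit integer operations, without concern for the fact that the operation is really intended to work in parallel on these 8-bit fields. We call any such SWAR operation polymorphic, because the function is unaffected by the field types (sizes).

Testing whether any field is non-zero is polymorphic, as are all bitwise logic operations. For example, an ordinary bitwise-and operation (C's & operator) performs a bitwise and no matter what the field sizes are. A simple bitwise and of the registers above yields:

              PE3       PE2       PE1       PE0
          +---------+---------+---------+---------+
    Reg2  | D&H 7:0 | C&G 7:0 | B&F 7:0 | A&E 7:0 |
          +---------+---------+---------+---------+

Because each result bit k of a bitwise and depends only on bit k of the operands, all field sizes are supported by the same single instruction.

Partitioned Operations

Unfortunately, many important SWAR operations are not polymorphic. Arithmetic operations such as add, subtract, multiply, and divide are all subject to carry/borrow interactions between fields. We call such SWAR operations partitioned, because each one must effectively partition the operands and result to prevent interactions between fields. There are actually three different methods that can be used to achieve this effect.

Partitioned Instructions

Perhaps the most obvious approach is to provide hardware support for "partitioned parallel instructions" that cut the carry/borrow logic between fields. This approach can yield the highest performance, but it requires a change to the processor's instruction set and generally places many restrictions on field size (for example, 8-bit fields might be supported, but not 12-bit fields).

The AMD/Cyrix/Intel MMX, Digital MAX, HP MAX, and Sun VIS all implement restricted versions of partitioned instructions. Unfortunately, these instruction set extensions have significantly different restrictions, which makes algorithms somewhat non-portable between them. For example, consider the following sampling of partitioned operations:

      Instruction           AMD/Cyrix/Intel MMX   DEC MAX   HP MAX   Sun VIS
    +---------------------+---------------------+---------+--------+---------+
    | Absolute Difference |                     |       8 |        |       8 |
    +---------------------+---------------------+---------+--------+---------+
    | Merge Maximum       |                     |   8, 16 |        |         |
    +---------------------+---------------------+---------+--------+---------+
    | Compare             |           8, 16, 32 |         |        |  16, 32 |
    +---------------------+---------------------+---------+--------+---------+
    | Multiply            |                  16 |         |        |    8x16 |
    +---------------------+---------------------+---------+--------+---------+
    | Add                 |           8, 16, 32 |         |     16 |  16, 32 |
    +---------------------+---------------------+---------+--------+---------+

In the table, the numbers are the field sizes, in bits, for which each operation is supported. The direct result of these differences is that high-level languages (HLLs) are not very effective as programming models, and portability is generally poor.

Unpartitioned Operations With Correction Code

What do you do if the partitioned operation you need is not supported by the hardware? The answer is to use a series of ordinary instructions to perform the operation with carry/borrow across fields, and then correct for the undesired field interactions.

In fact, by expressing the code sequences in a language like C, this approach allows SWAR programs to be fully portable.

The question immediately arises: precisely how inefficient is it to simulate SWAR partitioned operations using unpartitioned operations with correction code? That is certainly the $64k question, but many operations are not as difficult as one might expect.

Consider implementing a four-element 8-bit integer vector add of two source vectors using ordinary 32-bit operations.

An ordinary 32-bit add might actually yield the correct result, but not if any 8-bit field carries into the next field. Because adding two k-bit fields generates a result of at most k+1 bits, we can ensure that no carry occurs by masking out the most significant bit of each field. This is done by bitwise anding each operand with 0x7f7f7f7f and then performing an ordinary 32-bit add:

    t = ((x & 0x7f7f7f7f) + (y & 0x7f7f7f7f));

That result is correct except for the most significant bit within each field. Computing the correct value is simply a matter of doing two 1-bit partitioned adds of the most significant bits from x and y to the 7-bit carry result computed for t. Fortunately, a 1-bit partitioned add is implemented by an ordinary exclusive or operation. Thus, the result is simply:

    (t ^ ((x ^ y) & 0x80808080))

That is six operations to do just four adds.

Controlling Field Values

Instead of maximizing the use of register space, it can be computationally more efficient to control the field values so that inter-field carry/borrow events never occur. If we know that no field will overflow, a partitioned add can be implemented using an ordinary add instruction without correction code. The question thus becomes how to ensure that field values will not cause carry/borrow events.

One way to ensure this property is to implement partitioned instructions that restrict the range of field values. The Digital MAX vector minimum and maximum instructions can be viewed as hardware support for avoiding inter-field carry/borrow.

Without such instructions, is there a sufficient condition that can be cheaply imposed to ensure that carry/borrow events will not interfere with adjacent fields? The answer lies in analysis of the arithmetic properties. Adding two k-bit numbers generates a result of at most k+1 bits; thus, a field of k+1 bits can safely contain such an operation despite using ordinary instructions.

Thus, suppose that the 8-bit fields in the earlier example are now 7-bit fields with 1-bit "carry/borrow spacers":

               PE3          PE2          PE1          PE0
          +----+-------+----+-------+----+-------+----+-------+
    Reg0  | D' | D 6:0 | C' | C 6:0 | B' | B 6:0 | A' | A 6:0 |
          +----+-------+----+-------+----+-------+----+-------+

Assume that, before any partitioned operation, all the spacer bits (A', B', C', and D') have the value 0. Simply executing an ordinary add gives all the fields their correct 7-bit values; however, some spacer bits might now be 1. The 7-bit integer vector add of x and y is thus:

    ((x + y) & 0x7f7f7f7f)

This takes just two operations for four adds, clearly a good result.

Setting the spacer bits to 0 does not work for subtraction. To compute x-y, we ensure that the spacer bits in x are all 1 while the spacer bits in y are all 0. In the worst case, we get:

    (((x | 0x80808080) - y) & 0x7f7f7f7f)

The additional bitwise or can often be optimized out by ensuring that the operation generating x used | 0x80808080 rather than & 0x7f7f7f7f as its last step.

Which method should be used for SWAR partitioned operations? The answer is simply "whichever yields the best speedup." The ideal method may differ for different field sizes.

Communication & Type Conversion Operations

Although some parallel computations have the property that the ith value in a vector is a function only of values that appear in the ith position of the operand vectors, this is generally not the case. Operations such as smoothing need values from adjacent pixels, and transformations like FFTs require more complex (less localized) communication patterns.

It is not difficult to efficiently implement 1-dimensional nearest neighbor communication for SWAR using unpartitioned shift operations. For example, to move a value from PE(i) to PE(i+1) when the fields are 8 bits long, we would use:

    (x << 8)

To move a value from PE(i) to PE(i-1), a simple shift operation might suffice, but the C language does not specify whether right shifts preserve the sign bit. Thus, in the general case, we must explicitly zero the potentially replicated sign bits:

    ((x >> 8) & 0x00ffffff)

To move a value from PE(i) to PE(i+1) with wraparound:

    ((x << 8) | ((x >> 24) & 0x000000ff))

Only the HP MAX instruction set supports arbitrary rearrangement of fields with a single instruction, called Permute. Not only can it perform an arbitrary permutation of the fields, it also allows repetition. In short, it implements an arbitrary x[y] operation.

Unfortunately, x[y] is very difficult to implement without such an instruction. The relatively high speed of x[y] operations in the MasPar MP1/MP2 and Thinking Machines CM1/CM2/CM200 SIMD supercomputers was one of the key reasons these machines performed well. However, x[y] has always been slower than nearest neighbor communication, even on those supercomputers, so many algorithms have been designed to minimize the need for x[y] operations. In short, without hardware support, it is best to develop SWAR algorithms as though x[y] was not legal, or at least is not cheap.

Recurrence Operations (Reductions, Scans, etc.)

If a recurrence involves associative operations, it may be possible to recode the computation using a tree-structured parallel algorithm.

For example, to compute the sum of a vector's values, purely sequential C code looks like:

    t = 0;
    for (i = 0; i < MAX; ++i) t += x[i];

For a vector of four 8-bit values, the first step does two 8-bit adds, yielding two 16-bit fields (each containing a 9-bit result):

    t = ((x & 0x00ff00ff) + ((x >> 8) & 0x00ff00ff));

The second step adds these two 9-bit values in 16-bit fields to produce a single 10-bit result:

    ((t + (t >> 16)) & 0x000003ff)

The second step actually performs two 16-bit field adds, but the top 16-bit add is meaningless, which is why the result is masked to a single 10-bit value.

Scans can be implemented using a fairly obvious sequence of partitioned operations.

4.3 MMX SWAR Under Linux

For Linux, IA32 processors are our primary concern. The good news is that AMD, Cyrix, and Intel all implement the same MMX instructions. However, MMX performance varies; for example, the K6 has only one MMX pipeline. The only really bad news is that Intel is still running those stupid MMX commercials....

There are really three approaches to using MMX for SWAR:

1. Use routines from an MMX library. In particular, Intel has developed several "performance libraries," .tools/, that offer a variety of hand-optimized routines for common multimedia tasks. With a little effort, many algorithms can be reworked to use one or more of these library routines. These libraries are not currently available for Linux, but could be ported.

2. Use MMX instructions directly. The first problem is that MMX might not be available on the processor, so an alternative implementation must also be provided. The second problem is that the IA32 assembler generally used under Linux does not currently recognize MMX instructions.

3. Use a high-level language or module compiler that can directly generate appropriate MMX instructions. No such tool is yet fully functional under Linux. For example, at ./~hankd/SWAR/ a compiler is being developed that takes functions written in an explicitly parallel C dialect and generates SWAR modules that are callable as C functions, making use of whatever SWAR support is available, including MMX. The first prototypes were built in Fall 1996; however, bringing this technology to a usable state is taking much longer than originally expected.

In summary, MMX SWAR is still awkward to use, but the second approach can be used now. Here are the basics:

1. You cannot use MMX if your processor does not support it. The following GCC code can be used to test whether MMX is supported. It returns 0 if not, and non-zero if it is supported.

    inline extern
    int mmx_init(void)
    {
        int mmx_available;

        __asm__ __volatile__ (
            /* Get CPU version information */
            "movl $1, %%eax\n\t"
            "cpuid\n\t"
            "andl $0x800000, %%edx\n\t"
            "movl %%edx, %0"
            : "=q" (mmx_available)
            : /* no input */
        );
        return mmx_available;
    }

2. An MMX register essentially holds one of what GCC would call an unsigned long long. Thus, memory-based variables of this type become the communication mechanism between MMX modules and C programs. Alternatively, you can declare your MMX data as any 64-bit aligned data structure.

3. If MMX is available, you can encode each MMX instruction using the .byte assembler directive. For example, the MMX instruction PADDB MM0,MM1 could be encoded as the following GCC in-line assembly code:

    __asm__ __volatile__ (".byte 0x0f, 0xfc, 0xc1\n\t");

Remember that MMX uses some of the same hardware that is used for floating point operations, so code intermixed with MMX code must not invoke any floating point operations. The floating point stack should also be empty before executing any MMX code; it is normally empty at the beginning of a C function that does not use floating point.

4. Exit your MMX code by executing the EMMS instruction, which can be encoded as:

    __asm__ __volatile__ (".byte 0x0f, 0x77\n\t");

If the above looks very awkward and crude, it is. However, MMX is still quite young.... Future versions of this document will offer better ways to program MMX SWAR.

5. Linux-Hosted Attached Processors

The problem with using Linux to host an attached processor is that very little software support is available; you are pretty much on your own.

5.1 A Linux PC Is A Good Host

In general, attached parallel processors tend to be specialized to perform specific types of functions.

Although it may be difficult to get a Linux PC to appropriately host a particular system, a Linux PC is one of the few platforms well suited to this type of use.

PCs make a good host for two primary reasons. The first is the cheap and easy expansion capability. The second is the ease of interfacing: ISA and PCI prototyping cards are available, and a non-invasive interface with reasonable performance is a further advantage. The separate I/O space of IA32 also facilitates interfacing by providing hardware I/O address protection at the level of individual I/O port addresses.

Linux also makes a good host OS. The free availability of full source code, and extensive "hacking" guides, obviously are a huge help. However, Linux also provides good near-real-time scheduling, and there is even a true real-time version of Linux (RTLinux). Perhaps even more important, while providing a full UNIX environment, Linux can support development tools that were written to run under DOS and/or Windows. MSDOS programs can execute within Linux using dosemu, which provides a protected virtual machine that can literally run MSDOS. Linux support for Windows 3.xx programs is even more direct, using free software such as wine.

1. Several DSP families, including the TMS320 and the Analog Devices SHARC, are designed so that parallel machines can be built with little or no "glue" logic.

2. They are cheap, especially per MIP or MFLOP. It is not unheard of for a DSP processor to cost one tenth as much as a PC processor.

3. A group of these chips can be powered entirely by a conventional PC's power supply, and enclosing them in your PC's case will not turn it into an oven.

4. Most DSP instruction sets contain strange-looking things that high-level (e.g., C) compilers are unlikely to use well, for example "bit reverse" addressing. With an attached parallel system, the most time-consuming algorithms can run on the DSPs as carefully hand-tuned code.

5. These DSP processors are not really designed to run a UNIX-like OS. They work best when hosted by a more general-purpose machine, such as a Linux PC.

Although some audio cards and modems include DSP processors, the big payoff comes from using an attached parallel system that has four or more DSP processors.

Because the Texas Instruments TMS320 series has been popular for a long time, and it is trivial to construct a TMS320-based parallel processor, quite a few such systems are available. There are both integer-only and floating-point capable versions of the TMS320; older designs used an unusual single-precision floating-point format, but newer models support IEEE formats. The older TMS320C4x (aka 'C4x) achieves up to 80 MFLOPS using the TI format; a 'C67x will provide up to 1 GFLOPS single-precision or 420 MFLOPS double-precision for IEEE floating point, using a VLIW chip architecture called VelociTI. In a single chip, the 'C8x multiprocessor will provide a 100 MFLOPS IEEE floating-point RISC master processor along with either two or four integer slave DSPs.

The other DSP family is the SHARC (aka ADSP-2106x) from Analog Devices. These chips can be configured as a six-processor shared memory multiprocessor, and larger systems can be built using six 4-bit links per chip. Most of the larger systems seem targeted at military applications and are a bit pricey. However, there is a PCI card set called GreenICE. This unit contains an array of 16 SHARC processors and is capable of a peak speed of about 1.9 GFLOPS using a single-precision IEEE format. GreenICE costs less than $5,000.

DSP processors really deserve a lot more attention from the Linux parallel processing community.

5.3 FPGAs And Reconfigurable Logic Computing

Custom hardware has traditionally drawn objections, among them that it becomes useless when the algorithm changes. However, recent advances in electrically reprogrammable FPGAs (Field Programmable Gate Arrays) have nullified those objections. Gate density is now high enough that an entire simple processor can be built within a single FPGA, and the time to reconfigure (reprogram) an FPGA has also been dropping.

You will have to work with VHDL for the FPGA configuration, as well as write low-level code to interface to programs on the Linux host system. However, the cost of FPGAs is low, and especially for algorithms operating on low-precision integer data (actually, a small superset of what SWAR is good at), FPGAs can perform complex operations just about as fast as you can feed them data. For example, simple FPGA-based systems have yielded better-than-supercomputer times for searching gene databases.

There are other companies making appropriate FPGA-based hardware, but the following two companies represent a good sample.

Virtual Computer Company offers a variety of products using dynamically reconfigurable SRAM-based Xilinx FPGAs. Their 8/16 bit "Virtual ISA Proto Board" is less than $2,000.

The Altera ARC-PCI (Altera Reconfigurable Computer, PCI bus), /pressrel/pr_arc-, uses Altera FPGAs and a PCI bus rather than an ISA bus.

Many of the tools come as object code that runs only under Windows and/or DOS. You could simply keep a disk partition with DOS/Windows on your host PC and reboot whenever you need to use them; however, many of these software packages may work under Linux using dosemu or Windows emulators like wine.

6. Of General Interest

The material covered in this section applies to all four parallel processing models for Linux.

6.1 Programming Languages And Compilers

For Linux, parallel operations expressed within C code are generally compiled by GCC.

The following language/compiler projects represent some of the best efforts toward producing reasonably efficient code from high-level languages. None will make you stop writing C programs to compile with GCC, but used as intended they reward you with shorter development times, easier debugging and maintenance, etc.

There is also a list of freely available compilers (most of which have nothing to do with Linux parallel processing).

Fortran 66, 77, PCF, 90, HPF, 95

There will always be Fortran. Of course, Fortran no longer means the same thing it did in the 1966 ANSI standard. Fortran 66 was pretty simple. Fortran 77 added features, notably for character data and a change to loops. PCF (Parallel Computing Forum) Fortran attempted to add a variety of parallel processing support features to 77.

It is not clear whether these compilers will work for SMP Linux, but it should be possible given that standard POSIX threads (i.e., LinuxThreads) work under SMP Linux.

The Portland Group has commercial parallelizing HPF Fortran (and C, C++) compilers that generate code for SMP Linux, as well as a version targeting clusters using MPI or PVM. The FORGE/xHPF products might also be useful for SMPs or clusters.

Freely available parallelized Fortrans that might work with parallel Linux systems include:

1. ADAPTOR (Automatic DAta Parallelism TranslaTOR).

Jade and SAM

Jade is a parallel programming language that extends C to exploit concurrency in sequential, imperative programs. SAM implements its memory model for workstation clusters using PVM. More information is available at .edu/~scales/.

Mentat and Legion

Mentat is an object-oriented parallel processing system that works with workstation clusters and has been ported to Linux. The Mentat Programming Language (MPL) is an object-oriented programming language based on C++. The Mentat run-time system uses something resembling non-blocking remote procedure calls.

MPL

The MPL compiler was built using GCC. MasPar's MPL has been retargeted to generate C code with AFAPI calls, and thus can run on Linux SMPs.

Parallaxis-III

Parallaxis-III is a structured programming language that extends Modula-2 for data parallelism (a SIMD model). The Parallaxis software includes a debugger based on gdb and xgbd. It runs on Linux systems, and can target PVM.

The CPU WWW site has a comprehensive technical overview of PC hardware. The Linux Benchmarking HOWTO is a good place to start.

The Intel IA32 processors have many special registers that can be used to measure the performance of a running system in exquisite detail. Intel VTune uses the performance registers extensively in a very complete code-tuning system which, unfortunately, does not work under Linux. A loadable module device driver, and library routines, for accessing the Pentium performance registers are available.

mpf97, that includes a few comments about MMFP. Apparently, MMFP will support two 32-bit floating-point numbers to be packed into a 64-bit MMX register; combining this with two MMFP pipelines will yield four single-precision FLOPs per clock.

SIMD or vector-style parallelism. The same operation is applied to all fields simultaneously. There are ways to nullify the effects on selected fields (i.e., equivalent to SIMD enable masking), but they complicate coding and hurt performance.

Localized,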
regular (preferably packed), memory reference patterns. SWAR in general, and MMX in particular, are terrible at randomly-ordered accesses; gathering a vector x[y] (where y is an index array) is prohibitively expensive.

These are serious restrictions, but this type of parallelism occurs in many parallel algorithms - not just multimedia applications. For the right type of algorithm, SWAR is more effective than SMP or cluster parallelism... and it doesn't cost anything to use it.

4.2 Introduction To SWAR Programming

The basic concept of SWAR, SIMD Within A Register, is that operations on word-length registers can be used to speed up computations by performing SIMD parallel operations on n k/n-bit field values. However, making use of SWAR technology can be awkward, and some SWAR operations are actually more expensive than the corresponding sequences of serial operations because they require additional instructions to enforce the field partitioning.

To illustrate this point, let's consider a greatly simplified SWAR mechanism that manages four 8-bit fields within each 32-bit register. The values in two registers might be represented as:

             PE3     PE2     PE1     PE0
          +-------+-------+-------+-------+
    Reg0  | D 7:0 | C 7:0 | B 7:0 | A 7:0 |
          +-------+-------+-------+-------+
    Reg1  | H 7:0 | G 7:0 | F 7:0 | E 7:0 |
          +-------+-------+-------+-------+

This simply indicates that each register is viewed as essentially a vector of four independent 8-bit integer values. Alternatively, think of A and E as values in Reg0 and Reg1 of processing element 0 (PE0), B and F as values in PE1's registers, and so forth.
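The register layout above can be made concrete with a small sketch. The helper names swar_pack8 and swar_field8 are hypothetical, chosen only for illustration; the sketch assumes the layout described in the text, with PE0 in the least significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four 8-bit values into one 32-bit word.
   a lands in PE0 (bits 7:0) and d in PE3 (bits 31:24). */
uint32_t swar_pack8(uint8_t d, uint8_t c, uint8_t b, uint8_t a)
{
    return ((uint32_t)d << 24) | ((uint32_t)c << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Hypothetical helper: read back the 8-bit field held by PE number pe. */
uint8_t swar_field8(uint32_t reg, unsigned pe)
{
    assert(pe < 4);                     /* only PE0..PE3 exist */
    return (uint8_t)(reg >> (8 * pe));
}
```

With these helpers, Reg0 in the diagram would be built as swar_pack8(D, C, B, A), and swar_field8(Reg0, 1) would read back B.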

The remainder of this document briefly reviews the basic classes of SIMD parallel operations on these integer vectors and how these functions can be implemented.

Polymorphic Operations

Some SWAR operations can be performed trivially using ordinary 32-bit integer operations, without concern for the fact that the operation is really intended to operate independently in parallel on these 8-bit fields. We call any such SWAR operation polymorphic, since the function is unaffected by the field types (sizes).

Testing if any field is non-zero is polymorphic, as are all bitwise logic operations. For example, an ordinary bitwise-and operation (C's & operator) performs a bitwise and no matter what the field sizes are. A simple bitwise and of the above registers yields:

              PE3       PE2       PE1       PE0
          +---------+---------+---------+---------+
    Reg2  | D&H 7:0 | C&G 7:0 | B&F 7:0 | A&E 7:0 |
          +---------+---------+---------+---------+

Because the bitwise and operation always has the value of result bit k affected only by the values of the operand bit k values, all field sizes are supported using the same single instruction.

Partitioned Operations

Unfortunately, lots of important SWAR operations are not polymorphic. Arithmetic operations such as add, subtract, multiply, and divide are all subject to carry/borrow interactions between fields. We call such SWAR operations partitioned, because each such operation must effectively partition the operands and result to prevent interactions between fields. However, there are actually three different methods that can be used to achieve this effect.

Partitioned Instructions

Perhaps the most obvious approach to implementing partitioned operations is to provide hardware support for "partitioned parallel instructions" that cut the carry/borrow logic between fields. This approach can yield the highest performance, but it requires a change to the processor's instruction set and generally places many restrictions on field size (e.g., 8-bit fields might be supported, but not 12-bit fields).

The AMD/Cyrix/Intel MMX, Digital MAX, HP MAX, and Sun VIS all implement restricted versions of partitioned instructions. Unfortunately, these different instruction set extensions have significantly different restrictions, making algorithms somewhat non-portable between them. For example, consider the following sampling of partitioned operations:

      Instruction           AMD/Cyrix/Intel MMX   DEC MAX   HP MAX   Sun VIS
    +---------------------+---------------------+---------+--------+---------+
    | Absolute Difference |                     |       8 |        |       8 |
    +---------------------+---------------------+---------+--------+---------+
    | Merge Maximum       |                     |   8, 16 |        |         |
    +---------------------+---------------------+---------+--------+---------+
    | Compare             |           8, 16, 32 |         |        |  16, 32 |
    +---------------------+---------------------+---------+--------+---------+
    | Multiply            |                  16 |         |        |    8x16 |
    +---------------------+---------------------+---------+--------+---------+
    | Add                 |           8, 16, 32 |         |     16 |  16, 32 |
    +---------------------+---------------------+---------+--------+---------+

In the table, the numbers indicate the field sizes, in bits, for which each operation is supported. Even though the table omits many instructions including all the more exotic ones, it is clear that there are many differences. The direct result is that high-level languages (HLLs) really are not very effective as programming models, and portability is generally poor.

Unpartitioned Operations With Correction Code

Implementing partitioned operations using partitioned instructions can certainly be efficient, but what do you do if the partitioned operation you need is not supported by the hardware? The answer is that you use a series of ordinary instructions to perform the operation with carry/borrow across fields, and then correct for the undesired field interactions.

This is a purely software approach, and the corrections do introduce overhead, but it works with fully general field partitioning. This approach is also fully general in that it can be used either to fill gaps in the hardware support for partitioned instructions, or it can be used to provide full functionality for target machines that have no hardware support at all. In fact, by expressing the code sequences in a language like C, this approach allows SWAR programs to be fully portable.

The question immediately arises: precisely how inefficient is it to simulate SWAR partitioned operations using unpartitioned operations with correction code? Well, that is certainly the $64k question... but many operations are not as difficult as one might expect.

Consider implementing a four-element 8-bit integer vector add of two source vectors, x+y, using ordinary 32-bit operations.

An ordinary 32-bit add might actually yield the correct result, but not if any 8-bit field carries into the next field. Thus, our goal is simply to ensure that such a carry does not occur. Because adding two k-bit fields generates an at most k+1 bit result, we can ensure that no carry occurs by simply "masking out" the most significant bit of each field. This is done by bitwise anding each operand with 0x7f7f7f7f and then performing an ordinary 32-bit add.

    t = ((x & 0x7f7f7f7f) + (y & 0x7f7f7f7f));

That result is correct... except for the most significant bit within each field. Computing the correct value for each field is simply a matter of doing two 1-bit partitioned adds of the most significant bits from x and y to the 7-bit carry result which was computed for t. Fortunately, a 1-bit partitioned add is implemented by an ordinary exclusive or operation. Thus, the result is simply:

    (t ^ ((x ^ y) & 0x80808080))

Ok, well, maybe that isn't so simple. After all, it is six operations to do just four adds. However, notice that the number of operations is not a function of how many fields there are... so, with more fields, we get speedup. In fact, we may get speedup anyway simply because the fields were loaded and stored in a single (integer vector) operation, register availability may be improved, and there are fewer dynamic code scheduling dependencies (because partial word references are avoided).

Controlling Field Values

While the other two approaches to partitioned operation implementation both center on getting the maximum space utilization for the registers, it can be computationally more efficient to instead control the field values so that inter-field carry/borrow events should never occur. For example, if we know that all the field values being added are such that no field overflow will occur, a partitioned add operation can be implemented using an ordinary add instruction; in fact, given this constraint, an ordinary add instruction appears polymorphic, and is usable for any field sizes without correction code. The question thus becomes how to ensure that field values will not cause carry/borrow events.

One way to ensure this property is to implement partitioned instructions that can restrict the range of field values. The Digital MAX vector minimum and maximum instructions can be viewed as hardware support for clipping field values to avoid inter-field carry/borrow.

However, suppose that we do not have partitioned instructions that can efficiently restrict the range of field values... is there a sufficient condition that can be cheaply imposed to ensure carry/borrow events will not interfere with adjacent fields? The answer lies in analysis of the arithmetic properties. Adding two k-bit numbers generates a result with at most k+1 bits; thus, a field of k+1 bits can safely contain such an operation despite using ordinary instructions.

Thus, suppose that the 8-bit fields in our earlier example are now 7-bit fields with 1-bit "carry/borrow spacers":

               PE3          PE2          PE1          PE0
          +----+-------+----+-------+----+-------+----+-------+
    Reg0  | D' | D 6:0 | C' | C 6:0 | B' | B 6:0 | A' | A 6:0 |
          +----+-------+----+-------+----+-------+----+-------+

A vector of 7-bit adds is performed as follows. Let us assume that, prior to the start of any partitioned operation, all the carry spacer bits (A', B', C', and D') have the value 0. By simply executing an ordinary add operation, all the fields obtain the correct 7-bit values; however, some spacer bit values might now be 1. We can correct this by just one more conventional operation, masking-out the spacer bits. Our 7-bit integer vector add, x+y, is thus:

    ((x + y) & 0x7f7f7f7f)

This is just two instructions for four adds, clearly yielding good speedup.

The sharp reader may have noticed that setting the spacer bits to 0 does not work for subtract operations. The correction is, however, remarkably simple. To compute x-y, we simply ensure the initial condition that the spacers in x are all 1, while the spacers in y are all 0. In the worst case, we would thus get:

    (((x | 0x80808080) - y) & 0x7f7f7f7f)

However, the additional bitwise or operation can often be optimized out by ensuring that the operation generating the value for x used | 0x80808080 rather than & 0x7f7f7f7f as the last step.

Which method should be used for SWAR partitioned operations? The answer is simply "whichever yields the best speedup." Interestingly, the ideal method to use may be different for different field sizes within the same program running on the same machine.
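The two software techniques described above can be sketched as ordinary C functions. This is a minimal illustration, not a tuned implementation: the function names and the LOW7/MSB macros are hypothetical, and the top-bit mask is assumed to be 0x80808080, the complement of 0x7f7f7f7f.

```c
#include <assert.h>
#include <stdint.h>

#define LOW7 0x7f7f7f7fu   /* low 7 bits of each 8-bit field */
#define MSB  0x80808080u   /* most significant bit of each 8-bit field */

/* Four 8-bit wrap-around adds: unpartitioned add plus correction code. */
uint32_t swar_add8(uint32_t x, uint32_t y)
{
    uint32_t t = (x & LOW7) + (y & LOW7);  /* carries cannot leave a field */
    return t ^ ((x ^ y) & MSB);            /* 1-bit add of the top bits */
}

/* Four 7-bit adds with spacer bits; the spacers of x and y must be 0. */
uint32_t swar_add7(uint32_t x, uint32_t y)
{
    assert(((x | y) & MSB) == 0);
    return (x + y) & LOW7;                 /* clear spacers that became 1 */
}

/* Four 7-bit subtracts; the spacers of x are set to 1 to absorb borrows. */
uint32_t swar_sub7(uint32_t x, uint32_t y)
{
    assert(((x | y) & MSB) == 0);
    return ((x | MSB) - y) & LOW7;
}
```

For instance, swar_add8(0xFF01807F, 0x01FF8001) gives 0x00000080: three fields wrap around to 0 and PE0 holds 0x7F + 0x01 = 0x80, with no carry leaking into a neighboring field.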

Communication & Type Conversion Operations

Although some parallel computations, including many operations on image pixels, have the property that the ith value in a vector is a function only of values that appear in the ith position of the operand vectors, this is generally not the case. For example, even pixel operations such as smoothing require values from adjacent pixels as operands, and transformations like FFTs require more complex (less localized) communication patterns.

It is not difficult to efficiently implement 1-dimensional nearest neighbor communication for SWAR using unpartitioned shift operations. For example, to move a value from PE i to PE (i+1), a simple shift operation suffices. If the fields are 8-bits in length, we would use:

    (x << 8)

Still, it isn't always quite that simple. For example, to move a value from PE i to PE (i-1), a simple shift operation might suffice... but the C language does not specify if shifts right preserve the sign bit, and some machines only provide signed shift right. Thus, in the general case, we must explicitly zero the potentially replicated sign bits:

    ((x >> 8) & 0x00ffffff)

Adding "wrap-around connections" is also reasonably efficient using unpartitioned shifts. For example, to move a value from PE i to PE (i+1) with wraparound:

    ((x << 8) | ((x >> 24) & 0x000000ff))

The real problem comes when more general communication patterns must be implemented. Only the HP MAX instruction set supports arbitrary rearrangement of fields with a single instruction, which is called Permute. This Permute instruction is really misnamed; not only can it perform an arbitrary permutation of the fields, but it also allows repetition. In short, it implements an arbitrary x[y] operation.

Unfortunately, x[y] is very difficult to implement without such an instruction. The code sequence is generally both long and inefficient; in fact, it is sequential code. This is very disappointing. The relatively high speed of x[y] operations in the MasPar MP1/MP2 and Thinking Machines CM1/CM2/CM200 SIMD supercomputers was one of the key reasons these machines performed well. However, x[y] has always been slower than nearest neighbor communication, even on those supercomputers, so many algorithms have been designed to minimize the need for x[y] operations. In short, without hardware support, it is probably best to develop SWAR algorithms as though x[y] wasn't legal... or at least isn't cheap.

Recurrence Operations (Reductions, Scans, etc.)

A recurrence is a computation in which there is an apparently sequential relationship between values being computed. However, if these recurrences involve associative operations, it may be possible to recode the computation using a tree-structured parallel algorithm.
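As a minimal sketch of the tree-structured idea, the following function sums an array by repeatedly adding pairs of partial sums. The name tree_sum and the in-place scratch strategy are assumptions made for illustration; the only property relied on is that unsigned wrap-around addition is associative, so regrouping the adds does not change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum n values in about log2(n) passes.
   The array x is overwritten and used as scratch space. */
uint32_t tree_sum(uint32_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    if (n == 0)
        return 0;
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Every add in one pass is independent of the others in that
           pass, which is what makes the pass parallelizable. */
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            x[i] += x[i + stride];
    }
    return x[0];
}
```

Each pass halves the number of partial sums, so four values need two passes, the same shape as a two-step SWAR reduction over four fields.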

The most common type of parallelizable recurrence is probably the class known as associative reductions. For example, to compute the sum of a vector's values, one commonly writes purely sequential C code like:

    t = 0;
    for (i = 0; i < MAX; ++i) t += x[i];

However, the order of the additions is rarely important. Floating point and saturation math can yield different answers if the order of additions is changed, but ordinary wrap-around integer additions will yield the same results independent of addition order. Thus, we can re-write this sequence into a tree-structured parallel summation in which we first add pairs of values, then pairs of those partial sums, and so forth, until a single final sum results. For a vector of four 8-bit values, just two addition steps are needed; the first step does two 8-bit adds, yielding two 16-bit result fields (each containing a 9-bit result):

    t = ((x & 0x00ff00ff) + ((x >> 8) & 0x00ff00ff));

The second step adds these two 9-bit values in 16-bit fields to produce a single 10-bit result:

    ((t + (t >> 16)) & 0x000003ff)

Actually, the second step performs two 16-bit field adds... but the top 16-bit add is meaningless, which is why the result is masked to a single 10-bit result value.

Scans, also known as "parallel prefix" operations, are somewhat harder to implement efficiently. This is because, unlike reductions, scans produce partitioned results. For this reason, scans can be implemented using a fairly obvious sequence of partitioned operations.

4.3 MMX SWAR Under Linux

For Linux, IA32 processors are our primary concern. The good news is that AMD, Cyrix, and Intel all implement the same MMX instructions. However, MMX performance varies; for example, the K6 has only one MMX pipeline - the Pentium with MMX has two. The only really bad news is that Intel is still running those stupid MMX commercials.... ;-)

There are really three approaches to using MMX for SWAR:

1. Use routines from an MMX library. In particular, Intel has developed several "performance libraries," .tools/, that offer a variety of hand-optimized routines for common multimedia tasks. With a little effort, many non-multimedia algorithms can be reworked to enable some of the most compute-intensive portions to be implemented using one or more of these library routines. These libraries are not currently available for Linux, but could be ported.

2. Use MMX instructions directly. This is somewhat complicated by two facts. The first problem is that MMX might not be available on the processor, so an alternative implementation must also be provided. The second problem is that the IA32 assembler generally used under Linux does not currently recognize MMX instructions.

3. Use a high-level language or module compiler that can directly generate appropriate MMX instructions. Such tools are currently under development, but none is yet fully functional under Linux. For example, at Purdue University (./~hankd) we are currently developing a compiler that will take functions written in an explicitly parallel C dialect and will generate SWAR modules that are callable as C functions, yet make use of whatever SWAR support is available, including MMX. The first prototype module compilers were built in Fall 1996, however, bringing this technology to a usable state is taking much longer than was originally expected.

In summary, MMX SWAR is still awkward to use. However, with a little extra effort, the second approach given above can be used now. Here are the basics:

1. You cannot use MMX if your processor does not support it. The following GCC code can be used to test if MMX is supported on your processor. It returns 0 if not, non-zero if it is supported.

    inline extern
    int mmx_init(void)
    {
        int mmx_available;

        __asm__ __volatile__ (
            /* Get CPU version information */
            "movl $1, %%eax\n\t"
            "cpuid\n\t"
            "andl $0x800000, %%edx\n\t"
            "movl %%edx, %0"
            : "=q" (mmx_available)
            : /* no input */
        );
        return mmx_available;
    }

2. An MMX register essentially holds one of what GCC would call an unsigned long long. Thus, memory-based variables of this type become the communication mechanism between your MMX modules and the C programs that call them. Alternatively, you can declare your MMX data as any 64-bit aligned data structure (it is convenient to ensure 64-bit alignment by declaring your data type as a union with an unsigned long long field).

3. If MMX is available, you can write your MMX code using the .byte assembler directive to encode each instruction. This is painful stuff to do by hand, but not difficult for a compiler to generate. For example, the MMX instruction PADDB MM0,MM1 could be encoded as the GCC in-line assembly code:

    __asm__ __volatile__ (".byte 0x0f, 0xfc, 0xc1\n\t");

Remember that MMX uses some of the same hardware that is used for floating point operations, so code intermixed with MMX code must not invoke any floating point operations. The floating point stack also should be empty before executing any MMX code; the floating point stack is normally empty at the beginning of a C function that does not use floating point.

4. Exit your MMX code by executing the EMMS instruction, which can be encoded as:

    __asm__ __volatile__ (".byte 0x0f, 0x77\n\t");

If the above looks very awkward and crude, it is. However, MMX is still quite young.... future versions of this document will offer better ways to program MMX SWAR.

5. Linux-Hosted Attached Processors

Although this approach has recently fallen out of favor, it is virtually impossible for oth
