外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究

上傳人：r*** IP屬地：湖北上傳時(shí)間：2024-10-28 格式：DOC 頁(yè)數(shù)：19 大?。?47.50KB 積分：18 舉報(bào) 版權(quán)申訴

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究_第2頁(yè)

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究_第3頁(yè)

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究_第4頁(yè)

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究_第5頁(yè)

已閱讀5頁(yè)，還剩14頁(yè)未讀，繼續(xù)免費(fèi)閱讀

版權(quán)說(shuō)明：本文檔由用戶(hù)提供并上傳，收益歸屬內(nèi)容提供方，若內(nèi)容存在侵權(quán)，請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

譯文VPR：一種新的包裝，布局和布線(xiàn)工具的FPGA研究

沃恩貝茨和喬納森羅斯

系電氣與計(jì)算機(jī)工程系，多倫多大學(xué)

多倫多，ON，加拿大M5S3G4{沃恩，jayar}@摘要

我們描述了一個(gè)基于FPGA新的功能和CAD工具使用的算法，各種途徑和方（VPR）。在減少路由面積計(jì)算方面，VPR優(yōu)于所有的FPGA布局布線(xiàn)工具，我們可以比較。雖然常用的算法是基于已知的方法，是我們目前而言改善運(yùn)行時(shí)間和質(zhì)量的幾個(gè)有效方法。我們目前的版圖和路由上的大型電路的一套新的結(jié)果，讓未來(lái)的基準(zhǔn)電路尺寸上的設(shè)計(jì)方法更多，用于今天的典型的FPGA布局布線(xiàn)工具工業(yè)品外觀設(shè)計(jì)。VPR是針對(duì)一個(gè)范圍廣泛的FPGA架構(gòu)的能力，并且源代碼是公開(kāi)的。它和相關(guān)的網(wǎng)表翻譯/群集工具VPACK已經(jīng)被用在世界各地的一些研究項(xiàng)目，并且是有用的FPGA體系結(jié)構(gòu)的研究。1簡(jiǎn)介在FPGA的研究中，人們通常必須評(píng)估新結(jié)構(gòu)特色的實(shí)用工具而做評(píng)估實(shí)驗(yàn)。也就是說(shuō)評(píng)估基準(zhǔn)電路技術(shù)映射，放置和FPGA的布線(xiàn)結(jié)構(gòu)上的關(guān)系和措施的架構(gòu)質(zhì)量，如運(yùn)算速度或區(qū)域，然后可以很容易地提取出來(lái)。因此，有相當(dāng)大的對(duì)于靈活CAD工具的需求，這樣才可以針對(duì)各種架構(gòu)的FPGA做高效的設(shè)計(jì)，從而便于比較均勻的設(shè)計(jì)架構(gòu)。本文介紹了通用的地點(diǎn)和路線(xiàn)（VPR）工具，設(shè)計(jì)很靈活，足夠讓許多FPGA架構(gòu)的比較VPR可以執(zhí)行的位置，要么全球路由或合并后的全球詳細(xì)路由。這是公開(kāi)的/?jayar/軟件。為了使FPGA體系結(jié)構(gòu)的比較有意義，它是至關(guān)重要的CAD工具用于將每個(gè)電路架構(gòu)，以地圖的高品質(zhì)展現(xiàn)。路由相優(yōu)于所有的VPR在查看FPGA的路由器方面，任何標(biāo)準(zhǔn)基準(zhǔn)測(cè)試的結(jié)果都可用，并且指出VPR的砂礦和路由器的組合勝過(guò)所有出版的FPGA布局和布線(xiàn)工具。本文結(jié)構(gòu)如下：在第2節(jié)我們描述了一些VPR功能的FPGA架構(gòu)和范圍與它可能被使用的地方。在第3和第4節(jié)，我們描述了布局布線(xiàn)法。在第5節(jié)講述了比較有必要的VPR曲目數(shù)量和該電路成功的布線(xiàn)所要求的其他已發(fā)表的工具。在第6節(jié)得出了我們的結(jié)論，并提出一些VPR將來(lái)的升級(jí)。2概述VPR圖1概括了VPR的CAD流程。VPR投入到由一個(gè)technologymapped網(wǎng)表和一個(gè)文本文件描述了的FPGA架構(gòu)中。VPR可以放置電路，或一個(gè)預(yù)先存在的位置，可以讀入VPR可以執(zhí)行或者是全局的路線(xiàn)或合并后的全球/詳細(xì)的安置途徑。VPR的輸出由布局、布線(xiàn)和統(tǒng)計(jì)組成，評(píng)估一項(xiàng)有用的工具FPGA架構(gòu)，如路由線(xiàn)長(zhǎng)，跟蹤計(jì)數(shù)最大凈長(zhǎng)度。給出一些可指定的建筑結(jié)構(gòu)參數(shù)描述文件：?邏輯塊輸入和輸出的數(shù)量，?對(duì)每個(gè)邏輯塊的輸入和輸出端訪(fǎng)問(wèn)（S）之和?邏輯等價(jià)性不同的輸入和輸出引腳（例如，所有對(duì)照表輸入功能當(dāng)量），?對(duì)I/成一行或一列的FPGA適合O引腳數(shù)，?邏輯塊陣列的尺寸（如23×30的邏輯塊）。此外，如果全球路由要執(zhí)行，你也可以指定：?橫向和縱向通道的相對(duì)寬度之和?在不同區(qū)域的FPGA的渠道相對(duì)寬度。最后，如??果合并后的全球和詳細(xì)的路由被執(zhí)行，一個(gè)也會(huì)進(jìn)行求值：?開(kāi)關(guān)塊[1]架構(gòu)（即為何路由曲目是相互關(guān)聯(lián)的），?曲目號(hào)碼，每個(gè)邏輯塊的輸入引腳連接（[1]），?為邏輯塊輸出FC值，?對(duì)I/O口FC值。當(dāng)前的體系結(jié)構(gòu)描述格式不允許跨越多個(gè)領(lǐng)域和多個(gè)邏輯塊和被列入路由體系結(jié)構(gòu)，但我們目前加入此功能。添加新的路由架構(gòu)的功能VPR相對(duì)容易，因?yàn)閂PR使用體系結(jié)構(gòu)描述來(lái)創(chuàng)建路由資源圖。每個(gè)路由跟蹤和建設(shè)中的每一個(gè)腳成為在這個(gè)圖中的節(jié)點(diǎn)，圖邊表示為允許的連接。路由器，圖形可視化和統(tǒng)計(jì)計(jì)算程序都與此路由資源圖的工作相關(guān)，所以添加新的路由架構(gòu)功能僅涉及更改的子程序來(lái)建設(shè)這個(gè)圖。雖然VPR最初是島式FPGA的開(kāi)發(fā)[2，3]，它也可以和以行為為基礎(chǔ)的FPGA應(yīng)用[4]。VPR目前沒(méi)有能力為目標(biāo)的層次FPGA的[5]，顯然增加一個(gè)適當(dāng)?shù)奈恢煤统杀竞瘮?shù)設(shè)計(jì)所需的布線(xiàn)資源圖形程序?qū)⑹蛊淠軌蚪鉀Q這些問(wèn)題。最后，VPR的內(nèi)置圖形允許交互式可視化的布局，路由可用資源和互連的可能途徑路由資源。VPACK邏輯塊包裝程序/網(wǎng)絡(luò)表翻譯VPACK讀取一個(gè)已經(jīng)技術(shù)映射電路網(wǎng)表格式blif到LUT和觸發(fā)器，包裝成所需的FPGA邏輯LUT和觸發(fā)器塊，并輸出在VPR的網(wǎng)表。VPACK可以針對(duì)邏輯塊組成一個(gè)LUT，如圖2所示，因?yàn)檫@是一種常見(jiàn)的FPGA邏輯元件。VPACK也針對(duì)邏輯塊包含幾個(gè)有用的LUT和幾個(gè)拖動(dòng)程序，有或沒(méi)有共享LUT的輸入[6]。這些“clusterbased”邏輯塊類(lèi)似于最近由AlteraFPGA開(kāi)發(fā)的工具類(lèi)型。3布局算法VPR采用模擬退火算法[7]。我們已經(jīng)嘗試與幾個(gè)不同的成本函數(shù)聯(lián)系，發(fā)現(xiàn)我們稱(chēng)之為線(xiàn)性擠塞的成本函數(shù)提供了一個(gè)合理的計(jì)算時(shí)間，最好的結(jié)果[8]。此成本函數(shù)的函數(shù)形式就是對(duì)所有的求和電路中的網(wǎng)進(jìn)行計(jì)算。對(duì)于每一個(gè)網(wǎng)，北方新宇和bby指出在其邊界框的水平和垂直跨度分別為Q（n）的因數(shù)補(bǔ)償。邊界線(xiàn)長(zhǎng)度模型中的實(shí)際低估所需的布線(xiàn)，就可以看成超過(guò)三個(gè)終端網(wǎng)，作為建議[10]。它的價(jià)值取決于凈N兩端號(hào)碼;Q是對(duì)總體1有3個(gè)或更少的終端，并慢慢增加了50臺(tái)網(wǎng)邏輯與上2.79。賈夫常數(shù)x（n）、?（n）為平均信道容量（在首部）在X和Y方向，分別比較全凈邊框和成本函數(shù)的余量，需要更多的調(diào)配路由的領(lǐng)域，F(xiàn)PGA具有窄渠道。本文中的所有結(jié)果的得到，是利用FPGA中的所有通道都有相同的原則。在這種情況下，賈夫是一個(gè)常數(shù)，函數(shù)的線(xiàn)性阻塞耗費(fèi)降低到一個(gè)包圍盒的成本函數(shù)。一個(gè)良好的退火算法的必要條件是時(shí)間表取得一個(gè)合理的高品質(zhì)的解決方案與模擬退火的計(jì)算時(shí)間相關(guān)聯(lián)。我們已經(jīng)開(kāi)發(fā)出一種新的退火附表，導(dǎo)致非常高品質(zhì)的展示位置，并在其中給出退火參數(shù)的自動(dòng)調(diào)節(jié)功能，不同的成本和電路尺寸。我們計(jì)算在初始溫度相同的方式為[11]。讓Nblocks是總數(shù)邏輯塊加的I/O口電路中的數(shù)量。我們首先創(chuàng)建一個(gè)隨機(jī)安置的電路。接下來(lái)，我們執(zhí)行Nblocks移動(dòng)（成對(duì)掉期）的邏輯塊或I/O口，并計(jì)算出不同的成本，這些Nblocks標(biāo)準(zhǔn)偏差配置。初始溫度設(shè)定為20倍標(biāo)準(zhǔn)差，確保最初幾乎所有的行動(dòng)是在退火算法范圍內(nèi)被系統(tǒng)接受。正如在[12]，默認(rèn)號(hào)碼的行為在每個(gè)溫度都有評(píng)價(jià)。這個(gè)默認(rèn)的數(shù)字可以在命令行被取代，從而讓不同的CPU時(shí)間和填筑質(zhì)量權(quán)衡。減少溫度每秒移動(dòng)數(shù)的10倍，例如，加快安置到10倍，并降低了大約只有10％的最終填筑質(zhì)量。當(dāng)溫度是如此之高，幾乎任何舉動(dòng)都可以被接受時(shí)，我們基本上從一個(gè)位置隨機(jī)移動(dòng)到另一個(gè)位置所改善獲得的成本都是小成本。相反，如果動(dòng)作是很少被接受（因溫度當(dāng)前正處于低位，安置相當(dāng)高的品質(zhì)），也有不少改善成本。有了這個(gè)動(dòng)機(jī)，我們提出了一個(gè)新的溫度更新附表，在溫度增加的時(shí)間花費(fèi)在一個(gè)重要的小區(qū)域上，但不是全部動(dòng)作都被接受。如表1：最后，它表明在[12，13]，這是可取的Raccept保證作為近似0.44的量有可能被取值。為此，就需要利用Raccept值來(lái)控制這個(gè)范圍限制器。塊是小于或等于交匯處的值，Dlimit單位除了在X和Y方向嘗試。一個(gè)小的Dlimit增加值由Raccept確保這僅僅是塊進(jìn)行交換考慮。而這些“本地交換“往往導(dǎo)致安置成本相對(duì)較小的變化，越來(lái)越多被接受的可能性增加。最初，Dlimit設(shè)置為整個(gè)芯片。每當(dāng)溫度降低，Dlimit整個(gè)芯片的尺寸為這個(gè)結(jié)果退火的第一部分，逐漸萎縮退火過(guò)程中的中間階段，并正在為退火低溫第1部分最后設(shè)計(jì)余量，當(dāng)T退火終止“0.005*成本/Nnets。該運(yùn)動(dòng)的邏輯塊總是至少影響到一個(gè)網(wǎng)。當(dāng)溫度高于平均凈成本的一個(gè)單位時(shí)，它是不可能接受任何成本增加的調(diào)配結(jié)果的，所以我們終止了退火。4路由算法VPR的路由器是基于試探談判的擁塞算法[14，8]?；旧显撍惴ㄓ勺畛醺鳁l線(xiàn)路的最短路徑找到網(wǎng)，無(wú)論任何接線(xiàn)段或邏輯塊管腳，都可能會(huì)導(dǎo)致過(guò)度使用。路由器的迭代過(guò)程包含順序抓取行動(dòng)和重新路由（由最低成本路徑中找到）中的每個(gè)電路網(wǎng)。對(duì)使用路由資源成本的函數(shù)，其對(duì)資源的任何過(guò)度使用都會(huì)讓當(dāng)前路由發(fā)生事先迭代。通過(guò)逐漸增加的多余認(rèn)購(gòu)路由資源成本，該算法勢(shì)力替代路線(xiàn)網(wǎng)，以避免使用超額認(rèn)購(gòu)資源，只剩下網(wǎng)最需要一個(gè)給定的資源。對(duì)于本文的實(shí)驗(yàn)結(jié)果，我們?cè)O(shè)置路由器的最大數(shù)量迭代為45，如果電路中路由沒(méi)有成功，一定數(shù)目的目錄中45迭代就被假定為不可路由通道的寬度。為了避免過(guò)于迂回路線(xiàn)以節(jié)省CPU時(shí)間，我們讓一個(gè)去凈路由最外的3個(gè)通道的凈終端邊界框。一個(gè)重要的執(zhí)行細(xì)節(jié)值得一提。無(wú)論是原探路者算法和Vpr路由器使用的Dijkstra算法（即一個(gè)迷宮路由器[15]），以每個(gè)網(wǎng)絡(luò)連接和AK用線(xiàn)網(wǎng)為依據(jù)，路由器調(diào)用通道的k-1次執(zhí)行所有需要的連接。在第一次調(diào)用迷宮路由波從凈源擴(kuò)大，直到它到達(dá)任何的K–1值之后。路徑從源到接收器作為現(xiàn)在這個(gè)網(wǎng)的路由的第一部分。波前的迷宮路由被清空，新波前擴(kuò)展是從整個(gè)網(wǎng)絡(luò)布線(xiàn)開(kāi)始發(fā)出的。之后的K-1路由器的迷宮調(diào)用凈終端將所有k值連接。不幸的是，這種方法需要高扇出網(wǎng)絡(luò)相當(dāng)多的CPU時(shí)間。高扇出網(wǎng)絡(luò)通?？缭酱蟛糠只蛩械腇PGA。因此，后者調(diào)用迷宮路由器的路由部分作為凈源會(huì)非常大，它將需要相當(dāng)長(zhǎng)的時(shí)間以擴(kuò)大迷宮路由器波前部分到下一個(gè)接收器。幸好，有一個(gè)更有效的方法。當(dāng)達(dá)到凈水槽值時(shí)，加入所有路由資源分部需要連接水槽和目前的局部路由成本為0的波前（即擴(kuò)展列表）。當(dāng)前不要空迷宮路由波前，只要保證繼續(xù)擴(kuò)大正常。由于增加新的路徑路由的部分有一個(gè)零成本，由于這項(xiàng)新路徑通常相當(dāng)小迷宮路由器將首先擴(kuò)大它范圍，也需要相對(duì)較少的時(shí)間來(lái)添加此新波，如果整個(gè)波前擴(kuò)展了能實(shí)現(xiàn)那么下一個(gè)接收器將達(dá)到的速度遠(yuǎn)遠(yuǎn)超過(guò)現(xiàn)在。圖3說(shuō)明了差異圖形。5實(shí)驗(yàn)結(jié)果各種FPGA在本節(jié)中使用的參數(shù)，總是選擇與先前參數(shù)有明顯對(duì)比的那些參數(shù)。所得結(jié)果在本節(jié)獲得了邏輯的4輸入LUT加上一個(gè)觸發(fā)器組成的塊，如圖所示在圖2。時(shí)鐘網(wǎng)和時(shí)序電路沒(méi)有遞交，因?yàn)樗ǔＪ锹酚赏ㄟ^(guò)專(zhuān)用FPGA的商業(yè)網(wǎng)絡(luò)中的路由。每個(gè)LUT的輸入出現(xiàn)在一個(gè)邏輯塊的一面，而邏輯塊輸出一般訪(fǎng)問(wèn)底部和右側(cè)，如圖4。每個(gè)邏輯塊的輸入或輸出連接任何相鄰?fù)ǖ溃╯）（即Fc的=寬）。每根電線(xiàn)段和其他布線(xiàn)連接到三段，而在通道交叉口（即值=3）和開(kāi)關(guān)箱拓?fù)涫恰安幌嘟弧边@是因?yàn)樵?磁道接線(xiàn)段只連接在0磁道的其他布線(xiàn)段。5.1實(shí)驗(yàn)結(jié)果與輸入引腳Doglegs以往大多數(shù)FPGA布線(xiàn)結(jié)果認(rèn)為“輸入引腳doglegs”是可能。如果輸入引腳之間的音軌和它連接接線(xiàn)盒的Fc通過(guò)獨(dú)立的SRAM位控制晶體所組成，為了驗(yàn)證兩條軌道上的這些開(kāi)關(guān)通過(guò)電氣連接的可能性。我們將把這個(gè)作為一個(gè)輸入管腳doglegs。作為商業(yè)化的FPGA，實(shí)現(xiàn)從一個(gè)輸入引腳接線(xiàn)盒到多路通道，只有一個(gè)軌道可以連接到輸入引腳，使用多路復(fù)用器而不是獨(dú)立通過(guò)在FPGA中的晶體管布局來(lái)保存相當(dāng)?shù)拿娣e。另外，通常有一個(gè)緩沖軌道之間的連接塊和它連接多路復(fù)用這樣做的目的是為了提高速度，同時(shí)這也意味著緩沖輸入引腳doglegs不能被使用。因此，如果在未來(lái)FPGA的路由器測(cè)試時(shí)沒(méi)有輸入引腳doglegs那么我們必須讓輸入引腳doglegs和過(guò)去??的結(jié)果公平的比較這樣是最好的。在本節(jié)中我們比較了所需的最低數(shù)目，每一條成功的路徑和CAD工具的路由設(shè)置。所有的基準(zhǔn)circuits.1在表2給出結(jié)果，得到了路由Altor[16]，制作了一個(gè)基于位置的工具min。列出三兩步（全球和詳細(xì)）路由與其它路由器進(jìn)行合并后的全球和詳細(xì)的路由。

VPR要求比第二，第三最佳路由器降低10％的資源數(shù)目，表3列出了音軌需要執(zhí)行這些標(biāo)準(zhǔn)時(shí)數(shù)新的CAD工具，同時(shí)允許地方和路線(xiàn)的電路的連接。列出所有電路邏輯快的消息清單。VPR使用少于13％資源數(shù)目的同時(shí)，它將執(zhí)行合并后的全球和詳細(xì)的路由，世嘉比用于執(zhí)行詳細(xì)路由對(duì)AAVPR生成全版圖走線(xiàn)。執(zhí)行安置和全局路由，在試圖改善繞線(xiàn)同時(shí)需要超過(guò)87％以上VPR總資源數(shù)目。最后，讓VPR配置電路而不是強(qiáng)迫它使用Altor內(nèi)存來(lái)減少資源數(shù)目的40％，這表明VPR的模擬退火算法單元遠(yuǎn)較Altor最小單元更好。5.2不輸入引腳的Doglegs實(shí)驗(yàn)比較了VPR與SPLACE/SROUTE工具，不允許輸入引腳doglegs的性能。當(dāng)這兩個(gè)工具都只能使用路線(xiàn)一，比起SROUTE軌道Altor產(chǎn)生的安置需求VPR減少13％，。當(dāng)然這些工具都支持允許布局和布線(xiàn)的電路，對(duì)于SPLACE/SROUTE組合VPR還需要少29％資源數(shù)目。無(wú)論是基于VPR和SPLACE只要是使用模擬退火算法，我們相信VPR單元在一方面優(yōu)于SPLACE是因?yàn)樗幚砀呱瘸鼍W(wǎng)絡(luò)更有效率，讓更多的動(dòng)作進(jìn)行評(píng)估，另一方面是因?yàn)樗行У耐嘶饡r(shí)間表給定的時(shí)間。大電路5.3實(shí)驗(yàn)結(jié)果在第5.1和5.2的54至358的邏輯基準(zhǔn)塊范圍內(nèi)使用面積計(jì)算顯然太小，因?yàn)檫@是特殊的FPGA。因此在本節(jié)中我們目前的實(shí)驗(yàn)結(jié)果，20個(gè)最大的MCNC基準(zhǔn)電路[27]，它的大小范圍從1047到8383邏輯塊。我們使用Flowmap[28]以技術(shù)圖每4個(gè)LUT和拖動(dòng)塊并為VPACKtocombine拖動(dòng)塊，進(jìn)入我們的基本邏輯電路塊LUT。I/O引腳數(shù)每行或列適合設(shè)置為2，符合目前的商業(yè)化FPGA。每個(gè)電路被放置在最小的正方形FPGA可以包含它的路由并且輸入引腳doglegs是不允許的。請(qǐng)注意三個(gè)基準(zhǔn)bigkey，DES和dsip，是padlimited要求在FPGA架構(gòu)表5比較資源數(shù)量的地方，在完全路線(xiàn)電路與全版圖范圍內(nèi)所需地點(diǎn)與路線(xiàn)的電路與數(shù)字VPR，然后進(jìn)行詳細(xì)的路由世嘉[23]。表5還給出了大小每個(gè)邏輯塊的數(shù)量計(jì)算電路。在世嘉列中的條目3仿真無(wú)法成功，因?yàn)槭兰芜\(yùn)行路由內(nèi)存不足。由VPR增加路由產(chǎn)生的全版圖航線(xiàn)曲目總數(shù)，有超過(guò)所需68％路線(xiàn)的電路主場(chǎng)由VPR路由完全執(zhí)行。顯然，世嘉處理無(wú)法進(jìn)行。因?yàn)槁酚纱箅娐樊?dāng)輸入引腳doglegs是不允許的。為了鼓勵(lì)其它FPGA研究人員公布的結(jié)果，以這些大型路由基準(zhǔn)，我們發(fā)出以下“FPGA的挑戰(zhàn)?！泵看悟?yàn)證結(jié)果跳動(dòng)的最好驗(yàn)證先前對(duì)這些基準(zhǔn)結(jié)果公布，我們將每條信息支付1美元給作者（對(duì)不起，1元加幣。，而不是1美元），由他們來(lái)處理如果減少需要跟蹤的總數(shù)。該技術(shù)映射網(wǎng)表，由VPR生成和投放位置的目前最全的跟蹤路由在/?jayar/software.html。上可以找到6結(jié)論和未來(lái)工作我們已經(jīng)提出了一個(gè)優(yōu)于所有這類(lèi)工具的新的FPGA布局布線(xiàn)工具，它讓我們可以進(jìn)行直接的比較。此外，我們已經(jīng)提出更大的電路基準(zhǔn)測(cè)試結(jié)果。建立專(zhuān)門(mén)用于描述精密學(xué)術(shù)的FPGA布局布線(xiàn)工具。我們希望下一代的FPGACAD工具將優(yōu)化這些大型基點(diǎn)，因?yàn)樗麄兪且幌盗忻芮械膯?wèn)題被映射成今天的FPGA。VPR的主要設(shè)計(jì)目標(biāo)之一是保持足夠的靈活性，允許工具使用在很多FPGA架構(gòu)的研究上。我們目前正進(jìn)行幾個(gè)VPR改進(jìn)，才能進(jìn)一步提高其在FPGA架構(gòu)的研究能力。在不久的將來(lái)VPR將支持緩沖和分段路由結(jié)構(gòu)，我們計(jì)劃增加定時(shí)分析儀和時(shí)序驅(qū)動(dòng)的路由。外文原文VPR:ANewPacking，PlacementandRoutingToolforFPGAResearch1VaughnBetzandJonathanRoseDepartmentofElectricalandComputerEngineering，UniversityToronto，ON，CanadaM5S3G4{vaughn，jayar}@AbstractWedescribethecapabilitiesofandalgorithmsusedinanewFPGACADtool，VersatilePlaceandRoute(VPR).Intermsofminimizingroutingarea，VPRoutperformsallpublishedFPGAplaceandroutetoolstowhichwecancompare.Althoughthealgorithmsusedarebasedonpreviouslyknownapproaches，wepresentseveralenhancementsthatimproverun-timeandquality.WepresentplacementandroutingresultsonanewsetoflargecircuitstoallowfuturebenchmarkcomparisonsofFPGAplaceandroutetoolsoncircuitsizesmoretypicaloftoday’sindustrialdesigns.VPRiscapableoftargetingabroadrangeofFPGAarchitectures，andthesourcecodeispubliclyavailable.Itandtheassociatednetlisttranslation/clusteringtoolVPACKhavealreadybeenusedinanumberofresearchprojectsworldwide，andshouldbeusefulinmanyareasofFPGAarchitectureresearch.1IntroductionInFPGAresearch，onemusttypicallyevaluatetheutilityofnewarchitecturalfeaturesexperimentally.Thatis，benchmarkcircuitsaretechnologymapped，placedandroutedontotheFPGAarchitecturesofinterest，andmeasuresofthearchitecture’squality，suchasspeedorarea，canthenreadilybeextracted.Accordingly，thereisconsiderableneedforflexibleCADtoolsthatcantargetawidevarietyofFPGAarchitecturesefficiently，andhenceallowfaircomparisonsofthearchitectures.ThispaperdescribestheVersatilePlaceandRoute(VPR)tool，whichhasbeendesignedtobeflexibleenoughtoallowcomparisonofmanydifferentFPGAarchitectures.VPRcanperformplacementandeitherglobalroutingorcombinedglobalanddetailedrouting.Itispubliclyavailablefrom/~jayar/software.html.InordertomakemeaningfulFPGAarchitecturecomparisons，itisessentialthattheCADtoolsusedtomapcircuitsintoeacharchitectureareofhighquality.TheroutingphaseofVPRoutperformsallpreviouslypublishedFPGAroutersforwhichstandardbenchmarksresultsareavailable，andthatthecombinationofVPR’splacerandrouteroutperformsallpublishedcombinationsofFPGAplacementandroutingtools.2Theorganizationofthispaperisasfollows.InSection2wedescribesomeofthefeaturesofVPRandtherangeofFPGAarchitectureswithwhichitmaybeused.InSections3and4wedescribetheplacementandroutingalgorithms.InSection5，wecomparethenumberoftracksrequiredbyVPRtosuccessfullyroutecircuitswiththatrequiredbyotherpublishedtools.InSection6weconcludeandoutlinesomefutureenhancementswhichwillbemadetoVPR.2OverviewofVPRFigure1outlinestheVPRCADflow.TheinputstoVPRconsistofatechnologymappednetlistandatextfiledescribingtheFPGAarchitecture.VPRcanplacethecircuit，orapre-existingplacementcanbereadin.VPRcanthenperformeitheraglobalrouteoracombinedglobal/detailedrouteoftheplacement.VPR’soutputconsistsoftheplacementandrouting，aswellasstatisticsusefulinassessingtheutilityofanFPGAarchitecture，suchasroutedwirelength，trackcount，andmaximumnetlength.Someofthearchitecturalparametersthatcanbespecifiedinthearchitecturedescriptionfileare:?thenumberoflogicblockinputsandoutputs，?theside(s)ofthelogicblockfromwhicheachinputandoutputisaccessible，?thelogicalequivalencebetweenvariousinputandoutputpins(e.g.allLUTinputsarefunctionallyequivalent)，?thenumberofI/OpadsthatfitintooneroworonecolumnoftheFPGA，and?thedimensionsofthelogicblockarray(e.g.23x30logicblocks).Inaddition，ifglobalroutingistobeperformed，onecanalsospecify:?therelativewidthsofhorizontalandverticalchannels，and?therelativewidthsofthechannelsindifferentregionsoftheFPGA.Finally，ifcombinedglobalanddetailedroutingistobeperformed，onealsospecifies:?theswitchblock[1]architecture(i.e.howtheroutingtracksareinterconnected)，?thenumberoftrackstowhicheachlogicblockinputpinconnects(Fc[1])，?theFcvalueforlogicblockoutputs，and?theFcvalueforI/Opads.Thecurrentarchitecturedescriptionformatdoesnotallowsegmentsthatspanmorethanonelogicblocktobeincludedintheroutingarchitecture，butwearepresentlyaddingthisfeature.AddingnewroutingarchitecturefeaturestoVPRisrelativelyeasy，sinceVPRusesthearchitecturedescriptiontocreatearoutingresourcegraph.Everyroutingtrackandeverypininthearchitecturebecomesanodeinthisgraph，andthegraphedgesrepresenttheallowableconnections.Therouter，graphicsvisualiza-tionandstatisticscomputationroutinesallworkonlywiththisroutingresourcegraph，soaddingnewroutingarchitecturefeaturesonlyinvolveschangingthesubroutinesthatbuildthisgraph.AlthoughVPRwasinitiallydevelopedforisland-styleFPGAs[2，3]，itcanalsobeusedwithrow-basedFPGAs[4].VPRisnotcurrentlycapableoftargetinghierarchicalFPGAs[5]，althoughaddinganappropriateplacementcostfunctionandtherequiredroutingresourcegraphbuildingroutineswouldallowittotargetthem.Finally，VPR’sbuilt-ingraphicsallowinteractivevisualizationoftheplacement，therouting，theavailableroutingresourcesandthepossiblewaysofinterconnectingtheroutingresources.2.1TheVPACKLogicBlockPacker/NetlistTranslatorVPACKreadsinablifformatnetlistofacircuitthathasbeentechnology-mappedtoLUTsandflip-flops，packstheLUTsandflipflopsintothedesiredFPGAlogicblock，andoutputsanetlistinVPR’snetlistformat.VPACKcantargetalogicblockconsistingofoneLUTandoneFF，asshowninFigure2，asthisisacommonFPGAlogicelement.VPACKisalsocapableoftargetinglogicblocksthatcontainseveralLUTsandseveralflipflops，withorwithoutsharedLUTinputs[6].These“clusterbased”logicblocksaresimilartothoseemployedinrecentFPGAsbyAltera，XilinxandLucentTechnologies.PlacementAlgorithmVPRusesthesimulatedannealingalgorithm[7]forplacement.Wehaveexperimentedwithseveraldifferentcostfunctions，andfoundthatwhatwecallalinearcongestioncostfunctionprovidesthebestresultsinareasonablecomputationtime[8].Thefunctionalformofthiscostfunctioniswherethesummationisoverallthenetsinthecircuit.Foreachnet，bbxandbbydenotethehorizontalandverticalspansofitsboundingbox，respectively.Theq(n)factorcompensatesforthefactthattheboundingboxwirelengthmodelunderestimatesthewiringnecessarytoconnectnetswithmorethanthreeterminals，assuggestedin[10].Itsvaluedependsonthenumberofterminalsofnetn;qis1fornetswith3orfewerterminals，andslowlyincreasesto2.79fornetswith50terminals.Cav，x(n)andCav，y(n)aretheaveragechannelcapacities(intracks)inthexandydirections，respectively，overtheboundingboxofnetn.ThiscostfunctionpenalizesplacementswhichrequiremoreroutinginareasoftheFPGAthathavenarrowerchannels.Alltheresultsinthispaper，however，areobtainedwithFPGAsinwhichallchannelshavethesamecapacity.InthiscaseCavisaconstantandthelinearcongestioncostfunctionreducestoaboundingboxcostfunction.Agoodannealingscheduleisessentialtoobtainhigh-qualitysolutionsinareasonablecomputationtimewithsimulatedannealing.Wehavedevelopedanewannealingschedulewhichleadstoveryhigh-qualityplacements，andinwhichtheannealingparametersautomaticallyadjusttodifferentcostfunctionsandcircuitsizes.Wecomputetheinitialtemperatureinamannersimilarto[11].LetNblocksbethetotalnumberoflogicblocksplusthenumberofI/Opadsinacircuit.Wefirstcreatearandomplacementofthecircuit.NextweperformNblocksmoves(pairwiseswaps)oflogicblocksorI/Opads，andcomputethestandarddeviationofthecostoftheseNblocksdifferentconfigurations.Theinitialtemperatureissetto20timesthisstandarddeviation，ensuringthatinitiallyvirtuallyanymoveisacceptedatthestartoftheanneal.Asin[12]，thedefaultnumberofmovesevaluatedateachtemperatureis.Thisdefaultnumbercanbeoverriddenonthecommandline，however，toallowdifferentCPUtime/placementqualitytradeoffs.Reducingthenumberofmovespertemperaturebyafactorof10，forexample，speedsupplacementbyafactorof10andreducesfinalplacementqualitybyonlyabout10%.Whenthetemperatureissohighthatalmostanymoveisaccepted，weareessentiallymovingrandomlyfromoneplacementtoanotherandlittleimprovementincostisobtained.Conversely，ifveryfewmovesarebeingaccepted(duetothetemperaturebeinglowandthecurrentplacementbeingoffairlyhighquality)，thereisalsolittleimprovementincost.Withthismotivationinmind，weproposeanewtemperatureupdateschedulewhichincreasestheamountoftimespentattemperatureswhereasignificantfractionof，butnotall，movesarebeingaccepted.AnewtemperatureiscomputedasTnew=aTold，wherethevalueofadependsonthefractionofattemptedmovesthatwereaccepted(Raccept)atTold，asshowninTable1.Finally，itwasshownin[12，13]thatitisdesirabletokeepRacceptnear0.44foraslongaspossible.WeaccomplishthisbyusingthevalueofRaccepttocontrolarangelimiter--onlyinterchangesofblocksthatarelessthanorequaltoDlimitunitsapartinthexandydirectionsareattempted.AsmallvalueofDlimitincreasesRacceptbyensuringthatonlyblockswhichareclosetogetherareconsideredforswapping.These“l(fā)ocalswaps”tendtoresultinrelativelysmallchangesintheplacementcost，increasingtheirlikelihoodofacceptance.Initially，Dlimitissettotheentirechip.Wheneverthetemperatureisreduced，thevalueofDlimitisupdatedaccordingto，andthenclampedtotherange1￡Dlimit￡maximumFPGAdimension.ThisresultsinDlimitbeingthesizeoftheentirechipforthefirstpartoftheanneal，shrinkinggraduallyduringthemiddlestagesoftheanneal，andbeing1forthelow-temperaturepartoftheanneal.Finally，theannealisterminatedwhenT<0.005*Cost/Nnets.Themovementofalogicblockwillalwaysaffectatleastonenet.Whenthetemperatureislessthanasmallfractionoftheaveragecostofanet，itisunlikelythatanymovethatresultsinacostincreasewillbeaccepted，soweterminatetheanneal.RoutingAlgorithmVPR’srouterisbasedonthePathfindernegotiatedcongestionalgorithm[14，8].Basically，thisalgorithminitiallyrouteseachnetbytheshortestpathitcanfind，regardlessofanyoveruseofwiringsegmentsorlogicblockpinsthatmayresult.Oneiterationoftherouterconsistsofsequentiallyripping-upandre-routing(bythelowestcostpathfound)everynetinthecircuit.Thecostofusingaroutingresourceisafunctionofthecurrentoveruseofthatresourceandanyoverusethatoccurredinpriorroutingiterations.Bygraduallyincreasingthecostofoversubscribedroutingresources，thealgorithmforcesnetswithalternativeroutestoavoidusingoversubscribedresources，leavingonlythenetthatmostneedsagivenresourcebehind.Fortheexperimentalresultsinthispaperwesetthemaximumnumberofrouteriterationsto45;ifacircuithasnotsuccessfullyroutedinagivennumberoftracksin45iterationsitisassumedtobeunroutablewithchannelsofthatwidth.ToavoidoverlycircuitousroutesandtosaveCPUtime，weallowtheroutingofanettogoatmost3channelsoutsidetheboundingboxofthenetterminals.Oneimportantimplementationdetaildeservesmention.BoththeoriginalPathfinderalgorithmandVPR’srouteruseDijkstra’salgorithm(i.e.amazerouter[15])toconnecteachnet.Forakterminalnet，themazerouterisinvokedk-1timestoperformalltherequiredconnections.Inthefirstinvocation，themazeroutingwavefrontexpandsoutfromthenetsourceuntilitreachesanyoneofthek-1netsinks.Thepathfromsourcetosinkisnowthefirstpartofthisnet’srouting.Themazeroutingwavefrontisemptied，andanewwavefrontexpansionisstartedfromtheentirenetroutingfoundthusfar.Afterk-1invocationsofthemazerouterallkterminalsofthenetwillbeconnected.Unfortunately，thisapproachrequiresconsiderableCPUtimeforhigh-fanoutnets.High-fanoutnetsusuallyspanmostoralloftheFPGA.Therefore，inthelatterinvocationsofthemazerouterthepartialroutingusedasthenetsourcewillbeverylarge，anditwilltakealongtimetoexpandthemazerouterwavefrontouttothenextsink.Fortunatelythereisamoreefficientmethod.Whenanetsinkisreached，addalltheroutingresourcesegmentsrequiredtoconnectthesinkandthecurrentpartialroutingtothewavefront(i.e.theexpansionlist)withacostof0.Donotemptythecurrentmazeroutingwavefront;justcontinueexpandingnormally.Sincethenewpathaddedtothepartialroutinghasacostofzero，themazerouterwillexpandarounditatfirst.Sincethisnewpathistypicallyfairlysmall，itwilltakerelativelylittletimetoaddthisnewwavefront，andthenextsinkwillbereachedmuchmorequicklythaniftheentirewavefrontexpansionhadbeenstartedfromscratch.Figure3illustratesthedifferencegraphically.5ExperimentalResultsThevariousFPGAparametersusedinthissectionwerealwayschosentoallowadirectcomparisonwithpreviouslypublishedresults.Alltheresultsinthissectionwereobtainedwithalogicblockconsistingofa4-inputLUTplusaflipflop，asshowninFigure2.Theclocknetwasnotroutedinsequentialcircuits，asitisusuallyroutedviaadedicatedroutingnetworkincommercialFPGAs.EachLUTinputappearsononesideofthelogicblock，whilethelogicblockoutputisaccessiblefromboththebottomandrightsides，asshowninFigure4.Eachlogicblockinputoroutputcanconnecttoanytrackintheadjacentchannel(s)(i.e.Fc=W).Eachwiresegmentcanconnecttothreeotherwiringsegmentsatchannelintersections(i.eFs=3)andtheswitchboxtopologyis“disjoint”--thatis，awiringsegmentintrack0connectsonlytootherwiringsegmentsintrack0andsoon.5.1ExperimentalResultswithInputPinDoglegsMostpreviousFPGAroutingresultshaveassumedthat“inputpindoglegs”arepossible.IftheconnectionboxbetweenaninputpinandthetrackstowhichitconnectsconsistsofFcindependentpasstransistorscontrolledbyFcSRAMbits，itispossibletoturnontwooftheseswitchesinordertoelectricallyconnecttwotracksviatheinputpin.Wewillrefertothisasaninputpindogleg.CommercialFPGAs，however，implementtheconnectionboxfromaninputpintoachannelviaamultiplexer，soonlyonetrackmaybeconnectedtotheinputpin.UsingamultiplexerratherthanindependentpasstransistorssavesconsiderableareaintheFPGAlayout.Aswell，normallythereisabufferbetweenatrackandtheconnectionblockmultiplexerstowhichitconnectsinordertoimprovespeed;thisbufferalsomeansthatinputpindoglegscannotbeused.Therefore，whileweallowinputpindoglegsinthissectioninordertomakeafaircomparisonwithpastresults，itwouldbebestifinthefutureFPGAroutersweretestedwithoutinputpindoglegs.InthissectionwecomparetheminimumnumberoftracksperchannelrequiredforasuccessfulroutingbyvariousCADtoolsonasetof9benchmarkcircuits.1AlltheresultsinTable2areobtainedbyroutingaplacementproducedbyAltor[16]，amincutbasedplacementtool.Threeofthecolumnsconsistoftwo-step(globalthendetailed)routing，whiletheotherroutersperformcombinedglobalanddetailedrouting.VPRrequires10%fewertracksthanthesecondbestrouter，andthethirdbestrouterconsistsofVPR’sglobalroutephaseplusSEGAfordetailedrouting.Table3liststhenumberoftracksrequiredtoimplementthesebenchmarkswhennewCADtoolsareallowedtobothplaceandroutethecircuits.Thesizecolumnliststhenumberoflogicblocksineachcircuit.VPRuses13%fewertrackswhenitperformscombinedglobalanddetailedroutingthanitdoeswhenSEGAisusedtoperformdetailedroutingonaaVPR-generatedglobalroute.FPR，whichperformsplacementandglobalroutingsimultaneouslyinanattempttoimproveroutability，requires87%moretotaltracksthanVPR.Finally，allowingVPRtoplacethecircuitsinsteadofforcingittousetheAltorplacementsreducesthenumberoftracksVPRrequirestoroutethemby40%，indicatingthatVPR’ssimulatedannealingbasedplacerisconsiderablybetterthantheAltormin-cutplacer.5.2ExperimentalResultsWithoutInputPinDoglegsTable4comparestheperformanceofVPRwiththatoftheSPLACE/SROUTEtoolset，whichdoesnotallowinputpindoglegs.WhenbothtoolsareonlyallowedtorouteanAltor-generatedplacementVPRrequires13%fewertracksthanSROUTE.Whenthetoolsareallowedtobothplaceandroutethecircuits，VPRrequires29%fewertracksthantheSPLACE/SROUTEcombination.BothVPRandSPLACEarebasedonsimulatedannealing.WebelievetheVPRplaceroutperformsSPLACEpartiallybecauseithandleshigh-fanoutnetsmoreefficiently，allowingmoremovestobeevaluatedinagiventime，andpartiallybecauseofitsmoreefficientannealingschedule.5.3ExperimentalResultsonLargeCircuitsThebenchmarksusedinSections5.1and5.2rangeinsizefrom54to358logicblocks，andaccordinglyaretoosmalltobeveryrepresentativeoftoday’sFPGAs.Therefore，inthissectionwepresentexperimentalresultsforthe20largestMCNCbenchmarkcircuits[27]，whichrangeinsizefrom1047to8383logicblocks.WeuseFlowmap[28]totechnologymapeachcircuitto4-LUTsandflipflops，andVPACKtocombineflipflopsandLUTsintoourbasiclogicblock.ThenumberofI/Opadsthatfitperroworcolumnissetto2，inlinewithcurrentcommercialFPGAs.EachcircuitisplacedandroutedinthesmallestsquareFPGAwhichcancontainit.Inputpindoglegsarenotallowed.Notethatthreeofthebenchmarks，bigkey，des，anddsip，arepadlimitedintheFPGAarchitectureassumed.Table5comparesthenumberoftracksrequiredtoplaceandcompletelyroutecircuitswithVPRwiththenumberrequiredtoplaceandgloballyroutethecircuitswithVPRandthenperformdetailedroutingwithSEGA[23].Table5alsogivesthesizeofeachcircuit，intermsofthenumberoflogicblocks.TheentriesintheSEGAcolumnwitha3signcouldnotbesuccessfullyroutedbecauseSEGAranoutofmemory.UsingSEGAtoperformdetailedroutingonaglobalroutegeneratedbyVPRincreasesthetotalnumberoftracksrequiredtoroutethecircuitsbyover68%vs.havingVPRperformtheroutingcompletely.ClearlySEGAhasdifficultyroutinglargecircuitswheninputpindoglegsarenotallowed.ToencourageotherFPGAresearcherstopublishroutingresultsusingtheselargerbenchmarks，weissuethefollowing“FPGAchallenge.”Eachtimeverifiedresultswhichbeatthepreviouslybestverifiedresultsonthesebenchmarksareannounced，wewillpaytheauthors$1(sorry，$1Cdn.，not$1U.S.)foreachtrackbywhichtheyreducethetotalnumberoftracksrequiredfromthatofthepreviouslybestresults.Thetechnology-mappednetlists，theplacementsgeneratedbyVPRandthecurrentlybestroutingtracktotalareavailableat/~jayar/software.html.6ConclusionsandFutureWorkWehavepresentedanewFPGAplacementandroutingtoolthatoutperformsallsuchtoolstowhichwecanmakedirectcomparisons.InadditionwehavepresentedbenchmarkresultsonmuchlargercircuitsthanhavetypicallybeenusedtocharacterizeacademicFPGAplaceandroutetools.WehopethenextgenerationofFPGACADtoolswillbecomparedonthebasisoftheselargerbenchmarks，astheyareacloserapproximationofthekindofproblemsbeingmappedintotoday’sFPGAs.OneofthemaindesigngoalsforVPRwastokeepthetoolflexibleenoughtoallowitsuseinmanyFPGAarchitecturalstudies.WearecurrentlyworkingonseveralimprovementstoVPRtofurtherincreaseitsutilityinFPGAarchitectureresearch.InthenearfutureVPRwillsupportbufferedandsegmentedroutingstructures，andsoonafterthatweplantoaddatiminganalyzerandtiming-drivenrouting.References[1]S.Brown，R.Francis，J.Rose，andZ.Vranesic，F(xiàn)ield-ProgrammableGateAr

人人文庫(kù)> 全部分類(lèi)> 應(yīng)用文書(shū) > 研究報(bào)告

溫馨提示

1. 本站所有資源如無(wú)特殊說(shuō)明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
2. 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
3. 本站RAR壓縮包中若帶圖紙，網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽，若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間，僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理，對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯，并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容，請(qǐng)與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

外文文獻(xiàn)翻譯-一種新的包裝布局和布線(xiàn)工具的FPGA研究

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

相關(guān)文檔