Port AMSS-NCKU code to GPU
Zhoujian Cao
Academy of Mathematics and System Science, CAS
Cowork with Zhihui Du, Steven Brandt, Frank Loeffler and Quan Yang
2013-8-7
2013 International School on Numerical Relativity and Gravitational Waves, Pohang, Korea

Outline
- Motivations from gravitational wave detection
- New parallel mesh refinement numerical scheme
- GPU acceleration for NR
- Summary

The most stringent test of GR
- The anomalous precession of the perihelion of Mercury (1915, v )
- Deflection of starlight (1919, v )
- Gravitational redshift (1965, v )
- Gravitational time delay effect (1968, v )
- Evidence of gravitational waves (1978, v )
- Frame-dragging effect (2010, v )
- Direct gravitational wave detection (?, v1)
- GR = Newtonian Gravity + PN(v) + PN(v^2) + ...

Gravitational wave astronomy
- Search back to the extremely early universe
- Hear the dark universe

Gravitational wave and its detection

Category of Black Holes
- Supermassive black hole: M: 10^5 to 10^9 Msun
- Stellar-mass black hole: M: 1 to tens of Msun
- Intermediate-mass black hole: M: tens to 10^5 Msun (mainly in globular clusters)
- Farrell, et al, Nature 460 (2009) 73; Feng, et al, New Astronomy Reviews 55 (2011) 166

Category of Black Hole Binaries
- IMBH
- ALIA, Xuefei Gong, et al, CQG 28, 094012 (2011)
- 1:1000, 1:1
- Advanced LIGO, Abadie, et al, PRD 85, 102004 (2012)

IMBH and GW detection

Data analysis and template
- Ref. to Sang Hoon Oh's lecture

Template model for BBH?
- Yi Pan's talk, 2013

Template model for BBH
- PN templates: for the early stage of the inspiral
- EOBNR (effective one body model together with numerical relativity): for the full inspiral + merger + ringdown stage; works well for mass ratio less than 1:8 and extreme mass ratio BBH, high spinning, precession!
- But no reliable template for mass ratio 1:10 to 1:100

From a given separation of the two BHs, the number of orbits increases quickly as the mass ratio increases, so the cost of a numerical simulation with full GR increases accordingly. Compared to 1:1, 1:100 needs 10 times more computational cost.
- PN estimation

Computational cost
- 1:1, 9 days
- 1:100, 20 days
- LSSC cluster II, 128 CPUs, for the last 2 orbits
- Computational cost 1 to 20!

Challenge of large mass ratio BBH to NR
- Compared to 1:1, the computational cost of a 1:100 BBH increases roughly 200 times!
- A typical simulation of a 1:1 BBH needs 14 days, so a straightforward approach to 1:100 needs roughly 1 year!

Possible ways out
1. Physical level: approximation methods, such as the self-force framework (but still first order yet)
2. Numerical algorithm level: implicit scheme, R. Lau et al, PRD 84, 084023 (2011); combine Cauchy evolution with null evolution
3. Computer level: improve scalability to use more CPUs; use GPU

Mesh refinement scheme
- High resolution mesh grids for the region near the BH, low resolution mesh grids for the far region

Mesh refinement in CFD
- Result based on PARAMESH
- PARAMESH, GrACE, JASMIN

Comparison of NR and CFD
- NR (only for BH): computationally expensive on a single grid point, but the functions are quite smooth; few grid points (hundreds); high order finite difference
- CFD: computation on a single point is cheap, but the fluid dynamics is quite complex (compare the lectures on HD); the grid number is quite large (millions)

Mesh refinement scheme
- Scheme adopted by PARAMESH
- (Figure: Level 0 and Level 1, axes t and x)

Mesh refinement scheme
- Scheme for NR
- (Figure: Level 0 and Level 1)
- Distribute data along one level to the available processes

Mesh refinement scheme
- Scheme for NR, F. Loeffler et al, CQG 29, 115001 (2012)
- (Figure: Level 0 and Level 1)
- LS scheme

Mesh refinement scheme
- Parallelization limit: 200 x 200 x 200
- 6th order finite difference (8 ghost points for two sides)
- processes
- How about distributing data on all levels and calculating them in parallel?

Parallel mesh level algorithm
- PX scheme: distribute data on all levels to all processes; calculate in parallel

Mesh refinement scheme

  procs for lev0   procs for lev1   procs for lev2
  run              run              run
  wait             wait             run
  wait             run              run
  wait             wait             run
  run              run              run

  (Axis label: time)

- Strong scaling property due to more data to distribute
- Resource wasting (Lx procs of LS) due to waiting!
- Calculation speed: 2 times faster!

Parallel mesh level algorithm
- P2 scheme: distribute data on the finest level to half of the processes, and distribute data on the other levels, along the same level, to the other half; calculate the finest level and the other levels in parallel, while the other levels are calculated sequentially
- (Figure: lev0, lev1, lev2)

Mesh refinement scheme

  procs for lower levels   procs for lev2
  lev1                     run
  lev0                     run
  lev1                     run
  wait                     run
  lev1                     run

  (Axis label: time)

- Scaling property is weaker than PX
- Less waiting (2x procs LS)!
- Calculation speed: 2 times faster!
- Comparison to LS scheme

More complicated case
- (Figure: lev0, lev1, lev2, axes t and x)
- Now, the procs for the finest level have to wait!

GPU acceleration
- For system biology, Yamazaki, Igarashi, Neural Networks, 2013
- For GW data analysis, Zhihui Du, et al, CQG 29, 235018 (2012)

Put RHS calculation to GPU
- For the AMSS-NCKU code, time for RHS calculation: 80%
- The RHS function involves too many variables; even just transferring their addresses is time consuming
- So pack these addresses and store them in constant memory (no further transfer during evolution), which saves shared memory at the same time

Put RHS calculation to GPU
- Keep the data on the GPU until the MPI data transfer between different processes
- Use the buffer point method to reduce the MPI transfers for RK4 from 4 times to only 1 time; this also reduces the number of data transfers between GPU and CPU

Put RHS calculation to GPU
- Arrange shared memory
- Divide the RHS calculation into 8 parts so that the memory requirement of each part fits in shared memory
- For one RHS calculation, copy data from global memory to shared memory once and use shared memory most of the time

Put restrict-prolong to GPU
- After putting the RHS on the GPU, the most time consuming part is the Restrict-Prolong interpolation
- How to treat this part? The work is ongoing
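
The slides name the buffer point method without spelling it out. As a rough illustration only, the Python sketch below assumes a 1D periodic advection equation u_t = -u_x, a second-order centred stencil (half-width 1), and patches padded with 4 buffer points per side, one stencil half-width for each of the four RK4 stages. The names rhs_local, step_decomposed and BUF are hypothetical and are not taken from AMSS-NCKU. Each RK4 stage spoils one more point from the patch edge, so filling the buffers once per step should reproduce the undecomposed result on the owned points.

```python
import math

BUF = 4  # assumed buffer width per side: stencil half-width (1) x RK4 stages (4)


def rhs_periodic(u, dx):
    """Reference: -du/dx on the whole periodic grid, centred 2nd order."""
    n = len(u)
    return [-(u[(i + 1) % n] - u[i - 1]) / (2.0 * dx) for i in range(n)]


def rhs_local(u, dx):
    """Same stencil on a patch. The two end points lack a neighbour and are
    set to zero, so they carry wrong (invalid) values from here on."""
    n = len(u)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = -(u[i + 1] - u[i - 1]) / (2.0 * dx)
    return out


def axpy(u, a, k):
    """Return u + a * k elementwise."""
    return [ui + a * ki for ui, ki in zip(u, k)]


def rk4_step(u, dt, dx, rhs):
    """Classical RK4 step; rhs is evaluated four times."""
    k1 = rhs(u, dx)
    k2 = rhs(axpy(u, 0.5 * dt, k1), dx)
    k3 = rhs(axpy(u, 0.5 * dt, k2), dx)
    k4 = rhs(axpy(u, dt, k3), dx)
    return [ui + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]


def step_decomposed(u, dt, dx, nparts, buf=BUF):
    """One RK4 step on nparts patches with a single buffer fill per step."""
    n = len(u)
    m = n // nparts  # owned points per patch (n must be divisible by nparts)
    new = []
    for p in range(nparts):
        # The only "communication": copy owned points plus buf neighbours.
        patch = [u[i % n] for i in range(p * m - buf, (p + 1) * m + buf)]
        patch = rk4_step(patch, dt, dx, rhs_local)
        new.extend(patch[buf:buf + m])  # keep the owned points only
    return new


def max_diff(a, b):
    """Largest pointwise difference between two grid functions."""
    return max(abs(x - y) for x, y in zip(a, b))


n, nparts = 64, 4
dx = 2.0 * math.pi / n
dt = 0.5 * dx
u0 = [math.sin(i * dx) for i in range(n)]

ref, dec = u0, u0
for _ in range(10):
    ref = rk4_step(ref, dt, dx, rhs_periodic)
    dec = step_decomposed(dec, dt, dx, nparts)

# Difference between the decomposed and the undecomposed evolution.
print(max_diff(ref, dec))
# Same comparison for one step with only 3 buffer points, which is too few.
print(max_diff(rk4_step(u0, dt, dx, rhs_periodic),
               step_decomposed(u0, dt, dx, nparts, buf=3)))
```

In an MPI code the list copy inside step_decomposed would correspond to the single exchange per RK4 step. The price is redundant work on the buffer points, which is the usual trade of extra computation for fewer messages.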