
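IBM Platform LSF Family: Installation and Configuration Overview
Version 1.0
Ma Xuejie, 2013-05-07

Contents
1 Cluster architecture
1.1 Pure LSF environment (command-line submission)
1.2 LSF + PAC environment (web submission)
1.3 LSF + PM environment (Process Manager submission)
2 LSF installation and basic configuration examples
2.1 LSF installation steps
2.1.1 Obtain the LSF and PAC installation packages
2.1.2 Unpack the lsfinstall installation scripts
2.1.3 Edit the install.config file
2.1.4 Run the installation
2.1.5 Configure start at boot
2.1.6 Test the installation
2.1.7 Start/stop the LSF daemons (three ways)
2.1.8 Test job submission
2.1.9 Enable job submission by root
2.1.10 Run reconfig after changing configuration files
2.1.11 Logs and debugging
2.2 Configuration files
2.3 Common commands
2.4 Configuring fairshare scheduling policies
2.4.1 Add a round-robin queue
2.4.2 Add a hierarchical fairshare policy
2.4.3 Cross-queue fairshare policy
2.4.4 Apply the configuration
2.5 Configuring preemptive scheduling
2.6 Configuring global limits
2.6.1 Limit the number of running jobs per user
2.6.2 Limit the number of running jobs per host
2.6.3 Limit jobs in a queue
2.6.4 Set general limits
2.6.5 Apply the configuration
2.7 Configuring the submission control script (esub)
2.8 Resource management with elim: examples
2.8.1 Report free space in the home directory
2.8.2 Report the number of root processes
2.8.3 Report the number of application licenses
2.8.4 Test the elim script
2.8.5 Add the resource definition and resource map
2.8.6 View resource values
3 Command-line application integration examples
3.1 CFD+ integration (spooling file)
3.1.1 CFD+ installation and license
3.1.2 License management elim
3.1.3 Add a CFD+ job starter
3.1.4 Add a CFD application profile
3.1.5 Sample CFD+ command-line submission script
3.2 Gaussian integration (spooling file)
3.2.1 Gaussian installation and license
3.2.2 Sample Gaussian command-line submission script
3.3 Abaqus script integration (bsub command)
3.4 Platform MPI jobs
3.5 Open MPI jobs
3.6 Intel MPI jobs
3.6.1 Express edition without accounting
3.6.2 Express edition with blaunch accounting
3.6.3 Standard edition with PAM integration
4 Installing PAC
5 Application integration with PAC
5.1 Gaussian GUI integration
5.2 CFD+ GUI and back-end scripts after integration
5.3 Monitoring licenses in PAC
6 Installing License Scheduler
6.1 Basic installation test
6.2 Basic configuration examples
6.2.1 Add license server addresses
6.2.2 Map license features
6.2.3 Use license resources
6.2.4 Configure license scheduling policies
7 FAQ
8 Using man pages
After-sales technical support

1 Cluster architecture
Larger clusters are usually designed with a separate login node. Users can ssh only to the login node, not directly to any master or compute node of the cluster. Passwordless ssh (mutual trust) between compute nodes is also configured for users so that parallel jobs can run. The login node also has LSF installed, configured as a static LSF client or with MXJ set to 0, that is, a client that does not run jobs. The cluster's web node is on the same subnet as the office LAN. To use floating clients, the master node's network interface needs …

1.1 Pure LSF environment (command-line submission)
[Figure: on the access network, desktops (LSF floating clients) and the login node (LSF static client) run job-submission and workflow scripts ("bsub jobs") and reach the LSF master nodes (expandable to 3) over SSH; management network. Users are isolated from the compute resources: the "bsub jobs" steps in the script workflow dispatch jobs to the cluster compute nodes.]

1.2 LSF + PAC environment (web submission)
Users submit jobs through the portal.

1.3 LSF + PM environment (Process Manager submission)
[Figure: LSF master node with Process Manager Server; login node (web portal); Linux high-performance cluster; management network; storage network.]

2 LSF installation and basic configuration examples
2.1 Preparation before installation
NIS ready; NFS/GPFS ready.
2.2 LSF installation steps
Install as root, and make sure NIS and NFS/GPFS are ready.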
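The prerequisites above (run as root, NIS and NFS/GPFS ready) can be checked with a short script before running lsfinstall. This is a minimal sketch, not part of the LSF distribution: the function names are hypothetical, and it assumes that the lsfadmin account is served through NSS (NIS or local files) and that the shared installation directory (assumed here to be /opt/lsf) is a mount point.

```sh
#!/bin/sh
# Preflight checks before running lsfinstall (sketch; function names are hypothetical).

# True when the current user is root.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# True when the account resolves through NSS (local files or NIS).
user_resolves() {
    getent passwd "$1" >/dev/null 2>&1
}

# True when the directory is a mount point (for example an NFS or GPFS share).
is_mounted() {
    mountpoint -q "$1" 2>/dev/null
}

# Usage: preflight ADMIN_USER SHARED_DIR
# Prints one line per failed check on stderr; returns non-zero if any check failed.
preflight() {
    rc=0
    is_root || { echo "not running as root" >&2; rc=1; }
    user_resolves "$1" || { echo "user $1 not found (is NIS up?)" >&2; rc=1; }
    is_mounted "$2" || { echo "$2 is not a mount point (is NFS/GPFS mounted?)" >&2; rc=1; }
    return $rc
}

# Example: preflight lsfadmin /opt/lsf && echo "ready for lsfinstall"
```

Each check reports separately instead of stopping at the first failure, so one run lists everything that still needs fixing.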

2.2.1 Obtain the LSF and PAC installation packages
    lsf8.3_linux2.6-glibc2.3-x86_64.tar.Z
    lsf8.3_lsfinstall_linux_x86_64.tar.Z
    pac8.3_standard_linux-x64.tar.Z
    License file: platform_hpc_std_entitlement.dat
2.2.2 Unpack the lsfinstall installation scripts
Put the packages under /root/lsf, then:
    [root@s2 lsf]# gunzip lsf8.3_lsfinstall_linux_x86_64.tar.Z
    [root@s2 lsf]# tar -xvf lsf8.3_lsfinstall_linux_x86_64.tar
2.2.3 Edit the install.config file
First create the cluster administrator account lsfadmin. Then:
    cd lsf8.3_lsfinstall
    vi install.config
    [root@s2 lsf8.3_lsfinstall]# cat install.config
    # Installation directory
    LSF_TOP="/opt/lsf"
    # Cluster administrator (create the lsfadmin user first)
    LSF_ADMINS="lsfadmin"
    # Cluster name (any name you choose)
    LSF_CLUSTER_NAME="platform"

    # LSF master candidate hosts
    LSF_MASTER_LIST="s2 s3"
    # Location of the entitlement (license) file
    LSF_ENTITLEMENT_FILE="/root/lsf/platform_hpc_std_entitlement.dat"
    # Location of the installation packages
    LSF_TARDIR="/root/lsf/"
2.2.4 Run the installation
    ./lsfinstall -f install.config
2.2.5 Configure start at boot
From the installation directory, run hostsetup on the local host and rhostsetup for the remote hosts:
    cd /opt/lsf/8.3/install
    ./hostsetup --top="/opt/lsf" --boot="y"
    ./rhostsetup
2.2.6 Test the installation
In the conf directory under the installation directory, load the LSF environment:
    [root@s2 conf]# source profile.lsf
Add "source /opt/lsf/conf/profile.lsf" to /etc/profile. If rsh is not available, set ssh in lsf.conf:
    [root@s2 conf]# tail lsf.conf
    LSF_RSH="ssh"
2.2.7 Start/stop the LSF daemons (three ways)
    [root@s2 conf]# lsfstartup        # stop with: lsfshutdown
or
    lsadmin limstartup / lsadmin limshutdown
    lsadmin resstartup / lsadmin resshutdown
    badmin hstartup / badmin hshutdown
or
    lsf_daemons start / lsf_daemons stop
Check the cluster:
    [root@s2 conf]# lsid
    IBM Platform LSF Express 8.3 for IBM Platform HPC, May 10 2012
    Copyright Platform Computing Inc., an IBM Company, 1992-2012.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    My cluster name is platform
    My master name is s2
    [root@s2 conf]# lsload
    HOST_NAME  status  r15s  r1m  r15m  ut  pg   ls  it  tmp    swp  mem
    s2         ok      0.0   0.0  0.0   1%  0.0  1   0   151G   20G  61G
    s4         ok      0.0   0.0  0.0   2%  0.0  1   2   183G   20G  62G
    s6         ok      0.0   0.0  0.0   3%  0.0  1   2   3734M  2G   30G
    s5         ok      0.0   0.0  0.0   5%  0.0  1   2   3468M  2G   30G
2.2.8 Test job submission
    bsub sleep 100000
2.2.9 Enable job submission by root
To let root submit jobs, set the following in lsf.conf and restart the LSF daemons:
    LSF_ROOT_REX=local
2.2.10 Run reconfig after changing configuration files
After modifying lsf.* files, run lsadmin reconfig; after modifying lsb.* files, run badmin reconfig. Some parameters require restarting the master batch daemon or other daemons:
    badmin mbdrestart; lsadmin limrestart; lsadmin resrestart; badmin hrestart
2.2.11 Logs and debugging
Find the logs in the log directory.
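The log directory is set by LSF_LOGDIR in lsf.conf, and each daemon writes a file named <daemon>.log.<host>, such as lim.log.s2. A minimal sketch for locating them, assuming a plain LSF_LOGDIR=path line (quoted or not) and that the host part of the file name is what uname -n prints; the function names are hypothetical. If LSF_LOGDIR is not set, the daemons log through syslog instead.

```sh
#!/bin/sh
# Locate LSF daemon log files (sketch; function names are hypothetical).

# Print the LSF_LOGDIR value from an lsf.conf file (default: $LSF_ENVDIR/lsf.conf).
# Commented-out lines are ignored; surrounding quotes and blanks are stripped.
lsf_logdir() {
    awk -F= '/^[ \t]*LSF_LOGDIR[ \t]*=/ { v = $2; gsub(/^[ \t"]+|[ \t"]+$/, "", v); d = v }
             END { if (d != "") print d }' "${1:-$LSF_ENVDIR/lsf.conf}"
}

# List the daemon logs for this host that exist in directory $1.
# Assumes the files are named <daemon>.log.<host> with <host> equal to `uname -n`.
list_daemon_logs() {
    host=$(uname -n)
    for daemon in lim res sbatchd mbatchd mbschd; do
        if [ -f "$1/$daemon.log.$host" ]; then
            echo "$1/$daemon.log.$host"
        fi
    done
}

# Example: list_daemon_logs "$(lsf_logdir)"
```

On a compute node only the lim, res and sbatchd logs are expected; mbatchd and mbschd logs appear on the master.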

LSF runs three main daemons on each node, plus two more on the master node:
    Master:  lim, res, sbatchd, mbatchd, mbschd
    Compute: lim, res, sbatchd
To debug from the command line, run lim -2 directly on a node to see why LIM does not start.
2.3 Configuration files
Directory /etc/init.d:
    /etc/init.d/lsf         LSF service start-up script
Directory /apps/platform/8.3/lsf/conf:
    lsf.conf                LSF configuration file
    lsf.cluster.cluster83   cluster configuration file
    lsf.shared              shared resource definitions
    ./lsbatch/cluster83/configdir/lsb.*   batch scheduling configuration files:
        lsb.queues          queues
        lsb.params          scheduling parameters
        lsb.users           users and user groups
        lsb.applications    application profiles
        lsb.hosts           hosts and host groups
        lsb.resources       resources (limits)
        lsb.modules         scheduler modules
2.4 Common commands
    bsub: submit a job
    bjobs: show job information
    bhist: show job history
    lshosts: show static host resources
    bhosts, lsload: show host status and resource information
    bqueues: show queue configuration
    blimits: show limit information
    lsid: show the cluster version and master host
    bmod: modify the bsub options of a submitted job
2.5 Resource-based scheduling
Give a resource requirement at submission:
    bsub -R "(type==LINUX2.4 && r1m < 2.0)||(type==AIX && r1m < 1.0)"
or define it in lsb.queues or lsb.applications:
    RES_REQ = select[(type==LINUX2.4 && r1m < 2.0)||(type==AIX && r1m < 1.0)]
More examples:
    bsub -R "select[type==any && swp>=300 && mem>500] order[swp:mem] rusage[swp=300,mem=500]" job1
    bsub -R "rusage[mem=500:app_lic_v2=1 || mem=400:app_lic_v1.5=1]" job1
    bsub -R "select[type==any && swp>=300 && mem>500] order[mem]" job1
2.6 Configuring fairshare scheduling
2.6.1 Add a round-robin queue
Add the following to lsb.queues:
    Begin Queue
    QUEUE_NAME = roundRobin
    PRIORITY   = 40
    FAIRSHARE  = USER_SHARES[[default,1]]
    #USERS     = userGroupA    # define your own user group
    End Queue
Run badmin reconfig to apply the change, then bqueues -l roundRobin to check the queue's configuration.
2.6.2 Add a hierarchical fairshare policy
Add the following queue:
    Begin Queue
    QUEUE_NAME = hierarchicalShare
    PRIORITY   = 40
    USERS      = userGroupB userGroupC
    FAIRSHARE  = USER_SHARES[[userGroupB,7] [userGroupC,3]]
    End Queue
2.6.3 Cross-queue fairshare policy
Add the following queues to lsb.queues, and make sure the host groups and user groups they use are defined.
    Begin Queue
    QUEUE_NAME       = verilog
    DESCRIPTION      = master queue definition cross-queue
    PRIORITY         = 50
    FAIRSHARE        = USER_SHARES[[user1,100] [default,1]]
    FAIRSHARE_QUEUES = normal short
    HOSTS            = hostGroupC    # resource contention
    #RES_REQ         = rusage[verilog = 1]
    End Queue

    Begin Queue
    QUEUE_NAME  = short
    DESCRIPTION = short jobs
    PRIORITY    = 70    # highest
    HOSTS       = hostGroupC
    RUNLIMIT    = 5 10
    End Queue

    Begin Queue
    QUEUE_NAME  = normal
    DESCRIPTION = default queue
    PRIORITY    = 40    # lowest
    HOSTS       = hostGroupC
    End Queue
2.6.4 Apply the configuration
Run badmin reconfig, submit jobs, and watch how the users' dynamic priorities in the queue change:
    bqueues -l normal
2.7 Configuring preemptive scheduling
The most basic slot preemption:
    Begin Queue
    QUEUE_NAME = short
    PRIORITY   = 70
    HOSTS      = hostGroupC    # potential conflict
    PREEMPTION = PREEMPTIVE[normal]
    End Queue

    Begin Queue
    QUEUE_NAME = normal
    PRIORITY   = 40
    HOSTS      = hostGroupC    # potential conflict
    PREEMPTION = PREEMPTABLE[short]
    End Queue
Submit jobs to both queues, then check the status reason of the preempted jobs.
2.8 Configuring global limits
2.8.1 Limit the number of running jobs per user
Add the following to lsb.users:
    Begin User
    USER_NAME   MAX_JOBS   JL/P

    user1       4
    user2       2          1
    user3       -          2
    groupA      8
    groupB
    default
    End User
2.8.2 Limit the number of running jobs per host
In lsb.hosts:
    Begin Host
    HOST_NAME   MXJ   JL/U
    host1       4     2
    host2       2     1
    host3       !     -
    End Host
2.8.3 Limit jobs in a queue
Add to lsb.queues:
    Begin Queue
    QUEUE_NAME = myQueue
    HJOB_LIMIT = 2
    PJOB_LIMIT = 1
    UJOB_LIMIT = 4
    HOSTS      = hostGroupA
    USERS      = userGroupA
    End Queue
2.8.4 Set general limits
Examples of global general limits defined in lsb.resources:
    Begin Limit
    USERS   QUEUES   HOSTS   SLOTS   MEM   SWP
    user1   -        hostB   -       -     20%
    user2   normal   hostA   -       20    -
    End Limit

    Begin Limit
    NAME     = limit1
    USERS    = user1
    PER_HOST = hostA hostC
    TMP      = 30%
    SWP      = 50%
    MEM      = 10%
    End Limit

    Begin Limit
    PER_USER   QUEUES   HOSTS     SLOTS   MEM   SWP   TMP
    groupA     -        hgroup1   -       2
    user2      normal   -         200     -
    -          short    200
    End Limit
2.8.5 Apply the configuration
    badmin reconfig
2.9 Configuring the submission control script (esub)
A global esub script runs when a job is submitted. It can be invoked automatically (LSB_ESUB_METHOD) or explicitly (bsub -a), and lets you control how users submit jobs. Create the file esub.project in $LSF_SERVERDIR and make it executable (chmod):
    #!/bin/sh
    # Reject any job submitted without a project name.
    if [ "_$LSB_SUB_PARM_FILE" != "_" ] ; then
        . "$LSB_SUB_PARM_FILE"
        if [ "_$LSB_SUB_PROJECT_NAME" = "_" ] ; then
            echo "You must specify a project!" >&2
            exit $LSB_SUB_ABORT_VALUE
        fi
    fi
    exit 0
Then define the method in lsf.conf:
    LSB_ESUB_METHOD="project"
2.10 Resource management with elim: examples
2.10.1 Report free space in the home directory
Create the elim file elim.home in $LSF_SERVERDIR and make it executable:
    #!/bin/sh
    # Report free space under /home, in GB, every 30 seconds.
    # df -P keeps each filesystem on one line, so $4 is always the "available" column.
    while true ; do
        home=`df -kP /home | tail -1 | awk '{printf "%4.1f", $4/(1024*1024)}'`
        echo 1 home $home
        sleep 30
    done
2.10.2 Report the number of root processes
Create elim.root in $LSF_SERVERDIR and make it executable:
    #!/bin/sh
    # Report the number of processes owned by root.
    while true ; do
        root=`ps -ef | grep -v grep | grep -c ^root`
        echo 1 rootprocs $root
        sleep 30
    done
2.10.3 Report the number of application licenses
    #!/bin/sh
    lic_X=0 ; num=0
    while true ; do
        # only the master gathers lic_X
        if [ "$LSF_MASTER" = "Y" ] ; then
            lic_X=`lmstat -a -c lic_X.dat | grep ...`   # replace ... with a filter that extracts the count
        fi
        # only training8 and training1 gather simpton licenses
        if [ "`hostname`" = "training8" -o "`hostname`" = "training1" ] ; then
            num=`lmstat -a -c simpton_lic.dat | grep ...`   # replace ... with a filter that extracts the count
        fi
        # all hosts, including the master, gather the following
        root=`ps -efw | grep -v grep | grep -c root`
        tmp=`df /var/tmp | grep var | awk '{print $4 / 1024}'`
        if [ "$LSF_MASTER" = "Y" ] ; then
            echo 4 lic_X $lic_X simpton $num rtprc $root tmp $tmp
        else
            echo 3 simpton $num rtprc $root tmp $tmp
        fi
        # use the same INTERVAL values defined in lsf.shared
        sleep 60
    done
2.10.4 Test the elim script
Run ./elim.root directly and check that its output is correct.
2.10.5 Add the resource definition and resource map
Define rootprocs in lsf.shared, and map the resource to hosts in the ResourceMap section of lsf.cluster.<cluster_name>. Apply the configuration:
    lsadmin reconfig; badmin reconfig
2.10.6 View resource values
    lsload -l
3 Command-line application integration examples
This section shows several ways of integrating applications. Spooling files and bsub command lines are interchangeable.
3.1 CFD+ integration (spooling file)
3.1.1 CFD+ installation and license
Installation: ln-3620-4
License file: /gpfs/software/cfdpp/mbin/Metacomp.lic
License server: ln-3620-4
Start the license server:
    [hpcadmin@mn-3650 jessi]$ ssh ln-3620-4
    Last login: Tue Mar 26 19:19:24 2013 from mn-3650.private.dns.zone
    [hpcadmin@ln-3620-4 ~]$ /gpfs/software/cfdpp/mbin/lmgrd -c /gpfs/software/cfdpp/mbin/Metacomp.lic
Check that the license server is running:
    /gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic
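The lmstat output checked above reports each feature on a line of the form "Users of FEATURE:  (Total of N licenses issued;  Total of M licenses in use)". The elim in 3.1.2 picks the two numbers out with cut by field position, which depends on the exact spacing of that line; a pattern match is less sensitive to it. A minimal sketch, assuming that FlexNet lmstat line format and using hypothetical function names:

```sh
#!/bin/sh
# Extract license counts from lmstat output (sketch; function names are hypothetical).
# Expects lines like:
#   Users of CFD+_SOLV_Ser:  (Total of 16 licenses issued;  Total of 4 licenses in use)

# Read lmstat output on stdin; print "<issued> <in_use>" for feature $1.
# grep -F treats the feature name literally, so "+" or "." in it are safe.
lic_counts() {
    grep -F "Users of $1:" |
        sed -n 's/.*Total of \([0-9]*\) licenses\{0,1\} issued;.*Total of \([0-9]*\) licenses\{0,1\} in use.*/\1 \2/p'
}

# Read lmstat output on stdin; print the number of free licenses for feature $1.
# Returns non-zero when the feature line is not found.
lic_free() {
    counts=$(lic_counts "$1")
    [ -n "$counts" ] || return 1
    set -- $counts
    echo $(( $1 - $2 ))
}

# Example with the paths from 3.1.1:
#   /gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | lic_free CFD+_SOLV_Ser
```

Returning an error when the feature line is missing lets an elim skip a report instead of publishing a bogus value when the license server is unreachable.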

3.1.2 License management elim
Only one instance of this elim is needed in the whole cluster, so place the script on the head node only. On the head node, go to $LSF_SERVERDIR and add the file elim.lic:
    [root@mn-3650 jessi]# cd $LSF_SERVERDIR
    [root@mn-3650 etc]# pwd
    /opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc
    [root@mn-3650 etc]# cat elim.lic
    #!/bin/sh
    # Report the number of free CFD+ solver licenses as the cfd_lic resource.
    totallicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD+_SOLV_Ser" | /bin/cut -d' ' -f7`
    while true
    do
        usedlicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD+_SOLV_Ser" | /bin/cut -d' ' -f13`
        cfd_lic=$(( totallicences - usedlicences ))
        echo "1 cfd_lic $cfd_lic"
        /bin/sleep 30
    done
    [root@mn-3650 etc]# chmod a+x elim.lic
Then update the configuration files. In lsf.shared:
    [root@mn-3650 etc]# vi $LSF_ENVDIR/lsf.shared
add the following line:
    cfd_lic   Numeric   30   Y   (CFD+ License)
In lsf.cluster.<cluster_name>:
    [root@mn-3650 etc]# vi $LSF_ENVDIR/lsf.cluster.<cluster_name>
add the cfd_lic line to the ResourceMap section:
    Begin ResourceMap
    RESOURCENAME   LOCATION
    cfd_lic        [all]
    hostid         [default]
    End ResourceMap
    [root@mn-3650 etc]# lsadmin reconfig; badmin reconfig
3.1.3 Add a CFD+ job starter
Not needed if you use a spooling file; the job starter is used by the portal integration. Add the job starter executable:
    [hpcadmin@mn-3650 jessi]$ cat /opt/lsf/jobstarter/cfd_starter
    #!/bin/sh
    # Choose the solver binary according to $PRESSION (single or double precision).
    MPI_RUN=/gpfs/software/cfdpp/hpmpi/bin/mpirun
    case "$PRESSION" in
        SINGLE_PRESSION)
            CFD_CMD=/gpfs/software/cfdpp/mbin/mcfd.11.1/r4_hpmpimcfd ;;
        DOUBLE_PRESSION)
            CFD_CMD=/gpfs/software/cfdpp/mbin/mcfd.11.1/hpmpimcfd ;;
    esac
    CMD="$* -hostfile $LSB_DJOB_HOSTFILE $CFD_CMD"
    eval "$CMD"
3.1.4 Add a CFD application profile
Application profiles are defined in lsb.applications:
    [root@mn-3650 etc]# vi $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.applications
Add the following:
    Begin Application
    NAME        = cfd
    JOB_STARTER = /opt/lsf/jobstarter/cfd_starter
    RES_REQ     = "rusage[cfd_lic=1]"
    End Application
Run badmin reconfig to apply the file, then check it with bapp -l cfd:
    [root@mn-3650 bin]# bapp -l cfd
    APPLICATION NAME: cfd
     -- No description provided.
    STATISTICS:
       NJOBS   PEND   RUN   SSUSP   USUSP   RSV
          12     12     0       0       0     0
    PARAMETERS:
    JOB_STARTER: /opt/lsf/jobstarter/cfd_starter
    RES_REQ: "rusage[cfd_lic=1]"
3.1.5 Sample CFD+ command-line submission script
    [hpcadmin@mn-3650 jessi]$ cat cfd.sh
    #!/bin/sh
    #BSUB -n 12
    #BSUB -o %J.out
    #BSUB -e %J.err
    #BSUB -app cfd
    #BSUB -R "rusage[cfd_lic=1]"
    cd /gpfs/software/cfd+/test/ogive
    /gpfs/software/cfdpp/hpmpi/bin/mpirun -hostfile $LSB_DJOB_HOSTFILE /gpfs/software/cfdpp/mbin/mcfd.11.1/hpmpimcfd
Then submit the job with bsub < cfd.sh.
3.2 Gaussian integration (spooling file)
3.2.1 Gaussian installation and license
Path: /gpfs/software/Gaussian/
License: license-free version; a single job can run on only one host.
3.2.2 Sample Gaussian command-line submission script
The script g03.sh:
    #!/bin/sh
    #BSUB -q qchem
    #BSUB -n 4
    #BSUB -R "span[hosts=1]"
    #BSUB -cwd .
    #BSUB -e %J.err
    #BSUB -o %J.out
    JOB=Full_codes_112_ipr_C1_
    JOBNAME=`basename "$JOB" .com`
    export g03root=/gpfs/software/Gaussian
    export GAUSS_SCRDIR=/tmp
    . $g03root/g03/bsd/file
    /gpfs/software/Gaussian/g03/g03 < $JOB > "$JOBNAME.log"
Submit the job:
    bsub < g03.sh
3.3 Abaqus script integration (bsub command)
1) Write the script abaqus_run.sh:
    #!/bin/sh
    # version: 1.3.0
    export ABAQUS_CMD="/gpfs/software/Abaqus/Commands/abaqus"
    export LM_LICENSE_FILE="/gpfs/software/Abaqus/License/abq612.lic"
    # Number of CPUs; must match the value given with -n on the bsub command line
    export NCPU=16
    # Input file
    export INPUT_FILE=beam.inp
    # Job name
    export JOB_NAME=abaqusob3
    # "interactive" keeps Abaqus in the foreground, so the LSF job lasts as long as the analysis
    $ABAQUS_CMD job=$JOB_NAME cpus=$NCPU input="$INPUT_FILE" interactive
2) Submit through LSF
Go to the directory that contains the input data and run:
    bsub -q qeng -n 16 ./abaqus_run.sh
3.4 Amber jobs (blaunch integration, with accounting)
For Intel MPI, write an mpdboot.lsf script, make it executable, and put it in $LSF_SERVERDIR. Then write the job submission script:
    [ymei@mnis test]$ cat new.sh
    #!/bin/sh
    #BSUB -q small
    #BSUB -n 128
    #BSUB -o %J.out
    #BSUB -e %J.err
    #BSUB -J IMPI
    #BSUB -x
    #export PATH=/gpfs01/software/intel/impi/24/intel64/bin:$PATH
    #/gpfs01/home/ymei/jessi/mpdboot.lsf
    mpdboot.lsf
    export I_MPI_DEVICE=ssm
    #export I_MPI_FABRICS=shm:ofa
    #export I_MPI_FAST_STARTUP=1
    #export I_MPI_DEVICE=rdssm
    #mpiexec -np $LSB_DJOB_NUMPROC /gpfs01/software/intel/impi/24/test/helloword
    mpiexec -np $LSB_DJOB_NUMPROC $AMBERHOME/bin/sander.MPI -ng 32 -groupfile remd10.groupfile
    mpdallexit
Submit the job:
    bsub < new.sh
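The contents of mpdboot.lsf are not given here. One step such a wrapper typically needs is turning the LSF allocation into an mpd hosts file: $LSB_DJOB_HOSTFILE lists one host name per allocated slot, while the mpdboot hosts file accepts host:ncpus lines. A minimal sketch of that conversion only, with a hypothetical function name; it is not the author's mpdboot.lsf, and it does not cover how the mpd daemons are started on the remote hosts.

```sh
#!/bin/sh
# Collapse an LSF host file (one line per allocated slot) into "host:count" lines,
# keeping hosts in first-seen order (sketch; function name is hypothetical).
# Blank lines are ignored.
slots_per_host() {
    awk 'NF {
             if (!($1 in count)) order[++n] = $1
             count[$1]++
         }
         END { for (i = 1; i <= n; i++) print order[i] ":" count[order[i]] }' "$1"
}

# Example inside a job (check the options against your Intel MPI version's mpdboot):
#   slots_per_host "$LSB_DJOB_HOSTFILE" > mpd.hosts
#   mpdboot -n "$(wc -l < mpd.hosts)" -f mpd.hosts
```

Keeping first-seen order preserves the host order LSF chose, which usually puts the first execution host (where the job script runs) at the top.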

3.5 Platform MPI jobs
3.5.1 Install Platform MPI
Make sure users have passwordless ssh access, then install Platform MPI into a shared directory:
    sh platform_mpi-0-0320r.x64.sh -installdir=/opt/pmpi -norpm
If the C compiler is missing, run: yum install gcc
3.5.2 Verify the installation outside LSF
Set the environment variables:
    export MPI_REMSH="ssh -x"
    export MPI_ROOT=/opt/pmpi/opt/ibm/platform_mpi/
Compile the hello world example:
    /opt/pmpi/opt/ibm/platform_mpi/bin/mpicc -o helloworld /opt/pmpi/opt/ibm/platform_mpi/help/hello_world.c
Run it with an appfile:
    [root@server3 help]# /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -f ./help/hosts
    warning: MPI_ROOT /opt/pmpi/opt/ibm/platform_mpi/ != mpirun path /opt/pmpi/opt/ibm/platform_mpi
    Hello world! I'm 1 of 4 on server3
    Hello world! I'm 0 of 4 on server3
    Hello world! I'm 3 of 4 on computer007
    Hello world! I'm 2 of 4 on computer007
    [root@server3 help]# cat ./help/hosts
    -h server3 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    -h computer007 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
The warning appears to come only from the trailing slash in MPI_ROOT; setting MPI_ROOT=/opt/pmpi/opt/ibm/platform_mpi should avoid it.
3.5.3 Submit through LSF
Set MPI_REMSH to blaunch and run mpirun, for example:
    export MPI_REMSH=blaunch
    $ mpirun -np 4 -IBV ./helloworld
    $ mpirun -np 32 -IBV ./helloworld
    $ mpirun -np 4 -TCP ./helloworld
or submit mpirun with bsub:
    [root@server3 conf]# bsub -o %J.out -e %J.err -n 4 /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    Job <210> is submitted to default queue <normal>.
    [root@server3 conf]# bjobs
    JOBID  USER  STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
    210    root  PEND  normal  server3               *elloworld  May 9 10:55
    [root@server3 conf]# cat 210.out
    Sender: LSF System <jessi@computer007>
    Subject: Job 210: </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> in cluster <jessi_cluster> Done

    Job </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> was submitted from host <server3> by user <root> in cluster <jessi_cluster>.
    Job was executed on host(s) <4*computer007>, in queue <normal>, as user <root> in cluster <jessi_cluster>.
    </root> was used as the home directory.
    </opt/lsf/conf> was used as the working directory.
    Started at Thu May 9 18:49:06 2013
    Results reported at Thu May 9 18:49:07 2013

    Your job looked like:
    # LSBATCH: User input
    /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

    Successfully completed.

    Resource usage summary:
        CPU time :               0.23 sec.
        Max Memory :             2 MB
        Average Memory :         2.00 MB
        Total Requested Memory : -
        Delta Memory :           -
        (Delta: the difference between total requested memory and actual max usage.)
        Max Swap :               36 MB
        Max Processes :          1
        Max Threads :            1

    The output (if any) follows:
    Hello world! I'm 2 of 4 on computer007
    Hello world! I'm 0 of 4 on computer007
    Hello world! I'm 1 of 4 on computer007
    Hello world! I'm 3 of 4 on computer007

    PS:
    Read file <210.err> for stderr output of this job.
Or with more options:
    $ /opt/platform_mpi/bin/mpirun -np 120 -ibv -hostlist "cn-22-001 cn-22-002 cn-22-003 cn-22-004 cn-22-005 cn-22-006 cn-22-007 cn-22-008 cn-22-009 cn-22-010" /data/hello_world
If you want MPI jobs to run without going through LSF, set the MPI_USELSF environment variable to n.
3.6 Open MPI jobs
