Running simulations on Jess¶
In [1]:
import os
from wetb import hawc2
from wetb.hawc2 import HTCFile
from wetb.hawc2.tests.test_files import tfp
Generate some HAWC2 input htc files¶
In [2]:
htc_lst = []
for wsp in [4, 6]:
    htc = HTCFile(tfp + "simulation_setup/DTU10MWRef6.0/htc/DTU_10MW_RWT.htc")
    htc.simulation.time_stop = 1
    htc.wind.wsp = wsp
    htc.set_name("tmp%d" % wsp)
    htc.save()
    htc_lst.append(htc)
    print (htc.filename)
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/htc/tmp4.htc
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/htc/tmp6.htc
Generate PBS files¶
A PBS file defines a job that can be submitted to the queuing system of PBS featured clusters, e.g. Jess.
A PBS file has a header that specifies:
- the output file for stdout and stderr
- the wall time (after which the job will be terminated)
- nodes (number of nodes to request)
- ppn (number of processors/CPUs to use at each node; Jess has 20 CPUs per node)
- the queue (e.g. workq, windq, xpresq)
PBS files can be generated from a HAWC2 input htc file. The body (command section) of these files will:
- Copy HAWC2 to a common folder on the scratch drive (i.e. a hard drive local to the node) if it is not already there
- Create a run folder on the scratch drive for the current simulation
- Copy HAWC2 to the run folder
- Copy all required input files (turbulence files are optional) to a common folder on the scratch drive if they are not already there
- Copy all required input files to the run folder
- Launch the simulation
- Copy all output files (turbulence files are optional) back to the model directory
HAWC2 can be copied from a local folder or from the shared group folder /mnt/aiolos/groups/hawc2sim/HAWC2/<version>/<platform>. HAWC2 can be a zip file, which will be unzipped on the scratch drive, and/or a set of files (exe, dll, …).
In [3]:
pbs = htc.pbs_file("hawc2_path", "hawc2_cmd",
                   queue='workq',          # workq, windq, xpresq
                   walltime=None,          # defaults to expected time (currently 600s) * 2
                   input_files=None,       # if None, required files are autodetected from the htc file
                   output_files=None,      # if None, output files are autodetected from the htc file
                   copy_turb=(True, True)  # copy turbulence files (to, from) simulation
                   )
In [4]:
version = "v12.8.0.0"
platform = "win32"
hawc2_path="/mnt/aiolos/groups/hawc2sim/HAWC2/%s/%s/" % (version, platform)
print(hawc2_path)
/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/
The command needed to run HAWC2 must be specified. It can be obtained via the wine_cmd function:
In [5]:
from wetb.hawc2.hawc2_pbs_file import JESS_WINE32_HAWC2MB, wine_cmd
hawc2_cmd = wine_cmd(platform='win32', hawc2='hawc2mb.exe', cluster='jess')
print (hawc2_cmd)
WINEARCH=win32 WINEPREFIX=~/.wine32 winefix
WINEARCH=win32 WINEPREFIX=~/.wine32 wine hawc2mb.exe
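The returned command is just a composed string; a minimal sketch of how the wine invocation line could be assembled (the WINEARCH/WINEPREFIX values are taken from the output above; note that the real wine_cmd also prepends a winefix line):

```python
# Assemble a wine-based HAWC2 command line for a 32-bit setup.
# The prefix values match the wine_cmd output shown above.
wine_prefix = "WINEARCH=win32 WINEPREFIX=~/.wine32"
hawc2_exe = "hawc2mb.exe"
hawc2_cmd = "%s wine %s" % (wine_prefix, hawc2_exe)
print(hawc2_cmd)  # WINEARCH=win32 WINEPREFIX=~/.wine32 wine hawc2mb.exe
```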
The PBS files are generated from the htc files:
In [6]:
pbs_lst = []
for htc in htc_lst:
    pbs = htc.pbs_file(hawc2_path, hawc2_cmd,
                       queue='workq',          # workq, windq, xpresq
                       walltime=None,          # defaults to expected time (currently 600s) * 2
                       input_files=None,       # if None, required files are autodetected from the htc file
                       output_files=None,      # if None, output files are autodetected from the htc file
                       copy_turb=(True, True)  # copy turbulence files (to, from) simulation
                       )
    pbs.save()
    pbs_lst.append(pbs)
    print (os.path.join(pbs.workdir, pbs.filename))
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp4.in
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in
In [7]:
from wetb.utils.cluster_tools.os_path import pjoin, relpath, abspath,\
cluster_path, repl
print(abspath(pbs.exe_dir))
print(pbs.modelpath)
rel_exe_dir = relpath(pbs.exe_dir, pbs.modelpath)
print (rel_exe_dir)
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
.
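The relative-path computation above mirrors the standard library; a minimal sketch of the same idea using posixpath (with made-up paths, not the wetb helpers themselves): when the execution directory equals the model path, the relative path between them is '.', as in the output above.

```python
import posixpath

# Hypothetical paths for illustration: exe_dir and modelpath coincide,
# so the execution directory relative to the model path is '.'.
modelpath = "/home/user/DTU10MWRef6.0"
exe_dir = "/home/user/DTU10MWRef6.0"
print(posixpath.relpath(exe_dir, modelpath))  # .
```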
You can see the contents of the last pbs file here:
In [8]:
print(pbs)
### Jobid
#PBS -N tmp6
### Standard Output
#PBS -o /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/tmp6.out
### merge stderr into stdout
#PBS -j oe
#PBS -W umask=0003
### Maximum wallclock time format HOURS:MINUTES:SECONDS
#PBS -l walltime=00:20:00
#PBS -l nodes=1:ppn=1
### Queue name
#PBS -q workq
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
mkdir -p "stdout"
if [ -z "$PBS_JOBID" ]; then echo "Run using qsub"; exit ; fi
pwd
#===============================================================================
echo copy hawc2 to scratch
#===============================================================================
(flock -x 200
mkdir -p "/scratch/$USER/$PBS_JOBID/hawc2/"
unzip -u -o -q "/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/"*.zip -d "/scratch/$USER/$PBS_JOBID/hawc2/"
find "/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/"* ! -name *.zip -exec cp -u -t "/scratch/$USER/$PBS_JOBID/hawc2/" {} +
) 200>"/scratch/$USER/$PBS_JOBID/lock_file_hawc2"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."
cp "/scratch/$USER/$PBS_JOBID/hawc2/"* "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."
#===============================================================================
echo copy input
#===============================================================================
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
(flock -x 200
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Tower_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/mech_brake.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Hub_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/dtu_we_controller_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/towclearsens.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Towertop_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/servo_with_limits_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/servo_with_limits.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/wpdata.100" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/mech_brake_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/generator_servo.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/dtu_we_controller.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/generator_servo_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_pc.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/towclearsens_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_ae.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/htc" && cp -u -r "htc/tmp6.htc" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/htc"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Shaft_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Blade_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
) 200>/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/lock_file_model
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Tower_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/mech_brake.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Hub_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/dtu_we_controller_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/towclearsens.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Towertop_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/servo_with_limits_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/servo_with_limits.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/wpdata.100" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/mech_brake_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/generator_servo.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/dtu_we_controller.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/generator_servo_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_pc.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/towclearsens_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_ae.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/htc" && cp -u -r "htc/tmp6.htc" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/htc"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Shaft_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Blade_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
#===============================================================================
echo Run HAWC2
#===============================================================================
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."
WINEARCH=win32 WINEPREFIX=~/.wine32 winefix
WINEARCH=win32 WINEPREFIX=~/.wine32 wine hawc2mb.exe htc/tmp6.htc
#===============================================================================
echo Copy output
#===============================================================================
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/log" && cp -u -r "log/tmp6.log" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/log"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/tmp6.sel" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/tmp6.dat" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/at.dat" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"
rm -r "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6"
echo Done
exit
Run single simulation¶
You can run a simulation by executing the pbs file in an interactive session. This is very handy for debugging.
qsub -I -l nodes=1:ppn=1 -l walltime=01:00:00
<...>/wetb/hawc2/tests/test_files/simulation_setup/pbs_in/tmp6.in
or by submitting the pbs file to the queuing system:
qsub <...>/wetb/hawc2/tests/test_files/simulation_setup/pbs_in/tmp6.in
This is done here:
In [9]:
print(os.path.join(pbs.workdir,pbs.filename))
!qsub {os.path.join(pbs.workdir,pbs.filename)}
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in
3221545.jess.dtu.dk
The job will now enter the cluster queue and be launched when free resources are available. You can check the status of the job:
In [13]:
!qstat -n -u $USER
Wait as long as the qstat command above prints information about the job.
When the job is finished, we can check the output file:
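Instead of re-running qstat manually, you could poll it from Python. A hypothetical sketch (it assumes qstat exits with a nonzero code once the job has left the queue, which is the usual PBS behaviour; the function names are made up):

```python
import subprocess
import time

def job_in_queue(job_id):
    """Return True while qstat still lists the job (queued or running)."""
    out = subprocess.run(['qstat', job_id],
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return out.returncode == 0

def wait_for_job(job_id, in_queue=job_in_queue, interval=5):
    # Poll until the job has left the queue.
    while in_queue(job_id):
        time.sleep(interval)
```

The queue check is injected as a parameter so the waiting logic can be exercised without a cluster connection.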
In [14]:
!cat {pbs.stdout_filename}
Start of prologue
/scratch/mmpe/3221545.jess.dtu.dk created
End of prologue
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
copy hawc2 to scratch
copy input
cp: cannot stat `control/dtu_we_controller_64.dll': No such file or directory
cp: cannot stat `control/servo_with_limits_64.dll': No such file or directory
cp: cannot stat `control/mech_brake_64.dll': No such file or directory
cp: cannot stat `control/generator_servo_64.dll': No such file or directory
cp: cannot stat `control/towclearsens_64.dll': No such file or directory
cp: cannot stat `control/dtu_we_controller_64.dll': No such file or directory
cp: cannot stat `control/servo_with_limits_64.dll': No such file or directory
cp: cannot stat `control/mech_brake_64.dll': No such file or directory
cp: cannot stat `control/generator_servo_64.dll': No such file or directory
cp: cannot stat `control/towclearsens_64.dll': No such file or directory
Run HAWC2
fixme:console:GetNumberOfConsoleMouseButtons (0x684ec44): stub
Copy output
Done
Start of epilogue on j-177
Resources Used: cput=00:00:04,mem=5744kb,vmem=3856592kb,walltime=00:00:07
End of epilogue on j-177
Highlights:
- copy hawc2 to scratch
- copy input
- It states that it cannot copy the 64-bit control dlls (control/*_64.dll), which does not matter as we are using the 32-bit HAWC2
- Run HAWC2
- Copy output
- Done
In [15]:
!head -n 20 {os.path.join(htc.modelpath, htc.simulation.logfile[0])}
Run multiple simulations¶
Multiple simulations can easily be executed using the PBSMultiRunner.
The PBSMultiRunner generates a top-level pbs_multirunner.all pbs job capable of launching all the HTC-specific PBS files in a folder.
The PBSMultiRunner needs some information:
- queue (e.g. workq, windq, xpresq)
- nodes (number of nodes)
- ppn (processors per node). Be careful: ppn does not limit the job to this number of CPUs, i.e. you may occupy all resources of a full node even if you set ppn=10, annoying other users of the node. Hence ppn should be 20 if you need to run more than a few simulations.
- wall time in seconds (after which the job will be terminated), i.e. approximately the total simulation time divided by the number of CPUs
In [16]:
from wetb.utils.cluster_tools.pbsfile import PBSMultiRunner
pbs_all = PBSMultiRunner(workdir=pbs.workdir,
                         queue='workq',   # alternatives: workq, windq, xpresq
                         walltime=10,     # expected total simulation time in seconds
                         nodes=1,         # number of nodes
                         ppn=2,           # number of processors of each node (normally 20)
                         pbsfiles=None    # if None, the multirunner searches for *.in files
                         )
pbs_all.save()
print (os.path.join(pbs_all.workdir, pbs_all.filename))
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_multirunner.all
The pbs_multirunner.all will do the following:
- Get the list of nodes assigned to the current job
- Make a list of *.in pbs files
- Sort the pbs files according to their wall time and distribute the files to the available nodes; the longest simulations are run first
- Generate a file, pbs.dict, containing for each node a list of (pbs file, stdout filename, wall time), e.g.: {'j-177': [('./pbs_in/tmp4.in', './stdout/tmp4.out', '00:20:00'), ('./pbs_in/tmp6.in', './stdout/tmp6.out', '00:20:00')]}
- On each node, launch the assigned pbs files in parallel via Python's multiprocessing module
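The sort-and-distribute step can be sketched as follows (the job list is made up for illustration; the round-robin slicing matches the structure of the pbs.dict shown above):

```python
# Illustrative job list: (pbs file, stdout file, wall time)
pbs_info_lst = [('./pbs_in/a.in', './stdout/a.out', '00:10:00'),
                ('./pbs_in/b.in', './stdout/b.out', '01:00:00'),
                ('./pbs_in/c.in', './stdout/c.out', '00:30:00')]
nodes = ['j-176', 'j-177']

# Sort by wall time, longest first, so long simulations start early
pbs_info_lst = sorted(pbs_info_lst,
                      key=lambda fow: tuple(map(int, fow[2].split(':'))))[::-1]

# Deal the jobs out round-robin: node i gets every len(nodes)'th job
d = {node: pbs_info_lst[i::len(nodes)] for i, node in enumerate(nodes)}
# d['j-176'] -> [('./pbs_in/b.in', ...), ('./pbs_in/a.in', ...)]
# d['j-177'] -> [('./pbs_in/c.in', ...)]
```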
You can see the content of the pbs_multirunner.all here:
In [17]:
!cat {os.path.join(pbs_all.workdir, pbs_all.filename)}
### Jobid
#PBS -N pbs_multirunner
### Standard Output
#PBS -o /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/pbs_multirunner.out
### merge stderr into stdout
#PBS -j oe
#PBS -W umask=0003
### Maximum wallclock time format HOURS:MINUTES:SECONDS
#PBS -l walltime=00:00:10
#PBS -l nodes=1:ppn=2
### Queue name
#PBS -q workq
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
mkdir -p "stdout"
if [ -z "$PBS_JOBID" ]; then echo "Run using qsub"; exit ; fi
pwd
echo "import os
import glob
import numpy as np
import re
# find available nodes
with open(os.environ['PBS_NODEFILE']) as fid:
    nodes = set([f.strip() for f in fid.readlines() if f.strip() != ''])
pbs_files = [os.path.join(root, f) for root, folders, f_lst in os.walk('.') for f in f_lst if f.endswith('.in')]
# Make a list of [(pbs_in_filename, stdout_filename, walltime),...]
pat = re.compile(r'[\s\S]*#\s*PBS\s+-o\s+(.*)[\s\S]*(\d\d:\d\d:\d\d)[\s\S]*')
def get_info(f):
    try:
        with open(f) as fid:
            return (f,) + pat.match(fid.read()).groups()
    except Exception:
        return (f, f.replace('.in', '.out'), '00:30:00')
pbs_info_lst = map(get_info, pbs_files)
# sort wrt walltime
pbs_info_lst = sorted(pbs_info_lst, key=lambda fow: tuple(map(int, fow[2].split(':'))))[::-1]
# make dict {node1: pbs_info_lst1, ...} and save
d = dict([(f, pbs_info_lst[i::len(nodes)]) for i, f in enumerate(nodes)])
with open('pbs.dict', 'w') as fid:
    fid.write(str(d))
" | python
for node in `cat $PBS_NODEFILE | sort | uniq`
do
ssh -T $node << EOF &
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
python -c "import os
import multiprocessing
import platform
import time
with open('pbs.dict') as fid:
    pbs_info_lst = eval(fid.read())[platform.node()]
arg_lst = ['echo starting %s && mkdir -p "%s" && env PBS_JOBID=$PBS_JOBID "%s" &> "%s" && echo finished %s' %
           (f, os.path.dirname(o), f, o, f) for f, o, _ in pbs_info_lst]
print(arg_lst[0])
print('Starting %d jobs on %s' % (len(arg_lst), platform.node()))
pool = multiprocessing.Pool(int('$PBS_NUM_PPN'))
res = pool.map_async(os.system, arg_lst)
t = time.time()
for (f, _, _), r in zip(pbs_info_lst, res.get()):
    print('%-50s\t%s' % (f, ('Errorcode %d' % r, 'Done')[r == 0]))
print('Done %d jobs on %s in %ds' % (len(arg_lst), platform.node(), time.time() - t))
"
EOF
done
wait
exit
You can launch the multirunner via
qsub <...>/wetb/hawc2/tests/test_files/simulation_setup/pbs_multirunner.all
It is done here:
In [18]:
!qsub {os.path.join(pbs_all.workdir, pbs_all.filename)}
3221548.jess.dtu.dk
The job will now enter the cluster queue and be launched when free resources are available. You can check the status of the job:
In [20]:
!qstat -n -u $USER
Wait as long as the qstat command above prints information about the job.
When the job is finished, we can check the output file:
In [21]:
!cat {pbs_all.stdout_filename}
Start of prologue
/scratch/mmpe/3221548.jess.dtu.dk created
End of prologue
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
echo starting ./pbs_in/tmp4.in && mkdir -p /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout && env PBS_JOBID=3221548.jess.dtu.dk ./pbs_in/tmp4.in &> /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/tmp4.out && echo finished ./pbs_in/tmp4.in
Starting 2 jobs on j-176
starting ./pbs_in/tmp4.in
starting ./pbs_in/tmp6.in
finished ./pbs_in/tmp6.in
finished ./pbs_in/tmp4.in
./pbs_in/tmp4.in Done
./pbs_in/tmp6.in Done
Done 2 jobs on j-176 in 4s
Start of epilogue on j-176
Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:06
End of epilogue on j-176