Running simulations on Jess

In [1]:
import os
from wetb import hawc2
from wetb.hawc2 import HTCFile
from wetb.hawc2.tests.test_files import tfp

Generate some HAWC2 input htc files

In [2]:
htc_lst = []
for wsp in [4, 6]:
    htc = HTCFile(tfp + "simulation_setup/DTU10MWRef6.0/htc/DTU_10MW_RWT.htc")
    htc.simulation.time_stop = 1
    htc.wind.wsp = wsp
    htc.set_name("tmp%d" % wsp)
    htc.save()
    htc_lst.append(htc)
    print(htc.filename)
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/htc/tmp4.htc
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/htc/tmp6.htc
Generate PBS files

The pbs_file method of the htc object generates a PBS file from the htc file. Its signature is shown here with placeholder arguments; the actual hawc2_path and hawc2_cmd are defined below:

In [3]:
pbs = htc.pbs_file("hawc2_path", "hawc2_cmd",
               queue='workq', # workq, windq, xpresq
               walltime=None, # defaults to expected (currently 600s) * 2
               input_files=None, # If None, required files are autodetected from htc file
               output_files=None, # If None, output files are autodetected from htc file
               copy_turb=(True, True) # copy turbulence files (to, from) simulation
              )

A PBS file defines a job that can be submitted to the queuing system of PBS-based clusters, e.g. Jess.

A PBS file has a header that specifies:

  • output file for stdout and stderr

  • wall time (after which the job will be terminated)

  • nodes (number of nodes to request)

  • ppn (number of processors/CPUs to use on each node; Jess has 20 CPUs per node)

  • queue (e.g. workq, windq, xpresq)

PBS files can be generated from a HAWC2 input htc file. The body (command section) of these files will:

  • Copy HAWC2 to a common folder on the scratch drive (i.e. a hard drive local to the node) if it is not already there.

  • Create a run folder on the scratch drive for the current simulation

  • Copy HAWC2 to the run folder

  • Copy all required input files (turbulence files are optional) to a common folder on the scratch drive if they are not already there

  • Copy all required input files to the run folder

  • Launch the simulation

  • Copy all output files (turbulence files are optional) back to the model directory

HAWC2 can be copied from a local folder or from the shared group folder /mnt/aiolos/groups/hawc2sim/HAWC2/<version>/<platform>. HAWC2 can be a zip file, which will be unzipped on the scratch drive, and/or a set of files (exe, dll, …)

In [4]:
version = "v12.8.0.0"
platform = "win32"
hawc2_path = "/mnt/aiolos/groups/hawc2sim/HAWC2/%s/%s/" % (version, platform)
print(hawc2_path)
/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/

The command needed to run HAWC2 must be specified. It can be obtained via the wine_cmd function:

In [5]:
from wetb.hawc2.hawc2_pbs_file import JESS_WINE32_HAWC2MB, wine_cmd
hawc2_cmd = wine_cmd(platform='win32', hawc2='hawc2mb.exe', cluster='jess')
print(hawc2_cmd)
WINEARCH=win32 WINEPREFIX=~/.wine32 winefix
WINEARCH=win32 WINEPREFIX=~/.wine32 wine hawc2mb.exe

The PBS files are generated from the htc files:

In [6]:
pbs_lst = []
for htc in htc_lst:
    pbs = htc.pbs_file(hawc2_path, hawc2_cmd,
                   queue='workq', # workq, windq, xpresq
                   walltime=None, # defaults to expected (currently 600s) * 2
                   input_files=None, # If None, required files are autodetected from htc file
                   output_files=None, # If None, output files are autodetected from htc file
                   copy_turb=(True, True) # copy turbulence files (to, from) simulation
                  )
    pbs.save()
    pbs_lst.append(pbs)
    print(os.path.join(pbs.workdir, pbs.filename))
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp4.in
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in
In [7]:
from wetb.utils.cluster_tools.os_path import pjoin, relpath, abspath,\
    cluster_path, repl
print(abspath(pbs.exe_dir))
print(pbs.modelpath)

rel_exe_dir = relpath(pbs.exe_dir, pbs.modelpath)
print(rel_exe_dir)
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
.

You can see the contents of the last pbs file here:

In [8]:
print(pbs)
### Jobid
#PBS -N tmp6
### Standard Output
#PBS -o /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/tmp6.out
### merge stderr into stdout
#PBS -j oe
#PBS -W umask=0003
### Maximum wallclock time format HOURS:MINUTES:SECONDS
#PBS -l walltime=00:20:00
#PBS -l nodes=1:ppn=1
### Queue name
#PBS -q workq
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
mkdir -p "stdout"
if [ -z "$PBS_JOBID" ]; then echo "Run using qsub"; exit ; fi
pwd


#===============================================================================
echo copy hawc2 to scratch
#===============================================================================
(flock -x 200
mkdir -p "/scratch/$USER/$PBS_JOBID/hawc2/"
unzip -u -o -q "/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/"*.zip -d "/scratch/$USER/$PBS_JOBID/hawc2/"
find "/mnt/aiolos/groups/hawc2sim/HAWC2/v12.8.0.0/win32/"* ! -name *.zip -exec cp -u -t "/scratch/$USER/$PBS_JOBID/hawc2/" {} +
) 200>"/scratch/$USER/$PBS_JOBID/lock_file_hawc2"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."
cp "/scratch/$USER/$PBS_JOBID/hawc2/"* "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."

#===============================================================================
echo copy input
#===============================================================================

cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
(flock -x 200
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Tower_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/mech_brake.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Hub_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/dtu_we_controller_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/towclearsens.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Towertop_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/servo_with_limits_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/servo_with_limits.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/wpdata.100" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/mech_brake_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/generator_servo.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/dtu_we_controller.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/generator_servo_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_pc.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control" && cp -u -r "control/towclearsens_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_ae.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/htc" && cp -u -r "htc/tmp6.htc" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/htc"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Shaft_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data" && cp -u -r "data/DTU_10MW_RWT_Blade_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/data"
) 200>/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/lock_file_model
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Tower_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/mech_brake.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Hub_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/dtu_we_controller_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/towclearsens.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Towertop_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/servo_with_limits_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/servo_with_limits.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/wpdata.100" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/mech_brake_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/generator_servo.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/dtu_we_controller.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/generator_servo_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_pc.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control" && cp -u -r "control/towclearsens_64.dll" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/control"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_ae.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/htc" && cp -u -r "htc/tmp6.htc" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/htc"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Shaft_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"
mkdir -p "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data" && cp -u -r "data/DTU_10MW_RWT_Blade_st.dat" "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/data"


#===============================================================================
echo Run HAWC2
#===============================================================================
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6/."
WINEARCH=win32 WINEPREFIX=~/.wine32 winefix
WINEARCH=win32 WINEPREFIX=~/.wine32 wine hawc2mb.exe htc/tmp6.htc

#===============================================================================
echo Copy output
#===============================================================================
cd "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/log" && cp -u -r "log/tmp6.log" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/log"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/tmp6.sel" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/tmp6.dat" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"
mkdir -p "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res" && cp -u -r "res/at.dat" "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/res"

rm -r "/scratch/$USER/$PBS_JOBID/DTU10MWRef6.0/run_tmp6"

echo Done

exit

Run a single simulation

You can run a simulation by executing the PBS file in an interactive session, which is very handy for debugging:

qsub -I -l nodes=1:ppn=1 -l walltime=01:00:00
<...>/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in

or by submitting the PBS file to the queuing system:

qsub <...>/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in

This is done here:

In [9]:
print(os.path.join(pbs.workdir,pbs.filename))
!qsub {os.path.join(pbs.workdir,pbs.filename)}
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_in/tmp6.in
3221545.jess.dtu.dk

The job will now enter the cluster queue and be launched when free resources are available. You can check the status of the job:

In [13]:
!qstat -n -u $USER

Rerun the qstat command above until it no longer lists the job.
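
If you want to block programmatically until the job has left the queue, a small polling helper can be sketched as below. Note that wait_for_jobs is a hypothetical name, not part of wetb; it simply reruns qstat until the current user has no jobs listed.

import getpass
import subprocess
import time

def wait_for_jobs(poll_interval=30):
    # Hypothetical helper (not part of wetb): poll qstat until the current
    # user has no jobs left in the queue, checking every poll_interval seconds.
    user = getpass.getuser()
    while True:
        out = subprocess.run(['qstat', '-u', user],
                             capture_output=True, text=True).stdout
        if not out.strip():
            return
        time.sleep(poll_interval)

wait_for_jobs()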

When the job is finished, we can check the output file:

In [14]:
!cat {pbs.stdout_filename}
Start of prologue
/scratch/mmpe/3221545.jess.dtu.dk created
End of prologue
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0
copy hawc2 to scratch
copy input
cp: cannot stat `control/dtu_we_controller_64.dll': No such file or directory
cp: cannot stat `control/servo_with_limits_64.dll': No such file or directory
cp: cannot stat `control/mech_brake_64.dll': No such file or directory
cp: cannot stat `control/generator_servo_64.dll': No such file or directory
cp: cannot stat `control/towclearsens_64.dll': No such file or directory
cp: cannot stat `control/dtu_we_controller_64.dll': No such file or directory
cp: cannot stat `control/servo_with_limits_64.dll': No such file or directory
cp: cannot stat `control/mech_brake_64.dll': No such file or directory
cp: cannot stat `control/generator_servo_64.dll': No such file or directory
cp: cannot stat `control/towclearsens_64.dll': No such file or directory
Run HAWC2

fixme:console:GetNumberOfConsoleMouseButtons (0x684ec44): stub

Copy output
Done
Start of epilogue on j-177
Resources Used: cput=00:00:04,mem=5744kb,vmem=3856592kb,walltime=00:00:07
End of epilogue on j-177

Highlights:

  • copy hawc2 to scratch

  • copy input

  • It states that it cannot copy the 64-bit control DLLs (control/*_64.dll), which does not matter as we are using the 32-bit HAWC2

  • Run HAWC2

  • Copy output

  • Done
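
To spot such problems across many jobs, you can scan the stdout files programmatically. A minimal sketch (not part of wetb) that only greps for the cp warnings and error codes seen above:

import glob

# Minimal sketch: flag job stdout files that contain cp warnings or error codes.
for fn in glob.glob(os.path.join(pbs.workdir, 'stdout', '*.out')):
    with open(fn) as fid:
        txt = fid.read()
    if 'cannot stat' in txt or 'Errorcode' in txt:
        print(fn, 'contains warnings/errors')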

In [15]:
!head -n 20 {os.path.join(htc.modelpath, htc.simulation.logfile[0])}

Run multiple simulations

Multiple simulations can easily be executed using the PBSMultiRunner.

The PBSMultiRunner generates a top-level pbs_multirunner.all pbs job capable of launching all the HTC-specific PBS files in a folder.

The PBSMultiRunner needs some information:

  • queue (e.g. workq, windq, xpresq)

  • nodes (number of nodes)

  • ppn (number of processors per node). Be careful: ppn does not limit the job to this number of CPUs, i.e. you may occupy all the resources of a full node even if you set ppn=10, annoying other users of the node. Hence ppn should be 20 if you need to run more than a few simulations.

  • walltime in seconds (after which the job will be terminated), i.e. approximately the total simulation time divided by the number of parallel processes (nodes × ppn)

In [16]:
from wetb.utils.cluster_tools.pbsfile import PBSMultiRunner
pbs_all = PBSMultiRunner(workdir=pbs.workdir,
                     queue='workq', # alternatives workq, windq, xpresq
                     walltime=10,   # expected total simulation time in seconds
                     nodes=1,       # Number of nodes
                     ppn=2,         # number of processors of each node (normally 20)
                     pbsfiles=None  # If None, the multirunner searches for *.in files
                     )
pbs_all.save()
print(os.path.join(pbs_all.workdir, pbs_all.filename))
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_multirunner.all

The pbs_multirunner.all will do the following:

  • Get the list of nodes assigned to the current job

  • Make a list of the *.in pbs files

  • Sort the pbs files according to their wall time and distribute them among the available nodes; the longest simulations are run first

  • Generate a file, pbs.dict, containing for each node a list of (pbs file, stdout filename, wall time):

    {'j-177': [('./pbs_in/tmp4.in', './stdout/tmp4.out', '00:20:00'), ('./pbs_in/tmp6.in', './stdout/tmp6.out', '00:20:00')]}

  • On each node, launch the assigned pbs files in parallel via Python’s multiprocessing module.
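
Once the multirunner job has run (it is submitted below), you can inspect this distribution yourself. Since pbs.dict is written as a Python literal, a minimal sketch to read it back is:

import ast

# Minimal sketch: read back the pbs.dict written by the multirunner job
# (assumes the job has already run, so the file exists in the workdir).
with open(os.path.join(pbs_all.workdir, 'pbs.dict')) as fid:
    jobs_per_node = ast.literal_eval(fid.read())
for node, jobs in jobs_per_node.items():
    print(node, [pbs_file for pbs_file, _, _ in jobs])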

You can see the content of the pbs_multirunner.all here:

In [17]:
!cat {os.path.join(pbs_all.workdir, pbs_all.filename)}
### Jobid
#PBS -N pbs_multirunner
### Standard Output
#PBS -o /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/pbs_multirunner.out
### merge stderr into stdout
#PBS -j oe
#PBS -W umask=0003
### Maximum wallclock time format HOURS:MINUTES:SECONDS
#PBS -l walltime=00:00:10
#PBS -l nodes=1:ppn=2
### Queue name
#PBS -q workq
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
mkdir -p "stdout"
if [ -z "$PBS_JOBID" ]; then echo "Run using qsub"; exit ; fi
pwd
echo "import os
import glob
import numpy as np
import re

# find available nodes
with open(os.environ['PBS_NODEFILE']) as fid:
    nodes = set([f.strip() for f in fid.readlines() if f.strip() != ''])
pbs_files = [os.path.join(root, f) for root, folders, f_lst in os.walk('.') for f in f_lst if f.endswith('.in')]

# Make a list of [(pbs_in_filename, stdout_filename, walltime),...]
pat = re.compile(r'[\s\S]*#\s*PBS\s+-o\s+(.*)[\s\S]*(\d\d:\d\d:\d\d)[\s\S]*')

def get_info(f):
    try:
        with open(f) as fid:
            return (f,) + pat.match(fid.read()).groups()
    except Exception:
        return (f, f.replace('.in', '.out'), '00:30:00')
pbs_info_lst = map(get_info, pbs_files)

# sort wrt walltime
pbs_info_lst = sorted(pbs_info_lst, key=lambda fow: tuple(map(int, fow[2].split(':'))))[::-1]
# make dict {node1: pbs_info_lst1, ...} and save
d = dict([(f, pbs_info_lst[i::len(nodes)]) for i, f in enumerate(nodes)])
with open('pbs.dict', 'w') as fid:
    fid.write(str(d))

" | python

for node in `cat $PBS_NODEFILE | sort | uniq`
do

     ssh -T $node << EOF &
cd "/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0"
python -c "import os
import multiprocessing
import platform
import time
with open('pbs.dict') as fid:
    pbs_info_lst = eval(fid.read())[platform.node()]
arg_lst = ['echo starting %s && mkdir -p "%s" && env PBS_JOBID=$PBS_JOBID "%s" &> "%s" && echo finished %s' %
           (f, os.path.dirname(o), f, o, f) for f, o, _ in pbs_info_lst]
print(arg_lst[0])
print('Starting %d jobs on %s' % (len(arg_lst), platform.node()))
pool = multiprocessing.Pool(int('$PBS_NUM_PPN'))
res = pool.map_async(os.system, arg_lst)
t = time.time()
for (f, _, _), r in zip(pbs_info_lst, res.get()):
    print('%-50s\t%s' % (f, ('Errorcode %d' % r, 'Done')[r == 0]))
print('Done %d jobs on %s in %ds' % (len(arg_lst), platform.node(), time.time() - t))

"
EOF
done
wait

exit

You can launch the multirunner via:

qsub <...>/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/pbs_multirunner.all

It is done here:

In [18]:
!qsub {os.path.join(pbs_all.workdir, pbs_all.filename)}
3221548.jess.dtu.dk

The job will now enter the cluster queue and be launched when free resources are available. You can check the status of the job:

In [20]:
!qstat -n -u $USER

Rerun the qstat command above until it no longer lists the job (or use the polling helper sketched earlier).

When the job is finished, we can check the output file:

In [21]:
!cat {pbs_all.stdout_filename}
Start of prologue
/scratch/mmpe/3221548.jess.dtu.dk created
End of prologue
/home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0

echo starting ./pbs_in/tmp4.in && mkdir -p /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout && env PBS_JOBID=3221548.jess.dtu.dk ./pbs_in/tmp4.in &> /home/mmpe/gitlab/WindEnergyToolbox/wetb/hawc2/tests/test_files/simulation_setup/DTU10MWRef6.0/stdout/tmp4.out && echo finished ./pbs_in/tmp4.in
Starting 2 jobs on j-176
starting ./pbs_in/tmp4.in
starting ./pbs_in/tmp6.in
finished ./pbs_in/tmp6.in
finished ./pbs_in/tmp4.in
./pbs_in/tmp4.in                                        Done
./pbs_in/tmp6.in                                        Done
Done 2 jobs on j-176 in 4s
Start of epilogue on j-176
Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:06
End of epilogue on j-176
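
Finally, the result files that were copied back to the model directory can be loaded for post-processing, e.g. with wetb's Hawc2io reader. A minimal sketch, assuming the res/tmp6 result files produced above (check your wetb version for the exact reader API):

from wetb.hawc2 import Hawc2io

# Minimal sketch: load the result file written by the tmp6 simulation
# (file name with or without extension, depending on the wetb version).
res = Hawc2io.ReadHawc2(os.path.join(htc.modelpath, "res/tmp6"))
data = res.ReadAll()  # one column per output channel
print(data.shape)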