Home

Awesome

Tip and tricks for IBM platform LSF HPC job scheduler

Warning: some information might be specific to IARC HPC cluster.

Useful links

Official IBM documentation

Run jobs

Monitor jobs

Administration

In /opt/lsf/conf/lsbatch/cluster_name/configdir/lsb.queues, we added to the normal queue:

RES_REQ = select[type==any] rusage[mem=4000] span[hosts=1] order[slots]
RESRSV_LIMIT = [mem=90000]
MEMLIMIT      = 4000 90000

This allows by default jobs to run on a single host and to reserve 4GB of RAM (max 90GB), to kill jobs if they exeed 4GB of RAM (can be changed up to 90GB). Note that reserving memory and setting the limit before LSF kills jobs are two indepent things. So for example if you need 25GB of RAM, add in bsub command: -R "rusage[mem=25000]" -M 25000.

To allow the jobs exeeding MEMLIMIT to be killed by LSF, add in /opt/lsf/conf/lsf.conf (see here):

LSB_MEMLIMIT_ENFORCE=Y