Home

Awesome

Main features

runjob --ngpus=8 --project=myproject --queue=myqueue python distributed_program.py

Exported environment variables:

Config file

See examples/config.yaml for config of projects and queues.

Usage

runjob-config examples/config.yaml
runjob --ngpus=8 --project=myproject --queue=myqueue python distributed_program.py

This starts a multi-GPU job (possibly multi-node, depending on your queue config) with 1 process per GPU and prints the output (stdout and stderr) of one of the processes (the one with SLURM_LOCALID=0). Use a keyboard interrupt (Ctrl+C) to cancel your job.
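As a rough illustration of what a script like distributed_program.py might do when launched with 1 process per GPU, the sketch below reads the standard per-task variables that SLURM's srun sets (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID). Only SLURM_LOCALID is mentioned in this README; that the other two are available under runjob is an assumption, and the helper name slurm_task_info is hypothetical.

```python
import os


def slurm_task_info(env=None):
    """Return (global_rank, world_size, local_rank) from SLURM variables.

    Assumption: the job is launched so that SLURM sets these per-task
    variables. Falls back to a single-process layout when they are missing,
    so the script can also be run outside of SLURM.
    """
    env = os.environ if env is None else env
    global_rank = int(env.get("SLURM_PROCID", 0))
    world_size = int(env.get("SLURM_NTASKS", 1))
    local_rank = int(env.get("SLURM_LOCALID", 0))
    return global_rank, world_size, local_rank


if __name__ == "__main__":
    rank, world, local = slurm_task_info()
    # With 1 process per GPU, the local rank can serve as the GPU index
    # on the current node.
    print(f"rank {rank} of {world}, local GPU index {local}")
```

Passing the environment as an argument keeps the helper easy to test without a cluster; in a real job it would simply read os.environ.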

Running tests

Make sure you are on a SLURM cluster: running sinfo should print something.

runjob-config examples/config.yaml
pytest -vs
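The sinfo check mentioned above could also be done programmatically before invoking pytest. This is a minimal sketch, not part of this project: the function name slurm_available is hypothetical, and it only verifies that the sinfo command exists on PATH and prints something.

```python
import shutil
import subprocess


def slurm_available(command="sinfo"):
    """Return True if `command` is on PATH, exits with 0, and prints output."""
    path = shutil.which(command)
    if path is None:
        # The command is not installed, so this is likely not a SLURM cluster.
        return False
    result = subprocess.run([path], capture_output=True, text=True)
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    if slurm_available():
        print("SLURM detected")
    else:
        print("sinfo not found or returned no output")
```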

Features that will be added in the future: