Slurm on Google Cloud Platform
Guidance on usage
[!CAUTION] Terraform modules in this repo are not meant to be used directly. Instead, Cloud HPC Toolkit is the recommended way to use Slurm in GCP.
Notices
[!IMPORTANT] The naming scheme for SchedMD published images changed with release 5.7.3. The change prevents incompatibilities between the Terraform modules and older images that could otherwise share an image family with a newer release. From now on, the image family includes the slurm-gcp major and minor version instead of the Slurm version.
See the Images doc for the latest published images.
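As a rough illustration of the naming change, the sketch below derives an image family name from a slurm-gcp release string. The `slurm-gcp-<major>-<minor>-<os>` pattern and the `hpc-rocky-linux-8` suffix are assumptions made for illustration only; the Images doc remains the authoritative list of published families.

```python
def image_family(slurm_gcp_release: str, os_suffix: str) -> str:
    """Build a hypothetical image family name from a slurm-gcp release.

    Only the major and minor components are used, so patch releases
    (for example 5.7.3 and 5.7.4) map to the same family.
    """
    parts = slurm_gcp_release.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(
            f"expected MAJOR.MINOR[.PATCH], got {slurm_gcp_release!r}"
        )
    major, minor = parts[0], parts[1]
    # Assumed pattern; check the Images doc for the real family names.
    return f"slurm-gcp-{major}-{minor}-{os_suffix}"


print(image_family("5.7.3", "hpc-rocky-linux-8"))
```

Because the Slurm version no longer appears in the name, a family built this way stays stable across Slurm upgrades within the same slurm-gcp minor release.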
FAQ | Troubleshooting | Glossary
<!-- mdformat-toc start --slug=github --no-anchors --maxlevel=3 --minlevel=2 --> <!-- mdformat-toc end -->
Overview
slurm-gcp is an open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease. With it, you can create and manage Slurm cluster infrastructure in GCP, deployed in different configurations.
Google's HPC Toolkit, available on GitHub, can be used to manage and deploy Slurm clusters and other supporting infrastructure via HPC Blueprints.
Image Support
See supported Operating Systems and published Image Family for machine image support.
SchedMD
SchedMD provides professional services and commercial support to help you get up and running and stay running.
Issues and/or enhancement requests can be submitted to SchedMD's Bugzilla.
Also, join community discussions on either the Slurm User mailing list or the Google Cloud & Slurm Community Discussion Group.
Cluster Configurations
slurm-gcp can be deployed and used in different configurations and methods to meet your computing needs.
See HPC Blueprints for HPC Toolkit example cluster configurations that are production-ready.
Cloud
All Slurm cluster resources will exist in the cloud.
See the Cloud Cluster Guide for details.
Hybrid
Only Slurm compute nodes will exist in the cloud. The Slurm controller and other Slurm components will remain in the on-premises environment.
See the Hybrid Cluster Guide for details.
Multi-Cluster/Federation
Two or more clusters are connected, allowing jobs to be submitted from and run on different clusters. This can be a mix of on-premises and cloud clusters.
See the Federated Cluster Guide for details.
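To make the multi-cluster idea concrete, here is a minimal sketch that builds an `sbatch` command line targeting one or more clusters through Slurm's `--clusters` option. The cluster names (`onprem`, `cloud`) and the script name `job.sh` are hypothetical. The sketch only constructs the argument list; actually submitting it assumes a working multi-cluster or federated setup as described in the guide.

```python
import shlex
from typing import Sequence


def sbatch_command(script: str, clusters: Sequence[str]) -> list[str]:
    """Return an sbatch argument list that targets the given clusters."""
    if not clusters:
        raise ValueError("at least one cluster name is required")
    # --clusters takes a comma-separated list of cluster names.
    return ["sbatch", f"--clusters={','.join(clusters)}", script]


# Hypothetical cluster and script names, for illustration only.
cmd = sbatch_command("job.sh", ["onprem", "cloud"])
print(shlex.join(cmd))
```

On a login node, the resulting list could be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting issues.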
Upgrade to v6
See the Upgrade to v6 Guide for details.
TPU support
slurm-gcp supports using TPU VM nodes. See the TPU guide for details.
Help and Support
- See the slurm-gcp FAQ for help with slurm-gcp.
- See the Slurm FAQ for help with Slurm.
Please reach out to us here. We will be happy to support you!