This project sets up an auto-scaling Slurm cluster. Slurm is a highly configurable open-source workload manager; see the Slurm project site (https://slurm.schedmd.com/) for an overview.
Slurm clusters running in CycleCloud versions 7.8 and later implement an updated version of the autoscaling APIs that allows a cluster to use multiple nodearrays and partitions. To support this in Slurm, CycleCloud pre-populates the execute nodes in the cluster. Because of this, you need to run a command on the Slurm scheduler node after making any changes to the cluster, such as autoscale limits or VM types.
The Slurm cluster deployed in CycleCloud contains a script that facilitates this. After making any changes to the cluster, run the following command as root on the Slurm scheduler node to rebuild the `slurm.conf` and update the nodes in the cluster:
```bash
$ sudo -i
# cd /opt/cycle/slurm
# ./cyclecloud_slurm.sh apply_changes
```
If you only want to make changes to certain nodearrays, you can add the `--nodearrays` argument:
```bash
$ sudo -i
# cd /opt/cycle/slurm
# ./cyclecloud_slurm.sh apply_changes --nodearrays array1[,array2,array3...]
```
Because all of the Slurm compute nodes have to be pre-created, any major change to the cluster (such as VM type or image) requires all nodes to be removed first. You can remove all nodes through the UI, but the `cyclecloud_slurm.sh` script also has a `remove_nodes` option that removes any nodes that aren't currently running jobs, as shown in the sketch below.
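For example, before changing the VM type of an existing nodearray, the idle nodes can be removed and the changes applied in one pass (a sketch, assuming the same install path used above):

```bash
# Remove all nodes that are not currently running jobs,
# then rebuild slurm.conf and re-create the node definitions
cd /opt/cycle/slurm
./cyclecloud_slurm.sh remove_nodes
./cyclecloud_slurm.sh apply_changes
```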
The default template that ships with Azure CycleCloud has two partitions (`hpc` and `htc`), and you can define custom nodearrays that map directly to Slurm partitions. For example, to create a GPU partition, add the following section to your cluster template:
```ini
[[nodearray gpu]]
MachineType = $GPUMachineType
ImageName = $GPUImageName
MaxCoreCount = $MaxGPUExecuteCoreCount

Interruptible = $GPUUseLowPrio
AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs

    [[[configuration]]]
    slurm.autoscale = true
    # Set to true if nodes are used for tightly-coupled multi-node jobs
    slurm.hpc = false

    [[[cluster-init cyclecloud/slurm:execute:2.0.1]]]

    [[[network-interface eth0]]]
    AssociatePublicIpAddress = $ExecuteNodesPublic
```
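After importing the updated template, the new nodearray must be registered with Slurm like any other cluster change (a sketch, using the `apply_changes` command described above):

```bash
# Run as root on the Slurm scheduler node to create the gpu partition's nodes
cd /opt/cycle/slurm
./cyclecloud_slurm.sh apply_changes --nodearrays gpu
```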
If `cyclecloud_slurm` detects that autoscale is disabled (`SuspendTime=-1`), it uses the FUTURE state to denote nodes that are powered down, instead of relying on the power state in Slurm. That is, when autoscale is enabled, powered-down nodes appear as `idle~` in `sinfo`; when autoscale is disabled, powered-down nodes do not appear in `sinfo` at all. You can still see their definitions with `scontrol show nodes --future`.
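For example, to inspect powered-down nodes in each mode (using the commands noted above):

```bash
# Autoscale enabled: powered-down nodes show up as idle~
sinfo

# Autoscale disabled (SuspendTime=-1): powered-down nodes are hidden from sinfo,
# but their definitions are still visible as FUTURE-state nodes
scontrol show nodes --future
```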
To start new nodes, run `/opt/cycle/slurm/resume_program.sh node_list` (e.g. `htc-[1-10]`). To shut down nodes, run `/opt/cycle/slurm/suspend_program.sh node_list` (e.g. `htc-[1-10]`).
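For example, to start ten `htc` nodes and later shut them down again:

```bash
# Start (resume) nodes htc-1 through htc-10
/opt/cycle/slurm/resume_program.sh htc-[1-10]

# Shut the same nodes down when they are no longer needed
/opt/cycle/slurm/suspend_program.sh htc-[1-10]
```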
To start a cluster in this mode, add `SuspendTime=-1` to the additional Slurm config in the template. To switch an existing cluster to this mode, add `SuspendTime=-1` to `slurm.conf` and run `scontrol reconfigure`. Then run `cyclecloud_slurm.sh remove_nodes && cyclecloud_slurm.sh scale`.
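Put together, switching an existing cluster to manual scaling looks roughly like this (a sketch, assuming the script location used earlier in this document):

```bash
# 1. After adding SuspendTime=-1 to slurm.conf, reload the Slurm configuration
scontrol reconfigure

# 2. Remove the existing node definitions and re-create them for manual scaling
cd /opt/cycle/slurm
./cyclecloud_slurm.sh remove_nodes && ./cyclecloud_slurm.sh scale
```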
By default, this project uses a UID and GID of 11100 for the Slurm user and 11101 for the Munge user. If this causes a conflict with another user or group, these defaults may be overridden.
To override the UID and GID, click the edit button for the `scheduler` node and for each nodearray (for example, the `htc` array), and add the following attributes at the end of the `Configuration` section:
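A minimal sketch of the override, moving both users off the default IDs; the attribute names and example values below are assumptions, so verify them against the template shipped with your CycleCloud version:

```ini
# Hypothetical override values; the defaults are 11100 (slurm) and 11101 (munge)
slurm.user.uid = 11200
slurm.user.gid = 11200
munge.user.uid = 11201
munge.user.gid = 11201
```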
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.