Installing Slurm on a Single Workstation (Ubuntu 24.04, Ultra-Simple Setup Without Authentication)

If you've used a supercomputer, you've probably dealt with a queueing system like Slurm. It's very convenient: you just submit jobs and let the scheduler take care of the rest. I decided to install it on my personal workstation as well.

That said, I’m not running a cluster — just a single workstation (PC). So I skipped authentication (like munge) and went for the bare minimum setup. My lab network is isolated from the outside world, and no one else uses this machine, so I’m ignoring security concerns. If you’re following this setup, proceed with caution.

Install via apt

We can install it using apt.

Terminal window
$ sudo apt install slurm-wlm

munge will also be installed by default, but we won’t be using it.

Create slurm.conf

Terminal window
$ sudo vim /etc/slurm/slurm.conf

Here’s a minimal configuration.

/etc/slurm/slurm.conf
ClusterName=local
ControlMachine=hostname
NodeName=hostname
PartitionName=main Nodes=hostname Default=YES MaxTime=INFINITE State=UP
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/none
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
TaskPlugin=task/none

The hostname should match the value shown by hostname -s.
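
To double-check, you can just run it.

Terminal window
$ hostname -s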

Technically, NodeName can include properties such as CPUs, but since I'm not dividing resources, I left them out. Running slurmd -C prints the machine's hardware info, so Slurm may auto-detect the specs. If you need resource partitioning, you may want to set those values explicitly.
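
For reference, a node definition with explicit resources might look like the following; the values here are placeholders for illustration, not my actual hardware. slurmd -C prints a line in roughly this format that you can paste into slurm.conf.

/etc/slurm/slurm.conf
NodeName=hostname CPUs=16 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN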

Make sure AuthType=auth/none is set; this is what lets us skip munge-based authentication.

Create necessary directories and set permissions

Terminal window
$ sudo mkdir -p /var/spool/slurm
$ sudo mkdir -p /var/spool/slurmd
$ sudo chown -R slurm: /var/spool/slurm /var/spool/slurmd

Disable munge

Since we’re using auth/none, munge isn’t required. It doesn’t hurt to leave it running, but I disabled it just in case.

Terminal window
$ sudo systemctl disable --now munge

Start Slurm

Terminal window
$ sudo systemctl enable --now slurmctld
$ sudo systemctl enable --now slurmd

Check that it’s running correctly.

Terminal window
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 1 idle localhost

Submit a job

Try submitting a test job. (set +H disables Bash history expansion so that the ! in the echo string is taken literally.)

Terminal window
$ set +H
$ echo -e "#!/bin/bash\necho Hello, Slurm!" > test.sh
$ chmod +x test.sh
$ sbatch test.sh
$ squeue
$ cat slurm-*.out
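
For real jobs, you'd normally put resource requests in the script itself via #SBATCH directives. Here's a minimal sketch; the option values are just placeholders, so adjust them to your machine.

job.sh
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=slurm-%j.out
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

echo "Running on $(hostname) with $SLURM_CPUS_PER_TASK CPUs"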

Fixing STATE=DOWN after reboot

Sometimes, after a reboot, the node's STATE appears as DOWN. You can reset it with the command below, although the root cause remains unclear to me.

Terminal window
$ sudo scontrol update nodename=hostname state=idle
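
To see why Slurm marked the node DOWN in the first place, check the node's Reason field.

Terminal window
$ scontrol show node hostname

If the node keeps ending up DOWN after every reboot, setting ReturnToService=2 in slurm.conf is one option worth trying: with that value, a DOWN node is returned to service automatically once slurmd registers with a valid configuration.

/etc/slurm/slurm.conf
ReturnToService=2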

Prioritize a job (job preemption)

If you’ve submitted many jobs and want to prioritize a new one urgently, you can do the following:

Submit the job as usual.

Terminal window
$ sbatch job.sh

Adjust its priority.

Terminal window
$ sudo scontrol update jobid=<jobid> Nice=-10

By default, Nice is 0; lower (negative) values give the job higher priority. Raising a job's priority this way requires elevated privileges, which is why sudo is needed.
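
To confirm the change took effect, you can look at the job's Priority and Nice fields (<jobid> is whatever sbatch reported).

Terminal window
$ scontrol show job <jobid> | grep -i nice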

Using sacct

If we want to view job history using sacct, we’ll need the following setup.

Terminal window
$ sudo apt install slurmdbd mysql-server-8.0
$ sudo service mysql start
$ sudo mysql -u root
(mysql) CREATE DATABASE slurm_acct_db;
(mysql) CREATE USER 'slurm'@'localhost' IDENTIFIED BY '<password>';
(mysql) GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';
(mysql) FLUSH PRIVILEGES;

Add this to /etc/slurm/slurmdbd.conf.

/etc/slurm/slurmdbd.conf
AuthType=auth/none
DbdHost=localhost
DbdPort=6819
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePass=<password>
StorageUser=slurm
StorageLoc=slurm_acct_db
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
SlurmUser=slurm

Change the file’s ownership and permissions accordingly.

Terminal window
$ sudo chown slurm: /etc/slurm/slurmdbd.conf
$ sudo chmod 600 /etc/slurm/slurmdbd.conf

Then, add the following lines to /etc/slurm/slurm.conf.

/etc/slurm/slurm.conf
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=<hostname where slurmdbd runs>

Start slurmdbd, and restart slurmctld and slurmd.

Terminal window
$ sudo systemctl start slurmdbd
$ sudo systemctl restart slurmctld slurmd

Try running sacct.

Terminal window
$ sacct -o User,JobID,Partition,NNodes,Submit,Start,End,Elapsed,State -X
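
If sacct comes back empty, one thing worth checking is whether the cluster has been registered in the accounting database; as far as I know, this sometimes has to be done once by hand, using the same name as ClusterName in slurm.conf.

Terminal window
$ sudo sacctmgr add cluster local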

Stop accepting new jobs after current ones finish (e.g., for maintenance)

We can put the node into the DRAIN state, which stops new jobs from starting while letting the currently running ones finish.

Terminal window
$ sudo scontrol update NodeName=<nodename> State=DRAIN Reason="Maintenance after current job"
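
You can confirm it took effect with sinfo; the node should show a drain state (drng while jobs are still running, drain once they've finished).

Terminal window
$ sinfo -N -l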

To revert this behavior, run the following.

Terminal window
$ sudo scontrol update NodeName=<nodename> State=RESUME
