Scheduling Jobs in a Supercomputer Cluster Using GitHub Actions
A simple setup to programmatically enqueue jobs in a SLURM cluster.
Introduction
If you have worked with a supercomputer cluster before, you know that scheduling batches of jobs can be cumbersome when each job may have not only different settings (e.g., hyperparameters) but also different code. Moreover, keeping track of the different code versions when analyzing results can be tricky, and a simple note-taking error can cost many hours of rerunning jobs.
Here, I demonstrate how to set up a GitHub Actions-based approach to deploying jobs to a supercomputing cluster managed by SLURM. A similar approach could be extended to non-SLURM clusters as well.
The approach relies on a Git repository to keep track of code changes. A GitHub Actions workflow runs automatically whenever the code changes and deploys the new code as a job on the cluster, independently of other queued jobs.
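To make the idea concrete, here is a minimal sketch of the kind of batch script a deployment step might generate for each commit. It is an illustration under stated assumptions, not the exact setup described later: the function name `render_sbatch_script`, the `$HOME/jobs/<sha>` directory layout, the repository URL, and the `train.py` command are all hypothetical. Only the `#SBATCH --job-name` and `#SBATCH --output` directives and the `%j` job-ID placeholder are standard SLURM features.

```python
def render_sbatch_script(repo_url: str, commit_sha: str, command: str) -> str:
    """Return a SLURM batch script that runs `command` on one exact commit."""
    short_sha = commit_sha[:7]
    # Hypothetical layout: one fresh checkout per commit, so queued jobs
    # never share (or overwrite) each other's code.
    workdir = f"$HOME/jobs/{short_sha}"
    lines = [
        "#!/bin/bash",
        # Tag the job and its log file with the commit, so results can be
        # traced back to the code version that produced them.
        f"#SBATCH --job-name=run-{short_sha}",
        f"#SBATCH --output=run-{short_sha}-%j.out",  # %j expands to the job ID
        "set -euo pipefail",
        f'git clone "{repo_url}" "{workdir}"',
        f'cd "{workdir}"',
        f"git checkout --detach {commit_sha}",
        command,
    ]
    return "\n".join(lines) + "\n"


# Example with placeholder values (assumptions, not a real repository).
script = render_sbatch_script(
    repo_url="https://github.com/example-user/example-repo.git",
    commit_sha="0123456789abcdef0123456789abcdef01234567",
    command="python train.py",
)
print(script)
```

Because the job name and log file carry the commit hash, each result can be matched to a code version without manual note-taking. Note that this sketch would fail if the same commit were deployed twice, since `git clone` refuses to write into an existing non-empty directory.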
Methods
Part 1: Set Up a Git Repository
- If you don’t already have a GitHub account, create one here. It is free.
- Select New repository from the top right corner of the page.