Scheduling Jobs in a Supercomputer Cluster Using GitHub Actions

Paulo Carvalho
5 min read · Oct 21, 2020

A simple setup to programmatically enqueue jobs in a SLURM cluster.

Titan 3 supercomputer. Image from Pixabay.

Introduction

If you have worked with a supercomputer cluster before, you know that scheduling batches of jobs can be cumbersome when each job may have not only different settings (e.g., hyperparameters) but also different code. Moreover, keeping track of all the different versions of the code when analyzing the results can be tricky, and a simple note-taking error can cost many hours of rerunning jobs.

Here, I demonstrate how to set up a GitHub Actions-based approach to deploying jobs to a supercomputing cluster managed by SLURM. A similar approach could be extended to non-SLURM clusters as well.

The approach relies on a Git repository to keep track of code changes. A GitHub Action runs automatically whenever the code changes and deploys the new code as a job on the cluster, independently of other queued jobs.
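To make the idea concrete, here is a minimal Python sketch of what such a deploy step could do. It is not necessarily the setup used later in this article. It assumes the cluster is reachable over SSH and that the job is started with `python main.py`. The function names, the `$HOME/jobs` directory, the one-hour time limit and the entrypoint are all hypothetical choices made for illustration.

```python
import shlex
import subprocess
from typing import List


def build_sbatch_script(repo_url: str, commit_sha: str, entrypoint: str = "main.py") -> str:
    """Return the text of a SLURM batch script pinned to a single commit.

    Assumption: the job is started with `python <entrypoint>`; adapt as needed.
    """
    short_sha = commit_sha[:7]
    # Hypothetical layout: one directory per job, so queued jobs never share code.
    # SLURM sets SLURM_JOB_ID inside the running batch job.
    workdir = f'"$HOME/jobs/{short_sha}-$SLURM_JOB_ID"'
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=job-{short_sha}",
        f"#SBATCH --output=job-{short_sha}-%j.out",  # %j expands to the SLURM job ID
        "#SBATCH --time=01:00:00",  # assumed time limit
        "set -euo pipefail",
        f"git clone {shlex.quote(repo_url)} {workdir}",
        f"cd {workdir}",
        f"git checkout {shlex.quote(commit_sha)}",  # run exactly the pushed version
        f"python {shlex.quote(entrypoint)}",
    ]
    return "\n".join(lines) + "\n"


def build_submit_command(user: str, host: str) -> List[str]:
    """Return the command that submits a script piped over SSH.

    With no file argument, sbatch reads the batch script from standard input.
    """
    return ["ssh", f"{user}@{host}", "sbatch"]


def submit(script: str, user: str, host: str) -> None:
    """Send the script to the cluster's login node and enqueue it."""
    subprocess.run(build_submit_command(user, host), input=script, text=True, check=True)
```

Inside a GitHub Actions run, the commit hash is available in the `GITHUB_SHA` environment variable, so the workflow could pass `os.environ["GITHUB_SHA"]` as `commit_sha`. Because the hash ends up in both the job name and the output file name, each result can be traced back to the exact code that produced it.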

Methods

Part 1: Set Up a Git Repository

  1. If you don’t already have a GitHub account, create one at github.com. It is free.
  2. Select New repository from the top-right corner of the page.

Paulo Carvalho

Want to chat about startups, consulting or engineering? Just send me an email at paulo@avantsoft.com.br.