Scheduling Jobs in a Supercomputer Cluster Using GitHub Actions

5 min read · Oct 21, 2020

A simple setup to programmatically enqueue jobs in a SLURM cluster.

Titan 3 Supercomputer. Image from Pixabay.

Introduction

If you have worked with a supercomputer cluster before, you know that scheduling batches of jobs can be cumbersome, since each job may differ not only in its settings (e.g., hyperparameters) but also in its code. Moreover, keeping track of all the different versions of the code when analyzing the results can be tricky, and a simple note-taking error can cost many hours of rerunning jobs.

Here, I demonstrate how to set up a GitHub Actions-based approach to deploying jobs to a supercomputing cluster managed by SLURM. A similar approach could be extended to non-SLURM clusters as well.

The approach relies on a Git repository to keep track of code changes. A GitHub Action runs automatically whenever the code changes and deploys the new code as a job on the cluster, independently of other queued jobs.
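To make the idea concrete, here is a minimal Python sketch of what the deployment step could generate for each commit. It is an illustration under stated assumptions, not the exact setup used in this article: the function names, the `run-<sha>` naming scheme, and the per-commit working directory are all hypothetical. The script only builds the batch script text and the submit command; it does not contact a cluster.

```python
import shlex
from typing import List


def build_job_script(commit_sha: str, work_dir: str, entry_command: str,
                     time_limit: str = "01:00:00") -> str:
    """Return the text of a SLURM batch script tagged with the commit it runs.

    Assumption: the code for this commit has already been copied into
    `work_dir`, a directory used by this job only, so that other queued
    jobs are not affected by later code changes.
    """
    short_sha = commit_sha[:7]
    lines = [
        "#!/bin/bash",
        # Tagging the job name and log file with the commit makes it easy
        # to match results to the code version that produced them.
        f"#SBATCH --job-name=run-{short_sha}",
        f"#SBATCH --output=run-{short_sha}-%j.out",  # %j expands to the SLURM job ID
        f"#SBATCH --time={time_limit}",
        "",
        f"cd {shlex.quote(work_dir)}",
        entry_command,
    ]
    return "\n".join(lines) + "\n"


def build_submit_command(script_path: str) -> List[str]:
    """Return the sbatch invocation; --parsable makes sbatch print only the job ID."""
    return ["sbatch", "--parsable", script_path]


if __name__ == "__main__":
    # Hypothetical commit hash, directory, and entry point, for illustration only.
    script = build_job_script(
        commit_sha="0123456789abcdef0123456789abcdef01234567",
        work_dir="/scratch/jobs/0123456",
        entry_command="python train.py",
    )
    print(script)
    print(" ".join(build_submit_command("job.sh")))
```

Inside a GitHub Actions workflow, the commit hash could be read from the `GITHUB_SHA` environment variable that GitHub sets for each run, and the resulting command would be executed on the cluster's login node.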

Methods

Part 1: Set Up a Git Repository

If you don’t already have a GitHub account, create one at github.com. It is free.
Select New repository from the top right corner of the page.


Written by Paulo Carvalho