Scheduling Jobs in a Supercomputer Cluster Using GitHub Actions

Paulo Carvalho

A simple setup to programmatically enqueue jobs in a SLURM cluster.

Titan 3 supercomputer. Image from Pixabay.

Introduction

If you have worked with a supercomputer cluster before, you know that scheduling batches of jobs, where each job may have not only different settings (e.g. hyperparameters) but also different code, can be cumbersome. Moreover, keeping track of the different code versions when analyzing the results can be tricky, and a simple note-taking error can cost many hours of rerunning jobs.

Here, I demonstrate how to set up a GitHub Actions-based approach for deploying jobs to a supercomputing cluster managed by SLURM. A similar approach could be extended to non-SLURM clusters as well.

The approach relies on a Git repository to keep track of code changes. A GitHub Action runs automatically whenever the code changes and deploys the new code as a job on the cluster, independently of other queued jobs.
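To make the idea concrete, here is a minimal sketch of what the deployment step could look like. It is not the setup used in this article: the function names, the `$HOME/jobs` directory, the `ci` job prefix and the SSH login host are all hypothetical, and it assumes the cluster's login node is reachable over SSH and can clone the repository. Each commit gets its own working directory and job name, so a job never shares code with other queued jobs and its results can be traced back to an exact commit.

```python
import shlex
import subprocess


def render_sbatch_script(repo_url, commit_sha, run_command, job_prefix="ci"):
    """Return the text of a SLURM batch script that runs one commit in isolation.

    All names and paths here are illustrative assumptions.
    """
    short_sha = commit_sha[:7]
    # SLURM sets SLURM_JOB_ID inside the job, so every run gets a fresh directory.
    workdir = f"$HOME/jobs/{short_sha}-$SLURM_JOB_ID"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_prefix}-{short_sha}",
        # %j is replaced by the job ID in the output file name.
        f"#SBATCH --output={job_prefix}-{short_sha}-%j.out",
        "set -euo pipefail",
        f'git clone {shlex.quote(repo_url)} "{workdir}"',
        f'cd "{workdir}"',
        # Pin the job to the exact commit that triggered the workflow.
        f"git checkout {shlex.quote(commit_sha)}",
        run_command,
    ]
    return "\n".join(lines) + "\n"


def build_submit_command(login_host):
    """Build the SSH command; sbatch reads the script from stdin when no file is given."""
    return ["ssh", login_host, "sbatch"]


def submit(login_host, script_text):
    """Send the script to the cluster and return sbatch's reply."""
    result = subprocess.run(
        build_submit_command(login_host),
        input=script_text,
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout.strip()
```

Inside a workflow, a step could call `render_sbatch_script` with the commit taken from the `GITHUB_SHA` environment variable that GitHub Actions provides, then pass the result to `submit`. The SSH credentials would need to be stored as repository secrets.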

Methods

Part 1: Set Up a Git Repository

  1. If you don’t already have a GitHub account, create one here. It is free.
  2. Select New repository from the top right corner of the page.

Written by Paulo Carvalho

Want to chat about startups, consulting or engineering? Just send me an email on paulo@avantsoft.com.br.
