Run Slurm as a Kubernetes scheduler. A Slinky project.
Slurm and Kubernetes are workload managers originally designed for different kinds of workloads. Kubernetes excels at scheduling workloads that run for an indefinite amount of time, with potentially vague resource requirements, on a single node, with loose policy, and it can scale its resource pool elastically to meet demand; Slurm excels at quickly scheduling workloads that run for a finite amount of time, with well-defined resource requirements and topology, on multiple nodes, with strict policy, and a known resource pool.
This project enables the best of both workload managers. It contains a Kubernetes scheduler to manage select workloads from Kubernetes, which allows for co-location of Kubernetes and Slurm workloads within the same cluster. This means the same hardware can be used to run both traditional HPC and cloud-like workloads, reducing operating costs.
Using slurm-bridge, workloads can be submitted from within a Kubernetes context as a Pod, PodGroup, Job, JobSet, or LeaderWorkerSet, and from a Slurm context using salloc or sbatch. Workloads submitted via Slurm will execute as they would in a Slurm-only environment, using slurmd. Workloads submitted from Kubernetes will have their resource requirements translated into a representative Slurm job by slurm-bridge. That job serves as a placeholder and is scheduled by the Slurm controller. Once the Slurm controller allocates resources to a Kubernetes workload, slurm-bridge binds the workload's pod(s) to the allocated node(s). At that point, the kubelet launches and runs the pod(s) the same as it would within a standard Kubernetes instance.

For additional architectural notes, see the architecture docs.
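As a sketch of the Kubernetes-side flow described above, the manifest below declares explicit resource requests (which slurm-bridge translates into the placeholder Slurm job) and routes the pod to a secondary scheduler via spec.schedulerName. The scheduler name, image, and resource figures are assumptions for illustration; consult the project docs and chart values for how workloads are actually selected in your deployment.

# Kubernetes-side submission sketch. The schedulerName value is an assumption;
# the image and resource requests are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: slurm-bridge-example
spec:
  schedulerName: slurm-bridge   # assumed scheduler name; check chart values
  restartPolicy: Never
  containers:
    - name: main
      image: busybox:1.36
      command: ["sh", "-c", "echo hello from $(hostname)"]
      resources:
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
EOF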
Slurm is a full-featured HPC workload manager. To highlight a few features:
- Priority: assigns priorities to jobs upon submission and on an ongoing basis (e.g. as they age).
- Preemption: stops one or more low-priority jobs so a high-priority job can run.
- QoS: applies sets of policies affecting scheduling priority, preemption, and resource limits.
- Fairshare: distributes resources equitably among users and accounts based on historical usage.
- Exclusive allocation: exclusive, whole-node allocations are made for each pod.
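For comparison, a Slurm-side submission goes through sbatch (or salloc) and runs under slurmd exactly as it would on a standalone Slurm cluster; the partition, QoS, and account names below are placeholders rather than values defined by this project.

# Slurm-side submission sketch; partition, QoS, and account are placeholders.
sbatch --partition=batch --qos=normal --account=research \
    --nodes=2 --ntasks-per-node=4 --time=00:30:00 \
    --wrap="srun hostname"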
Create a secret for slurm-bridge to communicate with Slurm.
export SLURM_JWT=$(scontrol token username=slurm lifespan=infinite)
kubectl create namespace slinky
kubectl create secret generic slurm-bridge-jwt-token --namespace=slinky --from-literal="auth-token=$SLURM_JWT" --type=Opaque
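To confirm the token landed in the namespace the chart expects (the slinky namespace is assumed here, matching the install command below), list the secret:

kubectl get secret slurm-bridge-jwt-token --namespace=slinky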
Install the slurm-bridge scheduler:
helm install slurm-bridge oci://ghcr.io/slinkyproject/charts/slurm-bridge \
--namespace=slinky --create-namespace
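After the chart installs, a quick sanity check is to look at the release status and the pods in the namespace; exact pod names depend on the chart version, so none are assumed here.

helm status slurm-bridge --namespace=slinky
kubectl get pods --namespace=slinky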
For additional instructions, see the quickstart guide.
Project documentation is located in the docs directory of this repository.
Slinky documentation can be found here.
Feature requests, code contributions, and bug reports are welcome!
GitHub/GitLab issues and PRs/MRs are handled on a best-effort basis.
The SchedMD official issue tracker is at https://support.schedmd.com/.
To schedule a demo or simply to reach out, please contact SchedMD.
Copyright (C) SchedMD LLC.
Licensed under the Apache License, Version 2.0; you may not use this project except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.