
Commit 09aa9ce

Merge pull request #105 from VectorInstitute/slurm_dependency
Add Slurm dependency example
2 parents b61bc4f + bdcf4e6 commit 09aa9ce

File tree

- examples/README.md
- examples/slurm_dependency/README.md
- examples/slurm_dependency/downstream_job.sbatch
- examples/slurm_dependency/run_downstream.py
- examples/slurm_dependency/run_workflow.sh
- vec_inf/cli/_cli.py

6 files changed: +95 −1 lines changed

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@
 - [`logits.py`](logits/logits.py): Python example of getting logits from hosted model.
 - [`api`](api): Examples for using the Python API
 - [`basic_usage.py`](api/basic_usage.py): Basic Python example demonstrating the Vector Inference API
+- [`slurm_dependency`](slurm_dependency): Example of launching a model with `vec-inf` and running a downstream SLURM job that waits for the server to be ready before sending a request.

examples/slurm_dependency/README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# SLURM Dependency Workflow Example

This example demonstrates how to launch a model server with `vec-inf` and run a downstream SLURM job that waits for the server to become ready before querying it.

## Files

This directory contains the following:

1. [run_workflow.sh](run_workflow.sh)
   Launches the model server and submits the downstream job with a dependency, so it starts only after the server job begins running.

2. [downstream_job.sbatch](downstream_job.sbatch)
   A SLURM job script that runs the downstream logic (e.g., prompting the model).

3. [run_downstream.py](run_downstream.py)
   A Python script that waits until the inference server is ready, then sends a request using the OpenAI-compatible API.

## What to update

Before running this example, update the following in [downstream_job.sbatch](downstream_job.sbatch):

- `--job-name`, `--output`, and `--error` paths
- Virtual environment path in the `source` line
- SLURM resource configuration (e.g., partition, memory, GPU)

Also update the model name in [run_downstream.py](run_downstream.py) to match the model you're launching.

## Running the example

First, activate a virtual environment where `vec-inf` is installed. Then, from this directory, run:

```bash
bash run_workflow.sh
```
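
After the workflow script runs, both jobs sit in the SLURM queue. A minimal Python sketch for watching them, assuming only that `squeue` is on `PATH`; the `job_active` helper and the hard-coded job ID are illustrative, not part of this example:

```python
import subprocess


def job_active(job_id: str) -> bool:
    """Return True while the given SLURM job is still listed by squeue."""
    result = subprocess.run(
        ["squeue", "-h", "-j", job_id],  # -h suppresses the header row
        capture_output=True,
        text=True,
    )
    return bool(result.stdout.strip())


# Example: poll the server job ID printed by run_workflow.sh.
print(job_active("12345678"))
```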
examples/slurm_dependency/downstream_job.sbatch

Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=Meta-Llama-3.1-8B-Instruct-downstream
#SBATCH --partition=a40
#SBATCH --qos=m2
#SBATCH --time=08:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
# NOTE: sbatch does not expand environment variables such as $HOME inside
# #SBATCH directives; replace the two paths below with absolute paths.
#SBATCH --output=$HOME/.vec-inf-logs/Meta-Llama-3.1-8B-Instruct-downstream.%j.out
#SBATCH --error=$HOME/.vec-inf-logs/Meta-Llama-3.1-8B-Instruct-downstream.%j.err

# Activate your environment
# TODO: update this path to match your venv location
source $HOME/vector-inference/.venv/bin/activate

# Wait for the server to be ready, using the server job ID passed in through
# the SERVER_JOB_ID environment variable (exported by run_workflow.sh)
python run_downstream.py "$SERVER_JOB_ID"
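
The server job ID reaches this batch job through the environment (`sbatch --export=SERVER_JOB_ID=...` in run_workflow.sh) and is then forwarded to the Python script as a positional argument. A hedged sketch of an argument-handling variant that also falls back to the environment variable directly; the fallback is an illustrative addition, not part of the committed script:

```python
import os
import sys

# Accept the server job ID either as argv[1] (as downstream_job.sbatch does)
# or from the SERVER_JOB_ID environment variable set via sbatch --export.
raw_id = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("SERVER_JOB_ID")
if not raw_id:
    raise ValueError("Expected server job ID as argv[1] or $SERVER_JOB_ID.")
job_id = int(raw_id)
print(f"Using server job ID {job_id}")
```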
examples/slurm_dependency/run_downstream.py

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
"""Example script to query a launched model via the OpenAI-compatible API."""

import sys

from openai import OpenAI

from vec_inf.client import VecInfClient


if len(sys.argv) < 2:
    raise ValueError("Expected server job ID as the first argument.")
job_id = int(sys.argv[1])

vi_client = VecInfClient()
print(f"Waiting for SLURM job {job_id} to be ready...")
status = vi_client.wait_until_ready(slurm_job_id=job_id)
print(f"Server is ready at {status.base_url}")

api_client = OpenAI(base_url=status.base_url, api_key="EMPTY")
resp = api_client.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    prompt="Where is the capital of Canada?",
    max_tokens=20,
)

print(resp)
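
The script above uses the legacy completions endpoint. A chat-style variant is sketched below, reusing `api_client` and the model name from the script above and assuming the launched server also exposes the OpenAI-compatible chat completions route (which is not confirmed by this commit):

```python
# Chat-style variant of the request above; assumes the server also serves
# /v1/chat/completions. api_client is the OpenAI client constructed above.
chat_resp = api_client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of Canada?"}],
    max_tokens=20,
)
print(chat_resp.choices[0].message.content)
```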
examples/slurm_dependency/run_workflow.sh

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
#!/bin/bash

# ---- Config ----
MODEL_NAME="Meta-Llama-3.1-8B-Instruct"
LAUNCH_ARGS="$MODEL_NAME"

# ---- Step 1: Launch the server ----
RAW_JSON=$(vec-inf launch $LAUNCH_ARGS --json-mode)
SERVER_JOB_ID=$(echo "$RAW_JSON" | python3 -c "import sys, json; print(json.load(sys.stdin)['slurm_job_id'])")
echo "Launched server as job $SERVER_JOB_ID"
echo "$RAW_JSON"

# ---- Step 2: Submit downstream job ----
sbatch --dependency=after:$SERVER_JOB_ID --export=SERVER_JOB_ID=$SERVER_JOB_ID downstream_job.sbatch
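
The shell script shells out to `python3 -c` to pull `slurm_job_id` out of the launcher's JSON. The same orchestration can be written directly in Python; a minimal sketch, assuming `vec-inf` and `sbatch` are on `PATH` and that `--json-mode` prints a single JSON object (which the `vec_inf/cli/_cli.py` change below ensures):

```python
import json
import subprocess

# Launch the server and capture the JSON that the CLI prints in --json-mode.
raw = subprocess.run(
    ["vec-inf", "launch", "Meta-Llama-3.1-8B-Instruct", "--json-mode"],
    capture_output=True, text=True, check=True,
).stdout
server_job_id = json.loads(raw)["slurm_job_id"]
print(f"Launched server as job {server_job_id}")

# Submit the downstream job so it starts once the server job begins running.
subprocess.run(
    ["sbatch", f"--dependency=after:{server_job_id}",
     f"--export=SERVER_JOB_ID={server_job_id}", "downstream_job.sbatch"],
    check=True,
)
```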

vec_inf/cli/_cli.py

Lines changed: 3 additions & 1 deletion
@@ -18,6 +18,7 @@
     Stream real-time performance metrics
 """
 
+import json
 import time
 from typing import Optional, Union
 
@@ -180,8 +181,9 @@ def launch(
 
     # Display launch information
     launch_formatter = LaunchResponseFormatter(model_name, launch_response.config)
+
     if json_mode:
-        click.echo(launch_response.config)
+        click.echo(json.dumps(launch_response.config))
     else:
         launch_info_table = launch_formatter.format_table_output()
         CONSOLE.print(launch_info_table)
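
This change matters because echoing the config object directly prints a Python dict repr, which uses single quotes and is not valid JSON; `json.dumps` makes the `--json-mode` output machine-parseable, which run_workflow.sh relies on. A quick illustration, where the `config` dict is a made-up stand-in for `launch_response.config`:

```python
import json

# Stand-in for launch_response.config; the keys and values are illustrative.
config = {"slurm_job_id": "14933053", "model_name": "Meta-Llama-3.1-8B-Instruct"}

print(str(config))         # {'slurm_job_id': ...} -- single quotes, not valid JSON
print(json.dumps(config))  # {"slurm_job_id": ...} -- valid JSON

json.loads(json.dumps(config))   # round-trips cleanly
# json.loads(str(config))        # would raise json.JSONDecodeError
```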
