
Commit 09aa9ce

Merge pull request #105 from VectorInstitute/slurm_dependency
Add Slurm dependency example
2 parents b61bc4f + bdcf4e6 commit 09aa9ce

File tree

- examples/README.md
- examples/slurm_dependency/README.md
- examples/slurm_dependency/downstream_job.sbatch
- examples/slurm_dependency/run_downstream.py
- examples/slurm_dependency/run_workflow.sh
- vec_inf/cli/_cli.py

6 files changed: +95 −1 lines changed

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@
 - [`logits.py`](logits/logits.py): Python example of getting logits from hosted model.
 - [`api`](api): Examples for using the Python API
 - [`basic_usage.py`](api/basic_usage.py): Basic Python example demonstrating the Vector Inference API
+- [`slurm_dependency`](slurm_dependency): Example of launching a model with `vec-inf` and running a downstream SLURM job that waits for the server to be ready before sending a request.

examples/slurm_dependency/README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# SLURM Dependency Workflow Example

This example demonstrates how to launch a model server with `vec-inf` and run a downstream SLURM job that waits for the server to become ready before querying it.

## Files

This directory contains the following:

1. [run_workflow.sh](run_workflow.sh)
   Launches the model server and submits the downstream job with a dependency, so it starts only after the server job begins running.

2. [downstream_job.sbatch](downstream_job.sbatch)
   A SLURM job script that runs the downstream logic (e.g., prompting the model).

3. [run_downstream.py](run_downstream.py)
   A Python script that waits until the inference server is ready, then sends a request using the OpenAI-compatible API.

## What to update

Before running this example, update the following in [downstream_job.sbatch](downstream_job.sbatch):

- `--job-name`, `--output`, and `--error` paths
- Virtual environment path in the `source` line
- SLURM resource configuration (e.g., partition, memory, GPU)

Also update the model name in [run_downstream.py](run_downstream.py) to match the model you're launching.

## Running the example

First, activate a virtual environment where `vec-inf` is installed. Then, from this directory, run:

```bash
bash run_workflow.sh
```
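
After the workflow script runs, both jobs sit in the SLURM queue. A minimal Python sketch for watching them, assuming only that `squeue` is on `PATH`; the `job_active` helper and the hard-coded job ID are illustrative, not part of this example:

```python
import subprocess


def job_active(job_id: str) -> bool:
    """Return True while the given SLURM job is still listed by squeue."""
    result = subprocess.run(
        ["squeue", "-h", "-j", job_id],  # -h suppresses the header row
        capture_output=True,
        text=True,
    )
    return bool(result.stdout.strip())


# Example: poll the server job ID printed by run_workflow.sh.
print(job_active("12345678"))
```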
examples/slurm_dependency/downstream_job.sbatch

Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
#!/bin/bash
#SBATCH --job-name=Meta-Llama-3.1-8B-Instruct-downstream
#SBATCH --partition=a40
#SBATCH --qos=m2
#SBATCH --time=08:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
# NOTE: sbatch does not expand environment variables such as $HOME inside
# #SBATCH directives; replace the two paths below with absolute paths.
#SBATCH --output=$HOME/.vec-inf-logs/Meta-Llama-3.1-8B-Instruct-downstream.%j.out
#SBATCH --error=$HOME/.vec-inf-logs/Meta-Llama-3.1-8B-Instruct-downstream.%j.err

# Activate your environment
# TODO: update this path to match your venv location
source $HOME/vector-inference/.venv/bin/activate

# Wait for the server to be ready, using the server job ID passed in through
# the SERVER_JOB_ID environment variable (exported by run_workflow.sh)
python run_downstream.py "$SERVER_JOB_ID"
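
The server job ID reaches this batch job through the environment (`sbatch --export=SERVER_JOB_ID=...` in run_workflow.sh) and is then forwarded to the Python script as a positional argument. A hedged sketch of an argument-handling variant that also falls back to the environment variable directly; the fallback is an illustrative addition, not part of the committed script:

```python
import os
import sys

# Accept the server job ID either as argv[1] (as downstream_job.sbatch does)
# or from the SERVER_JOB_ID environment variable set via sbatch --export.
raw_id = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("SERVER_JOB_ID")
if not raw_id:
    raise ValueError("Expected server job ID as argv[1] or $SERVER_JOB_ID.")
job_id = int(raw_id)
print(f"Using server job ID {job_id}")
```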
examples/slurm_dependency/run_downstream.py

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
"""Example script to query a launched model via the OpenAI-compatible API."""

import sys

from openai import OpenAI

from vec_inf.client import VecInfClient


if len(sys.argv) < 2:
    raise ValueError("Expected server job ID as the first argument.")
job_id = int(sys.argv[1])

vi_client = VecInfClient()
print(f"Waiting for SLURM job {job_id} to be ready...")
status = vi_client.wait_until_ready(slurm_job_id=job_id)
print(f"Server is ready at {status.base_url}")

api_client = OpenAI(base_url=status.base_url, api_key="EMPTY")
resp = api_client.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    prompt="Where is the capital of Canada?",
    max_tokens=20,
)

print(resp)
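
The script above uses the legacy completions endpoint. A chat-style variant is sketched below, reusing `api_client` and the model name from the script above and assuming the launched server also exposes the OpenAI-compatible chat completions route (which is not confirmed by this commit):

```python
# Chat-style variant of the request above; assumes the server also serves
# /v1/chat/completions. api_client is the OpenAI client constructed above.
chat_resp = api_client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of Canada?"}],
    max_tokens=20,
)
print(chat_resp.choices[0].message.content)
```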
examples/slurm_dependency/run_workflow.sh

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
#!/bin/bash

# ---- Config ----
MODEL_NAME="Meta-Llama-3.1-8B-Instruct"
LAUNCH_ARGS="$MODEL_NAME"

# ---- Step 1: Launch the server ----
RAW_JSON=$(vec-inf launch $LAUNCH_ARGS --json-mode)
SERVER_JOB_ID=$(echo "$RAW_JSON" | python3 -c "import sys, json; print(json.load(sys.stdin)['slurm_job_id'])")
echo "Launched server as job $SERVER_JOB_ID"
echo "$RAW_JSON"

# ---- Step 2: Submit downstream job ----
sbatch --dependency=after:$SERVER_JOB_ID --export=SERVER_JOB_ID=$SERVER_JOB_ID downstream_job.sbatch
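
The shell script shells out to `python3 -c` to pull `slurm_job_id` out of the launcher's JSON. The same orchestration can be written directly in Python; a minimal sketch, assuming `vec-inf` and `sbatch` are on `PATH` and that `--json-mode` prints a single JSON object (which the `vec_inf/cli/_cli.py` change below ensures):

```python
import json
import subprocess

# Launch the server and capture the JSON that the CLI prints in --json-mode.
raw = subprocess.run(
    ["vec-inf", "launch", "Meta-Llama-3.1-8B-Instruct", "--json-mode"],
    capture_output=True, text=True, check=True,
).stdout
server_job_id = json.loads(raw)["slurm_job_id"]
print(f"Launched server as job {server_job_id}")

# Submit the downstream job so it starts once the server job begins running.
subprocess.run(
    ["sbatch", f"--dependency=after:{server_job_id}",
     f"--export=SERVER_JOB_ID={server_job_id}", "downstream_job.sbatch"],
    check=True,
)
```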

vec_inf/cli/_cli.py

Lines changed: 3 additions & 1 deletion
@@ -18,6 +18,7 @@
     Stream real-time performance metrics
 """
 
+import json
 import time
 from typing import Optional, Union
 
@@ -180,8 +181,9 @@ def launch(
 
     # Display launch information
     launch_formatter = LaunchResponseFormatter(model_name, launch_response.config)
+
     if json_mode:
-        click.echo(launch_response.config)
+        click.echo(json.dumps(launch_response.config))
     else:
         launch_info_table = launch_formatter.format_table_output()
         CONSOLE.print(launch_info_table)
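
This change matters because echoing the config object directly prints a Python dict repr, which uses single quotes and is not valid JSON; `json.dumps` makes the `--json-mode` output machine-parseable, which run_workflow.sh relies on. A quick illustration, where the `config` dict is a made-up stand-in for `launch_response.config`:

```python
import json

# Stand-in for launch_response.config; the keys and values are illustrative.
config = {"slurm_job_id": "14933053", "model_name": "Meta-Llama-3.1-8B-Instruct"}

print(str(config))         # {'slurm_job_id': ...} -- single quotes, not valid JSON
print(json.dumps(config))  # {"slurm_job_id": ...} -- valid JSON

json.loads(json.dumps(config))   # round-trips cleanly
# json.loads(str(config))        # would raise json.JSONDecodeError
```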
