
Commit b61bc4f

Merge pull request #112 from VectorInstitute/bugfix/multinode
Misc small features and bug fixes:

- Fixed multi-node launch GPU placement group issue: the `--exclusive` option is needed in the slurm script, and the compilation config needs to stay at 0
- Set environment variables in the generated slurm script instead of in the helper to ensure reusability
- Replaced `python3.10 -m vllm.entrypoints.openai.api_server` with `vllm serve` to support custom chat template usage
- Added additional launch options: `--exclude` for excluding certain nodes, `--node-list` for targeting a specific list of nodes, and `--bind` for binding additional directories
- Added remaining vLLM engine arg short-long name mappings for robustness
- Added notes in the README to capture some gotchas
2 parents 8de1c41 + c765087 commit b61bc4f

File tree

8 files changed

+79
-36
lines changed


README.md

Lines changed: 7 additions & 3 deletions
````diff
@@ -85,7 +85,7 @@ models:
   vllm_args:
     --max-model-len: 1010000
     --max-num-seqs: 256
-    --compilation-confi: 3
+    --compilation-config: 3
 ```

 You would then set the `VEC_INF_CONFIG` path using:
@@ -94,7 +94,11 @@ You would then set the `VEC_INF_CONFIG` path using:
 export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
 ```

-Note that there are other parameters that can also be added to the config but not shown in this example, check the [`ModelConfig`](vec_inf/client/config.py) for details.
+**NOTE**
+* There are other parameters that can be added to the config but are not shown in this example; check [`ModelConfig`](vec_inf/client/config.py) for details.
+* See [vLLM Engine Arguments](https://docs.vllm.ai/en/stable/serving/engine_args.html) for the full list of available vLLM engine arguments. The parallel size for any parallelization strategy defaults to 1, so none of the sizes are set explicitly in this example.
+* GPU partitions with non-Ampere architectures, e.g. `rtx6000` and `t4v2`, don't support BF16. For models that default to BF16, use FP16 on these GPUs instead, i.e. `--dtype: float16`.
+* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.

 #### Other commands

@@ -161,7 +165,7 @@ Once the inference server is ready, you can start sending in inference requests.
   "prompt_logprobs":null
 }
 ```
-**NOTE**: For multimodal models, currently only `ChatCompletion` is available, and only one image can be provided for each prompt.
+**NOTE**: Certain models don't adhere to OpenAI's chat template, e.g. the Mistral family. For these models, either change your prompt to follow the model's default chat template or provide your own chat template via `--chat-template: TEMPLATE_PATH`.

 ## SSH tunnel from your local device
 If you want to run inference from your local device, you can open a SSH tunnel to your cluster environment like the following:
````
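The non-Ampere dtype gotcha above can be made concrete with a config sketch; the model entry and sizes below are illustrative, not taken from the repo's `models.yaml`:

```yaml
models:
  my-model:                  # illustrative entry name
    vllm_args:
      --max-model-len: 32768
      --max-num-seqs: 256
      --dtype: float16       # override BF16 default on rtx6000 / t4v2 partitions
```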

vec_inf/cli/_cli.py

Lines changed: 23 additions & 0 deletions
```diff
@@ -72,6 +72,21 @@ def cli() -> None:
     type=str,
     help="Quality of service",
 )
+@click.option(
+    "--exclude",
+    type=str,
+    help="Exclude certain nodes from the resources granted to the job",
+)
+@click.option(
+    "--node-list",
+    type=str,
+    help="Request a specific list of nodes for deployment",
+)
+@click.option(
+    "--bind",
+    type=str,
+    help="Additional binds for the singularity container as a comma separated list of bind paths",
+)
 @click.option(
     "--time",
     type=str,
@@ -124,8 +139,16 @@ def launch(
         Number of nodes to use
     - gpus_per_node : int, optional
         Number of GPUs per node
+    - account : str, optional
+        Charge resources used by this job to specified account
     - qos : str, optional
         Quality of service tier
+    - exclude : str, optional
+        Exclude certain nodes from the resources granted to the job
+    - node_list : str, optional
+        Request a specific list of nodes for deployment
+    - bind : str, optional
+        Additional binds for the singularity container
     - time : str, optional
         Time limit for job
     - venv : str, optional
```
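The two node-targeting options ultimately translate into slurm's node-selection directives. A minimal sketch of that translation (the function name is hypothetical; the real mapping lives in `SLURM_JOB_CONFIG_ARGS` in `_client_vars.py`):

```python
# Hypothetical helper showing how --exclude / --node-list style options
# could be rendered as sbatch directives. The option-to-flag names mirror
# the "exclude" and "nodelist" entries added to SLURM_JOB_CONFIG_ARGS.
def node_selection_directives(exclude=None, node_list=None):
    """Render node-selection launch options as #SBATCH lines."""
    directives = []
    if exclude:
        directives.append(f"#SBATCH --exclude={exclude}")
    if node_list:
        directives.append(f"#SBATCH --nodelist={node_list}")
    return directives

print(node_selection_directives(exclude="gpu001", node_list="gpu[010-012]"))
```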

vec_inf/client/_client_vars.py

Lines changed: 23 additions & 5 deletions
```diff
@@ -21,7 +21,12 @@
 from pathlib import Path
 from typing import TypedDict

-from vec_inf.client.slurm_vars import SINGULARITY_LOAD_CMD
+from vec_inf.client.slurm_vars import (
+    LD_LIBRARY_PATH,
+    SINGULARITY_IMAGE,
+    SINGULARITY_LOAD_CMD,
+    VLLM_NCCL_SO_PATH,
+)


 MODEL_READY_SIGNATURE = "INFO:     Application startup complete."
@@ -60,6 +65,8 @@
     "qos": "qos",
     "time": "time",
     "nodes": "num_nodes",
+    "exclude": "exclude",
+    "nodelist": "node_list",
     "gpus-per-node": "gpus_per_node",
     "cpus-per-task": "cpus_per_task",
     "mem": "mem_per_node",
@@ -71,7 +78,12 @@
 VLLM_SHORT_TO_LONG_MAP = {
     "-tp": "--tensor-parallel-size",
     "-pp": "--pipeline-parallel-size",
+    "-dp": "--data-parallel-size",
+    "-dpl": "--data-parallel-size-local",
+    "-dpa": "--data-parallel-address",
+    "-dpp": "--data-parallel-rpc-port",
     "-O": "--compilation-config",
+    "-q": "--quantization",
 }


@@ -117,6 +129,8 @@ class SlurmScriptTemplate(TypedDict):
         Commands for Singularity container setup
     imports : str
         Import statements and source commands
+    env_vars : list[str]
+        Environment variables to set
     singularity_command : str
         Template for Singularity execution command
     activate_venv : str
@@ -134,6 +148,7 @@ class SlurmScriptTemplate(TypedDict):
     shebang: ShebangConfig
     singularity_setup: list[str]
     imports: str
+    env_vars: list[str]
     singularity_command: str
     activate_venv: str
     server_setup: ServerSetupConfig
@@ -152,10 +167,14 @@ class SlurmScriptTemplate(TypedDict):
     },
     "singularity_setup": [
         SINGULARITY_LOAD_CMD,
-        "singularity exec {singularity_image} ray stop",
+        f"singularity exec {SINGULARITY_IMAGE} ray stop",
     ],
     "imports": "source {src_dir}/find_port.sh",
-    "singularity_command": "singularity exec --nv --bind {model_weights_path}:{model_weights_path} --containall {singularity_image}",
+    "env_vars": [
+        f"export LD_LIBRARY_PATH={LD_LIBRARY_PATH}",
+        f"export VLLM_NCCL_SO_PATH={VLLM_NCCL_SO_PATH}",
+    ],
+    "singularity_command": f"singularity exec --nv --bind {{model_weights_path}}{{additional_binds}} --containall {SINGULARITY_IMAGE}",
     "activate_venv": "source {venv}/bin/activate",
     "server_setup": {
         "single_node": [
@@ -203,8 +222,7 @@ class SlurmScriptTemplate(TypedDict):
         ' && mv temp.json "$json_path"',
     ],
     "launch_cmd": [
-        "python3.10 -m vllm.entrypoints.openai.api_server \\",
-        "    --model {model_weights_path} \\",
+        "vllm serve {model_weights_path} \\",
         "    --served-model-name {model_name} \\",
        '    --host "0.0.0.0" \\',
         "    --port $vllm_port_number \\",
    ]
```

vec_inf/client/_helper.py

Lines changed: 0 additions & 13 deletions
```diff
@@ -5,7 +5,6 @@
 """

 import json
-import os
 import time
 import warnings
 from pathlib import Path
@@ -36,10 +35,6 @@
     ModelType,
     StatusResponse,
 )
-from vec_inf.client.slurm_vars import (
-    LD_LIBRARY_PATH,
-    VLLM_NCCL_SO_PATH,
-)


 class ModelLauncher:
@@ -230,11 +225,6 @@ def _get_launch_params(self) -> dict[str, Any]:

         return params

-    def _set_env_vars(self) -> None:
-        """Set environment variables for the launch command."""
-        os.environ["LD_LIBRARY_PATH"] = LD_LIBRARY_PATH
-        os.environ["VLLM_NCCL_SO_PATH"] = VLLM_NCCL_SO_PATH
-
     def _build_launch_command(self) -> str:
         """Generate the slurm script and construct the launch command.
@@ -259,9 +249,6 @@ def launch(self) -> LaunchResponse:
         SlurmJobError
             If SLURM job submission fails
         """
-        # Set environment variables
-        self._set_env_vars()
-
         # Build and execute the launch command
         command_output, stderr = utils.run_bash_command(self._build_launch_command())
```
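The refactor moves environment setup out of the launcher process and into the generated script itself, so the exports travel with the script instead of depending on the submitting process's environment. A toy sketch of the after-state (paths are placeholders, not the real `slurm_vars` values):

```python
# After this change, export lines are rendered into the slurm script
# rather than set via os.environ in the helper, making the generated
# script reusable on its own. Both paths below are placeholders.
env_vars = [
    "export LD_LIBRARY_PATH=/placeholder/lib",
    "export VLLM_NCCL_SO_PATH=/placeholder/libnccl.so.2",
]

script = "\n".join(["#!/bin/bash", *env_vars])
print(script)
```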

vec_inf/client/_slurm_script_generator.py

Lines changed: 7 additions & 8 deletions
```diff
@@ -12,7 +12,6 @@
     SLURM_JOB_CONFIG_ARGS,
     SLURM_SCRIPT_TEMPLATE,
 )
-from vec_inf.client.slurm_vars import SINGULARITY_IMAGE


 class SlurmScriptGenerator:
@@ -40,6 +39,9 @@ def __init__(self, params: dict[str, Any]):
         self.params = params
         self.is_multinode = int(self.params["num_nodes"]) > 1
         self.use_singularity = self.params["venv"] == "singularity"
+        self.additional_binds = self.params.get("bind", "")
+        if self.additional_binds:
+            self.additional_binds = f" --bind {self.additional_binds}"
         self.model_weights_path = str(
             Path(params["model_weights_parent_dir"], params["model_name"])
         )
@@ -87,11 +89,8 @@ def _generate_server_setup(self) -> str:
         """
         server_script = ["\n"]
         if self.use_singularity:
-            server_script.append(
-                "\n".join(SLURM_SCRIPT_TEMPLATE["singularity_setup"]).format(
-                    singularity_image=SINGULARITY_IMAGE,
-                )
-            )
+            server_script.append("\n".join(SLURM_SCRIPT_TEMPLATE["singularity_setup"]))
+            server_script.append("\n".join(SLURM_SCRIPT_TEMPLATE["env_vars"]))
         server_script.append(
             SLURM_SCRIPT_TEMPLATE["imports"].format(src_dir=self.params["src_dir"])
         )
@@ -104,7 +103,7 @@ def _generate_server_setup(self) -> str:
                 "SINGULARITY_PLACEHOLDER",
                 SLURM_SCRIPT_TEMPLATE["singularity_command"].format(
                     model_weights_path=self.model_weights_path,
-                    singularity_image=SINGULARITY_IMAGE,
+                    additional_binds=self.additional_binds,
                 ),
             )
         else:
@@ -136,7 +135,7 @@ def _generate_launch_cmd(self) -> str:
             launcher_script.append(
                 SLURM_SCRIPT_TEMPLATE["singularity_command"].format(
                     model_weights_path=self.model_weights_path,
-                    singularity_image=SINGULARITY_IMAGE,
+                    additional_binds=self.additional_binds,
                 )
                 + " \\"
             )
```
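The `additional_binds` handling above is small enough to sketch end to end: an optional comma-separated `bind` parameter becomes a second `--bind` flag on the singularity command. The image path below is a stand-in for the real `SINGULARITY_IMAGE`:

```python
# Minimal reproduction of the bind-path logic added in this commit.
# SINGULARITY_IMAGE is a placeholder value for illustration only.
SINGULARITY_IMAGE = "/path/to/vllm.sif"

def build_singularity_command(model_weights_path: str, bind: str = "") -> str:
    """Format the container command, appending user binds when given."""
    additional_binds = f" --bind {bind}" if bind else ""
    return (
        f"singularity exec --nv --bind {model_weights_path}"
        f"{additional_binds} --containall {SINGULARITY_IMAGE}"
    )

print(build_singularity_command("/weights/my-model", bind="/scratch,/data"))
```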

vec_inf/client/config.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -108,6 +108,16 @@ class ModelConfig(BaseModel):
     partition: Union[PARTITION, str] = Field(
         default=cast(str, DEFAULT_ARGS["partition"]), description="GPU partition type"
     )
+    exclude: Optional[str] = Field(
+        default=None,
+        description="Exclude certain nodes from the resources granted to the job",
+    )
+    node_list: Optional[str] = Field(
+        default=None, description="Request a specific list of nodes for deployment"
+    )
+    bind: Optional[str] = Field(
+        default=None, description="Additional binds for the singularity container"
+    )
     venv: str = Field(
         default="singularity", description="Virtual environment/container system"
     )
```

vec_inf/client/models.py

Lines changed: 9 additions & 0 deletions
```diff
@@ -170,6 +170,12 @@ class LaunchOptions:
         Quality of Service level
     time : str, optional
         Time limit for the job
+    exclude : str, optional
+        Exclude certain nodes from the resources granted to the job
+    node_list : str, optional
+        Request a specific list of nodes for deployment
+    bind : str, optional
+        Additional binds for the singularity container
     vocab_size : int, optional
         Size of model vocabulary
     data_type : str, optional
@@ -191,6 +197,9 @@ class LaunchOptions:
     gpus_per_node: Optional[int] = None
     account: Optional[str] = None
     qos: Optional[str] = None
+    exclude: Optional[str] = None
+    node_list: Optional[str] = None
+    bind: Optional[str] = None
     time: Optional[str] = None
     vocab_size: Optional[int] = None
     data_type: Optional[str] = None
```
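Because all three new `LaunchOptions` fields default to `None`, existing call sites keep working unchanged. A trimmed, runnable sketch of just the added fields (the full dataclass has many more):

```python
# Trimmed sketch of the LaunchOptions additions: optional fields with
# None defaults are backward compatible with existing constructor calls.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LaunchOptions:
    exclude: Optional[str] = None    # nodes to exclude from the job
    node_list: Optional[str] = None  # specific nodes to deploy on
    bind: Optional[str] = None       # extra singularity bind paths

print(LaunchOptions(bind="/scratch"))
```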

vec_inf/config/models.yaml

Lines changed: 0 additions & 7 deletions
```diff
@@ -14,7 +14,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 8192
       --max-num-seqs: 256
-      --compilation-config: 3
   c4ai-command-r-plus-08-2024:
     model_family: c4ai-command-r
     model_variant: plus-08-2024
@@ -30,7 +29,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 65536
       --max-num-seqs: 256
-      --compilation-config: 3
   c4ai-command-r-08-2024:
     model_family: c4ai-command-r
     model_variant: 08-2024
@@ -494,7 +492,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 16384
       --max-num-seqs: 256
-      --compilation-config: 3
   Mistral-7B-Instruct-v0.1:
     model_family: Mistral
     model_variant: 7B-Instruct-v0.1
@@ -566,7 +563,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 32768
       --max-num-seqs: 256
-      --compilation-config: 3
   Mistral-Large-Instruct-2411:
     model_family: Mistral
     model_variant: Large-Instruct-2411
@@ -582,7 +578,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 32768
       --max-num-seqs: 256
-      --compilation-config: 3
   Mixtral-8x7B-Instruct-v0.1:
     model_family: Mixtral
     model_variant: 8x7B-Instruct-v0.1
@@ -613,7 +608,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 65536
       --max-num-seqs: 256
-      --compilation-config: 3
   Mixtral-8x22B-Instruct-v0.1:
     model_family: Mixtral
     model_variant: 8x22B-Instruct-v0.1
@@ -629,7 +623,6 @@ models:
       --tensor-parallel-size: 4
       --max-model-len: 65536
       --max-num-seqs: 256
-      --compilation-config: 3
   Phi-3-medium-128k-instruct:
     model_family: Phi-3
     model_variant: medium-128k-instruct
```
