
Commit 7f382ba

Merge pull request #117 from VectorInstitute/release/v0.6.1
Release v0.6.1
2 parents: c618fef + eda2913

File tree

4 files changed: +12, -7 lines


README.md

Lines changed: 4 additions & 2 deletions
@@ -7,6 +7,7 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
+[![vLLM](https://img.shields.io/badge/vllm-0.8.5.post1-blue)](https://docs.vllm.ai/en/v0.8.5.post1/index.html)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)
 
 This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, update the environment variables in [`vec_inf/client/slurm_vars.py`](vec_inf/client/slurm_vars.py), and the model config for cached model weights in [`vec_inf/config/models.yaml`](vec_inf/config/models.yaml) accordingly.
@@ -17,7 +18,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package
+Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
 
 ## Usage
 
@@ -164,8 +165,9 @@ Once the inference server is ready, you can start sending in inference requests.
 },
 "prompt_logprobs":null
 }
+
 ```
-**NOTE**: Certain models don't adhere to OpenAI's chat template, e.g. Mistral family. For these models, you can either change your prompt to follow the model's default chat template or provide your own chat template via `--chat-template: TEMPLATE_PATH`
+**NOTE**: Certain models don't adhere to OpenAI's chat template, e.g. Mistral family. For these models, you can either change your prompt to follow the model's default chat template or provide your own chat template via `--chat-template: TEMPLATE_PATH`.
 
 ## SSH tunnel from your local device
 If you want to run inference from your local device, you can open an SSH tunnel to your cluster environment like the following:
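
The tunnel command itself falls outside this hunk's context. Purely as an illustration (the hostnames, username, and port below are placeholder assumptions, not values taken from this commit), such a tunnel could look like:

```bash
# Hypothetical example: forward local port 8081 to the compute node running
# the inference server. "login.cluster.example" and "gpu042" are placeholders.
ssh -N -L 8081:gpu042:8081 username@login.cluster.example
```

With the tunnel up, requests sent to `localhost:8081` on the local device would reach the inference server on the cluster.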

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ If you are using the Vector cluster environment, and you don't need any customiz
 pip install vec-inf
 ```
 
-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package.
+Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
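
As a rough sketch of the Dockerfile route described above (the image tag is an arbitrary choice, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host):

```bash
# Build the environment image from the repository's Dockerfile, then start an
# interactive shell inside it with the host GPUs exposed.
docker build -t vec-inf .
docker run --gpus all -it vec-inf bash
```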

docs/user_guide.md

Lines changed: 6 additions & 3 deletions
@@ -91,7 +91,11 @@ You would then set the `VEC_INF_CONFIG` path using:
 export VEC_INF_CONFIG=/h/<username>/my-model-config.yaml
 ```
 
-Note that there are other parameters that can also be added to the config but not shown in this example, check the [`ModelConfig`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/client/config.py) for details.
+**NOTE**
+* There are other parameters that can be added to the config but are not shown in this example; check the [`ModelConfig`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/client/config.py) for details.
+* Check [vLLM Engine Arguments](https://docs.vllm.ai/en/stable/serving/engine_args.html) for the full list of available vLLM engine arguments. Every parallelization size defaults to 1, so none of them are set explicitly in this example.
+* GPU partitions with non-Ampere architectures, e.g. `rtx6000` and `t4v2`, don't support BF16. For models that default to BF16, use FP16 on these partitions instead, i.e. `--dtype: float16`.
+* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.
 
 ### `status` command
 
@@ -254,8 +258,7 @@ Once the inference server is ready, you can start sending in inference requests.
 }
 ```
 
-
-**NOTE**: For multimodal models, currently only `ChatCompletion` is available, and only one image can be provided for each prompt.
+**NOTE**: Certain models don't adhere to OpenAI's chat template, e.g. Mistral family. For these models, you can either change your prompt to follow the model's default chat template or provide your own chat template via `--chat-template: TEMPLATE_PATH`.
 
 ## SSH tunnel from your local device
 
pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "vec-inf"
-version = "0.6.0"
+version = "0.6.1"
 description = "Efficient LLM inference on Slurm clusters using vLLM."
 readme = "README.md"
 authors = [{name = "Marshall Wang", email = "marshall.wang@vectorinstitute.ai"}]
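
Since the version bump is the substance of this release, downstream users can pin it explicitly. This assumes the package is published to PyPI under the `name` field above, which the README's `pip install vec-inf` instruction suggests:

```bash
# Pin the exact release tagged by this commit; assumes vec-inf 0.6.1 is on PyPI.
pip install vec-inf==0.6.1
```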
