
Commit 3641ef2

Fix pre-commit checks (#12)
* attempt to fix pre-commit checks
* Add types-requests package dependency
* remove variable typing
1 parent 39b98a2 commit 3641ef2

25 files changed (+2359, -2148 lines)
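The "Add types-requests package dependency" bullet in the commit message addresses mypy's requirement for type stubs when the `requests` library is imported in type-checked code. Below is a minimal, hypothetical illustration of the kind of code that fails the mypy hook without that stubs package; the function is not from this repository.

```python
# Hypothetical example (not from this repository): without the types-requests
# stubs package, mypy reports "Library stubs not installed for 'requests'"
# when checking a module that imports requests.
import requests


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request to the given URL."""
    response = requests.get(url, timeout=10)
    return response.status_code
```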

.github/workflows/docs_build.yml

Lines changed: 0 additions & 45 deletions
This file was deleted.

.github/workflows/docs_deploy.yml

Lines changed: 0 additions & 55 deletions
This file was deleted.

.github/workflows/integration_tests.yml

Lines changed: 0 additions & 61 deletions
This file was deleted.

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -154,4 +154,4 @@ scripts/
 collect_env.py

 # build files
-dist/
+dist/

.pre-commit-config.yaml

Lines changed: 4 additions & 21 deletions
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0  # Use the ref you want to point at
+    rev: v4.6.0  # Use the ref you want to point at
     hooks:
       - id: trailing-whitespace
       - id: check-ast
@@ -16,7 +16,7 @@ repos:
       - id: check-toml

   - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: 'v0.2.2'
+    rev: 'v0.6.2'
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
@@ -25,7 +25,7 @@ repos:
         types_or: [python, jupyter]

   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.8.0
+    rev: v1.11.1
     hooks:
       - id: mypy
         entry: python3 -m mypy --config-file pyproject.toml
@@ -34,24 +34,7 @@ repos:
         exclude: "tests"

   - repo: https://github.com/nbQA-dev/nbQA
-    rev: 1.7.1
+    rev: 1.8.7
     hooks:
       - id: nbqa-ruff
         args: [--fix, --exit-non-zero-on-fix]
-
-  - repo: local
-    hooks:
-      - id: doctest
-        name: doctest
-        entry: python3 -m doctest -o NORMALIZE_WHITESPACE
-        files: "^aieng_template/"
-        language: system
-
-  - repo: local
-    hooks:
-      - id: pytest
-        name: pytest
-        entry: python3 -m pytest -m "not integration_test"
-        language: system
-        pass_filenames: false
-        always_run: true
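One way to confirm that the bumped hook revisions above pass cleanly is to run the whole suite locally. A minimal sketch, assuming the pre-commit CLI is installed and the command is run from the repository root:

```python
# Minimal sketch: invoke the pre-commit CLI against every file in the repo.
# Assumes `pre-commit` is installed and available on PATH.
import subprocess

subprocess.run(["pre-commit", "run", "--all-files"], check=True)
```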

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ RUN python3.10 -m pip install flash-attn --no-build-isolation

 # Move nccl to accessible location
 RUN mkdir -p /vec-inf/nccl
-RUN mv /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 /vec-inf/nccl/libnccl.so.2.18.1;
+RUN mv /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 /vec-inf/nccl/libnccl.so.2.18.1;

 # Set the default command to start an interactive shell
 CMD ["bash"]

README.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 # Vector Inference: Easy inference on Slurm clusters
-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, update [`launch_server.sh`](vec-inf/launch_server.sh), [`vllm.slurm`](vec-inf/vllm.slurm), [`multinode_vllm.slurm`](vec-inf/multinode_vllm.slurm) and [`models.csv`](vec-inf/models/models.csv) accordingly.

 ## Installation
 If you are using the Vector cluster environment, and you don't need any customization to the inference server environment, run the following to install package:
@@ -17,7 +17,7 @@ You should see an output like the following:

 <img width="400" alt="launch_img" src="https://github.com/user-attachments/assets/557eb421-47db-4810-bccd-c49c526b1b43">

-The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.
+The model would be launched using the [default parameters](vec-inf/models/models.csv), you can override these values by providing additional options, use `--help` to see the full list. You can also launch your own customized model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html), you'll need to specify all model launching related options to run a successful run.

 You can check the inference server status by providing the Slurm job ID to the `status` command:
 ```bash
@@ -32,7 +32,7 @@ There are 5 possible states:

 * **PENDING**: Job submitted to Slurm, but not executed yet. Job pending reason will be shown.
 * **LAUNCHING**: Job is running but the server is not ready yet.
-* **READY**: Inference server running and ready to take requests.
+* **READY**: Inference server running and ready to take requests.
 * **FAILED**: Inference server in an unhealthy state. Job failed reason will be shown.
 * **SHUTDOWN**: Inference server is shutdown/cancelled.
examples/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@
 - [`llm/completions.sh`](inference/llm/completions.sh): Bash example of sending completion requests to OpenAI compatible server, supports JSON mode
 - [`vlm/vision_completions.py`](inference/vlm/vision_completions.py): Python example of sending chat completion requests with image attached to prompt to OpenAI compatible server for vision language models
 - [`logits`](logits): Example for logits generation
-  - [`logits.py`](logits/logits.py): Python example of getting logits from hosted model.
+  - [`logits.py`](logits/logits.py): Python example of getting logits from hosted model.

examples/inference/llm/chat_completions.py

Lines changed: 9 additions & 6 deletions
@@ -5,11 +5,14 @@

 # Update the model path accordingly
 completion = client.chat.completions.create(
-    model="/model-weights/Meta-Llama-3-8B-Instruct",
-    messages=[
-        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
-        {"role": "user", "content": "Who are you?"},
-    ]
+    model="/model-weights/Meta-Llama-3-8B-Instruct",
+    messages=[
+        {
+            "role": "system",
+            "content": "You are a pirate chatbot who always responds in pirate speak!",
+        },
+        {"role": "user", "content": "Who are you?"},
+    ],
 )

-print(completion)
+print(completion)
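For context, the hunk above starts at line 5 of the file, after the client construction that the diff does not show. A minimal sketch of the assumed setup, using the openai package pointed at an OpenAI-compatible vLLM server; the base URL and API key below are illustrative placeholders, not values from the repository:

```python
# Assumed setup preceding the hunk (placeholder values, not the file's contents).
from openai import OpenAI

# Point the OpenAI-compatible client at the hosted inference server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")
```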

examples/inference/llm/completions.py

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@
     max_tokens=20,
 )

-print(completion)
+print(completion)
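This hunk only touches the tail of the script. For reference, a minimal sketch of a complete completions request along the same lines, with the client setup, model path, and prompt as illustrative assumptions rather than the file's exact contents:

```python
# Illustrative sketch of a full completions request (assumed values throughout).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

completion = client.completions.create(
    model="/model-weights/Meta-Llama-3-8B-Instruct",
    prompt="What is the capital of Canada?",
    max_tokens=20,
)

print(completion)
```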
