Commit 70989a4: Merge pull request #138 from VectorInstitute/feature/update-docs ("Update docs")
2 parents e6a4762 + 7b09e4d

4 files changed: +334 / -416 lines

MODEL_TRACKING.md (324 additions, 0 deletions)
# Model Weights Tracking

This document tracks all model weights available in the `/model-weights` directory on the Killarney cluster and indicates which ones have an existing configuration in the cached model config (`/model-weights/vec-inf-shared/models.yaml`). By default, `vec-inf` uses the cached model config. To request new model weights to be downloaded or a model configuration to be added, please open a "Model request" issue.

**NOTE**: The [`models.yaml`](./vec_inf/config/models.yaml) file in the package is not always in sync with the latest cached model config on the Killarney cluster: new model configurations are added to the cached config first, and `models.yaml` is updated to match when a new version of the package is released.

## Legend

- ✅ **Configured**: the model has a complete configuration in `models.yaml`
- ❌ **Not Configured**: the model exists in `/model-weights` but lacks a configuration

---
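The "Configured" check above amounts to a key lookup against the cached config. As a minimal sketch (assuming the cached `models.yaml` is a top-level `models:` mapping keyed by model name — the actual schema may differ, and the line scan below is an illustration, not a real YAML parser), a script could test whether a given set of weights is configured:

```python
def configured_models(config_text: str) -> set[str]:
    """Collect model names that appear as entries directly under a
    top-level `models:` mapping. Minimal line scan for illustration
    only -- use a proper YAML parser in practice."""
    names = set()
    in_models = False
    for line in config_text.splitlines():
        if line.rstrip() == "models:":
            in_models = True
            continue
        # A two-space-indented key (but not deeper) is a model entry.
        if in_models and line.startswith("  ") and not line.startswith("    "):
            if line.rstrip().endswith(":"):
                names.add(line.strip().rstrip(":"))
    return names


# Hypothetical excerpt of the cached config, for demonstration.
sample = """\
models:
  Meta-Llama-3.1-8B-Instruct:
    model_family: Llama-3.1
  Qwen2.5-7B-Instruct:
    model_family: Qwen2.5
"""

print("Meta-Llama-3.1-8B-Instruct" in configured_models(sample))  # True
```

On the cluster, the same check would be run against `/model-weights/vec-inf-shared/models.yaml` rather than an inline string.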
## Text Generation Models (LLM)

### Cohere for AI: Command R
| Model | Configuration |
|:------|:-------------|
| `c4ai-command-r-plus-08-2024` ||
| `c4ai-command-r-08-2024` ||

### Code Llama
| Model | Configuration |
|:------|:-------------|
| `CodeLlama-7b-hf` ||
| `CodeLlama-7b-Instruct-hf` ||
| `CodeLlama-13b-hf` ||
| `CodeLlama-13b-Instruct-hf` ||
| `CodeLlama-34b-hf` ||
| `CodeLlama-34b-Instruct-hf` ||
| `CodeLlama-70b-hf` ||
| `CodeLlama-70b-Instruct-hf` ||
| `CodeLlama-7b-Python-hf` ||
| `CodeLlama-13b-Python-hf` ||
| `CodeLlama-70b-Python-hf` ||

### Google: Gemma
| Model | Configuration |
|:------|:-------------|
| `gemma-2b` ||
| `gemma-2b-it` ||
| `gemma-7b` ||
| `gemma-7b-it` ||
| `gemma-2-9b` ||
| `gemma-2-9b-it` ||
| `gemma-2-27b` ||
| `gemma-2-27b-it` ||
| `gemma-3-1b-it` ||
| `gemma-3-4b-it` ||
| `gemma-3-12b-it` ||
| `gemma-3-27b-it` ||

### Meta: Llama 2
| Model | Configuration |
|:------|:-------------|
| `Llama-2-7b-hf` ||
| `Llama-2-7b-chat-hf` ||
| `Llama-2-13b-hf` ||
| `Llama-2-13b-chat-hf` ||
| `Llama-2-70b-hf` ||
| `Llama-2-70b-chat-hf` ||

### Meta: Llama 3
| Model | Configuration |
|:------|:-------------|
| `Meta-Llama-3-8B` ||
| `Meta-Llama-3-8B-Instruct` ||
| `Meta-Llama-3-70B` ||
| `Meta-Llama-3-70B-Instruct` ||

### Meta: Llama 3.1
| Model | Configuration |
|:------|:-------------|
| `Meta-Llama-3.1-8B` ||
| `Meta-Llama-3.1-8B-Instruct` ||
| `Meta-Llama-3.1-70B` ||
| `Meta-Llama-3.1-70B-Instruct` ||
| `Meta-Llama-3.1-405B-Instruct` ||

### Meta: Llama 3.2
| Model | Configuration |
|:------|:-------------|
| `Llama-3.2-1B` ||
| `Llama-3.2-1B-Instruct` ||
| `Llama-3.2-3B` ||
| `Llama-3.2-3B-Instruct` ||

### Meta: Llama 3.3
| Model | Configuration |
|:------|:-------------|
| `Llama-3.3-70B-Instruct` ||

### Meta: Llama 4
| Model | Configuration |
|:------|:-------------|
| `Llama-4-Scout-17B-16E-Instruct` ||

### Mistral AI: Mistral
| Model | Configuration |
|:------|:-------------|
| `Mistral-7B-v0.3` ||
| `Mistral-7B-Instruct-v0.1` ||
| `Mistral-7B-Instruct-v0.2` ||
| `Mistral-7B-Instruct-v0.3` ||
| `Mistral-Large-Instruct-2407` ||
| `Mistral-Large-Instruct-2411` ||

### Mistral AI: Mixtral
| Model | Configuration |
|:------|:-------------|
| `Mixtral-8x7B-Instruct-v0.1` ||
| `Mixtral-8x22B-v0.1` ||
| `Mixtral-8x22B-Instruct-v0.1` ||

### Microsoft: Phi
| Model | Configuration |
|:------|:-------------|
| `Phi-3-medium-128k-instruct` ||
| `phi-4` ||

### Nvidia: Llama-3.1-Nemotron
| Model | Configuration |
|:------|:-------------|
| `Llama-3.1-Nemotron-70B-Instruct-HF` ||

### Qwen: Qwen2.5
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-0.5B-Instruct` ||
| `Qwen2.5-1.5B-Instruct` ||
| `Qwen2.5-3B-Instruct` ||
| `Qwen2.5-7B-Instruct` ||
| `Qwen2.5-14B-Instruct` ||
| `Qwen2.5-32B-Instruct` ||
| `Qwen2.5-72B-Instruct` ||

### Qwen: Qwen2.5-Math
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Math-1.5B-Instruct` ||
| `Qwen2.5-Math-7B-Instruct` ||
| `Qwen2.5-Math-72B-Instruct` ||

### Qwen: Qwen2.5-Coder
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Coder-7B-Instruct` ||

### Qwen: QwQ
| Model | Configuration |
|:------|:-------------|
| `QwQ-32B` ||

### Qwen: Qwen2
| Model | Configuration |
|:------|:-------------|
| `Qwen2-1.5B-Instruct` ||
| `Qwen2-7B-Instruct` ||
| `Qwen2-Math-1.5B-Instruct` ||
| `Qwen2-Math-7B-Instruct` ||
| `Qwen2-Math-72B` ||
| `Qwen2-Math-72B-Instruct` ||
| `Qwen2-VL-7B-Instruct` ||

### Qwen: Qwen3
| Model | Configuration |
|:------|:-------------|
| `Qwen3-14B` ||
| `Qwen3-8B` ||
| `Qwen3-32B` ||
| `Qwen3-235B-A22B` ||
| `Qwen3-Embedding-8B` ||

### DeepSeek: DeepSeek-R1
| Model | Configuration |
|:------|:-------------|
| `DeepSeek-R1-Distill-Llama-8B` ||
| `DeepSeek-R1-Distill-Llama-70B` ||
| `DeepSeek-R1-Distill-Qwen-1.5B` ||
| `DeepSeek-R1-Distill-Qwen-7B` ||
| `DeepSeek-R1-Distill-Qwen-14B` ||
| `DeepSeek-R1-Distill-Qwen-32B` ||

### DeepSeek: Other Models
| Model | Configuration |
|:------|:-------------|
| `DeepSeek-Coder-V2-Lite-Instruct` ||
| `deepseek-math-7b-instruct` ||

### Other LLM Models
| Model | Configuration |
|:------|:-------------|
| `AI21-Jamba-1.5-Mini` ||
| `aya-expanse-32b` | ✅ (as Aya-Expanse-32B) |
| `gpt2-large` ||
| `gpt2-xl` ||
| `gpt-oss-120b` ||
| `instructblip-vicuna-7b` ||
| `internlm2-math-plus-7b` ||
| `Janus-Pro-7B` ||
| `Kimi-K2-Instruct` ||
| `Ministral-8B-Instruct-2410` ||
| `Molmo-7B-D-0924` ||
| `OLMo-1B-hf` ||
| `OLMo-7B-hf` ||
| `OLMo-7B-SFT` ||
| `pythia` ||
| `Qwen1.5-72B-Chat` ||
| `ReasonFlux-PRM-7B` ||
| `t5-large-lm-adapt` ||
| `t5-xl-lm-adapt` ||
| `mt5-xl-lm-adapt` ||

---

## Vision Language Models (VLM)

### LLaVa
| Model | Configuration |
|:------|:-------------|
| `llava-1.5-7b-hf` ||
| `llava-1.5-13b-hf` ||
| `llava-v1.6-mistral-7b-hf` ||
| `llava-v1.6-34b-hf` ||
| `llava-med-v1.5-mistral-7b` ||

### Microsoft: Phi 3 Vision
| Model | Configuration |
|:------|:-------------|
| `Phi-3-vision-128k-instruct` ||
| `Phi-3.5-vision-instruct` ||

### Meta: Llama 3.2 Vision
| Model | Configuration |
|:------|:-------------|
| `Llama-3.2-11B-Vision` ||
| `Llama-3.2-11B-Vision-Instruct` ||
| `Llama-3.2-90B-Vision` ||
| `Llama-3.2-90B-Vision-Instruct` ||

### Mistral: Pixtral
| Model | Configuration |
|:------|:-------------|
| `Pixtral-12B-2409` ||

### OpenGVLab: InternVL2.5
| Model | Configuration |
|:------|:-------------|
| `InternVL2_5-8B` ||
| `InternVL2_5-26B` ||
| `InternVL2_5-38B` ||

### THUDM: GLM-4
| Model | Configuration |
|:------|:-------------|
| `glm-4v-9b` ||

### DeepSeek: DeepSeek-VL2
| Model | Configuration |
|:------|:-------------|
| `deepseek-vl2` ||
| `deepseek-vl2-small` ||

### Other VLM Models
| Model | Configuration |
|:------|:-------------|
| `MiniCPM-Llama3-V-2_5` ||

---

## Text Embedding Models

### Liang Wang: e5
| Model | Configuration |
|:------|:-------------|
| `e5-mistral-7b-instruct` ||

### BAAI: bge
| Model | Configuration |
|:------|:-------------|
| `bge-base-en-v1.5` ||
| `bge-m3` ||
| `bge-multilingual-gemma2` ||

### Sentence Transformers: MiniLM
| Model | Configuration |
|:------|:-------------|
| `all-MiniLM-L6-v2` ||

### Other Embedding Models
| Model | Configuration |
|:------|:-------------|
| `data2vec` ||
| `gte-modernbert-base` ||
| `gte-Qwen2-7B-instruct` ||
| `m2-bert-80M-32k-retrieval` ||
| `m2-bert-80M-8k-retrieval` ||

---

## Reward Modeling Models

### Qwen: Qwen2.5-Math
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Math-RM-72B` ||
| `Qwen2.5-Math-PRM-7B` ||

---

## Multimodal Models

### CLIP
| Model | Configuration |
|:------|:-------------|
| `clip-vit-base-patch16` ||
| `clip-vit-large-patch14-336` ||

### Stable Diffusion
| Model | Configuration |
|:------|:-------------|
| `sd-v1-4-full-ema` ||
| `stable-diffusion-v1-4` ||

---
