# Model Weights Tracking

This document tracks all model weights available in the `/model-weights` directory on the Killarney cluster and indicates which ones have an existing configuration in the cached model config (`/model-weights/vec-inf-shared/models.yaml`). By default, `vec-inf` uses the cached model config. To request new model weights to be downloaded, or a model configuration to be added, please open a "Model request" issue.

**NOTE**: The [`models.yaml`](./vec_inf/config/models.yaml) file in the package is not always up to date with the latest cached model config on the Killarney cluster: new model configs are added to the cached config first, and `models.yaml` is updated to match when a new version of the package is released.

## Legend
- ✅ **Configured**: Model has a complete configuration in `models.yaml`
- ❌ **Not Configured**: Model exists in `/model-weights` but lacks configuration
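
One way to keep the tables above current is to compare the `/model-weights` directory listing against the keys in the cached `models.yaml`. The sketch below is an illustration, not part of the `vec-inf` API, and it assumes each configured model appears as a top-level, unindented `name:` key in the YAML file — the actual schema may differ.

```python
# Sketch: report which /model-weights subdirectories lack a models.yaml entry.
# Assumption (not the authoritative vec-inf schema): each configured model is
# a top-level, unindented "name:" key in the YAML file.
from pathlib import Path


def configured_models(yaml_text: str) -> set[str]:
    """Collect top-level (unindented) mapping keys from a models.yaml-style file."""
    names = set()
    for line in yaml_text.splitlines():
        stripped = line.rstrip()
        # Top-level keys start in column 0 and end with ":".
        if stripped.endswith(":") and not line[0].isspace():
            names.add(stripped[:-1])
    return names


def missing_configs(weights_dir: str, yaml_text: str) -> list[str]:
    """List model directories under weights_dir with no matching config key."""
    configured = configured_models(yaml_text)
    return sorted(
        p.name
        for p in Path(weights_dir).iterdir()
        if p.is_dir() and p.name not in configured
    )
```

On the cluster this would be called with `weights_dir="/model-weights"` and the text of `/model-weights/vec-inf-shared/models.yaml`; any directory it reports would get a ❌ in the tables below.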

---

## Text Generation Models (LLM)

### Cohere for AI: Command R
| Model | Configuration |
|:------|:-------------|
| `c4ai-command-r-plus-08-2024` | ✅ |
| `c4ai-command-r-08-2024` | ✅ |

### Code Llama
| Model | Configuration |
|:------|:-------------|
| `CodeLlama-7b-hf` | ✅ |
| `CodeLlama-7b-Instruct-hf` | ✅ |
| `CodeLlama-13b-hf` | ✅ |
| `CodeLlama-13b-Instruct-hf` | ✅ |
| `CodeLlama-34b-hf` | ✅ |
| `CodeLlama-34b-Instruct-hf` | ✅ |
| `CodeLlama-70b-hf` | ✅ |
| `CodeLlama-70b-Instruct-hf` | ✅ |
| `CodeLlama-7b-Python-hf` | ❌ |
| `CodeLlama-13b-Python-hf` | ❌ |
| `CodeLlama-70b-Python-hf` | ❌ |

### Google: Gemma
| Model | Configuration |
|:------|:-------------|
| `gemma-2b` | ❌ |
| `gemma-2b-it` | ❌ |
| `gemma-7b` | ❌ |
| `gemma-7b-it` | ❌ |
| `gemma-2-9b` | ✅ |
| `gemma-2-9b-it` | ✅ |
| `gemma-2-27b` | ✅ |
| `gemma-2-27b-it` | ✅ |
| `gemma-3-1b-it` | ❌ |
| `gemma-3-4b-it` | ❌ |
| `gemma-3-12b-it` | ❌ |
| `gemma-3-27b-it` | ❌ |

### Meta: Llama 2
| Model | Configuration |
|:------|:-------------|
| `Llama-2-7b-hf` | ✅ |
| `Llama-2-7b-chat-hf` | ✅ |
| `Llama-2-13b-hf` | ✅ |
| `Llama-2-13b-chat-hf` | ✅ |
| `Llama-2-70b-hf` | ✅ |
| `Llama-2-70b-chat-hf` | ✅ |

### Meta: Llama 3
| Model | Configuration |
|:------|:-------------|
| `Meta-Llama-3-8B` | ✅ |
| `Meta-Llama-3-8B-Instruct` | ✅ |
| `Meta-Llama-3-70B` | ✅ |
| `Meta-Llama-3-70B-Instruct` | ✅ |

### Meta: Llama 3.1
| Model | Configuration |
|:------|:-------------|
| `Meta-Llama-3.1-8B` | ✅ |
| `Meta-Llama-3.1-8B-Instruct` | ✅ |
| `Meta-Llama-3.1-70B` | ✅ |
| `Meta-Llama-3.1-70B-Instruct` | ✅ |
| `Meta-Llama-3.1-405B-Instruct` | ✅ |

### Meta: Llama 3.2
| Model | Configuration |
|:------|:-------------|
| `Llama-3.2-1B` | ✅ |
| `Llama-3.2-1B-Instruct` | ✅ |
| `Llama-3.2-3B` | ✅ |
| `Llama-3.2-3B-Instruct` | ✅ |

### Meta: Llama 3.3
| Model | Configuration |
|:------|:-------------|
| `Llama-3.3-70B-Instruct` | ✅ |

### Meta: Llama 4
| Model | Configuration |
|:------|:-------------|
| `Llama-4-Scout-17B-16E-Instruct` | ❌ |

### Mistral AI: Mistral
| Model | Configuration |
|:------|:-------------|
| `Mistral-7B-v0.3` | ✅ |
| `Mistral-7B-Instruct-v0.1` | ✅ |
| `Mistral-7B-Instruct-v0.2` | ✅ |
| `Mistral-7B-Instruct-v0.3` | ✅ |
| `Mistral-Large-Instruct-2407` | ✅ |
| `Mistral-Large-Instruct-2411` | ✅ |

### Mistral AI: Mixtral
| Model | Configuration |
|:------|:-------------|
| `Mixtral-8x7B-Instruct-v0.1` | ✅ |
| `Mixtral-8x22B-v0.1` | ✅ |
| `Mixtral-8x22B-Instruct-v0.1` | ✅ |

### Microsoft: Phi
| Model | Configuration |
|:------|:-------------|
| `Phi-3-medium-128k-instruct` | ✅ |
| `phi-4` | ❌ |

### Nvidia: Llama-3.1-Nemotron
| Model | Configuration |
|:------|:-------------|
| `Llama-3.1-Nemotron-70B-Instruct-HF` | ✅ |

### Qwen: Qwen2.5
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-0.5B-Instruct` | ✅ |
| `Qwen2.5-1.5B-Instruct` | ✅ |
| `Qwen2.5-3B-Instruct` | ✅ |
| `Qwen2.5-7B-Instruct` | ✅ |
| `Qwen2.5-14B-Instruct` | ✅ |
| `Qwen2.5-32B-Instruct` | ✅ |
| `Qwen2.5-72B-Instruct` | ✅ |

### Qwen: Qwen2.5-Math
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Math-1.5B-Instruct` | ✅ |
| `Qwen2.5-Math-7B-Instruct` | ✅ |
| `Qwen2.5-Math-72B-Instruct` | ✅ |

### Qwen: Qwen2.5-Coder
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Coder-7B-Instruct` | ✅ |

### Qwen: QwQ
| Model | Configuration |
|:------|:-------------|
| `QwQ-32B` | ✅ |

### Qwen: Qwen2
| Model | Configuration |
|:------|:-------------|
| `Qwen2-1.5B-Instruct` | ❌ |
| `Qwen2-7B-Instruct` | ❌ |
| `Qwen2-Math-1.5B-Instruct` | ❌ |
| `Qwen2-Math-7B-Instruct` | ❌ |
| `Qwen2-Math-72B` | ❌ |
| `Qwen2-Math-72B-Instruct` | ❌ |
| `Qwen2-VL-7B-Instruct` | ❌ |

### Qwen: Qwen3
| Model | Configuration |
|:------|:-------------|
| `Qwen3-14B` | ✅ |
| `Qwen3-8B` | ❌ |
| `Qwen3-32B` | ❌ |
| `Qwen3-235B-A22B` | ❌ |
| `Qwen3-Embedding-8B` | ❌ |

### DeepSeek: DeepSeek-R1
| Model | Configuration |
|:------|:-------------|
| `DeepSeek-R1-Distill-Llama-8B` | ✅ |
| `DeepSeek-R1-Distill-Llama-70B` | ✅ |
| `DeepSeek-R1-Distill-Qwen-1.5B` | ✅ |
| `DeepSeek-R1-Distill-Qwen-7B` | ✅ |
| `DeepSeek-R1-Distill-Qwen-14B` | ✅ |
| `DeepSeek-R1-Distill-Qwen-32B` | ✅ |

### DeepSeek: Other Models
| Model | Configuration |
|:------|:-------------|
| `DeepSeek-Coder-V2-Lite-Instruct` | ❌ |
| `deepseek-math-7b-instruct` | ❌ |

### Other LLM Models
| Model | Configuration |
|:------|:-------------|
| `AI21-Jamba-1.5-Mini` | ❌ |
| `aya-expanse-32b` | ✅ (as Aya-Expanse-32B) |
| `gpt2-large` | ❌ |
| `gpt2-xl` | ❌ |
| `gpt-oss-120b` | ❌ |
| `instructblip-vicuna-7b` | ❌ |
| `internlm2-math-plus-7b` | ❌ |
| `Janus-Pro-7B` | ❌ |
| `Kimi-K2-Instruct` | ❌ |
| `Ministral-8B-Instruct-2410` | ❌ |
| `Molmo-7B-D-0924` | ✅ |
| `OLMo-1B-hf` | ❌ |
| `OLMo-7B-hf` | ❌ |
| `OLMo-7B-SFT` | ❌ |
| `pythia` | ❌ |
| `Qwen1.5-72B-Chat` | ❌ |
| `ReasonFlux-PRM-7B` | ❌ |
| `t5-large-lm-adapt` | ❌ |
| `t5-xl-lm-adapt` | ❌ |
| `mt5-xl-lm-adapt` | ❌ |

---

## Vision Language Models (VLM)

### LLaVA
| Model | Configuration |
|:------|:-------------|
| `llava-1.5-7b-hf` | ✅ |
| `llava-1.5-13b-hf` | ✅ |
| `llava-v1.6-mistral-7b-hf` | ✅ |
| `llava-v1.6-34b-hf` | ✅ |
| `llava-med-v1.5-mistral-7b` | ❌ |

### Microsoft: Phi 3 Vision
| Model | Configuration |
|:------|:-------------|
| `Phi-3-vision-128k-instruct` | ✅ |
| `Phi-3.5-vision-instruct` | ✅ |

### Meta: Llama 3.2 Vision
| Model | Configuration |
|:------|:-------------|
| `Llama-3.2-11B-Vision` | ✅ |
| `Llama-3.2-11B-Vision-Instruct` | ✅ |
| `Llama-3.2-90B-Vision` | ✅ |
| `Llama-3.2-90B-Vision-Instruct` | ✅ |

### Mistral: Pixtral
| Model | Configuration |
|:------|:-------------|
| `Pixtral-12B-2409` | ✅ |

### OpenGVLab: InternVL2.5
| Model | Configuration |
|:------|:-------------|
| `InternVL2_5-8B` | ✅ |
| `InternVL2_5-26B` | ✅ |
| `InternVL2_5-38B` | ✅ |

### THUDM: GLM-4
| Model | Configuration |
|:------|:-------------|
| `glm-4v-9b` | ✅ |

### DeepSeek: DeepSeek-VL2
| Model | Configuration |
|:------|:-------------|
| `deepseek-vl2` | ✅ |
| `deepseek-vl2-small` | ✅ |

### Other VLM Models
| Model | Configuration |
|:------|:-------------|
| `MiniCPM-Llama3-V-2_5` | ❌ |

---

## Text Embedding Models

### Liang Wang: e5
| Model | Configuration |
|:------|:-------------|
| `e5-mistral-7b-instruct` | ✅ |

### BAAI: bge
| Model | Configuration |
|:------|:-------------|
| `bge-base-en-v1.5` | ✅ |
| `bge-m3` | ❌ |
| `bge-multilingual-gemma2` | ❌ |

### Sentence Transformers: MiniLM
| Model | Configuration |
|:------|:-------------|
| `all-MiniLM-L6-v2` | ✅ |

### Other Embedding Models
| Model | Configuration |
|:------|:-------------|
| `data2vec` | ❌ |
| `gte-modernbert-base` | ❌ |
| `gte-Qwen2-7B-instruct` | ❌ |
| `m2-bert-80M-32k-retrieval` | ❌ |
| `m2-bert-80M-8k-retrieval` | ❌ |

---

## Reward Modeling Models

### Qwen: Qwen2.5-Math
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Math-RM-72B` | ✅ |
| `Qwen2.5-Math-PRM-7B` | ✅ |

---

## Multimodal Models

### CLIP
| Model | Configuration |
|:------|:-------------|
| `clip-vit-base-patch16` | ❌ |
| `clip-vit-large-patch14-336` | ❌ |

### Stable Diffusion
| Model | Configuration |
|:------|:-------------|
| `sd-v1-4-full-ema` | ❌ |
| `stable-diffusion-v1-4` | ❌ |

---