System Info
I'm trying to use a Gemma3 model (non-instruction tuned) for a classification task. I was glad that I saw that the model seems to be supported in the current code for this task: #39465
When trying

model = transformers.AutoModelForSequenceClassification.from_pretrained("google/gemma-3-4b-pt")

it essentially reports the model as being uninitialized (it lists both the vision_tower and language_model weights), which is unexpected:

[...], 'model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias', 'model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight', 'model.vision_tower.vision_model.post_layernorm.bias', 'model.vision_tower.vision_model.post_layernorm.weight', 'score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
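For context on why this warning looks wrong: when a base checkpoint is loaded into a sequence-classification class, only the newly added head parameter (here 'score.weight') is expected to appear as freshly initialized; backbone weights such as the vision tower should come from the checkpoint. A minimal sketch (plain Python; the second key is a hypothetical language_model key added for illustration, the others are copied from the warning above) that separates the expected newly-initialized keys from the suspicious ones:

```python
# Split a list of "newly initialized" parameter names into those expected for a
# fresh classification head and those that should have loaded from the checkpoint.

EXPECTED_NEW_PREFIXES = ("score.",)  # head added on top of the base model

def triage_uninitialized(keys):
    expected, suspicious = [], []
    for key in keys:
        (expected if key.startswith(EXPECTED_NEW_PREFIXES) else suspicious).append(key)
    return expected, suspicious

# Key names as in the warning in this report (one shortened example per group):
reported = [
    "model.vision_tower.vision_model.post_layernorm.weight",
    "model.language_model.layers.0.self_attn.q_proj.weight",  # hypothetical, for illustration
    "score.weight",
]

expected, suspicious = triage_uninitialized(reported)
print(expected)    # ['score.weight'] -- only the classification head
print(suspicious)  # both backbone keys, which should have been loaded
```

In practice, the full key lists can be obtained with `from_pretrained(..., output_loading_info=True)`, which also returns the missing/unexpected keys alongside the model.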
My transformers env:

- transformers version: 4.55.0.dev0
- Platform: Linux-6.8.0-60-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.5.3
- Accelerate version: 1.9.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA RTX A6000
I installed the current HEAD (abf101a) of the transformers repo via uv.
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import transformers

model = transformers.AutoModelForSequenceClassification.from_pretrained("google/gemma-3-4b-pt")
Expected behavior
The model loads with the pretrained backbone weights (vision tower and language model) initialized from the checkpoint; only the new classification head ('score.weight') should be reported as freshly initialized.