Improve Gemma3n model and tests #39764
Conversation
@@ -659,7 +658,6 @@ def test_automodelforcausallm(self):
        self.assertIsInstance(for_causal_lm, Gemma3nForCausalLM)

    @unittest.skip("Skipped for now!")
These tests were copied from gemma3 and were skipped; I updated and enabled them.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This comment contains run-slow, running the specified jobs: models: ['models/gemma3', 'models/gemma3n']
run-slow: gemma3n
Once again, sorry for the delay! Currently catching up on reviews! Alright, thanks! Feel free to merge after resolving the conflicts, once you think the tests are good enough!
@manueldeprada can you check the tests with #40163? SWA changes, so some values might change here as well, but I think this is the more appropriate PR to do this 👀
Force-pushed from 3d2c73b to c12d304.
Force-pushed from e7f3cc8 to 5263b2e.
Force-pushed from 5263b2e to 46cd717.
run-slow: gemma3n
This comment contains run-slow, running the specified jobs: models: ['models/gemma3n']
Nice! We can still improve the sharing a bit though, to stop wasting memory. It's not ideal to write to the Cache like this, but alright for now as it actually fixes the cropped states issue. Can you check how it behaves with compile and static cache when you're done with the changes?
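One way to run that compile + static cache check might be a small script along these lines (a hedged sketch, not part of the PR; the checkpoint id and prompt are placeholders, and it assumes the standard `generate` static-cache path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3n-E2B-it"  # placeholder: any Gemma3n checkpoint should do
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Compile the forward pass; generate() with cache_implementation="static"
# preallocates the KV cache so the compiled graph sees fixed shapes.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=20, do_sample=False, cache_implementation="static"
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If the KV-sharing write breaks full-graph compilation, this is where it would show up as recompilations or graph breaks.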
@@ -1325,21 +1329,14 @@ def forward(
        query_states = apply_rotary_pos_emb(query_states, cos, sin, unsqueeze_dim=2)
        query_states = query_states.transpose(1, 2)

        # For layers with shared KV (from kv sharing point onwards), we reuse the cached keys/values from the previous layer.
        # During prefill, cache_position is a full range [0, 1, ..., max_cache_len-1], but in autoregressive mode it's a single position [last_token_idx].
        # For sliding window layers, we must clamp or slice indices to the cache's max length to avoid out-of-bounds access.
        if self.is_kv_shared_layer and self.kv_shared_layer_index is not None and past_key_values is not None:
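The comment block in this hunk is the core of the index handling: reads from a fixed-size sliding-window cache must stay in bounds both for the full-range prefill case and for the single-position decode case. A toy illustration of that clamp-or-slice idea (a hypothetical helper, not the PR's implementation) could look like:

```python
import torch

def shared_kv_indices(cache_position: torch.Tensor, max_cache_len: int) -> torch.Tensor:
    """Keep indices into a fixed-size (sliding-window) cache in bounds."""
    if cache_position.shape[0] > max_cache_len:
        # Prefill with more tokens than the cache holds: only max_cache_len
        # slots exist, so read all of them.
        return torch.arange(max_cache_len, device=cache_position.device)
    # Autoregressive step: a single absolute position can exceed the cache
    # length, so clamp it to the last valid slot instead of indexing out of bounds.
    return cache_position.clamp(max=max_cache_len - 1)

# Prefill of 10 tokens into a 4-slot window -> read slots [0, 1, 2, 3]
print(shared_kv_indices(torch.arange(10), max_cache_len=4))
# Decoding absolute position 40 with a 32-slot window -> slot 31
print(shared_kv_indices(torch.tensor([40]), max_cache_len=32))
```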
Unless I'm mistaken, we can simplify the path here
-        if self.is_kv_shared_layer and self.kv_shared_layer_index is not None and past_key_values is not None:
+        if self.is_kv_shared_layer and past_key_values is not None:
        if self.store_full_length_kv:
            if not hasattr(past_key_values, "shared_layers"):
                past_key_values.shared_layers = {}
            past_key_values.shared_layers[self.layer_idx] = key_states, value_states
I believe this should be done after `update` is called, no?
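A hedged sketch of that ordering, written as a hypothetical free-standing helper rather than the PR's actual attention code:

```python
from typing import Optional

import torch
from transformers.cache_utils import Cache

def update_then_share(
    past_key_values: Cache,
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    layer_idx: int,
    cache_kwargs: Optional[dict] = None,
    store_full_length_kv: bool = False,
):
    # 1) Normal per-layer cache write first.
    key_states, value_states = past_key_values.update(
        key_states, value_states, layer_idx, cache_kwargs
    )
    # 2) Only afterwards, stash the tensors that the KV-shared layers will read back.
    if store_full_length_kv:
        if not hasattr(past_key_values, "shared_layers"):
            past_key_values.shared_layers = {}
        past_key_values.shared_layers[layer_idx] = (key_states, value_states)
    return key_states, value_states
```

This only illustrates the ordering the comment asks about; whether the shared tensors should be the raw full-length states or the ones returned by `update()` is still up to the discussion above.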
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma3n
Improves the Gemma3n model and tests by: