Skip to content

Conversation

aijadugar
Copy link

Summary

This PR adds testing and registry validation for the Perceiver model within the Hugging Face Transformers codebase.

Changes Made

  • Tested a standalone test_registry.py to validate Perceiver model registration and tokenizer mapping.
  • Corrected auto model and tokenizer mappings to ensure perceiver is properly recognized in:
    • modeling_auto.py
    • tokenization_auto.py
  • Verified successful forward pass for PerceiverModel with output shape torch.Size([1, 256, 1280]).
  • Fixed missing config attributes (e.g., input_channels) to enable correct model instantiation.
  • Confirmed tokenizer functionality via PerceiverTokenizer.

Verification

Screenshot 2025-10-08 122041 Screenshot 2025-10-08 121126 Screenshot 2025-10-08 122210

Comment on lines +38 to +47
class PerceptionEncoder(PreTrainedModel):
config_class = PretrainedConfig

def __init__(self, config):
super().__init__(config)
self.dummy_layer = None

def forward(self, x):
return x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this is what we want to do. If the model does not exist, we should delete it from modeling_auto which is the case for perception LM

Comment on lines +21 to +34

class ParakeetCTCTokenizer(PreTrainedTokenizerBase):
def __init__(self, vocab_file=None, **kwargs):
super().__init__()
self.vocab_file = vocab_file

def _tokenize(self, text):
return text.split()

def _convert_token_to_id(self, token):
return 0

def _convert_id_to_token(self, index):
return ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, we can set the slow tokenizer to None if the model has only fast tokenizer (e..g see Chameleon)

@aijadugar
Copy link
Author

I couldn't find the what I need to do...

@zucchini-nlp
Copy link
Member

sorry, PerceiverLM was already fixed in another PR

Copy link
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, parakeet, perception_lm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants