
⚠️ 🔴 Add ministral model #40247


Open · wants to merge 20 commits into main

Conversation

manueldeprada
Contributor

@manueldeprada manueldeprada commented Aug 18, 2025

Coming from #39799, adding Ministral model to support its interleaved attention.
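For context, a minimal sketch of what "interleaved attention" means here (illustrative values only, not Ministral's actual hyperparameters): the model alternates sliding-window attention layers with occasional full-attention layers, which configs in this style typically encode as a per-layer pattern.

```python
# Illustrative sketch only; the window size, layer count, and 3:1 ratio are
# assumptions for demonstration, not Ministral's real configuration.
sliding_window = 4096
num_hidden_layers = 8

# Every 4th layer attends over the full context; the rest use a sliding window.
layer_types = [
    "full_attention" if (i + 1) % 4 == 0 else "sliding_attention"
    for i in range(num_hidden_layers)
]
print(layer_types)
# ['sliding_attention', 'sliding_attention', 'sliding_attention', 'full_attention', ...]
```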

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@manueldeprada
Contributor Author

manueldeprada commented Aug 20, 2025

Still have to check the slow tests, but the rest is ready @ArthurZucker

@manueldeprada manueldeprada marked this pull request as ready for review August 20, 2025 17:16
Collaborator

@ArthurZucker ArthurZucker left a comment


Nice!

@manueldeprada manueldeprada removed the request for review from Rocketknight1 August 21, 2025 13:27
@huggingface huggingface deleted a comment from github-actions bot Aug 21, 2025
@manueldeprada
Contributor Author

run-slow: ministral

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/ministral']
quantizations: [] ...

@manueldeprada
Contributor Author

run-slow: ministral

@huggingface huggingface deleted a comment from github-actions bot Aug 22, 2025
@manueldeprada
Contributor Author

@ArthurZucker tests are ready. One note: after this merges, if I open a PR in mistralai/Ministral-8B-Instruct-2410 to change the architecture to MinistralForCausalLM, older transformers versions won't load the model. Are we OK with this? Is there a preferred way of handling these situations? Maybe a warning in the README?

Collaborator

@ArthurZucker ArthurZucker left a comment


Okay!
Regarding BC, we are kinda obliged to break; a proper ⚠️ 🔴 on the PR title should be alright

@manueldeprada manueldeprada changed the title Add ministral model ⚠️ 🔴 Add ministral model Aug 25, 2025
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ministral

@manueldeprada
Contributor Author

run-slow: auto, ministral

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/auto', 'models/ministral']
quantizations: [] ...

Comment on lines +375 to +377

```python
        # The sliding window alternating layers are not always activated depending on the config
        if self.has_sliding_layers:
            causal_mask_mapping["sliding_attention"] = create_sliding_window_causal_mask(**mask_kwargs)
```
Collaborator


Ministral must have sliding and non-sliding! Let's update the forward to make sure it always creates it!
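A minimal sketch of the requested change, assuming the mask helpers from transformers.masking_utils and the mask_kwargs dict the surrounding forward already builds: create both mask variants unconditionally instead of gating the sliding one on self.has_sliding_layers.

```python
from transformers.masking_utils import (
    create_causal_mask,
    create_sliding_window_causal_mask,
)

# Sketch only: Ministral always interleaves sliding and full attention, so the
# forward can build both masks unconditionally. `mask_kwargs` stands for the
# mask arguments the surrounding forward already assembles.
causal_mask_mapping = {
    "full_attention": create_causal_mask(**mask_kwargs),
    "sliding_attention": create_sliding_window_causal_mask(**mask_kwargs),
}
```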

@ArthurZucker
Collaborator

Maybe a warning in the README?

Yep, we can have that!
Older versions will load Mistral instead, but that should be fine; it's not breaking

@manueldeprada
Contributor Author

manueldeprada commented Aug 25, 2025

Yep, we can have that! Older versions will load Mistral instead, but that should be fine; it's not breaking

It is breaking: once we change mistralai/Ministral-8B-Instruct-2410/config.json to be

```json
{
  "architectures": [
    "MinistralForCausalLM"
  ],
  ...
}
```

old versions will not recognize that architecture. That's why I was worried 😅
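To make the failure mode concrete, here is a hypothetical illustration (not from the PR) of what an older transformers release does once the checkpoint's config declares the new architecture:

```python
from transformers import AutoModelForCausalLM

# On a release that predates the Ministral class, neither the
# "MinistralForCausalLM" architecture nor a "ministral" model_type can be
# resolved, so loading raises instead of silently falling back to Mistral.
try:
    model = AutoModelForCausalLM.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
except (KeyError, ValueError) as err:
    print(f"Unsupported on this transformers version: {err}")
```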
