
Conversation

@wbruna (Contributor) commented on Oct 5, 2025

For #851. Allow the model loading logic to tolerate missing layers, which is enough to run the 12B Pruning variant:

https://huggingface.co/OPPOer/Qwen-Image-Pruning
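
For reference, the general idea looks roughly like this (an illustrative sketch, not the actual diff; `TensorStorage`, `present_blocks`, and the probe tensor name are made-up identifiers): instead of assuming a fixed transformer block count, probe which blocks actually have tensors in the file and instantiate only those, so a pruned checkpoint loads without "missing tensor" errors.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Placeholder for per-tensor metadata (offset, type, shape, ...).
struct TensorStorage {};

// Hypothetical view of the tensors actually present in the GGUF file.
using TensorStorageMap = std::map<std::string, TensorStorage>;

// Probe one representative tensor name per transformer block and keep
// only the blocks that are actually stored; missing blocks are treated
// as pruned rather than as a load failure.
static std::vector<int> present_blocks(const TensorStorageMap& tensors,
                                       int max_blocks) {
    std::vector<int> blocks;
    for (int i = 0; i < max_blocks; i++) {
        std::string probe =
            "transformer_blocks." + std::to_string(i) + ".attn.to_q.weight";
        if (tensors.count(probe) > 0) {
            blocks.push_back(i);  // block i survived pruning
        }
    }
    return blocks;
}

int main() {
    // Fake file contents: blocks 0 and 2 exist, block 1 was pruned away.
    TensorStorageMap tensors = {
        {"transformer_blocks.0.attn.to_q.weight", {}},
        {"transformer_blocks.2.attn.to_q.weight", {}},
    };
    for (int i : present_blocks(tensors, 3)) {
        std::printf("loading block %d\n", i);
    }
    return 0;
}
```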

Tested with the Q4_K_M quant from https://huggingface.co/wsbagnsv1/Qwen-Image-Pruning-GGUF :

(attached image: teste_1759693079)

@wbruna (Contributor, Author) commented on Oct 5, 2025

Quality seems a little worse than with the Lightning model, with ~30% less peak VRAM usage and similar speed gains.

wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 6, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 9, 2025