[MODEL] support qwen3.5 series w/o vision #869
Co-authored-by: johnmai-dev <johnmai-dev@users.noreply.github.com>
Do we need to add support for qwen3_5_text and qwen3_5_moe_text?
imo it’s not needed, because we’ll only release the vlm ckpts, so just following previous vlms, e.g. qwen3vlmoe, should be ok.
mlx_lm/models/qwen3_5.py
Outdated
    if any(k.endswith(sfx) for sfx in norm_keys):
        if v.ndim == 1:
            weights[k] = v + 1.0
I think this is a bug. The sanitize function is called every time an MLX model is loaded, so if you convert the model (which calls sanitize) and then run it (which calls sanitize again), you will add 1.0 to these values twice.
Instead, we should apply this shift only once. An easy way to do that is to add a condition that tells you whether the model has already been sanitized (for example, whether the "mpt" layer is in the weights, or something similar).
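A minimal sketch of the guard suggested above, in plain Python so it runs without MLX. Everything named here is a hypothetical stand-in rather than the PR's actual code: `NORM_SUFFIXES`, the `"mpt."` marker prefix, and the assumption that marker keys exist only in a raw checkpoint and are dropped during conversion. Lists of floats stand in for 1-D weight arrays.

```python
# Hypothetical names: these suffixes and the "mpt." marker are illustrative
# assumptions, not the values used in mlx_lm/models/qwen3_5.py.
NORM_SUFFIXES = (".input_layernorm.weight", ".post_attention_layernorm.weight")
RAW_MARKER = "mpt."


def sanitize(weights):
    """Shift norm weights by +1.0, but only for a raw (unconverted) checkpoint."""
    # Assumption: marker keys are present only before the first sanitize pass.
    is_raw = any(RAW_MARKER in k for k in weights)
    if not is_raw:
        # Already sanitized: return the weights unchanged.
        return dict(weights)

    out = {}
    for k, v in weights.items():
        if RAW_MARKER in k:
            # Drop marker keys so a later load skips the shift.
            continue
        if k.endswith(NORM_SUFFIXES):
            # Lists of floats stand in for 1-D arrays here.
            v = [x + 1.0 for x in v]
        out[k] = v
    return out


raw = {
    "model.layers.0.input_layernorm.weight": [0.0, 0.5],
    "mpt.fc.weight": [1.0],
}
once = sanitize(raw)    # conversion pass: shift applied, marker removed
twice = sanitize(once)  # load pass: nothing left to do
```

Because the first pass removes the marker, the second pass is a no-op, so convert-then-load yields the same weights as loading the raw checkpoint directly.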
got it, updated the sanitize logic and added a test 🫡
@JJJYmmm should we go ahead and merge this? Have you tested it on an actual model yet?
I’ve tested it on preview ckpts, so it’s fine to merge now. I’ll check if it still works when the official version drops. 🤗
This PR adds model support for the upcoming Qwen3.5 models, including both dense and MoE variants.
It's a refined version of #861 by @johnmai-dev.
Reference HF implementation - huggingface/transformers#43830