[MODEL] support qwen3.5 series w/o vision#869

Merged
awni merged 5 commits into ml-explore:main from JJJYmmm:add_qwen3_5 on Feb 12, 2026

JJJYmmm (Contributor) commented Feb 10, 2026

This PR adds model support for the upcoming Qwen3.5 models, including both dense and MoE variants.

It's a refined version of #861 by @johnmai-dev.

Reference HF implementation: huggingface/transformers#43830

JJJYmmm and others added 2 commits February 11, 2026 01:11
Co-authored-by: johnmai-dev <johnmai-dev@users.noreply.github.com>
johnmai-dev mentioned this pull request Feb 10, 2026
johnmai-dev (Contributor)

Do we need to add support for qwen3_5_text and qwen3_5_moe_text?

https://github.com/huggingface/transformers/blob/42791a34fdeae197f60f11ace3807c81f44b0729/src/transformers/models/auto/modeling_auto.py#L356-L357

JJJYmmm (Contributor Author) commented Feb 10, 2026

> Do we need to add support for qwen3_5_text and qwen3_5_moe_text?
>
> https://github.com/huggingface/transformers/blob/42791a34fdeae197f60f11ace3807c81f44b0729/src/transformers/models/auto/modeling_auto.py#L356-L357

IMO it's not needed: we'll only release the VLM checkpoints, so following previous VLMs (e.g. qwen3vlmoe) should be OK.

Comment on lines 305 to 307
    if any(k.endswith(sfx) for sfx in norm_keys):
        if v.ndim == 1:
            weights[k] = v + 1.0
Member

I think this is a bug. The sanitize function is called every time an MLX model is loaded, so if you convert the model (which calls sanitize) and then run it (which calls sanitize again), you will add 1.0 to these values twice.

Instead we should apply this shift only once. An easy way to do that is to add a condition that tells you whether the model has already been sanitized (for example, whether the "mtp" layer is still in the weights, or something similar).
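
As a rough illustration of the guard suggested above, here is a minimal sketch of an idempotent sanitize step. It is not the code merged in this PR. Plain Python lists stand in for MLX arrays, and the norm suffixes, the `"mtp."` marker prefix, and the helper `is_1d` are all assumptions made for the example.

```python
# Hypothetical names: the real norm suffixes and marker keys are not shown in this thread.
NORM_SUFFIXES = (".input_layernorm.weight", ".post_attention_layernorm.weight")
UNSANITIZED_MARKER = "mtp."  # assumed to appear only in the original checkpoint


def is_1d(v):
    """Stand-in for `v.ndim == 1`: a flat list of numbers."""
    return isinstance(v, list) and all(isinstance(x, (int, float)) for x in v)


def sanitize(weights):
    # Decide up front whether this checkpoint has been sanitized before.
    needs_shift = any(UNSANITIZED_MARKER in k for k in weights)

    # Drop the marker keys so a later call sees an already-sanitized checkpoint.
    out = {k: v for k, v in weights.items() if UNSANITIZED_MARKER not in k}
    if not needs_shift:
        return out

    # Apply the +1.0 shift to 1-D norm weights exactly once.
    for k in list(out):
        v = out[k]
        if k.endswith(NORM_SUFFIXES) and is_1d(v):
            out[k] = [x + 1.0 for x in v]
    return out


raw = {
    "model.layers.0.input_layernorm.weight": [0.0, 0.5],
    "model.layers.0.mlp.up_proj.weight": [[1.0, 2.0]],
    "mtp.layers.0.weight": [0.0],
}
once = sanitize(raw)    # convert step
twice = sanitize(once)  # load step: must not shift again
assert once == twice
```

The design choice here is that the marker is removed by the same call that applies the shift, so the "already sanitized" state is recorded in the weights themselves rather than in a separate flag.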

Contributor Author

Got it. I updated the sanitize logic and added a test 🫡

@awni awni mentioned this pull request Feb 10, 2026
awni (Member) left a comment

Looks great!

awni (Member) commented Feb 11, 2026

@JJJYmmm should we go ahead and merge this? Have you tested it on an actual model yet?

JJJYmmm (Contributor Author) commented Feb 11, 2026

I've tested it on preview checkpoints, so it's fine to merge now. I'll check whether it still works when the official version drops. 🤗

@awni awni merged commit 0fd3126 into ml-explore:main Feb 12, 2026
2 checks passed
3 participants