
Fix AutoAWQ -> MLX scales and bias dtype mismatch. Fixes #33. (#38)

Merged: liang2kl merged 1 commit into z-lab:main from jeethu:mlx-bugfix on Apr 23, 2026.

@jeethu (Contributor) commented on Apr 23, 2026

`_convert_awq_linear` emitted fp32 scales and fp16 biases, but `mx.quantized_matmul` requires the scales and biases to share a dtype. The mismatch produced NaN logits on Apple Silicon, so generation streamed token 0 ("!") for AutoAWQ paroquant checkpoints.

Text generation works correctly after applying this two-line fix.
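To make the invariant concrete, here is a minimal sketch of the conversion, assuming the usual AWQ convention `w = (q - z) * s` and an MLX-style affine layout `w = q * scales + biases` applied per group of `group_size` columns, so that `biases = -z * s`. The helpers `awq_to_affine` and `dequantize` are hypothetical illustrations, not the code in `_convert_awq_linear` or the MLX API, and NumPy stands in for MLX so the example runs anywhere. The point is that scales and biases leave the conversion in one dtype, and the dequantizer rejects a mismatched pair rather than silently computing with it.

```python
import numpy as np


def awq_to_affine(scales, zeros, dtype=np.float16):
    """Hypothetical helper: map AWQ (scale, zero-point) to affine (scales, biases).

    Assumes AWQ dequantizes as w = (q - z) * s and the affine target uses
    w = q * scales + biases, hence biases = -z * s. Both outputs are cast to
    the SAME dtype, which is the invariant this PR restores.
    """
    scales = np.asarray(scales, dtype=np.float32)
    zeros = np.asarray(zeros, dtype=np.float32)
    biases = -zeros * scales  # compute in fp32, then cast both together
    return scales.astype(dtype), biases.astype(dtype)


def dequantize(q, scales, biases, group_size):
    """Hypothetical reference dequantizer for a (rows, cols) quantized matrix.

    scales and biases have shape (rows, cols // group_size).
    """
    # Mirror the requirement described in the PR: the pair must share a dtype.
    if scales.dtype != biases.dtype:
        raise TypeError(
            f"scales ({scales.dtype}) and biases ({biases.dtype}) must share a dtype"
        )
    q = np.asarray(q)
    rows, cols = q.shape
    # Split each row into groups of group_size columns.
    groups = q.reshape(rows, cols // group_size, group_size).astype(scales.dtype)
    # Broadcast one scale/bias per group across that group's columns.
    w = groups * scales[..., None] + biases[..., None]
    return w.reshape(rows, cols)
```

For example, with `group_size=4`, the row `q = [0, 1, ..., 7]` with scales `[0.5, 1.0]` and zero points `[2, 4]` dequantizes to `[-1, -0.5, 0, 0.5, 0, 1, 2, 3]`, while passing float32 scales alongside float16 biases raises `TypeError`.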

@jeethu mentioned this pull request on Apr 23, 2026
@liang2kl self-requested a review on Apr 23, 2026, 12:25
@liang2kl (Member) left a comment


Good catch. This is likely the change in #19 that broke inference (as also pointed out in ml-explore/mlx#3434).

@liang2kl linked an issue on Apr 23, 2026 that may be closed by this pull request
@liang2kl merged commit d8db28d into z-lab:main on Apr 23, 2026
@jeethu deleted the mlx-bugfix branch on Apr 23, 2026, 12:34


Development: merging this pull request may close the linked issue "fail to run".

2 participants