Quantizing both talker and thinker fails for Qwen2.5-Omni #2121

@LiMa-cas

Description

For Qwen2.5-Omni, I quantized the "thinker" and the "talker" (i.e., the entire model) separately. But when I merge them back together (`thinker_model.talker = talker`), inference fails: the thinker is affected and the output contains NaNs. Could the author advise on the correct way to quantize them together? Is there a plan to support quantizing all linear layers?
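For reference, a minimal sketch of the merge step described above, plus a NaN check on the merged parameters before running inference. The names `thinker_model`, `talker_model`, and the `.talker` attribute are assumptions taken from the description, not the actual Qwen2.5-Omni API:

```python
import torch

def merge_talker(thinker_model, talker_model):
    # Hypothetical merge: reattach the separately quantized talker
    # onto the quantized thinker, as described in the issue.
    thinker_model.talker = talker_model
    return thinker_model

def has_nan_params(model):
    # Sanity check: scan all parameters for NaNs after the merge,
    # before blaming the inference step itself.
    return any(torch.isnan(p).any().item() for p in model.parameters())
```

If `has_nan_params` already returns True right after the merge, the corruption happens during quantization/merging rather than at inference time.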
