[Bugfix] Fix Per-Token Dynamic Activation Quantization #393

max410011 · 2025-07-14T17:56:04Z

Summary

This PR fixes the activation quantization issue described in Issue #394, where the input scale shape was incorrect when using the Dynamic TOKEN strategy.

Fix

Corrected the reduction dimensions so that only the hidden dimension is reduced.
As a result, the input scale shape is (batch_size, seq_len, 1) instead of (1, seq_len, hidden_dim).
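
To illustrate the intended shape, here is a minimal dependency-free sketch, not the code changed in this PR. Nested lists stand in for a (batch_size, seq_len, hidden_dim) activation tensor. The helper names `shape_of` and `per_token_scales` are hypothetical, and the symmetric int8 rule (scale = max |x| / 127) is an assumption chosen only for illustration.

```python
# Dependency-free sketch: nested lists stand in for a
# (batch_size, seq_len, hidden_dim) activation tensor.
# Assumption: symmetric int8 quantization, scale = max(|x|) / 127 per token.


def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def per_token_scales(x, qmax=127.0):
    """Reduce only the hidden (innermost) dimension, keeping it as size 1."""
    return [[[max(abs(v) for v in token) / qmax] for token in seq] for seq in x]


# batch_size=2, seq_len=3, hidden_dim=4
x = [
    [[0.5, -1.0, 0.25, 0.0], [2.0, -0.5, 1.0, 0.1], [-3.0, 0.2, 0.3, 0.4]],
    [[0.1, 0.2, -0.3, 0.4], [4.0, -4.0, 1.0, 2.0], [0.0, 0.0, -0.5, 0.25]],
]

scales = per_token_scales(x)
print(shape_of(x))       # (2, 3, 4)
print(shape_of(scales))  # (2, 3, 1): one scale per token
```

Each token keeps its own scale because the batch and sequence dimensions are left untouched. In PyTorch the same reduction would presumably be written along the lines of `torch.amax(x.abs(), dim=-1, keepdim=True)`; that is a sketch of the idea rather than the exact change in this PR.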

brian-dellabetta · 2025-07-15T21:53:34Z

Hi @max410011, I appreciate the thorough detail in the issue! I tried your PR, and both the original main and your branch seem to work: the resulting models can be loaded and run in vLLM, which surprises me. This is some old code, and per-token vs. per-channel always trips me up. I will ask around to see whether the reasoning in your issue description is correct.

Fix per-token dynamic quant

b8c5a91

max410011 mentioned this pull request Jul 14, 2025

Unexpected Input Scale Shape for Dynamic Per-Token Activation Quantization #394

Open
