Remove IMatrixGatherer by dshane1903 · Pull Request #2716 · vllm-project/llm-compressor

dshane1903 · 2026-05-18T00:41:19Z

SUMMARY:

remove the temporary IMatrixGatherer modifier and its dedicated tests
keep imatrix_mse importance statistics on the observer instead of modules
route QuantizationModifier recipes using imatrix_mse through calibration automatically
update iMatrix docs and examples to configure imatrix_mse directly as the weight observer

TEST PLAN:

.venv/bin/python -m pytest tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py
.venv/bin/ruff check src/llmcompressor/observers/imatrix.py src/llmcompressor/modifiers/quantization/quantization/mixin.py src/llmcompressor/pipelines/registry.py tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py examples/imatrix/llama3_imatrix_example.py

github-actions · 2026-05-18T00:41:28Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

coderabbitai · 2026-05-18T00:41:30Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb5c1f8e-1d16-46f7-82a3-2196d42b74c1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

This PR consolidates imatrix importance-accumulation logic from a standalone IMatrixGatherer modifier into the IMatrixMSEObserver itself, eliminating the separate modifier entirely. The observer now maintains importance accumulators internally using forward pre-hooks during calibration. The quantization pipeline is updated to detect calibration requirements by querying the modifier directly. All examples, tests, and documentation are revised to reflect the observer-native architecture.

Changes

IMatrix Observer Consolidation

Layer / File(s)	Summary
IMatrixMSEObserver refactoring `src/llmcompressor/observers/imatrix.py`, `src/llmcompressor/modifiers/transform/imatrix/__init__.py`	Observer now maintains `_imatrix_sum` / `_imatrix_count` internally with `RemovableHandle` hook tracking. `attach()` initializes accumulators and registers a forward pre-hook; `detach()` removes only the hook. Module-state two-pass behavior is removed.
Quantization calibration detection `src/llmcompressor/modifiers/quantization/quantization/mixin.py`, `src/llmcompressor/pipelines/registry.py`	New `QuantizationMixin.requires_calibration_data()` method checks config first, then scans for `"imatrix_mse"` weight observer. `CalibrationPipeline._infer_pipeline` uses this method instead of inspecting resolved config.
IMatrixMSEObserver unit tests `tests/llmcompressor/observers/test_imatrix.py`	Test helpers refactored: Linear factory, importance generator, and observer-based importance injection (via `_make_observer(importance=...)` and `_set_importance(observer, importance)`). All lifecycle, strategy, and validation tests updated to use observer-internal accumulators instead of preloaded module state.
End-to-end integration validation `tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`	`TestIMatrixObserverIntegration` replaces gatherer-dependent tests. Main test verifies `weight_scale` exists and `_imatrix_sum` / `_imatrix_count` are cleaned up after `oneshot`. Additional test confirms importance affects quantization by comparing against `memoryless_mse` baseline.
Pipeline test updates `tests/llmcompressor/pipelines/test_registry.py`	Removed `IMatrixGatherer` import and test case; added two `QuantizationModifier` variants using `imatrix_mse` observer via `weight_observer=` and `observer=` config forms.
Documentation and examples `docs/guides/observers.md`, `examples/imatrix/README.md`, `examples/imatrix/llama3_imatrix_example.py`	Removed `IMatrixGatherer` documentation section entirely. Updated `imatrix_mse` description to specify per-channel `E[x²]` collection via forward pre-hooks. All code examples use `observer="imatrix_mse"` directly in `QuantizationModifier` without separate `IMatrixGatherer` import or recipe entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

vllm-project/llm-compressor#2698: This PR directly implements the plan to remove IMatrixGatherer, move imatrix accumulators into the observer, and ensure quantization pipelines require calibration when using the imatrix observer.

Possibly related PRs

vllm-project/llm-compressor#2131: Both PRs modify CalibrationPipeline._infer_pipeline to change how calibration requirements are determined for quantization modifiers.
vllm-project/llm-compressor#2585: The retrieved PR refactors imatrix observer lifecycle and importance accumulation; this PR extends that refactor by removing the separate IMatrixGatherer and updating recipes and tests accordingly.

Suggested labels

refactor, transforms, two-reviews

Suggested reviewers

HDCharles
dsikka
kylesayrs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 53.13% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Remove IMatrixGatherer' directly summarizes the main change—removal of the IMatrixGatherer modifier—which is a core objective mentioned in the PR description and reflected across multiple file changes.
Description check	✅ Passed	The description clearly outlines the PR's objectives including removing IMatrixGatherer, relocating importance statistics to the observer, enabling automatic calibration routing, and updating documentation—all of which align with the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

mergify · 2026-05-18T00:41:58Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

Waiting for

#approved-reviews-by >= 2

This rule is failing.

PRs labelled "two-reviews" must have at least two approving reviews before merging.

#approved-reviews-by >= 2
#changes-requested-reviews-by = 0

gemini-code-assist

Code Review

This pull request simplifies the iMatrix quantization workflow by removing the IMatrixGatherer modifier and integrating its activation collection logic directly into the IMatrixMSEObserver. The observer now manages its own lifecycle hooks and internal state for importance statistics, leading to cleaner recipes and documentation. Feedback was provided to optimize tensor initialization by placing importance accumulators on the same device as the target module to prevent unnecessary data transfers.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py (2)

82-86: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make hook cleanup assertion baseline-aware.

Line 85 currently assumes the model starts with zero pre-hooks; this can cause false failures if upstream model internals add legitimate hooks. Capture pre-run hook count and assert it is restored after oneshot.

Suggested diff

-        oneshot(
+        initial_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
+
+        oneshot(
             model=model,
             dataset=DATASET,
             splits={"calibration": "train[:5%]"},
             recipe=recipe,
             num_calibration_samples=NUM_CALIB_SAMPLES,
             max_seq_length=MAX_SEQ_LEN,
         )
@@
-        total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
-        assert (
-            total_hooks == 0
-        ), f"Expected 0 forward pre-hooks after completion, found {total_hooks}"
+        total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
+        assert total_hooks == initial_hooks, (
+            f"Expected hook count to return to baseline ({initial_hooks}), "
+            f"found {total_hooks}"
+        )

As per coding guidelines, tests under tests/**/*.py should be comprehensive and robust for quantization scenarios.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`
around lines 82 - 86, Capture the number of forward pre-hooks present on the
model before running oneshot and assert that the count after oneshot equals the
captured baseline instead of assuming zero; specifically, compute baseline_hooks
= sum(len(m._forward_pre_hooks) for m in model.modules()) before calling
oneshot, run the oneshot workflow, then compute total_hooks the same way and
assert total_hooks == baseline_hooks, referencing the existing model and
_forward_pre_hooks usage to locate the code and the oneshot invocation to place
the baseline capture.

182-213: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit postconditions to this integration test.

Line 182-213 only verifies that oneshot does not raise. Please assert at least quantization effect (e.g., weight_scale exists on targeted modules) and imatrix cleanup (_imatrix_sum/_imatrix_count removed) to validate intended behavior.

As per coding guidelines, tests under tests/**/*.py should be clear and comprehensive, including edge-case validation for quantization behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`
around lines 182 - 213, The test
test_observer_completes_without_separate_gatherer currently only ensures oneshot
runs without error; update it to assert postconditions by locating the modules
targeted by the QuantizationModifier (the "Linear" targets configured in the
recipe) after the oneshot call and assert that quantization metadata was applied
(e.g., a weight-scale attribute such as weight_scale or similar quantization
field exists on those modules) and that temporary imatrix artifacts were removed
(assert _imatrix_sum and _imatrix_count attributes are not present on those same
modules); use the same symbols from the diff (QuantizationModifier, observer:
"imatrix_mse", oneshot, and the test function name) to find where to add these
assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix
attrs remain.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmcompressor/observers/imatrix.py`:
- Around line 73-77: The attach() method currently returns early when module
lacks in_features without removing a previously attached hook, leaving the old
hook active; fix by first checking if self._imatrix_hook is not None and calling
self._imatrix_hook.remove() and clearing self._imatrix_hook (set to None) before
the hasattr(module, "in_features") early return, then proceed with the
in_features check and normal hook installation logic so reattaching to a module
without in_features does not leave stale hooks.

---

Outside diff comments:
In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`:
- Around line 82-86: Capture the number of forward pre-hooks present on the
model before running oneshot and assert that the count after oneshot equals the
captured baseline instead of assuming zero; specifically, compute baseline_hooks
= sum(len(m._forward_pre_hooks) for m in model.modules()) before calling
oneshot, run the oneshot workflow, then compute total_hooks the same way and
assert total_hooks == baseline_hooks, referencing the existing model and
_forward_pre_hooks usage to locate the code and the oneshot invocation to place
the baseline capture.
- Around line 182-213: The test
test_observer_completes_without_separate_gatherer currently only ensures oneshot
runs without error; update it to assert postconditions by locating the modules
targeted by the QuantizationModifier (the "Linear" targets configured in the
recipe) after the oneshot call and assert that quantization metadata was applied
(e.g., a weight-scale attribute such as weight_scale or similar quantization
field exists on those modules) and that temporary imatrix artifacts were removed
(assert _imatrix_sum and _imatrix_count attributes are not present on those same
modules); use the same symbols from the diff (QuantizationModifier, observer:
"imatrix_mse", oneshot, and the test function name) to find where to add these
assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix
attrs remain.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 96edc479-202f-4ee8-8110-cba933ca2a05

📥 Commits

Reviewing files that changed from the base of the PR and between 1a20a61 and 6d8517b.

📒 Files selected for processing (12)

docs/guides/observers.md
examples/imatrix/README.md
examples/imatrix/llama3_imatrix_example.py
src/llmcompressor/modifiers/quantization/quantization/mixin.py
src/llmcompressor/modifiers/transform/imatrix/__init__.py
src/llmcompressor/modifiers/transform/imatrix/base.py
src/llmcompressor/observers/imatrix.py
src/llmcompressor/pipelines/registry.py
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py
tests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.py
tests/llmcompressor/observers/test_imatrix.py
tests/llmcompressor/pipelines/test_registry.py

💤 Files with no reviewable changes (3)

src/llmcompressor/modifiers/transform/imatrix/init.py
tests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.py
src/llmcompressor/modifiers/transform/imatrix/base.py

HDCharles · 2026-05-18T13:45:02Z

                return True
            elif isinstance(modifier, QuantizationModifier):
-                config = modifier.resolve_quantization_config()
-                return config.requires_calibration_data()


What's going on here? Looks like you removed a bunch of existing functionality

Good catch, the previous diff made this look like it removed the existing config-based calibration check...I pushed a follow-up commit that keeps config.requires_calibration_data() explicit in registry.py

In the original version of this PR, I moved the config.requires_calibration_data() call into a helper on QuantizationModifier, then added the imatrix_mse weight-observer case there

It makes more sense to move this change here:

https://github.com/vllm-project/compressed-tensors/blob/c763417afecf5fef123cbc1dbca16dc449ab8e9a/src/compressed_tensors/quantization/quant_config.py#L257

instead of duplicating this calibration check, can you update that?

I opened a companion compressed-tensors PR to move this into QuantizationConfig.requires_calibration_data() where the existing calibration inference lives: https://github.com/vllm-project/compressed-tensors/pull/710/changes

Once that lands, I can simplify this PR back to only calling config.requires_calibration_data()

HDCharles · 2026-05-18T14:29:58Z

@Yatimai also should review

Yatimai

This is a clean execution of the plan from #2698. IMatrixGatherer was always meant as a temporary scaffold in #2473 : the plan was to remove it once the observer lifecycle could run quantization after calibration, which #2585 made possible.

One small observation below on the tests. Non-blocking.

dshane1903 added 4 commits May 17, 2026 19:53

Store imatrix statistics on observer

9d28c98

Route imatrix quantization through calibration

0e71b26

Remove IMatrixGatherer modifier

57c8754

Update imatrix observer docs and example

6d8517b

dshane1903 requested review from HDCharles, dsikka and kylesayrs as code owners May 18, 2026 00:41

mergify Bot added the two-reviews When a PR requires two reviews label May 18, 2026

coderabbitai Bot added transforms Related to transforms-based modifiers like SpinQuant and Quip Refactor Code cleanup and/or improvements to existing features labels May 18, 2026

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

Comment thread src/llmcompressor/observers/imatrix.py Outdated

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread src/llmcompressor/observers/imatrix.py Outdated

HDCharles self-assigned this May 18, 2026

HDCharles reviewed May 18, 2026

View reviewed changes

Clarify imatrix calibration pipeline inference

58ed176

Yatimai reviewed May 18, 2026

View reviewed changes

Comment thread tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py Outdated

Assert imatrix observer cleanup in integration tests

e3f7700

dshane1903 mentioned this pull request May 19, 2026

Treat imatrix weight observer as calibration data dependent vllm-project/compressed-tensors#710

Open

Fix imatrix observer hook lifecycle

110b562

Conversation

dshane1903 commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

mergify Bot commented May 18, 2026

Merge Protections

🔴 Require two reviews

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

HDCharles May 18, 2026

Choose a reason for hiding this comment

Uh oh!

dshane1903 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

dshane1903 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

HDCharles May 18, 2026

Choose a reason for hiding this comment

Uh oh!

dshane1903 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

HDCharles commented May 18, 2026

Uh oh!

Yatimai left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai Bot commented May 18, 2026 •

edited

Loading