Skip to content

Remove IMatrixGatherer#2716

Open
dshane1903 wants to merge 7 commits into
vllm-project:mainfrom
dshane1903:remove-imatrix-gatherer
Open

Remove IMatrixGatherer#2716
dshane1903 wants to merge 7 commits into
vllm-project:mainfrom
dshane1903:remove-imatrix-gatherer

Conversation

@dshane1903
Copy link
Copy Markdown
Contributor

SUMMARY:

  • remove the temporary IMatrixGatherer modifier and its dedicated tests
  • keep imatrix_mse importance statistics on the observer instead of modules
  • route QuantizationModifier recipes using imatrix_mse through calibration automatically
  • update iMatrix docs and examples to configure imatrix_mse directly as the weight observer

TEST PLAN:

  • .venv/bin/python -m pytest tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py
  • .venv/bin/ruff check src/llmcompressor/observers/imatrix.py src/llmcompressor/modifiers/quantization/quantization/mixin.py src/llmcompressor/pipelines/registry.py tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py examples/imatrix/llama3_imatrix_example.py

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb5c1f8e-1d16-46f7-82a3-2196d42b74c1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR consolidates imatrix importance-accumulation logic from a standalone IMatrixGatherer modifier into the IMatrixMSEObserver itself, eliminating the separate modifier entirely. The observer now maintains importance accumulators internally using forward pre-hooks during calibration. The quantization pipeline is updated to detect calibration requirements by querying the modifier directly. All examples, tests, and documentation are revised to reflect the observer-native architecture.

Changes

IMatrix Observer Consolidation

Layer / File(s) Summary
IMatrixMSEObserver refactoring
src/llmcompressor/observers/imatrix.py, src/llmcompressor/modifiers/transform/imatrix/__init__.py
Observer now maintains _imatrix_sum / _imatrix_count internally with RemovableHandle hook tracking. attach() initializes accumulators and registers a forward pre-hook; detach() removes only the hook. Module-state two-pass behavior is removed.
Quantization calibration detection
src/llmcompressor/modifiers/quantization/quantization/mixin.py, src/llmcompressor/pipelines/registry.py
New QuantizationMixin.requires_calibration_data() method checks config first, then scans for "imatrix_mse" weight observer. CalibrationPipeline._infer_pipeline uses this method instead of inspecting resolved config.
IMatrixMSEObserver unit tests
tests/llmcompressor/observers/test_imatrix.py
Test helpers refactored: Linear factory, importance generator, and observer-based importance injection (via _make_observer(importance=...) and _set_importance(observer, importance)). All lifecycle, strategy, and validation tests updated to use observer-internal accumulators instead of preloaded module state.
End-to-end integration validation
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py
TestIMatrixObserverIntegration replaces gatherer-dependent tests. Main test verifies weight_scale exists and _imatrix_sum / _imatrix_count are cleaned up after oneshot. Additional test confirms importance affects quantization by comparing against memoryless_mse baseline.
Pipeline test updates
tests/llmcompressor/pipelines/test_registry.py
Removed IMatrixGatherer import and test case; added two QuantizationModifier variants using imatrix_mse observer via weight_observer= and observer= config forms.
Documentation and examples
docs/guides/observers.md, examples/imatrix/README.md, examples/imatrix/llama3_imatrix_example.py
Removed IMatrixGatherer documentation section entirely. Updated imatrix_mse description to specify per-channel E[x²] collection via forward pre-hooks. All code examples use observer="imatrix_mse" directly in QuantizationModifier without separate IMatrixGatherer import or recipe entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • vllm-project/llm-compressor#2698: This PR directly implements the plan to remove IMatrixGatherer, move imatrix accumulators into the observer, and ensure quantization pipelines require calibration when using the imatrix observer.

Possibly related PRs

  • vllm-project/llm-compressor#2131: Both PRs modify CalibrationPipeline._infer_pipeline to change how calibration requirements are determined for quantization modifiers.
  • vllm-project/llm-compressor#2585: The retrieved PR refactors imatrix observer lifecycle and importance accumulation; this PR extends that refactor by removing the separate IMatrixGatherer and updating recipes and tests accordingly.

Suggested labels

refactor, transforms, two-reviews

Suggested reviewers

  • HDCharles
  • dsikka
  • kylesayrs
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.13% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Remove IMatrixGatherer' directly summarizes the main change—removal of the IMatrixGatherer modifier—which is a core objective mentioned in the PR description and reflected across multiple file changes.
Description check ✅ Passed The description clearly outlines the PR's objectives including removing IMatrixGatherer, relocating importance statistics to the observer, enabling automatic calibration routing, and updating documentation—all of which align with the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the two-reviews When a PR requires two reviews label May 18, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

PRs labelled "two-reviews" must have at least two approving reviews before merging.

  • #approved-reviews-by >= 2
  • #changes-requested-reviews-by = 0

@coderabbitai coderabbitai Bot added transforms Related to transforms-based modifiers like SpinQuant and Quip Refactor Code cleanup and/or improvements to existing features labels May 18, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request simplifies the iMatrix quantization workflow by removing the IMatrixGatherer modifier and integrating its activation collection logic directly into the IMatrixMSEObserver. The observer now manages its own lifecycle hooks and internal state for importance statistics, leading to cleaner recipes and documentation. Feedback was provided to optimize tensor initialization by placing importance accumulators on the same device as the target module to prevent unnecessary data transfers.

Comment thread src/llmcompressor/observers/imatrix.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py (2)

82-86: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make hook cleanup assertion baseline-aware.

Line 85 currently assumes the model starts with zero pre-hooks; this can cause false failures if upstream model internals add legitimate hooks. Capture pre-run hook count and assert it is restored after oneshot.

Suggested diff
-        oneshot(
+        initial_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
+
+        oneshot(
             model=model,
             dataset=DATASET,
             splits={"calibration": "train[:5%]"},
             recipe=recipe,
             num_calibration_samples=NUM_CALIB_SAMPLES,
             max_seq_length=MAX_SEQ_LEN,
         )
@@
-        total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
-        assert (
-            total_hooks == 0
-        ), f"Expected 0 forward pre-hooks after completion, found {total_hooks}"
+        total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules())
+        assert total_hooks == initial_hooks, (
+            f"Expected hook count to return to baseline ({initial_hooks}), "
+            f"found {total_hooks}"
+        )

As per coding guidelines, tests under tests/**/*.py should be comprehensive and robust for quantization scenarios.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`
around lines 82 - 86, Capture the number of forward pre-hooks present on the
model before running oneshot and assert that the count after oneshot equals the
captured baseline instead of assuming zero; specifically, compute baseline_hooks
= sum(len(m._forward_pre_hooks) for m in model.modules()) before calling
oneshot, run the oneshot workflow, then compute total_hooks the same way and
assert total_hooks == baseline_hooks, referencing the existing model and
_forward_pre_hooks usage to locate the code and the oneshot invocation to place
the baseline capture.

182-213: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit postconditions to this integration test.

Line 182-213 only verifies that oneshot does not raise. Please assert at least quantization effect (e.g., weight_scale exists on targeted modules) and imatrix cleanup (_imatrix_sum/_imatrix_count removed) to validate intended behavior.

As per coding guidelines, tests under tests/**/*.py should be clear and comprehensive, including edge-case validation for quantization behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`
around lines 182 - 213, The test
test_observer_completes_without_separate_gatherer currently only ensures oneshot
runs without error; update it to assert postconditions by locating the modules
targeted by the QuantizationModifier (the "Linear" targets configured in the
recipe) after the oneshot call and assert that quantization metadata was applied
(e.g., a weight-scale attribute such as weight_scale or similar quantization
field exists on those modules) and that temporary imatrix artifacts were removed
(assert _imatrix_sum and _imatrix_count attributes are not present on those same
modules); use the same symbols from the diff (QuantizationModifier, observer:
"imatrix_mse", oneshot, and the test function name) to find where to add these
assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix
attrs remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llmcompressor/observers/imatrix.py`:
- Around line 73-77: The attach() method currently returns early when module
lacks in_features without removing a previously attached hook, leaving the old
hook active; fix by first checking if self._imatrix_hook is not None and calling
self._imatrix_hook.remove() and clearing self._imatrix_hook (set to None) before
the hasattr(module, "in_features") early return, then proceed with the
in_features check and normal hook installation logic so reattaching to a module
without in_features does not leave stale hooks.

---

Outside diff comments:
In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`:
- Around line 82-86: Capture the number of forward pre-hooks present on the
model before running oneshot and assert that the count after oneshot equals the
captured baseline instead of assuming zero; specifically, compute baseline_hooks
= sum(len(m._forward_pre_hooks) for m in model.modules()) before calling
oneshot, run the oneshot workflow, then compute total_hooks the same way and
assert total_hooks == baseline_hooks, referencing the existing model and
_forward_pre_hooks usage to locate the code and the oneshot invocation to place
the baseline capture.
- Around line 182-213: The test
test_observer_completes_without_separate_gatherer currently only ensures oneshot
runs without error; update it to assert postconditions by locating the modules
targeted by the QuantizationModifier (the "Linear" targets configured in the
recipe) after the oneshot call and assert that quantization metadata was applied
(e.g., a weight-scale attribute such as weight_scale or similar quantization
field exists on those modules) and that temporary imatrix artifacts were removed
(assert _imatrix_sum and _imatrix_count attributes are not present on those same
modules); use the same symbols from the diff (QuantizationModifier, observer:
"imatrix_mse", oneshot, and the test function name) to find where to add these
assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix
attrs remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 96edc479-202f-4ee8-8110-cba933ca2a05

📥 Commits

Reviewing files that changed from the base of the PR and between 1a20a61 and 6d8517b.

📒 Files selected for processing (12)
  • docs/guides/observers.md
  • examples/imatrix/README.md
  • examples/imatrix/llama3_imatrix_example.py
  • src/llmcompressor/modifiers/quantization/quantization/mixin.py
  • src/llmcompressor/modifiers/transform/imatrix/__init__.py
  • src/llmcompressor/modifiers/transform/imatrix/base.py
  • src/llmcompressor/observers/imatrix.py
  • src/llmcompressor/pipelines/registry.py
  • tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py
  • tests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.py
  • tests/llmcompressor/observers/test_imatrix.py
  • tests/llmcompressor/pipelines/test_registry.py
💤 Files with no reviewable changes (3)
  • src/llmcompressor/modifiers/transform/imatrix/init.py
  • tests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.py
  • src/llmcompressor/modifiers/transform/imatrix/base.py

Comment thread src/llmcompressor/observers/imatrix.py Outdated
@HDCharles HDCharles self-assigned this May 18, 2026
return True
elif isinstance(modifier, QuantizationModifier):
config = modifier.resolve_quantization_config()
return config.requires_calibration_data()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Looks like you removed a bunch of existing functionality

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, the previous diff made this look like it removed the existing config-based calibration check...I pushed a follow-up commit that keeps config.requires_calibration_data() explicit in registry.py

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original version of this PR, I moved the config.requires_calibration_data() call into a helper on QuantizationModifier, then added the imatrix_mse weight-observer case there

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to move this change here:

https://github.com/vllm-project/compressed-tensors/blob/c763417afecf5fef123cbc1dbca16dc449ab8e9a/src/compressed_tensors/quantization/quant_config.py#L257

instead of duplicating this calibration check, can you update that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a companion compressed-tensors PR to move this into QuantizationConfig.requires_calibration_data() where the existing calibration inference lives: https://github.com/vllm-project/compressed-tensors/pull/710/changes

Once that lands, I can simplify this PR back to only calling config.requires_calibration_data()

@HDCharles
Copy link
Copy Markdown
Collaborator

@Yatimai also should review

Copy link
Copy Markdown
Contributor

@Yatimai Yatimai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clean execution of the plan from #2698. IMatrixGatherer was always meant as a temporary scaffold in #2473 : the plan was to remove it once the observer lifecycle could run quantization after calibration, which #2585 made possible.

One small observation below on the tests. Non-blocking.

Comment thread tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Refactor Code cleanup and/or improvements to existing features transforms Related to transforms-based modifiers like SpinQuant and Quip two-reviews When a PR requires two reviews

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants