Remove IMatrixGatherer#2716
Conversation
|
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed. |
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR consolidates imatrix importance-accumulation logic from a standalone ChangesIMatrix Observer Consolidation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🔴 Require two reviewsWaiting for
This rule is failing.PRs labelled "two-reviews" must have at least two approving reviews before merging.
|
There was a problem hiding this comment.
Code Review
This pull request simplifies the iMatrix quantization workflow by removing the IMatrixGatherer modifier and integrating its activation collection logic directly into the IMatrixMSEObserver. The observer now manages its own lifecycle hooks and internal state for importance statistics, leading to cleaner recipes and documentation. Feedback was provided to optimize tensor initialization by placing importance accumulators on the same device as the target module to prevent unnecessary data transfers.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py (2)
82-86:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake hook cleanup assertion baseline-aware.
Line 85 currently assumes the model starts with zero pre-hooks; this can cause false failures if upstream model internals add legitimate hooks. Capture pre-run hook count and assert it is restored after
oneshot.Suggested diff
- oneshot( + initial_hooks = sum(len(m._forward_pre_hooks) for m in model.modules()) + + oneshot( model=model, dataset=DATASET, splits={"calibration": "train[:5%]"}, recipe=recipe, num_calibration_samples=NUM_CALIB_SAMPLES, max_seq_length=MAX_SEQ_LEN, ) @@ - total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules()) - assert ( - total_hooks == 0 - ), f"Expected 0 forward pre-hooks after completion, found {total_hooks}" + total_hooks = sum(len(m._forward_pre_hooks) for m in model.modules()) + assert total_hooks == initial_hooks, ( + f"Expected hook count to return to baseline ({initial_hooks}), " + f"found {total_hooks}" + )As per coding guidelines, tests under
tests/**/*.pyshould be comprehensive and robust for quantization scenarios.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py` around lines 82 - 86, Capture the number of forward pre-hooks present on the model before running oneshot and assert that the count after oneshot equals the captured baseline instead of assuming zero; specifically, compute baseline_hooks = sum(len(m._forward_pre_hooks) for m in model.modules()) before calling oneshot, run the oneshot workflow, then compute total_hooks the same way and assert total_hooks == baseline_hooks, referencing the existing model and _forward_pre_hooks usage to locate the code and the oneshot invocation to place the baseline capture.
182-213:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd explicit postconditions to this integration test.
Line 182-213 only verifies that
oneshotdoes not raise. Please assert at least quantization effect (e.g.,weight_scaleexists on targeted modules) and imatrix cleanup (_imatrix_sum/_imatrix_countremoved) to validate intended behavior.As per coding guidelines, tests under
tests/**/*.pyshould be clear and comprehensive, including edge-case validation for quantization behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py` around lines 182 - 213, The test test_observer_completes_without_separate_gatherer currently only ensures oneshot runs without error; update it to assert postconditions by locating the modules targeted by the QuantizationModifier (the "Linear" targets configured in the recipe) after the oneshot call and assert that quantization metadata was applied (e.g., a weight-scale attribute such as weight_scale or similar quantization field exists on those modules) and that temporary imatrix artifacts were removed (assert _imatrix_sum and _imatrix_count attributes are not present on those same modules); use the same symbols from the diff (QuantizationModifier, observer: "imatrix_mse", oneshot, and the test function name) to find where to add these assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix attrs remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llmcompressor/observers/imatrix.py`:
- Around line 73-77: The attach() method currently returns early when module
lacks in_features without removing a previously attached hook, leaving the old
hook active; fix by first checking if self._imatrix_hook is not None and calling
self._imatrix_hook.remove() and clearing self._imatrix_hook (set to None) before
the hasattr(module, "in_features") early return, then proceed with the
in_features check and normal hook installation logic so reattaching to a module
without in_features does not leave stale hooks.
---
Outside diff comments:
In `@tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py`:
- Around line 82-86: Capture the number of forward pre-hooks present on the
model before running oneshot and assert that the count after oneshot equals the
captured baseline instead of assuming zero; specifically, compute baseline_hooks
= sum(len(m._forward_pre_hooks) for m in model.modules()) before calling
oneshot, run the oneshot workflow, then compute total_hooks the same way and
assert total_hooks == baseline_hooks, referencing the existing model and
_forward_pre_hooks usage to locate the code and the oneshot invocation to place
the baseline capture.
- Around line 182-213: The test
test_observer_completes_without_separate_gatherer currently only ensures oneshot
runs without error; update it to assert postconditions by locating the modules
targeted by the QuantizationModifier (the "Linear" targets configured in the
recipe) after the oneshot call and assert that quantization metadata was applied
(e.g., a weight-scale attribute such as weight_scale or similar quantization
field exists on those modules) and that temporary imatrix artifacts were removed
(assert _imatrix_sum and _imatrix_count attributes are not present on those same
modules); use the same symbols from the diff (QuantizationModifier, observer:
"imatrix_mse", oneshot, and the test function name) to find where to add these
assertions. Ensure assertions fail loudly if weight-scale is missing or imatrix
attrs remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 96edc479-202f-4ee8-8110-cba933ca2a05
📒 Files selected for processing (12)
docs/guides/observers.mdexamples/imatrix/README.mdexamples/imatrix/llama3_imatrix_example.pysrc/llmcompressor/modifiers/quantization/quantization/mixin.pysrc/llmcompressor/modifiers/transform/imatrix/__init__.pysrc/llmcompressor/modifiers/transform/imatrix/base.pysrc/llmcompressor/observers/imatrix.pysrc/llmcompressor/pipelines/registry.pytests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.pytests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.pytests/llmcompressor/observers/test_imatrix.pytests/llmcompressor/pipelines/test_registry.py
💤 Files with no reviewable changes (3)
- src/llmcompressor/modifiers/transform/imatrix/init.py
- tests/llmcompressor/modifiers/transform/imatrix/test_imatrix_gatherer.py
- src/llmcompressor/modifiers/transform/imatrix/base.py
| return True | ||
| elif isinstance(modifier, QuantizationModifier): | ||
| config = modifier.resolve_quantization_config() | ||
| return config.requires_calibration_data() |
There was a problem hiding this comment.
What's going on here? Looks like you removed a bunch of existing functionality
There was a problem hiding this comment.
Good catch, the previous diff made this look like it removed the existing config-based calibration check...I pushed a follow-up commit that keeps config.requires_calibration_data() explicit in registry.py
There was a problem hiding this comment.
In the original version of this PR, I moved the config.requires_calibration_data() call into a helper on QuantizationModifier, then added the imatrix_mse weight-observer case there
There was a problem hiding this comment.
It makes more sense to move this change here:
instead of duplicating this calibration check, can you update that?
There was a problem hiding this comment.
I opened a companion compressed-tensors PR to move this into QuantizationConfig.requires_calibration_data() where the existing calibration inference lives: https://github.com/vllm-project/compressed-tensors/pull/710/changes
Once that lands, I can simplify this PR back to only calling config.requires_calibration_data()
|
@Yatimai also should review |
SUMMARY:
TEST PLAN:
.venv/bin/python -m pytest tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py.venv/bin/ruff check src/llmcompressor/observers/imatrix.py src/llmcompressor/modifiers/quantization/quantization/mixin.py src/llmcompressor/pipelines/registry.py tests/llmcompressor/observers/test_imatrix.py tests/llmcompressor/pipelines/test_registry.py tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py examples/imatrix/llama3_imatrix_example.py