Qonnx binary quant #1292
@@ -14,3 +14,4 @@ docs/autodoc/*
hls4mlprj_*
*~
*.ipynb_checkpoints/
*.bak
@@ -0,0 +1,151 @@
""" | ||
This file includes optimizations related to BipolarQuant nodes. | ||
|
||
As a first step, QuantConstantParameters converts the extra inputs to attributes. | ||
|
||
The next step differs between the case of (1) (positive) power-of-2 scale and zero offset, or (2) other cases. In the first | ||
case no explicit scaling is required, so a Quant node logically becomes a linear activation. (Cases when the scale is a | ||
power of 2 not equal to one are implicitly scaled with fixed precision types.) When the activation is applied to a constant | ||
weight, the activation is immediately merged with the weight, quantizing the weights. In case (2), we need to explicitly | ||
scale and unscale, so the Quant node becomes 3 nodes, an ApplyAlpha node to apply a scale/shift, a Linear node to apply the | ||
quantization, and another ApplyAlpha to unscale/shift. We depend on optimization steps to move the unscaling ApplyAlpha | ||
down as needed so that we can do integer or fixed-point calculations. When the Quant is a applied to a weight, the scaling | ||
and Linear nodes are immediately merged into the Constant. | ||
|
||
""" | ||

import numpy as np

from hls4ml.model.layers import Activation, BipolarQuant, Constant
from hls4ml.model.optimizer import OptimizerPass
from hls4ml.model.quantizers import BinaryQuantizer
from hls4ml.model.types import XnorPrecisionType

_ALSO_MATCH_PO2 = True

class BipolarQuantConstantParameters(OptimizerPass):
    """Remove Constant from the BipolarQuant node parameters (but not input[0])"""

    def match(self, node):
        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 2
            and (node.get_input_node(node.inputs[1]) and isinstance(node.get_input_node(node.inputs[1]), Constant))
        )

        return is_match

    def transform(self, model, node):
        """
        Remove Constant from the Quant node parameters (but not input[0])
        """
        if node.get_input_node(node.inputs[1]):
            scale_node = node.get_input_node(node.inputs[1])
            if isinstance(scale_node, Constant):
                node.set_attr('scale', scale_node.get_attr('value'))
                node.inputs[1] = ''
                model.remove_node(scale_node)

        node.inputs = [inp for inp in node.inputs if inp]
        if len(node.inputs) != 1:
            raise RuntimeError("hls4ml only supports constant scale")

        return True

class BipolarQuantToActivation(OptimizerPass):
    """
    This is for the case when scale is a (positive) power of 2 and zeropt is 0. It is a 1:1 transformation of
    a BipolarQuant to an Activation.

    As an optimization, this is not called when the input is constant.
    """

Review discussion on this pass:

Does this really work if the scale is a power of 2? It's fine if it doesn't, but if it doesn't we should change the matching criteria.

I am not sure. I'll add a test case to check.

It does not work for scale != 1, since BinaryQuantizer does not have a scaling factor. One option would be to lower a BipolarQuant with a scaling factor != 1 to an activation node followed by a mul node. However, I am not sure if there is much sense in this, since these mul nodes are, I presume, not implementable? On the other hand, perhaps they could be absorbed into other nodes.

The multiplier nodes are the ApplyAlpha nodes I mention in another comment. The ApplyAlpha is just a shift and scale layer, and you can use it as just a scale. (The name comes from its original application, and it is not very good. We have talked about changing the name.) Fundamentally it's implemented the same way as a batchnorm (since we don't actually update the scaling in a batchnorm).

I added some tests with non-unit po2 scale factors, and it appears that the transformations work as is. I tried setting the scaling factors below 1 and above 1, and both cases worked. Perhaps some downstream transformation is handling this case?

    def match(self, node):
        # only matches after the other inputs are already folded

        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 1
            and not isinstance(node.get_input_node(node.inputs[0]), Constant)
        )

        # Only match if the scale is power of 2 and the zero-point is 0s
        if is_match:  # to make sure this is a quant node with inputs
            scale = node.get_attr('scale')
            # check if scale is ones-like or a power of two
            scale_unit_or_po2 = (scale == np.ones_like(scale)).all()
            is_match = scale_unit_or_po2

        return is_match

    def transform(self, model, node):
        """
        Change quant node to Activation
        """
        scale = node.get_attr('scale')
        assert np.all(scale == 1.0)  # TODO: Is this required?

        precision = XnorPrecisionType()
        quantizer = BinaryQuantizer(bits=1)

        attributes = {'activation': 'linear', 'quantizer': quantizer}

        # update the configuration
        config = model.config.get_layer_config(node)
        prec_config = config.setdefault('Precision', {})
        prec_config['result'] = str(precision)
        new_name = f'{node.name}_act'
        model.config.set_name_config(new_name, config)
        model.config.parse_name_config(new_name, config)

        new_node = model.make_node(Activation, new_name, attributes, [node.inputs[0]], [x for x in node.outputs])
        model.replace_node(node, new_node)
        return True


class FuseBipolarQuantWithConstant(OptimizerPass):
    """
    This is for the case when scale is a positive power of 2 and zeropt is 0.
    """

    def match(self, node):
        # only matches after the other inputs are already folded
        is_match = (
            isinstance(node, BipolarQuant)
            and len(node.inputs) == 1
            and isinstance(node.get_input_node(node.inputs[0]), Constant)
        )

Review comment on the matching criteria:

Same here, does this really work if scale != 1? If it doesn't, the matching criteria should change.
        # Only match if the scale is power of 2 and the zero-point is 0s
        if is_match:  # to make sure this is a quant node with inputs
            scale = node.get_attr('scale')

            # check if scale is ones-like or a power of two
            scale_unit_or_po2 = (scale == np.ones_like(scale)).all()
            is_match = scale_unit_or_po2

        return is_match

    def transform(self, model, node):
        """
        Fuse Quant with Constant.
        """

        scale = node.get_attr('scale')
        assert np.all(scale == 1.0)  # TODO: Is this required?

        precision = XnorPrecisionType()
        quantizer = BinaryQuantizer(bits=1)

        const_node = node.get_input_node(node.inputs[0])
        const_node.set_attr('quantizer', quantizer)
        const_node.get_output_variable().type.precision = precision

        # Should we update the configuration to reflect the new precision? I don't think it's necessary

        # remove the Quant node
        model.remove_node(node)

        return True
Review discussion on the added ONNX test model:

I am not sure if this is a good place for an ONNX model. I wonder if some of the models from the model zoo will do. If not, we should put this in https://github.com/fastmachinelearning/example-models, where we keep other inputs. (Note that that project is instantiated as a submodule.)

By the way, this is the qonnx model zoo that I am referring to: https://github.com/fastmachinelearning/qonnx_model_zoo. It does have some binary models, though I am not sure if they are suitable for the test.

I agree that it should not be in this repository. However, it is kind of an artificial model, so I am not sure if it should be in the model zoo. I prefer using these kinds of artificial models because they can be generated quickly and they are very simple. That way, if something goes wrong, it's a very small model and it's easier to find the issue.

The example-models repository is a good place for artificial models that are used for testing.
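For reference, here is a hedged sketch of how such an artificial one-layer binarized model could be generated on the fly (so it would not need to live in any model zoo). This is an illustration, not the actual bnn_model_fc_1layer.onnx shipped with this PR; it assumes QONNX's BipolarQuant custom op, which takes (input, scale) and lives in the 'qonnx.custom_op.general' domain, and the tensor names and weights are made up.

import numpy as np
import onnx
from onnx import TensorProto, helper

# single dense (Gemm) layer: 3 inputs, 2 outputs, +/-1 weights
inp = helper.make_tensor_value_info('global_in', TensorProto.FLOAT, [1, 3])
out = helper.make_tensor_value_info('global_out', TensorProto.FLOAT, [1, 2])
weights = helper.make_tensor(
    'W', TensorProto.FLOAT, [3, 2], np.array([[1, -1], [-1, 1], [1, 1]], dtype=np.float32).flatten().tolist()
)
scale = helper.make_tensor('quant_scale', TensorProto.FLOAT, [1], [1.0])

# binarize the input, then apply the dense layer
quant = helper.make_node('BipolarQuant', ['global_in', 'quant_scale'], ['x_bin'], domain='qonnx.custom_op.general')
gemm = helper.make_node('Gemm', ['x_bin', 'W'], ['global_out'])

graph = helper.make_graph([quant, gemm], 'bnn_fc_1layer', [inp], [out], initializer=[weights, scale])
model = helper.make_model(
    graph, opset_imports=[helper.make_opsetid('', 14), helper.make_opsetid('qonnx.custom_op.general', 1)]
)
onnx.save(model, 'bnn_model_fc_1layer.onnx')

A model generated this way could then be loaded with ModelWrapper and cleaned up exactly as in the test below.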
@@ -12,6 +12,7 @@
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.channels_last import ConvertToChannelsLastAndClean
from qonnx.transformation.gemm_to_matmul import GemmToMatMul
from qonnx.util.cleanup import cleanup_model

import hls4ml

@@ -428,3 +429,45 @@ def test_simple_model(model_name, io_type, backend, request):
    y_hls4ml = hls_model.predict(X)

    np.testing.assert_allclose(y_qonnx.ravel(), y_hls4ml.ravel(), atol=1e-2, rtol=1)


@pytest.mark.parametrize('backend', ['Vitis'])
@pytest.mark.parametrize('io_type', ['io_parallel', 'io_stream'])
def test_bnn(io_type, backend):
    "Checks if a basic binarized model works correctly."
    test_dir = os.path.dirname(os.path.abspath(__file__))
    qonnx_model = ModelWrapper(f'{test_dir}/bnn_model_fc_1layer.onnx')
    qonnx_model = cleanup_model(qonnx_model)
    qonnx_model = qonnx_model.transform(GemmToMatMul())  # ishape = (1, 3)
    qonnx_model = qonnx.util.cleanup.cleanup_model(qonnx_model)
    config = hls4ml.utils.config.config_from_onnx_model(
        qonnx_model, granularity='name', backend=backend, default_precision='fixed<16,6>'
    )
    model_name = 'bnn_model_fc_1layer'
    hls_model = hls4ml.converters.convert_from_onnx_model(
        qonnx_model,
        output_dir=str(test_root_path / f'hls4mlprj_onnx_{model_name}_{io_type}_{backend}'),
        io_type=io_type,
        backend=backend,
        hls_config=config,
    )
    hls_model.compile()

    X = np.array(
        [
            [[+1, +1, +1]],
            [[+1, +1, -1]],
            [[+1, -1, +1]],
            [[-1, -1, -1]],
            [[-1, +1, +1]],
            [[-1, +1, -1]],
            [[-1, -1, +1]],
            [[-1, -1, -1]],
        ],
        dtype=np.float32,
    )
    for x in X:
        idict = {qonnx_model.graph.input[0].name: x}
        y_qonnx = oxe.execute_onnx(qonnx_model, idict)[qonnx_model.graph.output[0].name]
        y_hls4ml = hls_model.predict(X)
        np.array_equal(y_qonnx.ravel(), y_hls4ml.ravel())
Review comment on the final comparison:

This doesn't actually fail the test if there's a failure. The array_equal function returns False, but the return value is ignored. In fact, we have a mismatch for all three tests.
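A possible fix along the lines of this comment (a suggestion, not a change included in this PR) is to actually assert the comparison so pytest fails on a mismatch:

np.testing.assert_array_equal(y_qonnx.ravel(), y_hls4ml.ravel())
# or, keeping np.array_equal:
assert np.array_equal(y_qonnx.ravel(), y_hls4ml.ravel())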
Overall review discussion:

We don't seem to handle the case when scale != 1. Ideally we should be able to extract ApplyAlpha scales in such a case that we propagate up and down. I think basic support can be fairly straightforwardly added, in the style of the Quant support. (If we don't support scale != 1, we should catch those cases and exit gracefully, with an error message.)

I checked the BinaryQuantizer code and it does not define a scaling factor, meaning that this can only work for a scale factor of 1. Furthermore, this whole optimizer pass becomes irrelevant, so I will delete it.
What does ApplyAlpha do? I am not familiar with this.

So a quant layer with a scale and/or zero offset really means scale/shift, then quantize, then unscale/unshift. The ApplyAlpha nodes are scale and shift layers in the hls4ml IR. When a quant node is applied to a weight, the initial scaling/shifting can actually be done to the weights (assuming they are constant and not updatable). Otherwise, the hope is that the scaling and unscaling can be moved around the graph to where the implementation is easiest. There are optimizers that already exist for that.
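To make the decomposition in the last comment concrete, here is a minimal numpy sketch (an illustration with simplified unsigned rounding/clipping, not hls4ml code or the exact QONNX semantics): a quant node with scale s and zero offset z is the composition of an ApplyAlpha-style scale/shift, a pure quantization step, and a second ApplyAlpha that unscales/unshifts.

import numpy as np

def quant_reference(x, s, z, bits=4):
    # one-shot reference: scale/shift, round/clip, then undo the scale/shift
    q = np.clip(np.round(x / s + z), 0, 2**bits - 1)
    return s * (q - z)

def quant_decomposed(x, s, z, bits=4):
    y = x / s + z                             # ApplyAlpha: scale by 1/s, shift by z
    y = np.clip(np.round(y), 0, 2**bits - 1)  # quantization only (the Linear/activation step)
    return s * y - s * z                      # ApplyAlpha: scale by s, shift by -s*z

x = np.linspace(-2.0, 2.0, 9)
assert np.allclose(quant_reference(x, s=0.5, z=3), quant_decomposed(x, s=0.5, z=3))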