Qonnx binary quant #1292

Open
wants to merge 17 commits into main

Changes from 4 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -14,3 +14,4 @@ docs/autodoc/*
hls4mlprj_*
*~
*.ipynb_checkpoints/
*.bak
11 changes: 11 additions & 0 deletions hls4ml/converters/onnx/core.py
@@ -120,3 +120,14 @@ def parse_quant_layer(node, input_names, input_shapes, graph):
layer['signed'] = bool(get_onnx_attribute(node, 'signed'))

return layer


@onnx_handler('BipolarQuant')
def parse_bipolar_quant_layer(node, input_names, input_shapes, graph):
layer = {}

layer['class_name'] = 'BipolarQuant'
layer['name'] = node.name
layer['inputs'] = input_names
layer['outputs'] = list(node.output)
return layer
16 changes: 16 additions & 0 deletions hls4ml/model/layers.py
@@ -421,6 +421,21 @@ def initialize(self):
self.add_output_variable(shape, dims)


class BipolarQuant(Layer): # The QONNX quantization layer
"""
This is a QONNX quantization layer. Optimizations should convert it
before HLS is produced.
"""

_expected_attributes = []

def initialize(self):
inp = self.get_input_variable(self.inputs[0])
shape = inp.shape
dims = inp.dim_names
self.add_output_variable(shape, dims)


class Reshape(Layer):
_expected_attributes = [
Attribute('target_shape', value_type=typing.Sequence),
@@ -1724,6 +1739,7 @@ def initialize(self):
'GarNet': GarNet,
'GarNetStack': GarNetStack,
'Quant': Quant,
'BipolarQuant': BipolarQuant,
'ApplyAlpha': ApplyAlpha,
'BatchNormOnnx': BatchNormOnnx,
'LayerGroup': LayerGroup,
3 changes: 3 additions & 0 deletions hls4ml/model/optimizer/__init__.py
@@ -40,6 +40,9 @@
'fuse_quant_with_constant',
'const_quant_to_const_alpha',
'quant_to_alpha_activation_alpha',
'bipolar_quant_constant_parameters',
'bipolar_quant_to_activation',
'fuse_bipolar_quant_with_constant',
'batch_norm_onnx_constant_parameters',
'constant_batch_norm_fusion',
'merge_two_constants',
151 changes: 151 additions & 0 deletions hls4ml/model/optimizer/passes/bipolar_quant_opt.py
@@ -0,0 +1,151 @@
"""
This file includes optimizations related to BipolarQuant nodes.

As a first step, QuantConstantParameters converts the extra inputs to attributes.

The next step differs between the case of (1) (positive) power-of-2 scale and zero offset, or (2) other cases. In the first
case no explicit scaling is required, so a Quant node logically becomes a linear activation. (Cases when the scale is a
power of 2 not equal to one are implicitly scaled with fixed precision types.) When the activation is applied to a constant
weight, the activation is immediately merged with the weight, quantizing the weights. In case (2), we need to explicitly
scale and unscale, so the Quant node becomes 3 nodes, an ApplyAlpha node to apply a scale/shift, a Linear node to apply the
quantization, and another ApplyAlpha to unscale/shift. We depend on optimization steps to move the unscaling ApplyAlpha
down as needed so that we can do integer or fixed-point calculations. When the Quant is a applied to a weight, the scaling
and Linear nodes are immediately merged into the Constant.

"""

import numpy as np

from hls4ml.model.layers import Activation, BipolarQuant, Constant
from hls4ml.model.optimizer import OptimizerPass
from hls4ml.model.quantizers import BinaryQuantizer
from hls4ml.model.types import XnorPrecisionType

_ALSO_MATCH_PO2 = True


class BipolarQuantConstantParameters(OptimizerPass):
"""Remove Constant from the Qaunt node parameters (but not input[0])"""

def match(self, node):
is_match = (
isinstance(node, BipolarQuant)
and len(node.inputs) == 2
and (node.get_input_node(node.inputs[1]) and isinstance(node.get_input_node(node.inputs[1]), Constant))
)

return is_match

def transform(self, model, node):
"""
Remove Constant from the Quant node parameters (but not input[0])
"""
if node.get_input_node(node.inputs[1]):
scale_node = node.get_input_node(node.inputs[1])
if isinstance(scale_node, Constant):
node.set_attr('scale', scale_node.get_attr('value'))
Contributor: We don't seem to handle the case when scale != 1. Ideally we should be able to extract ApplyAlpha scales in such a case, which we then propagate up and down. I think basic support can be fairly straightforwardly added, in the style of the Quant support. (If we don't support scale != 1, we should catch those cases and exit gracefully, with an error message.)

Author: I checked the BinaryQuantizer code and it does not define a scaling factor, meaning that this can only work for a scale factor of 1. Furthermore, this whole optimizer pass becomes irrelevant, so I will delete it. What does ApplyAlpha do? I am not familiar with it.

Contributor: So a quant layer with a scale and/or zero offset really means scale/shift, then quantize, then unscale/unshift. The ApplyAlpha nodes are scale and shift layers in the hls4ml IR. When a quant node is applied to a weight, the initial scaling/shifting can actually be done to the weights (assuming they are constant and not updatable). Otherwise, the hope is that the scaling and unscaling can be moved around the graph to where the implementation is easiest. There are optimizers that already exist for that.

node.inputs[1] = ''
model.remove_node(scale_node)

node.inputs = [inp for inp in node.inputs if inp]
if len(node.inputs) != 1:
raise RuntimeError("hls4ml only supports constant scale")

return True
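As a side note to the ApplyAlpha discussion in the thread above, here is a small numpy sketch of the scale/shift -> quantize -> unscale/unshift decomposition; apply_alpha and binarize are hypothetical stand-ins used for illustration, not hls4ml APIs.

import numpy as np

def apply_alpha(x, scale, bias):
    # ApplyAlpha is described above as a batchnorm-like affine layer: y = scale * x + bias
    return scale * x + bias

def binarize(x):
    # Stand-in for the BinaryQuantizer: +/-1 depending on sign
    return np.where(x >= 0, 1.0, -1.0)

x = np.array([-0.3, 0.2, 1.7])
scale = 0.5

# A quant node with a non-unit scale ~ pre-scale, quantize, then post-scale:
y = apply_alpha(binarize(apply_alpha(x, 1.0 / scale, 0.0)), scale, 0.0)
print(y)   # [-0.5  0.5  0.5]

# When the quantized tensor is a constant weight, the pre-scaling and the
# quantization can be folded directly into the weight values, which is the
# idea behind the fuse-with-constant pass further down.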


class BipolarQuantToActivation(OptimizerPass):
"""
This is for the case when scale is a (positive) power of 2 and zeropt is 0. It is a 1:1 transformation of
a BipolarQuant to an Activation.

As an optimization, this is not called when the input is constant.
"""

Contributor: Does this really work if the scale is a power of 2? It's fine if it doesn't, but if it doesn't we should change the matching criteria.

Author: I am not sure. I'll add a test case to check.

Author: It does not work for scale != 1, since BinaryQuantizer does not have a scaling factor. One option would be to lower a BipolarQuant with a scaling factor != 1 to an activation node followed by a mul node. However, I am not sure there is much sense in this, since these mul nodes are, I presume, not implementable? On the other hand, perhaps they could be absorbed into other nodes.

Contributor: The multiplier nodes are the ApplyAlpha nodes I mention in another comment. The ApplyAlpha is just a shift and scale layer, and you can use it as just a scale. (The name comes from its original application, and it is not very good. We have talked about changing the name.) Fundamentally it's implemented the same way as a batchnorm (since we don't actually update the scaling in a batchnorm).

Author: I added some tests with non-unit po2 scale factors, and it appears that the transformations work as is. I tried setting the scaling factors below 1 and above 1, and both cases worked. Perhaps some downstream transformation is handling this case?

def match(self, node):
# only matches after the other inputs are already folded

is_match = (
isinstance(node, BipolarQuant)
and len(node.inputs) == 1
and not isinstance(node.get_input_node(node.inputs[0]), Constant)
)

# Only match if the scale is power of 2 and the zero-point is 0s
if is_match: # to make sure this is a quant node with inputs
scale = node.get_attr('scale')
# check if scale is ones-like or a power of two
scale_unit_or_po2 = (scale == np.ones_like(scale)).all()
is_match = scale_unit_or_po2

return is_match

def transform(self, model, node):
"""
Change quant node to Activation
"""
scale = node.get_attr('scale')
assert np.all(scale == 1.0) # TODO: Is this required?

precision = XnorPrecisionType()
quantizer = BinaryQuantizer(bits=1)

attributes = {'activation': 'linear', 'quantizer': quantizer}

# update the configuration
config = model.config.get_layer_config(node)
prec_config = config.setdefault('Precision', {})
prec_config['result'] = str(precision)
new_name = f'{node.name}_act'
model.config.set_name_config(new_name, config)
model.config.parse_name_config(new_name, config)

new_node = model.make_node(Activation, new_name, attributes, [node.inputs[0]], [x for x in node.outputs])
model.replace_node(node, new_node)
return True
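If the matching criteria were ever extended from exact ones to genuine power-of-two scales, as discussed in the thread above, a check along the following lines could be used; this is only a sketch under that assumption, not part of the PR.

import numpy as np

def scale_is_unit_or_po2(scale):
    # True if every entry of the scale tensor is a positive power of two (1.0 included).
    scale = np.asarray(scale, dtype=np.float64)
    if np.any(scale <= 0):
        return False
    log2_scale = np.log2(scale)
    return bool(np.all(log2_scale == np.round(log2_scale)))

assert scale_is_unit_or_po2(np.ones(4))
assert scale_is_unit_or_po2(np.array([0.5, 2.0, 4.0]))
assert not scale_is_unit_or_po2(np.array([3.0]))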


class FuseBipolarQuantWithConstant(OptimizerPass):
"""
This is for the case when scale is a positive power of 2 and zeropt is 0.
"""

def match(self, node):
# only matches after the other inputs are already folded
is_match = (
isinstance(node, BipolarQuant)
and len(node.inputs) == 1
and isinstance(node.get_input_node(node.inputs[0]), Constant)
Contributor: Same here, does this really work if scale != 1? If it doesn't, the matching criteria should change.

)

# Only match if the scale is power of 2 and the zero-point is 0s
if is_match: # to make sure this is a quant node with inputs
scale = node.get_attr('scale')

# check if scale is ones-like or a power of two
scale_unit_or_po2 = (scale == np.ones_like(scale)).all()
is_match = scale_unit_or_po2

return is_match

def transform(self, model, node):
"""
Fuse Quant with Constant.
"""

scale = node.get_attr('scale')
assert np.all(scale == 1.0) # TODO: Is this required?

precision = XnorPrecisionType()
quantizer = BinaryQuantizer(bits=1)

const_node = node.get_input_node(node.inputs[0])
const_node.set_attr('quantizer', quantizer)
const_node.get_output_variable().type.precision = precision

# Should we update the configuration to reflect the new precision? I don't think it's necessary

# remove the Quant node
model.remove_node(node)

return True
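To make the effect of FuseBipolarQuantWithConstant concrete, here is a small numpy sketch of what binarizing a constant weight tensor amounts to, assuming the BinaryQuantizer maps values to +/-1 by sign; the values are arbitrary and only illustrative.

import numpy as np

weights = np.array([[0.7, -0.2],
                    [-1.3, 0.05]])

# After the BipolarQuant is fused into the Constant, the stored weights behave
# as their binarized +/-1 counterparts, with an XNOR-style (1-bit) precision.
binarized = np.where(weights >= 0, 1.0, -1.0)
print(binarized)
# [[ 1. -1.]
#  [-1.  1.]]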
Binary file added test/pytest/bnn_model_fc_1layer.onnx
Contributor: I am not sure if this is a good place for an ONNX model. I wonder if some of the models from the model zoo will do. If not, we should put this in https://github.com/fastmachinelearning/example-models, where we keep other inputs. (Note that that project is instantiated as a submodule.)

Contributor: By the way, this is the QONNX model zoo that I am referring to: https://github.com/fastmachinelearning/qonnx_model_zoo. It does have some binary models, though I am not sure if they are suitable for the test.

Author: I agree that it should not be in this repository. However, it is kind of an artificial model, so I am not sure if it should be in the model zoo. I prefer using these kinds of artificial models because they can be generated quickly and they are very simple. That way, if something goes wrong, it is a very small model and it is easier to find the issue. Would you be open to adding Brevitas to the testing dependencies? That way we could auto-generate various quantized models, as in chisel4ml. Another option would be to use one of the models from the zoo.

Contributor: The example-models repository is a good place for non-real models that are used for testing.

Binary file not shown.
43 changes: 43 additions & 0 deletions test/pytest/test_qonnx.py
@@ -12,6 +12,7 @@
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.channels_last import ConvertToChannelsLastAndClean
from qonnx.transformation.gemm_to_matmul import GemmToMatMul
from qonnx.util.cleanup import cleanup_model

import hls4ml

@@ -428,3 +429,45 @@ def test_simple_model(model_name, io_type, backend, request):
y_hls4ml = hls_model.predict(X)

np.testing.assert_allclose(y_qonnx.ravel(), y_hls4ml.ravel(), atol=1e-2, rtol=1)


@pytest.mark.parametrize('backend', ['Vitis'])
@pytest.mark.parametrize('io_type', ['io_parallel', 'io_stream'])
def test_bnn(io_type, backend):
"Checks if a basic binarized model works correctly."
test_dir = os.path.dirname(os.path.abspath(__file__))
qonnx_model = ModelWrapper(f'{test_dir}/bnn_model_fc_1layer.onnx')
qonnx_model = cleanup_model(qonnx_model)
qonnx_model = qonnx_model.transform(GemmToMatMul()) # ishape = (1, 3)
qonnx_model = qonnx.util.cleanup.cleanup_model(qonnx_model)
config = hls4ml.utils.config.config_from_onnx_model(
qonnx_model, granularity='name', backend=backend, default_precision='fixed<16,6>'
)
model_name = 'bnn_model_fc_1layer'
hls_model = hls4ml.converters.convert_from_onnx_model(
qonnx_model,
output_dir=str(test_root_path / f'hls4mlprj_onnx_{model_name}_{io_type}_{backend}'),
io_type=io_type,
backend=backend,
hls_config=config,
)
hls_model.compile()

X = np.array(
[
[[+1, +1, +1]],
[[+1, +1, -1]],
[[+1, -1, +1]],
[[-1, -1, -1]],
[[-1, +1, +1]],
[[-1, +1, -1]],
[[-1, -1, +1]],
[[-1, -1, -1]],
],
dtype=np.float32,
)
for x in X:
idict = {qonnx_model.graph.input[0].name: x}
y_qonnx = oxe.execute_onnx(qonnx_model, idict)[qonnx_model.graph.output[0].name]
y_hls4ml = hls_model.predict(X)
np.array_equal(y_qonnx.ravel(), y_hls4ml.ravel())
Contributor: This doesn't actually fail the test if there's a failure. The array_equal function returns False, but the return value is ignored. In fact, we have a mismatch for all three tests.
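A minimal sketch of how the comparison could be made to actually fail the test: collect the per-sample QONNX outputs and use an explicit assertion instead of the discarded np.array_equal return value. It reuses the names from the test above and is only a suggestion, not part of the PR.

y_qonnx_all = []
for x in X:
    idict = {qonnx_model.graph.input[0].name: x}
    y_qonnx_all.append(oxe.execute_onnx(qonnx_model, idict)[qonnx_model.graph.output[0].name])
y_qonnx_all = np.concatenate(y_qonnx_all, axis=0)

# Run hls4ml once on the whole batch and assert element-wise equality,
# so any mismatch raises and fails the test.
y_hls4ml = hls_model.predict(X)
np.testing.assert_array_equal(y_qonnx_all.ravel(), y_hls4ml.ravel())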
