Fix incorrect value_info setting in MergeONNXModels #152

Merged: 1 commit merged into fastmachinelearning:main on Dec 15, 2024

Conversation

fpjentzsch
Contributor

We connect the output of the pre_model to the input tensor of the post_model.

Old behavior: The output tensor of pre_model is added to the value_info of the output model, even though this tensor is no longer used. The input tensor of post_model is missing a value_info entry. This causes errors in FINN shape inference because this tensor ends up without a shape annotation, while CustomOps expect their input tensor shape to be annotated.

New behavior: The input tensor of post_model is added to the value_info of the output model.
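
As a rough illustration (a simplified sketch, not the actual transformation code; single-input/single-output models assumed), the corrected behavior amounts to carrying over the value_info of post_model's input, since that is the tensor that survives in the merged graph:

import onnx

def carry_over_value_info(post_model: onnx.ModelProto, merged_graph: onnx.GraphProto) -> None:
    # post_model's input is the tensor kept in the merged graph after rewiring;
    # graph inputs are already ValueInfoProto, so the entry can be appended as-is
    post_in_vi = post_model.graph.input[0]
    merged_graph.value_info.append(post_in_vi)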

@maltanar
Collaborator

maltanar commented Dec 9, 2024

Thanks @fpjentzsch. Do you have a testcase suggestion that would catch the bug in the old version? Calling InferShapes() on the merged model in test_merge_onnx_models did not raise any errors for me.

@fpjentzsch
Contributor Author

> Thanks @fpjentzsch. Do you have a testcase suggestion that would catch the bug in the old version? Calling InferShapes() on the merged model in test_merge_onnx_models did not raise any errors for me.

The problem occurs in make_shape_compatible_op because many FINN CustomOps do the following there:

ishape = tuple(model.get_tensor_shape(self.onnx_node.input[0]))
assert ishape == exp_ishape

This fails because get_tensor_shape() returns None when no value_info entry is found for the input. During shape inference, make_shape_compatible_op is called on all nodes before the actual ONNX shape inference begins, so essentially we expect the model to have all shapes correctly inferred before any offending CustomOp is placed in the graph.
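
For illustration, a minimal sketch (a hypothetical helper, not actual FINN/QONNX code) of that failure mode:

from qonnx.core.modelwrapper import ModelWrapper

def checked_input_shape(model: ModelWrapper, tensor_name: str) -> tuple:
    # returns None when the tensor has no value_info entry, which is exactly
    # the state the old MergeONNXModels behavior left behind
    ishape = model.get_tensor_shape(tensor_name)
    if ishape is None:
        raise RuntimeError("tensor '%s' has no shape annotation" % tensor_name)
    # without the guard above, tuple(None) would raise a TypeError here,
    # before any shape assert in make_shape_compatible_op is even evaluated
    return tuple(ishape)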

I'm not sure this assumption makes sense; maybe we should redesign the way tensor shapes are checked against node attributes, or the "make_shape_compatible_op" mechanism in general?

At first glance, only the ONNX CustomOps "quantavgpool2d", "max_pool", "maxpoolnhwc", and "conv" do this in make_shape_compatible_op. So we could build a failing test for the old behavior by applying MergeONNXModels() to a model where one of these ops is the first layer.
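
A rough sketch of what such a test could look like (MaxPoolNHWC picked as the offending first layer; shapes, attribute values, and the qonnx.custom_op.general domain are assumptions on my side):

import numpy as np
import onnx.helper as oh
from onnx import TensorProto
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.util.basic import qonnx_make_model

def make_pre_model():
    # trivial NHWC passthrough: global_in -> Mul(scale) -> pre_out
    inp = oh.make_tensor_value_info("global_in", TensorProto.FLOAT, [1, 8, 8, 4])
    out = oh.make_tensor_value_info("pre_out", TensorProto.FLOAT, [1, 8, 8, 4])
    mul = oh.make_node("Mul", ["global_in", "scale"], ["pre_out"])
    model = ModelWrapper(qonnx_make_model(oh.make_graph([mul], "pre", [inp], [out])))
    model.set_initializer("scale", np.asarray([2.0], dtype=np.float32))
    return model

def make_post_model():
    # first node is MaxPoolNHWC, whose make_shape_compatible_op reads the shape
    # of "post_in" and therefore needs its value_info in the merged model
    inp = oh.make_tensor_value_info("post_in", TensorProto.FLOAT, [1, 8, 8, 4])
    out = oh.make_tensor_value_info("global_out", TensorProto.FLOAT, [1, 4, 4, 4])
    mp = oh.make_node(
        "MaxPoolNHWC", ["post_in"], ["global_out"],
        domain="qonnx.custom_op.general",
        kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2], ceil_mode=0,
    )
    return ModelWrapper(qonnx_make_model(oh.make_graph([mp], "post", [inp], [out])))

def test_merge_keeps_post_input_value_info():
    merged = make_post_model().transform(MergeONNXModels(make_pre_model()))
    # raises with the old value_info handling, passes with this fix
    merged = merged.transform(InferShapes())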

@maltanar
Collaborator

Thanks for the clarification, Felix. I'm happy to merge this as-is, though it would be even better to address the underlying issues with how we do shape inference for QONNX custom ops. I'll open an issue so we can track this separately.

On a separate note, for merging models https://onnx.ai/onnx/api/compose.html also exists, and we should consider making MergeONNXModels a wrapper around it instead, provided that we get the same functionality. This would mean maintaining less code as part of QONNX, which would be desirable.
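
For reference, a minimal usage sketch of that API (file names and the "pre_out"/"post_in" stitch tensors are placeholders):

import onnx
from onnx import compose

pre_model = onnx.load("pre_model.onnx")
post_model = onnx.load("post_model.onnx")

# io_map wires pre_model outputs to post_model inputs by tensor name
merged = compose.merge_models(pre_model, post_model, io_map=[("pre_out", "post_in")])
onnx.save(merged, "merged.onnx")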

maltanar merged commit 2c91d6d into fastmachinelearning:main on Dec 15, 2024
5 checks passed
maltanar added this to the v0.4.0 milestone on Dec 20, 2024