g++ error when running build #1269

Open
ixiotidi opened this issue Apr 11, 2025 · 5 comments
Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the master branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the hls4ml project files, which were created directly before and/or after the bug.

Quick summary

I'm getting the following error when running the HLS4ML build process.
g++: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report,
with preprocessed source if appropriate.
See http://bugs.almalinux.org/ for instructions.

Details

I've tried various hls4ml and gcc versions, both inside a Docker container and natively on my build machine, and the error repeats every time. Basically, I have a rather large CNN for which I'm trying to get the firmware estimates. The model is the following:

Layer (type)                     Output Shape           Param #
=================================================================
conv2d (Conv2D)                  (None, 72, 122, 128)   1280
max_pooling2d (MaxPooling2D)     (None, 36, 61, 128)    0
conv2d_1 (Conv2D)                (None, 34, 59, 128)    147584
max_pooling2d_1 (MaxPooling2D)   (None, 17, 29, 128)    0
conv2d_2 (Conv2D)                (None, 15, 27, 128)    147584
max_pooling2d_2 (MaxPooling2D)   (None, 7, 13, 128)     0
flatten (Flatten)                (None, 11648)          0
dense (Dense)                    (None, 16)             186384
dropout (Dropout)                (None, 16)             0
dense_1 (Dense)                  (None, 1)              17
=================================================================
Total params: 482849 (1.84 MB)
Trainable params: 482849 (1.84 MB)
Non-trainable params: 0 (0.00 Byte)


The hls4ml versions I've tried are 0.8.1, 1.0.0, and 1.1.0; I also tried Vitis 2022.0 and 2024.1.

The way I compile it is the following:

import keras

cnn_model = keras.models.load_model('deep_CNN_98acc_mar26.keras', compile=False)
cnn_model.summary()

import hls4ml
import os

os.environ['PATH'] = os.environ['XILINX_VITIS'] + '/bin:' + os.environ['PATH']

hlsConfig = hls4ml.utils.config_from_keras_model(cnn_model, granularity='name', backend='Vitis', default_precision='ap_fixed<16,6>')
hlsModel = hls4ml.converters.convert_from_keras_model(cnn_model, hls_config=hlsConfig, backend='Vitis', output_dir='adamCNN/hls4ml_prj', part='xcvu9p-fsgd2104-2L-e')

hlsModel.compile()

hls4ml does generate the firmware files and the project, but then fails. I went back and tested the hls4ml fully-connected example and everything works fine, so I'm not sure whether this is only related to this specific model.

Steps to Reproduce

I can provide the scripts and everything if needed to reproduce.

Expected behavior

I would have expected the compile of the model to finish, since the files are all generated.

Actual behavior

Instead, g++ crashes without any other log information. I tried changing the GCC version and still hit the same issue.


ixiotidi added the bug label Apr 11, 2025
@vloncar
Contributor

vloncar commented Apr 11, 2025

> Total params: 482849 (1.84 MB)

Not gonna work. A much, much smaller model may work with io_stream. And a much, much, much smaller model may work with io_parallel. Each "much" being an order of magnitude. Docs are your friend, consult them 😉 .

@ixiotidi

> Total params: 482849 (1.84 MB)
>
> Not gonna work. A much, much smaller model may work with io_stream. And a much, much, much smaller model may work with io_parallel. Each "much" being an order of magnitude. Docs are your friend, consult them 😉 .

Hi @vloncar, thanks for your reply. I get that it might not be synthesizable; however, my issue is that the model doesn't finish the compile step of hls4ml (I haven't called build yet). It generates the files, I get the done flag, and then the g++ compiler crashes. I would assume that I can still get some HLS project out even if it's too big, no? :)

@vloncar

vloncar commented Apr 11, 2025

Because it generates huge source files in io_parallel (you didn't pass the option, so it defaults to that), and the compiler simply fails. Look at the memory spike on your machine when you run the compile command. I think it will compile if you use io_stream; it will be a long process, but it should work on a machine with a normal amount of memory. But then, when you try to run predictions, you'll see how slow ap_fixed truly is :-)
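To get a sense of the scale behind the "huge source files", here is a rough back-of-envelope count of the multiply-accumulate operations in the first Conv2D layer of the model above. The kernel shape is inferred from the reported parameter count; treat this as an illustration, not a measure of hls4ml's actual generated code size.

```python
# Hypothetical back-of-envelope: multiply-accumulate (MAC) count for the
# first Conv2D layer of the model above.
# Output shape (None, 72, 122, 128); Param # = 1280 = 3*3*1*128 weights
# + 128 biases, which implies a 3x3 kernel over a single input channel.
out_h, out_w, out_ch = 72, 122, 128
k_h, k_w, in_ch = 3, 3, 1

params = k_h * k_w * in_ch * out_ch + out_ch   # sanity check vs. the summary
macs = out_h * out_w * k_h * k_w * in_ch * out_ch

print(params)  # 1280, matching the model summary
print(macs)    # 10119168: roughly 10 million MACs for this one layer
```

With io_parallel's per-pixel unrolling, work on this order has to be spelled out explicitly in the generated sources, which is consistent with the memory blow-up during compilation.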

@ixiotidi

@vloncar thanks for the reply. Indeed, with io_stream it did compile, right away, and it wasn't even long. I'm a bit puzzled, though, about what the "limit" is, because it crashed at around 20 GB of RAM, which is not much on the machine I'm using. :)

@vloncar

vloncar commented Apr 11, 2025

Partly, it is the difference in the algorithm behind this. To achieve the best performance, in io_parallel the im2col transformation of the convolution is manually unrolled, with specific instructions for each pixel (see the file firmware/nnet_utils/nnet_code_gen.h in your output dir). Unfortunately, this works better than a proper implementation with a loop and a directive to unroll it, because the compiler is not smart enough. As you can imagine, this becomes quite large quite quickly. The "limit" isn't clear; it depends on the convolution layer and the internals of the compiler. The standard doesn't impose a maximum length on a single source line; it only sets a lower bound of 65k characters that implementations must support, and no compiler enforces it as a maximum. We didn't fully explore this, since we know such a model is not going to be synthesizable anyway, and we advise people to try smaller models.

io_stream uses a different algorithm that processes things sequentially, so there is no codegen involved and no such issues. In io_stream you may get a long compile time for the first phase of the synthesis step when you call build(), because static arrays of weights are used. But during compile(), which is entirely local and doesn't use Vivado/Vitis at all, the weights are read from a file, so no issues occur.
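For readers unfamiliar with the im2col transformation mentioned above, here is a minimal pure-Python sketch of the idea for a single-channel 2D convolution. This illustrates the general technique only; it is not the code hls4ml generates.

```python
def im2col(image, k_h, k_w):
    """Flatten each k_h x k_w patch of a 2D image (list of lists) into one
    row, so the convolution becomes one dot product per output pixel."""
    rows, cols = len(image), len(image[0])
    patches = []
    for i in range(rows - k_h + 1):
        for j in range(cols - k_w + 1):
            patch = [image[i + di][j + dj]
                     for di in range(k_h)
                     for dj in range(k_w)]
            patches.append(patch)
    return patches

def conv2d_via_im2col(image, kernel):
    k_h, k_w = len(kernel), len(kernel[0])
    flat_k = [w for row in kernel for w in row]
    # One dot product per output pixel; io_parallel effectively unrolls
    # each of these dot products into explicit instructions in the
    # generated source, which is why the files grow with the layer size.
    return [sum(p * w for p, w in zip(patch, flat_k))
            for patch in im2col(image, k_h, k_w)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_via_im2col(image, kernel))  # [6, 8, 12, 14]
```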
