Conversation

calad0i (Contributor) commented Sep 8, 2025

Description

Provides a partial fix for the parallel conv issue on Vitis:

  • Pointwise behavior is fixed. However, since I still can't recover the correct II behavior by tuning the pragmas back and forth, it is disabled except in the global n_partitions==1 case.
    • rf and pf are isolated
    • rf, precision, and trace flags are now inherited
    • templates are used instead of codegen for mostly static code
  • Standard conv:
    • pf gives proper II now
    • Pipelining appears to work (2025.1 and 2023.2, but only tested on minimal examples so far)

Each layer still fully blocks, and the dataflow pragma does not have the ideal behavior (i.e., kernel-level pipelining). At least we have correct global II now, and resource usage indeed decreases with lower pf. Vitis 2025.1 hangs at cosim but 2023.2 was fine; I guess it is a bug on the Vitis side. I didn't test more versions.

Since the synthesis test CI is still not in place, I can't fire up a full regression test now. Please check whether this PR breaks your use case.

Type of change

  • Other (performance tuning)

Tests

When will synthesis tests be ready?

Checklist

  • all

@JanFSchulte JanFSchulte added this to the v1.2.0 milestone Sep 8, 2025
@calad0i calad0i added the please test Trigger testing by creating local PR branch label Sep 8, 2025
jmitrevs (Contributor) commented Sep 8, 2025

Considering "rf, precision, trace flags are inherited now", doesn't this now give you warnings about them being overwritten? The reason they were not copied is that they are recreated from the config, so if you copy them, they get overwritten when the config is parsed (and you get the warning). And you do have to recreate the config in order to make sure precisions are correctly transferred to the new pointwise layer, since they get generated again. Truthfully, this behavior of hls4ml is a bit weird and should maybe be revisited, but just copying those values doesn't fix it.

```diff
 def transform(self, model, node):
     dim = node.__class__.__name__[-2:]  # '1D' or '2D'
-    new_attrs = {k: v for k, v in node.attributes.items() if k not in ('trace', 'precision', 'reuse_factor')}
+    new_attrs = node.attributes.attributes.copy()
```
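The overwrite behavior described above can be sketched with a minimal, hypothetical example (the dict names and values below are made up for illustration; real hls4ml attributes are richer objects and the overwrite happens during config parsing):

```python
# Hypothetical sketch: attributes copied from the old node are later
# clobbered when the config is re-parsed for the new pointwise layer.
old_attrs = {'trace': True, 'precision': 'ap_fixed<16,6>', 'reuse_factor': 4}

# Naive copy into the new node's attributes:
new_attrs = old_attrs.copy()

# Later, config parsing regenerates these keys from the model config,
# overwriting whatever was copied (and emitting a warning):
config_defaults = {'trace': False, 'precision': 'ap_fixed<16,6>', 'reuse_factor': 1}
new_attrs.update(config_defaults)

assert new_attrs['reuse_factor'] == 1  # the copied value 4 was lost
```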
Contributor:

I mentioned this in a comment in the conversation. I think trace, precision, and reuse_factor are regenerated no matter what, so the values you copy here get overridden (unless something has changed from before). It may be an hls4ml behavior worth revisiting and potentially revising, but I don't think this change fixes anything.

Contributor (Author):

reuse_factor defined under the Model key is not propagated as expected otherwise. Removed the warning when the update operation is trivial.

Contributor:

Did you see why it's not propagated properly? Shouldn't it come from a configuration in the first place?

```diff
     // mult_params['n_partitions']
 )
-mult_params['n_out'] = int(node.get_attr('in_width') * node.get_attr('n_filt') / mult_params['reuse'])
+mult_params['n_out'] = node.get_attr('in_width') * node.get_attr('n_filt') // mult_params['n_partitions']
```
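As a quick sanity check of the new n_out formula, here is a toy calculation (the values are illustrative, not from the PR; the real ones come from `node.get_attr(...)`):

```python
# Illustrative layer dimensions
in_width, n_filt = 32, 8
n_partitions = 4

# Old code divided by the reuse factor; the new code floor-divides
# by n_partitions, so n_out is the work handled per partition.
n_out = in_width * n_filt // n_partitions
assert n_out == 64  # 256 total outputs split across 4 partitions
```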
Contributor:

Are n_partitions and mult_params['n_partions'] different?

```cpp
// Inlining helps reduce latency, but may also cause timing issues in some cases, use carefully.
//#pragma HLS INLINE recursive
// But without inlining Vitis HLS doesn't respect the parallelization factor config ¯\_(ツ)_/¯
// #pragma HLS PIPELINE II = CONFIG_T::reuse_factor * CONFIG_T::n_partitions
```
jmitrevs (Contributor) Sep 8, 2025:

Maybe erase the commented-out pragmas throughout?

```cpp
};

template <class data_T, class res_T, typename CONFIG_T>
class BatchedDenseForConv1D : public nnet::Conv1DKernel<data_T, res_T, CONFIG_T> {
```
jmitrevs (Contributor) Sep 8, 2025:

Can we add a comment to say the purpose of this code (and also for the 1D version)?

Contributor (Author):

The PR allows modifications from contributors. Feel free to add some if you find it necessary.

@JanFSchulte JanFSchulte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Sep 9, 2025
@jmitrevs jmitrevs removed the please test Trigger testing by creating local PR branch label Sep 22, 2025
@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label Sep 22, 2025
@calad0i calad0i added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Sep 23, 2025
```diff
     typename CONFIG_T::weight_t weights[CONFIG_T::filt_width * CONFIG_T::n_chan * CONFIG_T::n_filt],
     typename CONFIG_T::bias_t biases[CONFIG_T::n_filt]) {
-    //#pragma HLS INLINE region
+    // #pragma HLS INLINE recursive
```
Contributor:

This is already disabled, and now remains disabled but with a different action. Why?

Contributor (Author):

It was enabled and disabled while changing things around. Though, since INLINE region doesn't exist in Vitis, the current one seems to make a bit more sense...

```diff
-            node.get_attr('in_width') * node.get_attr('n_chan') * node.get_attr('filt_width') / mult_params['reuse']
+    if is_pointwise_parallel_latency and n_partitions == 1:
+        mult_params['n_in'] = (
+            node.get_attr('in_width') * node.get_attr('n_chan') * node.get_attr('filt_width') // n_partitions
```
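The new n_in computation can be sanity-checked with a toy pointwise example (values are illustrative; the real ones come from `node.get_attr(...)`):

```python
# Illustrative pointwise-conv dimensions (filt_width is 1 for pointwise)
in_width, n_chan, filt_width = 16, 3, 1
n_partitions = 1  # the only case the new code path currently allows

# New code floor-divides by n_partitions instead of the reuse factor
n_in = in_width * n_chan * filt_width // n_partitions
assert n_in == 48  # with n_partitions == 1, the full input feeds one kernel
```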
Contributor:

So basically, this change now forces the parallelization to be based on n_partitions (in a sense it is PF) and not reuse_factor. I agree it is more intuitive, do we have to follow up in any docs?

Contributor (Author):

The current behavior of replacing pf with rf is an undocumented change. This only aligns it back to the standard convolution behavior (albeit still not fully working). With the last pragma change it may be fixed, but I will need to check again.

```diff
-    if is_pointwise_parallel_latency:
-        mult_params['n_in'] = int(
-            node.get_attr('in_width') * node.get_attr('n_chan') * node.get_attr('filt_width') / mult_params['reuse']
+    if is_pointwise_parallel_latency and n_partitions == 1:
```
Contributor:

Doesn't this restrict the whole function to a single (fully parallel) case?

Also, if we ever fix the HLS code to make it work for any combination of PF/RF, will this line be the only change we need to make on the Python side? If so, maybe document it.

Contributor (Author):

This does restrict it to the fully parallel case, since otherwise the performance was worse than the standard implementation. Will need to check again with the last pragma change...

Contributor:

I thought that `operation` instead of `function` would actually cause the synthesis to fail, not the pragma to be ignored.

Contributor (Author):

In vivado_hls it fails, but in Vitis it just ignores the pragma.
