Commits
80 commits
b7602c3
Initial version of VitisAccelerator backend:
axiotisk Jun 1, 2023
6f596d2
fixing discrepancies post-merge
alex-yang-upenn May 16, 2024
448f342
reverting unnecessary changes
alex-yang-upenn May 16, 2024
bc7d5b0
final adjustments
alex-yang-upenn May 16, 2024
f8d74cf
minor fixes and testing notebook
alex-yang-upenn May 17, 2024
f225845
minor fixes
alex-yang-upenn May 17, 2024
b03e603
Updated host code and added more board support
alex-yang-upenn May 17, 2024
3d04799
cleaned up c++ code generation and added build functionality
alex-yang-upenn May 19, 2024
2e732d0
Added ability to use numpy array as I/O + CNN fixes
alex-yang-upenn May 24, 2024
a42816a
Optimizations for reading dat + copytree bugfix
alex-yang-upenn May 25, 2024
5d1c457
updated testing notebook
alex-yang-upenn May 25, 2024
ec58521
Cleaned-up host code + improved .dat generation
alex-yang-upenn May 28, 2024
6a5626c
fixed testing notebook
alex-yang-upenn May 28, 2024
08fbbe2
build() signature alignment + xcl update + write_host() overwrite
alex-yang-upenn Jun 3, 2024
402016f
Fix VCK5000 part definition
Jun 13, 2024
2dcad61
Documentation draft
axiotisk Jun 14, 2024
c657a48
Default directives + HLS Clock control
alex-yang-upenn Jun 19, 2024
a00f7a2
implementing hw quant option
alex-yang-upenn Jun 28, 2024
2be1e4f
Update makefile
Jun 13, 2024
2a0a52e
Fix vck5000 detection in makefile
Jun 13, 2024
76e3bc7
Remove messageDb from config file now that it is handled in makefile
Jun 13, 2024
ee47d33
build dir name + versal packaging + ultraclean
alex-yang-upenn Jun 21, 2024
950c0ed
minor fixes
alex-yang-upenn Jun 28, 2024
376c982
Fix Makefile template and Makefile generation
Jul 1, 2024
70c092e
Python black formating
Jul 1, 2024
b89ee78
Apply pre-commit suggested changes (formating)
Jul 1, 2024
ab9fc69
Update manifest and remove developpement requirement.txt
Jul 1, 2024
7137cc0
Update documentation.
Jul 1, 2024
90aee77
Documentation update
alex-yang-upenn Jul 1, 2024
1dba672
fixing build() behavior + documentation
alex-yang-upenn Jul 1, 2024
884aa3f
Whitespace cleanup
Jul 2, 2024
7fe12ed
Fix missing parameter in create_initial_config() (due to rebase)
Jul 2, 2024
aabea2d
Remove duplication in documentation
axiotisk Jul 4, 2024
d5e7619
Fix pre-commit
axiotisk Jul 5, 2024
4936c55
fixing spacing in generated code
alex-yang-upenn Jul 6, 2024
7c288bc
Fix typo
Jul 9, 2024
7a27c89
Update bulild():
Jul 9, 2024
cb98a48
Add a target parameter to hardware_predict()
Jul 9, 2024
ac5cd60
Update documentation.
Jul 10, 2024
737d750
Setup emu in Makefile and edit tb_input_features in host
axiotisk Jul 19, 2024
e340f79
Backend and Makefile fixes for emulation
axiotisk Jul 23, 2024
9b57d85
Update host code for clarity & better data handling
alex-yang-upenn Aug 2, 2024
83dd02b
Allowing flexibility with platforms
alex-yang-upenn Aug 7, 2024
98a40da
VitisAccelerator Host code refactor:
Jan 12, 2025
dc4df5c
fix(vitis_backend.py): command variable not initialized on hardware_p…
Djokzer Jun 4, 2025
8a43364
feat[shared_library]: started a predict function in host code
Djokzer Jun 18, 2025
2313999
feat[shared_library]: Can load test data from python to the host code
Djokzer Jun 19, 2025
cc919c7
feat[shared_library]: shared library works in python but not with jup…
Djokzer Jun 19, 2025
f3ca189
feat[hardware_predict]: add shared lib way in the hardware predict fu…
Djokzer Jun 19, 2025
e219e21
fix[hardware_predict]: Check if shared lib exists
Djokzer Jun 23, 2025
740396b
fix[hardware_predict]: Fixed hardcoded output directory
Djokzer Jun 23, 2025
d3529d1
feat[hardware_predict] : add debug print
Djokzer Jun 23, 2025
ca92022
fix[debug]: fixed a print error
Djokzer Jun 23, 2025
f8961bc
fix[shared_library] : Print on a file, because print don't work with …
Djokzer Jun 25, 2025
c688a5d
fix[hardware_predict]: Force x array to float64
Djokzer Jun 25, 2025
e38ad31
Fix[hardware_predict]: Fixed working directory changing
Djokzer Jun 25, 2025
638bc57
feat[shared_lib]: small opti for input buffer to vector
Djokzer Jun 25, 2025
06bce48
fix[hardware_predict]: file based makefile, fix make run
Djokzer Jun 25, 2025
25cc3cc
feat[platforms]: more user friendly platform selection
Djokzer Jul 15, 2025
d490466
fix[platforms] : fixed typo on print
Djokzer Jul 15, 2025
818d04f
fix[hardware_predict]: Fixed rebuild when hardware predict, changed d…
Djokzer Jul 22, 2025
4a30e67
fix[hardware_predict]: fixed x input for file based
Djokzer Jul 22, 2025
0d5d61c
Improves commande line help message
Jul 29, 2025
0d4b407
Enable dynamic batch size for io_stream wrapper
Jul 29, 2025
d74c3b7
Enable dynmamic batch size for io_parallel
Jul 29, 2025
1e59bb5
fix[io_parralel]: fixed dynamic batchsize for io_parallel
Djokzer Aug 4, 2025
45b6c60
fix[io_parallel]: override nnet utils with vitis backend
Djokzer Aug 18, 2025
5eb0247
docs[vitis_accel]: updated docs
Djokzer Aug 18, 2025
d0496d6
docs[vitis_accel]: updated example
Djokzer Aug 18, 2025
6cff973
docs[vitis_accel]: add platform selection
Djokzer Aug 25, 2025
b964d59
docs[vitis_accel]: add note about io_type
Djokzer Aug 25, 2025
abf0669
docs[vitis_accel]: formatting and typo fix
Djokzer Aug 25, 2025
50496be
Update doc
Aug 26, 2025
bedef86
Fix pre-commit formating
Aug 26, 2025
b0979dc
fix[feature_check]: align warning with vitis backend
Djokzer Aug 27, 2025
5d5854e
Make HGQ aware of VitisAccelerator backend.
Sep 8, 2025
c059020
fix[vitis_accel_writer]: fix issue with input shape following rebase
Glegouic Mar 30, 2026
1dbca20
fix[da4ml]: made da4ml aware of VitisAccelerator backend
Glegouic Mar 30, 2026
7100210
fix[io_stream]: fix insufficient reading of output stream
Glegouic Mar 30, 2026
6ab94d4
refactor[vitis_accelerator_writer] : cleaned up override function call
Djokzer Apr 1, 2026
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -7,3 +7,5 @@ recursive-include hls4ml *.py
recursive-include hls4ml/contrib *
global-exclude .git .gitmodules .gitlab-ci.yml *.pyc
include hls4ml/backends/vivado_accelerator/supported_boards.json
include hls4ml/backends/vitis_accelerator/supported_boards.json
include hls4ml/backends/vitis_accelerator/vivado_directives.json
153 changes: 153 additions & 0 deletions docs/backend/accelerator.rst
@@ -75,3 +75,156 @@ The ``predict`` method will send the input data to the PL and return the output

    nn = NeuralNetworkOverlay('hls4ml_nn.bit', X_test.shape, y_test.shape)
    y_hw, latency, throughput = nn.predict(X_test, profile=True)

================
VitisAccelerator
================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
The Vitis accelerator backend has been tested with the following boards:

* `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
* `Alveo u55c <https://www.xilinx.com/products/boards-and-kits/alveo/u55c.html>`_
* `Alveo u250 <https://www.xilinx.com/products/boards-and-kits/alveo/u250.html>`_
* `Versal vck5000 <https://www.xilinx.com/products/boards-and-kits/vck5000.html>`_

Options
=======

As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
* ``num_thread``: Number of host threads used to exercise the kernels in the host application.
* ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend offers the following options to customize the implementation:

* ``board``: The target board, must match one entry in ``supported_boards.json``.
* ``clock_period``: The target clock period in ns.
* ``hw_quant``: Whether arbitrary-precision quantization is performed in hardware. If ``True``, quantization is performed in hardware and floats are used at the kernel interface; otherwise it is performed in software and arbitrary-precision types are used at the interface. (Defaults to ``False``.)
* ``vivado_directives``: A list of strings to be added under the ``[Vivado]`` section of the generated ``accelerator_card.cfg`` link configuration file. Can be used to add custom directives to the Vivado project.


The backend also supports the global option ``io_type``, which controls how input/output data is transferred between the FPGA memory banks and the model.

**Note:** ``io_stream`` may fail for very large inputs, while ``io_parallel`` can have issues with large convolutional models.

Platform selection
==================

The Vitis System Design Flow requires a platform (``.xpfm``) describing the hardware and runtime environment.
The backend always retrieves all installed platforms using ``platforminfo``.

* If a ``platform`` argument is provided, it will try to use that platform.
* If no ``platform`` is given, the backend will use the ``board`` argument to select a default platform.
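
The selection logic above can be sketched as follows. This is an illustrative stand-in, not the backend's actual code: the helper name ``select_platform`` and the data layout are hypothetical, with the board-to-platform mapping mirroring entries in ``supported_boards.json``.

```python
def select_platform(installed, board_defaults, board, platform=None):
    """Pick an .xpfm platform name from the list of installed platforms.

    installed: platform names as reported by `platforminfo` (simulated here).
    board_defaults: mapping of board name -> list of default platform names.
    """
    if platform is not None:
        # An explicit platform was requested: use it if it is installed.
        if platform in installed:
            return platform
        raise ValueError(f"Requested platform '{platform}' is not installed")
    # No explicit platform: fall back to the board's default platform(s).
    for candidate in board_defaults.get(board, []):
        if candidate in installed:
            return candidate
    raise ValueError(f"No installed platform found for board '{board}'")


installed = ["xilinx_u55c_gen3x16_xdma_3_202210_1"]
defaults = {"alveo-u55c": ["xilinx_u55c_gen3x16_xdma_3_202210_1"]}
print(select_platform(installed, defaults, board="alveo-u55c"))
# xilinx_u55c_gen3x16_xdma_3_202210_1
```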


Kernel wrapper
==============

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking rather than general inference use cases (see :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.


Build workflow
==============

At the call of the ``build`` method, the following options affect the build process:

* ``reset``: If True, clears files generated during previous build processes (equivalent to ``make clean`` in the build folder).
* ``target``: Can be one of ``hw``, ``hw_emu``, ``sw_emu``, to define which build target to use (default is ``hw``).
* ``debug``: If True, compiles the C++ host code and the HLS kernel in debug mode.

Once the project is generated, it is possible to run the build steps manually by using one of the following ``make`` targets in the generated project directory:

* ``host``: Compiles the host application.
* ``hls``: Produces only the kernel's object file.
* ``xclbin``: Produces only the kernel's ``.xclbin`` file.
* ``clean``: Removes all generated files.
* ``run``: Runs the host application using the ``.xclbin`` file and the input data present in ``tb_data/tb_input_features.dat``.

It is also possible to run the full build process by calling ``make`` without any target. Modifications to the ``accelerator_card.cfg`` file can be made manually before running the build process (e.g., to change the clock period or to add additional ``.xo`` kernels to the build).

Hardware Inference workflow
===========================

The host code can also be used for inference directly in a Python script through the ``hardware_predict`` method:

* ``target``: Can be one of ``hw``, ``hw_emu``, ``sw_emu``, to define which build target to use (default is ``hw``).
* ``debug``: If True, uses the C++ host code compiled in debug mode.
* ``profilingRepeat``: Number of times to repeat the inference for profiling (default is -1, no repeat).
* ``method``: Can be ``file`` or ``lib``, to define how the host code is called (default is ``lib``). With ``file``, the host code runs as a separate process and data is exchanged through files; with ``lib``, the host code is called as a shared library.

Example
=======

The following example is a modified version of `hls4ml example 7 <https://github.com/fastmachinelearning/hls4ml-tutorial/blob/master/part7_deployment.ipynb>`_.

.. code-block:: Python

    import hls4ml
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='model_3/hls4ml_prj_vitis_accel',
        backend='VitisAccelerator',
        board='alveo-u55c',
        num_kernel=4,
        num_thread=8,
        batchsize=8192,
        hw_quant=False,
        vivado_directives=["prop=run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=Explore"]
    )

    build = True  # Change to False to skip the build step
    if build:
        hls_model.compile()
        hls_model.build()

    y = hls_model.hardware_predict(x)  # Limited to batchsize * num_kernel * num_thread for now

C++ Host code
=============

This section describes the C++ host application provided alongside the Python interface.
It serves as a low-level example of how to interact with the generated model directly from C++, and is particularly useful for benchmarking and performance evaluation on FPGA hardware.

Once built, the host program can be executed to load the FPGA and run inferences:

.. code-block:: Bash

    ./host

Compared to the Python ``hardware_predict`` method, this C++ host code offers a more efficient way to benchmark execution time on the board.

If the FPGA contains multiple Computing Units (CUs - instances of the model), the program can take advantage of multithreading to access them in parallel.
It also supports multiple threads per CU to increase throughput by overlapping data transfer, computation, and result retrieval.
The batch size can be set dynamically for each inference.

The program reads input from an ASCII file containing space-separated values, one line per model input.
By default, the Python code generates input and reference (golden) output files if test data are provided when creating the model.
If not, you can generate such files manually, for example using ``numpy.savetxt``.
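
For example, a compatible input file could be generated with NumPy. The file name matches the default used by ``make run``; the input width of 16 is arbitrary, chosen for illustration:

```python
import numpy as np

# One line per input sample, space-separated ASCII values,
# as expected by the host program.
X = np.random.rand(128, 16).astype(np.float32)
np.savetxt("tb_input_features.dat", X, fmt="%f", delimiter=" ")

# Reading the file back verifies the round trip.
X_loaded = np.loadtxt("tb_input_features.dat")
print(X_loaded.shape)  # (128, 16)
```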

The generated host code application supports the following options to tweak the execution:

* ``-d``: Device BDF to use (can be specified multiple times).
* ``-x``: XCLBIN path (defaults to the relative path of the generated XCLBIN).
* ``-b``: Batch size (defaults to the value specified during model creation).
* ``-i``: Input feature file.
* ``-o``: Output feature file.
* ``-c``: Maximum number of computing units to use.
* ``-n``: Number of worker threads per CU.
* ``-r``: Number of repetitions of the input feature file (useful for artificially increasing dataset size during benchmarking).
* ``-v``: Enable verbose output.
* ``-h``: Print help.

By default, all available CUs on all compatible devices will be used, with three worker threads per CU.
The following command limits execution to a single device, one CU, and one worker thread:

.. code-block:: Bash

    ./host -d 0000:c1:00.1 -c 1 -n 1
21 changes: 21 additions & 0 deletions docs/ir/modelgraph.rst
@@ -102,3 +102,24 @@ The trace method is an advanced version of the ``predict`` method. It's used to

    #We also support a similar function for keras
    keras_trace = hls4ml.model.profiling.get_ymodel_keras(keras_model, X)

----

.. _hardware_predict-method:

``hardware_predict`` method
===========================

A specialized version of the ``predict`` method for the ``VitisAccelerator`` backend, available after a successful build. It runs the project on the FPGA and obtains predictions for the supplied numpy array.

**Note:** The host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.
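
One way to work around this limit is to split the input into chunks of at most ``batchsize * num_cu * num_buffer`` samples and concatenate the per-chunk predictions. The sketch below is illustrative, not part of the API: the helper ``predict_in_chunks`` is hypothetical, and a doubling lambda stands in for ``hls_model.hardware_predict``.

```python
import numpy as np

def predict_in_chunks(run_chunk, X, max_samples):
    # Call the inference function on slices no larger than max_samples
    # and stitch the results back together in order.
    outputs = [run_chunk(X[i : i + max_samples]) for i in range(0, len(X), max_samples)]
    return np.concatenate(outputs)

# Stand-in for hardware inference: doubles each sample.
X = np.arange(10.0).reshape(10, 1)
y = predict_in_chunks(lambda chunk: chunk * 2.0, X, max_samples=4)
print(y.shape)  # (10, 1)
```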

An optional ``target`` argument can be used to specify the target emulation mode (``hw``, ``sw_emu``, ``hw_emu``) to run the project on. The default is ``hw``.

.. code-block:: python

    # Suppose that you already have input array X
    # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
    # .xclbin file is successfully created, before using hardware_predict

    y = hls_model.hardware_predict(X)
4 changes: 4 additions & 0 deletions hls4ml/backends/__init__.py
@@ -5,19 +5,23 @@
from hls4ml.backends.plugin_loader import load_backend_plugins
from hls4ml.backends.quartus.quartus_backend import QuartusBackend
from hls4ml.backends.symbolic.symbolic_backend import SymbolicExpressionBackend
from hls4ml.backends.vitis_accelerator.vitis_accelerator_config import VitisAcceleratorConfig # noqa: F401
from hls4ml.backends.vivado.vivado_backend import VivadoBackend
from hls4ml.backends.vivado_accelerator.vivado_accelerator_backend import VivadoAcceleratorBackend
from hls4ml.backends.vivado_accelerator.vivado_accelerator_config import VivadoAcceleratorConfig # noqa: F401

from hls4ml.backends.catapult.catapult_backend import CatapultBackend # isort: skip

from hls4ml.backends.vitis.vitis_backend import VitisBackend # isort: skip
from hls4ml.backends.vitis_accelerator.vitis_accelerator_backend import VitisAcceleratorBackend # isort: skip



def _register_builtin_backends():
    register_backend('Vivado', VivadoBackend)
    register_backend('VivadoAccelerator', VivadoAcceleratorBackend)
    register_backend('Vitis', VitisBackend)
    register_backend('VitisAccelerator', VitisAcceleratorBackend)
    register_backend('Quartus', QuartusBackend)
    register_backend('Catapult', CatapultBackend)
    register_backend('SymbolicExpression', SymbolicExpressionBackend)
2 changes: 1 addition & 1 deletion hls4ml/backends/fpga/passes/hgq_proxy_model.py
@@ -27,7 +27,7 @@ def generate_mask_fn(
) -> str:
"""Generate heterogenous quantization mask function, ONLY works for IOType=io_parallel"""
assert k.shape[0] == b.shape[0] == i.shape[0] == 1
assert backend.lower() in ('oneapi', 'quartus', 'vivado', 'vitis'), f'Backend {backend} not tested'
assert backend.lower() in ('oneapi','quartus', 'vivado', 'vitis', 'vitisaccelerator'), f'Backend {backend} not tested'
Ks, Bs, Is = k[0], b[0], i[0]
Ks, Bs, Is = np.broadcast_to(Ks, shape), np.broadcast_to(Bs, shape), np.broadcast_to(Is, shape)
Ks, Bs, Is = Ks.ravel(), Bs.ravel(), Is.ravel()
Empty file.
Empty file.
35 changes: 35 additions & 0 deletions hls4ml/backends/vitis_accelerator/passes/feature_check.py
@@ -0,0 +1,35 @@
from hls4ml.model.optimizer import OptimizerPass


class ValidateConvImplementation(OptimizerPass):
    def match(self, node):
        return 'Conv' in node.class_name

    def transform(self, model, node):
        if node.get_attr('implementation', 'linebuffer') == 'encoded':
            print(
                f'WARNING: "Encoded" implementation in "{node.name}" ({node.class_name}) is not supported in Vitis backend. '
                'Switching to "LineBuffer" implementation.'
            )
            node.set_attr('implementation', 'linebuffer')


class ValidateStrategy(OptimizerPass):
    _resource_layer_cls = ['Conv1D', 'Conv2D', 'Dense']

    def match(self, node):
        is_resource_layer = len([layer_cls for layer_cls in self._resource_layer_cls if layer_cls in node.class_name]) > 0
        is_resource_strategy = node.model.config.is_resource_strategy(node)

        return is_resource_layer and is_resource_strategy

    def transform(self, model, node):
        n_in, _ = model.config.backend.get_layer_mult_size(node)
        rf = node.get_attr('reuse_factor')
        if rf > n_in and rf % n_in > 0:
            print(
                f'WARNING: "Resource" strategy in "{node.name}" ({node.class_name}) may have suboptimal QoR in Vitis '
Contributor:

    Is this still an issue? For the regular Vitis backend Resource now works fine.

Reply:

    Thanks for pointing this out! I checked and the regular Vitis backend still has this warning in its feature_check. To keep consistency between backends, I just updated the code here to replicate the current Vitis behavior.

                'backend due to use of "urem" cores in Vitis HLS <= 2022.1.\n'
                'Consider using a different ReuseFactor or switching to "Latency" strategy if using older versions '
                'of Vitis HLS.'
            )
26 changes: 26 additions & 0 deletions hls4ml/backends/vitis_accelerator/supported_boards.json
@@ -0,0 +1,26 @@
{
    "alveo-u55c": {
        "board_type": "alveo",
        "part": "xcu55c-fsvh2892-2L-e",
        "platform": ["xilinx_u55c_gen3x16_xdma_3_202210_1"],
        "memory": {"type": "hbm", "channels": 32, "capacity": 16}
    },
    "alveo-u50": {
        "board_type": "alveo",
        "part": "xcu50-fsvh2104-2-e",
        "platform": ["xilinx_u50_gen3x16_xdma_5_202210_1"],
        "memory": {"type": "hbm", "channels": 32, "capacity": 8}
    },
    "alveo-u250": {
        "board_type": "alveo",
        "part": "xcu250-figd2104-2L-e",
        "platform": ["xilinx_u250_xdma_201830_2"],
        "memory": {"type": "ddr", "channels": 4, "capacity": 64}
    },
    "vck5000": {
        "board_type": "versal",
        "part": "xcvc1902-vsvd1760-2MP-e-S",
        "platform": ["xilinx_vck5000_gen4x8_qdma_2_202220_1"],
        "memory": {"type": "ddr", "channels": 3, "capacity": 12}
    }
}