
Commit 4518537

jmitrevs, laurilaatu, and JanFSchulte authored
Beginnings of the oneAPI backend (#955)
* snapshot adding oneapi
* fix reduce constexpr
* further updates
* update the bridge and testbench
* fix issues discovered when compiling
* update bridge writing files
* build library (but not tested)
* fix a bug in testbench
* snapshot after some debugging
* remove forgotten debug printing
* add build
* pre-commit fixes
* fix more pre-commit
* fix more pre-commit errors
* snapshot of work before reworking types
* Use using to decide array type, some preliminary updates
* snapshot unifying types
* fix the testbench and bridge
* snapshot updating nnet_utils (not finished)
* define array in nnet_types for oneAPI
* fix parallel conv2d
* add back the streaming versions of algs, most unconverted
* tentatively complete streaming for dense but not functional
* first version that compiles streaming
* change how the pipe value type is extracted
* fix pre-commit error
* always treat elu as ELU class
* fix batchnorm
* snapshot towards fixing conv
* snapshot fixing test for streaming
* fix conv1d
* fix conv2d
* fix reshape and flatten for oneAPI
* initial oneAPI tests
* remove nnet_dense_compressed from oneAPI
* add merge functionality (untested)
* fix merge for oneAPI
* fix merge for oneAPI (missing commit)
* add zeropadding
* standardize paralellization spelling
* fix pointwise for oneAPI
* remove references to quartus
* more replace quartus with oneapi
* snapshot on the way towards implementing pooling
* fix io_stream pooling for oneAPI
* add fix for Conv2DBatchnorm
* accidentally committed CMakeLists.txt in my debug setup
* reshaping, not fully tested
* fix cloning of streams
* fix pytest library loading
* remove unused template
* fix some activation bugs
* fix the overwriting of directories in the pytest
* update version of test repository
* try to fix docker issue
* bump hls4ml-testing tag to 0.5.2
* try not restricting tensorflow-model-optimizatoin
* Update to 0.5.3 for testing
* bump to docker image 0.5.4, suggested by Ben
* fix pre-commit warning
* dial down N_TESTS_PER_YAML to 4
* revert tensorflow-model-optimization change
* fix issue of saving in "obsolete" h5 format
* fix embedding for oneAPI
* First attempt at adding RNNs to oneAPI
* fix bug in array size
* fix order or indices
* make queues static in bridge
* fix logic error in repack stream
* changing the style, but functionally identical
* update pointwise optimizer for oneAPI
* add oneAPI to test_multi_dense.py
* fix updating weight types
* initial changes of templates, for testing
* fix weight naming, product selection
* make im2col the default; fix winograd size
* fix up streaming dense and convolution
* fix prelu, some batchnorm
* fix weight array of exponential types
* move ACExponentialPrecisionDefinition to oneapi_types
* attempt to fix batchnorm and recurrent
* fixed BatchNormalizationQuantizedTanhConfigTemplate template selection
* fix embedding_stream
* fix lstm and simple rnn
* fix GRU
* fix winograd, and also disable it by default
* fix threshold name
* split bn_quant to be backend-specific
* add type inference to oneAPI
* add oneAPI to pytorch tests
* fix pooling with padding for oneAPI and Quartus
* Compilation for larger models enabled by increasing -fconstexpr-steps
* add oneapi clone tests; remove reduntand multi_clone test
* remove some attributes to avoid overwrite warnings
* make extra handling for oneAPI like others (as in PR #1067)
* remove warnings for extra optimizers that are not scheduled on purpose
* update parametrized activations
* fix reference to alpha that had not been switched to param
* add oneapi documentation
* add parallelization factor to the attributes for oneAPI

---------

Co-authored-by: Lauri Laatu <l.laatu@imperial.ac.uk>
Co-authored-by: Jan-Frederik Schulte <jschulte@cern.ch>
1 parent 03096cf commit 4518537

101 files changed: +10764 -169 lines changed

docs/advanced/oneapi.rst

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+==============
+oneAPI Backend
+==============
+
+The ``oneAPI`` backend of hls4ml is designed for deploying NNs on Intel/Altera FPGAs. It will eventually
+replace the ``Quartus`` backend, which should really have been called the Intel HLS backend. (The actual Quartus
+program continues to be used with IP produced by the ``oneAPI`` backend.)
+This section discusses details of the ``oneAPI`` backend.
+
+The ``oneAPI`` code uses SYCL kernels to implement the logic that is deployed on FPGAs. It naturally leads to the
+accelerator style of programming. In the IP Component flow, which is currently the only flow supported, the
+kernel becomes the IP, and the "host code" becomes the testbench. An accelerator flow, with easier deployment on
+PCIe accelerator boards, is planned to be added in the future.
+
+The produced work areas use cmake to build the projects in a style based on
+`oneAPI-samples <https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C%2B%2BSYCL_FPGA>`_.
+The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` are supported. Additionally, ``make lib``
+produces the library used for calling the ``predict`` function from hls4ml. The ``compile`` and ``build`` commands
+in hls4ml interact with the cmake system, so one does not need to manually use the build system, but it is there
+if desired.
+
+The ``oneAPI`` backend, like the ``Quartus`` backend, only implements the ``Resource`` strategy for the layers. There
+is no ``Latency`` implementation of any of the layers.
+
+Note: currently tracing and external weights (i.e. setting BramFactor) are not supported.
+
+io_parallel and io_stream
+=========================
+
+As mentioned in the :ref:`I/O Types` section, ``io_parallel`` is for small models, while ``io_stream`` is for
+larger models. In ``oneAPI``, there is an additional difference: ``io_stream`` implements each layer on its
+own ``task_sequence``. Thus, the layers run in parallel, with pipes connecting the inputs and outputs. This
+is similar in style to the `dataflow` implementation on Vitis, but more explicit. On the other hand, ``io_parallel``
+always uses a single task, relying on pipelining within the task for good performance. In contrast, the Vitis
+backend sometimes uses dataflow with ``io_parallel``.
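
The difference between the two I/O types described in the documentation above can be illustrated with a small host-side analogy. The sketch below is not hls4ml or SYCL code: it uses Python threads and queues as stand-ins for ``task_sequence`` objects and pipes, and the helper names (``run_layer``, ``run_streaming``, ``run_single_task``) are hypothetical. It only shows the structure: one concurrent task per layer joined by pipes, versus a single task that applies every layer in turn.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def run_layer(layer_fn, pipe_in, pipe_out):
    """Read values from pipe_in, apply layer_fn, and write results to pipe_out."""
    while True:
        item = pipe_in.get()
        if item is _DONE:
            pipe_out.put(_DONE)  # forward the end-of-stream marker
            return
        pipe_out.put(layer_fn(item))


def run_streaming(layers, inputs):
    """io_stream-like structure: one concurrent task per layer, joined by pipes."""
    pipes = [queue.Queue() for _ in range(len(layers) + 1)]
    tasks = [
        threading.Thread(target=run_layer, args=(fn, pipes[i], pipes[i + 1]))
        for i, fn in enumerate(layers)
    ]
    for task in tasks:
        task.start()
    for value in inputs:
        pipes[0].put(value)
    pipes[0].put(_DONE)

    outputs = []
    while True:
        item = pipes[-1].get()
        if item is _DONE:
            break
        outputs.append(item)
    for task in tasks:
        task.join()
    return outputs


def run_single_task(layers, inputs):
    """io_parallel-like structure: a single task applies every layer in sequence."""
    outputs = []
    for value in inputs:
        for fn in layers:
            value = fn(value)
        outputs.append(value)
    return outputs


# Three toy "layers": scale, shift, and a ReLU-style clamp.
layers = [lambda x: 2 * x, lambda x: x + 1, lambda x: max(x, 0)]
inputs = [-3, 0, 4]
print(run_streaming(layers, inputs))
print(run_single_task(layers, inputs))
```

Both functions compute the same result. What differs is where the concurrency lives: between layers in the streaming version, and (on real hardware) inside the single pipelined task in the other.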

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 
 advanced/fifo_depth
 advanced/extension
+advanced/oneapi
 advanced/accelerator
 advanced/model_optimization
 

hls4ml/backends/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,6 @@
 from hls4ml.backends.backend import Backend, get_available_backends, get_backend, register_backend # noqa: F401
 from hls4ml.backends.fpga.fpga_backend import FPGABackend # noqa: F401
+from hls4ml.backends.oneapi.oneapi_backend import OneAPIBackend
 from hls4ml.backends.quartus.quartus_backend import QuartusBackend
 from hls4ml.backends.symbolic.symbolic_backend import SymbolicExpressionBackend
 from hls4ml.backends.vivado.vivado_backend import VivadoBackend
@@ -16,3 +17,4 @@
 register_backend('Quartus', QuartusBackend)
 register_backend('Catapult', CatapultBackend)
 register_backend('SymbolicExpression', SymbolicExpressionBackend)
+register_backend('oneAPI', OneAPIBackend)
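
The registration pattern used in this file can be sketched in a few lines. The following is a simplified, self-contained illustration of a name-to-backend registry, not the actual code in ``hls4ml.backends.backend``. The function names mirror those imported above, but the case-insensitive lookup, the duplicate check, and the ``OneAPIBackendSketch`` class are assumptions made for this example.

```python
_backend_registry = {}  # maps lower-cased backend name -> backend instance


class Backend:
    """Minimal stand-in for a backend base class (hypothetical)."""

    def __init__(self, name):
        self.name = name


class OneAPIBackendSketch(Backend):
    """Placeholder for a concrete backend; not the real OneAPIBackend."""

    def __init__(self):
        super().__init__('oneAPI')


def register_backend(name, backend_cls):
    """Instantiate backend_cls and store it under a case-insensitive name."""
    key = name.lower()
    if key in _backend_registry:
        raise ValueError(f'Backend {name} already registered')
    _backend_registry[key] = backend_cls()


def get_backend(name):
    """Look up a registered backend instance by name."""
    return _backend_registry[name.lower()]


def get_available_backends():
    """Return the sorted list of registered backend keys."""
    return sorted(_backend_registry)


register_backend('oneAPI', OneAPIBackendSketch)
print(get_backend('ONEAPI').name)
print(get_available_backends())
```

Registering at import time, as the diff does, means that importing the package is enough to make the new backend selectable by name.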

hls4ml/backends/oneapi/__init__.py

Whitespace-only changes.
