fastmachinelearning
diff --git a/docs/advanced/oneapi.rst b/docs/advanced/oneapi.rst
Lines changed: 35 additions & 0 deletions
diff --git a/docs/index.rst b/docs/index.rst
Lines changed: 1 addition & 0 deletions
diff --git a/hls4ml/backends/__init__.py b/hls4ml/backends/__init__.py
Lines changed: 2 additions & 0 deletions
diff --git a/hls4ml/backends/fpga/passes/bn_quant.py renamed to hls4ml/backends/catapult/passes/bn_quant.py
diff --git a/‎hls4ml/backends/oneapi/__init__.py b/‎hls4ml/backends/oneapi/__init__.py
@@ -0,0 +1,35 @@
+==============
+oneAPI Backend
+==============
+
+The ``oneAPI`` backend of hls4ml is designed for deploying neural networks (NNs) on Intel/Altera FPGAs. It will eventually
+replace the ``Quartus`` backend, which should really have been called the Intel HLS backend. (The actual Quartus
+program continues to be used with IP produced by the ``oneAPI`` backend.)
+This section discusses details of the ``oneAPI`` backend.
+
+The ``oneAPI`` code uses SYCL kernels to implement the logic that is deployed on FPGAs. It naturally leads to the
+accelerator style of programming. In the IP Component flow, which is currently the only flow supported, the
+kernel becomes the IP, and the "host code" becomes the testbench. An accelerator flow, with easier deployment on
+PCIe accelerator boards, is planned for a future release.
+
+The produced work areas use cmake to build the projects in a style based on
+`oneAPI-samples <https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C%2B%2BSYCL_FPGA>`_.
+The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` make targets are supported. Additionally, ``make lib``
+produces the library used for calling the ``predict`` function from hls4ml. The ``compile`` and ``build`` commands
+in hls4ml interact with the cmake system, so one does not need to use the build system manually, but it is there
+if desired.
+
+The ``oneAPI`` backend, like the ``Quartus`` backend, only implements the ``Resource`` strategy for the layers. There
+is no ``Latency`` implementation of any of the layers.
+
+Note: tracing and external weights (i.e., setting ``BramFactor``) are currently not supported.
+
+io_parallel and io_stream
+=========================
+
+As mentioned in the :ref:`I/O Types` section, ``io_parallel`` is for small models, while ``io_stream`` is for
+larger models. In ``oneAPI``, there is an additional difference: ``io_stream`` implements each layer on its
+own ``task_sequence``. Thus, the layers run in parallel, with pipes connecting the inputs and outputs. This
+is similar in style to the ``dataflow`` implementation on Vitis, but more explicit. On the other hand, ``io_parallel``
+always uses a single task, relying on pipelining within the task for good performance. In contrast, the Vitis
+backend sometimes uses dataflow with ``io_parallel``.
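The difference between the two I/O types described above can be illustrated with a rough analogy in plain Python. This is not oneAPI or SYCL code and says nothing about how the backend is actually implemented: threads and queues merely stand in for ``task_sequence`` objects and pipes. The layer functions and the names ``run_single_task`` and ``run_streamed`` are hypothetical, chosen only for this sketch.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def scale(x):
    """Hypothetical first layer."""
    return 2 * x


def shift(x):
    """Hypothetical second layer."""
    return x + 1


LAYERS = [scale, shift]


def run_single_task(inputs, layers=LAYERS):
    """io_parallel-like analogy: one task applies every layer in order."""
    outputs = []
    for x in inputs:
        for layer in layers:
            x = layer(x)
        outputs.append(x)
    return outputs


def _layer_task(layer, pipe_in, pipe_out):
    """One task per layer: read from the input pipe, write to the output pipe."""
    while True:
        item = pipe_in.get()
        if item is _DONE:
            pipe_out.put(_DONE)  # forward the end marker downstream
            return
        pipe_out.put(layer(item))


def run_streamed(inputs, layers=LAYERS):
    """io_stream-like analogy: layers run concurrently, connected by pipes."""
    pipes = [queue.Queue() for _ in range(len(layers) + 1)]
    tasks = [
        threading.Thread(target=_layer_task, args=(layer, pipes[i], pipes[i + 1]))
        for i, layer in enumerate(layers)
    ]
    for task in tasks:
        task.start()
    for x in inputs:
        pipes[0].put(x)
    pipes[0].put(_DONE)

    outputs = []
    while True:
        item = pipes[-1].get()
        if item is _DONE:
            break
        outputs.append(item)
    for task in tasks:
        task.join()
    return outputs
```

Both functions produce the same results; what differs is the structure. In the streamed version a later layer can start on the first sample while an earlier layer is already processing the next one, which is the property the ``io_stream`` description above relies on.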
@@ -24,6 +24,7 @@
 
     advanced/fifo_depth
     advanced/extension
+    advanced/oneapi
     advanced/accelerator
     advanced/model_optimization
 
 
@@ -1,5 +1,6 @@
 from hls4ml.backends.backend import Backend, get_available_backends, get_backend, register_backend  # noqa: F401
 from hls4ml.backends.fpga.fpga_backend import FPGABackend  # noqa: F401
+from hls4ml.backends.oneapi.oneapi_backend import OneAPIBackend
 from hls4ml.backends.quartus.quartus_backend import QuartusBackend
 from hls4ml.backends.symbolic.symbolic_backend import SymbolicExpressionBackend
 from hls4ml.backends.vivado.vivado_backend import VivadoBackend
@@ -16,3 +17,4 @@
 register_backend('Quartus', QuartusBackend)
 register_backend('Catapult', CatapultBackend)
 register_backend('SymbolicExpression', SymbolicExpressionBackend)
+register_backend('oneAPI', OneAPIBackend)
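
With the backend registered under the name ``'oneAPI'``, it can presumably be selected like the other backends when converting a model. The following is a minimal sketch, assuming a Keras model and an installed oneAPI toolchain; the helper name ``convert_with_oneapi`` and the default output directory are hypothetical, and the hls4ml calls are the general conversion API rather than anything specific to this change.

```python
def convert_with_oneapi(keras_model, output_dir="hls4ml_oneapi_prj", io_type="io_parallel"):
    """Convert a Keras model with the oneAPI backend (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require hls4ml.
    import hls4ml

    # Generate a model-level configuration from the Keras model.
    config = hls4ml.utils.config_from_keras_model(keras_model, granularity="model")

    # 'oneAPI' is the name passed to register_backend in the diff above.
    hls_model = hls4ml.converters.convert_from_keras_model(
        keras_model,
        hls_config=config,
        output_dir=output_dir,
        backend="oneAPI",
        io_type=io_type,  # "io_parallel" or "io_stream"
    )

    # compile() drives the generated build system and enables predict().
    hls_model.compile()
    return hls_model
```

A call such as ``convert_with_oneapi(model, io_type="io_stream")`` would then return a model object whose ``predict`` method can be compared against the Keras output.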