- Multiple devices support
- Selection of device by BDF
- OpenCL error checking
- Automatic memory bank association
- Inferences validation
- Improved command line parameters
- Improved debug output
- Dummy buffer copy to avoid benchmarking buffer allocation time
- Removal of mutexes that prevented buffer copies from overlapping with kernel executions on the same CU with multiple workers
- Documentation
docs/backend/accelerator.rst
25 additions & 2 deletions
@@ -132,11 +132,34 @@ Once the project is generated, it possible to run manually the build steps by us
 
 It is also possible to run the full build process by calling ``make`` without any target. Modifications to the ``accelerator_card.cfg`` file can be done manually before running the build process (e.g., to change the clock period, or add an additional ``.xo`` kernel to the build).
 
-The generated host code application and the xclbin file can be executed as such:
+Host code
+=========
+
+Once built, the host program can be run to load the board and perform inferences:
+
+.. code-block:: Bash
+
+    ./host
+
+By default, all Computing Units (CUs) on all compatible devices are used, with 3 worker threads per CU.
+
+The generated host application supports the following options to tweak the execution:
+
+* ``-d``: device BDF to use (can be specified multiple times)
+* ``-x``: XCLBIN path
+* ``-i``: input feature file
+* ``-o``: output feature file
+* ``-c``: maximum number of computing units to use
+* ``-n``: number of worker threads to use
+* ``-r``: number of repetitions of the input feature file (to artificially increase the data size for benchmarking)
+* ``-v``: enable verbose output
+* ``-h``: print help
+
+The following example shows how to limit execution to one device, one CU, and one worker thread:
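A minimal sketch of such an invocation, assuming a placeholder BDF (``0000:01:00.0`` is hypothetical; substitute the address of an installed card) and the ``-d``, ``-c``, and ``-n`` options listed above. The script only prints the assembled command so it can be reviewed before being run:

```shell
#!/bin/sh
# Placeholder PCIe address in domain:bus:device.function form (hypothetical value).
BDF="0000:01:00.0"

# One device (-d), at most one computing unit (-c 1), one worker thread (-n 1).
set -- ./host -d "$BDF" -c 1 -n 1

# Print the assembled command; replace this line with "$@" to actually execute it.
echo "$*"
```

Replacing the final ``echo`` with ``"$@"`` runs the host program with these arguments, provided ``./host`` has been built in the current directory.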