A fast C++ implementation of TensorFlow Lite Unet on a Jetson Nano.
Once overclocked to 2015 MHz, the app runs at 11 FPS.
Specially made for a Jetson Nano; see the Q-engineering deep learning examples.
Paper: https://arxiv.org/abs/1606.00915 
Training set: VOC2017 
Size: 257x257 
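A segmentation network of this kind typically returns one score per class for every pixel, and the class map is obtained with a per-pixel argmax. The sketch below is not taken from Unet.cpp. It assumes a row-major height x width x classes float tensor and 21 classes (20 Pascal VOC classes plus background); `ArgmaxMask`, `kSize` and `kClasses` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kSize = 257;    // model edge length, as listed in this README
constexpr int kClasses = 21;  // assumption: 20 Pascal VOC classes + background

// Returns one class index per pixel: the class with the highest score.
// Assumed layout: scores[(y * width + x) * classes + c].
std::vector<std::uint8_t> ArgmaxMask(const std::vector<float>& scores,
                                     int height = kSize, int width = kSize,
                                     int classes = kClasses) {
    assert(height > 0 && width > 0 && classes > 0 && classes <= 256);
    const std::size_t pixels = static_cast<std::size_t>(height) * width;
    assert(scores.size() == pixels * classes);

    std::vector<std::uint8_t> mask(pixels);
    for (std::size_t px = 0; px < pixels; ++px) {
        const float* s = &scores[px * classes];
        int best = 0;
        for (int c = 1; c < classes; ++c) {
            if (s[c] > s[best]) best = c;  // keep the first maximum on ties
        }
        mask[px] = static_cast<std::uint8_t>(best);
    }
    return mask;
}
```

With the default arguments the function expects 257 * 257 * 21 floats and returns 257 * 257 labels, which can then be mapped to colours for display.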
| CPU 2015 MHz | GPU 2015 MHz | CPU 1479 MHz | GPU 1479 MHz | RPi 4 (64-bit OS) 1950 MHz |
|---|---|---|---|---|
| 11 FPS | 9.1 FPS | 9 FPS | 8.3 FPS | 7.2 FPS | 
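This README does not state how the frame rates were measured. One common approach, sketched here as an assumption, is to time a number of inference calls with a steady clock and divide the frame count by the elapsed time. `MeasureFps` and `FpsFromSeconds` are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>
#include <chrono>

// Converts the time spent on one frame (in seconds) to frames per second.
double FpsFromSeconds(double seconds_per_frame) {
    assert(seconds_per_frame > 0.0);
    return 1.0 / seconds_per_frame;
}

// Calls run_once() `frames` times and returns the average frames per second.
// run_once stands in for one complete capture + inference + display cycle.
template <typename F>
double MeasureFps(F&& run_once, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        run_once();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return frames / elapsed.count();
}
```

For orientation, 11 FPS corresponds to roughly 91 ms per frame and 7.2 FPS to roughly 139 ms per frame.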
To run the application, you need:
- The TensorFlow Lite framework installed. Install TensorFlow Lite 
- OpenCV installed (optional). Install OpenCV 4.5 
- Code::Blocks installed. ($ sudo apt-get install codeblocks)
To extract and run the network in Code::Blocks:
$ mkdir MyDir 
$ cd MyDir 
$ wget https://github.com/Qengineering/TensorFlow_Lite_Segmentation_Jetson-Nano/archive/refs/heads/main.zip 
$ unzip -j main.zip 
Remove main.zip, LICENSE and README.md as they are no longer needed. 
$ rm main.zip 
$ rm LICENSE 
$ rm README.md 
 
Your MyDir folder must now look like this: 
cat.jpg.mp4 
deeplabv3_257_mv_gpu.tflite 
TestUnet.cbp 
Unet.cpp
Run TestUnet.cbp with Code::Blocks.
You may need to adapt the library locations specified in TestUnet.cbp to match your directory structure.
With #define GPU_DELEGATE uncommented, TensorFlow Lite will use the GPU delegate, provided you have the appropriate libraries compiled with Bazel. Install GPU delegates 
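The compile-time switch can be pictured roughly as follows. This is a sketch, not the code from Unet.cpp: `UseGpu` and `BackendName` are hypothetical helpers, and the GPU branch assumes the TensorFlow Lite GPU delegate V2 C API (`TfLiteGpuDelegateV2Create`) and its headers are available on your system.

```cpp
#include <string>

// #define GPU_DELEGATE   // uncomment to build with the TensorFlow Lite GPU delegate

#ifdef GPU_DELEGATE
#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"

// Hands the graph to the GPU delegate; returns false if TensorFlow Lite rejects it.
// The delegate must stay alive as long as the interpreter uses it and should be
// released afterwards with TfLiteGpuDelegateV2Delete(delegate).
bool UseGpu(tflite::Interpreter& interpreter, TfLiteDelegate*& delegate) {
    const TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
    delegate = TfLiteGpuDelegateV2Create(&options);
    return interpreter.ModifyGraphWithDelegate(delegate) == kTfLiteOk;
}
#endif

// Name of the backend selected at compile time.
std::string BackendName() {
#ifdef GPU_DELEGATE
    return "GPU delegate";
#else
    return "CPU";
#endif
}
```

With the define left commented out, the block compiles without any TensorFlow Lite GPU headers and `BackendName()` reports the CPU path.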
See the RPi 4 movie at: https://www.youtube.com/watch?v=Kh9DLMgCIIE

