
[gh-pages] RA: Web documentation for the new "deep learning-based human detection exercise" #3140


Merged: 7 commits merged on Jun 27, 2025
2 changes: 1 addition & 1 deletion _pages/exercises.md
@@ -245,7 +245,7 @@ feature_row:
- image_path: /assets/images/exercises/human_detection/human_detection_teaser.jpeg
alt: "Human Detection"
title: "Human Detection"
excerpt: "Develop a DL human detection model to perform inference and benchmarking in real time."
excerpt: "Deep learning-based Human Detection Exercise."
url: "/exercises/ComputerVision/human_detection"
status: "prototype"
order: 2;
77 changes: 0 additions & 77 deletions _pages/exercises/ComputerVision/dl_digit_classifier.md

This file was deleted.

138 changes: 66 additions & 72 deletions _pages/exercises/ComputerVision/human_detection.md
@@ -19,111 +19,92 @@ gallery:
alt: "Human Detection"
title: "Human Detection"

youtubeId1: vn4ahq8mElg
youtubeId1: XC4yJYnX7y4

---

# Human Detection Exercise using Deep Learning
<!-- title -->
# Deep learning-based Human Detection Exercise

A Human Detection exercise to identify the presence of humans and draw a rectangular boundary around each of them. Apart from the live and video inference features, the exercise also includes model benchmarking and model visualization.
The user is expected to upload a deep learning model that fits the required input and output specifications for inference. The uploaded model must be in the ONNX format. The exercise docs provide full guidance, from fine-tuning pre-built object detection models in different frameworks to their subsequent conversion to the ONNX format. For more information refer to the "Exercise Instructions" section below.

{% include gallery caption="Detection Example" %}
<!-- descriptions -->
<p style="text-align:justify;">The primary goal of the human detection exercise is to identify individuals in the video feed from a webcam and to draw a rectangular boundary around each person. This exercise supports live webcam video stream inference, allowing users to observe real-time human detection performance using their own trained models.</p>
<p style="text-align:justify;">Users are expected to upload a deep learning-based object detection model in the <a href="https://onnx.ai/" target="_blank" ><strong>ONNX (Open Neural Network Exchange)</strong></a> format. Users are encouraged to build and train their own human detection models using libraries such as <a href="https://pytorch.org/" target="_blank" ><strong>PyTorch</strong></a> or <a href="https://www.tensorflow.org/" target="_blank" ><strong>TensorFlow</strong></a>. After training, the model must be exported to the ONNX format to ensure compatibility with the exercise environment.</p>

<p style="text-align:justify;">After training, the model must be exported to the ONNX format to ensure compatibility with the exercise environment, and you must use the editor to write Python code that processes input from a live video feed, which is captured using your browser's webcam.</p>

## Launch Instructions
{% include gallery caption="Detection Example" %}

- There are two ways to run the exercise using the web template:

  - Run the exercise with a Docker container
  - Run it without a container
<!-- Note Guide -->
<!-- <br/> -->

### Run with docker container
**Note**: If you haven't, take a look at the [user guide](https://jderobot.github.io/RoboticsAcademy/user_guide/#installation) to understand how the installation is made, how to launch a RoboticsBackend and how to perform the exercises.

- First you need to build the image. Then you need to run a container.
- Note: We are currently facing problems connecting to the model visualizer when running via Docker. We will find a workaround and update the run command accordingly.

## Exercise API
- `GUI.getImage()` - gets the current image from the webcam feed. It can be `None`.
```python
while True:
    image = GUI.getImage()
    if image is not None:
        pass  # rest of the code
```

```
git clone https://github.com/JdeRobot/RoboticsAcademy.git -b master
cd scripts
docker build -t image-name .
docker run -it --name=container_name -p 7164:7164 -p 2303:2303 -p 1905:1905 -p 8765:8765 -p 6080:6080 -p 1108:1108 --device /dev/video0:/dev/video0 jderobot/robotics-backend
```

- On the local machine navigate to 127.0.0.1:7164/ in the browser and choose the desired exercise.
- Click the connect button and wait for some time until an alert appears with the message Connection Established and the button displays Connected.
- The exercise can be used after the alert.
- It is necessary to map the port where the camera is located to the docker container.
- For Ubuntu: The port to map will be in /dev/videoX; you should check the number where your camera is connected, for example /dev/video0.
- For MacOs and Windows: A number of configurations must be made in order to map the ports. You can visit this [documentation](https://medium.com/@jijupax/connect-the-webcam-to-docker-on-mac-or-windows-51d894c44468) for it.
- The docker run command above includes the `--net=host` option. This is essential for opening the Model Visualizer in the exercise; it tells Docker to use the host's network stack for the container.

### Run without docker container

The following dependencies should be pre-installed:
- Python 3 or later
- Python dependencies
- OpenCV
- onnxruntime
- WebsocketServer

- Clone the Robotics Academy repository to your local machine, switch to the master branch and head over to the Human_Detection exercise.
```
git clone https://github.com/JdeRobot/RoboticsAcademy.git && cd RoboticsAcademy && git checkout master
```

- Determine your machine's DNS server IP address, which is generally of the form **127.0.0.xx on a Linux machine**, by running this command
- `GUI.showImage(image)` - allows you to view a debug image or one with relevant information.
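
A minimal sketch combining both calls (assuming `GUI` is provided by the exercise environment, as in the loop above):

```python
while True:
    image = GUI.getImage()
    if image is not None:
        # ... run your detection code on `image` here ...
        GUI.showImage(image)  # display the (possibly annotated) frame in the browser GUI
```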

```bash
cat /etc/resolv.conf
```
<!-- Model Path -->
## File Path for Uploaded Model
The `model_path` holds the file path to the uploaded <strong>ONNX</strong> model.
```python
from model import model_path
```

- Inside the `assets/websocket_address.js` file, change the **websocket_address** variable to the IP address found with the above command

- Start the host application with the same IP address that is used for the connection.

```bash
python exercise.py 127.0.0.xx
```
## Example Code
<!-- Load ONNX session -->
It is recommended to load the ONNX model into an inference session as follows:
```python
# Import the required packages
from model import model_path
import onnxruntime
import sys

# Load the ONNX model into an inference session
try:
    ort_session = onnxruntime.InferenceSession(model_path)
except Exception as e:
    print("ERROR: Model couldn't be loaded")
    print(str(e))
    sys.exit(1)
```
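
Once the session is created, it can be useful to check that the uploaded model matches the expected input/output specification. This is plain `onnxruntime` usage, shown here only as an optional sanity check:

```python
# Optional: inspect the loaded model's input and output signatures.
input_meta = ort_session.get_inputs()[0]
print("Input name:", input_meta.name)
print("Input shape:", input_meta.shape)   # should accept (1, 300, 300, 3)

for output_meta in ort_session.get_outputs():
    print("Output:", output_meta.name, output_meta.shape)
```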

- Open the web template from `exercise.html`

- The page should say **[open]Connection established!**, which means it is working as expected.

**NOTE:** If you get a **socket.error: [Errno 99] Cannot assign requested address** error, you need to check and pass the correct IP address.


## Exercise Instructions

- The uploaded ONNX model should adhere to the input/output specifications below; please keep that in mind while building your model.
- The user can train their model in any framework of their choice and export it to the ONNX format (a minimal PyTorch export sketch is shown below). Refer to this [**article**](https://docs.unity3d.com/Packages/com.unity.barracuda@1.0/manual/Exporting.html) to know more about exporting your model to the ONNX format.
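
As a rough illustration of the export step, the sketch below exports a toy PyTorch module to ONNX. `ToyDetector` is purely a placeholder (a real model would be your trained detector); the output names are chosen to match the specification below, and the opset version is only a reasonable default.

```python
import torch
import torch.nn as nn

class ToyDetector(nn.Module):
    """Toy stand-in for a trained detector, used only to illustrate the export call."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # The exercise feeds (1, 300, 300, 3), i.e. NHWC; Conv2d expects NCHW.
        feats = self.conv(x.permute(0, 3, 1, 2))
        score = feats.mean().sigmoid()
        boxes = torch.zeros(1, 10, 4) + score      # dummy (1, 10, 4) boxes
        classes = torch.ones(1, 10)                # class label 1 == human
        scores = score * torch.ones(1, 10)         # dummy (1, 10) scores
        num_detections = torch.full((1,), 10.0)
        return boxes, classes, scores, num_detections

model = ToyDetector().eval()
dummy_input = torch.randn(1, 300, 300, 3)
torch.onnx.export(
    model, dummy_input, "human_detection.onnx",
    input_names=["input_tensor"],
    output_names=["detection_boxes", "detection_classes",
                  "detection_scores", "num_detections"],
    opset_version=11,
)
```

Replace the toy module with your fine-tuned network; the fine-tuning guides below describe the framework-specific details.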

### Model Input Specification

`input_shape` - The application code pre-processes the input frame of shape (H, W, C) **to** (1, 300, 300, 3), i.e. (batch_size, H, W, C). This is a typical input shape for a `Conv2D` layer, so it is mandatory for your custom-built model to have its first layer as `Conv2D`.
<p style="text-align:justify;">
`input_shape` - The application code pre processes the input frame of shape (H, W, C) <bold>TO</bold>> (1, 300, 300, 3) i.e (batch_size, H, W, C). This is a typical input shape for a `Conv2D` layer, so it is mandatory for your custom built model to have its first layer as `Conv2D`.
</p>
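
As an illustration of what that pre-processing amounts to (a minimal sketch; the exact resizing and dtype handling used by the application may differ), assuming `image` is the (H, W, C) frame returned by `GUI.getImage()`:

```python
import cv2
import numpy as np

def preprocess(image):
    # Resize the (H, W, C) webcam frame to the 300x300 resolution the model expects.
    resized = cv2.resize(image, (300, 300))
    # Add the batch dimension: (300, 300, 3) -> (1, 300, 300, 3).
    batched = np.expand_dims(resized, axis=0)
    # Many SSD-style detectors take uint8 input; cast to float32 if your model expects that instead.
    return batched.astype(np.uint8)
```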

### Model Output Specification

Given 1 frame per batch, the model must return 4 tensor arrays in the following order:

`detection_boxes`: a list of bounding boxes. Each list item describes a box with top, left, bottom, right relative to the image size.

`detection_classes`: Array of detected classes. The class label must be **1** for humans.
`detection_classes`: Array of detected classes. The class label must be `1` for humans.

`detection_scores`: the score for each detection, with values between 0 and 1 representing the probability that a class was detected.

`num_detections`: the number of detections.

**Note**: Make sure to keep the class label for humans as 1 while training your model. Any object detected by your model with a class label other than 1 will not be accounted for.
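
Putting the pieces together, a minimal inference-and-drawing sketch could look like the following. The output shapes assume the common SSD convention with a leading batch dimension, and the confidence threshold is an arbitrary example value; adjust both to your own model.

```python
import cv2
import numpy as np

CONF_THRESHOLD = 0.5  # example threshold; tune for your model

def detect_and_draw(image, ort_session, input_name):
    h, w = image.shape[:2]
    # Pre-process to the (1, 300, 300, 3) input described above.
    input_tensor = np.expand_dims(cv2.resize(image, (300, 300)), axis=0).astype(np.uint8)

    # Outputs arrive in the order mandated by the specification.
    boxes, classes, scores, num_detections = ort_session.run(None, {input_name: input_tensor})

    for i in range(int(num_detections[0])):
        # Keep only confident detections of class 1 (human).
        if int(classes[0][i]) == 1 and scores[0][i] >= CONF_THRESHOLD:
            top, left, bottom, right = boxes[0][i]
            # Boxes are relative to the image size, so scale them back to pixels.
            p1 = (int(left * w), int(top * h))
            p2 = (int(right * w), int(bottom * h))
            cv2.rectangle(image, p1, p2, (0, 255, 0), 2)
    return image
```

The returned frame can then be passed to `GUI.showImage()` inside the main loop.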

## Demo Model

A demo model has been provided inside the `Demo_Model` folder to test and play around with the application.

## Guide to Fine Tuning pre-existing models

Expecting the user to build the model from scratch would be overkill, so we have compiled and provided the relevant guides for fine-tuning pre-existing models in TensorFlow and PyTorch. These cover everything from collecting data and preprocessing it to fine-tuning a pre-existing model architecture with it. Since the process of exporting models to the ONNX format differs between frameworks, we have also added it under the respective guide. We strongly suggest that the user go through the guides.

<p style="text-align:justify;">Expecting the user to build the model from scratch would be an overkill, we have compliled and provided the revelevant guide for Fine Tuning pre exisiting models in TensorFlow and Pytorch. This includes everything from making the process of collecting data, preprocessing it and fine tuning with it on a pre-existing model architecture. Since the process of exporting models to ONNX format is different for different frameworks, we have also added so under the respective guide. We strongly suggest the user to go through the guide.
</p>
### PyTorch

We have documented a guide for the PyTorch implementation. Please refer to it below for detailed information.
@@ -141,28 +122,41 @@ This guide walks you through using the TensorFlow object detection API to train
## Exercise Features

* **Live Inference** - Perform live inference on the input feed from the web-cam.
* **Video Inference** - Perform inference on an uploaded video.
* **Model Benchmarking** - Evaluate the uploaded model by benchmarking against a ground truth dataset (the Oxford Town Centre dataset).
* **Model Visualization** - Visualize and analyse the uploaded model to get a visual summary of the model, which will make it easier to identify trends and patterns, understand connections, and interact with your data.
* **Upload own model** - You can upload your own human detection model.
<!-- * **Video Inference** - Perform inference on an uploaded video. -->
<!-- * **Model Benchmarking** - Evaluate the uploaded model by benchmarking against a ground truth dataset(Oxford Town Centre dataset). -->
<!-- * **Model Visualization** - Visualize and analyse the uploaded model to get a visual summary of the model, which will make it easier to identify trends and patterns, understand connections, and interact with your data. -->


## Using the interface

* **Dropdown**: Use the dropdown menu to choose a specific mode. The required control buttons will pop-up accordingly.
<!-- * **Dropdown**: Use the dropdown menu to choose a specific mode. The required control buttons will pop-up accordingly.

* **Control Buttons**: The control buttons enable the control of the interface.
- **Live/Video/Benchmark buttons** - Send the uploaded model for inference to the core application.
- **Stop button**: Stops the inference process.
- **Visualizer button**: Opens the model visualizer.
- **Visualizer button**: Opens the model visualizer. -->

* **Browse and Upload buttons**: These are used to browse and upload the model and video. The control buttons for the specific mode will only activate once all the required files have been uploaded.

* **Frequency Slider**: This slider adjusts the running frequency of the iterative part of the model inference and benchmarking code. A smaller value implies the code runs less number of times. A higher value implies the code runs a large number of times. The Target Frequency is the one set on the Slider and Measured Frequency is the one measured by the computer (a frequency of execution the computer is able to maintain despite the commanded one). The student should adjust the Target Frequency according to the Measured Frequency.
<!-- * **Frequency Slider**: This slider adjusts the running frequency of the iterative part of the model inference and benchmarking code. A smaller value implies the code runs less number of times. A higher value implies the code runs a large number of times. The Target Frequency is the one set on the Slider and Measured Frequency is the one measured by the computer (a frequency of execution the computer is able to maintain despite the commanded one). The student should adjust the Target Frequency according to the Measured Frequency. -->

* **Debug Level**: This decides the debugging level of the application. A debug level of 1 implies no debugging at all. A debug level greater than or equal to 2 enables all the GUI functions to work properly.

* **Pseudo Console**: This shows the error messages and a few intermediate outputs during the inference, benchmarking and file-uploading process.
* **Pseudo Console**: This shows the error messages and a few intermediate outputs during inference.

## Videos

{% include youtubePlayer.html id=page.youtubeId1 %}

<!-- contributors and maintainers -->
## Contributors
- Contributors: [David Pascual](https://github.com/dpascualhe), [Md. Shariar Kabir](https://github.com/codezerro), [Shashwat Dalakoti](https://github.com/shashwat623)
- Maintained by [David Pascual](https://github.com/dpascualhe), [Md. Shariar Kabir](https://github.com/codezerro)

<!-- Reference -->
## References
1. [https://onnx.ai/](https://onnx.ai/)
2. [https://pytorch.org/](https://pytorch.org/)
3. [https://www.tensorflow.org/](https://www.tensorflow.org/)
4. [https://debuggercafe.com/image-augmentation-using-pytorch-and-albumentations/](https://debuggercafe.com/image-augmentation-using-pytorch-and-albumentations/)