Datasets

Introduction

Here you'll learn how to create your own dataset for training or evaluation. There are two main ways of labelling your custom data.

Completely manual - if your dataset is very heterogeneous and previous automated detection attempts have failed, annotate each spot by clicking on its position. For this, follow ✏️ Manual labelling and 📤 Manual label export.
TrackMate based - if your dataset is somewhat easier, you can use TrackMate to speed up the process. Follow the steps described in 🤖 TrackMate based labelling and export.

Both - no matter which method you chose above, the final step is described in 🗃️ Create dataset npz file. In general, a model can only be as good as its dataset, so label as precisely as reasonably possible.

Manual labelling

Labelling is done in Fiji (download here) using the Multi-point Tool. To select this tool, right-click on the Point Tool icon and choose Multi-point Tool. You should then see the view shown below. Double-click on the icon to configure the tool (e.g. to remove label numbers or change the point size).

Multi-point Tool in Fiji

After opening an image, label each blob by clicking on it, which adds a point. If you aren't happy with your selection, click+drag to move a point (wait until the cursor turns into a hand), Option (Alt)+click to remove a point, or press Shift+A to delete all current points.

Example image labelling in Fiji

Now all that's left is to save each labelled image into an empty directory of your choice (see the next step).

Manual label export

After the previous step, you should have a directory containing only labelled images. Download our Fiji export macro, unzip it, open or drag it into Fiji, and run it as shown below (if the download link does not work, save the raw file).

Run Fiji export macro

After execution, you should have a labels directory inside the selected image directory. You can now skip 🤖 TrackMate based labelling and export and continue directly with 🗃️ Create dataset npz file.

TrackMate based labelling and export

There are two quick setup steps that have to be done:

Install TrackMate.
Remove Fiji's default micron scale by opening Analyze>Set Scale... and pressing the Click to Remove Scale button, so that spot coordinates are reported in pixels.

Now that everything's set up, do the following:

Open one image at a time for labelling.
Open TrackMate in Plugins>Tracking>TrackMate.
Follow the dialog prompts until you reach the Settings for detector screen.
Adjust the Estimated blob diameter and Threshold settings, checking the result with Preview (1), until most spots have been detected.
Missed spots can be added and falsely detected spots removed by hovering over the position and pressing A or D, respectively. Spots can also be moved using Space. Additional editing commands can be found in Section 3.2 "Creating spots one by one" of the TrackMate manual.
Once all desired spots have been detected, export the spot coordinates: click the 🔧 wrench icon (2) to open Display Options, then Shift+click the Analysis (3) button.
This should open a table titled All Spots statistics. Save this table using File>Save As... or Command+S (Ctrl+S on Windows/Linux). Make sure to rename the file to match the name of the image you just labelled; otherwise, images and labels won't match!
Repeat these steps for all images and save all label files in the same folder.
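Because a misnamed label file silently breaks the pairing of images and labels, it can help to check your folders before continuing. The following is a minimal Python sketch, not part of deepBlink, that lists images without a label file of the same base name. It assumes images are .tif files and labels are .csv files; the function name find_unlabelled_images and both extensions are hypothetical, so adjust them to your data.

```python
from pathlib import Path
from typing import List


def find_unlabelled_images(
    image_dir: str,
    label_dir: str,
    image_ext: str = ".tif",  # assumed image extension (hypothetical default)
    label_ext: str = ".csv",  # assumed label table extension (hypothetical default)
) -> List[str]:
    """Return base names of images that have no label file with the same base name."""
    # Base names (file name without extension) of all label files.
    label_stems = {p.stem for p in Path(label_dir).glob(f"*{label_ext}")}
    # Keep every image whose base name has no matching label file.
    return sorted(
        p.stem
        for p in Path(image_dir).glob(f"*{image_ext}")
        if p.stem not in label_stems
    )
```

For example, calling find_unlabelled_images("images", "labels") returns an empty list when every image has a correspondingly named label file; any names it does return point to files that still need to be labelled or renamed.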

Run Fiji export macro

Congrats! Most of the work is done. All that's left is to create an npz file for training, as described below.

Create dataset npz file

Now we can use deepBlink to convert the raw files into a single npz file ready for use. Run:

deepblink create --input INPUT --name NAME

A quick explanation of what is going on:

NAME is the name of your dataset; the generated file will be called NAME.npz. You can also pass a path to change where the file is saved.
By default, INPUT is the directory containing all images and a labels subdirectory (as created previously).
If you have a different file structure, use the --labels LABELS flag to set the path to the labels. INPUT is then only used as the path to the images.
Change the ratio of the train / validation / test split using the --validsplit VALIDSPLIT and --testsplit TESTSPLIT flags. Both take values between 0 and 1 giving the fraction of images used (e.g. a TESTSPLIT of 0.2 uses 20% of images for testing). TESTSPLIT is applied first, to the entire dataset. VALIDSPLIT is then applied to the remaining non-test data, so a VALIDSPLIT of 0.2 corresponds to less than 20% of all images (e.g. with a TESTSPLIT of 0.2, it is 0.2 × 0.8 = 16%).
To crop images to a uniform size, use the --size SIZE flag. Note that for deepBlink to work properly, training images have to be square with a side length that is a power of two (e.g. 256, 512, 1024). To avoid training on duplicate image regions, crops that would overlap with existing ones are ignored. Similarly, images smaller than the specified size are not included in the dataset.
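The split arithmetic and size rules above can be illustrated with a short Python sketch. It is not deepBlink's implementation: the function names are hypothetical, rounding down with int() is an assumption, and the cropping assumes non-overlapping SIZE × SIZE tiles taken row by row from the top-left corner.

```python
from typing import List, Tuple

import numpy as np


def split_counts(n_images: int, testsplit: float, validsplit: float) -> Tuple[int, int, int]:
    """Return (train, valid, test) image counts.

    TESTSPLIT is applied to the whole dataset first; VALIDSPLIT is then applied
    only to the remaining non-test images. Rounding down is an assumption.
    """
    n_test = int(n_images * testsplit)
    n_valid = int((n_images - n_test) * validsplit)
    n_train = n_images - n_test - n_valid
    return n_train, n_valid, n_test


def is_valid_size(size: int) -> bool:
    """True if size is a positive power of two (e.g. 256, 512, 1024)."""
    # A power of two has exactly one set bit, so size & (size - 1) clears it to 0.
    return size > 0 and size & (size - 1) == 0


def non_overlapping_crops(image: np.ndarray, size: int) -> List[np.ndarray]:
    """Cut an image into non-overlapping size x size crops.

    Border regions that would need an overlapping crop are skipped, and an
    image smaller than size in either dimension yields no crops at all.
    """
    height, width = image.shape[:2]
    crops = []
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            crops.append(image[y : y + size, x : x + size])
    return crops
```

With 100 images, a TESTSPLIT of 0.2 and a VALIDSPLIT of 0.2, split_counts returns (64, 16, 20): 20 test images, 16 validation images (16% of the total) and 64 training images. Likewise, a 600 × 600 image cropped with a SIZE of 256 yields four tiles, and the remaining 88-pixel border strips are dropped rather than covered by overlapping crops.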

Additional insights

The dataset npz file is simply six NumPy arrays bundled together: x_train, y_train, x_valid, y_valid, x_test and y_test, where x denotes the input images and y the ground-truth labels. An npz file can easily be read in Python using our deepblink.io.load_npz function.
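To show what is inside the file, the six arrays can also be read with plain NumPy. The sketch below assumes the npz keys are exactly the array names listed above, and that the label arrays may be stored as object arrays because each image can contain a different number of spots. That is why allow_pickle=True is passed; only enable it for files you trust. The helper name read_dataset is hypothetical; for everyday use, deepblink.io.load_npz remains the intended route.

```python
from typing import Dict

import numpy as np

# Array names as listed in the text: x = input images, y = ground-truth labels.
KEYS = ("x_train", "y_train", "x_valid", "y_valid", "x_test", "y_test")


def read_dataset(fname: str) -> Dict[str, np.ndarray]:
    """Load all six arrays from a deepBlink-style npz file into a dict."""
    # allow_pickle=True is required if the y_* arrays were saved as object arrays
    # (assumption: varying spot counts per image). Only use on trusted files.
    with np.load(fname, allow_pickle=True) as data:
        # Indexing inside the with-block reads each array fully into memory.
        return {key: data[key] for key in KEYS}
```

For example, read_dataset("dataset.npz")["x_train"].shape gives the shape of the training images, and len(...["y_train"]) the number of training label entries.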