Datasets
❗ Disclaimer: this does not yet work and will be added in a PR very soon.
Here you'll learn how to create your own dataset for training or evaluation. We divide the creation process into three steps: labelling the images in Fiji, exporting the labels, and converting everything into a single dataset file.
Labelling is done in Fiji (download here) using the Multi-point Tool. To open this tool, right-click on the Point Tool. You should have the view shown below. Double-click on the icon to configure the tool (e.g. to remove label numbers or change the point size).
After opening an image, label each blob by clicking on it, which adds a point. If you aren't happy with your selection, click+drag to move a point (wait until the cursor turns into a hand), option(alt)+click to remove a point, or shift+A to delete all current points.
Now all that's left to do is to save the file with its labels into an empty directory of your choice (see the next step).
After the previous step, you should have a directory containing only labelled images. Please download our Fiji export macro, unzip it, open or drag it into Fiji, and run it as shown below (if the download link does not work, save the raw file).
After execution, you should have a `labels` directory inside the selected image directory.
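Before running `deepblink create`, you may want to check that the export produced the expected layout. The following is a minimal sketch with a hypothetical helper `check_layout` (not part of deepBlink). It assumes that each label file shares its image's base name (with any extension) and that images use one of the listed extensions; neither detail is specified above, so adapt both to what you actually find in your `labels` directory.

```python
from pathlib import Path

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def check_layout(input_dir, labels_dir=None):
    """Return the base names of images that have no matching label file.

    By default, labels are expected in a `labels` subdirectory of
    `input_dir`. Assumes one label file per image with the same base
    name; this pairing rule is an illustration, not deepBlink's own.
    """
    input_dir = Path(input_dir)
    labels_dir = Path(labels_dir) if labels_dir else input_dir / "labels"
    if not labels_dir.is_dir():
        raise FileNotFoundError(f"No labels directory at {labels_dir}")
    # Only regular files with an image extension; the labels folder is skipped.
    image_stems = {
        p.stem
        for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {p.stem for p in labels_dir.iterdir() if p.is_file()}
    return sorted(image_stems - label_stems)
```

An empty list means every image has a label file with the same base name.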
Now we can use deepBlink to convert the raw files into a single npz file ready for use. Please run:
```
deepblink create --input INPUT --name NAME
```
A quick explanation of what is going on:

- `NAME` is the name of your dataset. The generated file will be named `NAME.npz`. Feel free to pass in a path to change the saving location.
- `INPUT` takes the directory containing all images and, by default, a `labels` subdirectory (as previously created).
- If you have a different file structure, you can use the `--labels LABELS` flag to customise the path to the labels. `INPUT` will then only be used as the path to the images.
- Change the ratio of the train / validation / test split using the `--validsplit VALIDSPLIT` or `--testsplit TESTSPLIT` flags. Both take values between 0 and 1 corresponding to the fraction of images used (e.g. a `TESTSPLIT` of 0.2 will use 20% of images for testing). `TESTSPLIT` is applied first, to the entire dataset. Then `VALIDSPLIT` is applied to the remaining non-test data, so a `VALIDSPLIT` of 0.2 corresponds to 0.2 × (1 − `TESTSPLIT`) of all images (16% with a `TESTSPLIT` of 0.2).
- To resize images uniformly, use the `--size SIZE` flag. Note that for deepBlink to work properly, training images have to be square with a side length that is a power of two (256, 512, 1024). To avoid training on duplicate images, any crops that would overlap with images already in the dataset are ignored. Similarly, all images smaller than the specified size will not be included in the dataset.
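The split arithmetic and the cropping rules described in the list can be sketched in a few lines of Python. This is a minimal illustration, not deepBlink's implementation: all helper names are hypothetical, and `count_crops` assumes crops are taken on a regular, non-overlapping grid starting at the top-left corner, which the description above does not spell out.

```python
def split_fractions(testsplit, validsplit):
    """Fractions of the whole dataset ending up in train / valid / test.

    TESTSPLIT is taken from the entire dataset first; VALIDSPLIT is then
    taken from the remaining non-test data.
    """
    if not (0 <= testsplit <= 1 and 0 <= validsplit <= 1):
        raise ValueError("splits must be between 0 and 1")
    test = testsplit
    valid = validsplit * (1 - testsplit)
    train = 1 - test - valid
    return train, valid, test


def is_power_of_two(n):
    """True for 1, 2, 4, ..., 256, 512, 1024, ..."""
    return n > 0 and (n & (n - 1)) == 0


def count_crops(height, width, size):
    """Number of non-overlapping size x size crops from one image.

    Assumption: crops lie on a regular grid from the top-left corner.
    An image smaller than `size` in either dimension yields 0 crops,
    i.e. it is left out of the dataset.
    """
    if not is_power_of_two(size):
        raise ValueError("size must be a power of two, e.g. 256, 512, 1024")
    return (height // size) * (width // size)
```

For example, `--testsplit 0.2 --validsplit 0.2` leaves 64% of the images for training, 16% for validation, and 20% for testing.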
The dataset npz file is nothing more than six NumPy arrays bundled together: `x_train`, `y_train`, `x_valid`, `y_valid`, `x_test`, and `y_test`, where `x` denotes the input images and `y` the ground-truth labels. An npz file can easily be read in Python using our `deepblink.io.load_npz` function.
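The recommended way to read the file is `deepblink.io.load_npz`. If you only want to inspect a dataset with plain NumPy, a minimal sketch using `numpy.load` could look like the following. The helper name `load_dataset` is hypothetical and not part of deepBlink; `allow_pickle=True` is an assumption covering the case where some arrays were stored as object arrays.

```python
import numpy as np

# The six array names described above.
KEYS = ("x_train", "y_train", "x_valid", "y_valid", "x_test", "y_test")


def load_dataset(path):
    """Read the six dataset arrays from an npz file into a dict.

    allow_pickle=True is only needed if some arrays (e.g. labels with a
    varying number of spots per image) were stored as object arrays;
    only enable it for files you trust.
    """
    with np.load(path, allow_pickle=True) as data:
        missing = [key for key in KEYS if key not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        # Indexing an NpzFile reads the array into memory, so the
        # returned arrays stay valid after the file is closed.
        return {key: data[key] for key in KEYS}
```

For example, `load_dataset("NAME.npz")["x_train"].shape` gives the shape of the training images.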