
scottclowe
Member

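I had accidentally put the division by 255 in the wrong place when calculating the standard deviation of pixel intensities across the dataset. The division was applied to the variance inside the square root, instead of to the standard deviation after taking the square root.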
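To see how large the error is, write s² for the per-channel sample variance of the raw intensities (0 to 255) and σ = s/255 for the intended standard deviation in [0, 1] units. Dividing inside the square root gives the following, so the misplaced division on its own inflates the reported std by a factor of √255:

```latex
\sigma_{\text{wrong}} = \sqrt{\frac{s^2}{255}} = \frac{s}{\sqrt{255}} = \sqrt{255}\,\frac{s}{255} = \sqrt{255}\,\sigma \approx 15.97\,\sigma
```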

Previous method (incorrect):

import numpy as np
from tqdm.autonotebook import tqdm
from bioscan_dataset import BIOSCAN5M

ds = BIOSCAN5M("~/Datasets/bioscan-5m", modality="image", split="all")

rgb_total = np.zeros((3,), dtype=np.uint64)
rgb_sq_total = np.zeros((3,), dtype=np.uint64)
n_pixel = 0
for x, y in tqdm(ds):
    n_pixel += x.width * x.height
    rgb_total += np.sum(x, axis=0).sum(axis=0)
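    # Note: np.asarray(x) is uint8, so "** 2" wraps around for values above 15
    rgb_sq_total += np.sum(np.asarray(x) ** 2, axis=0).sum(axis=0)

rgb_mean = rgb_total / n_pixel / 255
# Bug: the division by 255 is inside np.sqrt, so the variance (not the std) is
# divided by 255; the std in [0, 1] units should be np.sqrt(variance) / 255
rgb_std = np.sqrt((rgb_sq_total - rgb_total * rgb_total / n_pixel) / (n_pixel - 1) / 255)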
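For comparison, here is a minimal sketch of the single-pass calculation with the division moved outside the square root. It assumes each image has been converted to an (H, W, 3) uint8 array; the helper name `single_pass_rgb_stats` and the random test images are hypothetical, and float64 accumulators are used so that squaring cannot wrap around.

```python
import numpy as np


def single_pass_rgb_stats(images):
    """Hypothetical helper: per-channel mean and std in [0, 1] units.

    `images` is any iterable of (H, W, 3) uint8 arrays.
    """
    total = np.zeros(3, dtype=np.float64)
    sq_total = np.zeros(3, dtype=np.float64)
    n_pixel = 0
    for img in images:
        # Cast to float64 before squaring to avoid uint8 wraparound
        px = np.asarray(img, dtype=np.float64).reshape(-1, 3)
        n_pixel += px.shape[0]
        total += px.sum(axis=0)
        sq_total += np.square(px).sum(axis=0)
    mean = total / n_pixel / 255
    # Sample variance in raw [0, 255] units
    var_raw = (sq_total - total * total / n_pixel) / (n_pixel - 1)
    # Take the square root first, then rescale to [0, 1] units
    std = np.sqrt(var_raw) / 255
    return mean, std


# Synthetic stand-in for the dataset (assumption, not BIOSCAN-5M data)
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8) for _ in range(4)]
mean, std = single_pass_rgb_stats(images)

# Compare with the misplaced division from the previous method
raw = np.concatenate([im.reshape(-1, 3) for im in images]).astype(np.float64)
wrong_std = np.sqrt(raw.var(axis=0, ddof=1) / 255)
print(wrong_std / std)  # each entry is close to sqrt(255), about 15.97
```

Independently of where the division goes, the sum-of-squares formula can lose precision when the squared sum and the sum of squares are both large and nearly equal; subtracting a known mean first, as in a two-pass approach, avoids that cancellation.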

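Since we already have an accurate estimate of the mean pixel intensity, I reused it for a two-pass estimate of the standard deviation: each image is scaled to [0, 1], the stored mean is subtracted, and the squared deviations are summed over all pixels.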

import numpy as np
from tqdm.autonotebook import tqdm
from bioscan_dataset import BIOSCAN5M

ds = BIOSCAN5M("~/Datasets/bioscan-5m", modality="image", split="all")

RGB_MEAN = np.load("rgb_mean__bs5m.npy")   # Load previous calculation

scaled_sq_total = np.zeros((3,), dtype=np.float64)
n_pixel = 0
for x, y in tqdm(ds):
    n_pixel += x.width * x.height
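    x = np.asarray(x) / 255  # uint8 -> float64 in [0, 1]
    x = x - RGB_MEAN  # subtract the per-channel mean, broadcast over H and W
    scaled_sq_total += np.sum(np.square(x), axis=0).sum(axis=0)

# Sample standard deviation, already in [0, 1] units
rgb_std = np.sqrt(scaled_sq_total / (n_pixel - 1))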
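The two-pass loop above can be sanity-checked on small synthetic data: when the mean is computed exactly from the same pixels, the result should match NumPy's `std(ddof=1)`. This is a sketch; `two_pass_rgb_std` and the random images are hypothetical stand-ins for the BIOSCAN-5M loop, and each image is assumed to be an (H, W, 3) uint8 array.

```python
import numpy as np


def two_pass_rgb_std(images, rgb_mean):
    """Hypothetical helper mirroring the second pass of the loop above.

    images: iterable of (H, W, 3) uint8 arrays.
    rgb_mean: per-channel mean already scaled to [0, 1], shape (3,).
    """
    scaled_sq_total = np.zeros(3, dtype=np.float64)
    n_pixel = 0
    for img in images:
        x = np.asarray(img) / 255  # uint8 -> float64 in [0, 1]
        n_pixel += x.shape[0] * x.shape[1]  # H * W == width * height
        d = x - rgb_mean  # broadcast the (3,) mean over H and W
        scaled_sq_total += np.square(d).sum(axis=(0, 1))
    return np.sqrt(scaled_sq_total / (n_pixel - 1))


rng = np.random.default_rng(1)
images = [rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8) for _ in range(5)]
flat = np.concatenate([im.reshape(-1, 3) for im in images]) / 255

# First pass: the mean (loaded from rgb_mean__bs5m.npy in the real script)
rgb_mean = flat.mean(axis=0)
rgb_std = two_pass_rgb_std(images, rgb_mean)
print(rgb_std, flat.std(axis=0, ddof=1))  # the two should agree closely
```

With an estimated rather than exact mean, dividing by `n_pixel - 1` versus `n_pixel` makes no practical difference at the pixel counts involved here.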

scottclowe merged commit 4b97453 into master on Dec 7, 2024 (4 checks passed)
scottclowe deleted the bug_data-stdev branch on December 7, 2024 at 17:12
scottclowe added the bug label (Something isn't working) on Feb 8, 2025