DynaCell Metrics #242

edyoshikun · 2025-04-21T22:07:53Z

This PR adds:

dataloader for getting inference vs experimental fluorescence metrics
intensity metrics
biological metrics
segmentation metrics
logic to run metrics via VisCy CLI

…ataset

…. one for pred one for target

ziw-liu · 2025-05-22T23:48:08Z

viscy/data/segmentation.py

+        if self.dtype is not None:
+            _pred = _pred.astype(self.dtype)
+            _target = _target.astype(self.dtype)
+        pred = torch.from_numpy(_pred.astype(self.dtype))
+        target = torch.from_numpy(_target.astype(self.dtype))


Why do this twice?

ziw-liu · 2025-05-22T23:49:12Z

applications/DynaCell/demo_script.py

@@ -0,0 +1,178 @@
+"""
+This script is a demo script for the DynaCell application.
+It loads the ome-zarr 0.4v format, calculates metrics and saves the results as csv files


Suggested change

It loads the ome-zarr 0.4v format, calculates metrics and saves the results as csv files

It loads the ome-zarr v0.4 format, calculates metrics and saves the results as csv files

ziw-liu · 2025-05-22T23:51:18Z

applications/DynaCell/demo_script.py

+
+csv_database_path = Path(
+    "/home/eduardo.hirata/repos/viscy/applications/DynaCell/dynacell_summary_table.csv"
+).expanduser()


Path.home() / "rel/path" is likely what you want.
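For instance, a minimal sketch, assuming the repository is checked out under `~/repos/viscy` (inferred from the hard-coded path in the diff, not confirmed). Note that `.expanduser()` only expands a leading `~`, so it has no effect on the absolute path above.

```python
from pathlib import Path

# Assumption: the repo lives under the user's home directory at repos/viscy,
# mirroring the hard-coded path in the diff.
csv_database_path = (
    Path.home() / "repos/viscy/applications/DynaCell/dynacell_summary_table.csv"
)
```

This keeps the script portable across user accounts without relying on `~` expansion.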

ziw-liu · 2025-05-22T23:56:31Z

viscy/data/dynacell.py

+        return sample
+
+
+class DynaCellDataBase:


Capital XxxBase reads like the name of a base class. Also this is not a database with a runtime, but a dataframe normalizer.

ziw-liu · 2025-05-22T23:57:11Z

viscy/data/dynacell.py

+
+        # Extract zarr store paths
+        self._filtered_db["Zarr path"] = self._filtered_db["Path"].apply(
+            lambda x: Path(*Path(x).parts[:-3])


Suggested change

lambda x: Path(*Path(x).parts[:-3])

lambda x: x.parent.parent.parent
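One caveat: `.parent` only exists on `Path` objects, so if the "Path" column holds plain strings (the original code wraps `x` in `Path(...)`, which suggests it might), the value needs wrapping first. A small sketch of an equivalent that accepts either type; the helper name and the example path are hypothetical.

```python
from pathlib import Path


def zarr_store_path(position_path):
    """Return the third ancestor of a path given as str or Path."""
    # parents[2] is the same as .parent.parent.parent
    return Path(position_path).parents[2]


# Hypothetical layout: <store>.zarr/<row>/<column>/<fov>
example = "/data/store.zarr/A/1/0"
store = zarr_store_path(example)
```

For paths with at least three components below the root, this gives the same result as `Path(*Path(x).parts[:-3])`.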

ziw-liu · 2025-05-22T23:57:59Z

viscy/data/dynacell.py

+            "zarr_path": self.zarr_paths[idx],
+            "cell_type": self.cell_types_per_store[idx],
+            "organelle": self.organelles_per_store[idx],
+            "infection_condition": self.infection_per_store[idx],


Why are these first converted to lists?

ziw-liu · 2025-05-23T00:01:06Z

viscy/data/dynacell.py

+            pred_data = self.pred_database[i]
+
+            # Ensure target and prediction metadata match
+            self._validate_matching_metadata(target_data, pred_data, i)


How long is this loop?

For now we don't parallelize. Are you thinking of splitting it into batches?

I'm trying to understand the size of the loop. If it's long-running then vectorizing with pandas operations could be helpful.

ziw-liu · 2025-05-23T00:02:41Z

viscy/data/dynacell.py

+        # Check cell type
+        if target_data["cell_type"] != pred_data["cell_type"]:
+            raise ValueError(
+                f"Cell type mismatch at index {idx}: "
+                f"target={target_data['cell_type']}, pred={pred_data['cell_type']}"
+            )
+
+        # Check organelle
+        if target_data["organelle"] != pred_data["organelle"]:
+            raise ValueError(
+                f"Organelle mismatch at index {idx}: "
+                f"target={target_data['organelle']}, pred={pred_data['organelle']}"
+            )
+
+        # Check infection condition
+        if target_data["infection_condition"] != pred_data["infection_condition"]:
+            raise ValueError(
+                f"Infection condition mismatch at index {idx}: "
+                f"target={target_data['infection_condition']}, pred={pred_data['infection_condition']}"
+            )


This can be a loop over the string keys.
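Something along these lines, as a sketch rather than a drop-in replacement. The key names come from the diff; the module-level constant and the standalone function form are my own assumptions.

```python
# Metadata keys that must agree between target and prediction entries.
METADATA_KEYS = ("cell_type", "organelle", "infection_condition")


def validate_matching_metadata(target_data, pred_data, idx):
    """Raise ValueError on the first metadata key that differs."""
    for key in METADATA_KEYS:
        if target_data[key] != pred_data[key]:
            # "infection_condition" -> "Infection condition"
            label = key.replace("_", " ").capitalize()
            raise ValueError(
                f"{label} mismatch at index {idx}: "
                f"target={target_data[key]}, pred={pred_data[key]}"
            )
```

The error messages stay the same as in the three explicit checks, and adding a new metadata field becomes a one-line change.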


ziw-liu · 2025-05-23T00:04:19Z

viscy/data/segmentation.py

+    A PyTorch Dataset providing paired target and prediction images / volumes from OME-Zarr
+    datasets.
+
+    Attributes:


Please use numpy-style docstring: #133
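For reference, a rough sketch of the numpy-style layout. The class name, parameters, and attribute here are hypothetical placeholders, not the ones in this PR, and the class does not subclass the PyTorch `Dataset` so the snippet stays dependency-free.

```python
class PairedImageDataset:  # hypothetical name
    """Paired target and prediction images or volumes from OME-Zarr datasets.

    Parameters
    ----------
    target_path : str
        Path to the target OME-Zarr store (hypothetical parameter).
    pred_path : str
        Path to the prediction OME-Zarr store (hypothetical parameter).

    Attributes
    ----------
    num_pairs : int
        Number of target/prediction pairs (hypothetical attribute).
    """

    def __init__(self, target_path: str, pred_path: str) -> None:
        self.target_path = target_path
        self.pred_path = pred_path
        self.num_pairs = 0
```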

ziw-liu · 2025-05-23T00:05:51Z

viscy/data/typing.py

+
+    cell_type: str
+    organelle: str
+    infection: str


This key is not the same as the CSV column name?

ziw-liu · 2025-05-23T00:07:21Z

viscy/trainer.py

+            Batch size for processing, by default 1
+        num_workers : int, optional
+            Number of workers for data loading, by default 0
+        version : str, optional


What version is this?

ziw-liu · 2025-05-23T00:08:47Z

viscy/trainer.py

+        target_database: Path,
+        pred_database: Path,
+        output_dir: Path,
+        method: str = "intensity",


Can this be all supported literals rather than a free string? Then lightning CLI can check the input type.

Yes, we will update this once we have all the possible ones. We might make it its own list for the organelles. Thanks.
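A sketch of what the typed signature could look like. Only `"intensity"` is confirmed by the diff; the `"segmentation"` value, the alias name, and the function name are placeholders. My understanding is that the parser behind Lightning CLI validates `Literal` annotations, but that is worth confirming.

```python
from pathlib import Path
from typing import Literal, get_args

# Assumption: only "intensity" appears in the diff; "segmentation" is a
# placeholder for whatever methods end up being supported.
MetricsMethod = Literal["intensity", "segmentation"]


def compute_metrics(  # hypothetical function name
    target_database: Path,
    pred_database: Path,
    output_dir: Path,
    method: MetricsMethod = "intensity",
) -> str:
    """Return the validated method name (stand-in for the real work)."""
    # Runtime guard for callers that bypass static or CLI type checking.
    if method not in get_args(MetricsMethod):
        raise ValueError(f"Unsupported method: {method!r}")
    return method
```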

ziw-liu · 2025-05-23T00:30:54Z