2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@
### Changed
- Using `density_prior = "uniform"` by default for Tangram (#174)
- [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) is now a default dependency of Sopa
- Use `prob_thresh=0.2` and `nms_thresh=0.6` by default in `stardist`
- During segmentation, pixels outside of the ROI / tissue use the mean channel value instead of 0

## [2.0.2] - 2025-02-21

2 changes: 1 addition & 1 deletion docs/tutorials/api_usage.ipynb
@@ -205,7 +205,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"On a real tissue, the `'region_of_interest'` could look like this black line:\n",
"On a real tissue, the `'region_of_interest'` could look like the black line below. This plot can be done via [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) (see the end of this tutorial for more visualization approaches).\n",
"\n",
"<p align=\"center\">\n",
" <img src=\"../../assets/example_tissue_segmentation.png\" alt=\"tissue_segmentation\" width=\"600px\"/>\n",
75 changes: 38 additions & 37 deletions docs/tutorials/visium_hd.ipynb
@@ -8,7 +8,7 @@
"\n",
"We can run Sopa on Visium HD, as the 2 micron bins are subcellular. You can follow the [\"normal\" API tutorial](../api_usage), or continue below to get exemples more specific to Visium HD data.\n",
"\n",
"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics."
"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics.\n"
]
},
{
@@ -25,7 +25,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading the data"
"## Reading the data\n"
]
},
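> The reading cell just below is collapsed in this diff. As a minimal sketch, assuming Sopa's Visium HD reader `sopa.io.visium_hd` and a hypothetical local path for the downloaded 10X dataset:

```python
import sopa

# Read the Visium HD output directory (hypothetical path) into a SpatialData object.
# This loads the full-resolution image and the binned gene expression tables.
sdata = sopa.io.visium_hd("./Visium_HD_Mouse_Small_Intestine")
```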
{
@@ -42,7 +42,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we save it on-disk:"
"Then, we save it on-disk:\n"
]
},
{
@@ -51,42 +51,42 @@
"metadata": {},
"outputs": [],
"source": [
"sdata.write(\"mouse_small_intestine.zarr\") # save it\n",
"sdata.write(\"mouse_small_intestine.zarr\") # save it\n",
"\n",
"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\") # open-it back"
"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\") # open-it back"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usual pipeline"
"## Usual pipeline\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image."
"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we create the patches for the cell segmentation."
"First, we create the patches for the cell segmentation.\n"
]
},
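> The patching call in the cell below is collapsed in this diff; it boils down to a single call. A sketch, assuming the 2.x top-level API `sopa.make_image_patches`, with illustrative pixel values rather than the notebook's actual ones:

```python
# Split the H&E image into overlapping patches so that segmentation fits in memory.
# patch_width and patch_overlap are in pixels; the values here are illustrative.
sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50)
```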
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m 156 patches were added to sdata['image_patches']\n"
"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m Added 156 patche(s) to sdata['image_patches']\n"
]
}
],
@@ -98,63 +98,63 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can run stardist on the H&E image. Here, we decrease `prob_thresh` and increase `nms_thresh` to get more cells."
"Now we can run stardist on the H&E image.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[33;20m[WARNING] (sopa._settings)\u001b[0m Running without parallelization backend can be slow. Consider using a backend, e.g. via `sopa.settings.parallelization_backend = 'dask'`, or `export SOPA_PARALLELIZATION_BACKEND=dask`.\n",
" 3%|▎ | 4/156 [00:34<22:12, 8.77s/it]"
" 3%|▎ | 4/156 [00:34<22:03, 8.71s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x3a211ee60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x5638ebb50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" 3%|▎ | 5/156 [00:36<16:10, 6.42s/it]"
" 3%|▎ | 5/156 [00:36<16:07, 6.40s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x4e57d3880> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x563ab4550> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 156/156 [19:26<00:00, 7.48s/it]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 428836 total cells\n",
"Resolving conflicts: 100%|██████████| 64106/64106 [00:22<00:00, 2885.00it/s]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 408458 cell boundaries in sdata['stardist_boundaries']\n"
"100%|██████████| 156/156 [16:27<00:00, 6.33s/it]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 430536 total cells\n",
"Resolving conflicts: 100%|██████████| 63782/63782 [00:23<00:00, 2714.30it/s]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 410196 cell boundaries in sdata['stardist_boundaries']\n"
]
}
],
"source": [
"sopa.segmentation.stardist(sdata, prob_thresh=0.2, nms_thresh=0.6, min_area=30)"
"sopa.segmentation.stardist(sdata, min_area=30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aggregation"
"## Aggregation\n"
]
},
{
@@ -165,7 +165,7 @@
"\n",
"Here, `expand_radius_ratio = 1`, which means that the cells will be expanded a value of `1 * mean_radius` before aggregating the means. You can choose any positive float value.\n",
"\n",
"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels."
"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels.\n"
]
},
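> The aggregation call in the collapsed cell below reduces to something along these lines (a sketch assuming the 2.x `sopa.aggregate` signature; `aggregate_channels=False` mirrors the note above about skipping the H&E channels):

```python
# Sum the 2-micron bins falling within each (expanded) cell boundary.
# aggregate_channels=False skips averaging the H&E channels per cell (see note above).
sopa.aggregate(sdata, aggregate_channels=False, expand_radius_ratio=1)
```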
{
@@ -190,14 +190,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-cell table"
"## Single-cell table\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have an AnnData object with the gene expression **per cell**."
"Now, we have an AnnData object with the gene expression **per cell**.\n"
]
},
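> Assuming Sopa's default table key (`"table"`), the object shown in the collapsed cell below can be pulled out of `sdata` like this:

```python
# AnnData object (cells x genes) built by the aggregation step above.
adata = sdata.tables["table"]
print(adata)
```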
{
@@ -229,7 +229,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, we can now use Scanpy to plot gene expression."
"For instance, we can now use Scanpy to plot gene expression.\n"
]
},
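> The collapsed cell below applies standard Scanpy preprocessing; a sketch with generic normalization steps, not notebook-specific values:

```python
import scanpy as sc

# Standard normalization before visualizing expression values.
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
```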
{
@@ -258,7 +258,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins."
"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins.\n"
]
},
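> For example (the gene name and spot size below are hypothetical):

```python
# Plot per-cell expression at the cell centroids stored in adata.obsm["spatial"].
sc.pl.spatial(adata, color="Lgr5", spot_size=20)
```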
{
@@ -289,21 +289,21 @@
"\n",
"The 2-micron bins are arranged in a grid, so they can be visualized as an image of `G` channels, where `G` is the number of genes.\n",
"\n",
"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`."
"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc() # optimisation with the csc format\n",
"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc() # optimisation with the csc format\n",
"\n",
"lazy_bins_image = spatialdata.rasterize_bins(\n",
" sdata,\n",
" bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\", # key of the bins shapes\n",
" table_name=\"square_002um\", # key of the table with the bins gene expression\n",
" bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\", # key of the bins shapes\n",
" table_name=\"square_002um\", # key of the table with the bins gene expression\n",
" row_key=\"array_row\",\n",
" col_key=\"array_col\",\n",
")"
Expand All @@ -313,7 +313,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy."
"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy.\n"
]
},
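> Because the image is lazy, only the requested gene plane is computed on access. A sketch, assuming the channel coordinate of the rasterized image holds the gene names (the gene itself is hypothetical):

```python
# Materialize a single 690x690 gene plane; all other planes stay lazy.
gene_plane = lazy_bins_image.sel(c="Lgr5").compute()
```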
{
@@ -329,7 +329,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can save this image in the `sdata` object."
"We can save this image in the `sdata` object.\n"
]
},
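> The collapsed cell below amounts to something like this (the element name is hypothetical):

```python
# Register the lazy image as a new element of the SpatialData object.
sdata["rasterized_bins"] = lazy_bins_image
```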
{
@@ -345,7 +345,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous."
"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous.\n"
]
},
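> This uses the `napari-spatialdata` plugin, e.g.:

```python
from napari_spatialdata import Interactive

# Open an interactive viewer; selecting a gene computes only that lazy layer.
Interactive(sdata)
```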
{
@@ -367,15 +367,16 @@
"\n",
"<p align=\"center\">\n",
" <img src=\"../../assets/napari_visium_hd.png\" alt=\"napari_visium_hd\" width=\"600px\"/>\n",
"</p>"
"</p>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Xenium Explorer\n",
"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above."
"\n",
"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above.\n"
]
},
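> The export itself is the usual Sopa conversion, e.g. (output path hypothetical):

```python
# Convert the image, cell boundaries, and table (but not the bins) for the Xenium Explorer.
sopa.io.explorer.write("mouse_small_intestine.explorer", sdata)
```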
{
9 changes: 8 additions & 1 deletion sopa/segmentation/_stainings.py
@@ -105,7 +105,8 @@ def _run_patch(self, patch: Polygon) -> gpd.GeoDataFrame:
)

        if patch.area < box(*bounds).area:
            image = image * shapes.rasterize(patch, image.shape[1:], bounds)
            mask = shapes.rasterize(patch, image.shape[1:], bounds)
            image = _channels_average_within_mask(image, mask)

        cells = shapes.vectorize(self.method(image))
        cells.geometry = cells.translate(*bounds[:2])
@@ -188,3 +189,9 @@ def add_shapes(cls, sdata: SpatialData, cells: gpd.GeoDataFrame, image_key: str,
        add_spatial_element(sdata, key_added, cells)

        log.info(f"Added {len(cells)} cell boundaries in sdata['{key_added}']")


def _channels_average_within_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Per-channel mean of the pixels inside the mask (`clip(1)` avoids dividing by zero for an empty mask)
    channels_average = (image * mask).sum(axis=(1, 2)) / mask.sum().clip(1)

    # Keep the original values inside the mask, and fill the outside with the per-channel mean
    return image * mask + (1 - mask) * channels_average[:, None, None]
2 changes: 1 addition & 1 deletion sopa/segmentation/methods/_cellpose.py
@@ -37,7 +37,7 @@ def cellpose(

    Args:
        sdata: A `SpatialData` object
        channels: Name of the channels to be used for segmentation (or list of channel names).
        channels: Name of the channel(s) to be used for segmentation. If one channel is provided, it must be a nucleus channel. If a `list` of channels is provided, it must be the cytoplasm channel followed by the nucleus channel.
        diameter: The Cellpose parameter for the expected cell diameter (in pixel).
        model_type: Cellpose model type.
        image_key: Name of the image in `sdata` to be used for segmentation.
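> To illustrate the clarified docstring above, a hedged usage sketch (the channel names are hypothetical, and the diameter is an illustrative value):

```python
# Two channels: the cytoplasm channel first, then the nucleus channel.
# For nucleus-only segmentation, pass a single channel, e.g. channels="DAPI".
sopa.segmentation.cellpose(sdata, channels=["CK", "DAPI"], diameter=35)
```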
4 changes: 2 additions & 2 deletions sopa/segmentation/methods/_stardist.py
@@ -19,8 +19,8 @@ def stardist(
    min_area: int = 0,
    delete_cache: bool = True,
    recover: bool = False,
    prob_thresh: float = 0.5,
    nms_thresh: float = 0.4,
    prob_thresh: float = 0.2,
    nms_thresh: float = 0.6,
    clip_limit: float = 0,
    clahe_kernel_size: int | list[int] | None = None,
    gaussian_sigma: float = 0,
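> With these new defaults, the plain notebook call above picks up `prob_thresh=0.2` and `nms_thresh=0.6` automatically; the previous behavior stays one argument away:

```python
# Revert to the former, more conservative defaults if too many cells are detected.
sopa.segmentation.stardist(sdata, prob_thresh=0.5, nms_thresh=0.4, min_area=30)
```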
20 changes: 20 additions & 0 deletions tests/test_segmentation.py
@@ -1,9 +1,25 @@
import numpy as np
import shapely
from geopandas.testing import assert_geodataframe_equal
from shapely import MultiPolygon

import sopa
from sopa._constants import SopaKeys
from sopa.segmentation._stainings import _channels_average_within_mask


def test_channels_average_within_mask():
    image = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 20, 30], [40, 50, 60], [70, 80, 90]]])
    mask = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0]])

    expected = np.array(
        [
            [[1.0, 2.0, 3.0], [4.0, 4.0, 6.0], [4.0, 8.0, 4.0]],
            [[10.0, 20.0, 30.0], [40.0, 40.0, 60.0], [40.0, 80.0, 40.0]],
        ]
    )

    assert (_channels_average_within_mask(image, mask) == expected).all()


def test_cellpose_segmentation():
@@ -62,3 +78,7 @@ def test_tissue_segmentation():
    m1_default_level0 = MultiPolygon(geo_df.geometry.values)

    assert m1_default_transformed.intersection(m1_default_level0).area / m1_default_transformed.area > 0.9