Commit 3f072a9

Merge pull request #222 from gustaveroussy/mean_within_mask
Pixels outside of the ROI / tissue use the mean channels value instead of 0
2 parents: d63f336 + c0e3ff2

File tree

7 files changed: +72 −42 lines

CHANGELOG.md (2 additions, 0 deletions)

@@ -10,6 +10,8 @@
 ### Changed
 - Using `density_prior = "uniform"` by default for Tangram (#174)
 - [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) is now a default dependency of Sopa
+- Use `prob_thresh=0.2` and `nms_thresh=0.6` by default in `stardist`
+- During segmentation, pixels outside of the ROI / tissue use the mean channels value instead of 0

 ## [2.0.2] - 2025-02-21

docs/tutorials/api_usage.ipynb (1 addition, 1 deletion)

@@ -205,7 +205,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"On a real tissue, the `'region_of_interest'` could look like this black line:\n",
+"On a real tissue, the `'region_of_interest'` could look like the black line below. This plot can be done via [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) (see the end of this tutorial for more visualization approaches).\n",
 "\n",
 "<p align=\"center\">\n",
 "    <img src=\"../../assets/example_tissue_segmentation.png\" alt=\"tissue_segmentation\" width=\"600px\"/>\n",

docs/tutorials/visium_hd.ipynb (38 additions, 37 deletions)

@@ -8,7 +8,7 @@
 "\n",
 "We can run Sopa on Visium HD, as the 2 micron bins are subcellular. You can follow the [\"normal\" API tutorial](../api_usage), or continue below to get exemples more specific to Visium HD data.\n",
 "\n",
-"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics."
+"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics.\n"
 ]
 },
 {
@@ -25,7 +25,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Reading the data"
+"## Reading the data\n"
 ]
 },
 {
@@ -42,7 +42,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Then, we save it on-disk:"
+"Then, we save it on-disk:\n"
 ]
 },
 {
@@ -51,42 +51,42 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"sdata.write(\"mouse_small_intestine.zarr\") # save it\n",
+"sdata.write(\"mouse_small_intestine.zarr\")  # save it\n",
 "\n",
-"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\") # open-it back"
+"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\")  # open-it back"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Usual pipeline"
+"## Usual pipeline\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image."
+"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image.\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"First, we create the patches for the cell segmentation."
+"First, we create the patches for the cell segmentation.\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m 156 patches were added to sdata['image_patches']\n"
+"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m Added 156 patche(s) to sdata['image_patches']\n"
 ]
 }
 ],
@@ -98,63 +98,63 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now we can run stardist on the H&E image. Here, we decrease `prob_thresh` and increase `nms_thresh` to get more cells."
+"Now we can run stardist on the H&E image.\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 6,
 "metadata": {},
 "outputs": [
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
 "\u001b[33;20m[WARNING] (sopa._settings)\u001b[0m Running without parallelization backend can be slow. Consider using a backend, e.g. via `sopa.settings.parallelization_backend = 'dask'`, or `export SOPA_PARALLELIZATION_BACKEND=dask`.\n",
-"  3%|▎         | 4/156 [00:34<22:12,  8.77s/it]"
+"  3%|▎         | 4/156 [00:34<22:03,  8.71s/it]"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x3a211ee60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
+"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x5638ebb50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
 ]
 },
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"  3%|▎         | 5/156 [00:36<16:10,  6.42s/it]"
+"  3%|▎         | 5/156 [00:36<16:07,  6.40s/it]"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x4e57d3880> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
+"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x563ab4550> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
 ]
 },
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"100%|██████████| 156/156 [19:26<00:00,  7.48s/it]\n",
-"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 428836 total cells\n",
-"Resolving conflicts: 100%|██████████| 64106/64106 [00:22<00:00, 2885.00it/s]\n",
-"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 408458 cell boundaries in sdata['stardist_boundaries']\n"
+"100%|██████████| 156/156 [16:27<00:00,  6.33s/it]\n",
+"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 430536 total cells\n",
+"Resolving conflicts: 100%|██████████| 63782/63782 [00:23<00:00, 2714.30it/s]\n",
+"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 410196 cell boundaries in sdata['stardist_boundaries']\n"
 ]
 }
 ],
 "source": [
-"sopa.segmentation.stardist(sdata, prob_thresh=0.2, nms_thresh=0.6, min_area=30)"
+"sopa.segmentation.stardist(sdata, min_area=30)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Aggregation"
+"## Aggregation\n"
 ]
 },
 {
@@ -165,7 +165,7 @@
 "\n",
 "Here, `expand_radius_ratio = 1`, which means that the cells will be expanded a value of `1 * mean_radius` before aggregating the means. You can choose any positive float value.\n",
 "\n",
-"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels."
+"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels.\n"
 ]
 },
 {
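As a rough sketch of the expansion described above — assuming the mean radius is inferred from the cell areas as if cells were circular (a simplification; `expansion_distance` is a hypothetical helper for illustration, not part of Sopa, and Sopa's actual computation may differ):

```python
import math

def expansion_distance(cell_areas: list[float], expand_radius_ratio: float = 1.0) -> float:
    # Infer a per-cell "radius" from its area (area = pi * r**2 for a circle),
    # average over cells, then scale by the user-chosen ratio.
    mean_radius = sum(math.sqrt(area / math.pi) for area in cell_areas) / len(cell_areas)
    return expand_radius_ratio * mean_radius

# Three cells whose areas correspond to circles of radius 1, 2 and 3:
areas = [math.pi * r**2 for r in (1, 2, 3)]
distance = expansion_distance(areas, expand_radius_ratio=1.0)  # ≈ 2.0 (mean of radii 1, 2, 3)
```

With `expand_radius_ratio = 1`, each cell boundary would be grown by one mean radius before bins are assigned to it.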
@@ -190,14 +190,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Single-cell table"
+"## Single-cell table\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now, we have an AnnData object with the gene expression **per cell**."
+"Now, we have an AnnData object with the gene expression **per cell**.\n"
 ]
 },
 {
@@ -229,7 +229,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For instance, we can now use Scanpy to plot gene expression."
+"For instance, we can now use Scanpy to plot gene expression.\n"
 ]
 },
 {
@@ -258,7 +258,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins."
+"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins.\n"
 ]
 },
 {
@@ -289,21 +289,21 @@
 "\n",
 "The 2-micron bins are arranged in a grid, so they can be visualized as an image of `G` channels, where `G` is the number of genes.\n",
 "\n",
-"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`."
+"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`.\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc() # optimisation with the csc format\n",
+"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc()  # optimisation with the csc format\n",
 "\n",
 "lazy_bins_image = spatialdata.rasterize_bins(\n",
 "    sdata,\n",
-"    bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\", # key of the bins shapes\n",
-"    table_name=\"square_002um\", # key of the table with the bins gene expression\n",
+"    bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\",  # key of the bins shapes\n",
+"    table_name=\"square_002um\",  # key of the table with the bins gene expression\n",
 "    row_key=\"array_row\",\n",
 "    col_key=\"array_col\",\n",
 ")"
@@ -313,7 +313,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy."
+"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy.\n"
 ]
 },
 {
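The 33.80GB figure quoted in the notebook can be sanity-checked with a back-of-the-envelope calculation: one value per gene per bin, assuming float32 storage (4 bytes per value) and binary gigabytes (GiB):

```python
# Size of a dense (19059, 690, 690) image: one value per gene per bin.
n_genes, height, width = 19059, 690, 690
n_values = n_genes * height * width   # total number of values in the array
size_gib = n_values * 4 / 2**30       # 4 bytes per float32 value, in GiB

print(f"{size_gib:.2f} GiB")  # → 33.80 GiB
```

This is why `rasterize_bins` builds the image lazily: only the channels you actually look at are ever materialized.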
@@ -329,7 +329,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can save this image in the `sdata` object."
+"We can save this image in the `sdata` object.\n"
 ]
 },
 {
@@ -345,7 +345,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous."
+"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous.\n"
 ]
 },
 {
@@ -367,15 +367,16 @@
 "\n",
 "<p align=\"center\">\n",
 "    <img src=\"../../assets/napari_visium_hd.png\" alt=\"napari_visium_hd\" width=\"600px\"/>\n",
-"</p>"
+"</p>\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Xenium Explorer\n",
-"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above."
+"\n",
+"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above.\n"
 ]
 },
 {

sopa/segmentation/_stainings.py (8 additions, 1 deletion)

@@ -105,7 +105,8 @@ def _run_patch(self, patch: Polygon) -> gpd.GeoDataFrame:
         )

         if patch.area < box(*bounds).area:
-            image = image * shapes.rasterize(patch, image.shape[1:], bounds)
+            mask = shapes.rasterize(patch, image.shape[1:], bounds)
+            image = _channels_average_within_mask(image, mask)

         cells = shapes.vectorize(self.method(image))
         cells.geometry = cells.translate(*bounds[:2])
@@ -188,3 +189,9 @@ def add_shapes(cls, sdata: SpatialData, cells: gpd.GeoDataFrame, image_key: str,
         add_spatial_element(sdata, key_added, cells)

         log.info(f"Added {len(cells)} cell boundaries in sdata['{key_added}']")
+
+
+def _channels_average_within_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
+    channels_average = (image * mask).sum(axis=(1, 2)) / mask.sum().clip(1)
+
+    return image * mask + (1 - mask) * channels_average[:, None, None]
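In effect, `_channels_average_within_mask` keeps ROI pixels untouched and replaces every pixel outside the mask with that channel's mean over the ROI, so the segmentation model no longer sees a hard zero border at the tissue edge. A minimal standalone sketch (the helper reproduced with NumPy, plus a toy single-channel 3×3 example):

```python
import numpy as np

def _channels_average_within_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Per-channel mean over ROI pixels only; .clip(1) guards against an empty mask
    channels_average = (image * mask).sum(axis=(1, 2)) / mask.sum().clip(1)
    # Inside the ROI: original values; outside: the per-channel ROI mean
    return image * mask + (1 - mask) * channels_average[:, None, None]

image = np.array([[[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]]])   # shape (C=1, H=3, W=3)
mask = np.array([[1, 1, 1],
                 [0, 0, 1],
                 [0, 1, 0]])            # 1 = inside the ROI

filled = _channels_average_within_mask(image, mask)
# ROI pixels (1, 2, 3, 6, 8) keep their values; their mean is 4.0,
# so every pixel outside the ROI becomes 4.0.
```

This matches the unit test added in `tests/test_segmentation.py` below.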

sopa/segmentation/methods/_cellpose.py (1 addition, 1 deletion)

@@ -37,7 +37,7 @@ def cellpose(

    Args:
        sdata: A `SpatialData` object
-        channels: Name of the channels to be used for segmentation (or list of channel names).
+        channels: Name of the channel(s) to be used for segmentation. If one channel, must be a nucleus channel. If a `list` of channels, it must be a cytoplasmic channel and then a nucleus channel.
        diameter: The Cellpose parameter for the expected cell diameter (in pixel).
        model_type: Cellpose model type.
        image_key: Name of the image in `sdata` to be used for segmentation.

sopa/segmentation/methods/_stardist.py (2 additions, 2 deletions)

@@ -19,8 +19,8 @@ def stardist(
    min_area: int = 0,
    delete_cache: bool = True,
    recover: bool = False,
-    prob_thresh: float = 0.5,
-    nms_thresh: float = 0.4,
+    prob_thresh: float = 0.2,
+    nms_thresh: float = 0.6,
    clip_limit: float = 0,
    clahe_kernel_size: int | list[int] | None = None,
    gaussian_sigma: float = 0,
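The two relaxed defaults act at different stages of StarDist's post-processing: candidates below `prob_thresh` are dropped, and non-maximum suppression then discards a candidate whose overlap with an already-kept one exceeds `nms_thresh`. A toy sketch of that logic (not StarDist's actual implementation; `kept_candidates` and the IoU matrix are made up for illustration) shows why lowering `prob_thresh` and raising `nms_thresh` keeps more cells:

```python
def kept_candidates(scores, ious, prob_thresh, nms_thresh):
    """Greedy NMS sketch: `scores` sorted descending, ious[i][j] = overlap of i and j."""
    kept = []
    for i, score in enumerate(scores):
        if score < prob_thresh:
            continue  # rejected by the probability threshold
        if all(ious[i][j] <= nms_thresh for j in kept):
            kept.append(i)  # not suppressed by any higher-scoring kept candidate
    return kept

scores = [0.9, 0.5, 0.3]
ious = [[1.0, 0.5, 0.1],
        [0.5, 1.0, 0.1],
        [0.1, 0.1, 1.0]]

print(kept_candidates(scores, ious, prob_thresh=0.5, nms_thresh=0.4))  # old defaults → [0]
print(kept_candidates(scores, ious, prob_thresh=0.2, nms_thresh=0.6))  # new defaults → [0, 1, 2]
```

With the old defaults, candidate 1 is suppressed (IoU 0.5 > 0.4) and candidate 2 fails the probability cut; with the new defaults, all three survive.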

tests/test_segmentation.py (20 additions, 0 deletions)

@@ -1,9 +1,25 @@
+import numpy as np
 import shapely
 from geopandas.testing import assert_geodataframe_equal
 from shapely import MultiPolygon

 import sopa
 from sopa._constants import SopaKeys
+from sopa.segmentation._stainings import _channels_average_within_mask
+
+
+def test_channels_average_within_mask():
+    image = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 20, 30], [40, 50, 60], [70, 80, 90]]])
+    mask = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0]])
+
+    expected = np.array(
+        [
+            [[1.0, 2.0, 3.0], [4.0, 4.0, 6.0], [4.0, 8.0, 4.0]],
+            [[10.0, 20.0, 30.0], [40.0, 40.0, 60.0], [40.0, 80.0, 40.0]],
+        ]
+    )
+
+    assert (_channels_average_within_mask(image, mask) == expected).all()


 def test_cellpose_segmentation():
@@ -62,3 +78,7 @@ def test_tissue_segmentation():
     m1_default_level0 = MultiPolygon(geo_df.geometry.values)

     assert m1_default_transformed.intersection(m1_default_level0).area / m1_default_transformed.area > 0.9
+
+    assert m1_default_transformed.intersection(m1_default_level0).area / m1_default_transformed.area > 0.9
+
+    assert m1_default_transformed.intersection(m1_default_level0).area / m1_default_transformed.area > 0.9
