2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@
### Changed
- Using `density_prior = "uniform"` by default for Tangram (#174)
- [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) is now a default dependency of Sopa
- Use `prob_thresh=0.2` and `nms_thresh=0.6` by default in `stardist`
- During segmentation, pixels outside of the ROI / tissue use the mean channel value instead of 0

## [2.0.2] - 2025-02-21

2 changes: 1 addition & 1 deletion docs/tutorials/api_usage.ipynb
@@ -205,7 +205,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"On a real tissue, the `'region_of_interest'` could look like this black line:\n",
"On a real tissue, the `'region_of_interest'` could look like the black line below. This plot can be done via [`spatialdata_plot`](https://spatialdata.scverse.org/projects/plot/en/latest/index.html) (see the end of this tutorial for more visualization approaches).\n",
"\n",
"<p align=\"center\">\n",
" <img src=\"../../assets/example_tissue_segmentation.png\" alt=\"tissue_segmentation\" width=\"600px\"/>\n",
75 changes: 38 additions & 37 deletions docs/tutorials/visium_hd.ipynb
@@ -8,7 +8,7 @@
"\n",
"We can run Sopa on Visium HD, as the 2 micron bins are subcellular. You can follow the [\"normal\" API tutorial](../api_usage), or continue below to get exemples more specific to Visium HD data.\n",
"\n",
"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics."
"For this tutorial, we use the [mouse small intestine public dataset](https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-mouse-intestine) from 10X Genomics.\n"
]
},
{
@@ -25,7 +25,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading the data"
"## Reading the data\n"
]
},
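> The reading cell just below is collapsed in this diff. As a minimal sketch, assuming Sopa's Visium HD reader `sopa.io.visium_hd` and a hypothetical local path for the downloaded 10X dataset:

```python
import sopa

# Read the Visium HD output directory (hypothetical path) into a SpatialData object.
# This loads the full-resolution image and the binned gene expression tables.
sdata = sopa.io.visium_hd("./Visium_HD_Mouse_Small_Intestine")
```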
{
@@ -42,7 +42,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we save it on-disk:"
"Then, we save it on-disk:\n"
]
},
{
@@ -51,42 +51,42 @@
"metadata": {},
"outputs": [],
"source": [
"sdata.write(\"mouse_small_intestine.zarr\") # save it\n",
"sdata.write(\"mouse_small_intestine.zarr\") # save it\n",
"\n",
"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\") # open-it back"
"sdata = spatialdata.read_zarr(\"mouse_small_intestine.zarr\") # open-it back"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usual pipeline"
"## Usual pipeline\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image."
"Now, we run Sopa as usual. Although, since we don't have transcripts, we can't run Baysor. Therefore, we will run [Stardist](https://github.com/stardist/stardist) on the H&E image.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we create the patches for the cell segmentation."
"First, we create the patches for the cell segmentation.\n"
]
},
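> The patching call in the cell below is collapsed in this diff; it boils down to a single call. A sketch, assuming the 2.x top-level API `sopa.make_image_patches`, with illustrative pixel values rather than the notebook's actual ones:

```python
# Split the H&E image into overlapping patches so that segmentation fits in memory.
# patch_width and patch_overlap are in pixels; the values here are illustrative.
sopa.make_image_patches(sdata, patch_width=2000, patch_overlap=50)
```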
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m 156 patches were added to sdata['image_patches']\n"
"\u001b[36;20m[INFO] (sopa.patches._patches)\u001b[0m Added 156 patche(s) to sdata['image_patches']\n"
]
}
],
@@ -98,63 +98,63 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can run stardist on the H&E image. Here, we decrease `prob_thresh` and increase `nms_thresh` to get more cells."
"Now we can run stardist on the H&E image.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[33;20m[WARNING] (sopa._settings)\u001b[0m Running without parallelization backend can be slow. Consider using a backend, e.g. via `sopa.settings.parallelization_backend = 'dask'`, or `export SOPA_PARALLELIZATION_BACKEND=dask`.\n",
" 3%|▎ | 4/156 [00:34<22:12, 8.77s/it]"
" 3%|▎ | 4/156 [00:34<22:03, 8.71s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x3a211ee60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
"WARNING:tensorflow:5 out of the last 5 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x5638ebb50> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" 3%|▎ | 5/156 [00:36<16:10, 6.42s/it]"
" 3%|▎ | 5/156 [00:36<16:07, 6.40s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x4e57d3880> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
"WARNING:tensorflow:6 out of the last 6 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x563ab4550> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 156/156 [19:26<00:00, 7.48s/it]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 428836 total cells\n",
"Resolving conflicts: 100%|██████████| 64106/64106 [00:22<00:00, 2885.00it/s]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 408458 cell boundaries in sdata['stardist_boundaries']\n"
"100%|██████████| 156/156 [16:27<00:00, 6.33s/it]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Found 430536 total cells\n",
"Resolving conflicts: 100%|██████████| 63782/63782 [00:23<00:00, 2714.30it/s]\n",
"\u001b[36;20m[INFO] (sopa.segmentation._stainings)\u001b[0m Added 410196 cell boundaries in sdata['stardist_boundaries']\n"
]
}
],
"source": [
"sopa.segmentation.stardist(sdata, prob_thresh=0.2, nms_thresh=0.6, min_area=30)"
"sopa.segmentation.stardist(sdata, min_area=30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aggregation"
"## Aggregation\n"
]
},
{
@@ -165,7 +165,7 @@
"\n",
"Here, `expand_radius_ratio = 1`, which means that the cells will be expanded a value of `1 * mean_radius` before aggregating the means. You can choose any positive float value.\n",
"\n",
"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels."
"> There is an argument `bins_key`, but by default Sopa will understand that it's Visium HD data and that it should use the 2-microns bins. Also, on the example below, we only aggregate the bins, not the H&E channels.\n"
]
},
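> The aggregation call in the collapsed cell below reduces to something along these lines (a sketch assuming the 2.x `sopa.aggregate` signature; `aggregate_channels=False` mirrors the note above about skipping the H&E channels):

```python
# Sum the 2-micron bins falling within each (expanded) cell boundary.
# aggregate_channels=False skips averaging the H&E channels per cell (see note above).
sopa.aggregate(sdata, aggregate_channels=False, expand_radius_ratio=1)
```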
{
@@ -190,14 +190,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Single-cell table"
"## Single-cell table\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have an AnnData object with the gene expression **per cell**."
"Now, we have an AnnData object with the gene expression **per cell**.\n"
]
},
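> Assuming Sopa's default table key (`"table"`), the object shown in the collapsed cell below can be pulled out of `sdata` like this:

```python
# AnnData object (cells x genes) built by the aggregation step above.
adata = sdata.tables["table"]
print(adata)
```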
{
@@ -229,7 +229,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, we can now use Scanpy to plot gene expression."
"For instance, we can now use Scanpy to plot gene expression.\n"
]
},
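> The collapsed cell below applies standard Scanpy preprocessing; a sketch with generic normalization steps, not notebook-specific values:

```python
import scanpy as sc

# Standard normalization before visualizing expression values.
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
```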
{
@@ -258,7 +258,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins."
"We can then use `sc.pl.spatial` to show the gene expression per cells. Note that, here, we show **cells**, not bins.\n"
]
},
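> For example (the gene name and spot size below are hypothetical):

```python
# Plot per-cell expression at the cell centroids stored in adata.obsm["spatial"].
sc.pl.spatial(adata, color="Lgr5", spot_size=20)
```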
{
@@ -289,21 +289,21 @@
"\n",
"The 2-micron bins are arranged in a grid, so they can be visualized as an image of `G` channels, where `G` is the number of genes.\n",
"\n",
"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`."
"Creating the image would be massive, so we need to create it lazily. This can be done with `spatialdata.rasterize_bins`.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc() # optimisation with the csc format\n",
"sdata[\"square_002um\"].X = sdata[\"square_002um\"].X.tocsc() # optimisation with the csc format\n",
"\n",
"lazy_bins_image = spatialdata.rasterize_bins(\n",
" sdata,\n",
" bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\", # key of the bins shapes\n",
" table_name=\"square_002um\", # key of the table with the bins gene expression\n",
" bins=\"Visium_HD_Mouse_Small_Intestine_square_002um\", # key of the bins shapes\n",
" table_name=\"square_002um\", # key of the table with the bins gene expression\n",
" row_key=\"array_row\",\n",
" col_key=\"array_col\",\n",
")"
Expand All @@ -313,7 +313,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy."
"Note that `lazy_bins_image` is an image of size `(19059, 690, 690)`, that is `G=19059` genes, and `690x690` bins. This would correspond to a 33.80GB image in memory, if it wasn't lazy.\n"
]
},
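> Because the image is lazy, only the requested gene plane is computed on access. A sketch, assuming the channel coordinate of the rasterized image holds the gene names (the gene itself is hypothetical):

```python
# Materialize a single 690x690 gene plane; all other planes stay lazy.
gene_plane = lazy_bins_image.sel(c="Lgr5").compute()
```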
{
@@ -329,7 +329,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can save this image in the `sdata` object."
"We can save this image in the `sdata` object.\n"
]
},
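> The collapsed cell below amounts to something like this (the element name is hypothetical):

```python
# Register the lazy image as a new element of the SpatialData object.
sdata["rasterized_bins"] = lazy_bins_image
```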
{
@@ -345,7 +345,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous."
"Then, we can visualize this image with Napari. When showing a gene, it will compute the corresponding layer of the lazy image, and it will be displayed in milliseconds, i.e. looking instantaneous.\n"
]
},
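> This uses the `napari-spatialdata` plugin, e.g.:

```python
from napari_spatialdata import Interactive

# Open an interactive viewer; selecting a gene computes only that lazy layer.
Interactive(sdata)
```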
{
@@ -367,15 +367,16 @@
"\n",
"<p align=\"center\">\n",
" <img src=\"../../assets/napari_visium_hd.png\" alt=\"napari_visium_hd\" width=\"600px\"/>\n",
"</p>"
"</p>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Xenium Explorer\n",
"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above."
"\n",
"Although the Xenium Explorer can be used (as below), it will not display the bins. If you want to see the bins, use `Napari` as detailed above.\n"
]
},
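> The export itself is the usual Sopa conversion, e.g. (output path hypothetical):

```python
# Convert the image, cell boundaries, and table (but not the bins) for the Xenium Explorer.
sopa.io.explorer.write("mouse_small_intestine.explorer", sdata)
```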
{
9 changes: 8 additions & 1 deletion sopa/segmentation/_stainings.py
@@ -105,7 +105,8 @@ def _run_patch(self, patch: Polygon) -> gpd.GeoDataFrame:
)

        if patch.area < box(*bounds).area:
            image = image * shapes.rasterize(patch, image.shape[1:], bounds)
            mask = shapes.rasterize(patch, image.shape[1:], bounds)
            image = _channels_average_within_mask(image, mask)

        cells = shapes.vectorize(self.method(image))
        cells.geometry = cells.translate(*bounds[:2])
@@ -188,3 +189,9 @@ def add_shapes(cls, sdata: SpatialData, cells: gpd.GeoDataFrame, image_key: str,
        add_spatial_element(sdata, key_added, cells)

        log.info(f"Added {len(cells)} cell boundaries in sdata['{key_added}']")


def _channels_average_within_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Per-channel mean of the pixels inside the mask (`clip(1)` avoids dividing by zero for an empty mask)
    channels_average = (image * mask).sum(axis=(1, 2)) / mask.sum().clip(1)

    # Keep the original values inside the mask, and fill the outside with the per-channel mean
    return image * mask + (1 - mask) * channels_average[:, None, None]
2 changes: 1 addition & 1 deletion sopa/segmentation/methods/_cellpose.py
@@ -37,7 +37,7 @@ def cellpose(

    Args:
        sdata: A `SpatialData` object
        channels: Name of the channels to be used for segmentation (or list of channel names).
        channels: Name of the channel(s) to be used for segmentation. If one channel is provided, it must be a nucleus channel. If a `list` of channels is provided, it must be the cytoplasm channel followed by the nucleus channel.
        diameter: The Cellpose parameter for the expected cell diameter (in pixel).
        model_type: Cellpose model type.
        image_key: Name of the image in `sdata` to be used for segmentation.
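> To illustrate the clarified docstring above, a hedged usage sketch (the channel names are hypothetical, and the diameter is an illustrative value):

```python
# Two channels: the cytoplasm channel first, then the nucleus channel.
# For nucleus-only segmentation, pass a single channel, e.g. channels="DAPI".
sopa.segmentation.cellpose(sdata, channels=["CK", "DAPI"], diameter=35)
```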
4 changes: 2 additions & 2 deletions sopa/segmentation/methods/_stardist.py
@@ -19,8 +19,8 @@ def stardist(
    min_area: int = 0,
    delete_cache: bool = True,
    recover: bool = False,
    prob_thresh: float = 0.5,
    nms_thresh: float = 0.4,
    prob_thresh: float = 0.2,
    nms_thresh: float = 0.6,
    clip_limit: float = 0,
    clahe_kernel_size: int | list[int] | None = None,
    gaussian_sigma: float = 0,
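> With these new defaults, the plain notebook call above picks up `prob_thresh=0.2` and `nms_thresh=0.6` automatically; the previous behavior stays one argument away:

```python
# Revert to the former, more conservative defaults if too many cells are detected.
sopa.segmentation.stardist(sdata, prob_thresh=0.5, nms_thresh=0.4, min_area=30)
```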
20 changes: 20 additions & 0 deletions tests/test_segmentation.py
@@ -1,9 +1,25 @@
import numpy as np
import shapely
from geopandas.testing import assert_geodataframe_equal
from shapely import MultiPolygon

import sopa
from sopa._constants import SopaKeys
from sopa.segmentation._stainings import _channels_average_within_mask


def test_channels_average_within_mask():
    image = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 20, 30], [40, 50, 60], [70, 80, 90]]])
    mask = np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0]])

    expected = np.array(
        [
            [[1.0, 2.0, 3.0], [4.0, 4.0, 6.0], [4.0, 8.0, 4.0]],
            [[10.0, 20.0, 30.0], [40.0, 40.0, 60.0], [40.0, 80.0, 40.0]],
        ]
    )

    assert (_channels_average_within_mask(image, mask) == expected).all()


def test_cellpose_segmentation():
@@ -62,3 +78,7 @@ def test_tissue_segmentation():
    m1_default_level0 = MultiPolygon(geo_df.geometry.values)

    assert m1_default_transformed.intersection(m1_default_level0).area / m1_default_transformed.area > 0.9