Skip to content

load_stac - prevent setting metadata to None due to missing band labels #752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
VictorVerhaert opened this issue Mar 19, 2025 · 10 comments
Closed

Comments

@VictorVerhaert
Copy link
Contributor

if metadata_from_stac() can't find band names it creates a BandDimension without band labels. This empty bands dimension causes an error here:

metadata = metadata_from_stac(url)
if isinstance(bands, list):
# TODO: also apply spatial/temporal filters to metadata?
metadata = metadata.filter_bands(band_names=bands)
except Exception:

which in turns causes the metadata to be set to None.

A nicer way to threat this is to set the requested bands in load_stac as band labels if the python-client could not derive band names

Having metadata set to None instead of default dims causes an error in this sequence of processes:

s2_datacube = connection.load_stac(
      "https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a",
      spatial_extent={'west': 499980, 'south': 4390200, 'east': 609780, 'north': 4500000, 'crs': 'EPSG:32629'},
      temporal_extent=['2020-01-01', '2020-12-31'],
      bands=['B01_60m', 'B02_60m', 'B03_60m', 'B04_60m', 'B8A_60m', 'B09_60m', 'B11_60m', 'B12_60m', 'SCL_60m'],
  ) #-> metadata is set to None

s2_resampled = s2_datacube.resample_spatial(resolution=60) # -> CubeMetadata is created with x and y dimensions (changed in version 0.39.0 which brings this to light)

s2_renamed = s2_resampled.rename_labels(dimension="bands", target=['B01', 'B02', 'B03', 'B04', 'B8A', 'B09', 'B11', 'B12', 'SCL']) # -> raises exception because metadata is present but does not contain spatial dimension
@soxofaan
Copy link
Member

Wouldn't this also be fixed by improving the band detection of that stac collection? e.g. #586

@soxofaan
Copy link
Member

  bands=['B01_60m', 'B02_60m', 'B03_60m', 'B04_60m', 'B8A_60m', 'B09_60m', 'B11_60m', 'B12_60m', 'SCL_60m'],

I also noticed that you are using different band names than the ones listed under "bands" in https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a: B01, B02, B03, ...

so there also some inconsistency here

@VictorVerhaert
Copy link
Contributor Author

I also noticed that you are using different band names than the ones listed under "bands" in https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a: B01, B02, B03, ...

Yes the collection contains those bands but the items list the assets at different resolutions as well. In the use-case where I encountered this bug we specifically want to directly load the 60m assets.

That is why #586 would not solve this use case. In an ideal world the stac collection would list all the bands with their different resolution, however even with #586 solved we would block the creation of a job due to metadata issues that do not occur on the backend.
In load_stac we know that the user requests specific bands so I suggest we throw a warning that those labels could not be found but still continue with these as new labels.

@soxofaan
Copy link
Member

soxofaan commented Mar 19, 2025

however even with #586 solved we would block the creation of a job due to metadata issues that do not occur on the backend.

That's the key question I guess: what are the right band names for that collection? For example, if I load that STAC url in this STAC browser https://radiantearth.github.io/stac-browser/#/external/stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a , it also detects these bands B01, B02, B03, ... which I'd like the Python client to do too.

Image

I think the python client should be as compliant as possible to the official/other STAC tooling, and avoid baking in a conflicting approach that is currently used in a particular backend (ours).

I however agree that we should alleviate the impedance mismatch with warnings instead of hard failures

@VictorVerhaert VictorVerhaert self-assigned this Mar 21, 2025
@soxofaan
Copy link
Member

FYI something that came up in a meeting today (with @m-mohr ):

Image

which confirms that the following snippet builds on an invalid assumption that the assets keys imply band names

 connection.load_stac(
      "https://stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a",
      ...
      bands=['B01_60m', 'B02_60m', 'B03_60m', 'B04_60m', 'B8A_60m', 'B09_60m', 'B11_60m', 'B12_60m', 'SCL_60m'],
  )

@bossie
Copy link
Collaborator

bossie commented Mar 25, 2025

That's the key question I guess: what are the right band names for that collection? For example, if I load that STAC url in this STAC browser https://radiantearth.github.io/stac-browser/#/external/stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a , it also detects these bands B01, B02, B03, ... which I'd like the Python client to do too.

The STAC browser has it easy: it merely has to list bands, not load assets. If assets with different resolutions have the same band name, there's no way to specify exactly which assets to load. Maybe the issue lies with the STAC collection itself?

@m-mohr
Copy link
Member

m-mohr commented Mar 25, 2025

@bossie Your statement is not really true, STAC Browser also tries to visualize assets. For this it also needs knowledge of the bands (and it always lists bands as you can see above?!). It tries to detect the best suitable assets and bands based on names or common names first, but let's users pick individual assets. It's fine if you let choose users specific asset keys, but that should likely not be the default.

Having that said, the S2 L2A collection from CDSE is indeed a bit weird in that it offers multiple resolutions. I recommended against it, but was overrruled in the end. Still, if you find multiple assets with the same band names, there are still heuristics how you can go ahead with it and use the metadata to hopefully figure out which to choose (e.g. based on the resolution).

The fallback can always be to let users pick asset keys.

@VictorVerhaert
Copy link
Contributor Author

That's the key question I guess: what are the right band names for that collection? For example, if I load that STAC url in this STAC browser https://radiantearth.github.io/stac-browser/#/external/stac.dataspace.copernicus.eu/v1/collections/sentinel-2-l2a , it also detects these bands B01, B02, B03, ... which I'd like the Python client to do too.

I do think the goal of the python client should be to detect the band names as described in the collection (otherwise requiring to load in specific items to determine those), and it should be the responsibility of the collection providers to provide a correct bands overview.

However I also think we should be lax in what we accept as input, so the scope of this issue is to allow a fallback on using the bands that the user specified in load_stac in case the python client cannot find them in the collection.

@soxofaan
Copy link
Member

However I also think we should be lax in what we accept as input, so the scope of this issue is to allow a fallback on using the bands that the user specified in load_stac in case the python client cannot find them in the collection

yes I'm fine with being lenient about this (e.g. warnings instead of exceptions) as long as we're still figuring out how to pragmatically cover various STAC use cases in the wild

@VictorVerhaert VictorVerhaert linked a pull request Mar 31, 2025 that will close this issue
soxofaan added a commit that referenced this issue Apr 8, 2025
- prepare logic for absent band dimension (related to #743)
- Introduce CubeMetadata.ensure_band_dimension (instead of confusing rename_labels usage)
- avoid assumption about name of band dimension
- more test coverage
soxofaan added a commit that referenced this issue Apr 14, 2025
- prepare logic for absent band dimension (related to #743)
- Introduce CubeMetadata.ensure_band_dimension (instead of confusing rename_labels usage)
- avoid assumption about name of band dimension
- more test coverage
@soxofaan
Copy link
Member

closing this one already as #755 was just merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants