Skip to content

made load_stac nicer to mismatch in band names #755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

VictorVerhaert
Copy link
Contributor

Note: warning message could be made clearer but out of inspiration

@VictorVerhaert VictorVerhaert requested a review from soxofaan March 31, 2025 17:03
@VictorVerhaert
Copy link
Contributor Author

there is one test test_run_local_udf_from_file_netcdf_with_context that should not have changed in behaviour that fails.

@VictorVerhaert VictorVerhaert requested a review from soxofaan April 2, 2025 13:51
@VictorVerhaert VictorVerhaert requested a review from soxofaan April 4, 2025 15:12
cube = con120.load_stac(str(stac_path), bands=["B01", "B02"])
assert cube.metadata.band_names == ["B01", "B02"]
assert (
"The specified bands ['B01', 'B02'] are not a subset of the bands [] found in the STAC metadata (unknown bands: ['B01', 'B02']). Using specified bands as is."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... not a subset of the bands [] ...

hmm I'm a bit confused here: if there is no band dimension, doesn't the add_dimension() above at least define a band with name None ? How can the list of bands be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def get_band_metadata(eo_bands_location: dict) -> List[Band]:
# TODO: return None iso empty list when no metadata?
return [
Band(name=band["name"], common_name=band.get("common_name"), wavelength_um=band.get("center_wavelength"))
for band in eo_bands_location.get("eo:bands", [])
]
receives an empty dict and returns an empty lists which causes bands to be empty in
band_dimension = BandDimension(name="bands", bands=bands)

So I guess there isn't a case where the metadata has no bands dimension. I'll remove

if not metadata.has_band_dimension():
metadata = metadata.add_dimension(
name="bands",
type="bands",
label=None,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess there isn't a case where the metadata has no bands dimension

not yet, but that will be possible in the future with #743

VictorVerhaert and others added 2 commits April 8, 2025 13:18
- prepare logic for absent band dimension (related to #743)
- Introduce CubeMetadata.ensure_band_dimension (instead of confusing rename_labels usage)
- avoid assumption about name of band dimension
- more test coverage
@soxofaan
Copy link
Member

soxofaan commented Apr 8, 2025

While chewing on this PR, I decided to finetune this PR with 83ef8eb

so feel free to inverse-review :)

@@ -423,6 +423,25 @@ def add_dimension(self, name: str, label: Union[str, float], type: Optional[str]
dim = Dimension(type=type or "other", name=name)
return self._clone_and_update(dimensions=self._dimensions + [dim])

def ensure_band_dimension(
self, *, name: Optional[str] = None, bands: List[Union[Band, str]], warning: str
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a fan of having a function in metadata that takes a warning as an argument just to log it. Logging could happen outside of this function imo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was hesitant about that,
but forcing the caller to think of a warning message is to make clear that this function should only be used exceptionally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case I would rename it to replace_band_dimension or even set_band_dimension as ensure sound like running some checks.
Still not a big fan of the warning argument, but if we keep I'd at least make it optional.

I'm quickly working on a commit that also retains the existing bands so we don't lose fields like wavelength_um

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a note to the documentation to better explain this design

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for cases like this where there is doubt about the API that is added, simply making it somehow 'private' or 'internal' API can help to be future proof.

@VictorVerhaert
Copy link
Contributor Author

expanded the ensure_bands_dimension to preserve existing bands. Back to you for the reviews

@VictorVerhaert VictorVerhaert requested a review from soxofaan April 9, 2025 15:20
@soxofaan
Copy link
Member

soxofaan commented Apr 9, 2025

expanded the ensure_bands_dimension to preserve existing bands

I'm curious why that is necessary? When you get into the ensure_bands_dimension code path, it implies you don't trust the original band metadata (or there is none), so why doing all the effort to still copy stuff?

Is this to support a particular use case?

@VictorVerhaert
Copy link
Contributor Author

I figured we could keep as much data as possible, but indeed not for a specific use case.
Should I remove it to keep things simple or leave it as is (which is basicly just supporting another case)

@VictorVerhaert
Copy link
Contributor Author

reverted my last commit as indeed it complicates things for a cause we will not encounter
Made nesure_bands_dimension "private"

soxofaan added a commit that referenced this pull request Apr 14, 2025
- prepare logic for absent band dimension (related to #743)
- Introduce CubeMetadata.ensure_band_dimension (instead of confusing rename_labels usage)
- avoid assumption about name of band dimension
- more test coverage
@soxofaan
Copy link
Member

merged in 5e77f00

(FYI: I did a rebase first to eliminate the commit-revert pair and to get a cleaner history)

@soxofaan soxofaan closed this Apr 14, 2025
@soxofaan soxofaan deleted the issue752-load-stac-make-bands-metadata-flexible branch April 14, 2025 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load_stac - prevent setting metadata to None due to missing band labels
3 participants