Merged
Changes from 4 commits
2 changes: 1 addition & 1 deletion bin/build-image
@@ -27,4 +27,4 @@ bin/clean-images
# version number from `docker/service_version.txt`.
# - "latest", so the test Dockerfile can use the service image as a base image.
#
-docker build -t ${image}:${tag} -t ${image}:latest -f docker/service.Dockerfile .
+docker build --platform linux/amd64 -t ${image}:${tag} -t ${image}:latest -f docker/service.Dockerfile .
2 changes: 1 addition & 1 deletion bin/build-test
@@ -21,4 +21,4 @@ if [ ! -z "$old" ] && [ "$2" != "--no-delete" ]; then
fi

# Build the image
-docker build -t ${image}:${tag} -f docker/tests.Dockerfile .
+docker build --platform linux/amd64 -t ${image}:${tag} -f docker/tests.Dockerfile .
2 changes: 1 addition & 1 deletion bin/run-test
@@ -23,7 +23,7 @@ mkdir -p coverage

# Run the tests in a Docker container with mounted volumes for XML report
# output and test coverage reporting
-docker run --rm \
+docker run --platform linux/amd64 --rm \
-v $(pwd)/test-reports:/home/tests/reports \
-v $(pwd)/coverage:/home/tests/coverage \
ghcr.io/nasa/harmony-opendap-subsetter-test "$@"
39 changes: 39 additions & 0 deletions hoss/coordinate_utilities.py
@@ -405,3 +405,42 @@ def interpolate_dim_values_from_sample_pts(
    dim_max = dim_values[1] + (dim_resolution * (dim_size - 1 - dim_indices[1]))

    return np.linspace(dim_min, dim_max, dim_size)


def create_dimension_arrays_from_geotransform(
    prefetch_dataset: Dataset,
    latitude_coordinate: VariableFromDmr,
    projected_dimension_names: list[str],
    geotransform,
) -> dict[str, np.ndarray]:
    """Generate artificial 1D dimension scales from a geotransform."""
    lat_arr = get_2d_coordinate_array(
        prefetch_dataset,
        latitude_coordinate.full_name_path,
    )

    # Compute the x, y locations along a column and a row
    column_dimensions = [
        col_row_to_xy(geotransform, i, 0) for i in range(lat_arr.shape[1])
    ]
    row_dimensions = [col_row_to_xy(geotransform, 0, i) for i in range(lat_arr.shape[0])]
Member: This is probably a massive nit, but can you use different temp vars in your comprehensions? I would use i and j and follow the standard conventions, or I'd probably just use row and col.

Member: I should go look and see if this was in my code too :awkwardsockmonkey:

Member: Also, just so I understand: this routine doesn't need lat_arr, an array of the latitudes, it just needs the shape of that variable? Is there a way to get that without reading the whole array?

Contributor Author: There isn't a current way I'm aware of to get the shape of the variable without reading the whole array. DAS-2287 is addressing this.

Contributor Author: Updated to use 'row' and 'col' in 7d5e34c.

Reviewer: Looks good.


    # Pull out the dimension values
    x_values = np.array([x for x, y in column_dimensions], dtype=np.float64)
    y_values = np.array([y for x, y in row_dimensions], dtype=np.float64)
    projected_y, projected_x = tuple(projected_dimension_names)
Member: I'm guessing you copied this from the other code, but why not use * notation? Or, if you know that projected_dimension_names is always 2 values, just do direct unpacking:

    projected_y, projected_x = projected_dimension_names

Contributor Author: Agree, updated in 7d5e34c.


    return {projected_y: y_values, projected_x: x_values}


def col_row_to_xy(geotransform, col: int, row: int) -> tuple[np.float64, np.float64]:
    """Convert a grid cell location to an x, y coordinate."""
    # The geotransform is defined from the upper-left corner as (0, 0), so
    # adjust the input values to the centre of the grid cell at (0.5, 0.5)
    adj_col = col + 0.5
    adj_row = row + 0.5

    x = geotransform[0] + adj_col * geotransform[1] + adj_row * geotransform[2]
    y = geotransform[3] + adj_col * geotransform[4] + adj_row * geotransform[5]

    return x, y
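The affine arithmetic in col_row_to_xy can be exercised in isolation. A minimal sketch, restating the helper above together with the 9 km polar master_geotransform added in this PR's configuration (the standalone function and sample values are illustrative, not imports from HOSS):

```python
def col_row_to_xy(geotransform, col, row):
    """Map a (col, row) grid cell to the projected x, y of its centre,
    given a GDAL-style geotransform (x_min, x_res, 0, y_max, 0, -y_res)."""
    # The geotransform origin is the upper-left corner of cell (0, 0), so
    # offset by half a cell to land on the cell centre.
    adj_col = col + 0.5
    adj_row = row + 0.5
    x = geotransform[0] + adj_col * geotransform[1] + adj_row * geotransform[2]
    y = geotransform[3] + adj_col * geotransform[4] + adj_row * geotransform[5]
    return x, y

# EASE-Grid 2.0 9 km polar master geotransform from the configuration below:
ease2_polar_9km = (-9000000, 9000, 0, 9000000, 0, -9000)

print(col_row_to_xy(ease2_polar_9km, 0, 0))  # → (-8995500.0, 8995500.0)
```

The upper-left cell centre lands half a resolution step inside the grid extents, which is exactly the (0.5, 0.5) adjustment the docstring above describes.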
104 changes: 95 additions & 9 deletions hoss/hoss_config.json
@@ -119,13 +119,27 @@
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FT(P|P_E)",
"ShortNamePath": "SPL3FTP",
"VariablePattern": "(?i).*polar.*"
Member: This may need to be:

Suggested change:
"VariablePattern": "(?i).*polar.*"
"VariablePattern": "(?i).*Polar.*"

Well, I look dumb - the (?i) is the case-insensitive flag.

That aside, if this is only for SPL3FTP, the group is defined as Freeze_Thaw_Retrieval_Data_Polar, and I'm assuming (from looking at the file) the variables don't have the name polar except in their fully qualified path, e.g.:

Variable full name: Freeze_Thaw_Retrieval_Data_Polar/open_water_body_fraction

So why not just be explicit and make that:

"VariablePattern": "Freeze_Thaw_Retrieval_Data_Polar.*"

Would that work and be clearer?

Contributor Author: I agree this would make it clearer. I've updated this in 7d5e34c. A leading slash is required, so I decided to use:

"VariablePattern": "/Freeze_Thaw_Retrieval_Data_Polar/.*"

},
"Attributes": [
{
"Name": "grid_mapping",
"Value": "/EASE2_polar_projection"
"Value": "/EASE2_polar_projection_36km"
}
],
"_Description": "SMAP L3 collections omit polar grid mapping information"
},
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FTP_E",
"VariablePattern": "(?i).*polar.*"
Member: Same comment as before.

Contributor Author: Updated in 7d5e34c.

},
"Attributes": [
{
"Name": "grid_mapping",
"Value": "/EASE2_polar_projection_9km"
}
],
"_Description": "SMAP L3 collections omit polar grid mapping information"
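The regex discussion in the thread above is easy to verify directly; a short sketch using the variable path quoted there (the Soil_Moisture path in the negative check is a hypothetical non-polar counterexample):

```python
import re

full_name = '/Freeze_Thaw_Retrieval_Data_Polar/open_water_body_fraction'

# (?i) is an inline case-insensitive flag, so the original pattern does
# match 'Polar' despite the lowercase 'polar' in the pattern...
assert re.match(r'(?i).*polar.*', full_name)

# ...but the explicit group-path pattern adopted in the PR is clearer and
# anchors on the actual group name:
assert re.match(r'/Freeze_Thaw_Retrieval_Data_Polar/.*', full_name)

# A variable outside the polar group does not match the explicit pattern:
print(re.match(r'/Freeze_Thaw_Retrieval_Data_Polar/.*',
               '/Soil_Moisture_Retrieval_Data_AM/soil_moisture'))  # → None
```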
@@ -134,7 +148,7 @@
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3SMP_E",
"VariablePattern": "Soil_Moisture_Retrieval_Data_(A|P)M/.*"
"VariablePattern": "/Soil_Moisture_Retrieval_Data_(A|P)M/.*"
},
"Attributes": [
{
@@ -148,12 +162,12 @@
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3SMP_E",
"VariablePattern": "Soil_Moisture_Retrieval_Data_Polar_(A|P)M/.*"
"VariablePattern": "/Soil_Moisture_Retrieval_Data_Polar_(A|P)M/.*"
},
"Attributes": [
{
"Name": "grid_mapping",
"Value": "/EASE2_polar_projection"
"Value": "/EASE2_polar_projection_9km"
}
],
"_Description": "SMAP L3 collections omit polar grid mapping information"
@@ -166,15 +180,15 @@
"Attributes": [
{
"Name": "grid_mapping",
"Value": "/EASE2_polar_projection"
"Value": "/EASE2_polar_projection_3km"
}
],
"_Description": "SMAP L3 collections omit polar grid mapping information"
},
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3SM(P|A|AP)|SPL2SMAP_S"
"ShortNamePath": "SPL3SM(P|A|AP)$|SPL2SMAP_S"
},
"Attributes": [
{
@@ -217,8 +231,8 @@
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FT(A|P|P_E)|SPL3SM(P|P_E|A|AP)|SPL2SMAP_S",
"VariablePattern": "/EASE2_polar_projection"
"ShortNamePath": "SPL3FTA",
"VariablePattern": "/EASE2_polar_projection_3km"
},
"Attributes": [
{
@@ -240,6 +254,78 @@
{
"Name": "false_northing",
"Value": 0.0
},
{
"Name": "master_geotransform",
"Value": [-9000000, 3000, 0, 9000000, 0, -3000]
}
],
"_Description": "Provide missing polar grid mapping attributes for SMAP L3 collections."
},
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FT_E|SPL3SMP_E",
"VariablePattern": "/EASE2_polar_projection_9km"
},
"Attributes": [
{
"Name": "grid_mapping_name",
"Value": "lambert_azimuthal_equal_area"
},
{
"Name": "longitude_of_projection_origin",
"Value": 0.0
},
{
"Name": "latitude_of_projection_origin",
"Value": 90.0
},
{
"Name": "false_easting",
"Value": 0.0
},
{
"Name": "false_northing",
"Value": 0.0
},
{
"Name": "master_geotransform",
"Value": [-9000000, 9000, 0, 9000000, 0, -9000]
}
],
"_Description": "Provide missing polar grid mapping attributes for SMAP L3 collections."
},
{
"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FTP",
"VariablePattern": "/EASE2_polar_projection_36km"
},
"Attributes": [
{
"Name": "grid_mapping_name",
"Value": "lambert_azimuthal_equal_area"
},
{
"Name": "longitude_of_projection_origin",
"Value": 0.0
},
{
"Name": "latitude_of_projection_origin",
"Value": 90.0
},
{
"Name": "false_easting",
"Value": 0.0
},
{
"Name": "false_northing",
"Value": 0.0
},
{
"Name": "master_geotransform",
"Value": [-9000000, 36000, 0, 9000000, 0, -36000]
}
],
"_Description": "Provide missing polar grid mapping attributes for SMAP L3 collections."
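Each master_geotransform entry above is a GDAL-style affine (x_min, x_res, row_skew, y_max, col_skew, y_res) describing a full EASE-Grid 2.0 polar grid, so the implied grid dimensions follow from the extents and the resolution. A quick sanity check (the 9,000,000 m extents and resolutions are copied from the entries above; the resulting cell counts are derived here, not quoted from SMAP documentation):

```python
# master_geotransform values from the three configuration entries above
master_geotransforms = {
    '3km': [-9000000, 3000, 0, 9000000, 0, -3000],
    '9km': [-9000000, 9000, 0, 9000000, 0, -9000],
    '36km': [-9000000, 36000, 0, 9000000, 0, -36000],
}

grid_sizes = {}
for name, gt in master_geotransforms.items():
    n_cols = int((9000000 - gt[0]) / gt[1])     # (x_max - x_min) / x_res
    n_rows = int((gt[3] - -9000000) / -gt[5])   # (y_max - y_min) / y_res
    grid_sizes[name] = (n_cols, n_rows)

print(grid_sizes)
# → {'3km': (6000, 6000), '9km': (2000, 2000), '36km': (500, 500)}
```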
17 changes: 11 additions & 6 deletions hoss/projection_utilities.py
@@ -40,10 +40,17 @@
Shape = Union[LineString, Point, Polygon, MultiShape]


-def get_variable_crs(variable: str, varinfo: VarInfoFromDmr) -> CRS:
+def get_variable_crs(cf_attributes: dict) -> CRS:
"""Create a `pyproj.CRS` object from the grid mapping variable metadata
attributes.

"""
return CRS.from_cf(cf_attributes)

Member: What if instead you left this signature as it was, and moved a call to get_grid_mapping_attributes into here? Then you wouldn't need to make two calls below, and you've still exposed a function to get the cf_attributes. And then, actually, you could have another get_master_geotransform() function you could use down near L259, where you are switching how you get your dimension arrays. Let me know if that makes sense or sounds stupid.

Contributor Author: Updated in 1b23ebd.


def get_grid_mapping_attributes(variable: str, varinfo: VarInfoFromDmr) -> dict:
    """Check the metadata attributes for the variable to find the associated
-    grid mapping variable. Create a `pyproj.CRS` object from the grid
-    mapping variable metadata attributes.
+    grid mapping variable.

All metadata attributes that contain references from one variable to
another are stored in the `Variable.references` dictionary attribute
@@ -70,7 +77,7 @@ def get_variable_crs(variable: str, varinfo: VarInfoFromDmr) -> CRS:
cf_attributes = varinfo.get_missing_variable_attributes(grid_mapping)

if cf_attributes:
-    crs = CRS.from_cf(cf_attributes)
+    return cf_attributes
Member: I was looking at this and going to complain that the complexity is a little out of hand, but I see that none of it was you... so this seems fine.

Contributor (@vutrannasa, Feb 11, 2025): Just a thought for optimization, if grid_mapping_variable.attributes and varinfo.get_missing_variable_attributes() always return a value:

    if grid_mapping_variable is not None:
        return grid_mapping_variable.attributes

    # check for any overrides
    return varinfo.get_missing_variable_attributes(grid_mapping)

Then the if cf_attributes: check could be removed.

Reviewer: Looks fine now.

else:
raise MissingGridMappingVariable(grid_mapping, variable)

@@ -80,8 +87,6 @@ def get_variable_crs(variable: str, varinfo: VarInfoFromDmr) -> CRS:
else:
raise MissingGridMappingMetadata(variable)

-return crs
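After the split above, get_variable_crs reduces to a direct pyproj call over whatever attribute dictionary get_grid_mapping_attributes returns. A minimal sketch using the CF attributes this PR adds for the polar grids (requires pyproj; the attribute values are copied from hoss_config.json above, and master_geotransform is deliberately excluded because it is a HOSS convention rather than a CF grid-mapping attribute):

```python
from pyproj import CRS

# CF grid mapping attributes as injected by the configuration above
cf_attributes = {
    'grid_mapping_name': 'lambert_azimuthal_equal_area',
    'longitude_of_projection_origin': 0.0,
    'latitude_of_projection_origin': 90.0,
    'false_easting': 0.0,
    'false_northing': 0.0,
}

crs = CRS.from_cf(cf_attributes)
print(crs.to_dict()['proj'])  # → laea
```

CRS.from_cf accepts the CF attribute dictionary directly, which is why the refactor can return the raw attributes from get_grid_mapping_attributes and build the CRS in one place.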


def get_projected_x_y_variables(
varinfo: VarInfoFromDmr, variable: str
33 changes: 23 additions & 10 deletions hoss/spatial.py
@@ -37,6 +37,7 @@
)
from hoss.coordinate_utilities import (
create_dimension_arrays_from_coordinates,
create_dimension_arrays_from_geotransform,
get_coordinate_variables,
get_dimension_array_names_from_coordinate_variables,
get_variables_with_anonymous_dims,
@@ -49,6 +50,7 @@
get_dimension_index_range,
)
from hoss.projection_utilities import (
get_grid_mapping_attributes,
get_projected_x_y_extents,
get_projected_x_y_variables,
get_variable_crs,
@@ -183,7 +185,10 @@ def get_projected_x_y_index_ranges(
and projected_y is not None
and not set((projected_x, projected_y)).issubset(set(index_ranges.keys()))
):
-crs = get_variable_crs(non_spatial_variable, varinfo)
+grid_mapping_attributes = get_grid_mapping_attributes(
+    non_spatial_variable, varinfo
+)
+crs = get_variable_crs(grid_mapping_attributes)

x_y_extents = get_projected_x_y_extents(
dimensions_file[projected_x][:],
@@ -245,19 +250,27 @@ def get_x_y_index_ranges_from_coordinates(

"""

-crs = get_variable_crs(non_spatial_variable, varinfo)
+grid_mapping_attributes = get_grid_mapping_attributes(non_spatial_variable, varinfo)
+crs = get_variable_crs(grid_mapping_attributes)

projected_dimension_names = get_dimension_array_names_from_coordinate_variables(
varinfo, non_spatial_variable
)

-dimension_arrays = create_dimension_arrays_from_coordinates(
-    prefetch_coordinate_datasets,
-    latitude_coordinate,
-    longitude_coordinate,
-    crs,
-    projected_dimension_names,
-)
+if 'master_geotransform' in grid_mapping_attributes:
Member: I think we talked in a huddle about this. I'm seeing that it's used specifically for getting ranges, and this is sorta at the level of thought process in the function, so I don't think it buys you anything to bury this if statement in another function just to hide it. I do still think you might have a dedicated get_master_geotransform() function that would hide all calls to get_grid_mapping_attributes from this level of abstraction (like I mentioned above). Also, why not primary_geotransform instead of master_geotransform?

Contributor Author: I'm going to implement a dedicated get_master_geotransform() function as you suggest. As for master_geotransform vs. primary_geotransform, I went with the naming suggested by @D-Auty.

Reviewer: FYI - "master geotransform" here refers to the geotransform for a "whole earth" grid - e.g., what is defined in the GPD files and for the EASE-Grid standard grids. A geotransform as a granule attribute by itself could be considered redundant with the dimension variables, which provide the "coordinate" in meters for the data arrays but are specific to the granule and the array sizes. A "master geotransform" would be a collection-level attribute, applicable across many granules of different sizes (e.g., tiles) and likely many collections. Hopefully the reference to a master geotransform avoids confusion with the specific extents of the granule itself. It seemed that master was a better reference than primary in this case.

Contributor Author: Updated in 1b23ebd.

+    dimension_arrays = create_dimension_arrays_from_geotransform(
+        prefetch_coordinate_datasets,
+        latitude_coordinate,
+        projected_dimension_names,
+        grid_mapping_attributes['master_geotransform'],
+    )
+else:
+    dimension_arrays = create_dimension_arrays_from_coordinates(
+        prefetch_coordinate_datasets,
+        latitude_coordinate,
+        longitude_coordinate,
+        crs,
+        projected_dimension_names,
+    )

projected_y, projected_x = dimension_arrays.keys()
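The new branch above amounts to a small dispatch rule: a configured collection-level master_geotransform takes precedence, and deriving dimension scales from the 2D coordinate arrays remains the fallback. A schematic sketch of that rule (choose_dimension_arrays and the stub builders are hypothetical stand-ins, not the HOSS functions):

```python
def choose_dimension_arrays(grid_mapping_attributes, from_geotransform, from_coordinates):
    """Prefer the master geotransform when the configuration supplies one;
    otherwise fall back to deriving scales from the coordinate arrays."""
    master = grid_mapping_attributes.get('master_geotransform')
    if master is not None:
        return from_geotransform(master)
    return from_coordinates()

# Hypothetical usage with stub builders standing in for the two HOSS paths:
with_master = choose_dimension_arrays(
    {'master_geotransform': [-9000000, 9000, 0, 9000000, 0, -9000]},
    from_geotransform=lambda gt: f'geotransform:{gt[1]}m',
    from_coordinates=lambda: 'coordinates',
)
without_master = choose_dimension_arrays(
    {},
    from_geotransform=lambda gt: 'geotransform',
    from_coordinates=lambda: 'coordinates',
)
print(with_master, without_master)  # → geotransform:9000m coordinates
```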
