Skip to content

Issue #761 better diff for apex reference check #765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dsamaey
Copy link
Contributor

@dsamaey dsamaey commented Apr 22, 2025

No description provided.

@soxofaan soxofaan self-requested a review April 23, 2025 09:43
:param atol: absolute tolerance
:raises AssertionError: if not equal within the given tolerance

.. versionadded:: 0.31.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next version will probably be 0.41.0

data_max = diff_data.max().item()
if data_max == 0:
data_max = 1
grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious: where does this "gradient" come from?

also note that with the (github.com) font I'm currently looking looking at this, it doesn't seem to be monotonically increasing in "lightness": e.g. + comes after _ but it looks "darker"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed this warning when running tests:

  .../openeo-python-client/openeo/testing/results.py:99: SyntaxWarning: invalid escape sequence '\|'
    grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "

def pixelChar(v) -> str:
if np.isnan(v):
return " "
i = int(v * 69 / data_max)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably only works for positive values.

note that it's not uncommon to have EO data that is mixed negative-positive (e.g NDVI) or striclty negative (e.g. decibel scaled radar data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also: if you are going to use this for difference data, you are probably working with mixed negative-positive data anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an absolute diff

- Compare actual and expected data with `xarray.testing.assert_allclose` and specified tolerances.

:return: list of issues (empty if no issues)
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document what the difference is with the existing _compare_xarray_dataarray ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or isn't it possible to integrate this feature in the existing _compare_xarray_dataarray instead of duplicating most of it?

],
),
(
xarray.DataArray([[1], [2], [3]]),
[
"Dimension mismatch: ('dim_0', 'dim_1') != ('dim_0',)",
"Shape mismatch: (3, 1) != (3,)",
dirty_equals.IsStr(regex="Left and right DataArray objects are not close.*", regex_flags=re.DOTALL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't completely understand at the moment why these issues are not triggered anymore with the new implementation

dirty_equals.IsStr(
regex=r"Left and right DataArray objects are not close.*Differing dimensions:.*\(y: 2, x: 3\) != \(x: 2, y: 3\)",
regex_flags=re.DOTALL,
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a regression: there is interesting feedback being removed here

],
),
(
xarray.DataArray([[1, 2, 3], [4, 5, 6]], dims=["x", "z"]),
[
"Dimension mismatch: ('x', 'z') != ('x', 'y')",
dirty_equals.IsStr(
regex=r"Left and right DataArray objects are not close.*Differing dimensions:.*\(x: 2, z: 3\) != \(x: 2, y: 3\)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

r"t 2: differing pixels: 4/20 \(20.0%\), spread over 8.3% of the area"
):
assert_job_results_allclose(actual=actual_dir, expected=expected_dir, tmp_path=tmp_path)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a simple test for ascii_art (e.g. with a simple 10 by 5 use case or something like that).
It's currently a public function, so people/projects might start depending on it


art = ascii_art(diff_data)
print(f"Difference ascii art for {key}")
print(art)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have my doubts about using a raw print here, because that might not play well with pytest based reporting (e.g. we also try to produce a HTML report in APEx).

I guess it's a matter of experimenting on what currently happens here, or what alternatives are possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it does seem to work in the HTML report:
image

but as mentioned elsewhere, the ascii art is a bit too large for good overview in that context

Copy link
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started experimenting with this, and got in trouble with this new scipy dependency

import xarray
import xarray.testing
from scipy.spatial import ConvexHull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes scipy a runtime dependency (now it's just a test dependency). Scipy is quite a heavy dependency (in terms of compiled extensions and additional transitive dependencies), which we we generally want to avoid.

If we keep it as optional dependency, this should be documented, e.g. here:

Optional dependencies
======================
Depending on your use case, you might also want to install some additional libraries.
For example:
- ``netCDF4`` or ``h5netcdf`` for loading and writing NetCDF files (e.g. integrated in ``xarray.load_dataset()``)
- ``matplotlib`` for visualisation (e.g. integrated plot functionality in ``xarray`` )
- ``pyarrow`` for (read/write) support of Parquet files
(e.g. with :py:class:`~openeo.extra.job_management.MultiBackendJobManager`)
- ``rioxarray`` for GeoTIFF support in the assert helpers from ``openeo.testing.results``
- ``geopandas`` for working with dataframes with geospatial support,
(e.g. with :py:class:`~openeo.extra.job_management.MultiBackendJobManager`)
- ``pystac_client`` for creating a STAC API Job Database (e.g. with :py:class:`~openeo.extra.job_management.stac_job_db.STACAPIJobDatabase`)

Or, is it actually necessary to have scipy as hard dependency here? Is that convex hull feature vital for the reporting quality? I think there could be a simpler fallback fraction-of-area calculation if ConvexHull is not available

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example: you use the convex hull to estimate the impacted area of coordinates with differences, but a much simpler/coarser approximation is just taking the min/max values of these coordinates to get the bounding box. Which does not require scipy

data_max = diff_data.max().item()
if data_max == 0:
data_max = 1
grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed this warning when running tests:

  .../openeo-python-client/openeo/testing/results.py:99: SyntaxWarning: invalid escape sequence '\|'
    grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "

@@ -88,6 +91,129 @@ def _as_xarray_dataarray(data: Union[str, Path, xarray.DataArray]) -> xarray.Dat
return data


def ascii_art(diff_data: DataArray) -> str:
scale: int = max(1, int(diff_data.sizes["x"] / 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as noted in #761 (comment) I think a size of 100 is too much

I would limit it to e.g. 40x40 or something like that

and while at it: just make this max size/width a function argument (with a reasonable default)

… with bbox, _compare_xarray_dataarray_xy now only adds to the original xarray implementation)
@dsamaey dsamaey merged commit 8bdedb9 into master May 13, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

APEx reference check needs better representation (ascii art diff, diff image, statistics)
2 participants