Issue #761 better diff for apex reference check #765

dsamaey · 2025-04-22T17:55:33Z

No description provided.

setup.py

soxofaan · 2025-05-07T12:03:10Z

openeo/testing/results.py

+    :param atol: absolute tolerance
+    :raises AssertionError: if not equal within the given tolerance
+
+    .. versionadded:: 0.31.0


next version will probably be 0.41.0

soxofaan · 2025-05-07T12:07:47Z

openeo/testing/results.py

+    data_max = diff_data.max().item()
+    if data_max == 0:
+        data_max = 1
+    grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "


just curious: where does this "gradient" come from?

also note that with the (github.com) font I'm currently viewing this in, it doesn't seem to be monotonically increasing in "lightness": e.g. + comes after _ but it looks "darker"


soxofaan · 2025-05-07T12:12:37Z

openeo/testing/results.py

+    def pixelChar(v) -> str:
+        if np.isnan(v):
+            return " "
+        i = int(v * 69 / data_max)


this probably only works for positive values.

note that it's not uncommon to have EO data that is mixed negative-positive (e.g. NDVI) or strictly negative (e.g. decibel-scaled radar data)

also: if you are going to use this for difference data, you are probably working with mixed negative-positive data anyway

it's an absolute diff
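
A minimal sketch of an index mapping that stays in range for negative or mixed-sign input, by taking the absolute value and clamping. The name `pixel_char`, the short `RAMP` string and the `data_max` parameter are illustrative assumptions, not the code in this PR:

```python
import math

# Hypothetical short ramp, ordered from light to dark (not the ramp from the PR).
RAMP = " .:-=+*#%@"


def pixel_char(value: float, data_max: float, ramp: str = RAMP) -> str:
    """Map a value to a ramp character based on its magnitude."""
    if math.isnan(value):
        return " "
    if data_max <= 0:
        # Nothing to scale against: treat everything as "no difference".
        return ramp[0]
    # abs() makes the mapping safe for negative input; min() clamps outliers.
    fraction = min(abs(value) / data_max, 1.0)
    return ramp[int(fraction * (len(ramp) - 1))]
```

With an absolute diff as input the `abs()` call is a no-op, but it keeps the helper safe if it is ever fed signed data.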

soxofaan · 2025-05-07T12:14:26Z

openeo/testing/results.py

+    - Compare actual and expected data with `xarray.testing.assert_allclose` and specified tolerances.
+
+    :return: list of issues (empty if no issues)
+    """


can you document how this differs from the existing _compare_xarray_dataarray?

or isn't it possible to integrate this feature into the existing _compare_xarray_dataarray instead of duplicating most of it?

openeo/testing/results.py

soxofaan · 2025-05-07T12:30:12Z

tests/testing/test_results.py

                ],
            ),
            (
                xarray.DataArray([[1], [2], [3]]),
                [
                    "Dimension mismatch: ('dim_0', 'dim_1') != ('dim_0',)",
                    "Shape mismatch: (3, 1) != (3,)",
-                    dirty_equals.IsStr(regex="Left and right DataArray objects are not close.*", regex_flags=re.DOTALL),


I don't completely understand at the moment why these issues are not triggered anymore with the new implementation

soxofaan · 2025-05-07T12:30:56Z

tests/testing/test_results.py

-                    dirty_equals.IsStr(
-                        regex=r"Left and right DataArray objects are not close.*Differing dimensions:.*\(y: 2, x: 3\) != \(x: 2, y: 3\)",
-                        regex_flags=re.DOTALL,
-                    ),


This looks like a regression: there is interesting feedback being removed here

soxofaan · 2025-05-07T12:31:11Z

tests/testing/test_results.py

                ],
            ),
            (
                xarray.DataArray([[1, 2, 3], [4, 5, 6]], dims=["x", "z"]),
                [
                    "Dimension mismatch: ('x', 'z') != ('x', 'y')",
-                    dirty_equals.IsStr(
-                        regex=r"Left and right DataArray objects are not close.*Differing dimensions:.*\(x: 2, z: 3\) != \(x: 2, y: 3\)",


soxofaan · 2025-05-07T12:33:28Z

tests/testing/test_results.py

+            r"t 2: differing pixels: 4/20 \(20.0%\), spread over 8.3% of the area"
+        ):
+            assert_job_results_allclose(actual=actual_dir, expected=expected_dir, tmp_path=tmp_path)
+


Can you also add a simple test for ascii_art (e.g. with a simple 10 by 5 use case or something like that).
It's currently a public function, so people/projects might start depending on it

soxofaan · 2025-05-07T12:55:44Z

openeo/testing/results.py

+
+                art = ascii_art(diff_data)
+                print(f"Difference ascii art for {key}")
+                print(art)


I also have my doubts about using a raw print here, because that might not play well with pytest-based reporting (e.g. we also try to produce an HTML report in APEx).

I guess it's a matter of experimenting to see what currently happens here, and what alternatives are possible.

OK, it does seem to work in the HTML report, but as mentioned elsewhere, the ASCII art is a bit too large to give a good overview in that context.

soxofaan

I started experimenting with this, and ran into trouble with this new scipy dependency

soxofaan · 2025-05-07T13:14:09Z

openeo/testing/results.py

 import xarray
 import xarray.testing
+from scipy.spatial import ConvexHull


This makes scipy a runtime dependency (until now it was just a test dependency). Scipy is quite a heavy dependency (in terms of compiled extensions and additional transitive dependencies), which we generally want to avoid.

If we keep it as optional dependency, this should be documented, e.g. here:

openeo-python-client/docs/installation.rst

Lines 82 to 95 in c6bc1c9

Optional dependencies
======================

Depending on your use case, you might also want to install some additional libraries.
For example:

- ``netCDF4`` or ``h5netcdf`` for loading and writing NetCDF files (e.g. integrated in ``xarray.load_dataset()``)
- ``matplotlib`` for visualisation (e.g. integrated plot functionality in ``xarray`` )
- ``pyarrow`` for (read/write) support of Parquet files
  (e.g. with :py:class:`~openeo.extra.job_management.MultiBackendJobManager`)
- ``rioxarray`` for GeoTIFF support in the assert helpers from ``openeo.testing.results``
- ``geopandas`` for working with dataframes with geospatial support,
  (e.g. with :py:class:`~openeo.extra.job_management.MultiBackendJobManager`)
- ``pystac_client`` for creating a STAC API Job Database (e.g. with :py:class:`~openeo.extra.job_management.stac_job_db.STACAPIJobDatabase`)

Or, is it actually necessary to have scipy as a hard dependency here? Is that convex hull feature vital for the reporting quality? I think there could be a simpler fallback fraction-of-area calculation if ConvexHull is not available.

For example: you use the convex hull to estimate the impacted area of coordinates with differences, but a much simpler/coarser approximation is just taking the min/max values of these coordinates to get the bounding box, which does not require scipy.
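
The bounding-box approximation could be sketched roughly like this, using only the standard library. The function name `bbox_area_fraction` and the choice of integer (x, y) pixel indices as input are assumptions for illustration:

```python
from typing import Iterable, Tuple


def bbox_area_fraction(
    diff_coords: Iterable[Tuple[int, int]], width: int, height: int
) -> float:
    """Fraction of a width x height grid covered by the bounding box of the given (x, y) pixels."""
    coords = list(diff_coords)
    if not coords or width <= 0 or height <= 0:
        return 0.0
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Inclusive pixel extent in both directions, hence the "+ 1".
    bbox_pixels = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return bbox_pixels / (width * height)
```

The bounding box is never smaller than the convex hull, so this gives an upper bound on the hull-based estimate.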

soxofaan · 2025-05-07T13:28:44Z

openeo/testing/results.py

+    data_max = diff_data.max().item()
+    if data_max == 0:
+        data_max = 1
+    grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "


I also noticed this warning when running tests:

.../openeo-python-client/openeo/testing/results.py:99: SyntaxWarning: invalid escape sequence '\|' grayscale_characters = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,\"^`'. "
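
One way to silence the warning, sketched here as an assumption about the intended string, is to escape the backslash explicitly. A plain raw string would not be equivalent, because the literal also contains `\"`. The constant name is hypothetical:

```python
# Same characters as in the PR, with the backslash written as "\\" so that
# Python no longer sees the invalid escape sequence "\|".
GRAYSCALE_CHARACTERS = "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\\|()1{}[]?-_+~<>i!lI;:,\"^`'. "

# The ramp has 70 characters, so valid indices run from 0 to 69.
RAMP_LENGTH = len(GRAYSCALE_CHARACTERS)
```

The resulting string is unchanged, since Python already kept the backslash of the unrecognized `\|` sequence.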

soxofaan · 2025-05-07T13:30:45Z

openeo/testing/results.py

@@ -88,6 +91,129 @@ def _as_xarray_dataarray(data: Union[str, Path, xarray.DataArray]) -> xarray.Dat
    return data


+def ascii_art(diff_data: DataArray) -> str:
+    scale: int = max(1, int(diff_data.sizes["x"] / 100))


as noted in #761 (comment) I think a size of 100 is too much

I would limit it to e.g. 40x40 or something like that

and while at it: just make this max size/width a function argument (with a reasonable default)
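
A rough sketch of what a configurable width limit could look like. The helper name `downsample_scale` and the default of 40 are assumptions taken from the suggestion above, and it rounds up rather than down (unlike the `int(...)` in the diff) so that the rendered width never exceeds the limit:

```python
import math


def downsample_scale(size_x: int, max_width: int = 40) -> int:
    """Number of source pixels per output character so that the art fits in max_width columns."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    # ceil() guarantees size_x / scale <= max_width; max() avoids a scale of 0.
    return max(1, math.ceil(size_x / max_width))
```

For example, a 100 pixel wide array with the default limit gets a scale of 3, which yields at most 34 columns.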

Issue #761 better diff for apex reference check

bdb906b

dsamaey linked an issue Apr 22, 2025 that may be closed by this pull request

APEx reference check needs better representation (ascii art diff, diff image, statistics) #761

Closed

Issue #761 better diff for apex reference check (added scipy dependency)

e723729

soxofaan self-requested a review April 23, 2025 09:43

dsamaey added 4 commits April 30, 2025 09:26

Issue #761 better diff for apex reference check (added ascii art diff)

a25597b

Issue #761 better diff for apex reference check (added ascii art diff)

d88a124

Issue #761 better diff for apex reference check (more robust xy/yx gr…

916b868

…id handling)

Issue #761 better diff for apex reference check (fixed boundary value…

ce0cc5f

… issue)

soxofaan reviewed May 7, 2025

dsamaey added 3 commits May 8, 2025 08:46

Issue #761 better diff for apex reference check (default ascii_art ma…

3e8860f

…x_width and aspect ratio)

Issue #761 better diff for apex reference check (deps fix)

ce58771

Issue #761 better diff for apex reference check

5bcc5e7

soxofaan added a commit to ESA-APEx/apex_algorithms that referenced this pull request May 8, 2025

Issue #149 force fusets_mogpr benchmark with experimental diff tool

e11426d

refs: Open-EO/openeo-python-client#761, Open-EO/openeo-python-client#765

soxofaan added a commit to ESA-APEx/apex_algorithms that referenced this pull request May 8, 2025

Issue #149 force fusets_mogpr benchmark with experimental diff tool

e6be898

refs: Open-EO/openeo-python-client#761, Open-EO/openeo-python-client#765

soxofaan added a commit to ESA-APEx/apex_algorithms that referenced this pull request May 8, 2025

benchmarks: use experimental openeo-python-client for advanced diff r…

5554b6a

…eporting refs: - Open-EO/openeo-python-client#761 - Open-EO/openeo-python-client#765

Issue #761 better diff for apex reference check (replaced convex hull…

fcad19e

… with bbox, _compare_xarray_dataarray_xy now only adds to the original xarray implementation)

dsamaey merged commit 8bdedb9 into master May 13, 2025
15 checks passed
