
Add provenance output support to execute() response #768

Open: wants to merge 1 commit into base: master
openeo/local/connection.py: 17 changes (16 additions, 1 deletion)

@@ -2,6 +2,8 @@
 import logging
 from pathlib import Path
 from typing import Callable, Dict, List, Optional, Union
+import os

Member commented: Any reason this import is added?

 import numpy as np
 import xarray as xr
@@ -270,6 +272,7 @@ def execute(
         *,
         validate: Optional[bool] = None,
         auto_decode: bool = True,
+        return_provenance: bool = False,

Member commented: You are adding a custom argument here to the more public API of Connection.execute(), which is fine as long as you call this LocalConnection.execute() yourself. But in general this method will be called automatically without this argument (because it's not in the official API), e.g.:

    cube = local_connection.load_collection(...)
    res = cube.execute()

The latter execute() is a method defined on DataCube; it does not support return_provenance, let alone pass it on properly to LocalConnection.execute().
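
To illustrate the reviewer's point, here is a minimal, self-contained sketch built on hypothetical stand-in classes (FakeLocalConnection, FakeDataCube), not the real openeo client classes. It assumes, as the comment describes, that DataCube.execute() forwards only the process graph to the connection, so the new keyword never reaches the connection and silently falls back to its default.

```python
from typing import Any, Dict, Tuple, Union


class FakeLocalConnection:
    """Hypothetical stand-in for LocalConnection with the PR's extra keyword."""

    def execute(
        self, process_graph: Dict[str, Any], *, return_provenance: bool = False
    ) -> Union[str, Tuple[str, str]]:
        result = "result"  # placeholder for the computed xr.DataArray
        workflow = "workflow"  # placeholder for the provenance object
        if return_provenance:
            return result, workflow
        return result


class FakeDataCube:
    """Hypothetical stand-in for DataCube: it only uses the official execute() API."""

    def __init__(self, connection: FakeLocalConnection, graph: Dict[str, Any]):
        self._connection = connection
        self._graph = graph

    def execute(self) -> Any:
        # return_provenance is never passed, so the connection uses its default (False).
        return self._connection.execute(self._graph)


cube = FakeDataCube(FakeLocalConnection(), {"load1": {"process_id": "load_collection"}})
print(cube.execute())  # prints: result (the provenance is silently dropped)
```

Only a caller that invokes the connection directly with return_provenance=True ever sees the tuple.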

     ) -> xr.DataArray:
         """
         Execute locally the process graph and return the result as an xarray.DataArray.
@@ -282,4 +285,16 @@ def execute(
         if auto_decode is not True:
             raise ValueError("LocalConnection requires auto_decode=True")
         process_graph = as_flat_graph(process_graph)
-        return OpenEOProcessGraph(process_graph).to_callable(PROCESS_REGISTRY)()
+        pg = OpenEOProcessGraph(process_graph)
+
+        # Return the result and get the workflow provenance (yprov4wfs)
+        result = pg.to_callable(PROCESS_REGISTRY)()
+        workflow = pg.workflow

Member commented: As far as I understand, this depends on a new feature of openeo-pg-parser-networkx, so the minimum version of this dependency has to be bumped here:

    localprocessing_require = [
        "rioxarray>=0.13.0",
        "pyproj",
        "openeo_pg_parser_networkx>=2023.5.1",
        "openeo_processes_dask[implementations]>=2023.7.1",
    ]


+        # To save the provenance file to a specific directory, use:
+        # workflow.prov_to_json(directory_path=save_path)

Member commented: I don't think it's useful to have this as a comment here. If it is meant for users, it should go in the docblock.
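
A minimal sketch of how that hint could move into the docblock, using reST-style :param: fields. The function name execute_sketch is hypothetical and its body is deliberately omitted; only the prov_to_json(directory_path=...) call is taken from the code comment in the diff, and the exact docstring conventions of the openeo client are an assumption here.

```python
def execute_sketch(process_graph: dict, *, return_provenance: bool = False):
    """
    Execute locally the process graph and return the result as an xarray.DataArray.

    :param process_graph: the process graph to execute.
    :param return_provenance: if True, also return the yprov4wfs workflow
        provenance object. To save it to a specific directory, use
        ``workflow.prov_to_json(directory_path=save_path)``.
    """
    # Body omitted on purpose: this hypothetical function only illustrates the docblock.
    raise NotImplementedError
```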


+        if return_provenance:
+            return result, workflow

Member commented: This custom return should be documented in the docblock and in the return annotation.
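
If the flag-dependent return type were kept, one standard way to express it in the return annotation is typing.overload combined with Literal[True] / Literal[False], so type checkers know which shape each call returns. The sketch is hypothetical: Result and Workflow are placeholder classes standing in for xr.DataArray and the yprov4wfs workflow object, and run is not part of the openeo API.

```python
from typing import Literal, Tuple, Union, overload


class Result:
    """Placeholder for xr.DataArray."""


class Workflow:
    """Placeholder for the yprov4wfs workflow object."""


@overload
def run(process_graph: dict, *, return_provenance: Literal[False] = ...) -> Result: ...


@overload
def run(process_graph: dict, *, return_provenance: Literal[True]) -> Tuple[Result, Workflow]: ...


@overload
def run(process_graph: dict, *, return_provenance: bool) -> Union[Result, Tuple[Result, Workflow]]: ...


def run(
    process_graph: dict, *, return_provenance: bool = False
) -> Union[Result, Tuple[Result, Workflow]]:
    """
    Execute the process graph (placeholder logic).

    :return: the result, or a ``(result, workflow)`` tuple if ``return_provenance`` is True.
    """
    result, workflow = Result(), Workflow()
    if return_provenance:
        return result, workflow
    return result
```

The third overload covers callers that pass a plain bool variable rather than a literal True or False.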

Member commented: That being said, I'm not a big fan of the pattern of returning different data structures (a tuple of two things instead of a single DataArray) depending on the input arguments.
Especially because, in normal usage of the openEO Python client, the execute method of a connection object (LocalConnection here) is usually not called directly by users, but indirectly through DataCube.execute() or something equivalent. So changing the input and output API of Connection.execute() is going to create problems.
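
One possible alternative, not proposed in the PR or the review, is to leave execute() untouched and expose provenance through a separately named method with a single, fixed return type. Everything below is hypothetical: SketchLocalConnection, execute_with_provenance and ProvenanceResult are illustrative names, and _run() stands in for the OpenEOProcessGraph calls in the diff.

```python
from typing import Any, Dict, NamedTuple


class ProvenanceResult(NamedTuple):
    """Hypothetical container for a result and its provenance."""

    result: Any
    workflow: Any


class SketchLocalConnection:
    """Hypothetical connection: execute() keeps the official single-object return."""

    def _run(self, process_graph: Dict[str, Any]) -> ProvenanceResult:
        # Placeholder for OpenEOProcessGraph(...).to_callable(PROCESS_REGISTRY)() and pg.workflow
        return ProvenanceResult(result="result", workflow="workflow")

    def execute(self, process_graph: Dict[str, Any]) -> Any:
        # Always one object, so callers such as DataCube.execute() are unaffected.
        return self._run(process_graph).result

    def execute_with_provenance(self, process_graph: Dict[str, Any]) -> ProvenanceResult:
        # Separate, explicitly named entry point with one fixed return type.
        return self._run(process_graph)


conn = SketchLocalConnection()
res = conn.execute({})
res_with_prov = conn.execute_with_provenance({})
```

The trade-off is one extra method on the connection, in exchange for never changing what execute() returns.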

+        else:
+            return result