
Add provenance output support to execute() response #768


Open

hapix wants to merge 1 commit into master

Conversation

@hapix commented May 5, 2025

Summary

This PR adds support for returning workflow provenance data as part of the execute() result in the OpenEO Python client. It complements the provenance generation added in openeo-pg-parser-networkx via the yProv4WFS library.

Key Changes

  • Added optional provenance output to execute() result structure

Dependencies

This PR depends on the provenance functionality in openeo-pg-parser-networkx.
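
Example usage (illustrative)

A minimal sketch of how the new flag would be used, assuming this PR and the provenance support in openeo-pg-parser-networkx are installed; the data folder, collection id and output path below are hypothetical, and the process-graph argument follows the existing LocalConnection.execute() API:

from openeo.local import LocalConnection

local_conn = LocalConnection("./sample_data")  # hypothetical local data folder
cube = local_conn.load_collection("example_local_collection")  # hypothetical collection id

# With return_provenance=True, execute() returns a (result, workflow) tuple
# instead of a single DataArray; workflow is the yProv4WFS workflow object.
result, workflow = local_conn.execute(cube.to_json(), return_provenance=True)
workflow.prov_to_json(directory_path="./provenance")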

@@ -2,6 +2,8 @@
import logging
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union
import os
Member

any reason this is added?


# Return the result and get the workflow provenance (yprov4wfs)
result = pg.to_callable(PROCESS_REGISTRY)()
workflow = pg.workflow
Member

As far as I understand, this depends on a new feature of openeo-pg-parser-networkx, so the minimum version of that dependency has to be bumped here:

localprocessing_require = [
    "rioxarray>=0.13.0",
    "pyproj",
    "openeo_pg_parser_networkx>=2023.5.1",
    "openeo_processes_dask[implementations]>=2023.7.1",
]

workflow = pg.workflow

# To save the provenance file to a specific path, use:
# workflow.prov_to_json(directory_path=save_path)
Member

I don't think it's useful to have this as a comment here. If this is for users, it should be in the docblock.

# workflow.prov_to_json(directory_path=save_path)

if return_provenance:
    return result, workflow
Member

This custom return value should be documented in the docblock and in the return type annotation.
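
A sketch of what that documentation could look like (the parameter names mirror this PR's diff; the docstring wording, the return types and the "Workflow" type name are illustrative, not part of the actual change):

from typing import Optional, Tuple, Union

import xarray as xr

def execute(
    self,
    process_graph,
    *,
    validate: Optional[bool] = None,
    auto_decode: bool = True,
    return_provenance: bool = False,
) -> Union[xr.DataArray, Tuple[xr.DataArray, "Workflow"]]:
    """
    Execute a process graph locally.

    :param return_provenance: if True, also return the yProv4WFS workflow
        object, so the result is a ``(result, workflow)`` tuple instead of a
        single DataArray. Use ``workflow.prov_to_json(directory_path=...)``
        to write the provenance file to a specific path.
    :return: the computed result, or a ``(result, workflow)`` tuple when
        ``return_provenance`` is True.
    """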

Member

That being said, I'm not a big fan of the pattern of returning different data structures (a tuple of two things instead of a single DataArray) depending on input arguments.
Especially because in normal usage of the openEO Python client, the execute() method of connection objects (LocalConnection here) is usually not called directly by users, but indirectly through DataCube.execute() or something equivalent. So changing the input and output API of Connection.execute() is going to create problems.

@@ -270,6 +272,7 @@ def execute(
    *,
    validate: Optional[bool] = None,
    auto_decode: bool = True,
    return_provenance: bool = False,
Member

You are adding a custom argument here to a fairly public API, Connection.execute(), which is fine as long as you call LocalConnection.execute() yourself. But in general this method will be called automatically without this argument (because it is not in the official API), e.g.:

cube = local_connection.load_collection(...)
res = cube.execute()

The latter execute() is a method defined on DataCube and does not support return_provenance, let alone pass it on properly to LocalConnection.execute().
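
A simplified illustration of the problem the reviewer describes (not the actual client code): DataCube.execute() forwards only the keyword arguments it knows about, so a flag added only to LocalConnection.execute() is never set when going through the usual cube.execute() path:

class DataCube:
    def __init__(self, connection, flat_graph):
        self._connection = connection
        self._flat_graph = flat_graph

    def execute(self, *, validate=None, auto_decode=True):
        # No return_provenance here: LocalConnection.execute() always runs
        # with its default (False) when called through this method.
        return self._connection.execute(
            self._flat_graph, validate=validate, auto_decode=auto_decode
        )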
