ArneDePeuter commented Oct 1, 2025

Description

This PR introduces the history feature (see issue #3155), which allows transformers to access parent records directly when keep_history=True is set.
This removes the need to explicitly propagate parent fields through every child transformer, simplifying pipelines and reducing schema clutter.

Implementation

  • When a resource is defined with keep_history=True, a history dictionary is made available to all subsequent nodes.
  • Only nodes marked with keep_history=True contribute their data to this dictionary, ensuring that only the required information is retained.
  • If neither a node nor any of its parents sets keep_history=True, the node receives a shared EMPTY_HISTORY object. Because only a single immutable reference is passed around, this avoids additional allocations and keeps the overhead negligible.
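
The three rules above can be illustrated with a small standalone model. This is only a sketch, not the code in this PR: the `Node` class, `extend_history`, `run_chain`, and the choice to key the history by node name are all hypothetical, and `EMPTY_HISTORY` is modelled here as a read-only empty mapping.

```python
from types import MappingProxyType

# Shared, read-only sentinel. Assumption: stands in for the EMPTY_HISTORY object.
EMPTY_HISTORY = MappingProxyType({})


class Node:
    """Hypothetical pipeline node; not the library's real class."""

    def __init__(self, name, func, keep_history=False):
        self.name = name
        self.func = func
        self.keep_history = keep_history


def extend_history(history, node, record):
    """Return the history that nodes after `node` should see."""
    if not node.keep_history:
        # Nothing to record: hand the same reference on, no allocation.
        return history
    updated = dict(history)
    updated[node.name] = record  # keying by node name is an assumption
    return MappingProxyType(updated)


def run_chain(nodes, item):
    """Pass `item` through `nodes` in order; return (result, final history)."""
    history = EMPTY_HISTORY
    for node in nodes:
        item = node.func(item, history)
        history = extend_history(history, node, item)
    return item, history


# The child reads the parent record from history instead of having the
# parent's fields copied into every intermediate item.
users = Node("users", lambda item, history: {"id": item}, keep_history=True)
orders = Node(
    "orders",
    lambda item, history: {"order": 1, "user_id": history["users"]["id"]},
)
result, final_history = run_chain([users, orders], 7)
```

In this model, a chain in which no node sets keep_history ends with the very same EMPTY_HISTORY object it started with, which is the no-overhead path described in the last bullet.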

The design follows a similar approach to the existing meta feature for consistency.
Comprehensive tests have been added to validate the behavior of this new feature.

Tiny Additional Proposal

As a follow-up improvement, we could simplify function signatures by removing the need for explicit = None defaults.
For example, instead of writing:

def transformer(item, meta=None, history=None):
    ...

we could allow:

def transformer(item, meta, history):
    ...

This would make transformer definitions cleaner and reduce boilerplate in function signatures.
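
One way such a proposal could work is to inspect the transformer's signature and pass only the arguments it declares. The helper below is a hypothetical sketch under that assumption; `call_transformer` is an illustrative name, and this PR does not state how the library actually detects `meta` and `history`.

```python
import inspect


def call_transformer(func, item, meta=None, history=None):
    """Call `func`, passing meta/history only if it declares those parameters.

    Hypothetical dispatcher for illustration; not the library's implementation.
    """
    params = inspect.signature(func).parameters
    available = {"meta": meta, "history": history}
    # Select by parameter name, whether or not the parameter has a default.
    kwargs = {name: value for name, value in available.items() if name in params}
    return func(item, **kwargs)


def with_both(item, meta, history):  # no "= None" defaults needed
    return item, meta, history


def item_only(item):
    return item
```

Because the lookup is by parameter name rather than by default value, existing transformers written with `meta=None, history=None` would keep working under this scheme.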
