MImmesberger
Collaborator

@MImmesberger MImmesberger commented Jul 24, 2025

What problem do you want to solve?

Closes #1034

Tutorial on how to upsert

  • parameters
  • param_functions
  • policy_functions
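
As a rough illustration of what "upsert" means for a nested policy environment, the sketch below updates existing leaves and inserts new ones in a plain nested dict. The helper `upsert_tree` and all keys and values are hypothetical; this is not GETTSIM's API, only a minimal model of the idea.

```python
from copy import deepcopy
from typing import Any


def upsert_tree(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` where leaves from `update` are inserted or overwritten."""
    result = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are sub-trees: descend instead of replacing the whole branch.
            result[key] = upsert_tree(result[key], value)
        else:
            # New key (insert) or existing leaf (update).
            result[key] = deepcopy(value)
    return result


# Hypothetical environment and reform; names and numbers are made up.
base_env = {"kindergeld": {"satz": 250, "altersgrenze": 25}}
reform = {"kindergeld": {"satz": 300}, "neue_leistung": {"betrag": 100}}
reformed_env = upsert_tree(base_env, reform)
```

The base environment is left untouched, so the status quo and the reform can be compared side by side.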

hmgaudecker and others added 30 commits December 12, 2024 11:26
This PR adds the namespace infrastructure to GETTSIM.

- [x] Write `policy_function` decorator (rename `policy_info` and change
behavior so that a `PolicyFunction` instance is returned). ~Apply to all
TT functions.~ (that should be part of renamings)
- [x] Check that functions in module with same simple_name have the
correct start_date, end_date specs (this was removed from the
policy_info decorator).
- [x] Remove doubled levels in the functions tree automatically (to
avoid writing functions in `__init__.py`).
- [x] Go over type hints for aggregation functions.
- [x] Refactor interface module.
- [x] Implement some safety checks 
- [x] No function should have the same name as a module in the same
directory
- [x] No trailing underscores in module names (for [DAGS
PR](OpenSourceEconomics/dags#17))
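
The first two items in the list above can be pictured with a small sketch: a decorator that returns an instance of a class rather than the bare function, plus a check that functions sharing a `simple_name` do not have overlapping date ranges. The class fields, defaults, and the helper `active_periods_overlap` are assumptions made for illustration, not the actual GETTSIM implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class PolicyFunction:
    """Hypothetical stand-in for the real class; the fields are assumptions."""

    function: Callable
    simple_name: str
    start_date: date
    end_date: date

    def __call__(self, *args, **kwargs):
        return self.function(*args, **kwargs)


def policy_function(*, simple_name=None, start_date="1900-01-01", end_date="2099-12-31"):
    """Decorator that returns a PolicyFunction instead of the bare function."""

    def decorator(func: Callable) -> PolicyFunction:
        return PolicyFunction(
            function=func,
            simple_name=simple_name or func.__name__,
            start_date=date.fromisoformat(start_date),
            end_date=date.fromisoformat(end_date),
        )

    return decorator


def active_periods_overlap(a: PolicyFunction, b: PolicyFunction) -> bool:
    """True if two functions share a simple_name and their date ranges intersect."""
    return (
        a.simple_name == b.simple_name
        and a.start_date <= b.end_date
        and b.start_date <= a.end_date
    )


@policy_function(simple_name="betrag_m", end_date="2022-12-31")
def betrag_m_bis_2022(anzahl_kinder: int) -> int:
    return 100 * anzahl_kinder  # made-up amount


@policy_function(simple_name="betrag_m", start_date="2023-01-01")
def betrag_m_ab_2023(anzahl_kinder: int) -> int:
    return 120 * anzahl_kinder  # made-up amount
```

With this layout, a module-level check only needs to compare all pairs of `PolicyFunction` instances that share a `simple_name`.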

---------

Co-authored-by: Marvin Immesberger <immesberger@posteo.de>
Co-authored-by: Tim Mensinger <mensingertim@gmail.com>
Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
The way we implemented the loading of namespaces in #780 does not quite
work.

We want to have them at the directory level to balance the use of
namespaces against the number of qualified names.

Additionally, we had to change the order of the upsert operations in `combine_policy_functions_and_derived_functions`. Doesn't affect the happy path, but in case of conflicts the previous behaviour did not make sense.
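
Why the order only matters under conflicts can be seen in a toy model with flat dicts. The function `combine` and the entries are hypothetical, and the sketch does not claim which side takes precedence in `combine_policy_functions_and_derived_functions`; it only shows that the later operand wins.

```python
def combine(first: dict, second: dict) -> dict:
    """Merge two mappings; on key conflicts the entry from `second` wins."""
    return {**first, **second}


# Hypothetical entries keyed by qualified name.
policy_functions = {"einkommen_m": "user-defined"}
derived_functions = {
    "einkommen_m": "derived from einkommen_y",
    "einkommen_q": "derived from einkommen_y",
}

derived_wins = combine(policy_functions, derived_functions)
policy_wins = combine(derived_functions, policy_functions)
```

Without the conflicting key `einkommen_m`, both orders would produce the same mapping, which is the "happy path" mentioned above.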

---------

Co-authored-by: Marvin Immesberger <immesberger@posteo.de>
### What problem do you want to solve?

Uses the qualified name instead of the leaf name to look for rounding
specs in the params file. This is a temporary solution until we have
tackled #823.
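
The difference between leaf names and qualified names can be sketched as follows. The helper `flatten_to_qualified_names`, the `__` separator, and the example tree are assumptions for illustration; the point is only that two functions with the same leaf name stay distinguishable once the full path is used as the lookup key.

```python
def flatten_to_qualified_names(tree: dict, prefix: tuple = ()) -> dict:
    """Flatten a nested dict; keys become the path elements joined by '__'."""
    flat = {}
    for key, value in tree.items():
        path = (*prefix, key)
        if isinstance(value, dict):
            flat.update(flatten_to_qualified_names(value, path))
        else:
            flat["__".join(path)] = value
    return flat


# Two hypothetical namespaces that both contain a leaf called `betrag_m`.
rounding_specs_tree = {
    "kindergeld": {"betrag_m": "round down to full euros"},
    "wohngeld": {"betrag_m": "round to nearest euro"},
}
rounding_specs = flatten_to_qualified_names(rounding_specs_tree)
```

Looking up `betrag_m` alone would be ambiguous here, while `kindergeld__betrag_m` and `wohngeld__betrag_m` are not.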
### What problem do you want to solve?

This PR provides the necessary renamings of taxes and transfers
functions for #804.

ToDo:
- [x] Create new directory structure
- [x] Rename all function arguments
- [x] Set namespace of basic input variables
- [x] Update `pyproject.toml` to reflect new file structure
- [x] Make sure tests run (#841)
- [x] `kinderfreibetragempfänger` $\rightarrow$
`kinderfreibetragsempfänger`
- [x] Link issue #842 in relevant docstrings

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
### What problem do you want to solve?

This PR implements the distinction between TTSIM (basically the
infrastructure) and DE (the German taxes and transfers) components of
GETTSIM. This was discussed
[here](#780 (comment)).

In particular, I

- Move modules from `_gettsim` to `ttsim/` or leave them in `_gettsim`
- Remove the `taxes` and `transfers` subdirs
- Split up `config.py` into a TTSIM and a DE part
- Adjust the loader accordingly
- Also split up tests in TTSIM and DE parts.
- Introduce quarters

For tests, the distinction is not always super sharp. There are some
tests that test a specific feature of the infrastructure (e.g.
vectorization), but do this by loading the functions tree from the DE
part. Still, I chose to label those tests as `ttsim`.

Similarly, we don't test `aggregate_by_p_id` directly in the `ttsim`
part, but do it by testing specific components of the TT system. I put
them in the `de` dir.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
Co-authored-by: Tim Mensinger <mensingertim@gmail.com>
### What problem do you want to solve?

Will close #852. Adapts tests to match the GETTSIM src structure.
Uses AggregationType-s instead of strings.
### What problem do you want to solve?

This PR makes a step towards separating TTSIM and GETTSIM by testing the
TTSIM infrastructure with its own instance of a fictitious taxes and
transfers system that makes use of all features.

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
Co-authored-by: Tim Mensinger <mensingertim@gmail.com>
`fg_id` creation did not work correctly for some orderings of adults
(#801). Now adds fg_id for both the einstandspartner and his children at
the same time.

- [x] Fix loop
- [x] Add test case for special case mentioned in #801

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This is a huge PR, which started innocently as a fix to #833.

In the end, it turned out to be very difficult to change things locally,
so in the process of the intense sprint during the week 7-12 April 2025, 
this ended up including the following:
- Updated type hierarchy (`TTSIMObject` as basic building block,
`PolicyInput` and `TTSIMFunction` inheriting from that, `TTSIMFunction`
has further subclasses for policy, aggregation, ...).
- Further separation of tests in ttsim / _gettsim. Including Middle
Earth Taxes and Transfers SIMulator (METTSIM) as a tiny example for the
ttsim-side of tests (#856) and sensible structure for `_gettsim_tests`
(#858)
- Sensible treatment of Einnahmen / Einkünfte (#862)
- Specify rounding in a dataclass to be provided in the decorators
rather than referencing the yaml files from there (#859)
- Improve structure for AggregationSpecs, including an Enum for the type
of Aggregations (#860)
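
The rounding item in the list above can be illustrated with a small sketch in which the rounding rule is a frozen dataclass passed to the decorator, so nothing has to be looked up in a YAML file. The names `RoundingSpec`, `RoundingDirection`, the `rounding_spec` argument, and this simplified `policy_function` are assumptions for illustration, not the real GETTSIM/TTSIM signatures.

```python
import functools
import math
from dataclasses import dataclass
from enum import Enum


class RoundingDirection(Enum):
    UP = "up"
    DOWN = "down"
    NEAREST = "nearest"


@dataclass(frozen=True)
class RoundingSpec:
    """Hypothetical rounding rule: round to multiples of `base`."""

    base: float
    direction: RoundingDirection

    def apply(self, value: float) -> float:
        scaled = value / self.base
        if self.direction is RoundingDirection.UP:
            steps = math.ceil(scaled)
        elif self.direction is RoundingDirection.DOWN:
            steps = math.floor(scaled)
        else:
            # Round half up, avoiding the banker's rounding of built-in round().
            steps = math.floor(scaled + 0.5)
        return steps * self.base


def policy_function(*, rounding_spec=None):
    """Simplified decorator: rounds the result if a spec is given."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            return result if rounding_spec is None else rounding_spec.apply(result)

        wrapper.rounding_spec = rounding_spec
        return wrapper

    return decorator


@policy_function(rounding_spec=RoundingSpec(base=5, direction=RoundingDirection.DOWN))
def betrag_m(einkommen_m: float) -> float:
    return 0.1 * einkommen_m  # made-up rule
```

Keeping the spec next to the function means the rounding rule travels with the function when it is copied or replaced in a modified environment.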

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
Co-authored-by: Marvin Immesberger <immesberger@posteo.de>
Co-authored-by: Marvin Immesberger <74215010+MImmesberger@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
### What problem do you want to solve?

Unify handling of dates to remove ambiguity and code duplication.

---------

Co-authored-by: Marvin Immesberger <74215010+MImmesberger@users.noreply.github.com>
We put some effort into trying to convert types. However, the code was a
mess and it would be a pain to maintain it. What Python/Pandas/Numpy/Jax
do is more than good enough for GETTSIM, too. Now that we have the
explicitly annotated `policy_inputs`, it will be easy to check and throw
errors if users want to be strict.

This PR removes the code which has been stale for the last week, anyhow.

---------

Co-authored-by: Marvin Immesberger <immesberger@posteo.de>
### What problem do you want to solve?

Fix #870 and related things. In particular, defer some checks so that
they are only done for variables that are present, and set the start/end
dates of explicit aggregation functions so that they are derived from the
source object.

---------

Co-authored-by: Marvin Immesberger <74215010+MImmesberger@users.noreply.github.com>
Co-authored-by: Tim Mensinger <mensingertim@gmail.com>
Co-authored-by: Max Jahn <max.jahn45@gmail.com>
### What problem do you want to solve?

Tests in `test_jax_jit_kindergeld.py` were failing because policy
functions were not jittable.

### Problems and Solutions

#### Non-Hashable Function in `jit`

The policy functions were non-jittable because the dataclasses were
non-frozen and had the equality argument set to True. This implies that
the dataclass gets an equality method which compares the fields. To not
break the equality/hash contract (a == b implies hash(a) == hash(b)), a
dataclass with equality method that is not frozen has a deactivated
hash. This does not work with `jax.jit`, because for caching JAX
requires a hash of the object. By freezing the dataclasses they get
their hash back, and everything works nicely again with JAX.
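
The hashing behaviour described above is standard `dataclasses` behaviour and can be reproduced without JAX. The class names below are made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass(eq=True)
class MutableSpec:
    name: str


@dataclass(eq=True, frozen=True)
class FrozenSpec:
    name: str


# eq=True without frozen=True sets __hash__ to None, so instances are unhashable.
mutable_hash_disabled = MutableSpec.__hash__ is None

try:
    hash(MutableSpec("kindergeld"))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Frozen instances hash by their fields, so equal objects share a cache slot.
cache = {FrozenSpec("kindergeld"): "compiled"}
cache_hit = cache[FrozenSpec("kindergeld")]
```

Any cache keyed by the object, as described for `jax.jit` above, needs exactly this property: equal objects must produce equal hashes.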

> [!NOTE]
> Frozen dataclasses cannot have standard assignments in the post init
method. For this I had to implement a `frozen_safe_update_wrapper`.
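
One common way around the restriction in the note is to call `object.__setattr__` inside `__post_init__`. The sketch below shows that general technique with a made-up class; it is not the `frozen_safe_update_wrapper` from this PR, whose implementation is not shown here.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Callable


@dataclass(frozen=True)
class WrappedFunction:
    function: Callable
    # Derived in __post_init__, therefore excluded from __init__.
    name: str = field(init=False)

    def __post_init__(self):
        # `self.name = ...` would raise FrozenInstanceError on a frozen dataclass,
        # so bypass the frozen __setattr__ explicitly.
        object.__setattr__(self, "name", self.function.__name__)


def kindergeld_m():
    return 250  # made-up amount


wrapped = WrappedFunction(kindergeld_m)

try:
    wrapped.name = "something_else"
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
```

After construction the instance stays immutable, so its hash remains stable.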

### Todo

- [x] Freeze ttsim_objects dataclasses and update post init of
`TTSIMFunction` to be compatible
- [x] Understand why list `single_test` in `kindergeld_policy_test`
fixture has only one entry, although the yaml file says there are two
outputs
- [x] Fix `test_compute_taxes_and_transfers_kindergeld`

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
In a limited set of experiments, it produced exactly the same result.
`ast.unparse` is available since Python 3.9, so it's fine to use.
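
For readers unfamiliar with `ast.unparse`, a minimal round trip looks as follows. What it replaced in this commit is not stated here, so the snippet only demonstrates the standard-library call itself.

```python
import ast

source = "def add_one(x):\n    return x+1\n"

tree = ast.parse(source)
regenerated = ast.unparse(tree)  # available since Python 3.9

# The regenerated source may differ in formatting but parses to the same tree.
same_structure = ast.dump(ast.parse(regenerated)) == ast.dump(tree)

namespace = {}
exec(regenerated, namespace)
result = namespace["add_one"](1)
```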
- [x] Add a json (yaml) schema based on GEP-03 
- [x] Make sure manual validation of parameters passes
- [x] Make a pre-commit hook out of this
### What problem do you want to solve?

Users can easily create a NestedDataDict by providing a mapper from the
paths used in the TTSIM instance to a column in the DataFrame or a
pandas Series or a single value.

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
### What problem do you want to solve?

Fix ruff complaints by

- Ignoring `synthetic.py` and its test file because those will be
rewritten soon. Same is true for `test_docs.py`.
- Removing old tests:
   - We don't need to test for type conversions anymore.
   - We don't need to explicitly test for the handling of (qualified)
     source column names for aggregation functions because those are handled
     by `dags` now.
- Some minor adjustments to the rest of the code
### What problem do you want to solve?

Rough fix of the readthedocs build.

- Remove list of "typical outputs" (because `_gettsim.functions` does not exist anymore)
- Remove list of all policy functions (because `_gettsim.functions` does not exist anymore)
- Move outdated tutorials to a different folder. All of them rely on `create_synthetic_data` or the visualisation mechanic, none of them works anymore, and we have to rewrite them soon anyhow.

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
### What problem do you want to solve?

Vectorize the `piecewise_polynomial` function.
### What problem do you want to solve?

Infer groupings from the objects tree. This needs to be done by looking
for names in the top-level namespace that end with "_id". Filtering for
`group_creation_functions` does not work because this would miss
`hh_id`.
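
The inference rule can be sketched on a plain nested dict. The function name mirrors the `grouping_levels` property mentioned below, but the signature, the example tree, and the exclusion of `p_id` are assumptions made for illustration.

```python
def grouping_levels(objects_tree: dict, not_groupings: tuple = ("p_id",)) -> tuple:
    """Infer grouping levels from top-level names that end with '_id'.

    Only top-level keys are inspected; nested names are ignored.
    Excluding `p_id` is an assumption of this sketch.
    """
    return tuple(
        sorted(
            name.removesuffix("_id")
            for name in objects_tree
            if name.endswith("_id") and name not in not_groupings
        )
    )


# Hypothetical objects tree: `hh_id` is an input, `fg_id` is created by a function.
objects_tree = {
    "p_id": "policy input",
    "hh_id": "policy input",
    "fg_id": "group creation function",
    "kindergeld": {"p_id_empfänger": "policy input", "betrag_m": "policy function"},
}
levels = grouping_levels(objects_tree)
```

Because `hh_id` is an input rather than a group creation function, a filter on function type alone would miss it, while the name-based rule finds both.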

**Changes**

- Removed `SUPPORTED_GROUPINGS` global everywhere
- Removed explicit `groupings` argument from `compute_taxes_and_transfers`
- Added the `grouping_levels` property to the policy environment.
- Moved the `_fail_if_group_ids_are_outside_top_level_namespace` check to the policy environment.

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
- [x] Turn on mypy, ignore generic types for now (because of params)
- [x] Fix simple failures (@hmgaudecker)
- [x] Fix more complex cases (old man needs help from @timmens)

---------

Co-authored-by: Tim Mensinger <mensingertim@gmail.com>
MImmesberger and others added 14 commits July 23, 2025 15:13
Add `fail_if.backend_has_changed`. 

Lessons learned:
- Numpy can handle Jax arrays (see test)
- Jax can handle NumPy arrays that are passed as the processed data (see
test)
- The problematic case is parameters that are partialled to functions.
Unfortunately, these are typically custom objects. We need to loop over them and
check whether any of them happens to be a NumPy array
 (#1048)
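
The last lesson above, looping over partialled custom objects to find arrays of the wrong backend, could look roughly like the sketch below. It is not the implementation of `fail_if.backend_has_changed`; the helper names are hypothetical, and a stand-in class replaces `numpy.ndarray` so that the example runs with the standard library only.

```python
from dataclasses import dataclass, fields, is_dataclass
from functools import partial


def contains_instance(obj, target_type) -> bool:
    """Recursively search dataclasses, dicts, lists and tuples for `target_type`."""
    if isinstance(obj, target_type):
        return True
    if is_dataclass(obj) and not isinstance(obj, type):
        return any(contains_instance(getattr(obj, f.name), target_type) for f in fields(obj))
    if isinstance(obj, dict):
        return any(contains_instance(v, target_type) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return any(contains_instance(v, target_type) for v in obj)
    return False


def partialled_args_contain(func, target_type) -> bool:
    """Check the arguments bound by functools.partial, if any."""
    if not isinstance(func, partial):
        return False
    return contains_instance(func.args, target_type) or contains_instance(
        func.keywords, target_type
    )


class FakeNumpyArray:
    """Stand-in for numpy.ndarray; in practice one would pass numpy.ndarray."""


@dataclass(frozen=True)
class PiecewiseParams:
    """Hypothetical custom parameter object."""

    thresholds: object
    rates: object


def tax(income, params):
    return income


with_numpy = partial(tax, params=PiecewiseParams(FakeNumpyArray(), [0.1, 0.2]))
without_numpy = partial(tax, params=PiecewiseParams([0, 10], [0.1, 0.2]))
```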

Check whether the structure of the paths matches. E.g.:

- `input_data={"df_and_mapper": None}`: Fails because there needs to be
a dict below "df_and_mapper"
- `input_data={"not_around": None}`: Fails because `not_around` is not a
valid child of `input_data`
- `not_around=None`: Fails because `not_around` is not a valid root node
(already taken care of by Python itself when calling `main`, but let's
be pedantic...)
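
The three failure cases above can be reproduced with a small structural check against a template tree. The template below (including the `df`, `mapper`, and `tree` children) and the helper `invalid_paths` are assumptions for illustration and do not reproduce the actual check.

```python
def invalid_paths(user_tree: dict, template: dict, prefix: tuple = ()) -> list:
    """Return messages for all paths in `user_tree` that do not fit `template`.

    In `template`, a dict marks an inner node and None marks a leaf.
    """
    errors = []
    for key, value in user_tree.items():
        name = ".".join((*prefix, key))
        if key not in template:
            errors.append(f"{name}: not a valid node")
        elif isinstance(template[key], dict):
            if not isinstance(value, dict):
                errors.append(f"{name}: expected a dict below this node")
            else:
                errors.extend(invalid_paths(value, template[key], (*prefix, key)))
    return errors


# Hypothetical template of admissible paths.
TEMPLATE = {"input_data": {"df_and_mapper": {"df": None, "mapper": None}, "tree": None}}

missing_dict = invalid_paths({"input_data": {"df_and_mapper": None}}, TEMPLATE)
invalid_child = invalid_paths({"input_data": {"not_around": None}}, TEMPLATE)
invalid_root = invalid_paths({"not_around": None}, TEMPLATE)
valid = invalid_paths({"input_data": {"df_and_mapper": {"df": 1, "mapper": 2}}}, TEMPLATE)
```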
…omatically created function (#1050)

### What problem do you want to solve?

Closes #1049

---------

Co-authored-by: Hans-Martin von Gaudecker <hmgaudecker@gmail.com>
### What problem do you want to solve?

- [x] Add a GEP for the revamped interface
- [x] Update earlier GEPs to reflect the changes that have become
necessary after GEP 6 (since our documentation is small, it does not
make sense to keep outdated things around).
- [x] Add the finalised schema from #880 as an appendix to GEP 3

[Resolution on Zulip.](https://gettsim.zulipchat.com/#narrow/channel/309998-GEPs/topic/GEP.2007/near/530389224)

---------

Co-authored-by: Marvin Immesberger <immesberger@posteo.de>
In sync with [TTSIM PR 1](ttsim-dev/ttsim#1),
this leaves just GETTSIM in here. Also includes the renamings in 
[TTSIM PR 3](ttsim-dev/ttsim#3), which are on
PyPI as 1.0a1

Fixes #1003.
@MImmesberger MImmesberger requested a review from hmgaudecker July 24, 2025 19:44
@hmgaudecker hmgaudecker changed the title DOC: Tutorial on how to modify the policy environment DOC: Comprehensive tutorial notebook Jul 25, 2025
Collaborator

@hmgaudecker hmgaudecker left a comment

Wow! This is basically a tour de force of all GETTSIM features -- much more than what I had imagined here.

Maybe the title should be something like "Documentation of Taxes & Transfers Objects and how to modify a policy environment?".

Please include in the docs.

I just modified the ConsecutiveInt... example to use the function for setting it up.

@hmgaudecker
Collaborator

hmgaudecker commented Jul 25, 2025

> I just modified the ConsecutiveInt... example to use the function for setting it up.

Ah, and TTSIM #5 reminds me why I wanted to make the same comment for the piecewise polynomial thing...

Please rewrite the examples so they only use get_piecewise_parameters and the same structure of inputs as the YAML file.

@MImmesberger
Collaborator Author

FYI I added this line in the piecewise part:

**Note:** It can be complex to build the `parameter_dict` for
`get_piecewise_parameters`. For more complex schedules, take a look at the tutorial on
piecewise polynomial functions [not available yet, add link eventually; in the
meantime, look at the implementation in GETTSIM's parameter yamls (or ask for help)].

But should be a credible promise, right?

@hmgaudecker hmgaudecker changed the title DOC: Comprehensive tutorial notebook DOC: Simple example and comprehensive how-to notebook Jul 25, 2025
@hmgaudecker
Collaborator

@MImmesberger, I added the geburtsjahr and included the former interface playground as a simple example in the docs. If you are fine with that one, do go ahead and merge from my side!

@MImmesberger
Collaborator Author

Thanks! Sorry for always forgetting about the changelog recently

@MImmesberger MImmesberger merged commit 85c587d into main Jul 25, 2025
9 checks passed
@MImmesberger MImmesberger deleted the upsering-tutorial branch July 25, 2025 13:47

Development

Successfully merging this pull request may close these issues.

DOC: example for seriously modifying the policy environment

7 participants