Skip to content

Draft (new feature) : Model to estimate when a intervention had effect #480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

JeanVanDyk
Copy link

@JeanVanDyk JeanVanDyk commented May 28, 2025

New Feature: InterventionTimeEstimator for Unknown Treatment Timing

This PR introduces a new model, InterventionTimeEstimator, designed to estimate when an intervention has an effect in a time series — especially in cases where the exact treatment time is unknown.


Use Case

  • Enhances the Interrupted Time Series (ITS) feature by providing a way to infer the likely time of intervention
  • Supports scenarios where the treatment onset is uncertain or delayed
  • Helps identify lagged effects between intervention and observable outcomes

This addition gives users a flexible, Bayesian approach to model treatment timing uncertainty directly within the CausalPy framework.


Notes / Open Questions

  • Where should this model fit into the CausalPy workflow?
    I’m unsure whether InterventionTimeEstimator should be integrated within the InterruptedTimeSeries (ITS) feature, or used as a standalone tool.
    This affects how a user-defined model could be supported.
    Depending on the intended usage, I can propose a solution to allow users to inject their own custom models.

  • Custom model usage — base vs. intervention
    Should users be able to:

    • Provide a custom model to represent the base time series (e.g. intercept, trend, seasonality)?
    • Provide a custom model to capture the intervention effect (e.g. shape or dynamics of the post-switch impact)?
    • Or support both?
  • Covariates
    I considered adding time-varying covariates to improve the fit. Would that be useful or out of scope?

  • Multivariate Time Series
    It's relatively easy to extend the model for multivariate input. Let me know if this is something you'd like to see.


Model Summary

  • Inputs:

    • t: 1D array of time points
    • y: 1D array of observed values
    • Optional span: restricts the window for switchpoint detection
    • Optional coords: can include seasons for modeling periodic effects
    • effect: list of components, e.g. "trend", "level", "impulse"
    • grain_season: number of time steps per season
  • Model Components:

    • Time series is modeled as:
      intercept + trend + seasonal
    • A Uniform prior is used to place a switchpoint
    • A sigmoid curve models the onset of the effect after the switchpoint, applied to the selected effect components

Feel free to share any feedback or suggestions! I'm happy to refine the model or explore extensions based on your input.


📚 Documentation preview 📚: https://causalpy--480.org.readthedocs.build/en/480/

@JeanVanDyk JeanVanDyk added enhancement New feature or request help wanted Extra attention is needed labels May 28, 2025
Copy link

codecov bot commented May 30, 2025

Codecov Report

Attention: Patch coverage is 95.28796% with 9 lines in your changes missing coverage. Please review.

Project coverage is 94.42%. Comparing base (491adc1) to head (692d85c).

Files with missing lines Patch % Lines
causalpy/experiments/interrupted_time_series.py 95.28% 5 Missing ⚠️
causalpy/pymc_models.py 95.38% 3 Missing ⚠️
causalpy/custom_exceptions.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #480      +/-   ##
==========================================
+ Coverage   94.40%   94.42%   +0.01%     
==========================================
  Files          29       29              
  Lines        2075     2241     +166     
==========================================
+ Hits         1959     2116     +157     
- Misses        116      125       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JeanVanDyk JeanVanDyk force-pushed the intervention-time-estimator branch from b7b91de to ee701f2 Compare May 30, 2025 09:50
@JeanVanDyk
Copy link
Author

Note

In my last PR, I added functionality allowing users to specify priors for the effect of the intervention. This provides more flexibility for Bayesian modeling and allows users to incorporate domain knowledge directly into the inference process.

@drbenvincent drbenvincent marked this pull request as draft May 31, 2025 05:28
Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JeanVanDyk. I'm very excited about the new addition, thanks for putting it together. Bear with me if I'm sometimes slow to reply - busy dad life!

At this point, could I request that you put together an ipynb file that showcases the functionality. I'm assuming this PR will evolve and go through a couple of iterations, so this notebook will end up being a new docs page to help users understand the new functionality. If you edit the ./docs/source/notebooks/index.md file, you can add the notebook name under the interrupted time series section. That way the notebook will render if you either build the docs locally, but it should also render in the remote docs preview build.

Did you still want feedback on the GitHub Discussion at this point?

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@JeanVanDyk
Copy link
Author

Simplified .fit() API

The InterventionTimeEstimator model has been streamlined. The .fit() method now only takes:

  • X: covariates
  • y: observed values
  • coords: coordinate labels for dimensions

Cleaner and more straightforward to use.


Optional Prior Specification

You can now pass a priors dictionary at init time to control intervention effects:

  • "level" and "trend"[mu, sigma] (Normal prior)
  • "impulse"[mu, sigma1, sigma2]
    • mu: amplitude mean (Normal)
    • sigma1: amplitude std (Normal)
    • sigma2: decay rate std (HalfNormal)

If an effect is omitted or an empty list is provided, the model uses default priors.


Support for Inferred Intervention Time

InterruptedTimeSeries now works with models that infer the treatment time.

  • Before fit():

    • If treatment_time is None or a Tuple, the model uses the full dataset to infer the switchpoint.
    • treatment_time is passed to the model via set_time_range().
  • After fit():

    • The inferred switchpoint is retrieved.
    • data_pre is reset accordingly for prediction, as in the standard flow.

Timeline Requirement

The time column t must be the last column of X (excluding y).
This is how the model locates the timeline internally.


Extras and Ongoing Work

  • Included a demo notebook for quick experimentation.
  • Currently working on:
    • More robust input_validation() inside InterruptedTimeSeries.
    • Supporting timelines expressed as datetime objects.

@JeanVanDyk JeanVanDyk requested a review from drbenvincent June 18, 2025 14:53
@drbenvincent
Copy link
Collaborator

@JeanVanDyk I just did a quick update from main. Please make sure you pull the latest to avoid any conflicts :)

Hoping to start taking a decent look at this now. Will try to provide some comments even though I know this is not 100% finished.

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can you run make uml? This will generate updated uml plots (e.g. classes.png)
  • Looking at that generated class diagram (and not yet having looked at the InterventionTimeEstimator the hope would be to avoid overwriting the methods calculate_impact, predict, score and fit methods. These are hopefully very generic and should hopefully be able to be dealt with in the PyMCModel base class. Like I say, I've not yet taken a look at the code itself, but this would be the hope.
  • I cannot currently run the new notebook. When the first model fit is attempted I get this
Traceback TypingError Traceback (most recent call last) Cell In[4], [line 10](vscode-notebook-cell:?execution_count=4&line=10) 2 from causalpy.pymc_models import InterventionTimeEstimator as ITE 4 model = ITE( 5 time_variable_name="t", 6 treatment_type_effect={"level": []}, 7 sample_kwargs={"sample_seed": seed}, 8 ) ---> [10](vscode-notebook-cell:?execution_count=4&line=10) result = ITS( 11 data=df, 12 treatment_time=None, 13 formula="y ~ 1 + t", 14 model=model, 15 )

File ~/git/CausalPy/causalpy/experiments/interrupted_time_series.py:284, in InterruptedTimeSeries.init(self, data, treatment_time, formula, model, **kwargs)
281 raise ValueError("Model type not recognized")
283 # score the goodness of fit to the pre-intervention data
--> 284 self.score = self.model.score(X=self.pre_X, y=self.pre_y)
286 # Postprocessing with handler
287 self.datapre, self.datapost, self.pre_y, self.pre_X, self.treatment_time = (
288 self.handler.data_postprocessing(
289 self.model, data, idata, treatment_time, self.pre_y, self.pre_X
290 )
291 )

File ~/git/CausalPy/causalpy/pymc_models.py:773, in InterventionTimeEstimator.score(self, X, y)
771 mu_ts = az.extract(mu_ts, group="posterior_predictive", var_names="mu_ts").T
772 # Note: First argument must be a 1D array
--> 773 return r2_score(y.data, mu_ts)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats.py:1167, in r2_score(y_true, y_pred)
1134 def r2_score(y_true, y_pred):
1135 """R² for Bayesian regression models. Only valid for linear models.
1136
1137 Parameters
(...) 1165
1166 """
-> 1167 r_squared = r2_samples(y_true=y_true, y_pred=y_pred)
1168 return pd.Series([np.mean(r_squared), np.std(r_squared)], index=["r2", "r2_std"])

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats.py:1127, in r2_samples(y_true, y_pred)
1125 var_e = _numba_var(svar, np.var, (y_true - y_pred))
1126 else:
-> 1127 var_y_est = _numba_var(svar, np.var, y_pred, axis=1)
1128 var_e = _numba_var(svar, np.var, (y_true - y_pred), axis=1)
1129 r_squared = var_y_est / (var_y_est + var_e)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/utils.py:372, in _numba_var(numba_function, standard_numpy_func, data, axis, ddof)
350 """Replace the numpy methods used to calculate variance.
351
352 Parameters
(...) 369
370 """
371 if Numba.numba_flag:
--> 372 return numba_function(data, axis=axis, ddof=ddof)
373 else:
374 return standard_numpy_func(data, axis=axis, ddof=ddof)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py:535, in stats_variance_2d(data, ddof, axis)
533 var = np.zeros(a_a)
534 for i in range(a_a):
--> 535 var[i] = stats_variance_1d(data[i], ddof=ddof)
536 else:
537 var = np.zeros(b_b)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/utils.py:205, in maybe_numba_fn.call(self, *args, **kwargs)
203 """Call the jitted function or normal, depending on flag."""
204 if Numba.numba_flag:
--> 205 return self.numba_fn(*args, **kwargs)
206 else:
207 return self.function(*args, **kwargs)

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/numba/core/dispatcher.py:424, in _DispatcherBase._compile_for_args(self, *args, **kws)
420 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
421 f"by the following argument(s):\n{args_str}\n")
422 e.patch_message(msg)
--> 424 error_rewrite(e, 'typing')
425 except errors.UnsupportedError as e:
426 # Something unsupported is present in the user code, add help info
427 error_rewrite(e, 'unsupported_error')

File ~/mambaforge/envs/CausalPy/lib/python3.13/site-packages/numba/core/dispatcher.py:365, in _DispatcherBase._compile_for_args..error_rewrite(e, issue_type)
363 raise e
364 else:
--> 365 raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /Users/benjamv/mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py (517)

File "../../../../../mambaforge/envs/CausalPy/lib/python3.13/site-packages/arviz/stats/stats_utils.py", line 517:
def copy(self, deep=True): # pylint:disable=overridden-final-method

@conditional_jit(nopython=True)
^

During: Pass nopython_type_inference

This error may have been caused by the following argument(s):

  • argument 0: Cannot determine Numba type of <class 'xarray.core.dataarray.DataArray'>
  • I think there should be no changes to its_skl.ipynb right? Can we revert those changes back to the state on main, so there's no change in that file in this PR.
  • Same for its_covid.ipynb?

@@ -40,6 +40,7 @@ did_pymc_banks.ipynb
its_skl.ipynb
its_pymc.ipynb
its_covid.ipynb
its_no_treatment_time.ipynb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. But in the notebook can you ensure you only have ONE top level markdown header. Otherwise it adds these as additional entries into the How To index page.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure. I'll fix that!

As for the two notebooks you mentioned, I don't recall making any changes to them. I only ran them to check whether the modifications contained any small mistakes. I'll try reverting those changes.

@drbenvincent
Copy link
Collaborator

drbenvincent commented Jun 20, 2025

Could you send me a screenshot of the graphviz? At the moment I can't get the result because of the error mentioned above. But if it's working for you, then you can do this. It would be useful for me to get the birds-eye-view of the model as I'm reading the code.

Screenshot 2025-06-20 at 14 27 34

(that example is for the existing InterruptedTimeSeries model)

@JeanVanDyk
Copy link
Author

JeanVanDyk commented Jun 20, 2025

Here is the graph @drbenvincent :
image

@JeanVanDyk JeanVanDyk marked this pull request as ready for review June 24, 2025 10:07
@JeanVanDyk
Copy link
Author

Hi @drbenvincent,

I've made some changes and will summarize them here, addressing the points you raised:

  • Regarding the "overwriting of methods" : While I was able to remove the custom implementations of fit and calculate_impact, I still need to override predict because I need the model to resample additional variables. The core of this necessity lies in how the model infers the observed data. You can think of the time series as a combination of two components:

    • The intervention effect, denoted mu_in
    • The base time series, which I’ve named mu

    For plotting, computation, and overall coherence, I believe it makes more sense to name the base time series mu. As in classic interrupted time series models, the idea is for mu to represent the time series without the causal effect, so we can assess the intervention’s impact by comparing it to the full signal.

    However, since this model also infers the causal effect in mu_in, the observed data is associated with mu_ts = mu_in + mu. That’s why predict needs to return both mu_ts and mu_in. I considered swapping the roles of mu_ts and mu, but that would require significant changes in the plotting logic.

    Similarly, I had to adjust the score method so that it computes the score between mu_ts and the observed data. I believe this provides a clearer picture of how well the model captures the actual time series.

  • Regarding the notebooks: I’ve removed the changes in the one you mentioned, updated the notebook I had modified, and ensured that there is now only one top-level Markdown header in each.

  • Regarding the meeting we had:

    • On the time column: I’ve adjusted the implementation so that the user now specifies the name of the time column, rather than being forced to call it "t". However, since the model infers the time of intervention using this column, it still needs to be accessed in the HandlerUTT class. To preserve API compatibility, I now retrieve it via model.time_variable_name, which works as expected. That said, if a user provides a custom model for detecting an unknown treatment time, it must expose this attribute.

    • On uncertainty: I now compute the cumulative causal impact by aggregating the estimated impact across all posterior samples, at each time step. For each sample, I set the causal impact to zero at all time steps preceding its own inferred treatment time. This ensures that the cumulative sum reflects only the post-treatment causal effect, specific to each sample, and therefore fully incorporates both uncertainty in the effect size and in the timing of the intervention.

    • You’ve probably already seen the new examples I added to the notebooks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants