#775: Support multiple treatments in CausalTreeRegressor and CausalRandomForestRegressor #852
+5,407
−1,137
Proposed changes
Hi!
I am happy to write that I successfully finished the solution for #775 😄
Types of changes
- `CausalTreeRegressor` and `CausalRandomForestRegressor` now support a treatment vector with an arbitrary number of groups. The Cython part stays as close as possible to the scikit-learn version and uses the multi-output option as a workaround to store the outcomes of each group without additional tricks.
- `DepthFirstCausalTreeBuilder` and `BestFirstCausalTreeBuilder` no longer use the GIL context manager. This may speed up tree growth on large chunks of data, but I haven't measured the actual performance change.
- The `fit()` method can now actually apply weights to observations.
- `causal_trees_with_synthetic_data_multiple_treatment_groups.ipynb`
What types of changes does your code introduce to CausalML?
Put an `x` in the boxes that apply
Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
Further comments
An example of how the outcome vector $y$ is now represented internally:
Suppose we have $n$ observations and $m$ groups. The first column, Group 0, always stores the outcomes of the control group. Each remaining column stores the outcomes observed under one particular type of treatment.
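To make the layout concrete, here is a minimal sketch in plain Python of how a 1-D outcome vector could be spread over one column per group. The helper name `to_group_columns` is hypothetical, and filling the cells of the other groups with NaN is an assumption of this sketch; neither is taken from the actual Cython code in this PR.

```python
import math


def to_group_columns(y, treatment, n_groups):
    """Hypothetical helper: spread outcomes over one column per group.

    Column 0 holds control outcomes and column g holds outcomes observed
    under treatment g. Cells of groups an observation does not belong to
    are filled with NaN (an assumption of this sketch, not the PR's code).
    """
    if len(y) != len(treatment):
        raise ValueError("y and treatment must have the same length")
    rows = []
    for outcome, group in zip(y, treatment):
        if not 0 <= group < n_groups:
            raise ValueError(f"group {group} is outside 0..{n_groups - 1}")
        row = [math.nan] * n_groups   # one cell per group
        row[group] = float(outcome)   # only the observed group is filled
        rows.append(row)
    return rows


# Four observations, three groups (0 = control, 1 and 2 = treatments).
y = [1.0, 2.5, 0.3, 4.0]
treatment = [0, 2, 1, 0]
y_wide = to_group_columns(y, treatment, n_groups=3)
```

Each row of `y_wide` then has exactly one filled cell, in the column of the group the observation belongs to, which is the kind of per-group storage the multi-output option makes possible.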