Question about non-strict hierarchical structures #328

Open
macarw opened this issue Feb 16, 2025 · 4 comments · May be fixed by #368

Comments

macarw commented Feb 16, 2025

Hi! Thanks for this amazing library.

While reproducing the example notebook AustralianPrisonPopulation.ipynb, I noticed the comment:
"Since the hierarchy structure is not strict, we can't use methods such as TopDown or MiddleOut."

This raises a few questions:

  1. Why is that? Is there a mathematical reason preventing their use? I’d particularly appreciate a conceptual clarification on this point.
  2. Surprisingly, TopDown(method="forecast_proportions") can be used in the example. Why is forecast_proportions compatible with non-strict hierarchical structures, while average_proportions and proportion_averages are not?
  3. Additionally, none of the three MiddleOut proportions variants seem to work. However, if TopDown method with forecast_proportions is allowed, why isn’t MiddleOut with forecast_proportions also supported?

Here’s the list of reconcilers I tested using the same dataset from the AustralianPrisonPopulation example notebook:

from hierarchicalforecast.methods import BottomUp, MiddleOut, MinTrace, TopDown

reconcilers = [
    # Used in the example notebook
    BottomUp(),
    MinTrace(method='mint_shrink'),

    # Added by me for testing
    TopDown(method="forecast_proportions"),  # Works

    # The following raise ValueError: "Top-down reconciliation requires strictly hierarchical structures."
    # TopDown(method="average_proportions"),
    # TopDown(method="proportion_averages"),

    # The following raise ValueError: "Middle out reconciliation requires strictly hierarchical structures."
    # MiddleOut(middle_level="Country/Legal", top_down_method="forecast_proportions"),
    # MiddleOut(middle_level="Country/Legal", top_down_method="average_proportions"),
    # MiddleOut(middle_level="Country/Legal", top_down_method="proportion_averages"),
]

I’d really appreciate any clarification on this!
Thanks in advance.

@elephaint
Contributor

Thanks for the good questions! I don't have answers to them, so feel free to find out yourself!

  1. Good point, honestly I don't know why this was done in this way. I also can't think of a conceptual reason to limit it, at least not for the point reconciliation.
  2. Again, no idea. If I had to guess, it relates to the probabilistic case.
  3. Same, although MiddleOut isn't compatible with levels at all.

@macarw
Author

macarw commented Apr 6, 2025

Hi! Sorry for the late reply. I was trying to think it through and give a thoughtful answer.

From my understanding, I also can’t find a solid conceptual reason to restrict any of the TopDown methods in non-strict hierarchies, at least in the point case, as you mentioned. In non-strict hierarchies, the top and bottom levels are the same regardless of all the intermediate levels one might have, so it seems like all types of proportions (whether forecast_proportions, proportion_averages, or average_proportions) could, in principle, be calculated relative to the top level.
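As a rough illustration of that point, the sketch below computes two of the proportion types directly from the top level to the bottom level. It assumes the names follow the usual textbook definitions (average of historical proportions, and proportions of historical averages); the data and variable names are hypothetical, and this is not the library's implementation.

```python
import numpy as np

# Hypothetical history: rows are time steps, columns are bottom-level series.
y_bottom = np.array([
    [10.0, 30.0, 20.0, 40.0],
    [20.0, 20.0, 30.0, 30.0],
    [60.0, 20.0, 80.0, 40.0],
])
y_top = y_bottom.sum(axis=1)  # the top series is the sum of the bottom series

# Average of historical proportions: mean over time of y_j,t / y_top,t.
average_proportions = (y_bottom / y_top[:, None]).mean(axis=0)

# Proportions of historical averages: mean(y_j) / mean(y_top).
proportion_averages = y_bottom.mean(axis=0) / y_top.mean()

# Either vector distributes a top-level forecast over the bottom level.
top_forecast = 120.0
bottom_from_average_proportions = top_forecast * average_proportions
bottom_from_proportion_averages = top_forecast * proportion_averages
```

Both proportion vectors sum to 1 here, so either one distributes the whole top-level forecast across the bottom series without reference to any intermediate level.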

I’m a bit more unsure about MiddleOut, though. If the selected middle level only contains partial information (as is possible in non-strict structures), maybe it’s not possible to compute all the proportions needed to reconstruct the full hierarchy, and that might be the reason why MiddleOut is currently restricted to strict hierarchies. What do you think?

I’m also not entirely sure if this reasoning applies to the probabilistic case. Maybe there’s something more subtle going on there?

Thanks again. This library is awesome and I really enjoy working with it 👏 I'm currently using it for temporal hierarchies, too.

@elephaint
Contributor

For TopDown (and then also for MiddleOut), the issue is the intermediate levels.

Suppose we have the following hierarchies (simplified from the example):

hiers = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Gender'], 
    ['Country', 'State', 'Gender']
]

All four levels can be constructed with TopDown using any of the methods. The issue is that the levels will not, in general, be coherent with each other. If we divide (by any method) Country into Gender, that disaggregated forecast per gender will not be the same as the sum across States of the disaggregated Gender forecast under [Country, State, Gender], because State receives a different allocation of the forecast, which in turn trickles down to Gender.

That's why it doesn't work in general for TopDown or MiddleOut - we implicitly get incoherent forecasts.
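A small numeric sketch of this effect, using forecast-style proportions taken from base forecasts that are not coherent with each other. All numbers are hypothetical, and the level-by-level split is an assumption made for illustration rather than a description of the library's code.

```python
import numpy as np

# Hypothetical, mutually incoherent base forecasts (2 states x 2 genders).
country = 100.0
state = np.array([60.0, 40.0])            # states A, B
gender = np.array([50.0, 50.0])           # genders M, F
state_gender = np.array([[30.0, 30.0],    # A/M, A/F
                         [10.0, 30.0]])   # B/M, B/F

# Path 1: split Country directly into Gender.
gender_direct = country * gender / gender.sum()

# Path 2: split Country into State, then each State into Gender.
state_td = country * state / state.sum()
within_state = state_gender / state_gender.sum(axis=1, keepdims=True)
state_gender_td = state_td[:, None] * within_state

# Gender totals implied by the second path.
gender_via_state = state_gender_td.sum(axis=0)

print(gender_direct)     # Gender forecasts from the direct split
print(gender_via_state)  # Gender forecasts implied by the State path
```

In this example the direct split gives 50 per gender, while the split through State implies 40 and 60, so the Gender level disagrees with the bottom level even though both paths start from the same Country forecast.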

The fact that it works for TopDown(method="forecast_proportions") is a bug. We properly capture it for TopDownSparse, but not for TopDown.

@macarw
Author

macarw commented Apr 28, 2025

I agree that it felt strange and confusing to allow using TopDown only with forecast_proportions. Raising an error for all three TopDown methods when dealing with non-strict hierarchies seems much more consistent. However, I still have a conceptual doubt.

In the example you provided:

hiers = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Gender'], 
    ['Country', 'State', 'Gender']
]

The top and bottom levels are unique.
Referring to the latest edition of Forecasting: Principles and Practice, the Pythonic Way:

“...disaggregation proportions determine how the forecasts of the Total series are to be distributed to obtain forecasts for each series at the bottom level of the structure. (…) Once the bottom-level forecasts have been generated, these are aggregated to generate coherent forecasts for the rest of the series.”

So, if both the top and bottom levels are unique, and we have the base forecast for the top-level series along with the proportions needed to obtain forecasts for all bottom-level series, wouldn't we already have all the necessary information to construct coherent forecasts for the entire hierarchy? Especially considering that BottomUp works with non-strict hierarchies in the library.

It is not clear to me why the middle levels would be a problem in this example, since it is apparently not a problem for BottomUp.
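The procedure described in the quote can be sketched as follows: distribute the top-level forecast straight to the bottom level with one set of proportions, then aggregate with a summing matrix. The matrix `S`, the proportions, and the forecast value are hypothetical, and this only illustrates the question's premise; it says nothing about how the library computes TopDown internally.

```python
import numpy as np

# Summing matrix for 2 states x 2 genders.
# Bottom-level column order: A/M, A/F, B/M, B/F.
S = np.array([
    [1, 1, 1, 1],  # Country
    [1, 1, 0, 0],  # State A
    [0, 0, 1, 1],  # State B
    [1, 0, 1, 0],  # Gender M
    [0, 1, 0, 1],  # Gender F
    [1, 0, 0, 0],  # A/M
    [0, 1, 0, 0],  # A/F
    [0, 0, 1, 0],  # B/M
    [0, 0, 0, 1],  # B/F
], dtype=float)

top_forecast = 100.0
# Hypothetical top-to-bottom proportions (they must sum to 1).
proportions = np.array([0.3, 0.3, 0.1, 0.3])

# Disaggregate the top forecast to the bottom level, then aggregate upward.
bottom = top_forecast * proportions
reconciled = S @ bottom

country, states, genders = reconciled[0], reconciled[1:3], reconciled[3:5]
```

With this construction the State and Gender levels are both sums of the same bottom-level forecasts, so each of them adds up to the Country forecast, exactly as they would under BottomUp.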
