
[ENH] [WIP] Standardize model output to 4d tensor #1895


Open
wants to merge 225 commits into base: main

Conversation

@PranavBhatP (Contributor) commented Jun 18, 2025

Reference Issues/PRs

Fixes #1894. Stacks on #1874.

Files changed:
- _base_model_v2.py

What does this implement/fix? Explain your changes.

This PR implements a standardize_model_output function in the v2 BaseModel class. It will be used as a helper function to enforce the proposed standard for model tensor outputs in the format (batch_size, timesteps, n_features, quantiles). The PR is still a draft and needs discussion before the output change is actually enforced at the model level, since that will cause widespread test failures due to a major breaking change in the output format.

Once the design is decided, we can proceed with trying this out on two models: dlinear and timexer.
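
Not from the PR itself, but a minimal sketch of what such a helper could look like (the exact reshaping rules, the `last_dim` argument, and the 3D disambiguation are illustrative assumptions, not the code in this PR):

```python
import torch


def standardize_model_output(y_pred: torch.Tensor, last_dim: int = 1) -> torch.Tensor:
    """Reshape a raw model output to (batch_size, timesteps, n_features, quantiles).

    Sketch only: ``last_dim`` is a fallback trailing size used when the tensor
    itself carries no information about quantiles.
    """
    if y_pred.ndim == 2:
        # (batch_size, timesteps) -> single target, point forecast
        return y_pred.unsqueeze(-1).unsqueeze(-1)
    if y_pred.ndim == 3:
        if last_dim > 1:
            # (batch_size, timesteps, quantiles) -> univariate quantile forecast
            return y_pred.unsqueeze(2)
        # (batch_size, timesteps, n_features) -> multi-target point forecast
        return y_pred.unsqueeze(-1)
    if y_pred.ndim == 4:
        # assumed to already be (batch_size, timesteps, n_features, quantiles)
        return y_pred
    raise ValueError(f"Expected a 2D, 3D or 4D tensor, got {y_pred.ndim}D")


# Example: a univariate 7-quantile forecast becomes (batch, timesteps, 1, 7).
out = standardize_model_output(torch.randn(32, 24, 7), last_dim=7)
assert out.shape == (32, 24, 1, 7)
```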

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

- If the input tensor is 2D, it is reshaped to (batch_size, timesteps, 1, 1).
- If the input tensor is 3D, it is reshaped to (batch_size, timesteps, 1, 1) for a multivariate single-target forecast, or (batch_size, timesteps, 1, last_dim) for a univariate quantile forecast.
- If the input tensor is 4D, it is assumed to already be in the shape (batch_size, timesteps, n_features, quantiles).
Collaborator

for 4D, you need to be explicit about how it is getting reshaped.

but any value can be explicitly provided. A fallback of 1 is used in cases where no information is available on ``last_dim``.

[2] This can currently handle situations where a single target is used
Collaborator

Can you explain that and give an example? Can you give the simplest example that shows a 4D tensor is not possible in this case? Is this, for instance, the case where we use squared loss for variable 1 but parametric log-loss on the mean/variance (of a normal) for variable 2?
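
For illustration only (shapes hypothetical, not from the PR): in the mixed-loss case described in this question, the per-target outputs have different trailing sizes, so they cannot be stacked into a single dense 4D tensor without padding:

```python
import torch

batch_size, timesteps = 32, 24

# Target 1: squared loss -> one value per time step.
point_forecast = torch.randn(batch_size, timesteps, 1)

# Target 2: parametric log-loss on a normal -> two parameters (mean, variance) per time step.
normal_params = torch.randn(batch_size, timesteps, 2)

# The trailing dimensions differ (1 vs 2), so the outputs cannot be stacked
# into one dense (batch, timesteps, n_features, k) tensor without padding.
try:
    torch.stack([point_forecast, normal_params], dim=2)
except RuntimeError as err:
    print(err)  # stack expects each tensor to be equal size
```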

Collaborator

maybe using nestedtensor-s?
https://docs.pytorch.org/tutorials/prototype/nestedtensor.html
(is this a stable feature? Looks like it?)
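
A minimal sketch of the nestedtensor idea from the linked tutorial (shapes hypothetical; the API was a prototype at the time):

```python
import torch

# Two per-target outputs with different trailing sizes,
# e.g. one point forecast and one (mean, variance) parameter pair.
point_forecast = torch.randn(32, 24, 1)
normal_params = torch.randn(32, 24, 2)

# A nested tensor holds differently shaped constituents in a single object.
nt = torch.nested.nested_tensor([point_forecast, normal_params])
print(nt.is_nested)  # True

# The constituents can be recovered individually.
a, b = nt.unbind()
assert a.shape == (32, 24, 1) and b.shape == (32, 24, 2)
```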

Contributor Author

I can try looking at this.

@fkiraly (Collaborator) commented Jun 20, 2025

Nice, I now get it. I left some minor remarks.

We should add tests for all individual conversion pathways here.

@PranavBhatP PranavBhatP marked this pull request as ready for review June 23, 2025 05:31
@agobbifbk

My 2 cents after today's discussion (with both @PranavBhatP and @phoeenniixx): it may be better to set up the output format as a list of tensors (one per output channel), each tensor with shape B x L x M, where B is the batch size, L the length, and M a number that depends on the chosen loss function (1 for MSE or RMSE, N_quantiles for quantile loss, and 2 for NormalDistribution). This solves the problem related to the loss function (we stay in mostly the same format as v1) as well as the model output shape. If we plan to have different loss functions for the same output channel, this needs to be refined (@fkiraly, is this possible in v1 of PTF?).
For computing the number of outputs the model should provide, we can compute it in the base model class by summing up all the Ms and keeping a list of shapes containing the Ms, so that we are able to cut the final 4D tensor according to the losses defined (e.g. [1,5,2] in the case of MSE, quantile and ND).
Reintroducing this feature (a different loss per output channel) will mean rethinking the final output given to the user, since the number of columns can differ for each output channel; maybe a list of pandas.DataFrames?
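
A minimal sketch of the slicing idea described above (the [1, 5, 2] example follows the comment; names and shapes are illustrative, and the dense output is shown as B x L x sum(M) for simplicity):

```python
import torch

# Trailing sizes implied by the losses per output channel:
# MSE -> 1, quantile loss with 5 quantiles -> 5, Normal distribution -> 2.
ms = [1, 5, 2]

batch_size, timesteps = 32, 24
# A single dense model output with sum(ms) values per time step.
raw_output = torch.randn(batch_size, timesteps, sum(ms))

# Cut it into one B x L x M tensor per output channel.
outputs = list(torch.split(raw_output, ms, dim=-1))

for m, out in zip(ms, outputs):
    assert out.shape == (batch_size, timesteps, m)
```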

@PranavBhatP (Contributor Author) commented Jun 23, 2025

maybe is better to setup the output format as a list of tensors

Sorry, I didn't fully get you. Are you suggesting that we make all outputs a list of tensors?

@fkiraly (Collaborator) commented Jun 23, 2025

  • Agree with @agobbifbk - the only way I see to stay consistent across the multi-output loss case is the "list-of-tensor" approach or something equivalent, like nestedtensor. I do not think "list-of-tensor" is much worse, so we can just stick with that?
  • Regarding the tensors in the list, we can again try to be polymorphic.

Overall, though, I think we should move towards writing an enhancement proposal, i.e., a proper API design document. This episode has shown me that this is a difficult problem where we should map out all use case vignettes and features before committing to a specific design - we have already changed it twice, partly also because I did not catch the difficulty earlier and recommend going to API design first.

@PranavBhatP (Contributor Author) commented Jun 26, 2025

@fkiraly @agobbifbk @phoeenniixx

The proposal in this PR for a standard 4D tensor can be scrapped. A better approach is outlined (at a high level) in the design doc. Why should this be scrapped?

  • Design misconception: this stemmed from the lack of a design document for this PR's changes. The PR gets two things wrong:
    1. Assuming n_features is greater than one. At a fundamental level, all multi-target forecasts are a list of tensors where each tensor is a single target, so an n_features dimension is redundant and does not match the current API design.
    2. Standardizing across two dimensions in some cases. For 2D tensor inputs, we unsqueeze the tensor to a 4D tensor, and handling this against the ground truth would be expensive, since we would have to maintain backwards compatibility while also ensuring 4D tensor compatibility in every loss function. Very tedious work.

Imo, working on the design document was a really useful step and it covered lots of gaps. We can ignore this proposal PR for now. Maybe also close this PR?

@fkiraly (Collaborator) commented Jun 27, 2025

I would agree - for me the main argument would be the mixed loss case.

Though I would not delete the branch yet: the code might still be useful as a reference later on, and we have not fully decided on a target design yet.

Labels
enhancement (New feature or request), module:models
Projects
Status: PR in progress
Development

Successfully merging this pull request may close these issues.

[ENH] standardize model output format to (batch, timesteps, features, quantiles) in v2
5 participants