
[ENH] Refactor metrics to be polymorphic #1897


Draft · wants to merge 4 commits into main

Conversation

@phoeenniixx (Contributor) commented Jun 19, 2025

This PR refactors base_metrics to make them polymorphic, i.e., able to handle inputs of any dimension.
The current metrics support 2D ([batch, time]) and 3D ([batch, time, params]) inputs; this PR adds 4D ([batch, time, num_target, params]) input support to the metrics.

For this we mainly change the base Metric class, so that the changes propagate easily to the child classes.
The original to_prediction and to_quantiles methods (which support 3D inputs) are renamed to private functions - _to_prediction_3d and _to_quantiles_3d. The new to_prediction and to_quantiles also add support for 4D inputs.

Some other classes, like MultiHorizonMetric and MultiLoss, are also changed:

  • MultiLoss - adds _prepare_y_pred to ensure y_pred is a list of tensors for multiple types of inputs, 3D and 4D (see the sketch after this list).
  • MultiHorizonMetric - makes update polymorphic.
  • DistributionLoss - adds 4D input support by changing the to_quantiles and to_prediction functions, similar (in terms of naming only) to Metric.
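
For the MultiLoss bullet, a minimal sketch of what _prepare_y_pred could look like (the helper name is from this PR, but the exact slicing is an assumption, not the merged code):

def _prepare_y_pred(self, y_pred):
    """Normalize y_pred to a list with one tensor per target (sketch)."""
    if isinstance(y_pred, (list, tuple)):
        # already one tensor per target, as MultiLoss originally expected
        return list(y_pred)
    if y_pred.ndim == 4:
        # split [batch, time, num_target, params] into per-target
        # [batch, time, params] tensors, one per metric in the loss
        return [y_pred[:, :, t, :] for t in range(y_pred.shape[2])]
    # a single 2D/3D tensor: wrap it so downstream code sees a list
    return [y_pred]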

@phoeenniixx (Contributor, Author)

FYI @agobbifbk, @fkiraly, @PranavBhatP

@agobbifbk

Very hard to debug, a lot of overhead for just computing the loss :-) @fkiraly did you write it, or do you know the author? Not sure I can say something useful here, sorry.

@fkiraly (Collaborator) left a comment

I am deeply uncomfortable with all the case distinctions here.
This is very difficult to parse.

What we need to make this work:

  • Proper docstrings. Do not just say the output is a torch.Tensor; be clear about dimension, size, and type. Do not say the output is y_pred without any type, etc. Same everywhere.
  • Ensure that this is also tested reliably. Test all changed and newly introduced methods with individual input/output pairs.

If we do not do this, we risk introducing a lot of bugs and brittleness. The code would be impossible to maintain in the current state.

Minor comment: it is easier to document clearly if you stick to the numpydoc format for docstrings that we use in sktime.
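
For illustration, a numpydoc docstring along these lines would pin down the contract (the shapes shown are assumptions based on this PR's description, not the merged code):

def to_prediction(self, y_pred: torch.Tensor) -> torch.Tensor:
    """Convert network prediction into a point prediction.

    Parameters
    ----------
    y_pred : torch.Tensor
        Network output of shape (batch, time), (batch, time, params),
        or (batch, time, num_target, params).

    Returns
    -------
    torch.Tensor
        Point prediction of shape (batch, time) for 2D/3D input,
        or (batch, time, num_target) for 4D input.
    """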

@fkiraly added the enhancement (New feature or request) and module:models labels on Jun 19, 2025
@phoeenniixx (Contributor, Author)

> proper docstrings

I still need to add docstrings; for now, I just copied the docstrings of the original methods. I will update them to the numpydoc format and add clear dimensions.

> ensure that this is also tested reliably

I tested the metrics using test_metrics and all the tests passed, but test_all_estimators showed a bug in the DistributionLoss functions, so I will work on that and add tests for 4D inputs as well - something along the lines of the sketch below.
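
For instance (QuantileLoss and the expected output shape are assumptions about the intended 4D contract, not existing behaviour):

import torch
from pytorch_forecasting.metrics import QuantileLoss

def test_to_prediction_4d():
    loss = QuantileLoss()  # the median quantile yields the point prediction
    n_params = len(loss.quantiles)
    # hypothetical 4D network output: [batch, time, num_target, params]
    y_pred = torch.randn(4, 10, 2, n_params)
    out = loss.to_prediction(y_pred)
    # under the new contract, one point prediction per target is expected
    assert out.shape == (4, 10, 2)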

"""
Convert network prediction into a point prediction.

Args:
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

def to_quantiles(
    self, y_pred: torch.Tensor, quantiles: list[float] = None
) -> torch.Tensor:
    """
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

@@ -320,6 +370,35 @@ def __len__(self) -> int:
"""
return len(self.metrics)

def _prepare_y_pred(
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

def to_prediction(self, y_pred: torch.Tensor, n_samples: int = 100) -> torch.Tensor:
def _to_prediction_3d(
    self, y_pred: torch.Tensor, n_samples: int = 100
) -> torch.Tensor:
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

@fkiraly (Collaborator) commented Jun 19, 2025

> I still need to add docstrings; for now, I just copied the docstrings of the original methods. I will update them to the numpydoc format and add clear dimensions.

I understand that this is already how it was - which is, perhaps, part of the reason that the code base is a bit confusing.

Because we are now changing this - and working in a group of people - I think we need to be super precise as to what the contracts are.

We are in a really central and crucial location of the package API; I think only clean documentation (from the start) and systematic testing ensure that we do not introduce problems that are hard to spot later.

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

Hi, I tried to debug the errors. I was able to do so for the most part, but not for these params in _deepar_pkg:

dict(
    loss=ImplicitQuantileNetworkDistributionLoss(hidden_size=8),
),
dict(
    loss=MultivariateNormalDistributionLoss(),
    trainer_kwargs=dict(accelerator="cpu"),
),

I am a little clueless:

  • for ImplicitQuantileNetworkDistributionLoss, just adding input_size=98 to the params works, i.e.:

dict(
    loss=ImplicitQuantileNetworkDistributionLoss(hidden_size=8, input_size=98),
),

but what I am more concerned about is why this is not required in the current implementation. The error it shows is:

  emb_inputs = x.unsqueeze(-2) * (
            1.0 + cos_emb_tau
        )  # ... x n_quantiles x input_size
E       RuntimeError: The size of tensor a (98) must match the size of tensor b (16) at non-singleton dimension 3

it is from this line:

emb_inputs = x.unsqueeze(-2) * (
    1.0 + cos_emb_tau
)  # ... x n_quantiles x input_size
emb_outputs = self.output_layer(emb_inputs).squeeze(-1)  # ... x n_quantiles
return emb_outputs

At first I thought this was because the default value of input_size is 16 and that was causing the issue; I changed it to 98 and it worked! But the problem is: why was this not raised in the current implementation? I also tried using a debugger to see where this shape problem occurs, but I never found the exact point where it fails, as in the debugger x always shows shape [a, b, 16] (before unsqueeze), which means there should never be a requirement of 98.

Edit: I ran the debugger again, now with more breakpoints. I found that at first (for most of the y_preds) the shape of x is correct, [a, b, 16], but I don't know why, after many y_preds, it suddenly changes to [a, b, 98].

  • for MultivariateNormalDistributionLoss:
    the tests pass if we use target_normalizer (which is the next param):

dict(
    loss=MultivariateNormalDistributionLoss(),
    data_loader_kwargs=dict(
        target_normalizer=GroupNormalizer(
            groups=["agency", "sku"], transformation="log1p"
        )
    ),
    trainer_kwargs=dict(accelerator="cpu"),
),

I am not sure why it is not working without normalisation.

This is the error:

        K.view(-1, m * m)[:, :: m + 1] += 1  # add identity matrix to K
>       return torch.linalg.cholesky(K)
E       torch._C._LinAlgError: linalg.cholesky: (Batch element 1): The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).

So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"? (A minimal illustration of this failure mode follows.)
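
For reference, a minimal self-contained illustration of the failure mode itself (unrelated to the actual covariance built inside the loss): a symmetric matrix with a negative eigenvalue has no Cholesky factorization.

import torch

# symmetric, but with eigenvalues 3 and -1, hence not positive-definite
K = torch.tensor([[1.0, 2.0],
                  [2.0, 1.0]])
try:
    torch.linalg.cholesky(K)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print(e)  # "...input is not positive-definite..."

One hedged guess: if unnormalized targets drive the predicted covariance parameters to extreme or non-finite values, even the identity jitter visible in the traceback may not keep K positive-definite - but that is an assumption, not a diagnosis.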

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

I have added the docstrings; please see if that helps or if I need to make them more descriptive.
Also, please see if my changes are breaking the code (right now I am a little confused about what is happening in the two params mentioned above).

I think I should start adding tests; maybe that can help as well... so for now I will work on tests.

@PranavBhatP (Contributor) commented Jun 22, 2025

> So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"?

I ran a quick search through Copilot; it says that this kind of loss always requires normalization due to its existing implementation, and that there is no workaround to not using normalization with this specific loss. I guess we should stick to this; I am not sure about the mathematical reason behind it.

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

> > So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"?
>
> I ran a quick search through Copilot; it says that this kind of loss always requires normalization due to its existing implementation, and that there is no workaround to not using normalization with this specific loss. I guess we should stick to this; I am not sure about the mathematical reason behind it.

But then why did it never fail before? I mean, this param was added without normalisation, right? And it passed at that time? Then why does it fail now?

@fkiraly (Collaborator) commented Jun 22, 2025

> Hi, I tried to debug the errors

FYI, unfortunately I do not know why these parameters were chosen as they were - I just transferred them from the pre-existing tests in 1.2.0 and prior, from the tests folder in the root. I did notice that the test parameters were "brittle" in the sense that varying them may cause hard-to-diagnose failures, but I did not drill down on that.

@fkiraly (Collaborator) commented Jun 22, 2025

Maybe something to discuss in a tech meeting - to give any useful input, I would need to understand the exact context of what is failing and when.


Returns:
    torch.Tensor: point prediction

Parameters
@fkiraly (Collaborator):

good docstring!

Regarding the API: is this "as current" or "as reworked"?

  • how do we ensure downwards compatibility with v1
  • did we not decide on (batch, variable, time, params)?

This is as current, right?

@phoeenniixx (Contributor, Author) replied Jun 23, 2025

So this is a private method (_to_prediction_3d) - it contains the original logic of to_prediction, which handles 2D and 3D inputs. If you look at the next function (to_prediction), it contains the logic to handle (batch, time, num_target, params).
The new to_prediction can handle 2D and 3D inputs (like the original implementation) as well as 4D input.

For 2D and 3D inputs, to_prediction uses _to_prediction_3d directly; for 4D input, it passes the 3D slice [batch, time, params] for each target in num_target to _to_prediction_3d and stacks the results.
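
Roughly along these lines (a hedged sketch of that stacking logic, reusing the _to_prediction_3d name from the diff; not the exact PR code):

import torch

def to_prediction(self, y_pred: torch.Tensor) -> torch.Tensor:
    if y_pred.ndim <= 3:
        # original 2D/3D paths: [batch, time] or [batch, time, params]
        return self._to_prediction_3d(y_pred)
    # 4D input [batch, time, num_target, params]: run the original
    # logic on each target's [batch, time, params] slice, then
    # stack the per-target results back along dim 2
    per_target = [
        self._to_prediction_3d(y_pred[:, :, t, :])
        for t in range(y_pred.shape[2])
    ]
    return torch.stack(per_target, dim=2)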

@fkiraly: is this "as current" or "as reworked"?

@fkiraly (Collaborator) left a comment

In the PR description, can you please be more detailed about what you are doing?

  • What is the state prior to the PR?
  • How are you changing it?

Please be precise.

Also, a "refactor" means that public APIs remain the same - I guess this is not the case here? Or are you widening the contract?

@phoeenniixx (Contributor, Author)

Also, a "refactor" means that public APIs remain the same - I guess this is not the case here? Or are you widening the contract?

So, as discussed on Discord, we decided to make the metrics polymorphic, i.e., able to handle any dimension. In that case, even the original model implementations (which have 3D outputs) are "ideally" compatible with the metrics - this is what I am trying to do. Since the original model implementations stay compatible with the metrics and we are just adding support for 4D model outputs, I thought the word "refactor" would explain it well. Am I messing things up here?

@fkiraly (Collaborator) commented Jun 23, 2025

> Am I messing things up here?

Not sure - what I am missing is a clear understanding of the status-quo design, and a clear description of the target design, both including signatures of functions with expected input and output types, assumes and guarantees.

I would strongly recommend writing a design proposal, or we might end up talking at cross purposes.

For terminology:

  • a "refactor" typically refers to changes in private layers that leave the logic and the public interfaces unchanged.
  • a "rework" can include changes to the API, or changes to the logic.
  • "widening the contract" means that a function or API accepts more inputs than previously, but inputs that were previously accepted are still accepted and lead to the same behaviour.

Labels: enhancement (New feature or request), module:models
Status: PR in progress
4 participants