
[ENH] Refactor metrics to be polymorphic #1897


Draft · wants to merge 4 commits into main

Conversation

@phoeenniixx (Contributor) commented Jun 19, 2025

This PR refactors base_metrics to make them polymorphic, i.e., able to handle inputs of any dimension.
The current metrics support 2D ([batch, time]) and 3D ([batch, time, params]) inputs; this PR adds 4D ([batch, time, num_target, params]) input support to the metrics.

For this we mainly change the base Metric class, so that the changes propagate easily to the child classes.
The original to_prediction and to_quantiles methods (which support 3D inputs) are renamed to private functions - _to_prediction_3d and _to_quantiles_3d. The new to_prediction and to_quantiles also add support for 4D inputs.

Some other classes, like MultiHorizonMetric and MultiLoss, are also changed:

  • MultiLoss - adds _prepare_y_pred to ensure y_pred is a list of tensors for multiple types of inputs, 3D and 4D (see the sketch after this list).
  • MultiHorizonMetric - makes update polymorphic.
  • DistributionLoss - adds 4D input support by changing the to_quantiles and to_prediction functions, similar (in terms of naming only) to Metric.
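
For the MultiLoss bullet, a minimal sketch of what _prepare_y_pred could look like (the helper name is from this PR, but the exact slicing is an assumption, not the merged code):

def _prepare_y_pred(self, y_pred):
    """Normalize y_pred to a list with one tensor per target (sketch)."""
    if isinstance(y_pred, (list, tuple)):
        # already one tensor per target, as MultiLoss originally expected
        return list(y_pred)
    if y_pred.ndim == 4:
        # split [batch, time, num_target, params] into per-target
        # [batch, time, params] tensors, one per metric in the loss
        return [y_pred[:, :, t, :] for t in range(y_pred.shape[2])]
    # a single 2D/3D tensor: wrap it so downstream code sees a list
    return [y_pred]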

@phoeenniixx (Contributor, Author)

FYI @agobbifbk, @fkiraly, @PranavBhatP

@agobbifbk

Very hard to debug, a lot of overhead for just computing the loss :-) @fkiraly did you write it, or do you know the author? Not sure I can say something useful here, sorry.

@fkiraly (Collaborator) left a comment

I am deeply uncomfortable with all the case distinctions here.
This is very difficult to parse.

What we need to make this work:

  • Proper docstrings. Do not just say the output is a torch.Tensor; be clear about dimension, size, and type. Do not say the output is y_pred without any type, etc. Same everywhere.
  • Ensure that this is also tested reliably. Test all changed and newly introduced methods with individual input/output pairs.

If we do not do this, we risk introducing a lot of bugs and brittleness. The code would be impossible to maintain in the current state.

Minor comment: it is easier to document clearly if you stick to the numpydoc format for docstrings that we use in sktime.
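
For illustration, a numpydoc docstring along these lines would pin down the contract (the shapes shown are assumptions based on this PR's description, not the merged code):

def to_prediction(self, y_pred: torch.Tensor) -> torch.Tensor:
    """Convert network prediction into a point prediction.

    Parameters
    ----------
    y_pred : torch.Tensor
        Network output of shape (batch, time), (batch, time, params),
        or (batch, time, num_target, params).

    Returns
    -------
    torch.Tensor
        Point prediction of shape (batch, time) for 2D/3D input,
        or (batch, time, num_target) for 4D input.
    """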

@fkiraly added the enhancement (New feature or request) and module:models labels on Jun 19, 2025
@phoeenniixx (Contributor, Author)

> proper docstrings

I still need to add docstrings; for now, I just copied the docstrings of the original methods. I will update them to the numpydoc format and add clear dimensions.

> ensure that this is also tested reliably

I tested the metrics using test_metrics and all the tests passed, but test_all_estimators showed a bug in the DistributionLoss functions, so I will work on that and add tests for 4D inputs as well - something along the lines of the sketch below.
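
For instance (QuantileLoss and the expected output shape are assumptions about the intended 4D contract, not existing behaviour):

import torch
from pytorch_forecasting.metrics import QuantileLoss

def test_to_prediction_4d():
    loss = QuantileLoss()  # the median quantile yields the point prediction
    n_params = len(loss.quantiles)
    # hypothetical 4D network output: [batch, time, num_target, params]
    y_pred = torch.randn(4, 10, 2, n_params)
    out = loss.to_prediction(y_pred)
    # under the new contract, one point prediction per target is expected
    assert out.shape == (4, 10, 2)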

"""
Convert network prediction into a point prediction.

Args:
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

def to_quantiles(
    self, y_pred: torch.Tensor, quantiles: list[float] = None
) -> torch.Tensor:
    """
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

@@ -320,6 +370,35 @@ def __len__(self) -> int:
"""
return len(self.metrics)

def _prepare_y_pred(
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

def to_prediction(self, y_pred: torch.Tensor, n_samples: int = 100) -> torch.Tensor:
def _to_prediction_3d(
    self, y_pred: torch.Tensor, n_samples: int = 100
) -> torch.Tensor:
@fkiraly: this docstring should be clarified - assumed input type, guaranteed output type

@fkiraly (Collaborator) commented Jun 19, 2025

> I still need to add docstrings; for now, I just copied the docstrings of the original methods. I will update them to the numpydoc format and add clear dimensions.

I understand that this is already how it was - which is, perhaps, part of the reason that the code base is a bit confusing.

Because we are now changing this - and working in a group of people - I think we need to be super precise as to what the contracts are.

We are in a really central and crucial location of the package API; I think only clean documentation (from the start) and systematic testing ensure that we do not introduce problems that are hard to spot later.

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

Hi, I tried to debug the errors. I was able to do so for the most part, but not for these params in _deepar_pkg:

dict(
    loss=ImplicitQuantileNetworkDistributionLoss(hidden_size=8),
),
dict(
    loss=MultivariateNormalDistributionLoss(),
    trainer_kwargs=dict(accelerator="cpu"),
),

I am a little clueless:

  • for ImplicitQuantileNetworkDistributionLoss, just adding input_size=98 to the params works, i.e.:

dict(
    loss=ImplicitQuantileNetworkDistributionLoss(hidden_size=8, input_size=98),
),

but what I am more concerned about is why this is not required in the current implementation. The error it shows is:

  emb_inputs = x.unsqueeze(-2) * (
            1.0 + cos_emb_tau
        )  # ... x n_quantiles x input_size
E       RuntimeError: The size of tensor a (98) must match the size of tensor b (16) at non-singleton dimension 3

it is from this line:

emb_inputs = x.unsqueeze(-2) * (
    1.0 + cos_emb_tau
)  # ... x n_quantiles x input_size
emb_outputs = self.output_layer(emb_inputs).squeeze(-1)  # ... x n_quantiles
return emb_outputs

At first I thought this was because the default value of input_size is 16 and that was causing the issue; I changed it to 98 and it worked! But the problem is: why was this not raised in the current implementation? I also tried using a debugger to see where this shape problem occurs, but I never found the exact point where it fails, as in the debugger x always shows shape [a, b, 16] (before unsqueeze), which means there should never be a requirement of 98.

Edit: I ran the debugger again, now with more breakpoints. I found that at first (for most of the y_preds) the shape of x is correct, [a, b, 16], but I don't know why, after many y_preds, it suddenly changes to [a, b, 98].

  • for MultivariateNormalDistributionLoss:
    the tests pass if we use target_normalizer (which is the next param):

dict(
    loss=MultivariateNormalDistributionLoss(),
    data_loader_kwargs=dict(
        target_normalizer=GroupNormalizer(
            groups=["agency", "sku"], transformation="log1p"
        )
    ),
    trainer_kwargs=dict(accelerator="cpu"),
),

I am not sure why it is not working without normalisation.

This is the error:

        K.view(-1, m * m)[:, :: m + 1] += 1  # add identity matrix to K
>       return torch.linalg.cholesky(K)
E       torch._C._LinAlgError: linalg.cholesky: (Batch element 1): The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).

So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"? (A minimal illustration of this failure mode follows.)
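
For reference, a minimal self-contained illustration of the failure mode itself (unrelated to the actual covariance built inside the loss): a symmetric matrix with a negative eigenvalue has no Cholesky factorization.

import torch

# symmetric, but with eigenvalues 3 and -1, hence not positive-definite
K = torch.tensor([[1.0, 2.0],
                  [2.0, 1.0]])
try:
    torch.linalg.cholesky(K)
except RuntimeError as e:  # torch._C._LinAlgError subclasses RuntimeError
    print(e)  # "...input is not positive-definite..."

One hedged guess: if unnormalized targets drive the predicted covariance parameters to extreme or non-finite values, even the identity jitter visible in the traceback may not keep K positive-definite - but that is an assumption, not a diagnosis.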

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

I have added the docstrings; please see if that helps or if I need to make them more descriptive.
Also, please see if my changes are breaking the code (right now I am a little confused about what is happening in the two params mentioned above).

I think I should start adding tests; maybe that can help as well... so for now I will work on tests.

@PranavBhatP (Contributor) commented Jun 22, 2025

> So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"?

I ran a quick search through Copilot; it says that this kind of loss always requires normalization due to its existing implementation, and that there is no workaround to not using normalization with this specific loss. I guess we should stick to this; I am not sure about the mathematical reason behind it.

@phoeenniixx (Contributor, Author) commented Jun 22, 2025

> > So the normalisation makes it compatible, I understand that, but why not without it? And how is my reshaping of the inputs causing things to become "not positive-definite"?
>
> I ran a quick search through Copilot; it says that this kind of loss always requires normalization due to its existing implementation, and that there is no workaround to not using normalization with this specific loss. I guess we should stick to this; I am not sure about the mathematical reason behind it.

But then why did it never fail before? I mean, this param was added without normalisation, right? And it passed at that time? Then why does it fail now?

@fkiraly (Collaborator) commented Jun 22, 2025

> Hi, I tried to debug the errors

FYI, unfortunately I do not know why these parameters were chosen as they were - I just transferred them from the pre-existing tests in 1.2.0 and prior, from the tests folder in the root. I did notice that the test parameters were "brittle" in the sense that varying them may cause hard-to-diagnose failures, but I did not drill down on that.

@fkiraly (Collaborator) commented Jun 22, 2025

Maybe something to discuss in a tech meeting - to give any useful input, I would need to understand the exact context of what is failing and when.


Returns:
    torch.Tensor: point prediction

Parameters
@fkiraly (Collaborator):

good docstring!

Regarding the API: is this "as current" or "as reworked"?

  • how do we ensure downwards compatibility with v1
  • did we not decide on (batch, variable, time, params)?

This is as current, right?

@phoeenniixx (Contributor, Author) replied Jun 23, 2025

So this is a private method (_to_prediction_3d) - it contains the original logic of to_prediction, which handles 2D and 3D inputs. If you look at the next function (to_prediction), it contains the logic to handle (batch, time, num_target, params).
The new to_prediction can handle 2D and 3D inputs (like the original implementation) as well as 4D input.

For 2D and 3D inputs, to_prediction uses _to_prediction_3d directly; for 4D input, it passes the 3D slice [batch, time, params] for each target in num_target to _to_prediction_3d and stacks the results.
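
Roughly along these lines (a hedged sketch of that stacking logic, reusing the _to_prediction_3d name from the diff; not the exact PR code):

import torch

def to_prediction(self, y_pred: torch.Tensor) -> torch.Tensor:
    if y_pred.ndim <= 3:
        # original 2D/3D paths: [batch, time] or [batch, time, params]
        return self._to_prediction_3d(y_pred)
    # 4D input [batch, time, num_target, params]: run the original
    # logic on each target's [batch, time, params] slice, then
    # stack the per-target results back along dim 2
    per_target = [
        self._to_prediction_3d(y_pred[:, :, t, :])
        for t in range(y_pred.shape[2])
    ]
    return torch.stack(per_target, dim=2)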

@fkiraly: is this "as current" or "as reworked"?

@fkiraly (Collaborator) left a comment

In the PR description, can you please be more detailed about what you are doing?

  • What is the state prior to the PR?
  • How are you changing it?

Please be precise.

Also, a "refactor" means that public APIs remain the same - I guess this is not the case here? Or are you widening the contract?

@phoeenniixx (Contributor, Author)

Also, a "refactor" means that public APIs remain the same - I guess this is not the case here? Or are you widening the contract?

So, as discussed on Discord, we decided to make the metrics polymorphic, i.e., able to handle any dimension. In that case, even the original model implementations (which have 3D outputs) are "ideally" compatible with the metrics - this is what I am trying to do. Since the original model implementations stay compatible with the metrics and we are just adding support for 4D model outputs, I thought the word "refactor" would explain it well. Am I messing things up here?

@fkiraly (Collaborator) commented Jun 23, 2025

> Am I messing things up here?

Not sure - what I am missing is a clear understanding of the status-quo design, and a clear description of the target design, both including signatures of functions with expected input and output types, assumes and guarantees.

I would strongly recommend writing a design proposal, or we might end up talking at cross purposes.

For terminology:

  • a "refactor" typically refers to changes in private layers that leave the logic and the public interfaces unchanged.
  • a "rework" can include changes to the API, or changes to the logic.
  • "widening the contract" means that a function or API accepts more inputs than previously, but inputs that were previously accepted are still accepted and lead to the same behaviour.

Labels: enhancement (New feature or request), module:models
Status: PR in progress
4 participants