
[ENH] Implementing the iTransformer model in PTFv2.#1994

Open
JATAYU000 wants to merge 13 commits into sktime:main from JATAYU000:iTransformer

Conversation

@JATAYU000

@JATAYU000 JATAYU000 commented Nov 28, 2025

Reference Issues/PRs

Fixes #1899

What does this implement/fix? Explain your changes.

I have started interfacing iTransformer in PTFv2, from the TSlib repository thuml/iTransformer.
This is a work in progress; I would like suggestions on it.

What should a reviewer concentrate their feedback on?

  • Compliance of the current implementation with PTFv2

Did you add any tests for the change?

Not yet

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

Contributor

@PranavBhatP PranavBhatP left a comment


Thanks for the PR @JATAYU000 . I've dropped some comments on the PR.

Contributor


Conventionally for v2, all the layers of a model's architecture live in the layers directory. Many of the layers you are using here can be imported directly from that directory; I see a lot of commonality. I would suggest only adding new layers (if not already present in layers) as a subdirectory - layers/_<layer-type>. sub_modules.py is a v1 convention.

Author


Some of the layers have changes, so I have left them in sub_modules for now in the draft PR; will fix this.

:, :, :N
] # filter covariates

if self.use_norm:
Contributor


self.use_norm is not required in the model code, since normalization is handled by the D1/D2 layers. Normalization and denormalization need not be handled here; simply return dec_out.
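A minimal sketch of the suggested simplification; the class and tensor names here are hypothetical stand-ins, not the PR's actual code:

```python
import torch
import torch.nn as nn


class TinyForecastHead(nn.Module):
    """Illustrative stand-in for the model's forward tail."""

    def __init__(self, d_model=8, prediction_length=4):
        super().__init__()
        self.projector = nn.Linear(d_model, prediction_length)

    def forward(self, enc_out):
        dec_out = self.projector(enc_out)  # (batch, n_series, pred_len)
        # no `if self.use_norm: ...` branch: (de)normalization is assumed
        # to be handled by the D1/D2 data layers, so just return dec_out
        return dec_out


head = TinyForecastHead()
print(tuple(head(torch.randn(2, 3, 8)).shape))  # (2, 3, 4)
```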

Author


Oh, thank you for pointing that out.

}

@classmethod
def get_test_train_params(cls):
Contributor


Can you add a few more test cases here?

"""
An implementation of iTransformer model for v2 of pytorch-forecasting.

Parameters
Contributor


Docstring for model hyperparameters is missing.

@JATAYU000
Author

I would suggest only adding new layers (if not already present in layers) as a subdirectory - layers/_

@PranavBhatP The EncoderLayer in layers/_encoders/ requires cross_attention, but iTransformer only needs self_attention.
Should I make cross_attention optional in the existing layer, or create a separate EncoderLayer for iTransformer?
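For illustration, making cross-attention optional could look roughly like this. This is a toy sketch, not the repository's actual EncoderLayer; the attention modules are simplified to plain callables without masks:

```python
import torch
import torch.nn as nn


class FlexibleEncoderLayer(nn.Module):
    """Sketch of an encoder layer whose cross-attention is optional.

    `self_attention` / `cross_attention` are simplified here to callables
    taking (query, key, value); the real layers also handle masks.
    """

    def __init__(self, self_attention, cross_attention=None, d_model=16):
        super().__init__()
        self.self_attention = self_attention
        # None => pure self-attention, as iTransformer needs
        self.cross_attention = cross_attention
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, cross=None):
        x = x + self.self_attention(x, x, x)
        if self.cross_attention is not None and cross is not None:
            x = x + self.cross_attention(x, cross, cross)
        return self.norm(x)


attn = nn.MultiheadAttention(16, 2, batch_first=True)
layer = FlexibleEncoderLayer(lambda q, k, v: attn(q, k, v)[0])
out = layer(torch.randn(2, 5, 16))  # no cross input needed
print(tuple(out.shape))  # (2, 5, 16)
```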

@JATAYU000 JATAYU000 requested a review from PranavBhatP December 1, 2025 14:50
@fkiraly
Collaborator

fkiraly commented Dec 3, 2025

Re layers, I would do as follows:

  • if the exact same layer is available in layers, reuse it
  • add new layers in layers
  • if a layer with modification is needed, add it as a separate layer
  • optionally - this PR, but can also be a later PR - check if multiple similar layers can be "unified" in a single layer with more parameters

@PranavBhatP
Contributor

optionally - this PR but also can be later PR - check if multiple similar layers can be "unified" in a single layer with more parameters

@fkiraly seems like a nice good first issue?

@PranavBhatP
Contributor

@JATAYU000 any hurdles with fixing issues in the PR?

@JATAYU000
Author

@PranavBhatP There aren't any hurdles; I was busy with some other projects. I needed a review on the model and layer implementation, since it updates the TimeXer layer to be a common TSLib layer. I still have to add tests and will do that.

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 99.10714% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@ea75590). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...orecasting/models/itransformer/_itransformer_v2.py 98.21% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1994   +/-   ##
=======================================
  Coverage        ?   86.77%           
=======================================
  Files           ?      168           
  Lines           ?     9817           
  Branches        ?        0           
=======================================
  Hits            ?     8519           
  Misses          ?     1298           
  Partials        ?        0           
Flag Coverage Δ
cpu 86.77% <99.10%> (?)
pytest 86.77% <99.10%> (?)


@JATAYU000 JATAYU000 marked this pull request as ready for review January 21, 2026 09:45
"""
Encoder module for the TimeXer model.
Encoder module for Tslib models.
Args:
Member


please change this to a numpydoc-style docstring
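For reference, the requested conversion from an Args: section to numpydoc style looks roughly like this (the parameter names are illustrative, not the PR's actual signature):

```python
class Encoder:
    """Encoder module for TSlib models.

    Parameters
    ----------
    attn_layers : list of nn.Module
        Stacked encoder layers applied in sequence.
    norm_layer : nn.Module, optional
        Normalization applied after the final layer.
    """
```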

"""
Encoder layer for the TimeXer model.
Encoder layer for TsLib models.
Args:
Member


please change this to a numpydoc-style docstring

{},
dict(d_model=16, n_heads=2, e_layers=2, d_ff=64),
dict(
d_model=32,
Member


should we try one param set with output_attention=True as well, to cover all possibilities?
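A sketch of what such an additional parameter set could look like; the keys mirror the dicts quoted above, while the exact values are assumptions:

```python
# Illustrative extra entry for get_test_train_params -- parameter names
# follow the existing dicts; the values here are assumptions.
extra_params = dict(
    d_model=16,
    n_heads=2,
    e_layers=1,
    d_ff=32,
    output_attention=True,  # exercises the attention-returning code path
)
print(sorted(extra_params))
```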

Member

@phoeenniixx phoeenniixx left a comment


Thanks @JATAYU000! I think it is almost ready, just a few comments and suggestions.

FYI @agobbifbk, @PranavBhatP

@phoeenniixx phoeenniixx added the enhancement, feature request, and module:models labels on Feb 9, 2026
@codecov

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 99.10714% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@ea75590). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...orecasting/models/itransformer/_itransformer_v2.py 98.21% 1 Missing ⚠️

@agobbifbk

This is misleading, timestamps are not processed by this layer :-)

enc_out = self.enc_embedding(x_enc, x_mark_enc)  # covariates (e.g timestamp) 

Moreover, here I don't see any reference to the cross_attention:

        self.encoder = Encoder(
            [
                EncoderLayer(
                    self_attention=AttentionLayer(
                        FullAttention(
                            False,
                            self.factor,
                            attention_dropout=self.dropout,
                            output_attention=True,
                        ),
                        self.d_model,
                        self.n_heads,
                    ),
                    d_model=self.d_model,
                    d_ff=self.d_ff,
                    dropout=self.dropout,
                    activation=self.activation,
                    output_attention=True,
                )
                for _ in range(self.e_layers)
            ],
            norm_layer=torch.nn.LayerNorm(self.d_model),
            output_attention=True,
        )
        if self.n_quantiles is not None:
            self.projector = nn.Linear(
                self.d_model, self.prediction_length * self.n_quantiles, bias=True
            )
        else:
            self.projector = nn.Linear(self.d_model, self.prediction_length, bias=True)

the EncoderLayer has cross_attention=None by default:

class EncoderLayer(nn.Module):
    def __init__(
        self,
        self_attention,
        cross_attention=None,
        d_model=512,
        d_ff=None,
        dropout=0.1,
        activation="relu",
        output_attention=False,
    ):

How can the user use this? Or am I missing something?

@JATAYU000
Author

JATAYU000 commented Feb 10, 2026

This is misleading, timestamp are not processed by this layer :-)

The comment was cut; it was supposed to be # covariates (e.g. timestamp) are embedded as tokens
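For context, the inverted embedding this comment describes can be sketched as follows. This is a toy version of the idea (each variate's whole series, plus each covariate column, becomes one token), with hypothetical names and shapes:

```python
import torch
import torch.nn as nn


class ToyInvertedEmbedding(nn.Module):
    """Sketch of iTransformer-style inverted embedding.

    x: (batch, seq_len, n_vars); x_mark: (batch, seq_len, n_marks)
    output: (batch, n_vars + n_marks, d_model)
    """

    def __init__(self, seq_len, d_model):
        super().__init__()
        self.proj = nn.Linear(seq_len, d_model)

    def forward(self, x, x_mark=None):
        x = x.permute(0, 2, 1)  # one token per variate: (batch, n_vars, seq_len)
        if x_mark is not None:
            # covariates (e.g. timestamp features) are embedded as tokens too
            x = torch.cat([x, x_mark.permute(0, 2, 1)], dim=1)
        return self.proj(x)


emb = ToyInvertedEmbedding(seq_len=24, d_model=16)
out = emb(torch.randn(2, 24, 3), torch.randn(2, 24, 4))
print(tuple(out.shape))  # (2, 7, 16): 3 variates + 4 covariate tokens
```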

Moreover here I don't see any reference to the cross_attention

iTransformer does not require passing cross_attention, since it defaults to None.
When it is passed, it is included in the block; defaulting it to None was to make the layer flexible for models that don't need cross_attention, such as iTransformer or any other inverted models implemented from TSlib in the future.

Is there any other reason not to default it to None? I did not understand the exact problem; could you please elaborate a bit more, @agobbifbk?


Labels

enhancement, feature request, module:models

Development

Successfully merging this pull request may close these issues.

[ENH] implementing iTransformer from tslib.

5 participants