[ENH] `xLSTMTime` implementation #1709

phoeenniixx · 2024-11-09T11:31:25Z

Description

This PR tries to implement xLSTMTime based on this paper

Checklist

Linked issues (if existing)
Amended changelog for large changes (and added myself there as contributor)
Added/modified tests
Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
To run hooks independent of commit, execute pre-commit run --all-files

codecov · 2024-11-09T11:48:36Z

Codecov Report

❌ Patch coverage is 95.65217% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a88a404). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...orch_forecasting/layers/_recurrent/_slstm/layer.py	91.11%	4 Missing ⚠️
...orch_forecasting/layers/_recurrent/_mlstm/layer.py	93.18%	3 Missing ⚠️
...torch_forecasting/layers/_recurrent/_slstm/cell.py	95.08%	3 Missing ⚠️
pytorch_forecasting/models/xlstm/_xlstm.py	96.72%	2 Missing ⚠️
...torch_forecasting/layers/_recurrent/_mlstm/cell.py	98.27%	1 Missing ⚠️
...ch_forecasting/layers/_recurrent/_mlstm/network.py	93.75%	1 Missing ⚠️
...ch_forecasting/layers/_recurrent/_slstm/network.py	95.23%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1709   +/-   ##
=======================================
  Coverage        ?   87.39%           
=======================================
  Files           ?      113           
  Lines           ?     8419           
  Branches        ?        0           
=======================================
  Hits            ?     7358           
  Misses          ?     1061           
  Partials        ?        0

Flag	Coverage Δ
cpu	`87.39% <95.65%> (?)`
pytest	`87.39% <95.65%> (?)`
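
The figures in this report can be cross-checked with a little arithmetic. The sketch below is illustrative only. The per-file missing-line counts and the project totals are copied from the report above. The total of 345 changed lines is inferred from the stated patch percentage rather than reported by Codecov, and the `truncate` helper is a hypothetical name reflecting the assumption that displayed percentages are cut off rather than rounded.

```python
import math

# Missing lines per file, copied from the "Files with missing lines" table.
missing_per_file = [4, 3, 3, 2, 1, 1, 1]
total_missing = sum(missing_per_file)

# Project totals copied from the coverage diff.
hits, misses, lines = 7358, 1061, 8419
project_coverage = hits / lines * 100

# Inferred, not reported: the number of changed lines implied by
# 15 missing lines at 95.65217% patch coverage.
patch_pct = 95.65217
inferred_patch_lines = round(total_missing / (1 - patch_pct / 100))
patch_coverage = (1 - total_missing / inferred_patch_lines) * 100


def truncate(value: float, digits: int = 2) -> float:
    """Cut off (rather than round) a value after `digits` decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


print(total_missing)                      # sum of the per-file counts
print(hits + misses == lines)             # internal consistency of the totals
print(truncate(project_coverage))         # project coverage, two decimals
print(inferred_patch_lines, truncate(patch_coverage, 5))
```

Under these assumptions the per-file counts add up to the 15 missing lines in the headline, and the project coverage matches the 87.39% shown in the diff.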


phoeenniixx · 2024-11-09T11:59:09Z

Hi @fkiraly, I am new to pytorch-forecasting and its tests. Can you please tell me exactly what I am "missing"?

phoeenniixx · 2024-11-09T19:56:59Z

Will these tests suffice @fkiraly?

benHeid

Hi @phoeenniixx,
welcome to pytorch-forecasting, and thank you for your pull request and for contributing xLSTM.
I added first comments about the base class you used. Please change it to one of the pytorch-forecasting base classes (see the comment). Since I suppose this will change your code a bit, I will wait with a complete review until you have changed it.

benHeid · 2024-12-08T07:54:37Z

pytorch_forecasting/models/xLSTMTime/xLSTMTime.py

+        return trend, seasonal
+
+
+class xLSTMTime(nn.Module):


Please use the Base classes of pytorch-forecasting (BaseModelWithCovariates, etc.) depending on the properties of the forecaster.
The advantage of doing this is that it automatically comes with PyTorch lightning and thus less boilerplate is needed.

You might compare it with the NHITS implementation and check how it is implemented.

benHeid · 2024-12-08T07:57:09Z

pytorch_forecasting/models/xLSTMTime/xLSTMTime.py

Please ensure that the naming conventions for files are met, i.e., only lower case is allowed, with _ as the separator between words: .../x_lstm_time/x_lstm_time.py
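
A minimal sketch of that convention, assuming a regex-based conversion is acceptable. The helper names `to_snake_case` and `is_valid_module_name` are hypothetical and not part of pytorch-forecasting.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a mixed-case name such as 'xLSTMTime' to 'x_lstm_time'."""
    # Split before a capitalised word that follows any character:
    # 'xLSTMTime' -> 'xLSTM_Time'.
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lower-case letter or digit and an upper-case letter:
    # 'xLSTM_Time' -> 'x_LSTM_Time'.
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()


def is_valid_module_name(name: str) -> bool:
    """Check for lower-case letters, digits and underscores only."""
    return re.fullmatch(r"[a-z_][a-z0-9_]*", name) is not None


print(to_snake_case("xLSTMTime"))
print(is_valid_module_name("xLSTMTime"), is_valid_module_name("x_lstm_time"))
```

For the name in this PR, the conversion yields the same folder and file name as suggested in the comment above.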

benHeid · 2024-12-08T07:58:10Z

pytorch_forecasting/models/xLSTMTime/xLSTMTime.py

+        device: Optional[torch.device] = None,
+    ):
+        """
+        Initialize xLSTMTime model.


Please check where to put the reference to the paper that originally proposes xlstm.

phoeenniixx · 2024-12-08T16:18:31Z

Thanks for the review @benHeid!
I will have to restructure a little, I guess. I will pick the appropriate base class and use it in the main xLSTMTime class; the rest can be left untouched (at least with respect to the base class)?
I will make the changes and get back to you in few days!
Thanks!

phoeenniixx · 2024-12-10T09:29:59Z

Hi @benHeid, I need some help:

here I implemented xLSTMTime class using BaseModel as for now I think this is the best fitted class... what do you think?
Also, I made some changes in the forward function of the code where before it was accepting Tensor object, I changed it to Dict as I found out that the user mainly uses TimeSeriesDataSet and it returns a dict, please correct me if I am wrong here.
I am using the encoder_cont key of the dict as input x.

Please tell me if I am in a right direction

class xLSTMTime(BaseModel):

    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        output_size: int,
        xlstm_type: Literal['slstm', 'mlstm'],
        num_layers: int = 1,
        decomposition_kernel: int = 25,
        input_projection_size: Optional[int] = None,
        dropout: float = 0.1,
        loss: Metric = SMAPE(),
        device: Optional[torch.device] = None,
        **kwargs
    ):
        super().__init__(loss=loss, **kwargs)

        if xlstm_type not in ['slstm', 'mlstm']:
            raise ValueError("xlstm_type must be either 'slstm' or 'mlstm'")

        self.xlstm_type = xlstm_type
        self._device = device or torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.to(self._device)

        self.decomposition = SeriesDecomposition(decomposition_kernel)
        self.batch_norm = nn.BatchNorm1d(hidden_size)

        self.input_projection_size = input_projection_size or hidden_size

        self.input_linear = None  

        if xlstm_type == 'mlstm':
            self.lstm = mLSTMNetwork(
                input_size=hidden_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                output_size=hidden_size,
                dropout=dropout,
                device=self.device
            )
        else:  # slstm
            self.lstm = sLSTMNetwork(
                input_size=hidden_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                output_size=hidden_size,
                dropout=dropout,
                device=self.device
            )

        self.output_linear = nn.Linear(hidden_size, output_size)
        self.instance_norm = nn.InstanceNorm1d(output_size)

    def forward(
        self,
        x: Dict[str, torch.Tensor],  
        hidden_states: Optional[
            Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
        ] = None
    ) -> Tuple[torch.Tensor, Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]]:
   
        encoder_cont = x["encoder_cont"]
        batch_size, seq_len, n_features = encoder_cont.shape

        trend, seasonal = self.decomposition(encoder_cont)

        x = torch.cat([trend, seasonal], dim=-1)
        concatenated_features = x.shape[-1]

        if self.input_linear is None:
            self.input_linear = nn.Linear(concatenated_features, self.input_projection_size).to(self._device)

        x = self.input_linear(x)

        x = x.transpose(1, 2)  
        x = self.batch_norm(x)
        x = x.transpose(1, 2)  

        if hidden_states is None:
            hidden_states = self.lstm.init_hidden(batch_size)

        x = x.transpose(0, 1)
        output, hidden_states = self.lstm(x, *hidden_states)

        if isinstance(output, tuple):
            output = output[0]

        if output.dim() == 2:
            output = output.unsqueeze(0)
        output = self.output_linear(output)

        output = output.transpose(1, 2)
        output = self.instance_norm(output)
        output = output.transpose(1, 2)

        return output, hidden_states


    def predict(
            self,
            x: torch.Tensor,
            hidden_states: Optional[
                Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
            ] = None
    ) -> torch.Tensor:

        output, _ = self.forward(x, hidden_states)
        return output

    def training_step(self, batch, batch_idx):
        x, y = batch
        y = y[0] if isinstance(y, tuple) else y 

        y_pred, _ = self(x)

        if y_pred.ndim == 3 and y_pred.size(0) == 1:
            y_pred = y_pred.squeeze(0)  
        loss = self.loss(y_pred, y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y = y[0] if isinstance(y, tuple) else y 

        y_pred, _ = self(x)

        if y_pred.ndim == 3 and y_pred.size(0) == 1:
            y_pred = y_pred.squeeze(0)  
        loss = self.loss(y_pred, y)
        self.log("val_loss", loss)

        return loss




    def test_step(self, batch, batch_idx):
        x, y = batch
        y = y[0] if isinstance(y, tuple) else y 

        y_pred, _ = self(x)

        if y_pred.ndim == 3 and y_pred.size(0) == 1:
            y_pred = y_pred.squeeze(0)  
        loss = self.loss(y_pred, y)
        self.log("test_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)
        return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "val_loss"}

phoeenniixx · 2024-12-10T09:35:39Z

Also, do we need to change the base class of xLSTMTime only, or should mLSTMNetwork and sLSTMNetwork be changed as well?
(Although I think they are just parts of the main class, so they could inherit from nn.Module without any problem?)

benHeid · 2024-12-10T19:42:53Z

> here I implemented xLSTMTime class using BaseModel as for now I think this is the best fitted class... what do you think?

Hm. If the implementation does not support any exogenous features, then either BaseModel or AutoRegressiveBaseModel would fit. I would assume that the latter is probably the better fit.

> Also, I made some changes in the forward function of the code where before it was accepting Tensor object, I changed it to Dict as I found out that the user mainly uses TimeSeriesDataSet and it returns a dict, please correct me if I am wrong here.

I agree that a dict should be used here.

> I am using the encoder_cont key of the dict as input x.

Yes, that is the target time series.

> Please tell me if I am in a right direction

You might check the RNN implementation, since it also inherits from an autoregressive base model and is probably the most similar of the implemented models.
I would suggest that you check carefully whether you really need to implement the step / training_step methods etc., or whether it is sufficient to use the methods inherited from the base class.

But I think you are heading in the right direction.

phoeenniixx · 2024-12-12T16:26:30Z

Hi @benHeid, I have updated the implementation using AutoRegressiveBaseModel, please review it. Also, I have not changed or added the tests (they are failing due to some changes in input and output format) as I saw that for other modules, there is a specific "trend" of writing the tests and I might need some help with that. Can you please provide me a brief about them, like what specific tests should I add etc.

I can add the docstrings in subsequent commits once I am sure that this is what we want.

benHeid · 2024-12-24T12:05:01Z

Sorry for my late response. Please ensure that the linting checks are green. Running the pre-commit hooks locally should probably fix it.

Regarding the failing tests, you might check what the output currently looks like by manually executing the xLSTM. You might then see what the issue is.

@fkiraly do we have any guides for pytorch-forecasting on how to write tests?

phoeenniixx · 2024-12-24T13:21:40Z

Thanks for the reply @benHeid. The tests are failing because they were written for tensor and tuple inputs, whereas the model now consumes the dict produced by TimeSeriesDataSet. I can correct them, but I held off because I noticed that the tests for other models are built around functions like test_integration. To write those, I first need to understand what goes into them: which dataloaders and datasets are used, what the labels are, and whether the data is arbitrary or a pre-defined dataset.
For example, look at this function from test_models.test_rnn_model.py:

def _integration(
    data_with_covariates, tmp_path, cell_type="LSTM", data_loader_kwargs={}, clip_target: bool = False, **kwargs
):
    data_with_covariates = data_with_covariates.copy()
    if clip_target:
        data_with_covariates["target"] = data_with_covariates["volume"].clip(1e-3, 1.0)
    else:
        data_with_covariates["target"] = data_with_covariates["volume"]
    data_loader_default_kwargs = dict(
        target="target",
        time_varying_known_reals=["price_actual"],
        time_varying_unknown_reals=["target"],
        static_categoricals=["agency"],
        add_relative_time_idx=True,
    )
    data_loader_default_kwargs.update(data_loader_kwargs)
    dataloaders_with_covariates = make_dataloaders(data_with_covariates, **data_loader_default_kwargs)
    train_dataloader = dataloaders_with_covariates["train"]
    val_dataloader = dataloaders_with_covariates["val"]
    test_dataloader = dataloaders_with_covariates["test"]

    early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=1, verbose=False, mode="min")

    logger = TensorBoardLogger(tmp_path)
    trainer = pl.Trainer(
        max_epochs=3,
        gradient_clip_val=0.1,
        callbacks=[early_stop_callback],
        enable_checkpointing=True,
        default_root_dir=tmp_path,
        limit_train_batches=2,
        limit_val_batches=2,
        limit_test_batches=2,
        logger=logger,
    )

    net = RecurrentNetwork.from_dataset(
        train_dataloader.dataset,
        cell_type=cell_type,
        learning_rate=0.15,
        log_gradient_flow=True,
        log_interval=1000,
        hidden_size=5,
        **kwargs,
    )
    net.size()
    try:
        trainer.fit(
            net,
            train_dataloaders=train_dataloader,
            val_dataloaders=val_dataloader,
        )
        test_outputs = trainer.test(net, dataloaders=test_dataloader)
        assert len(test_outputs) > 0
        # check loading
        net = RecurrentNetwork.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)

        # check prediction
        net.predict(val_dataloader, fast_dev_run=True, return_index=True, return_decoder_lengths=True)
    finally:
        shutil.rmtree(tmp_path, ignore_errors=True)

    net.predict(val_dataloader, fast_dev_run=True, return_index=True, return_decoder_lengths=True)

Here they are using keys like "volume", and this is for data_with_covariates, but I am not using the covariate base class, so I cannot directly reuse this code and adapt it to my requirements. I want to understand how this whole setup works, and then I can write the tests.

phoeenniixx · 2024-12-24T13:22:28Z

for now I am just removing the test file and updating the code as required

fkiraly

Minor things before a more thorough review:

can you kindly add tests for some basic use cases?
can you make sure nothing except imports is in the __init__ files? Similar to the recent changes in the repo.
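
One way to check the second request mechanically is to parse an __init__ file and reject anything that is not an import. This is only a sketch: the function name `init_is_imports_only` is hypothetical, and allowing a module docstring and an `__all__` assignment is an assumption, not a rule stated in this thread.

```python
import ast


def init_is_imports_only(source: str) -> bool:
    """Return True if `source` holds only imports, a docstring and `__all__`."""
    tree = ast.parse(source)
    for index, node in enumerate(tree.body):
        # Plain and from-imports are always allowed.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        # Assumption: a leading module docstring is acceptable.
        if (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            continue
        # Assumption: a single `__all__ = [...]` assignment is acceptable.
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "__all__"
        ):
            continue
        return False
    return True


# Illustrative file contents, not copied from the repository.
good_init = '"""Layers."""\n\nfrom ._attention import FullAttention\n\n__all__ = ["FullAttention"]\n'
bad_init = "from ._attention import FullAttention\n\nclass Extra:\n    pass\n"

print(init_is_imports_only(good_init), init_is_imports_only(bad_init))
```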

fkiraly · 2025-07-31T14:52:24Z

pytorch_forecasting/layers/__init__.py


 __all__ = [
    "FullAttention",
-    "TriangularCausalMask",


why does this line get removed?

I didn't see any imports for TriangularCausalMask

I actually found it in layers._attention._full_attention; it is not imported even in the __init__ of layers._attention. I will add it in both locations. At first, I thought it didn't exist 😅

fkiraly

Looks good!

Minor requests related to docs:

class docstring should be in the class, not in __init__
the model should also be added to the model overview

phoeenniixx · 2025-07-31T19:08:06Z

> the model should also be added to the model overview

I have seen some of the models having this docstring in __init__ rather than the class, so should we move those docstrings as well to the class? (obv in some other PR)

phoeenniixx · 2025-08-01T16:51:08Z

Hi @fkiraly, before we close this, I have one doubt:
Why do the models have so much of if-else conditions in the __init__ of classes (see DeepAR, DecoderMLP etc)? And many of these variables in the init are not even used afterwards. Is this a design choice? I think we should avoid such things in v2 - keeping unnecessary params in __init__?

I avoided a similar design here in xlstm, should i add these conditions here as well?

fkiraly · 2025-08-02T15:17:46Z

> Hi @fkiraly, before we close this, I have one doubt: Why do the models have so much of if-else conditions in the __init__ of classes (see DeepAR, DecoderMLP etc)? And many of these variables in the init are not even used afterwards. Is this a design choice? I think we should avoid such things in v2 - keeping unnecessary params in __init__?

I think this design has to do with the fact that the models are not properly getting the metadata from the TimeSeriesDataSet - the translation is done in imperative fashion in from_dataset.

What I am a bit surprised about - why do you not need these in __init__?

fkiraly · 2025-08-02T15:26:14Z

I see, the tests only construct via from_dataset. The __init__ is not actually tested - missing that there might be a problem.
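
The two construction paths discussed here can be illustrated with a toy example. Everything in it is hypothetical (`ToyDataset`, `ToyModel`, `config` and their parameters are not the pytorch-forecasting API). It only sketches the idea that `from_dataset` translates dataset metadata into explicit `__init__` arguments, so a test that exercises `from_dataset` alone never checks that `__init__` works with hand-written arguments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset that carries model metadata."""

    reals: List[str] = field(default_factory=list)
    max_prediction_length: int = 1


class ToyModel:
    """Hypothetical model that supports both construction paths."""

    def __init__(self, input_size: int, output_size: int, hidden_size: int = 8):
        if input_size < 1 or output_size < 1:
            raise ValueError("input_size and output_size must be positive")
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size

    @classmethod
    def from_dataset(cls, dataset: ToyDataset, **kwargs) -> "ToyModel":
        # Translate dataset metadata into explicit __init__ arguments.
        kwargs.setdefault("input_size", len(dataset.reals))
        kwargs.setdefault("output_size", dataset.max_prediction_length)
        return cls(**kwargs)


def config(model: ToyModel) -> tuple:
    """Collect the constructor-relevant attributes for comparison."""
    return (model.input_size, model.output_size, model.hidden_size)


dataset = ToyDataset(reals=["target", "price_actual"], max_prediction_length=3)
via_dataset = ToyModel.from_dataset(dataset, hidden_size=5)
via_init = ToyModel(input_size=2, output_size=3, hidden_size=5)

print(config(via_dataset) == config(via_init))
```

A test suite that covers both paths would compare the two resulting configurations, as the last line does.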

fkiraly · 2025-08-02T15:30:21Z

This PR is related: apparently not all models were properly tested or initializable via __init__: #1837

Do you know what the implicit contract is for __init__?

fkiraly · 2025-08-02T16:51:21Z

minor comment regarding module structure - I would make it similar to other modules in naming:

call the module xlstm
the internal python module should be _xlstm.py etc

phoeenniixx · 2025-08-05T16:33:00Z

Hi @fkiraly, are there any other changes I need to make?

fkiraly · 2025-08-06T10:07:13Z

docs/source/models.rst

@@ -31,6 +31,7 @@ and you should take into account. Here is an overview over the pros and cons of
   :py:class:`~pytorch_forecasting.models.deepar.DeepAR`,                                                  "x",          "x",                "x",          "",               "x",              "x",          "x [#deepvar]_ ",              "x",                       "",           3
   :py:class:`~pytorch_forecasting.models.temporal_fusion_transformer.TemporalFusionTransformer`,          "x",          "x",                "x",          "x",              "",               "x",          "",                            "x",                       "x",          4
   :py:class:`~pytorch_forecasting.model.tide.TiDEModel`,                                                  "x",          "x",                "x",          "",               "",               "",           "",                            "x",                       "",           3
+   :py:class:`~pytorch_forecasting.models.x_lstm_time.xLSTMTime`,                                          "x",          "x",                "x",          "",               "",               "",           "",                            "x",                       "",           3


this is incorrect now

Oh sorry I forgot to change here

fkiraly

only minimal change requests.

doclink is now broken due to rename
could you move the lstm layers into a _recurrent folder, i.e., layers._recurrent._mlstm etc?

fkiraly

Great!

initial commit

665825a

phoeenniixx changed the title ~~initial commit~~ [ENH] xLSTMTime implementation Nov 9, 2024

linting

5e57d34

adding some tests and a little in debug in sLSTM structure

e498848

benHeid requested changes Dec 8, 2024


new baseclass implementation

38e4c9c

phoeenniixx requested review from fkiraly, fnhirwa, geetu040, jdb78, pranavvp16, XinyuWuu and yarnabrina as code owners December 12, 2024 16:26

phoeenniixx requested a review from benHeid December 12, 2024 16:27

phoeenniixx added 2 commits December 13, 2024 23:18

Update __init__.py

a72c8c6

little debug in predict method

b3b3e55

trying the baseclass predict function and removing the test files

87f4ff4

fkiraly assigned phoeenniixx Dec 30, 2024

fkiraly requested changes Jan 5, 2025


fkiraly reviewed Jul 31, 2025


fkiraly requested changes Jul 31, 2025


Merge branch 'main' into xLSTMTime

fd4b2ba

phoeenniixx added 3 commits August 1, 2025 00:41

update documentation

6a7cc23

Merge remote-tracking branch 'origin/xLSTMTime' into xLSTMTime

60d1651

add TriangularCausalMask

96ec23d

phoeenniixx requested a review from fkiraly July 31, 2025 19:15

fkiraly moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 4, 2025

refactor files

6a40b7a

fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 5, 2025

Merge branch 'main' into xLSTMTime

40beee8

fkiraly reviewed Aug 6, 2025


fkiraly requested changes Aug 6, 2025


fkiraly moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 6, 2025

phoeenniixx added 3 commits August 7, 2025 00:56

refactor files

1cfaf9c

Merge remote-tracking branch 'origin/xLSTMTime' into xLSTMTime

ed189de

update models.rst

7addfad

phoeenniixx requested a review from fkiraly August 6, 2025 19:40

fkiraly approved these changes Aug 6, 2025


fkiraly merged commit 3093b9f into sktime:main Aug 6, 2025
35 checks passed

github-project-automation bot moved this from PR in progress to Done in May - Sep 2025 mentee projects Aug 6, 2025

github-project-automation bot moved this from PR in progress to Done in Dec 2024 - Mar 2025 mentee projects Aug 6, 2025

phoeenniixx deleted the xLSTMTime branch August 7, 2025 17:48
