
Conversation

PranavBhatP
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

A small change in the code to provide a 3D output tensor for point predictions, with the 3rd dimension set to 1. With this change, the output contract for TimeXer is (a minimal shape sketch follows the list below):

  • Point predictions: (batch_size, predictions, 1), where the 3rd dimension indicates a single target.
  • Quantile predictions: (batch_size, predictions, num_quantiles), where the 3rd dimension indicates the number of quantiles for which the output is generated.
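
As a rough illustration of this contract, here is a minimal shape sketch with synthetic tensors (the sizes are made up for illustration; this is not output produced by an actual TimeXer forward pass):

import torch

# synthetic tensors standing in for TimeXer outputs (illustrative only)
batch_size, prediction_length, num_quantiles = 4, 6, 7

point_output = torch.randn(batch_size, prediction_length, 1)
quantile_output = torch.randn(batch_size, prediction_length, num_quantiles)

# point predictions: 3rd dimension is 1, indicating a single target
assert point_output.shape == (batch_size, prediction_length, 1)
# quantile predictions: 3rd dimension equals the number of quantiles
assert quantile_output.shape == (batch_size, prediction_length, num_quantiles)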

What should a reviewer concentrate their feedback on?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

@PranavBhatP PranavBhatP changed the title [ENH] Consistent 3D output for single-target point predictions in TimeXer v1 and predict [ENH] Consistent 3D output for single-target point predictions in TimeXer v1. Aug 1, 2025

codecov bot commented Aug 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@fe2abd7).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1936   +/-   ##
=======================================
  Coverage        ?   85.40%           
=======================================
  Files           ?      160           
  Lines           ?     9462           
  Branches        ?        0           
=======================================
  Hits            ?     8081           
  Misses          ?     1381           
  Partials        ?        0           
Flag     Coverage Δ
cpu      85.40% <100.00%> (?)
pytest   85.40% <100.00%> (?)


@fkiraly fkiraly moved this to PR in progress in May - Sep 2025 mentee projects Aug 1, 2025
@fkiraly fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 1, 2025
@fkiraly fkiraly added the enhancement New feature or request label Aug 2, 2025
Collaborator

@fkiraly fkiraly left a comment


I am worried that we are removing tests but not adding tests.

If we are removing an API stepout, then we should remove a stepout in the tests to match, no?

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 2, 2025

It was an unintended removal that happened by mistake, sorry. The code stays the same in test_timexer.py.

@PranavBhatP PranavBhatP requested a review from fkiraly August 4, 2025 12:34
Collaborator

@fkiraly fkiraly left a comment


ok, so it was a bug! Good that we do reviews.

Related question: is there a stepout we can remove now?

@PranavBhatP PranavBhatP requested a review from fkiraly August 9, 2025 17:17
@PranavBhatP
Contributor Author

is there a stepout we can remove now?

With that question in mind, I tried tweaking a few of the tensor shapes and was able to reduce the number of stepouts. Now the default return is a 3D tensor - (batch_size, prediction_length, n_quantiles). For point predictions, n_quantiles = 1 by default, and for explicit quantiles we set n_quantiles = len(loss.quantiles), where loss has to be QuantileLoss.
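
To sketch what this unified handling could look like (illustrative code only, not the actual TimeXer implementation; infer_n_quantiles is a hypothetical helper name, while MAE and QuantileLoss are the pytorch-forecasting metrics):

import torch
from pytorch_forecasting.metrics import MAE, QuantileLoss

def infer_n_quantiles(loss):
    # hypothetical helper: number of outputs per time step for a given loss
    if isinstance(loss, QuantileLoss):
        return len(loss.quantiles)
    return 1  # point prediction -> single output channel

# both cases yield a 3D tensor of shape (batch_size, prediction_length, n_quantiles)
batch_size, prediction_length = 4, 6
for loss in (MAE(), QuantileLoss()):
    n_quantiles = infer_n_quantiles(loss)
    output = torch.randn(batch_size, prediction_length, n_quantiles)
    assert output.shape == (batch_size, prediction_length, n_quantiles)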

Collaborator

@fkiraly fkiraly left a comment


You mention you reduced the number of stepouts, but I do not see a change in tests - did you forget to push them?

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 9, 2025

No, I meant stepouts in the actual model code. We were handling quantile and non-quantile predictions separately inside the model. But with this PR, I have unified the handling into a single format for all types of metrics used (quantile or point).

The code for the tests ultimately remains the same, since we are testing the output shape of the predict API for TimeXer, which is not the concern of this PR.

If there is a misunderstanding, please let me know.

@PranavBhatP PranavBhatP requested a review from fkiraly August 10, 2025 13:50
@fkiraly
Collaborator

fkiraly commented Aug 10, 2025

Yes, I meant tests.

I think that one of the following must be true:

  • there was a stepout in the tests that we can now remove
  • the tests were not complete or were faulty (e.g., they are not covering the API this was a stepout from)

@agobbifbk

Seems that all the shapes are well managed. Here you are using self.n_quantiles as an indicator of the number of outputs per channel. What if I want to use a distribution loss here? Should we rename it in a more general way?

Collaborator

@fkiraly fkiraly left a comment


Repeating my review request:

  • if there was a stepout for the forecaster in tests, please remove the stepout
  • if there was no stepout, please review the tests to find out why there was no stepout and/or why they still passed

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 23, 2025

Again, the definition of "non-conformance" that we have applied to TimeXer will then extend to NBeats as well. So, @phoeenniixx, we might need a separate PR for that too - does this sound right?

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

So then my proposal would be to add the test in #1951, and skip both TimeXer and NBEATS. We merge it before this PR, and in this PR we remove the skip for TimeXer.

@phoeenniixx
Member

phoeenniixx commented Aug 23, 2025

So then my proposal would be to add the test in #1951, and skip both TimeXer and NBEATS. We merge it before this PR, and in this PR we remove the skip for TimeXer.

Yeah, we can do this. It will require just a one-line change in _integration (removing the 2 from the list) and also a check to skip the models in _integration (as all the models are tested through _integration). The output formats are currently tested in _integration, and you don't want to skip the whole _integration test, rather just the output check. So, I think just one check before the output format test, to skip only TimeXer and NBeats, would be enough. We won't require this kind of skip in the future anyway, so just an if/else block should suffice:

if not isinstance(model, (TimeXer, NBeats)):
    # perform the 3D output format check

And afterwards, we fix NBeats (in some other PR, as @PranavBhatP mentioned), remove this skip, and then all models will be API-conformant.
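
For illustration, the temporary skip plus the 3D check could look roughly like the sketch below (a hypothetical helper, not the actual _integration code; it assumes TimeXer and NBeats are importable from pytorch_forecasting.models):

import torch
from pytorch_forecasting.models import NBeats, TimeXer

def check_output_format(model, output: torch.Tensor) -> None:
    # hypothetical sketch of the output-format check with a temporary skip
    if isinstance(model, (TimeXer, NBeats)):
        return  # skipped until both models conform to the 3D output contract
    assert output.ndim == 3, (
        f"Prediction output must be 3D, got {output.ndim}D tensor "
        f"with shape {tuple(output.shape)}"
    )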

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

@phoeenniixx, that would work!

Although I do not like skips per model inside tests, these are easy to forget there. This indicates that maybe there is too much being tested in a single test method.

@phoeenniixx
Member

Although I do not like skips per model inside tests, these are easy to forget there. This indicates that maybe there is too much being tested in a single test method.

I agree, we need some refactoring. I think we should raise an issue as a checklist so that we don't forget anything. In v2 as well, there are a lot of things in the test framework that can be refactored. But these things, IMO, should have lower priority: we should first achieve a stable (and complete) API for v2, just note the things that need refactoring along the way, and do them afterwards. We would anyway need a full refactoring round after the first complete v2 (connecting v1 and v2, etc.), so we can do these things at that time?

@phoeenniixx
Member

For now, we should focus on API conformance for v1 and design and implementation for v2...

@fkiraly
Collaborator

fkiraly commented Aug 24, 2025

Agreed - what is, in your opinion, @phoeenniixx, the quickest, simplest way to ensure that:

  • the inconsistency in TimeXer is covered by TestAllEstimators tests
  • we use the "skip then fix" pattern

@phoeenniixx
Member

phoeenniixx commented Aug 25, 2025

I could think of two ways:

  • A more "intuitive" way, similar to sktime:
    Add a test skip in the tags for TimeXer, but this will skip the whole _integration. We know it passes the rest of the tests (like model training etc.), so we could do this.
  • Hardcode way:
    Just add a check before doing the output format test, something like this -
    if not isinstance(model, (TimeXer, NBeats)):
        # perform the 3D output format check
    and remove the tests for the 2D format from the list here:
    assert n_dims in [2, 3], (
        f"Prediction output must be 2D or 3D, but got {n_dims}D tensor "
        f"with shape {output.shape}"
    )
    if n_dims == 2:
        batch_size, prediction_length = output.shape
        assert batch_size > 0, f"Batch size must be positive, got {batch_size}"
        assert (
            prediction_length > 0
        ), f"Prediction length must be positive, got {prediction_length}"

IMO, the best way for now is the hardcode way, because this "non-conformance" is a special case that happened due to there being no concrete contract in the past; from now on, we have one concrete contract - the output format should be 3D for a single target. We are already aware of the non-conformant models, so we should just raise an issue for them (mainly for NBeats, as TimeXer is already being fixed by @PranavBhatP). This would not happen with future models, and we will also be refactoring the test framework - making it more modular and adding new tests if required. So, the easiest and quickest way would be this hardcode way, I think.

But this is mainly because we have a lot on our plate right now; we should avoid such things in the future, once we have a stable API for v2.

@phoeenniixx
Member

What do you think about this @fkiraly, @PranavBhatP ?

@fkiraly
Collaborator

fkiraly commented Aug 25, 2025

OK, @PranavBhatP, I agree with the arguments that we can cut corners, because the skips will only apply to models from the past.

For this, I think we should open an issue for NBeats and explicitly mention that the test-internal skip should be removed.

The first option would imply changes to the framework or tests first - they need refactoring, and I think this would be the better option if we could make a wish - but it would cause scope creep, and feels like a distraction at the moment.

What do you think, @PranavBhatP?

@PranavBhatP
Contributor Author

What do you think, @PranavBhatP?

Yes, this approach makes sense as a temporary fix; I will try changing the code for this. In this PR, do I only skip tests for TimeXer, or do I do it for both of the models in question?

@phoeenniixx
Member

So, from what I can understand, you are solving the issue for TimeXer here. I think the changes to _integration should happen in another PR, where you first remove the tests for 2D output formats and also add skips for TimeXer and NBeats, and then remove the skip for TimeXer here and make TimeXer API-conformant. The same should be done for NBeats in some other PR, no?

@fkiraly
Collaborator

fkiraly commented Aug 25, 2025

yes, @phoeenniixx, that is what I had understood

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 26, 2025

Then in that case, PR #1951 handles the changes specific to the tests in TimeXer (test_timexer.py), we can have a separate PR to make changes to _integration in test_all_estimators.py, and once that is done we can merge this PR. Does this sound right? I am not 100% sure what the expected workflow is.

@phoeenniixx
Member

Yeah, I think this makes sense, right @fkiraly?

@fkiraly
Collaborator

fkiraly commented Aug 26, 2025

yes, sounds like a good plan to me too

fkiraly pushed a commit that referenced this pull request Aug 31, 2025
…1936 (#1951)

#### What does this implement/fix? Explain your changes.

Adds the test for the output shapes of the forward method of `TimeXer`. The test checks the output contract for the model, which is being implemented in #1936. Currently skipped under `test_timexer.py` due to obvious causes of failure, i.e., the current version of the model on main is still not updated to the required output contract.
This commit removes the test skip tag added in PR sktime#1966.
@PranavBhatP PranavBhatP requested a review from fkiraly September 29, 2025 11:22
@PranavBhatP
Contributor Author

PranavBhatP commented Sep 29, 2025

This PR has been updated using the merged PR #1966, and I am running tests for TimeXer (v1) with the standard output format. It is passing all tests locally. @fkiraly @phoeenniixx @agobbifbk

fkiraly
fkiraly previously approved these changes Sep 29, 2025
Collaborator

@fkiraly fkiraly left a comment


👍

  self.enc_in = len(self.reals)

- self.n_quantiles = None
+ self.n_quantiles = 1
Member


Can you please clarify why we are doing n_quantiles = 1 and not n_quantiles = None here?
From what I have seen so far in the package, we always use n_quantiles = None when not using QuantileLoss, and making n_quantiles = 1 may change that contract? Although it won't affect the working, I guess, for the user this can be misleading -
suggesting that QuantileLoss is being used, even when it is not?

Contributor Author

@PranavBhatP PranavBhatP Sep 30, 2025


No, self.n_quantiles is being set to 1 because in the case of a point prediction, the median quantile is what is being predicted. So instead of setting self.n_quantiles = None and having step-outs for checking the value of self.n_quantiles inside the layer architecture, we simply set it to 1 and let the loss handle the rest.

For some context, setting self.n_quantiles = 1 simplifies how we handle the distinction between a loss like MAE and QuantileLoss. Refer to this diff - https://github.com/sktime/pytorch-forecasting/pull/1936/files#diff-8c4bc78319ca93d3c0e93a38a0ee551ac3c4193f717955c0ce3477e4147a9153

TL;DR: It removes unnecessary "if" statements and step-outs in the logic for TimeXer's output.

Maybe a comment in the code would clarify this; I will add it, thanks for highlighting this.
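
To make the simplification concrete, here is a small runnable sketch (illustrative only, not the actual TimeXer layer code) showing that a numeric n_quantiles lets the same projection path serve both point losses (n_quantiles = 1) and QuantileLoss (n_quantiles = 7 for the default quantiles):

import torch
from torch import nn

batch_size, prediction_length, hidden_size = 4, 6, 16
x = torch.randn(batch_size, prediction_length, hidden_size)

# one code path for both cases: the projection width is simply n_quantiles
for n_quantiles in (1, 7):
    projection = nn.Linear(hidden_size, n_quantiles)
    out = projection(x)  # (batch_size, prediction_length, n_quantiles)
    assert out.shape == (batch_size, prediction_length, n_quantiles)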

Member

@phoeenniixx phoeenniixx Sep 30, 2025


Ohh, I see, thanks!
If this is working, maybe we should then refactor other places as well where this if/else block is being used because of n_quantiles = None?
I am just thinking it would be good if everything were consistent with the rest of the API...

Member


It's just that I am a bit concerned about the meaning here - if a person is using this estimator, they would see n_quantiles = 1 even for point predictions (I am talking about the user who is just using the estimator and not actually reading the code), and this may confuse the user: "why is this param not None like it is for other estimators in ptf?"

Also, I don't think this if/else block would be much of an overhead, so if using this block keeps the user from getting confused, I think we should prefer that.

But that is my thought; please let me know your reasoning here...

Contributor Author

@PranavBhatP PranavBhatP Sep 30, 2025


It removes unnecessary "if" statements and step-outs in the logic for TimeXer's output.

The reasoning is as simple as this :). Since we are standardising the output format, it makes more sense if we renamed this attribute to self.output_dim or something like that, instead of self.n_quantiles. That way we have a more generic meaning and assign different values to it only when there is a step-out to a different case (QuantileLoss in our case).

Collaborator


Could you kindly explain what API or parameter change exactly we are talking about?

Member

@phoeenniixx phoeenniixx Oct 5, 2025


So, in this PR @PranavBhatP is changing the convention of using n_quantiles = None (which currently happens in the package) when the user is not using QuantileLoss (@PranavBhatP please correct me if I understood it wrongly); rather, n_quantiles = 1 is being introduced in the absence of QuantileLoss (see comments #1936 (comment) and #1936 (comment)), and I was concerned about the meaning it could convey (see comment #1936 (comment)).

He suggested we could rename n_quantiles to output_dim, which would solve the issue of the meaning. But again, this would lead to changes to the code across the whole package. And it would be breaking, as a param name is being changed, so as you said, we could deprecate it, but I think we should introduce it (if we decide to) in v2 and keep v1 intact, as v1 is already mature enough, and it may not even complete the deprecation cycle, as before that we may completely move to v2. I am assuming the deprecation cycle will complete after 2 releases, and hopefully we might release a full-fledged v2 by then. So, I think if we are going to make this change, we should do it in v2.

I am still not very sure whether this would be a good idea; that's why I wanted you and @agobbifbk to comment with your thoughts on this change.


For me, output_dim is a good idea, it is more general. I don't think it is necessary to deprecate it if we rename, because it is not a parameter that the user can modify, is it? It just depends on the chosen loss function(s), right?

Member


I agree, but I am assuming n_quantiles is right now used in the documentation to define the shape of the output when we use QuantileLoss, so I guess this attribute is visible to the user in the documentation? If yes, we might need to deprecate it? I am not sure what the process is here 😅

Collaborator


I think a design document is the right process. Write up the vignettes now - and analyze whether n_quantiles even appears in them.

Then propose alternative vignettes for future use, and from that develop a change and/or deprecation trajectory.

