fix(series): arithmetics for Series[Any] #1343

cmp0xff · 2025-08-22T12:34:30Z

This PR implements the ideas from #1274 (comment) and #1274 (comment).

Tests added: Please use assert_type() to assert the type of any return value

Dr-Irv

Only issues I see are with respect to the code inside of if TYPE_CHECKING_INVALID_USAGE that you added.

The concern here is that some of the lines will execute fine if the Series comes from a DataFrame and has the correct type inside, but we see that it is a static Series[Any]. And vice versa.

I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any], then we detect that as a typing problem. But we need to be selective.

For example,

df = pd.DataFrame({"a": [1,2,3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024")  # fails at runtime
sb - pd.Timestamp("1/1/2024") # works at runtime

Here sa and sb are Series[Any] (mypy) or Series[Unknown] (pyright). So the typing either has to accept both cases or reject both cases.

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series) or is a string or Series[str]. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.

Dr-Irv · 2025-08-22T17:04:23Z

tests/series/arithmetic/test_sub.py

+    if TYPE_CHECKING_INVALID_USAGE:
+        _0 = left_td - s
    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        _1 = left_td - a


When you have TYPE_CHECKING_INVALID_USAGE, that means we should have # type: ignore and # pyright: ignore statements that demonstrate the type checker can catch those errors.

Dr-Irv · 2025-08-22T17:05:45Z

tests/series/arithmetic/test_sub.py

    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        _1 = left_ts - a


This is an example of valid code that should be accepted.

Sorry, it was a typo. 1fb597b

However, I did not easily understand your comment here.

This is an example of valid code that should be accepted.

left_ts and left_td are Series[Any] at type checking. But you wrote

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series)

It seems to me that in your plan, both left_td - a and left_ts - a should give an error / a Never at type checking. Am I right?

My proposed plan is more permissive and will not detect the problem of left_ts - a, because at runtime, Series[Any] - TimedeltaSeries can either be TimestampSeries or TimedeltaSeries or give an error. In my proposed plan, at type checking, it would give Series[Any].

cmp0xff · 2025-08-23T08:39:43Z

Hi @Dr-Irv , thank you for drafting the plan.

Current plan

I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any], then we detect that as a typing problem. But we need to be selective.

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series) or is a string or Series[str].

I would like to summarise this typing plan as following:

When the calculation can give a runtime error, typing shows an error or Never
Certain cases are exceptions

`Timestamp` and `Timedelta`: permissive or forbidding

With this typing plan, I have the following examples in my mind:

Series[Any] (int) - TimestampSeries -> error at type checking, error at runtime
Series[Any] (Timestamp) - TimestampSeries -> error at type checking, TimedeltaSeires at runtime

As a user I probably do not want the static type checker to aggressivly point out a potential problem. When the stub is less permissive and more forbidding, the static type checker becomes more aggresive. It seems better to me to allow both cases at the stage of static type checking, otherwise the user may need to manually ignore the type checker in many cases.

`int`: exceptions to the plan

"We need to be selective" is important in the plan, because we also have

Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime
Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime

Currently we are happy with the stub giving us Series[Any] for adding Series[Any] to Series[int]. This is an exception, which may potentially confuse the user.

Proposing a consistent plan

I would like to propose a new typing plan as following:

When the calculation gives several typing results or a runtime error, typing shows Series[Any]
When the calculation gives one typing result, say Series[R], or a runtime error, typing shows Series[R]
When the calculation always gives a runtime error, typing shows an error or Never

With this typing plan, the previous examples give different results:

Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime (TimedeltaSeries is the only possible result that is valid, so unfortunately the type checker does not cache the potential problem here)
Series[Any] (Timestamp) - TimestampSeries -> TimedeltaSeries at type checking, TimedeltaSeires at runtime
Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime (no exceptional rule in the plan)
Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime (Series[float], Series[int] etc. are possible valid results, so unfortunately the type checker does not cache the potential problem here)

Further examples:

Series[Any] (int) + Series[str] -> Series[str] at type checking, error at runtime (Series[str] is the only possible result that is valid, so unfortunately the type checker does not cache the potential problem here)
Series[Any] (str) + Series[str] -> Series[str] at type checking, Series[str] at runtime
Series[Any] * TimestampSeries -> error / Never at type checking, error at runtime (Timestamp is consistently not multiplicative)

Thank you for reading the lengthy explanation. What do you think?

Dr-Irv · 2025-08-23T18:39:54Z

Thank you for reading the lengthy explanation. What do you think?

The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that.

Let's consider this example from your list:

Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime

In what is in main today, the following code works as you describe there, i.e., the type checker infers that result is TimedeltaSeries, but it fails at runtime.

si = pd.DataFrame({"a": pd.Series([1,2,3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st

I think we do a better service to users if we actually catch this via typing, i.e., for Timedelta, TimedeltaSeries, Timestamp, TimestampSeries, str and Series[str], if they are in a binary operation with a Series[Any] (either before the operator or after the operator), the type checker reports an error. That's telling the user "We don't know how to handle a generic series with another operand that has a specified type", but we are limiting the types we do that with to just the ones I mentioned.

This makes the user then cast the variable si above to Series[int] (in which case we catch the failure), and know it will possibly fail at runtime.

Let's also consider this example:

st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd

In this case, if we adopt my proposal, the type checker would say that st - sd is invalid. But the type of sd is partially unknown, so we are then suggesting that the user do:

result = st - cast("pd.Series[Timedelta]", sd)

which is telling the type checker "I know this is a series of timedeltas"

I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing Series[Any].__sub__(Series[Any])), by suggesting that if we know the types of ONE of the operands, but not the other, we try to catch the error via static typing.

I should say that the current behavior in the stubs is from 3 years ago when we first inherited the project from something MIcrosoft had started, and now that I have more experience with typing, as well as using the stubs in my own code, I've come around to trying to find more things with static type checking if we can find them, then not.

So the summary of my proposal is (with respect to Series) for binary operators a X b, where X is the operator:

If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
If a is Series[Any] and b is fully typed, we say that is an error.
If a is fully typed, and b is Series[Any], we say that is an error.
If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any], we accept the calculation in typing and don't report an error.

Let me know your thoughts on that.

cmp0xff · 2025-08-24T09:13:46Z

Hi @Dr-Irv , thank you again for the detailed reply.

`TimestampSeries`

If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.

If a is Series[Any] and b is fully typed, we say that is an error.

If a is fully typed, and b is Series[Any], we say that is an error.

If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any], we accept the calculation in typing and don't report an error.

I am concerned with the compatibility of 2+3 and 4.

We can make Series[Any] - TimestampSeries not explicitly typed, so that the type checking shows an error.
However we already have Series[S1] - Series -> Series[Any]
When we remove TimestampSeries, Series[Any] - Series[Timestamp] will default to ``Series[S1] - Series, giving Series`.
Along this practice, we would need to explicitly set Series[Never] - TimestampSeries -> Never for now, and after the removal of TimestampSeries, Series[Never] - Series[Timestamp] -> Never
Alternatively, we will not be able to remove TimestampSeries.

In my proposal, I would rather allow Series[Any] - TimestampSeries as well as Series[Any] - TimedeltaSeries, leaving the user to check the potential error at runtime. We can either give Series[Any] in both cases, or, using my plan, give TimedeltaSeries in the first case.

We have already done the same to allow Series[Any] - Series[int], giving Series[Any], which could also fail at runtime if Series[Any] actually contains str.

Dr-Irv · 2025-08-24T16:21:17Z

In my proposal, I would rather allow Series[Any] - TimestampSeries as well as Series[Any] - TimedeltaSeries, leaving the user to check the potential error at runtime. We can either give Series[Any] in both cases, or, using my plan, give TimedeltaSeries in the first case.

Or we can catch it in type checking??

We have already done the same to allow Series[Any] - Series[int], giving Series[Any], which could also fail at runtime if Series[Any] actually contains str.

Yes, this latter point is quite valid. Here's a possible modification of my proposal: (with respect to Series) for binary operators a X b, where X is the operator:

If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
If a is Series[Any] and b is Timestamp, Series[Timestamp], Timedelta, Series[Timedelta], str, Series[str], we catch that as a typing error.
If a is Timestamp, Series[Timestamp], Timedelta, Series[Timedelta], str, Series[str], and b is Series[Any], we say that is an error.
If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any], we accept the calculation in typing and don't report an error.

Is this possible?

cmp0xff · 2025-08-24T20:29:25Z

I tried to implement the plan in e0b5b59. The str part proves to be a bit tricky. I would like to leave it out for now.

Dr-Irv · 2025-08-25T13:03:40Z

I tried to implement the plan in e0b5b59. The str part proves to be a bit tricky. I would like to leave it out for now.

So should I review now?

cmp0xff · 2025-08-25T13:16:28Z

So should I review now?

Yes please. I believe I have implemented the ideas in #1343 (comment). I would like to follow the proposal there.

Dr-Irv

I figured out how to handle the operators with str. See 9c972d3

I put the tests I used in one file, figuring you can spread them out accordingly.

The issue is that _ListLike includes Sequence[S1], which matches a str, so the key was to use SequenceNotStr[S1] instead.

Dr-Irv · 2025-08-25T13:25:33Z

tests/series/arithmetic/test_sub.py

+        # `numpy` typing gives the corresponding `ndarray`s in the static type
+        # checking, where our `__rsub__` cannot override.


Given that we are catching this with the typing stubs, you can remove this comment.

Removed in 8d8691e

Dr-Irv · 2025-08-25T13:44:58Z

tests/series/arithmetic/test_sub.py

-    check(assert_type(left_ts - s, "TimedeltaSeries"), pd.Series, pd.Timedelta)
-    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        # When I merge this one to the previous one, mypy does not allow me to pass


I think when mypy sees the result of Never on a method (as opposed to an operator), it assumes the code following that won't be executed.

Caught with pytest.raises in 8d8691e

cmp0xff

9c972d3 cherry picked as 5c23f68

cmp0xff · 2025-08-25T20:32:04Z

tests/series/arithmetic/test_sub.py

+        # `numpy` typing gives the corresponding `ndarray`s in the static type
+        # checking, where our `__rsub__` cannot override.


Removed in 8d8691e

cmp0xff · 2025-08-25T20:32:24Z

tests/series/arithmetic/test_sub.py

-    check(assert_type(left_ts - s, "TimedeltaSeries"), pd.Series, pd.Timedelta)
-    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        # When I merge this one to the previous one, mypy does not allow me to pass


Caught with pytest.raises in 8d8691e

Dr-Irv

Pretty close. Just issue in using pytest.raises.

I also think it would be worth adding a subsection to https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#use-of-generic-types that explains the philosophy we've agreed to. Namely, since df["a"] has unknown type, we restrict arithmetic when the the other operand is Timedelta, Timestamp, str, etc., and their Series variants. Then we tell people they should use cast to tell the type checker "I know that this column of my DataFrame has the right type"

Dr-Irv · 2025-08-25T21:51:47Z

tests/series/arithmetic/str/test_add.py

+        with pytest.raises(AssertionError):
+            assert_never(left + i)


pytest will never execute this part of the code because of the if. I think this should remain as assert_type(left + i, Never)

Dr-Irv · 2025-08-25T21:52:48Z

tests/series/arithmetic/test_add.py

+        with pytest.raises(AssertionError):
+            assert_never(left_i + s)
+        with pytest.raises(AssertionError):
+            assert_never(s + left_i)
+        with pytest.raises(AssertionError):
+            assert_never(left_i.add(s))
+        with pytest.raises(AssertionError):
+            assert_never(left_i.radd(s))


sae issue - pytest will never run this code anyway. So can't we just use assert(left_i + s, Never) ?

Dr-Irv · 2025-08-25T21:53:31Z

tests/series/arithmetic/test_sub.py

+        with pytest.raises(AssertionError):
+            assert_never(left_ts.sub(a))

+        left_td.sub(s)  # type: ignore[arg-type] # pyright: ignore[reportArgumentType,reportCallIssue]
+        with pytest.raises(AssertionError):
+            assert_never(left_td.sub(a))


same issue - remove the pytest part

Dr-Irv · 2025-08-25T21:55:00Z

tests/test_frame.py


-    df["x"] = df["y"] + pd.Timedelta(days=3)
+    df["x"] = cast("TimestampSeries", df["y"]) + pd.Timedelta(days=3)


add a comment here indicating why the cast is necessary - it's because the type of df["y"] is unknown, so we have to cast to tell the type checker what kind of type it is when adding a Timedelta

fix(series): arithmetics for Series[Any]

b6cdcf1

cmp0xff mentioned this pull request Aug 22, 2025

refactor: #718 only drop TimestampSeries #1274

Draft

2 tasks

Dr-Irv requested changes Aug 22, 2025

View reviewed changes

fix(comment): pandas-dev#1343 (comment)

1fb597b

chore(typing): update mypy and ty

961d692

fix(comment): without str pandas-dev#1343 (comment)

e0b5b59

cmp0xff changed the title ~~fix(series): arithmetics for Series[Any]~~ fix(series): arithmetics for Series[Any] (Timestamp | Timedelta) Aug 24, 2025

cmp0xff requested a review from Dr-Irv August 25, 2025 13:14

Dr-Irv requested changes Aug 25, 2025

View reviewed changes

Dr-Irv and others added 3 commits August 25, 2025 17:28

handle str ops

5c23f68

feat(series): implement the proposal

8d8691e

Merge branch 'main' into feature/cmp0xff/arithmetics-for-series-any

646cdcd

cmp0xff commented Aug 25, 2025

View reviewed changes

cmp0xff requested a review from Dr-Irv August 25, 2025 20:36

cmp0xff changed the title ~~fix(series): arithmetics for Series[Any] (Timestamp | Timedelta)~~ fix(series): arithmetics for Series[Any] Aug 25, 2025

Dr-Irv requested changes Aug 25, 2025

View reviewed changes

		# `numpy` typing gives the corresponding `ndarray`s in the static type
		# checking, where our `__rsub__` cannot override.


		df["x"] = df["y"] + pd.Timedelta(days=3)
		df["x"] = cast("TimestampSeries", df["y"]) + pd.Timedelta(days=3)

Uh oh!

fix(series): arithmetics for Series[Any] #1343

Are you sure you want to change the base?

fix(series): arithmetics for Series[Any] #1343

Uh oh!

Conversation

cmp0xff commented Aug 22, 2025

Uh oh!

Dr-Irv left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmp0xff commented Aug 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current plan

Timestamp and Timedelta: permissive or forbidding

int: exceptions to the plan

Proposing a consistent plan

Uh oh!

Dr-Irv commented Aug 23, 2025

Uh oh!

cmp0xff commented Aug 24, 2025

TimestampSeries

Uh oh!

Dr-Irv commented Aug 24, 2025

Uh oh!

cmp0xff commented Aug 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dr-Irv commented Aug 25, 2025

Uh oh!

cmp0xff commented Aug 25, 2025

Uh oh!

Dr-Irv left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmp0xff left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dr-Irv left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cmp0xff commented Aug 23, 2025 •

edited

Loading

`Timestamp` and `Timedelta`: permissive or forbidding

`int`: exceptions to the plan

`TimestampSeries`

cmp0xff commented Aug 24, 2025 •

edited

Loading