-
-
Notifications
You must be signed in to change notification settings - Fork 147
fix(series): arithmetics for Series[Any] #1343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(series): arithmetics for Series[Any] #1343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only issues I see are with respect to the code inside of if TYPE_CHECKING_INVALID_USAGE
that you added.
The concern here is that some of the lines will execute fine if the Series
comes from a DataFrame
and has the correct type inside, but we see that it is a static Series[Any]
. And vice versa.
I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any]
, then we detect that as a typing problem. But we need to be selective.
For example,
df = pd.DataFrame({"a": [1,2,3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024") # fails at runtime
sb - pd.Timestamp("1/1/2024") # works at runtime
Here sa
and sb
are Series[Any]
(mypy) or Series[Unknown]
(pyright). So the typing either has to accept both cases or reject both cases.
I think we have to be selective here, and probably disallow subtraction with untyped Series
when the other argument is known to be time related (Timestamp
, Timedelta
and associated Series
) or is a string or Series[str]
. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.
tests/series/arithmetic/test_sub.py
Outdated
if TYPE_CHECKING_INVALID_USAGE: | ||
_0 = left_td - s | ||
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
_1 = left_td - a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you have TYPE_CHECKING_INVALID_USAGE
, that means we should have # type: ignore
and # pyright: ignore
statements that demonstrate the type checker can catch those errors.
tests/series/arithmetic/test_sub.py
Outdated
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
_1 = left_ts - a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of valid code that should be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it was a typo. 1fb597b
However, I did not easily understand your comment here.
This is an example of valid code that should be accepted.
left_ts
and left_td
are Series[Any]
at type checking. But you wrote
I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series)
It seems to me that in your plan, both left_td - a
and left_ts - a
should give an error / a Never
at type checking. Am I right?
My proposed plan is more permissive and will not detect the problem of left_ts - a
, because at runtime, Series[Any] - TimedeltaSeries
can either be TimestampSeries
or TimedeltaSeries
or give an error. In my proposed plan, at type checking, it would give Series[Any]
.
Hi @Dr-Irv , thank you for drafting the plan. Current plan
I would like to summarise this typing plan as following:
|
The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that. Let's consider this example from your list:
In what is in si = pd.DataFrame({"a": pd.Series([1,2,3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st I think we do a better service to users if we actually catch this via typing, i.e., for This makes the user then Let's also consider this example: st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd In this case, if we adopt my proposal, the type checker would say that result = st - cast("pd.Series[Timedelta]", sd) which is telling the type checker "I know this is a series of timedeltas" I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing I should say that the current behavior in the stubs is from 3 years ago when we first inherited the project from something MIcrosoft had started, and now that I have more experience with typing, as well as using the stubs in my own code, I've come around to trying to find more things with static type checking if we can find them, then not. So the summary of my proposal is (with respect to
Let me know your thoughts on that. |
Hi @Dr-Irv , thank you again for the detailed reply.
|
Or we can catch it in type checking??
Yes, this latter point is quite valid. Here's a possible modification of my proposal: (with respect to
Is this possible? |
I tried to implement the plan in e0b5b59. The |
So should I review now? |
Yes please. I believe I have implemented the ideas in #1343 (comment). I would like to follow the proposal there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured out how to handle the operators with str
. See 9c972d3
I put the tests I used in one file, figuring you can spread them out accordingly.
The issue is that _ListLike
includes Sequence[S1]
, which matches a str
, so the key was to use SequenceNotStr[S1]
instead.
tests/series/arithmetic/test_sub.py
Outdated
# `numpy` typing gives the corresponding `ndarray`s in the static type | ||
# checking, where our `__rsub__` cannot override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we are catching this with the typing stubs, you can remove this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 8d8691e
tests/series/arithmetic/test_sub.py
Outdated
check(assert_type(left_ts - s, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
# When I merge this one to the previous one, mypy does not allow me to pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when mypy
sees the result of Never
on a method (as opposed to an operator), it assumes the code following that won't be executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caught with pytest.raises
in 8d8691e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/series/arithmetic/test_sub.py
Outdated
# `numpy` typing gives the corresponding `ndarray`s in the static type | ||
# checking, where our `__rsub__` cannot override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 8d8691e
tests/series/arithmetic/test_sub.py
Outdated
check(assert_type(left_ts - s, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
# When I merge this one to the previous one, mypy does not allow me to pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caught with pytest.raises
in 8d8691e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty close. Just issue in using pytest.raises
.
I also think it would be worth adding a subsection to https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#use-of-generic-types that explains the philosophy we've agreed to. Namely, since df["a"]
has unknown type, we restrict arithmetic when the the other operand is Timedelta
, Timestamp
, str
, etc., and their Series
variants. Then we tell people they should use cast
to tell the type checker "I know that this column of my DataFrame
has the right type"
with pytest.raises(AssertionError): | ||
assert_never(left + i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest
will never execute this part of the code because of the if
. I think this should remain as assert_type(left + i, Never)
with pytest.raises(AssertionError): | ||
assert_never(left_i + s) | ||
with pytest.raises(AssertionError): | ||
assert_never(s + left_i) | ||
with pytest.raises(AssertionError): | ||
assert_never(left_i.add(s)) | ||
with pytest.raises(AssertionError): | ||
assert_never(left_i.radd(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sae issue - pytest
will never run this code anyway. So can't we just use assert(left_i + s, Never)
?
with pytest.raises(AssertionError): | ||
assert_never(left_ts.sub(a)) | ||
|
||
left_td.sub(s) # type: ignore[arg-type] # pyright: ignore[reportArgumentType,reportCallIssue] | ||
with pytest.raises(AssertionError): | ||
assert_never(left_td.sub(a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue - remove the pytest
part
|
||
df["x"] = df["y"] + pd.Timedelta(days=3) | ||
df["x"] = cast("TimestampSeries", df["y"]) + pd.Timedelta(days=3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here indicating why the cast
is necessary - it's because the type of df["y"]
is unknown, so we have to cast to tell the type checker what kind of type it is when adding a Timedelta
This PR implements the ideas from #1274 (comment) and #1274 (comment).
assert_type()
to assert the type of any return value