BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61942) #61946
base: main
@@ -0,0 +1,11 @@
+.. _whatsnew_232:
+
+These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+Bug fixes
+^^^^^^^^^
+
+- Fixed ``Series.str.contains`` with compiled regex on Arrow string dtype, which now correctly delegates to the object-dtype implementation. (:issue:`61942`)
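To make the intended result of this entry concrete, here is a minimal plain-Python sketch of element-wise "contains" with a precompiled pattern. The helper name `contains_compiled` is hypothetical, and this is an illustration of the expected semantics under that assumption, not pandas's actual implementation.

```python
import re
from typing import List, Optional, Sequence


def contains_compiled(
    values: Sequence[Optional[str]], pat: re.Pattern
) -> List[Optional[bool]]:
    """Hypothetical helper: element-wise search with a precompiled pattern."""
    # None stands in for a missing value and is passed through unchanged.
    return [None if v is None else pat.search(v) is not None for v in values]


pat = re.compile("ba.")
result = contains_compiled(["foo", "bar", "baz", None], pat)
print(result)  # [False, True, True, None]
```

With pandas, the corresponding call is `ser.str.contains(re.compile("ba."))`, which is the call exercised by the test added in this PR.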
@@ -2,6 +2,7 @@
     datetime,
     timedelta,
 )
+import re
 
 import numpy as np
 import pytest
@@ -176,6 +177,16 @@ def test_empty_str_methods(any_string_dtype):
     tm.assert_series_equal(empty_str, empty.str.translate(table))
 
 
+@pytest.mark.parametrize("dtype", ["string[pyarrow]"])
+def test_str_contains_compiled_regex_arrow_dtype(dtype):
jorisvandenbossche marked this conversation as resolved.
Suggested change
By using this fixture, we test it with all different string-like dtypes and ensure it behaves consistently (you might just have to define the expected boolean dtype depending on the exact dtype, you can see how that is done in the other
Thank you for the suggestion.
+    ser = Series(["foo", "bar", "baz"], dtype=dtype)
+    pat = re.compile("ba.")
+    result = ser.str.contains(pat)
+    assert str(result.dtype) == "bool[pyarrow]"
+    expected = Series([False, True, True], dtype="bool[pyarrow]")
+    tm.assert_series_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "method, expected",
     [
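The review suggestion in the test diff, covering every string-like dtype and choosing the expected boolean dtype per input dtype, might look roughly like the sketch below. The explicit `DTYPE_CASES` list is an assumption standing in for the fixture the reviewer refers to (its name is not preserved in the comment, though the hunk header shows `any_string_dtype` used by a neighbouring test), and pyarrow-backed dtypes are left out so the sketch runs without pyarrow installed.

```python
import re

import pandas as pd
import pytest

# Assumption: this list stands in for the fixture mentioned in the review.
# Each pair is (input dtype, boolean dtype expected from str.contains).
DTYPE_CASES = [
    ("object", "bool"),
    ("string[python]", "boolean"),
]


@pytest.mark.parametrize("dtype, expected_dtype", DTYPE_CASES)
def test_contains_compiled_regex(dtype, expected_dtype):
    ser = pd.Series(["foo", "bar", "baz"], dtype=dtype)
    # Pass a precompiled pattern rather than a plain string.
    result = ser.str.contains(re.compile("ba."))
    expected = pd.Series([False, True, True], dtype=expected_dtype)
    pd.testing.assert_series_equal(result, expected)
```

Parametrizing over the dtype keeps a single test body while making the dtype-dependent part (the expected boolean dtype) explicit.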
This file already exists, but as v2.3.2.rst, so you can move the item to that file.
I’ve deleted the file I previously created. However, I couldn't find v2.3.2.rst in the repository.
The file exists on main, see https://github.com/pandas-dev/pandas/blob/main/doc/source/whatsnew/v2.3.2.rst. So if you don't have it locally, that means you have to fetch the latest upstream repo and merge in your branch. See https://pandas.pydata.org/docs/development/contributing.html#updating-your-pull-request