BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61942) #61946

Open
wants to merge 26 commits into
base: main
Changes from 21 commits
26 commits
f3829fd
BUG : Fix Series.str.contains with compiled regex on Arrow string
Aniketsy Jul 25, 2025
c2a64fa
BUG: Fix handling of compiled regex in Series.str.contains for Arrow-…
Aniketsy Jul 25, 2025
838b1c5
BUG: Fix handling of compiled regex in Series.str.contains for Arrow-…
Aniketsy Jul 25, 2025
563f1f1
STYLE: Fix formatting and docstring issues in str.contains
Aniketsy Jul 25, 2025
fda5619
Fixed ruff format
Aniketsy Jul 25, 2025
324e609
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
b474604
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
3345bc7
Revert changes to pandas/core/strings/accessor.py from PR #61946
Aniketsy Jul 26, 2025
9f06042
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
cbab096
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
d88f8d1
BUG: Fix Series.str.contains with compiled regex and arrow strings (#…
Aniketsy Jul 28, 2025
a0decbc
Revert changes to pandas/core/arrays/arrow/array.py in PR
Aniketsy Jul 28, 2025
8fc81e0
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 28, 2025
8e226cd
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 28, 2025
6768fb1
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
05ae24f
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
702384d
Revert changes to test_strings.py
Aniketsy Jul 29, 2025
0be9a18
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
4ddc7db
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
9a7e640
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
b00fbe0
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
76f741c
Revert test_strings.py changes and remove accidental whatsnew file
Aniketsy Jul 29, 2025
8e65078
Revert test_strings.py changes and remove accidental whatsnew file
Aniketsy Jul 29, 2025
4912758
Merge remote-tracking branch 'upstream/main' into fix-arrow-contains-…
Aniketsy Jul 30, 2025
0e620ca
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 30, 2025
915b38f
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 30, 2025
11 changes: 11 additions & 0 deletions doc/source/whatsnew/v2.3.2
Member


This file already exists, but as v2.3.2.rst, so you can move the item to that file.

Author


I’ve deleted the file I previously created. However, I couldn't find v2.3.2.rst in the repository.

Member


The file exists on main, see https://github.com/pandas-dev/pandas/blob/main/doc/source/whatsnew/v2.3.2.rst. If you don't have it locally, you need to fetch the latest upstream main and merge it into your branch. See https://pandas.pydata.org/docs/development/contributing.html#updating-your-pull-request

@@ -0,0 +1,11 @@
.. _whatsnew_232:

These are the changes in pandas 2.3.2. See :ref:`release` for a full changelog
including other versions of pandas.

{{ header }}

Bug fixes
^^^^^^^^^

- Fixed ``Series.str.contains`` with compiled regex on Arrow string dtype, which now correctly delegates to the object-dtype implementation. (:issue:`61942`)
2 changes: 2 additions & 0 deletions pandas/core/arrays/string_arrow.py
@@ -346,6 +346,8 @@ def _str_contains(
    ):
        if flags:
            return super()._str_contains(pat, case, flags, na, regex)
        if isinstance(pat, re.Pattern):
            pat = pat.pattern

        return ArrowStringArrayMixin._str_contains(self, pat, case, flags, na, regex)
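
A point that may be worth checking on this hunk: a compiled pattern carries its own flags, and as far as the lines shown here go, only `pat.pattern` is kept while the `if flags:` check looks at the separate `flags` argument. The sketch below uses only the standard library to show what is lost when the text is taken without the flags. `split_compiled` is a hypothetical helper written for illustration; it is not part of pandas or of this PR.

```python
import re


def split_compiled(pat):
    """Hypothetical helper: return (pattern_text, flags) for a str or compiled pattern."""
    if isinstance(pat, re.Pattern):
        # A compiled pattern stores both its source text and its flags.
        return pat.pattern, pat.flags
    # A plain string carries no flags of its own.
    return pat, 0


compiled = re.compile("BA.", re.IGNORECASE)
text, flags = split_compiled(compiled)

# The compiled pattern matches case-insensitively.
assert compiled.search("bar") is not None

# Using only the pattern text drops IGNORECASE, so the match is lost.
assert re.search(text, "bar") is None

# Passing the extracted flags along restores the original behaviour.
assert re.search(text, "bar", flags) is not None
```

If that matters for this fix, one option would be to fall back to the object-dtype path whenever the compiled pattern has non-default flags; that is a suggestion under the assumptions above, not something the diff already does.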

11 changes: 11 additions & 0 deletions pandas/tests/strings/test_strings.py
@@ -2,6 +2,7 @@
    datetime,
    timedelta,
)
import re

import numpy as np
import pytest
@@ -176,6 +177,16 @@ def test_empty_str_methods(any_string_dtype):
    tm.assert_series_equal(empty_str, empty.str.translate(table))


@pytest.mark.parametrize("dtype", ["string[pyarrow]"])
def test_str_contains_compiled_regex_arrow_dtype(dtype):
Member


Suggested change
- @pytest.mark.parametrize("dtype", ["string[pyarrow]"])
- def test_str_contains_compiled_regex_arrow_dtype(dtype):
+ def test_str_contains_compiled_regex_arrow_dtype(any_string_dtype):

By using this fixture, we test it with all the different string-like dtypes and ensure it behaves consistently. You might just have to define the expected boolean dtype depending on the exact dtype; you can see how that is done in the other .str.contains tests.

Author


Thank you for the suggestion.
I have applied the changes and updated the PR.

    ser = Series(["foo", "bar", "baz"], dtype=dtype)
    pat = re.compile("ba.")
    result = ser.str.contains(pat)
    assert str(result.dtype) == "bool[pyarrow]"
    expected = Series([False, True, True], dtype="bool[pyarrow]")
    tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
    "method, expected",
    [