BUG: Fix Series.str.contains with compiled regex on Arrow string dtype (#61942) #61946

Open: wants to merge 26 commits into base: main

Commits (26)
f3829fd
BUG : Fix Series.str.contains with compiled regex on Arrow string
Aniketsy Jul 25, 2025
c2a64fa
BUG: Fix handling of compiled regex in Series.str.contains for Arrow-…
Aniketsy Jul 25, 2025
838b1c5
BUG: Fix handling of compiled regex in Series.str.contains for Arrow-…
Aniketsy Jul 25, 2025
563f1f1
STYLE: Fix formatting and docstring issues in str.contains
Aniketsy Jul 25, 2025
fda5619
Fixed ruff format
Aniketsy Jul 25, 2025
324e609
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
b474604
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
3345bc7
Revert changes to pandas/core/strings/accessor.py from PR #61946
Aniketsy Jul 26, 2025
9f06042
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
cbab096
Move fix into _str_contains of ArrowExtensionArray
Aniketsy Jul 26, 2025
d88f8d1
BUG: Fix Series.str.contains with compiled regex and arrow strings (#…
Aniketsy Jul 28, 2025
a0decbc
Revert changes to pandas/core/arrays/arrow/array.py in PR
Aniketsy Jul 28, 2025
8fc81e0
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 28, 2025
8e226cd
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 28, 2025
6768fb1
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
05ae24f
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
702384d
Revert changes to test_strings.py
Aniketsy Jul 29, 2025
0be9a18
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
4ddc7db
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
9a7e640
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
b00fbe0
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 29, 2025
76f741c
Revert test_strings.py changes and remove accidental whatsnew file
Aniketsy Jul 29, 2025
8e65078
Revert test_strings.py changes and remove accidental whatsnew file
Aniketsy Jul 29, 2025
4912758
Merge remote-tracking branch 'upstream/main' into fix-arrow-contains-…
Aniketsy Jul 30, 2025
0e620ca
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 30, 2025
915b38f
BUG: Fix Series.str.contains with compiled regex on Arrow string dtyp…
Aniketsy Jul 30, 2025
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.3.2.rst
@@ -26,6 +26,7 @@ Bug fixes
"string" type in the JSON Table Schema for :class:`StringDtype` columns
(:issue:`61889`)
- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)
- Fixed ``Series.str.contains`` with compiled regex on Arrow string dtype, which now correctly delegates to the object-dtype implementation. (:issue:`61942`)

.. ---------------------------------------------------------------------------
.. _whatsnew_232.contributors:
2 changes: 2 additions & 0 deletions pandas/core/arrays/string_arrow.py
@@ -346,6 +346,8 @@ def _str_contains(
    ):
        if flags:
            return super()._str_contains(pat, case, flags, na, regex)
        if isinstance(pat, re.Pattern):
            pat = pat.pattern

        return ArrowStringArrayMixin._str_contains(self, pat, case, flags, na, regex)
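
A minimal standard-library sketch of what the `isinstance(pat, re.Pattern)` branch in this hunk does to a compiled pattern, and of one thing `.pattern` alone does not carry over: flags stored on the compiled object. The helper name `unwrap_pattern` is hypothetical and is not part of pandas; the sketch only illustrates `re` behaviour and makes no claim about how the Arrow code path treats flags.

```python
import re


def unwrap_pattern(pat):
    """Hypothetical helper mirroring the isinstance branch in the hunk:
    return the source string of a compiled regex, or the argument
    unchanged if it is not a compiled regex."""
    if isinstance(pat, re.Pattern):
        return pat.pattern
    return pat


compiled = re.compile("ba.")
source = unwrap_pattern(compiled)  # plain str holding the pattern text

# Flags live on the compiled object, not in the source string, so
# taking .pattern alone leaves them behind.
insensitive = re.compile("BA.", re.IGNORECASE)
lost_flags = unwrap_pattern(insensitive)

matches_with_flags = bool(insensitive.search("bar"))
matches_without_flags = bool(re.search(lost_flags, "bar"))
```

A reviewer might use a check like this to decide whether a test with a flagged compiled pattern (for example `re.IGNORECASE`) is worth adding alongside the unflagged case.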

15 changes: 15 additions & 0 deletions pandas/tests/strings/test_find_replace.py
@@ -281,6 +281,21 @@ def test_contains_nan(any_string_dtype):
    tm.assert_series_equal(result, expected)


def test_str_contains_compiled_regex_arrow_dtype(any_string_dtype):
    # GH#61942
    ser = Series(["foo", "bar", "baz"], dtype=any_string_dtype)
    pat = re.compile("ba.")
    result = ser.str.contains(pat)
    # Determine expected dtype and values
    expected_dtype = {
        "string[pyarrow]": "bool[pyarrow]",
        "string": "boolean",
        "str": bool,
    }.get(any_string_dtype, object)
    expected = Series([False, True, True], dtype=expected_dtype)
    tm.assert_series_equal(result, expected)

# --------------------------------------------------------------------------------------
# str.startswith
# --------------------------------------------------------------------------------------