Commit e11a375

Merge branch 'main' into main
2 parents 96089c4 + 36b8f20 commit e11a375

18 files changed: +316 additions, -115 deletions

.github/workflows/wheels.yml
Lines changed: 1 addition & 1 deletion

@@ -162,7 +162,7 @@ jobs:
         run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.23.3
+        uses: pypa/cibuildwheel@v3.1.1
         with:
           package-dir: ./dist/${{ startsWith(matrix.buildplat[1], 'macosx') && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
         env:

doc/source/whatsnew/v2.3.2.rst
Lines changed: 1 addition & 1 deletion

@@ -25,7 +25,7 @@ Bug fixes
 - Fix :meth:`~DataFrame.to_json` with ``orient="table"`` to correctly use the
   "string" type in the JSON Table Schema for :class:`StringDtype` columns
   (:issue:`61889`)
-
+- Boolean operations (``|``, ``&``, ``^``) with bool-dtype objects on the left and :class:`StringDtype` objects on the right now cast the string to bool, with a deprecation warning (:issue:`60234`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_232.contributors:
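Illustrative only (not part of the commit): a minimal sketch of the deprecated pattern described in the :issue:`60234` entry above, assuming a pandas build that includes this change and has the string dtype available.

import numpy as np
import pandas as pd

mask = np.array([True, False, True])
s = pd.Series(["a", "", "c"], dtype="string")

# Deprecated pattern: bool dtype on the left, StringDtype on the right.
# The strings are cast to bool and a FutureWarning is expected.
deprecated = mask | s

# Recommended replacement: cast the strings to a boolean dtype explicitly first.
explicit = mask | s.astype(bool)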

doc/source/whatsnew/v3.0.0.rst
Lines changed: 3 additions & 0 deletions

@@ -731,6 +731,7 @@ Timezones

 Numeric
 ^^^^^^^
+- Bug in :func:`api.types.infer_dtype` returning "mixed" for complex and ``pd.NA`` mix (:issue:`61976`)
 - Bug in :func:`api.types.infer_dtype` returning "mixed-integer-float" for float and ``pd.NA`` mix (:issue:`61621`)
 - Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
 - Bug in :meth:`DataFrame.cov` raises a ``TypeError`` instead of returning potentially incorrect results or other errors (:issue:`53115`)
@@ -851,6 +852,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` were not keeping the index name when the index had :class:`ArrowDtype` timestamp dtype (:issue:`61222`)
 - Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
+- Bug in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` that was returning numpy dtype values when input values are pyarrow dtype values, instead of returning pyarrow dtype values. (:issue:`53030`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.agg` where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (:issue:`61503`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
@@ -940,6 +942,7 @@ Other
 - Bug in Dataframe Interchange Protocol implementation was returning incorrect results for data buffers' associated dtype, for string and datetime columns (:issue:`54781`)
 - Bug in ``Series.list`` methods not preserving the original :class:`Index`. (:issue:`58425`)
 - Bug in ``Series.list`` methods not preserving the original name. (:issue:`60522`)
+- Bug in ``Series.replace`` when the Series was created from an :class:`Index` and Copy-On-Write is enabled (:issue:`61622`)
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
 - Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)

pandas/_libs/lib.pyx
Lines changed: 4 additions & 2 deletions

@@ -1974,9 +1974,11 @@ cdef class ComplexValidator(Validator):
         return cnp.PyDataType_ISCOMPLEX(self.dtype)


-cdef bint is_complex_array(ndarray values):
+cdef bint is_complex_array(ndarray values, bint skipna=True):
     cdef:
-        ComplexValidator validator = ComplexValidator(values.size, values.dtype)
+        ComplexValidator validator = ComplexValidator(values.size,
+                                                      values.dtype,
+                                                      skipna=skipna)
     return validator.validate(values)

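Illustrative only (not part of the commit): ``is_complex_array`` is internal plumbing behind :func:`pandas.api.types.infer_dtype`; a minimal sketch of the user-facing behavior targeted by :issue:`61976`, assuming a pandas build that includes this change.

import pandas as pd
from pandas.api.types import infer_dtype

# With skipna=True (the default), pd.NA is ignored and the remaining values
# are recognized as complex; previously this combination returned "mixed".
print(infer_dtype([1 + 1j, pd.NA]))  # expected: "complex"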

pandas/core/arrays/arrow/array.py
Lines changed: 21 additions & 0 deletions

@@ -12,6 +12,7 @@
     overload,
 )
 import unicodedata
+import warnings

 import numpy as np

@@ -27,6 +28,7 @@
     pa_version_under13p0,
 )
 from pandas.util._decorators import doc
+from pandas.util._exceptions import find_stack_level

 from pandas.core.dtypes.cast import (
     can_hold_element,
@@ -852,6 +854,25 @@ def _logical_method(self, other, op) -> Self:
         # integer types. Otherwise these are boolean ops.
         if pa.types.is_integer(self._pa_array.type):
             return self._evaluate_op_method(other, op, ARROW_BIT_WISE_FUNCS)
+        elif (
+            (
+                pa.types.is_string(self._pa_array.type)
+                or pa.types.is_large_string(self._pa_array.type)
+            )
+            and op in (roperator.ror_, roperator.rand_, roperator.rxor)
+            and isinstance(other, np.ndarray)
+            and other.dtype == bool
+        ):
+            # GH#60234 backward compatibility for the move to StringDtype in 3.0
+            op_name = op.__name__[1:].strip("_")
+            warnings.warn(
+                f"'{op_name}' operations between boolean dtype and {self.dtype} are "
+                "deprecated and will raise in a future version. Explicitly "
+                "cast the strings to a boolean dtype before operating instead.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
+            return op(other, self.astype(bool))
         else:
             return self._evaluate_op_method(other, op, ARROW_LOGICAL_FUNCS)
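Illustrative only (not part of the commit): how the ``op_name`` used in the warning message above is derived from the reversed-operator helpers in the internal ``pandas.core.roperator`` module; shown purely to clarify the string manipulation.

from pandas.core import roperator

for op in (roperator.ror_, roperator.rand_, roperator.rxor):
    # "ror_" -> "or", "rand_" -> "and", "rxor" -> "xor"
    print(op.__name__, "->", op.__name__[1:].strip("_"))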

pandas/core/arrays/string_.py
Lines changed: 21 additions & 0 deletions

@@ -52,6 +52,7 @@
     missing,
     nanops,
     ops,
+    roperator,
 )
 from pandas.core.algorithms import isin
 from pandas.core.array_algos import masked_reductions
@@ -390,6 +391,26 @@ class BaseStringArray(ExtensionArray):

     dtype: StringDtype

+    # TODO(4.0): Once the deprecation here is enforced, this method can be
+    # removed and we use the parent class method instead.
+    def _logical_method(self, other, op):
+        if (
+            op in (roperator.ror_, roperator.rand_, roperator.rxor)
+            and isinstance(other, np.ndarray)
+            and other.dtype == bool
+        ):
+            # GH#60234 backward compatibility for the move to StringDtype in 3.0
+            op_name = op.__name__[1:].strip("_")
+            warnings.warn(
+                f"'{op_name}' operations between boolean dtype and {self.dtype} are "
+                "deprecated and will raise in a future version. Explicitly "
+                "cast the strings to a boolean dtype before operating instead.",
+                FutureWarning,
+                stacklevel=find_stack_level(),
+            )
+            return op(other, self.astype(bool))
+        return NotImplemented
+
     @doc(ExtensionArray.tolist)
     def tolist(self) -> list:
         if self.ndim > 1:

pandas/core/frame.py
Lines changed: 29 additions & 21 deletions

@@ -7173,35 +7173,43 @@ def sort_values(
         `natural sorting <https://en.wikipedia.org/wiki/Natural_sort_order>`__.
         This can be done using
         ``natsort`` `package <https://github.com/SethMMorton/natsort>`__,
-        which provides sorted indices according
-        to their natural order, as shown below:
+        which provides a function to generate a key
+        to sort data in their natural order:

         >>> df = pd.DataFrame(
         ...     {
-        ...         "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
-        ...         "value": [10, 20, 30, 40, 50],
+        ...         "hours": ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"],
+        ...         "mins": [
+        ...             "10mins",
+        ...             "40mins",
+        ...             "40mins",
+        ...             "40mins",
+        ...             "10mins",
+        ...             "10mins",
+        ...         ],
+        ...         "value": [10, 20, 30, 40, 50, 60],
         ...     }
         ... )
         >>> df
-            time  value
-        0    0hr     10
-        1  128hr     20
-        2   72hr     30
-        3   48hr     40
-        4   96hr     50
-        >>> from natsort import index_natsorted
-        >>> index_natsorted(df["time"])
-        [0, 3, 2, 4, 1]
+           hours    mins  value
+        0    0hr  10mins     10
+        1  128hr  40mins     20
+        2    0hr  40mins     30
+        3   64hr  40mins     40
+        4   64hr  10mins     50
+        5  128hr  10mins     60
+        >>> from natsort import natsort_keygen
         >>> df.sort_values(
-        ...     by="time",
-        ...     key=lambda x: np.argsort(index_natsorted(x)),
+        ...     by=["hours", "mins"],
+        ...     key=natsort_keygen(),
         ... )
-            time  value
-        0    0hr     10
-        3   48hr     40
-        2   72hr     30
-        4   96hr     50
-        1  128hr     20
+           hours    mins  value
+        0    0hr  10mins     10
+        2    0hr  40mins     30
+        4   64hr  10mins     50
+        3   64hr  40mins     40
+        5  128hr  10mins     60
+        1  128hr  40mins     20
         """
         inplace = validate_bool_kwarg(inplace, "inplace")
         axis = self._get_axis_number(axis)
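Illustrative only (not part of the commit): the ``natsort_keygen`` pattern from the updated docstring also works as a ``key`` for a single :class:`Series`, assuming the optional ``natsort`` package is installed.

import pandas as pd
from natsort import natsort_keygen

s = pd.Series(["0hr", "128hr", "64hr"])
# natsort_keygen() returns a key callable that sorts strings in natural order,
# so "64hr" is expected to come before "128hr" rather than lexicographic order.
print(s.sort_values(key=natsort_keygen()).tolist())  # expected: ['0hr', '64hr', '128hr']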

pandas/core/generic.py
Lines changed: 27 additions & 16 deletions

@@ -5004,27 +5004,38 @@ def sort_values(

         >>> df = pd.DataFrame(
         ...     {
-        ...         "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
-        ...         "value": [10, 20, 30, 40, 50],
+        ...         "hours": ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"],
+        ...         "mins": [
+        ...             "10mins",
+        ...             "40mins",
+        ...             "40mins",
+        ...             "40mins",
+        ...             "10mins",
+        ...             "10mins",
+        ...         ],
+        ...         "value": [10, 20, 30, 40, 50, 60],
         ...     }
         ... )
         >>> df
-            time  value
-        0    0hr     10
-        1  128hr     20
-        2   72hr     30
-        3   48hr     40
-        4   96hr     50
-        >>> from natsort import index_natsorted
+           hours    mins  value
+        0    0hr  10mins     10
+        1  128hr  40mins     20
+        2    0hr  40mins     30
+        3   64hr  40mins     40
+        4   64hr  10mins     50
+        5  128hr  10mins     60
+        >>> from natsort import natsort_keygen
         >>> df.sort_values(
-        ...     by="time", key=lambda x: np.argsort(index_natsorted(df["time"]))
+        ...     by=["hours", "mins"],
+        ...     key=natsort_keygen(),
         ... )
-            time  value
-        0    0hr     10
-        3   48hr     40
-        2   72hr     30
-        4   96hr     50
-        1  128hr     20
+           hours    mins  value
+        0    0hr  10mins     10
+        2    0hr  40mins     30
+        4   64hr  10mins     50
+        3   64hr  40mins     40
+        5  128hr  10mins     60
+        1  128hr  40mins     20
         """
         raise AbstractMethodError(self)

pandas/core/groupby/ops.py
Lines changed: 17 additions & 7 deletions

@@ -44,13 +44,15 @@
     ensure_platform_int,
     ensure_uint64,
     is_1d_only_ea_dtype,
+    is_string_dtype,
 )
 from pandas.core.dtypes.missing import (
     isna,
     maybe_fill,
 )

 from pandas.core.arrays import Categorical
+from pandas.core.arrays.arrow.array import ArrowExtensionArray
 from pandas.core.frame import DataFrame
 from pandas.core.groupby import grouper
 from pandas.core.indexes.api import (
@@ -963,18 +965,26 @@ def agg_series(
         -------
         np.ndarray or ExtensionArray
         """
+        result = self._aggregate_series_pure_python(obj, func)
+        npvalues = lib.maybe_convert_objects(result, try_float=False)
+
+        if isinstance(obj._values, ArrowExtensionArray):
+            # When obj.dtype is a string, any object can be cast. Only do so if the
+            # UDF returned strings or NA values.
+            if not is_string_dtype(obj.dtype) or lib.is_string_array(
+                npvalues, skipna=True
+            ):
+                out = maybe_cast_pointwise_result(
+                    npvalues, obj.dtype, numeric_only=True, same_dtype=preserve_dtype
+                )
+            else:
+                out = npvalues

-        if not isinstance(obj._values, np.ndarray):
+        elif not isinstance(obj._values, np.ndarray):
             # we can preserve a little bit more aggressively with EA dtype
             # because maybe_cast_pointwise_result will do a try/except
             # with _from_sequence. NB we are assuming here that _from_sequence
             # is sufficiently strict that it casts appropriately.
-            preserve_dtype = True
-
-        result = self._aggregate_series_pure_python(obj, func)
-
-        npvalues = lib.maybe_convert_objects(result, try_float=False)
-        if preserve_dtype:
             out = maybe_cast_pointwise_result(npvalues, obj.dtype, numeric_only=True)
         else:
             out = npvalues
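Illustrative only (not part of the commit): a minimal sketch of the user-facing behavior targeted by :issue:`53030` (a UDF aggregation on a pyarrow-backed column), assuming pyarrow is installed and a pandas build that includes this change.

import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "val": pd.array([1, 2, 3], dtype="int64[pyarrow]"),
    }
)

# The UDF result should now be cast back to the pyarrow dtype of the input
# column instead of falling back to a numpy dtype.
out = df.groupby("key")["val"].agg(lambda s: s.max())
print(out.dtype)  # expected: int64[pyarrow]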

pandas/core/internals/blocks.py
Lines changed: 15 additions & 8 deletions

@@ -10,7 +10,6 @@
     final,
 )
 import warnings
-import weakref

 import numpy as np

@@ -863,14 +862,22 @@ def replace_list(
                 )

                 if i != src_len:
-                    # This is ugly, but we have to get rid of intermediate refs
-                    # that did not go out of scope yet, otherwise we will trigger
-                    # many unnecessary copies
+                    # This is ugly, but we have to get rid of intermediate refs. We
+                    # can simply clear the referenced_blocks if we already copied,
+                    # otherwise we have to remove ourselves
+                    self_blk_ids = {
+                        id(b()): i for i, b in enumerate(self.refs.referenced_blocks)
+                    }
                     for b in result:
-                        ref = weakref.ref(b)
-                        b.refs.referenced_blocks.pop(
-                            b.refs.referenced_blocks.index(ref)
-                        )
+                        if b.refs is self.refs:
+                            # We are still sharing memory with self
+                            if id(b) in self_blk_ids:
+                                # Remove ourselves from the refs; we are temporary
+                                self.refs.referenced_blocks.pop(self_blk_ids[id(b)])
+                            else:
+                                # We have already copied, so we can clear the refs to avoid
+                                # future copies
+                                b.refs.referenced_blocks.clear()
                 new_rb.extend(result)
             rb = new_rb
         return rb
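Illustrative only (not part of the commit): a minimal sketch of the Copy-on-Write scenario behind :issue:`61622` that the ``replace_list`` change above guards against, assuming pandas 3.0 semantics where Copy-on-Write is enabled.

import pandas as pd

idx = pd.Index(["a", "b", "c"])
ser = pd.Series(idx)  # initially shares data with the Index under Copy-on-Write

# replace() with a list goes through replace_list; the Series must detach from
# the Index it was built from, and the original Index should stay untouched.
result = ser.replace(["a", "b"], "x")
print(result.tolist())  # expected: ['x', 'x', 'c']
print(idx.tolist())     # expected: ['a', 'b', 'c']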
