BUG: .describe() doesn't work for EAs #61707 #61760
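For context, a minimal sketch of the call pattern the title refers to; the specific nullable dtype below is an illustrative assumption, not taken from the linked issue.

```python
import pandas as pd

# Illustrative only: describe() on a Series backed by an ExtensionArray
# (here the nullable "Int64" dtype). Issue #61707 concerns how describe()
# assembles its mixed-dtype result (count, mean, quantiles, ...) for such arrays.
s = pd.Series([1, 2, None, 4], dtype="Int64")
print(s.describe())
```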

Closed

Changes from all commits · 52 commits
9b3c6ac
Fix describe() for ExtensionArrays with multiple internal dtypes
kernelism Jul 2, 2025
3550556
chore: remove redundant words in comment (#61759)
ianlv Jul 2, 2025
22f12fc
DEPS: bump pyarrow minimum version from 10.0 to 12.0 (#61723)
jorisvandenbossche Jul 3, 2025
b91fa1d
DEPR: object inference in to_stata (#56536)
jbrockmendel Jul 3, 2025
9dcce63
ENH: Allow third-party packages to register IO engines (#61642)
datapythonista Jul 3, 2025
391107a
Revert "ENH: Allow third-party packages to register IO engines" (#61767)
jbrockmendel Jul 3, 2025
51763f9
BUG: NA.__and__, __or__, __xor__ with np.bool_ objects (#61768)
jbrockmendel Jul 3, 2025
e5a1c10
BUG: Fix unpickling of string dtypes of legacy pandas versions (#61770)
Liam3851 Jul 7, 2025
2b471c8
DOC: add pandas 3.0 migration guide for the string dtype (#61705)
jorisvandenbossche Jul 7, 2025
0faaf5c
DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW…
jorisvandenbossche Jul 7, 2025
cf1a11c
BUG[string]: incorrect index downcast in DataFrame.join (#61771)
jbrockmendel Jul 7, 2025
ebca3c5
TST: update expected dtype for sum of decimals with pyarrow 21+ (#61799)
jorisvandenbossche Jul 7, 2025
b9d5732
DOC: Add link to WebGL in pandas ecosystem (#61790)
star1327p Jul 7, 2025
be2cb8c
CLN: remove and udpate for outdated _item_cache (#61789)
chilin0525 Jul 7, 2025
ff8a607
DOC: prepare 2.3.1 whatsnew notes for release (#61794)
jorisvandenbossche Jul 7, 2025
d21ad1a
PERF: avoid object-dtype path in ArrowEA._explode (#61786)
jbrockmendel Jul 7, 2025
16fd208
TST: option_context bug on Mac GH#58055 (#61779)
jbrockmendel Jul 7, 2025
b5e441e
BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with tim…
jbrockmendel Jul 7, 2025
fea4f5b
REF: remove unreachable, stronger typing in parsers.pyx (#61785)
jbrockmendel Jul 7, 2025
7c2796d
[pre-commit.ci] pre-commit autoupdate (#61802)
pre-commit-ci[bot] Jul 7, 2025
d1a245c
DEPS: Bump NumPy and tzdata (#61806)
mroeschke Jul 8, 2025
d5f97ed
feature #49580: support new-style float_format string in to_csv (#61650)
pedromfdiogo Jul 8, 2025
f94b430
CI: Remove PyPy references in CI testing (#61814)
mroeschke Jul 9, 2025
e635c3e
TST[string]: update expecteds for using_string_dtype to fix xfails (#…
jbrockmendel Jul 10, 2025
b876c67
BUG: Fix Index.equals between object and string (#61541)
sanggon6107 Jul 10, 2025
9da2c8f
BUG: Require sample weights to sum to less than 1 when replace = True…
microslaw Jul 11, 2025
d785a3d
DOC: Update link to pytz documentation (#61821)
star1327p Jul 11, 2025
337d5fe
REF: separate out helpers in libparser (#61832)
jbrockmendel Jul 11, 2025
688e2a0
TST: Fix `test_mask_stringdtype` (#61830)
arthurlw Jul 11, 2025
e1328fc
TST: enable 2D tests for MaskedArrays, fix+test shift (#61826)
jbrockmendel Jul 11, 2025
fd7bfaa
BUG: Fix infer_dtype result for float with embedded pd.NA (#61624)
heoh Jul 11, 2025
e83b820
DOC: Correct error message in AbstractMethodError for methodtype argu…
Maaz-319 Jul 11, 2025
da7f2be
DOC: rm excessive backtick (#61839)
mattwang44 Jul 12, 2025
4f2aa4d
DOC: Update README.md to reference issues related to 'good first issu…
sivasweatha Jul 12, 2025
a2315af
BUG: Fix pivot_table margins to include NaN groups when dropna=False …
iabhi4 Jul 13, 2025
bc6ad14
Remove incorrect line in Series init docstring (#61849)
petern48 Jul 14, 2025
1d153bb
TST(string dtype): Resolve xfails in test_from_dummies (#60694)
rhshadrach Jul 15, 2025
43711d5
API: np.isinf on Index return Index[bool] (#61874)
jbrockmendel Jul 16, 2025
2c89a91
DOC: Add Raises section to to_numeric docstring (#61868)
tisjayy Jul 16, 2025
13bba34
String dtype: turn on by default (#61722)
jorisvandenbossche Jul 16, 2025
598b7d1
DOC: show Parquet examples with default engine (without explicit pyar…
jorisvandenbossche Jul 16, 2025
88cb152
DOC: update Parquet IO user guide on index handling and type support …
jorisvandenbossche Jul 16, 2025
042ac78
ERR: improve exception message from timedelta64-datetime64 (#61876)
jbrockmendel Jul 16, 2025
3e9237c
BUG: Timedelta with invalid keyword (#61883)
jbrockmendel Jul 16, 2025
d5eab1b
API: Index.__cmp__(Series) return NotImplemented (#61884)
jbrockmendel Jul 16, 2025
90b1c5d
DOC: make doc build run with string dtype enabled (#61864)
jorisvandenbossche Jul 17, 2025
6537afe
DOC: fix doctests for string dtype changes (top-level) (#61887)
jorisvandenbossche Jul 17, 2025
6fca116
BUG: disallow exotic np.datetime64 unit (#61882)
jbrockmendel Jul 17, 2025
4b18266
API: IncompatibleFrequency subclass TypeError (#61875)
jbrockmendel Jul 18, 2025
6a6a1ba
BUG: If both index and axis are passed to DataFrame.drop, raise a cle…
khemkaran10 Jul 18, 2025
8de38e8
BUG: fix padding for string categories in CategoricalIndex repr (#61894)
jorisvandenbossche Jul 19, 2025
9edf890
61760: merge with main
kernelism Jul 20, 2025
25 changes: 8 additions & 17 deletions .github/workflows/unit-tests.yml
@@ -30,7 +30,7 @@ jobs:
env_file: [actions-310.yaml, actions-311.yaml, actions-312.yaml, actions-313.yaml]
# Prevent the include jobs from overriding other jobs
pattern: [""]
pandas_future_infer_string: ["0"]
pandas_future_infer_string: ["1"]
include:
- name: "Downstream Compat"
env_file: actions-311-downstream_compat.yaml
@@ -45,6 +45,10 @@
env_file: actions-313-freethreading.yaml
pattern: "not slow and not network and not single_cpu"
platform: ubuntu-24.04
- name: "Without PyArrow"
env_file: actions-312.yaml
pattern: "not slow and not network and not single_cpu"
platform: ubuntu-24.04
- name: "Locale: it_IT"
env_file: actions-311.yaml
pattern: "not slow and not network and not single_cpu"
@@ -67,18 +71,9 @@
# It will be temporarily activated during tests with locale.setlocale
extra_loc: "zh_CN"
platform: ubuntu-24.04
- name: "Future infer strings"
- name: "Past no infer strings"
env_file: actions-312.yaml
pandas_future_infer_string: "1"
platform: ubuntu-24.04
- name: "Future infer strings (without pyarrow)"
env_file: actions-311.yaml
pandas_future_infer_string: "1"
platform: ubuntu-24.04
- name: "Pypy"
env_file: actions-pypy-39.yaml
pattern: "not slow and not network and not single_cpu"
test_args: "--max-worker-restart 0"
pandas_future_infer_string: "0"
platform: ubuntu-24.04
- name: "Numpy Dev"
env_file: actions-311-numpydev.yaml
@@ -88,7 +83,6 @@
- name: "Pyarrow Nightly"
env_file: actions-311-pyarrownightly.yaml
pattern: "not slow and not network and not single_cpu"
pandas_future_infer_string: "1"
platform: ubuntu-24.04
fail-fast: false
name: ${{ matrix.name || format('{0} {1}', matrix.platform, matrix.env_file) }}
@@ -103,7 +97,7 @@
PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
# Clipboard tests
QT_QPA_PLATFORM: offscreen
REMOVE_PYARROW: ${{ matrix.name == 'Future infer strings (without pyarrow)' && '1' || '0' }}
REMOVE_PYARROW: ${{ matrix.name == 'Without PyArrow' && '1' || '0' }}
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{ matrix.env_file }}-${{ matrix.pattern }}-${{ matrix.extra_apt || '' }}-${{ matrix.pandas_future_infer_string }}-${{ matrix.platform }}
@@ -169,12 +163,9 @@ jobs:
with:
# xref https://github.com/cython/cython/issues/6870
werror: ${{ matrix.name != 'Freethreading' }}
# TODO: Re-enable once Pypy has Pypy 3.10 on conda-forge
if: ${{ matrix.name != 'Pypy' }}

- name: Test (not single_cpu)
uses: ./.github/actions/run-tests
if: ${{ matrix.name != 'Pypy' }}
env:
# Set pattern to not single_cpu if not already set
PATTERN: ${{ env.PATTERN == '' && 'not single_cpu' || matrix.pattern }}
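With the matrix default flipped to pandas_future_infer_string: "1", most jobs now exercise the future string behaviour. A rough sketch of the equivalent per-session opt-in, assuming the existing future.infer_string option (not part of this diff):

```python
import pandas as pd

# Assumed local opt-in mirroring what the CI matrix now enables by default.
pd.set_option("future.infer_string", True)

s = pd.Series(["a", "b", "c"])
print(s.dtype)  # expected: the new string dtype rather than object
```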
1 change: 0 additions & 1 deletion .github/workflows/wheels.yml
@@ -101,7 +101,6 @@ jobs:
- [macos-14, macosx_arm64]
- [windows-2022, win_amd64]
- [windows-11-arm, win_arm64]
# TODO: support PyPy?
python: [["cp310", "3.10"], ["cp311", "3.11"], ["cp312", "3.12"], ["cp313", "3.13"], ["cp313t", "3.13"]]
include:
# Build Pyodide wheels and upload them to Anaconda.org
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -19,7 +19,7 @@ ci:
skip: [pyright, mypy]
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.12
rev: v0.12.2
hooks:
- id: ruff
args: [--exit-non-zero-on-fix]
@@ -47,7 +47,7 @@ repos:
types_or: [python, rst, markdown, cython, c]
additional_dependencies: [tomli]
- repo: https://github.com/MarcoGorelli/cython-lint
rev: v0.16.6
rev: v0.16.7
hooks:
- id: cython-lint
- id: double-quote-cython-strings
@@ -95,14 +95,14 @@ repos:
- id: sphinx-lint
args: ["--enable", "all", "--disable", "line-too-long"]
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v20.1.5
rev: v20.1.7
hooks:
- id: clang-format
files: ^pandas/_libs/src|^pandas/_libs/include
args: [-i]
types_or: [c, c++]
- repo: https://github.com/trim21/pre-commit-mirror-meson
rev: v1.8.1
rev: v1.8.2
hooks:
- id: meson-fmt
args: ['--inplace']
2 changes: 1 addition & 1 deletion README.md
@@ -175,7 +175,7 @@ All contributions, bug reports, bug fixes, documentation improvements, enhanceme

A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas.pydata.org/docs/dev/development/contributing.html)**.

If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.
If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?q=is%3Aissue%20state%3Aopen%20label%3ADocs%20sort%3Aupdated-desc) and [good first issue](https://github.com/pandas-dev/pandas/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22%20sort%3Aupdated-desc) where you could start out.

You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).

30 changes: 15 additions & 15 deletions asv_bench/benchmarks/gil.py
@@ -36,7 +36,7 @@
from .pandas_vb_common import BaseIO # isort:skip


def test_parallel(num_threads=2, kwargs_list=None):
def run_parallel(num_threads=2, kwargs_list=None):
"""
Decorator to run the same function multiple times in parallel.

@@ -95,7 +95,7 @@ def setup(self, threads, method):
{"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)}
)

@test_parallel(num_threads=threads)
@run_parallel(num_threads=threads)
def parallel():
getattr(df.groupby("key")["data"], method)()

@@ -123,7 +123,7 @@ def setup(self, threads):
ngroups = 10**3
data = Series(np.random.randint(0, ngroups, size=size))

@test_parallel(num_threads=threads)
@run_parallel(num_threads=threads)
def get_groups():
data.groupby(data).groups

@@ -142,7 +142,7 @@ def setup(self, dtype):
df = DataFrame({"col": np.arange(N, dtype=dtype)})
indexer = np.arange(100, len(df) - 100)

@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def parallel_take1d():
take_nd(df["col"].values, indexer)

@@ -163,7 +163,7 @@ def setup(self):
k = 5 * 10**5
kwargs_list = [{"arr": np.random.randn(N)}, {"arr": np.random.randn(N)}]

@test_parallel(num_threads=2, kwargs_list=kwargs_list)
@run_parallel(num_threads=2, kwargs_list=kwargs_list)
def parallel_kth_smallest(arr):
algos.kth_smallest(arr, k)

@@ -180,42 +180,42 @@ def setup(self):
self.period = self.dti.to_period("D")

def time_datetime_field_year(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(dti):
dti.year

run(self.dti)

def time_datetime_field_day(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(dti):
dti.day

run(self.dti)

def time_datetime_field_daysinmonth(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(dti):
dti.days_in_month

run(self.dti)

def time_datetime_field_normalize(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(dti):
dti.normalize()

run(self.dti)

def time_datetime_to_period(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(dti):
dti.to_period("s")

run(self.dti)

def time_period_to_datetime(self):
@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def run(period):
period.to_timestamp()

@@ -232,7 +232,7 @@ def setup(self, method):
if hasattr(DataFrame, "rolling"):
df = DataFrame(arr).rolling(win)

@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def parallel_rolling():
getattr(df, method)()

@@ -249,7 +249,7 @@ def parallel_rolling():
"std": rolling_std,
}

@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def parallel_rolling():
rolling[method](arr, win)

@@ -286,7 +286,7 @@ def setup(self, dtype):
self.fname = f"__test_{dtype}__.csv"
df.to_csv(self.fname)

@test_parallel(num_threads=2)
@run_parallel(num_threads=2)
def parallel_read_csv():
read_csv(self.fname)

@@ -305,7 +305,7 @@ class ParallelFactorize:
def setup(self, threads):
strings = Index([f"i-{i}" for i in range(100000)], dtype=object)

@test_parallel(num_threads=threads)
@run_parallel(num_threads=threads)
def parallel():
factorize(strings)

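The rename from test_parallel to run_parallel throughout this file keeps the helper from being collected as a test by name. Its body is elided from this diff, so the following is only a rough sketch of what such a decorator might look like:

```python
import threading
from functools import wraps


def run_parallel(num_threads=2, kwargs_list=None):
    """Decorator: call the wrapped function concurrently, once per thread."""

    def wrapper(func):
        @wraps(func)
        def inner(*args, **kwargs):
            if kwargs_list is not None:
                # One kwargs dict per thread, as in the kth_smallest benchmark above.
                threads = [
                    threading.Thread(target=func, kwargs=kw) for kw in kwargs_list
                ]
            else:
                threads = [threading.Thread(target=func) for _ in range(num_threads)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()

        return inner

    return wrapper
```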
19 changes: 19 additions & 0 deletions asv_bench/benchmarks/io/csv.py
@@ -53,6 +53,25 @@ def time_frame(self, kind):
self.df.to_csv(self.fname)


class ToCSVFloatFormatVariants(BaseIO):
fname = "__test__.csv"

def setup(self):
self.df = DataFrame(np.random.default_rng(seed=42).random((1000, 1000)))

def time_old_style_percent_format(self):
self.df.to_csv(self.fname, float_format="%.6f")

def time_new_style_brace_format(self):
self.df.to_csv(self.fname, float_format="{:.6f}")

def time_new_style_thousands_format(self):
self.df.to_csv(self.fname, float_format="{:,.2f}")

def time_callable_format(self):
self.df.to_csv(self.fname, float_format=lambda x: f"{x:.6f}")


class ToCSVMultiIndexUnusedLevels(BaseIO):
fname = "__test__.csv"

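The ToCSVFloatFormatVariants benchmark above covers the old-style, new-style, and callable float_format variants added for #49580. A small usage sketch of the same three forms (output illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [0.123456789, 1.5]})

# Old-style printf formatting (long supported).
print(df.to_csv(float_format="%.6f"))

# New-style format string and a callable, matching the benchmark cases above.
print(df.to_csv(float_format="{:.6f}"))
print(df.to_csv(float_format=lambda x: f"{x:.4f}"))
```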
5 changes: 4 additions & 1 deletion ci/code_checks.sh
@@ -58,7 +58,9 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

MSG='Python and Cython Doctests' ; echo "$MSG"
python -c 'import pandas as pd; pd.test(run_doctests=True)'
RET=$(($RET + $?)) ; echo "$MSG" "DONE"
# TEMP don't let doctests fail the build until all string dtype changes are fixed
# RET=$(($RET + $?)) ; echo "$MSG" "DONE"
echo "$MSG" "DONE"

fi

@@ -72,6 +74,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Series.dt PR01" `# Accessors are implemented as classes, but we do not document the Parameters section` \
-i "pandas.Period.freq GL08" \
-i "pandas.Period.ordinal GL08" \
-i "pandas.errors.IncompatibleFrequency SA01,SS06,EX01" \
-i "pandas.core.groupby.DataFrameGroupBy.plot PR02" \
-i "pandas.core.groupby.SeriesGroupBy.plot PR02" \
-i "pandas.core.resample.Resampler.quantile PR01,PR07" \
6 changes: 3 additions & 3 deletions ci/deps/actions-310-minimum_versions.yaml
@@ -22,7 +22,7 @@ dependencies:

# required dependencies
- python-dateutil=2.8.2
- numpy=1.23.5
- numpy=1.26.0

# optional dependencies
- beautifulsoup4=4.12.3
@@ -41,7 +41,7 @@ dependencies:
- qtpy=2.3.0
- openpyxl=3.1.2
- psycopg2=2.9.6
- pyarrow=10.0.1
- pyarrow=12.0.1
- pyiceberg=0.7.1
- pymysql=1.1.0
- pyqt=5.15.9
@@ -62,4 +62,4 @@ dependencies:
- pip:
- adbc-driver-postgresql==0.10.0
- adbc-driver-sqlite==0.8.0
- tzdata==2022.7
- tzdata==2023.3
4 changes: 2 additions & 2 deletions ci/deps/actions-310.yaml
@@ -39,7 +39,7 @@ dependencies:
- qtpy>=2.3.0
- openpyxl>=3.1.2
- psycopg2>=2.9.6
- pyarrow>=10.0.1
- pyarrow>=12.0.1
- pyiceberg>=0.7.1
- pymysql>=1.1.0
- pyqt>=5.15.9
@@ -60,4 +60,4 @@ dependencies:
- pip:
- adbc-driver-postgresql>=0.10.0
- adbc-driver-sqlite>=0.8.0
- tzdata>=2022.7
- tzdata>=2023.3
4 changes: 2 additions & 2 deletions ci/deps/actions-311-downstream_compat.yaml
@@ -40,7 +40,7 @@ dependencies:
- qtpy>=2.3.0
- openpyxl>=3.1.2
- psycopg2>=2.9.6
- pyarrow>=10.0.1
- pyarrow>=12.0.1
- pyiceberg>=0.7.1
- pymysql>=1.1.0
- pyqt>=5.15.9
@@ -73,4 +73,4 @@ dependencies:
- pip:
- adbc-driver-postgresql>=0.10.0
- adbc-driver-sqlite>=0.8.0
- tzdata>=2022.7
- tzdata>=2023.3
2 changes: 1 addition & 1 deletion ci/deps/actions-311-numpydev.yaml
@@ -24,4 +24,4 @@ dependencies:
- "--extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
- "--pre"
- "numpy"
- "tzdata>=2022.7"
- "tzdata>=2023.3"
2 changes: 1 addition & 1 deletion ci/deps/actions-311-pyarrownightly.yaml
@@ -22,7 +22,7 @@ dependencies:
- pip

- pip:
- "tzdata>=2022.7"
- "tzdata>=2023.3"
- "--extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
- "--prefer-binary"
- "--pre"
2 changes: 1 addition & 1 deletion ci/deps/actions-311.yaml
@@ -40,7 +40,7 @@ dependencies:
- pyqt>=5.15.9
- openpyxl>=3.1.2
- psycopg2>=2.9.6
- pyarrow>=10.0.1
- pyarrow>=12.0.1
- pyiceberg>=0.7.1
- pymysql>=1.1.0
- pyreadstat>=1.2.6
4 changes: 2 additions & 2 deletions ci/deps/actions-312.yaml
@@ -40,7 +40,7 @@ dependencies:
- pyqt>=5.15.9
- openpyxl>=3.1.2
- psycopg2>=2.9.6
- pyarrow>=10.0.1
- pyarrow>=12.0.1
- pyiceberg>=0.7.1
- pymysql>=1.1.0
- pyreadstat>=1.2.6
@@ -60,4 +60,4 @@ dependencies:
- pip:
- adbc-driver-postgresql>=0.10.0
- adbc-driver-sqlite>=0.8.0
- tzdata>=2022.7
- tzdata>=2023.3
2 changes: 1 addition & 1 deletion ci/deps/actions-313-freethreading.yaml
@@ -25,5 +25,5 @@ dependencies:
- pip:
# No free-threaded coveragepy (with the C-extension) on conda-forge yet
- pytest-cov
- "tzdata>=2022.7"
- tzdata>=2023.3
- "--extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"