Update frame.py #61912

Open: wants to merge 1 commit into `main`

imramraja

This PR adds documentation to `DataFrame.to_parquet` and `pandas.read_parquet` noting that `DataFrame.attrs` are preserved when using the "pyarrow" engine.

This behavior is already implemented in the pyarrow engine code in `pandas/io/parquet.py` but is currently undocumented. This PR makes it easier for users to discover.

  • Added a note to the Notes section of both docstrings
  • (Optional) Will add a test in a follow-up if needed

First-time contributor 😊
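The round-trip described above can be sketched as follows. This is a minimal illustration, not a test from the PR: it assumes pandas 2.1 or later with pyarrow installed, `parquet_roundtrip` is a hypothetical helper written for this example, and the attrs values are made up. To our understanding the pyarrow path stores `attrs` as JSON, so the values are kept JSON-serializable.

```python
import importlib.util
import io

import pandas as pd

# Assumption: attrs survive only with the pyarrow engine (pandas >= 2.1).
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


def parquet_roundtrip(df: pd.DataFrame, engine: str = "pyarrow") -> pd.DataFrame:
    """Hypothetical helper: write df to in-memory Parquet and read it back."""
    # With path=None, to_parquet returns the serialized file as bytes.
    data = df.to_parquet(path=None, engine=engine)
    return pd.read_parquet(io.BytesIO(data), engine=engine)


df = pd.DataFrame({"a": [1, 2, 3]})
# Illustrative metadata, kept JSON-serializable.
df.attrs = {"source": "sensor-a", "units": "mm"}

if HAVE_PYARROW:
    restored = parquet_roundtrip(df)
    # attrs should come back unchanged alongside the data.
    print(restored.attrs)
```

Reading the file back with a different engine (or an older pandas) is where we would expect the attrs to be dropped.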

arthurlw (Member) left a comment

You can run `pre-commit run --all-files` in your terminal to catch and fix these before pushing. Let me know if you want help setting it up!

@@ -2958,7 +2958,8 @@ def to_parquet(
 is expected and consistent with pandas' handling of categorical data.
 To manage file size and ensure a more predictable roundtrip process,
 consider using :meth:`Categorical.remove_unused_categories` on the
-DataFrame before saving.
+DataFrame before saving
+* When using the "pyarrow" engine, `DataFrame.attrs` are stored as part of the file's metadata and restored on reading.

arthurlw (Member), Jul 21, 2025

CI is failing because this line exceeds the maximum allowed length. Try splitting it into two lines to satisfy the linter.
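One way to satisfy the linter is to wrap the bullet onto a continuation line indented under the bullet text. The sketch below assumes an 88-character limit (our understanding of pandas' linter configuration, not confirmed in this thread); `to_parquet_notes_stub` is a hypothetical stand-in for the relevant part of the real `to_parquet` docstring, which sits deeper inside the class and so has less room per line.

```python
# Hypothetical stand-in for the Notes bullet in the to_parquet docstring.
def to_parquet_notes_stub():
    """
    Notes
    -----
    * When using the "pyarrow" engine, `DataFrame.attrs` are stored as
      part of the file's metadata and restored on reading.
    """


# Assumed limit; check the project's linter config for the real value.
MAX_LINE_LENGTH = 88

# The added line from the diff, as a single line (indentation excluded).
original_line = (
    '* When using the "pyarrow" engine, `DataFrame.attrs` are stored as '
    "part of the file's metadata and restored on reading."
)

# Docstring lines still over the limit after wrapping (expected: none).
too_long = [
    line
    for line in to_parquet_notes_stub.__doc__.splitlines()
    if len(line) > MAX_LINE_LENGTH
]

print(len(original_line) > MAX_LINE_LENGTH, too_long)  # True []
```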

arthurlw added the Docs label Jul 21, 2025
mroeschke (Member)

Thanks for the PR, but I don't think we need to necessarily document this, so closing. Happy to have your contributions on other issues labeled "good first issue".

mroeschke closed this Jul 21, 2025
jorisvandenbossche (Member)

> I don't think we need to necessarily document this

Given that this is a behaviour that is rather unique to the parquet format (most other IO methods in pandas don't preserve attrs, I think?), and also something that differs between both engines, this seems worth mentioning in the docs?
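For contrast, here is a sketch of a round-trip through CSV, a plain-text format with no place to store `attrs`. `csv_roundtrip` is a hypothetical helper and the attrs value is illustrative; this checks one other IO path only, not every pandas reader and writer.

```python
import io

import pandas as pd


def csv_roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: write df to an in-memory CSV and read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    buf.seek(0)
    # CSV carries only the cell values, so read_csv starts with empty attrs.
    return pd.read_csv(buf)


df = pd.DataFrame({"a": [1, 2, 3]})
df.attrs = {"source": "sensor-a"}
restored = csv_roundtrip(df)

print(restored.attrs)  # {}
```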

mroeschke reopened this Jul 22, 2025
mroeschke added the IO Parquet label (parquet, feather) Jul 22, 2025