
@allamlobna commented Oct 10, 2025

Summary

This PR fixes an inconsistency where read_csv replaced empty
MultiIndex column level values with automatically generated
"Unnamed: x_level_y" placeholders.

Empty values are now preserved as empty strings (""),
matching the existing behavior for MultiIndex index levels and
ensuring consistent roundtrip results between to_csv and read_csv.
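To make the difference concrete, here is a pure-Python sketch (it does not call pandas) that builds column tuples from two header rows. The CSV text is hand-written to resemble what to_csv might emit for columns MultiIndex.from_tuples([("a", ""), ("b", "x")]) with index=False. The function header_tuples and its preserve_empty flag are hypothetical names for illustration, not part of this PR or of pandas; preserve_empty=False only approximates the placeholder naming described above.

```python
import csv
import io

# Hand-written CSV resembling a to_csv dump of MultiIndex columns
# [("a", ""), ("b", "x")] with index=False (assumed layout).
CSV_TEXT = "a,b\n,x\n1,2\n"


def header_tuples(text, nlevels, preserve_empty):
    """Build one tuple per column from the first `nlevels` rows of `text`.

    preserve_empty=False approximates the old placeholder naming
    ("Unnamed: {column}_level_{level}"); preserve_empty=True keeps "".
    """
    rows = list(csv.reader(io.StringIO(text)))[:nlevels]
    columns = []
    # zip(*rows) walks the header column by column, yielding one label per level.
    for i, labels in enumerate(zip(*rows)):
        levels = []
        for level, label in enumerate(labels):
            if label == "" and not preserve_empty:
                label = f"Unnamed: {i}_level_{level}"
            levels.append(label)
        columns.append(tuple(levels))
    return columns


print(header_tuples(CSV_TEXT, 2, preserve_empty=False))
# [('a', 'Unnamed: 0_level_1'), ('b', 'x')]
print(header_tuples(CSV_TEXT, 2, preserve_empty=True))
# [('a', ''), ('b', 'x')]
```

Only the second result matches the labels that were written out, which is the roundtrip property the summary refers to.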

Changes

  • Added _clean_column_levels() helper to normalize MultiIndex
    column labels in BaseParser.
  • Updated _extract_multi_indexer_columns() to use it.
  • Adjusted test_multi_index_unnamed() expectations.
  • Added regression test for GH#59560 in test_header.py.
  • Added whatsnew entry under Bug Fixes → IO.

Impact

  • Aligns the handling of empty levels in MultiIndex columns with
    that of MultiIndex index levels when reading CSVs.
  • No change for single-level headers.
  • Tested successfully with both the C and Python parser engines.

@allamlobna force-pushed the bugfix/clean-empty-vals-multiindex branch from 6a40ca3 to a399a8e on October 10, 2025 19:08
@allamlobna (Author) commented Oct 10, 2025

Hi, this PR fixes GH#59560, which reports inconsistent handling of empty MultiIndex level values in read_csv.

Summary of changes:

  • Added _clean_column_levels() to normalize empty values and automatically generated "Unnamed: x_level_y" placeholders to "" when reading MultiIndex columns.

  • Hooked this into _extract_multi_indexer_columns() in the CSV parser so empty header cells are preserved as "" instead of being replaced with "Unnamed: ..." labels.
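A post-processing helper of this kind can be pictured with the sketch below. It is an assumption about the approach, not the PR's code: clean_column_levels_sketch and _is_empty_level are hypothetical names, and the sketch treats None and any label that fully matches "Unnamed: <int>_level_<int>" as an empty level.

```python
import re

# Placeholder shape described in the PR: "Unnamed: x_level_y" with integers x, y.
_PLACEHOLDER = re.compile(r"Unnamed: \d+_level_\d+")


def _is_empty_level(label):
    """True for None or an auto-generated multi-level placeholder."""
    if label is None:
        return True
    return isinstance(label, str) and _PLACEHOLDER.fullmatch(label) is not None


def clean_column_levels_sketch(columns):
    """Hypothetical stand-in for _clean_column_levels().

    Takes an iterable of column tuples (one entry per MultiIndex level) and
    returns a list of tuples with empty levels normalized to "".
    """
    return [
        tuple("" if _is_empty_level(label) else label for label in col)
        for col in columns
    ]


print(clean_column_levels_sketch([("a", "Unnamed: 0_level_1"), ("b", "x")]))
# [('a', ''), ('b', 'x')]
```

Because the pattern requires the "_level_" suffix, a single-level placeholder such as "Unnamed: 0" is left untouched. One trade-off of cleaning after the fact is that a header cell whose literal text happens to match the pattern would also become "".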

Observed side effect:
Because this normalization happens in shared column-cleaning logic, it also affects the other I/O readers that use it (Excel and HTML), which now return empty strings instead of "Unnamed:" placeholders.
This causes several tests in those modules to fail, since they expect the old "Unnamed:" behavior.

Question for maintainers:
Would you prefer that I:

  • Limit the scope to CSV in this PR (restoring the previous behavior for Excel/HTML to keep the PR focused on GH#59560),

or

  • Extend the change across all I/O readers and update the corresponding parser tests for consistency in this same PR?

I’m happy to take either direction; I just wanted to check which is preferable from a review and scope standpoint.

@allamlobna marked this pull request as draft on October 10, 2025 20:09

Development

Successfully merging this pull request may close these issues.

BUG: inconsistency when read_csv reads MultiIndex with empty values