Skip to content

ENH: Added DataFrame.nsorted to select top n rows according to column-dependent order #61457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

MartinBraquet
Copy link
Contributor

@MartinBraquet MartinBraquet commented May 19, 2025

PR is ready for review.

@MartinBraquet MartinBraquet requested a review from Dr-Irv as a code owner May 19, 2025 11:11
@MartinBraquet MartinBraquet changed the title Added :DataFrame.nsorted to select top n rows according to column-dependent order Added DataFrame.nsorted to select top n rows according to column-dependent order May 19, 2025
@MartinBraquet MartinBraquet changed the title Added DataFrame.nsorted to select top n rows according to column-dependent order ENH: Added DataFrame.nsorted to select top n rows according to column-dependent order May 19, 2025
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 19, 2025
@MartinBraquet
Copy link
Contributor Author

The PR has been waiting for reviewers since then.

@MartinBraquet
Copy link
Contributor Author

MartinBraquet commented Jul 20, 2025

@rhshadrach @snitish @Dr-Irv

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Some minor comments. Can you also add this API docs in doc/source/reference/frame.rst and series.rst.

Comment on lines 419 to 420
# Arguments for nsorted, nsmallest and nlargest.
NsmallestNlargestKeep = Literal["first", "last", "all"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove TypeAlias? Also, can you keep the style here consistent by not ending with a period .

Copy link
Contributor Author

@MartinBraquet MartinBraquet Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, good catch! Note to myself: add the type after extracting a constant in Pycharm, because the IDE won't do it for me (so far).

Comment on lines 7751 to 7752
This method is equivalent to
``df.nsorted(n, columns, ascending=False)``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if this line was removed; including nsorted in the See Also section below seems sufficient.

Comment on lines 7895 to 7896
This method is equivalent to
``df.nsorted(n, columns, ascending=True)``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

n = self.n
dtype = self.obj.dtype
if not self.is_valid_dtype_n_method(dtype):
raise TypeError(f"Cannot use method '{method}' with dtype {dtype}")
raise TypeError(f"Cannot use n-sorting with dtype {dtype}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay but wonder if Cannot sort dtype {dtype} might be more clear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with your point. I updated the message.

Comment on lines +229 to +231
Series.nlargest
Series.nsmallest
Series.nsorted
Copy link
Contributor Author

@MartinBraquet MartinBraquet Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added nsorted and moved the sorting methods to the "Reshaping, sorting" section

Comment on lines +3843 to +3869
"""
Return the top `n` elements, sorted in ascending or descending order.

Parameters
----------
n : int, default 5
Return this many descending sorted values.
ascending : bool
If True, return the smallest `n` elements. If False, return
largest `n` elements.
keep : {'first', 'last', 'all'}, default 'first'
When there are duplicate values that cannot all fit in a
Series of `n` elements:

- ``first`` : return the first `n` occurrences in order
of appearance.
- ``last`` : return the last `n` occurrences in reverse
order of appearance.
- ``all`` : keep all occurrences. This can result in a Series of
size larger than `n`.

Returns
-------
Series
The `n` top values in the Series.

See Also
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since nsorted got added to the series API docs, I believe it requires its proper docstring as well. There is a lot of redundancy with nlargest and nsmallest. Let me know if it should be done differently.

@MartinBraquet
Copy link
Contributor Author

MartinBraquet commented Jul 22, 2025

I applied the comments and the checks passed, so the PR should be ready for re-review.

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small doc things I saw. Otherwise I'm OK if other maintainers are OK

MartinBraquet and others added 2 commits July 23, 2025 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Enable nsmallest/nlargest on object dtype
3 participants