
Conversation

@FBruzzesi
Member

@FBruzzesi FBruzzesi commented Nov 6, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Big overhaul of the API Completeness generation. I realized we have quite a few exceptional cases to deal with 🙈
Now everything should render as expected, but please double-check. The top-level functionality, parsed through the namespaces, is also added on a new page.

The JavaScript makes the tables sortable and adds a free-text filter by method name. Disclaimer: the JS part is partially AI-generated, partially from the mkdocs-material docs.

[screenshot]

I tried using mkdocs-marimo, which is indeed great: tables come full of features out of the box, and it was super easy to add a dropdown filter to select which backends to keep in the table. Yet:

  • docs build time went from 3 seconds to 45-50 seconds and I had no patience for that
  • the table rendering was a bit off, and not as clean as in marimo notebooks themselves

@FBruzzesi FBruzzesi added the documentation Improvements or additions to documentation label Nov 6, 2025
Member

@MarcoGorelli MarcoGorelli left a comment


thanks @FBruzzesi, I tried building this locally and it looks great!

my only comment would be: can we remove deprecated methods from the api completeness page? like LazyFrame.tail?

@FBruzzesi
Member Author

Thanks @MarcoGorelli - addressed the deprecated methods in 2823ebe

While doing so, I noticed that we don't mention LazyFrame.tail in the main vs stable.v1 differences section.
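A generic way to filter such methods out is sketched below. This is illustrative only: the detection rule (a deprecation marker in the docstring) is an assumption, not necessarily what the linked commit does, and the `Demo` class is made up.

```python
# Sketch: skip deprecated methods when collecting names for the docs.
# The "deprecated appears in the docstring" rule is an assumption.
import inspect


def is_deprecated(obj: object) -> bool:
    doc = inspect.getdoc(obj) or ""
    return "deprecated" in doc.casefold()


class Demo:
    def tail(self) -> None:
        """Return the last rows.

        .. deprecated:: 1.0
        """

    def head(self) -> None:
        """Return the first rows."""


# Collect only the non-deprecated methods for the completeness table.
names = [
    name
    for name, member in inspect.getmembers(Demo, inspect.isfunction)
    if not is_deprecated(member)
]
print(names)  # ['head']
```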


I will wait for @dangotbanned's opinion before merging 🙏🏼

@dangotbanned dangotbanned self-requested a review November 7, 2025 09:37
@dangotbanned
Member

@FBruzzesi thanks for the summons 😉, screenshot looks great!

I need to take this for a spin locally so bear with me please!

For now though, I have some general comments from looking at the diff:

  1. AFAICT, there isn't anything here to ensure the script stays in sync?
    i. Fixing that part of (docs: API Completeness page #2858) like a one-off event and hoping we don't regress seems ... hopeful?
  2. It looks like there's multiple reimplementations of things from (https://github.com/narwhals-dev/narwhals/blob/2823ebee644098c9200f1658888c522803240825/utils/check_api_reference.py)
    i. Can anything be shared/reused?
  3. I know OOP gets a bad rep in data circles 😂, but since the topic is what do these classes do?, would it not be appropriate here?
    i. E.g. there are loads of functions that have a docstring and comments on every line (🙄✨?).
    Future you (and current me 😉) will thank you if the code explains itself 🙏

@FBruzzesi
Member Author

Thanks for the perspective @dangotbanned

Let me add a general comment that kind of addresses all the points below: I treated this as a one-off effort; I definitely didn't invest the same time and effort as I would for the main codebase. I mostly focused on going from broken to working, even cutting corners when possible 🙈

  1. AFAICT, there isn't anything here to ensure the script stays in sync?
    i. Fixing that part of (docs: API Completeness page #2858) like a one-off event and hoping we don't regress seems ... hopeful?

Do you have any idea how to make it a guarantee? I find it hard to make it "stable forever" considering the underlying target might continuously change. I focused on avoiding false positives and false negatives, which led to a lot of "exception" cases to handle. At least those are tracked now, but some depend on the current implementation in narwhals itself. I can add TODO comments based on potential/expected changes.

  2. It looks like there's multiple reimplementations of things from (https://github.com/narwhals-dev/narwhals/blob/2823ebee644098c9200f1658888c522803240825/utils/check_api_reference.py)
    i. Can anything be shared/reused?

I honestly didn't even try to check/compare it with the other scripts.

  3. I know OOP gets a bad rep in data circles 😂, but since the topic is what do these classes do?, would it not be appropriate here?
    i. E.g. there are loads of functions that have a docstring and comments on every line (🙄✨?).
    Future you (and current me 😉) will thank you if the code explains itself 🙏

That's fair, I will try to put in some more engineering effort. Maybe Claude can come in handy for refactoring a script like this.

@dangotbanned
Member

(#3285 (comment)) Thanks for explaining @FBruzzesi

Do you have any idea how to make it a guarantee?
I find it hard to make it "stable forever" considering the underlying target might continuously change.

Well, to make things easier, you'll be glad to hear that I'm not expecting a guarantee of forever-stability 😄

Important

What I want to see is an effort to minimize the burden of keeping this in sync


Before I suggest anything, I just want to highlight this comment, as I think we're viewing the issue of (#2858) quite differently:

I treated this as a one-off effort; I definitely didn't invest the same time and effort as I would for the main codebase. I mostly focused on going from broken to working, even cutting corners when possible

In my mind, the API Completeness docs should be seen as one of our most important assets.
If a package is considering adopting narwhals, this might be the last thing they see before deciding they can't make the switch.
So if what's documented isn't accurate, then all the effort you (obviously) put into other areas of narwhals may not get the attention it deserves ❤️

@dangotbanned
Member

Suggestions

The public API is already defined and checked for documentation in (https://github.com/narwhals-dev/narwhals/blob/2823ebee644098c9200f1658888c522803240825/utils/check_api_reference.py)

If we moved some of those utilities somewhere we can import from, without an error like:

from utils.check_api_reference import iter_api_reference_names
import narwhals as nw

frozenset(iter_api_reference_names(nw.DataFrame))

An exception has occurred, use %tb to see the full traceback.
SystemExit: 0

Then we'd have a starting point like this, which is known to be in sync:

frozenset({'clone',
           'collect_schema',
           'columns',
           'drop',

           'write_csv',
           'write_parquet'})
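One standard way to make that import side-effect free, sketched below under the assumption that the SystemExit comes from module-level script code: move the checks behind a `main()` guarded by `if __name__ == "__main__":`. The body of `iter_api_reference_names` here is a stand-in, not the real helper in utils/check_api_reference.py.

```python
# Sketch: restructure a check script so its helpers are importable.
# `iter_api_reference_names` is illustrative; the real one parses the docs.
import sys


def iter_api_reference_names(cls: type):
    """Yield public attribute names of a class (stand-in for the real helper)."""
    for name in dir(cls):
        if not name.startswith("_"):
            yield name


def main() -> int:
    # All side effects (printing, comparisons, exiting) live here,
    # not at import time, so `from utils... import ...` is safe.
    names = frozenset(iter_api_reference_names(dict))
    print(len(names))
    return 0


if __name__ == "__main__":
    # Only exit when run as a script.
    sys.exit(main())
```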

Next, since we want to know more about Compliant* implementations, we can compare these names against what is defined in the Compliant* protocol:

from narwhals._compliant import CompliantDataFrame
from typing_extensions import get_protocol_members

get_protocol_members(CompliantDataFrame)
Show biiiiiiig output

frozenset({'__array__',
           '__getitem__',
           '__len__',
           '__narwhals_dataframe__',
           '__narwhals_namespace__',
           '__native_namespace__',
           '_implementation',
           '_is_native',
           '_native_frame',
           '_version',
           '_with_native',
           '_with_version',
           'clone',
           'collect_schema',
           'columns',
           'drop',
           'drop_nulls',
           'estimated_size',
           'explode',
           'filter',
           'from_arrow',
           'from_dict',
           'from_dicts',
           'from_native',
           'from_numpy',
           'gather_every',
           'get_column',
           'group_by',
           'head',
           'is_unique',
           'item',
           'iter_columns',
           'iter_rows',
           'join',
           'join_asof',
           'lazy',
           'native',
           'pivot',
           'rename',
           'row',
           'rows',
           'sample',
           'schema',
           'select',
           'shape',
           'simple_select',
           'sort',
           'to_arrow',
           'to_dict',
           'to_narwhals',
           'to_numpy',
           'to_pandas',
           'to_polars',
           'unique',
           'unpivot',
           'with_columns',
           'with_row_index',
           'write_csv',
           'write_parquet'})

So now the problem is about comparing 2 sets of well-defined interfaces 🙂
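A minimal sketch of that comparison. The two frozensets below are tiny made-up subsets standing in for the real outputs above, and the filtering rule (drop `_`-prefixed names before diffing) is an assumption:

```python
# Sketch: diff a documented public API against a protocol's members.
# Toy data; in practice these would come from the API reference docs
# and from get_protocol_members(CompliantDataFrame).
public_names = frozenset({"clone", "columns", "drop", "write_csv", "pivot"})
protocol_members = frozenset(
    {"__len__", "_version", "clone", "columns", "drop", "write_csv"}
)


def public(names: frozenset) -> frozenset:
    """Keep only names that belong in user-facing docs."""
    return frozenset(n for n in names if not n.startswith("_"))


# Documented but absent from the protocol -> missing coverage.
missing = public_names - public(protocol_members)
# In the protocol but undocumented -> docs candidates (or edge cases).
undocumented = public(protocol_members) - public_names

print(sorted(missing))       # ['pivot']
print(sorted(undocumented))  # []
```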

I'd say the next part would be looking at what you currently have defined as edge cases and seeing if any patterns emerge.
Keeping in mind that:

  1. The size of these APIs is expected to grow
  2. There are sub-protocols and concrete classes implementing them
  3. A certain BDFL is prone to large refactors of the internals 😉

@FBruzzesi
Member Author

Thanks @dangotbanned, we agree on everything you said in the last two comments.

If I focus on:

What I want to see is an effort to minimize the burden of keeping this in sync

I would say that the changes here make it quite easy to:

  • extend special cases
  • keep almost everything else in sync

which in my mind makes the what well aligned with the purpose we want to achieve.

I treated this as a one-off effort; I definitely didn't invest the same time and effort as I would for the main codebase. I mostly focused on going from broken to working, even cutting corners when possible

This comment of mine was most likely taken to an extreme, which I hope is not what the final result and implementation reflect.

From your previous comment:

I know OOP gets a bad rep in data circles 😂, but since the topic is what do these classes do?, would it not be appropriate here?

my understanding is that you are not convinced by the how of this implementation.

I can try to propose a different, more OOP-based approach, but for now I don't see a big gap in what we can achieve either way.

@dangotbanned
Member

my understanding is that you are not convinced by the how of this implementation.

Yeah, sorry for not being clearer - I do like the output (what)! 😄

Forget that I mentioned OOP for now; here are two examples of structuring things differently that I've worked on:

1

This module parses markdown docs and transforms the output into this python module

2

This module parses json into an intermediate representation to generate typing and a few different summary formats

Why is this relevant?

  • They operate on input outside of their control
  • The input is expected to change
  • They need to handle lots of hairy edge cases

To anticipate that the input may change, the input is moved into an intermediate representation first, and then that more stable version is transformed into some output(s).

This means upstream changes that break things should only impact one side of the problem.
Even within the first translation - I tried to generalize edge cases as much as possible, such that you will rarely see conditions comparing a value to a string defined inline.
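As a toy illustration of that split (a sketch only; none of these names exist in the narwhals codebase), the parsing side produces a small intermediate record, and the rendering side consumes only that record:

```python
# Sketch: decouple "read the source" from "write the output" with a
# small intermediate representation (IR). All names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodInfo:
    name: str
    backends: frozenset  # backends that implement this method


def parse(raw: dict) -> list:
    """Upstream-facing side: only this changes if the input format changes."""
    return [MethodInfo(name, frozenset(bs)) for name, bs in sorted(raw.items())]


def render_markdown(methods: list, backends: list) -> str:
    """Output-facing side: consumes the IR, never the raw input."""
    header = "| method | " + " | ".join(backends) + " |"
    rows = [
        "| " + m.name + " | "
        + " | ".join("yes" if b in m.backends else "no" for b in backends)
        + " |"
        for m in methods
    ]
    return "\n".join([header, *rows])


table = render_markdown(
    parse({"drop": ["pandas", "polars"], "pivot": ["pandas"]}),
    ["pandas", "polars"],
)
print(table)
```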

Why would more structure benefit this script?

How much would break here following any of these changes?
Where would these changes need to be made?
Would all of the current edge casing still make sense when the underlying backends have changed?

From my perspective, I think we'd need to make changes all over the script 🤔

@FBruzzesi
Member Author

FBruzzesi commented Nov 9, 2025

To anticipate that the input may change, the input is moved into an intermediate representation first, and then that more stable version is transformed into some output(s).
This means upstream changes that break things should only impact one side of the problem. Even within the first translation - I tried to generalize edge cases as much as possible, such that you will rarely see conditions comparing a value to a string defined inline.

I agree this would be a likable thing to have, although this parsing is so simple to do that an intermediate layer, for me, is just more complexity, almost for its own sake.

The next point is where we most likely disagree on:

From my perspective, I think we'd need to make changes all over the script 🤔

How much would break here following any of these changes? Where would these changes need to be made? Would all of the current edge casing still make sense when the underlying backends have changed?

Realistically, switch one pointer. I am not sure how to answer this one, since it might depend on what the implementation entails. But it would be the same with whatever implementation we choose.

Nothing to change for new methods, they are automatically picked up

Simplify/remove one special case that currently exists because the logic is entirely at the narwhals level

Automatically picked up


One thing to stress from my side: I would rather keep/maintain a mapping of {backend: non-implemented methods}, since in all honesty they are so few at the moment, than have to maintain an overly complex codebase. For instance, in the API reference check we have a manual list of SERIES_ONLY methods, and I don't think we have many more "not implemented" methods to track. That said, the only way to know is to try it out.
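That mapping could be as small as the sketch below. The backend keys are real backend names, but the method entries are placeholders for illustration, not narwhals' actual gaps; anything not listed defaults to implemented, so new methods are picked up with no edit here.

```python
# Sketch of a manually maintained {backend: non-implemented methods} map.
# Method names below are placeholders, not narwhals' actual gaps.
NOT_IMPLEMENTED = {
    "pandas": frozenset(),
    "pyarrow": frozenset({"some_method"}),
    "duckdb": frozenset({"some_method", "another_method"}),
}


def is_implemented(backend: str, method: str) -> bool:
    # Default-to-implemented: a brand new method needs no edit here,
    # and an unknown backend is assumed fully covered.
    return method not in NOT_IMPLEMENTED.get(backend, frozenset())


print(is_implemented("pandas", "brand_new_method"))  # True
print(is_implemented("duckdb", "another_method"))    # False
```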

@dangotbanned
Copy link
Member

(#3285 (comment))

No worries @FBruzzesi, I'm happy to agree to disagree on this and move past it 😄


I'll try to take another look at the visual part of the PR today.

Being able to filter the backend axis of the table (#2858 (comment)) was something that felt missing to me, IIRC

@FBruzzesi
Member Author

FBruzzesi commented Nov 10, 2025

I'll try to take another look at the visual part of the PR today.

Being able to filter the backend axis of the table (#2858 (comment)) was something that felt missing to me, IIRC

I hope that 44601f6 solves the last bit (as you might guess from the commit message, it wasn't me).

[screenshot]


Development

Successfully merging this pull request may close these issues.

docs: API Completeness page
