-
Notifications
You must be signed in to change notification settings - Fork 172
docs: API Completeness overhaul #3285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
MarcoGorelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @FBruzzesi , I tried building this locally and it looks great!
my only comment would be: can we remove deprecated methods from the api completeness page? like LazyFrame.tail?
|
Thanks @MarcoGorelli - addressed the deprecated methods in 2823ebe While doing so, I noticed that we don't have a mention for I will wait for @dangotbanned opinion before merging ๐๐ผ |
|
@FBruzzesi thanks for the summons ๐, screenshot looks great! I need to take this for a spin locally so bear with me please! For now though, I have some general comments from looking at the diff:
|
|
Thanks for the perspective @dangotbanned Let me add a general comment that kind of addresses all the points below: I treated this as a one off effort, definitely I didn't invest the same time and effort as I would for the main codebase. I mostly focused on going from broke to working, even cutting corners when possible ๐
Do you have any idea on how to make it a guarantee? I see it hard to make it "stable forever" considering the underlying target might continuously change. I focus on avoiding false positive and false negative, which led to a lot of "exception" cases to handle. At least those are tracked now, but some depends on the current implementation in narwhals itself. I can add TODO comments based on potential/expected changes.
I honestly didn't even try to check/compare it with other scripts
That's fair, I will try to put some more engineering effort. Maybe Claude can come handy in refactoring a script like this |
|
(#3285 (comment)) Thanks for explaining @FBruzzesi
Well to make things easier, you'll be glad to hear that I'm not expecting a guarantee of forever-stability ๐ Important What I want to see is an effort to minimize the burden of keeping this in sync Before I suggest anything, I just wanna highlight this comment as I think we're viewing the issue of (#2858) quite differently:
In my mind, the API Completeness docs should be seen as one of the most important assets we have. |
SuggestionsThe public API is already defined and checked for documentation in (https://github.com/narwhals-dev/narwhals/blob/2823ebee644098c9200f1658888c522803240825/utils/check_api_reference.py) If we moved some of those utilities into somewhere we can import from, without an error like: from utils.check_api_reference import iter_api_reference_names
import narwhals as nw
frozenset(iter_api_reference_names(nw.DataFrame))
An exception has occurred, use %tb to see the full traceback.
SystemExit: 0Then we'd have a starting point like this, which is known to be in sync: frozenset({'clone',
'collect_schema',
'columns',
'drop',
'write_csv',
'write_parquet'})Next, since we want to know more about from narwhals._compliant import CompliantDataFrame
from typing_extensions import get_protocol_members
get_protocol_members(CompliantDataFrame)Show biiiiiiig output
frozenset({'__array__',
'__getitem__',
'__len__',
'__narwhals_dataframe__',
'__narwhals_namespace__',
'__native_namespace__',
'_implementation',
'_is_native',
'_native_frame',
'_version',
'_with_native',
'_with_version',
'clone',
'collect_schema',
'columns',
'drop',
'drop_nulls',
'estimated_size',
'explode',
'filter',
'from_arrow',
'from_dict',
'from_dicts',
'from_native',
'from_numpy',
'gather_every',
'get_column',
'group_by',
'head',
'is_unique',
'item',
'iter_columns',
'iter_rows',
'join',
'join_asof',
'lazy',
'native',
'pivot',
'rename',
'row',
'rows',
'sample',
'schema',
'select',
'shape',
'simple_select',
'sort',
'to_arrow',
'to_dict',
'to_narwhals',
'to_numpy',
'to_pandas',
'to_polars',
'unique',
'unpivot',
'with_columns',
'with_row_index',
'write_csv',
'write_parquet'})So now the problem is about comparing 2 sets of well-defined interfaces ๐ I'd say the next part would be looking at what you currently have defined as edge-cases and see if any patterns emerge.
|
|
Thanks @dangotbanned, we agree on everything you said in the last two comments. If I focus on:
I would say that the changes here makes it quite easy to:
which in my mind makes the what really aligned with the purpose we want to achieve.
This comment of mine was most likely taken to an extreme, which I hope it's not what the final result and implementation are reflecting. From your previous comment:
my understanding is that you are not convinced for the how this is implemented. I can try to propose a different approach, which is more OOP based, but for now I don't see a big gap in what we can achieve either way. |
Yeah sorry for not being clearer - I do like the output (what)! ๐ Forget that I mentioned OOP for now, here are two examples of structuring things differently, that I've worked on 1This module parses markdown docs and transforms the output into this python module 2This module parses json into an intermediate representation to generate typing and a few different summary formats Why is this relevant?
To anticipate that the input may change, the input is moved into an intermediate representation first, and then I transformed that more stable version into some output(s). This means upstream changes that break things should only impact one side of the problem. Why would more structure benefit this script?How much would break here following any of these changes?
From my perspective, I think we'd need to make changes all over the script ๐ค |
I agree this is something likable to have, although it's so simple to do these parsing that the need for a intermediate layer for me is just more complexity, almost for the sake of it. The next point is where we most likely disagree on:
Realistically switch one pointer. I am not sure how to answer this one, since it might depend on what the implementation entail. But it would be the same with whatever implementation we choose. Nothing to change for new methods, they are automatically picked up Simplify/remove one special case that's currently exists because the logic is entirely at the narwhals level Automatically picked up One thing to stress from my side: I would rather keep/maintain a mapping of {backend: non-implemented methods}, since in all honesty they are so few at the moment, than having to maintain a overly complex codebase. For instance, in the API reference check, we have a manual list of |
|
No worries @FBruzzesi, I'm happy to agree to disagree on this and move past it ๐ I'll try to take another look at the visual part of the PR today. Being able to filter the backend-axis the table (#2858 (comment)) was something that felt missing to me IIRC |
I hope that 44601f6 solves the last bit (as you might guess from the commit message, it wasn't me) |

What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Big overhaul for the API Completeness generation. I realized we have quite a few exceptional cases to deal with ๐
Now everything should render as expected but please double check. The top level functionalities through namespaces parsing are also added in a new page.
The javascript allows to have the tables to be sortable and with a free text filter by method name. Disclaimer: JS part is partially AI generated, partially from the mkdocs-material docs
screenshot
I tried using mkdocs-marimo, which is indeed great as tables come full of features out of the box and it was super easy to add the dropdown filter to select which backends to keep in the table. Yet: