Closed
Description
Most pandas functions that return a numpy array return one with a fixed, known shape. For example, index.str.contains always returns a 1D array of type np.bool, and df.to_numpy always returns a 2D array. Currently, the codebase doesn't type the shape of numpy arrays. Having shape types is useful because it allows type checkers to infer certain types more precisely and IDEs to offer better autocompletion. Consider this example:
import numpy as np
import numpy.typing as npt

a_nd: npt.NDArray[np.float64] = np.array([1, 2], dtype=np.float64)
a_1d = a_nd.ravel()  # annotated as returning 1D array in numpy
for x in a_nd:
    reveal_type(x)
for y in a_1d:
    reveal_type(y)
mypy outputs the following:
ex.py:7: note: Revealed type is "Any"
ex.py:9: note: Revealed type is "numpy.float64"
It correctly identifies the type of the second loop variable as float64 instead of Any because it knows the array is 1D.
I am exploring adding shape types to the numpy interface of pandas-stubs and modifying the test infrastructure to validate the shape as well as the dtype of arrays.
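A runtime check along the following lines could complement the static shape annotations. This is only a sketch under stated assumptions: `check_array` is a hypothetical helper, not part of the existing pandas-stubs test infrastructure, and the arrays passed to it are plain numpy stand-ins for the results of 1D-returning and 2D-returning pandas methods.

```python
import numpy as np


def check_array(arr: object, ndim: int, dtype: type[np.generic]) -> np.ndarray:
    """Hypothetical test helper (an assumption, not pandas-stubs API).

    Verify that arr is an ndarray with the expected number of
    dimensions and scalar type, then return it unchanged.
    """
    if not isinstance(arr, np.ndarray):
        raise AssertionError(f"expected np.ndarray, got {type(arr).__name__}")
    if arr.ndim != ndim:
        raise AssertionError(f"expected {ndim}D array, got {arr.ndim}D")
    if not np.issubdtype(arr.dtype, dtype):
        raise AssertionError(f"expected dtype {dtype.__name__}, got {arr.dtype}")
    return arr


# Stand-in for a method that should always return a 1D boolean array.
mask = check_array(np.array([True, False, True]), ndim=1, dtype=np.bool_)

# Stand-in for a method that should always return a 2D float array.
table = check_array(np.zeros((3, 2), dtype=np.float64), ndim=2, dtype=np.float64)
```

Checking `ndim` at runtime alongside the dtype means a stub that declares a 1D return type fails its test if the function actually returns a 2D array.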