Shape types for numpy arrays in return types #1324

@hamdanal

Description

Most pandas functions that return a numpy array return one with a fixed and known shape. For example, index.str.contains always returns a 1D array of type np.bool and df.to_numpy always returns a 2D array. Currently, the codebase doesn't type the shape of numpy arrays. Having the shape types is super useful, as it allows type checkers to infer more precise types and IDEs to offer better autocompletion. Consider this example:

import numpy as np
import numpy.typing as npt

a_nd: npt.NDArray[np.float64] = np.array([1, 2], dtype=np.float64)
a_1d = a_nd.ravel()  # annotated as returning 1D array in numpy
for x in a_nd:
    reveal_type(x)
for y in a_1d:
    reveal_type(y)

mypy outputs the following:

ex.py:7: note: Revealed type is "Any"
ex.py:9: note: Revealed type is "numpy.float64"

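mypy correctly identifies the type of the second loop variable, y, as float64 instead of Any because it knows that a_1d is 1D. For a_nd, whose shape is not typed, iteration could yield either scalars or sub-arrays, so x falls back to Any.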

I am exploring adding shape types to the numpy interface of pandas-stubs and modifying the test infrastructure to validate the shape as well as the dtype of arrays.
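
To make the idea concrete, here is a rough sketch of what shape-typed annotations and a shape-aware test check could look like. The alias names (BoolArray1D, FloatArray2D) and the helper check_array are hypothetical and do not exist in numpy or pandas-stubs; the sketch also assumes Python 3.9+ and a numpy version that allows subscripting np.ndarray at runtime.

```python
import numpy as np

# Hypothetical aliases for illustration only; not part of numpy or pandas-stubs.
# The first type parameter of np.ndarray is the shape, the second is the dtype.
BoolArray1D = np.ndarray[tuple[int], np.dtype[np.bool_]]
FloatArray2D = np.ndarray[tuple[int, int], np.dtype[np.float64]]


def check_array(arr: object, ndim: int, dtype: type[np.generic]) -> np.ndarray:
    """Hypothetical test helper: validate the number of dimensions and the dtype."""
    assert isinstance(arr, np.ndarray), f"expected ndarray, got {type(arr)}"
    assert arr.ndim == ndim, f"expected {ndim}D array, got {arr.ndim}D"
    assert np.issubdtype(arr.dtype, dtype), f"expected {dtype}, got {arr.dtype}"
    return arr


# Stand-ins for the pandas results mentioned above, built directly with numpy.
mask = check_array(np.array([True, False]), ndim=1, dtype=np.bool_)
table = check_array(np.zeros((2, 3)), ndim=2, dtype=np.float64)
```

The runtime check only covers the number of dimensions and the dtype; the static side would still rely on the stubs declaring return types along the lines of the aliases above.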
