Fix AsyncGroup.create_dataset() dtype handling and optimize tests #3050 #3059

Open: wants to merge 8 commits into base: main
1 change: 1 addition & 0 deletions changes/3050.bugfix.rst
@@ -0,0 +1 @@
+- Fixed potential error in `AsyncGroup.create_dataset()` where `dtype` argument could be missing when calling `create_array()`
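
The changelog entry describes the dtype fallback in `create_dataset()`. A minimal standalone sketch of that check follows. It assumes a hypothetical helper `resolve_dtype` and a stand-in data object that only exposes a `.dtype` attribute; neither is part of zarr, and the real method mutates `kwargs` in place rather than copying it.

```python
from types import SimpleNamespace
from typing import Any


def resolve_dtype(kwargs: dict[str, Any], data: Any = None) -> dict[str, Any]:
    """Return a copy of kwargs that is guaranteed to contain "dtype".

    Hypothetical helper mirroring the check in the diff; not zarr API.
    """
    resolved = dict(kwargs)  # copy so the caller's dict is left untouched
    if "dtype" not in resolved:
        if data is not None:
            # Infer the dtype from the data, as zarr 2.x create_dataset allowed.
            resolved["dtype"] = data.dtype
        else:
            raise ValueError("dtype must be provided if data is None")
    return resolved


# Stand-in for an array-like object; any object with a .dtype attribute works.
fake_data = SimpleNamespace(dtype="int32")

inferred = resolve_dtype({"chunks": (10,)}, data=fake_data)
explicit = resolve_dtype({"dtype": "float64"}, data=fake_data)
```

In this sketch an explicit `dtype` always takes precedence over the one carried by `data`; only the case with neither raises `ValueError`.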
102 changes: 68 additions & 34 deletions src/zarr/core/group.py
@@ -1155,32 +1155,35 @@
         # create_dataset in zarr 2.x requires shape but not dtype if data is
         # provided. Allow this configuration by inferring dtype from data if
         # necessary and passing it to create_array
-        if "dtype" not in kwargs and data is not None:
-            kwargs["dtype"] = data.dtype
+        if "dtype" not in kwargs:
+            if data is not None:
+                kwargs["dtype"] = data.dtype
+            else:
+                raise ValueError("dtype must be provided if data is None")
         array = await self.create_array(name, shape=shape, **kwargs)
         if data is not None:
             await array.setitem(slice(None), data)
         return array

-    @deprecated("Use AsyncGroup.require_array instead.")
-    async def require_dataset(
+    @deprecated("Use Group.require_array instead.")
+    def require_dataset(
         self,
         name: str,
         *,
-        shape: ChunkCoords,
+        shape: ShapeLike,
         dtype: npt.DTypeLike = None,
         exact: bool = False,
         **kwargs: Any,
-    ) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
+    ) -> Array:
         """Obtain an array, creating if it doesn't exist.

         .. deprecated:: 3.0.0
-            The h5py compatibility methods will be removed in 3.1.0. Use `AsyncGroup.require_dataset` instead.
+            The h5py compatibility methods will be removed in 3.1.0. Use `Group.require_array` instead.

         Arrays are known as "datasets" in HDF5 terminology. For compatibility
-        with h5py, Zarr groups also implement the :func:`zarr.AsyncGroup.create_dataset` method.
+        with h5py, Zarr groups also implement the :func:`zarr.Group.create_dataset` method.

-        Other `kwargs` are as per :func:`zarr.AsyncGroup.create_dataset`.
+        Other `kwargs` are as per :func:`zarr.Group.create_array`.

         Parameters
         ----------
@@ -1189,16 +1192,16 @@
         shape : int or tuple of ints
             Array shape.
         dtype : str or dtype, optional
-            NumPy dtype.
+            NumPy dtype. If None, the dtype will be inferred from the existing array.
         exact : bool, optional
-            If True, require `dtype` to match exactly. If false, require
+            If True, require `dtype` to match exactly. If False, require
             `dtype` can be cast from array dtype.

         Returns
         -------
-        a : AsyncArray
+        a : Array
         """
-        return await self.require_array(name, shape=shape, dtype=dtype, exact=exact, **kwargs)
+        return self.require_array(name, shape=shape, dtype=dtype, exact=exact, **kwargs)

     async def require_array(
         self,
@@ -1211,7 +1214,7 @@
     ) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
         """Obtain an array, creating if it doesn't exist.

-        Other `kwargs` are as per :func:`zarr.AsyncGroup.create_dataset`.
+        Other `kwargs` are as per :func:`zarr.AsyncGroup.create_array`.

         Parameters
         ----------
@@ -1220,9 +1223,9 @@
         shape : int or tuple of ints
             Array shape.
         dtype : str or dtype, optional
-            NumPy dtype.
+            NumPy dtype. If None, the dtype will be inferred from the existing array.
         exact : bool, optional
-            If True, require `dtype` to match exactly. If false, require
+            If True, require `dtype` to match exactly. If False, require
             `dtype` can be cast from array dtype.

         Returns
@@ -2511,65 +2514,96 @@
         .. deprecated:: 3.0.0
             The h5py compatibility methods will be removed in 3.1.0. Use `Group.create_array` instead.

-
         Arrays are known as "datasets" in HDF5 terminology. For compatibility
-        with h5py, Zarr groups also implement the :func:`zarr.Group.require_dataset` method.
+        with h5py, Zarr groups also implement the :func:`zarr.AsyncGroup.require_dataset` method.

         Parameters
         ----------
         name : str
             Array name.
         **kwargs : dict
-            Additional arguments passed to :func:`zarr.Group.create_array`
+            Additional arguments passed to :func:`zarr.AsyncGroup.create_array`.

         Returns
         -------
-        a : Array
+        a : AsyncArray
         """
         return Array(self._sync(self._async_group.create_dataset(name, **kwargs)))

     @deprecated("Use Group.require_array instead.")
-    def require_dataset(self, name: str, *, shape: ShapeLike, **kwargs: Any) -> Array:
+    def require_dataset(
+        self,
+        name: str,
+        *,
+        shape: ShapeLike,
+        dtype: npt.DTypeLike = None,
+        exact: bool = False,
+        **kwargs: Any,
+    ) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
         """Obtain an array, creating if it doesn't exist.

         .. deprecated:: 3.0.0
             The h5py compatibility methods will be removed in 3.1.0. Use `Group.require_array` instead.

         Arrays are known as "datasets" in HDF5 terminology. For compatibility
-        with h5py, Zarr groups also implement the :func:`zarr.Group.create_dataset` method.
+        with h5py, Zarr groups also implement the :func:`zarr.AsyncGroup.create_dataset` method.

-        Other `kwargs` are as per :func:`zarr.Group.create_dataset`.
+        Other `kwargs` are as per :func:`zarr.AsyncGroup.create_array`.

         Parameters
         ----------
         name : str
             Array name.
-        **kwargs :
-            See :func:`zarr.Group.create_dataset`.
+        shape : int or tuple of ints
+            Array shape.
+        dtype : str or dtype, optional
+            NumPy dtype. If None, the dtype will be inferred from the existing array.
+        exact : bool, optional
+            If True, require `dtype` to match exactly. If False, require
+            `dtype` can be cast from array dtype.

         Returns
         -------
-        a : Array
+        a : AsyncArray
         """
-        return Array(self._sync(self._async_group.require_array(name, shape=shape, **kwargs)))
+        return self.require_array(name, shape=shape, dtype=dtype, exact=exact, **kwargs)

-    def require_array(self, name: str, *, shape: ShapeLike, **kwargs: Any) -> Array:
+    def require_array(
+        self,
+        name: str,
+        *,
+        shape: ShapeLike,
+        dtype: npt.DTypeLike = None,
+        exact: bool = False,
+        **kwargs: Any,
+    ) -> Array:
         """Obtain an array, creating if it doesn't exist.

-        Other `kwargs` are as per :func:`zarr.Group.create_array`.
+        Other `kwargs` are as per :func:`zarr.AsyncGroup.create_array`.

         Parameters
         ----------
         name : str
             Array name.
-        **kwargs :
-            See :func:`zarr.Group.create_array`.
+        shape : int or tuple of ints
+            Array shape.
+        dtype : str or dtype, optional
+            NumPy dtype. If None, the dtype will be inferred from the existing array.
+        exact : bool, optional
+            If True, require `dtype` to match exactly. If False, require
+            `dtype` can be cast from array dtype.

         Returns
         -------
-        a : Array
+        a : AsyncArray
         """
-        return Array(self._sync(self._async_group.require_array(name, shape=shape, **kwargs)))
+        return Array(
+            self._sync(
+                self._async_group.require_array(
+                    name, shape=shape, dtype=dtype, exact=exact, **kwargs
+                )
+            )
+        )

     @_deprecate_positional_args
     def empty(self, *, name: str, shape: ChunkCoords, **kwargs: Any) -> Array:
@@ -2918,7 +2952,7 @@
     This function will parse its input to ensure that the hierarchy is complete. Any implicit groups
     will be inserted as needed. For example, an input like
     ```{'a/b': GroupMetadata}``` will be parsed to
-    ```{'': GroupMetadata, 'a': GroupMetadata, 'b': Groupmetadata}```
+    ```{'': GroupMetadata, 'a': GroupMetadata, 'b': Groupmetadata}```.

     After input parsing, this function then creates all the nodes in the hierarchy concurrently.

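
The docstring in the last hunk says implicit groups are inserted so that the hierarchy is complete. A rough sketch of that parsing step follows. It assumes every node is keyed by its full path (so the leaf stays `'a/b'`), and it uses a hypothetical `fill_implicit_groups` helper with a placeholder string standing in for `GroupMetadata`; it is not the function touched by this diff.

```python
from typing import Any

IMPLICIT_GROUP = "GroupMetadata"  # placeholder for a real metadata object


def fill_implicit_groups(nodes: dict[str, Any]) -> dict[str, Any]:
    """Add the root and every missing parent path as an implicit group."""
    complete: dict[str, Any] = {"": IMPLICIT_GROUP}  # root is always present
    for path in nodes:
        parts = path.split("/") if path else []
        # Walk each proper prefix of the path: "a/b/c" -> "a", "a/b".
        for depth in range(1, len(parts)):
            complete.setdefault("/".join(parts[:depth]), IMPLICIT_GROUP)
    # Explicitly supplied nodes override any implicit placeholder.
    complete.update(nodes)
    return complete


parsed = fill_implicit_groups({"a/b": "explicit-node"})
```

Here `parsed` holds the root `''`, the implicit parent `'a'`, and the explicit node `'a/b'`; concurrent creation of the nodes is outside the scope of the sketch.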