Add asynchronous load method #10327

TomNicholas · 2025-05-16T16:05:49Z

Adds an .async_load() method to Variable, which works by plumbing async get_duck_array all the way down until it finally gets to the async methods zarr v3 exposes.

Needs a lot of refactoring before it could be merged, but it works.

Closes Add an asynchronous load method? #10326
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

API:


TomNicholas · 2025-08-11T16:35:46Z

These test failures are actually non-deterministic. The test tries to do vectorized indexing, and expects an error about vectorized indexing not being supported to be raised. But the test sometimes fails because an error about orthogonal indexing not being supported is raised instead.

What seems to be happening is that for the exact same indexer, sometimes the indexing call goes through the vectorized indexing codepath first and sometimes it goes through the orthogonal indexing codepath first. I think in both cases it gives the same result, but the order of execution can differ.

This script replicates the behaviour on this branch. If you run it repeatedly you will find that the behaviour changes between runs, as the error raised is inconsistent.

#!/usr/bin/env python3
"""
Standalone reproducer for the flaky async test behavior.
"""

import asyncio
import xarray as xr
import zarr
from xarray.tests.test_dataset import create_test_data


async def test_flaky_behavior():
    """Reproduce the exact test scenario that shows flaky behavior."""
    
    # Create zarr store with format 3
    memorystore = zarr.storage.MemoryStore({})
    ds = create_test_data()
    ds.to_zarr(memorystore, zarr_format=3, consolidated=False)
    
    # Open the dataset
    ds = xr.open_zarr(memorystore, consolidated=False, chunks=None)
    
    # Create the exact same indexer as the failing test
    indexer = {
        "dim1": xr.Variable(data=[2, 3], dims="points"),
        "dim2": xr.Variable(data=[1, 3], dims="points"),
    }
    
    # Apply isel and try load_async
    try:
        await ds.isel(**indexer).load_async()
        print("ERROR: Should have raised NotImplementedError!")
    except NotImplementedError as e:
        error_msg = str(e)
        if "vectorized async indexing" in error_msg:
            print("VECTORIZED")
        elif "orthogonal async indexing" in error_msg:
            print("ORTHOGONAL")  
        else:
            print(f"OTHER: {error_msg}")


if __name__ == "__main__":
    asyncio.run(test_flaky_behavior())

This other script replicates similar behaviour on main. To reveal the use of different codepaths this second script requires inserting debugging print statements. If you run this repeatedly you will see the order of the print statements changes between runs.

#!/usr/bin/env python3
"""
Test sync .load() to see which indexing codepath is taken.
"""

import xarray as xr
import zarr
from xarray.tests.test_dataset import create_test_data


def test_sync_load():
    """Test with sync .load() instead of .load_async()"""
    
    # Create zarr store with format 3
    memorystore = zarr.storage.MemoryStore({})
    ds = create_test_data()
    ds.to_zarr(memorystore, zarr_format=3, consolidated=False)
    
    # Open the dataset
    ds = xr.open_zarr(memorystore, consolidated=False, chunks=None)
    
    # Create the exact same indexer as the failing test
    indexer = {
        "dim1": xr.Variable(data=[2, 3], dims="points"),
        "dim2": xr.Variable(data=[1, 3], dims="points"),
    }
    
    # Apply isel and load (sync)
    result = ds.isel(**indexer).load()
    print("SYNC_LOAD_COMPLETED")


if __name__ == "__main__":
    test_sync_load()

# need to add these debugging print statements
class ZarrArrayWrapper:
    def __getitem__(self, key):
        array = self._array
        if isinstance(key, indexing.BasicIndexer):
            print(f"DEBUG: SYNC BasicIndexer: {key}")
            method = self._getitem
        elif isinstance(key, indexing.VectorizedIndexer):
            print(f"DEBUG: SYNC VectorizedIndexer: {key}")
            method = self._vindex
        elif isinstance(key, indexing.OuterIndexer):
            print(f"DEBUG: SYNC OuterIndexer: {key}")
            method = self._oindex

I think this is somehow to do with variable or indexer ordering not being deterministic - which could be due to use of dicts internally perhaps?

I can hide this weirdness by simply changing my test to be happy with either error. But I don't know if this is indicative of a bug that needs to be fixed.

keewis · 2025-08-11T16:51:15Z

> which could be due to use of dicts internally perhaps?

dict iteration order has been deterministic (insertion order) since Python 3.7; what you're looking for is set.
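To make the distinction concrete, here is a small sketch in plain Python. The variable names are made up for the example and are not taken from xarray. Iterating a dict follows insertion order, while the iteration order of a set of strings depends on string hashes, which CPython randomizes per process unless PYTHONHASHSEED is fixed.

```python
names = ["var1", "var2", "var3"]

# A dict preserves insertion order (a language guarantee since Python 3.7).
as_dict = dict.fromkeys(names)
assert list(as_dict) == names

# A set has the same members but makes no promise about iteration order.
as_set = set(names)
assert sorted(as_set) == names

# Run the script several times: this line may print a different order each run.
print(list(as_set))
```

Running it repeatedly with and without `PYTHONHASHSEED=0` in the environment is a quick way to check whether an ordering difference comes from a set.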

Either way, the decision on whether to use basic, orthogonal, or vectorized indexing depends on the types of indexers you pass in. According to

xarray/xarray/core/variable.py

Lines 661 to 672 in 54ac2fe

    
dims = []
for k, d in zip(key, self.dims, strict=True):
    if isinstance(k, Variable):
        if len(k.dims) > 1:
            return self._broadcast_indexes_vectorized(key)
        dims.append(k.dims[0])
    elif not isinstance(k, integer_types):
        dims.append(d)
if len(set(dims)) == len(dims):
    return self._broadcast_indexes_outer(key)
return self._broadcast_indexes_vectorized(key)

the presence of two variable indexers with a single, common dimension should go into _broadcast_indexes_vectorized, which should not return outer indexers.
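As a rough, standalone illustration of that excerpt, the sketch below re-implements only the quoted lines. `FakeVariable` and `classify` are hypothetical names invented for this example, and any checks that come before the excerpt in the real method are not modelled.

```python
from dataclasses import dataclass


@dataclass
class FakeVariable:
    """Hypothetical stand-in for xarray.Variable; only `dims` matters here."""

    dims: tuple


def classify(key, var_dims):
    """Mirror the quoted excerpt and return "outer" or "vectorized"."""
    dims = []
    for k, d in zip(key, var_dims):
        if isinstance(k, FakeVariable):
            if len(k.dims) > 1:
                # A multi-dimensional indexer always means vectorized indexing.
                return "vectorized"
            dims.append(k.dims[0])
        elif not isinstance(k, int):
            # Slices and plain arrays keep the name of the dimension they index.
            dims.append(d)
    # All resulting dimension names distinct: orthogonal (outer) indexing.
    if len(set(dims)) == len(dims):
        return "outer"
    # Two indexers share a dimension name: vectorized (pointwise) indexing.
    return "vectorized"


points = FakeVariable(dims=("points",))

# Two indexers sharing the "points" dimension, as in the failing test.
both = classify((points, points), ("dim1", "dim2"))

# Only one of the variable's dimensions is indexed; the other gets a full slice.
one = classify((slice(None), points), ("dim3", "dim1"))

print(both, one)
```

In this sketch the same pair of indexers yields a vectorized key when both indexed dimensions are present and an outer key when only one is.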

TomNicholas · 2025-08-11T16:55:09Z

> the decision on whether to use basic, orthogonal, or vectorized indexing depends on the types of indexers you pass in.

I'm passing exactly the same indexers every time.

> the presence of two variable indexers with a single, common dimension should go into _broadcast_indexes_vectorized, which should not return outer indexers.

It should, but apparently it doesn't always! If you run either of those scripts, you will see OuterIndexers are being created.

TomNicholas · 2025-08-11T16:59:30Z

> changing my test to be happy with either error

As I thought, with this change applied (in a7918e4) now everything seems to be passing. (I don't think the warnings causing readthedocs or the upstream mypy failures are anything to do with this PR)
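For readers wondering what "happy with either error" could look like, here is a minimal sketch. It is not the code from a7918e4. `fake_load` is a hypothetical stand-in for the failing call, and the full error text is invented; only the substrings "vectorized async indexing" and "orthogonal async indexing" come from the reproducer above.

```python
import re

# Accept either of the two messages seen in the reproducer script.
ALLOWED = re.compile(r"(vectorized|orthogonal) async indexing")


def fake_load(kind):
    """Hypothetical stand-in for the call that raises NotImplementedError."""
    raise NotImplementedError(f"{kind} async indexing is not supported")


def raises_allowed(kind):
    """Return True if the call fails with one of the accepted messages."""
    try:
        fake_load(kind)
    except NotImplementedError as err:
        return ALLOWED.search(str(err)) is not None
    return False


print(raises_allowed("vectorized"), raises_allowed("orthogonal"))
```

In a pytest suite the same pattern could be passed as `match=` to `pytest.raises(NotImplementedError, ...)`, which applies it with `re.search`.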

keewis · 2025-08-11T17:11:45Z

I think I figured out why: create_test_data creates a dataset that has three data variables, two of which do not have both indexed dims. Thus, if these variables are indexed first you get the orthogonal index error (indexing along one dim is always basic or orthogonal indexing), while if the other variable is indexed first you get the vectorized index error.
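A simplified sketch of that explanation follows. The dataset layout below is hypothetical (it is not read from create_test_data), and `indexing_kind` and `first_error` are illustrative helpers rather than xarray functions. Each variable only sees the indexers for dimensions it actually has, so the kind of indexing differs per variable, and the first error raised depends on which variable is processed first.

```python
# Hypothetical layout: one variable has both indexed dims, two have only one.
LAYOUT = {
    "a": ("dim1", "dim2"),
    "b": ("dim1", "dim3"),
    "c": ("dim2", "dim3"),
}

# Both indexers are 1-D and share the new dimension "points".
INDEXERS = {"dim1": "points", "dim2": "points"}


def indexing_kind(var_dims, indexers):
    """Classify the indexing a single variable would need (simplified)."""
    used = [indexers[d] for d in var_dims if d in indexers]
    if not used:
        return "none"
    # Two indexers mapping onto the same new dimension means pointwise indexing.
    return "vectorized" if len(set(used)) < len(used) else "orthogonal"


def first_error(order, layout=LAYOUT, indexers=INDEXERS):
    """Return the kind of the first unsupported indexing hit in `order`."""
    for name in order:
        kind = indexing_kind(layout[name], indexers)
        if kind != "none":
            return kind
    return None


for order in (["a", "b", "c"], ["b", "a", "c"]):
    print(order, "->", first_error(order))
```

With a single variable there is only one possible order, which is why indexing a lone Variable (or a single-variable dataset) makes the test deterministic.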

TomNicholas · 2025-08-11T17:57:23Z

> I think I figured out why: create_test_data creates a dataset that has three data variables, two of which do not have both indexed dims. Thus, if these variables are indexed first you get the orthogonal index error (indexing along one dim is always basic or orthogonal indexing), while if the other variable is indexed first you get the vectorized index error.

Riiiiiight, thank you.

So actually there's another way for me to dodge this problem in my test: just index into a single Variable instead of into a Dataset. Then there can't be a race condition between variables.

keewis · 2025-08-11T17:59:12Z

you can also use a single-variable dataset, but yeah, that would eliminate the issue


TomNicholas and others added 21 commits October 24, 2024 17:48

new blank whatsnew

01e7518

Merge branch 'main' of https://github.com/pydata/xarray

83e553b

Merge branch 'main' of https://github.com/pydata/xarray

e44326d

Merge branch 'main' of https://github.com/pydata/xarray

4e4eeb0

Merge branch 'main' of https://github.com/pydata/xarray

d858059

Merge branch 'main' of https://github.com/pydata/xarray

d377780

Merge branch 'main' of https://github.com/pydata/xarray

3132f6a

Merge branch 'main' of https://github.com/pydata/xarray

900eef5

Merge branch 'main' of https://github.com/pydata/xarray

4c4462f

Merge branch 'main' of https://github.com/pydata/xarray

5b9b749

Merge branch 'main' of https://github.com/pydata/xarray

fadb953

Merge branch 'main' of https://github.com/TomNicholas/xarray

57d9d23

Merge branch 'main' of https://github.com/pydata/xarray

11170fc

Merge branch 'main' of https://github.com/pydata/xarray

0b8fa41

Merge branch 'main' of https://github.com/pydata/xarray

f769f85

Merge branch 'main' of https://github.com/pydata/xarray

4eef318

Merge branch 'main' of https://github.com/pydata/xarray

29242a4

test async load using special zarr LatencyStore

e6b3b3b

don't use dask

3ceab60

async all the way down

071c35a

remove assert False

29374f9

TomNicholas added the enhancement label May 16, 2025

github-actions bot added the topic-backends, topic-indexing, topic-documentation, topic-zarr, io, and topic-NamedArray labels May 16, 2025

pre-commit-ci bot and others added 2 commits May 16, 2025 16:07

[pre-commit.ci] auto fixes from pre-commit.com hooks

ab12bb8

for more information, see https://pre-commit.ci

add pytest-asyncio to CI envs

62aa39d

TomNicholas and others added 8 commits August 4, 2025 22:05

test correct error message is raised for each indexing case

df09780

ensure each test runs on the earliest version of xaarr it can

84f8e30

remove pointless repeated getitem

19090b0

set N_LAZY_VARS correctly in test

49416db

remove unused import

2ed8455

rename flag to make it more clear its only for orthogonal and vectorized indexing

a8a2860

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef6afdf


remove IndexingAdapter special case

de98308

dcherian reviewed Aug 6, 2025

xarray/namedarray/pycompat.py (outdated, resolved)

dcherian and others added 6 commits August 6, 2025 08:53

type fixes

e32ea13

return a deepcopy

da2d43c

Merge branch 'async.load' of https://github.com/TomNicholas/xarray into async.load

ac3127f

try again

d46fc3f

one more

cc253c7

Try again

78c9116

This was referenced Aug 9, 2025

open_dataset creates default indexes sequentially, causing significant latency in cloud high-latency stores #10579

Open

How should Xarray control asynchronous calls? #10622

Open

TomNicholas and others added 4 commits August 11, 2025 13:31

try fixing _in_memory error by not returning the adapter class

a727ecb

Merge branch 'main' into async.load

959edc2

[pre-commit.ci] auto fixes from pre-commit.com hooks

9b7afc2


remove scope=module from fixture for robustness

b4ef26f

modify test to be happy with either error message

a7918e4

use Variable instead of Dataset to avoid race condition of indexing between different variables

199d50a
