Skip to content

Reshape codec #10

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Reshape codec #10

wants to merge 1 commit into from

Conversation

jbms
Copy link
Contributor

@jbms jbms commented Apr 30, 2025

No description provided.

@jbms
Copy link
Contributor Author

jbms commented Apr 30, 2025

@ldeakan this is an alternative to the previously discussed squeeze that can accomplish the same thing but is more flexible. What do you think?

It should provide a lot of flexibility for using image codecs and zfp with zarr arrays containing an arbitrary number of dimensions.

@jbms jbms marked this pull request as draft April 30, 2025 19:10
@normanrz
Copy link
Member

normanrz commented May 5, 2025

Tagging @LDeakin again, because the linking didn't seem to work.

@LDeakin
Copy link
Member

LDeakin commented May 6, 2025

It looks pretty reasonable to me, but I could give you a more definitive review if I get around to implementing it. @jbms do you intend to support this in tensorstore / neuroglancer?

@normanrz Should extensions have an implementation before they get merged in this repo?

@jbms
Copy link
Contributor Author

jbms commented May 6, 2025

I do intend to implement it in tensorstore and neuroglancer. I've started the tensorstore implementation.

There are some tradeoffs I made in designing this:

  • Allowing dimensions to be specified indirectly (i.e. as the product of one or more input dimensions, or -1 to mean all remaining) is critical to allow it to work with variable chunking. It adds a bit of complexity in resolving the shape but it doesn't affect anything else.
  • I think it will often be useful to combine this with transpose. If all elements of the shape had to be specified as an array of input dims (no explicit size or -1) then it would be natural to allow this codec to also perform transposing. But we already have transpose for transposing and then it wouldn't support cases that require an explicit size for some dimension (though I can't really think of a lot of clear use cases for that).
  • For implementations like zarrs that always use a fixed c/Fortran memory layout non-partial encoding and decoding should be trivial. On the other hand, when combined with transpose there may be unnecessary copying. When allowing arbitrary strided layouts as in tensorstore (and I think zarr-python), the non-partial encoding and decoding is more complicated -- some cases can be handled without copying, some cases can't.

@jbms
Copy link
Contributor Author

jbms commented May 6, 2025

I will indeed implement it before switching to non-draft but also wanted to get feedback on it.

@normanrz
Copy link
Member

normanrz commented May 6, 2025

@normanrz Should extensions have an implementation before they get merged in this repo?

There is no strict requirement to have an implementation when registering the name. It is recommended, though.

`prod(B_shape[:i]) == prod(A_shape[input_dims[0]])` and
`prod(B_shape[i+1:]) == prod(A_shape[input_dims[k-1]+1:])`.

This two constraints ensure that if the size of output dimension `i` is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This two constraints ensure that if the size of output dimension `i` is
These two constraints ensure that if the size of output dimension `i` is

Comment on lines +70 to +71
For example, the array metadata below specifies that the compressor is the Zstd
codec configured with a compression level of 1 and with the checksum stored:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this description does not match the JSON

specified above.

As this codec does NOT alter the lexicographical order of elements, the contents
of the output array `B` is related to the contents of the input array `A` by:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
of the output array `B` is related to the contents of the input array `A` by:
of the output array `B` are related to the contents of the input array `A` by:

@d-v-b d-v-b mentioned this pull request May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants