Reserve characters in names #7

Open: mkitti wants to merge 3 commits into base: main

mkitti commented Apr 17, 2025

This PR reserves characters in registered names.

The intention is to reserve characters so that namespaces or specific URI schemes can be supported in the future.

The reservations include all URI reserved characters as in RFC 3986.

They additionally include ".", "-", and "~". Underscores are allowed since sharding_index is already used.
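
As an illustration only (not code from this PR), the character set described above can be written out explicitly. The gen-delims and sub-delims lists follow RFC 3986, section 2.2; the names `RESERVED` and `find_reserved` and the sample name `example.name` are hypothetical.

```python
# RFC 3986, section 2.2: reserved = gen-delims / sub-delims
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

# Characters additionally reserved in the PR description above.
EXTRA_RESERVED = ".-~"

# Hypothetical helper set; the underscore is deliberately absent.
RESERVED = frozenset(GEN_DELIMS + SUB_DELIMS + EXTRA_RESERVED)


def find_reserved(name: str) -> list[str]:
    """Return the reserved characters found in name, in order of first appearance."""
    seen: list[str] = []
    for ch in name:
        if ch in RESERVED and ch not in seen:
            seen.append(ch)
    return seen


print(find_reserved("sharding_index"))  # no reserved characters
print(find_reserved("example.name"))    # contains "."
```

Under these assumptions, sharding_index passes because the underscore is not in the set, while any name containing "." is flagged.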

xref: zarr-developers/zarr-specs#330 (comment)

Capitalization should be considered in a future pull request. I recommend reserving all capital letters.

mkitti added 2 commits April 17, 2025 10:28
This reserves characters in registered names.

The intention is to reserve characters so that namespaces or specific URI schemes can be supported in the future.
normanrz (Member) commented

#2 introduces some codecs with "." in their names, and these names are currently in use in zarr-python. Either we declare "." a namespace delimiter and assign the numcodecs prefix, or we remove "." from the reserved list here.

Most of these characters are already prohibited by the ^[a-z][a-z0-9-_.]+$ regex in the spec: https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#id30
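
A quick way to see what that pattern accepts is to run it directly. This is a minimal sketch: the pattern is copied verbatim from the comment above, while `matches_spec_pattern` and the candidate names are hypothetical.

```python
import re

# Pattern as quoted in the comment above. Inside the character class,
# "-" (after the 0-9 range), "_" and "." are literal characters.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-_.]+$")


def matches_spec_pattern(name: str) -> bool:
    """Return True if name matches the quoted pattern."""
    return NAME_PATTERN.match(name) is not None


for candidate in ["sharding_index", "example.name", "Example", "a:b"]:
    print(candidate, matches_spec_pattern(candidate))
```

The pattern rejects capital letters and the RFC 3986 reserved characters, but it still admits ".", "-", and "_", which is why only those characters need a separate decision here.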

mkitti (Author) commented Apr 17, 2025

At the moment, I would lean towards the most conservative option possible.

For names with no prefix, I would tend to reserve "." here so that we can use it to indicate prefixes.

Would those numcodecs extensions be registered here as well, or do we delegate that namespace to be handled within the numcodecs repository?

I'm also a bit worried now about disambiguation with numcodecs.js versus the Python package.

mkitti (Author) commented Apr 27, 2025

I added a section on namespaces.

Namespaces

Namespaces MAY precede an extension name or other namespace prefixes. The character "." MUST be used to delimit namespaces from each other and from the extension names. The primary use of namespaces is to disambiguate extensions with similar names which may differ in implementation or metadata. Extensions SHOULD register without a namespace if there is no ambiguity or difference in implementation from another extension of a similar name, or use an existing namespace if applicable.
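
To make the delimiter rule concrete, here is a minimal sketch of how a consumer might split a registered name into its namespaces and extension name. The function `split_namespaces` and the sample name `outer.inner.ext` are hypothetical, and rejecting empty components is an assumption rather than something the proposed text states.

```python
def split_namespaces(name: str) -> tuple[list[str], str]:
    """Split a name on "." into (namespaces, extension_name).

    Everything before the last "." is treated as a chain of namespaces;
    the final component is the extension name.
    """
    *namespaces, extension = name.split(".")
    # Assumption: empty components (e.g. "a..b" or a trailing ".") are invalid.
    if not extension or any(not part for part in namespaces):
        raise ValueError(f"empty component in name: {name!r}")
    return namespaces, extension


print(split_namespaces("sharding_index"))   # no namespace
print(split_namespaces("outer.inner.ext"))  # two nested namespaces
```

A name without "." yields an empty namespace list, which matches the recommendation that extensions register without a namespace when there is no ambiguity.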

I also removed "-" from the reserved characters since #5 is trying to use it. However, I am not sure why "-" is being used there. I think we should discourage the use of "-" so that identifiers in common programming languages can match extension names exactly.
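
The identifier concern can be checked in Python, taken here as one example of a common programming language. This is a sketch: `usable_as_identifier` and the hyphenated sample name are hypothetical.

```python
import keyword


def usable_as_identifier(name: str) -> bool:
    """Return True if name can be used verbatim as a Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(usable_as_identifier("sharding_index"))  # underscore is allowed
print(usable_as_identifier("sharding-index"))  # "-" parses as subtraction
```

Names that use only lowercase letters, digits, and underscores map directly onto identifiers; a name containing "-" would need to be translated before it could be used as one.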
