Implement CBOR export to diagnostic notation

In the CBOR codec (`src/cbor/`), implement export of CBOR data items to a `string` in CBOR Extended Diagnostic Notation (EDN). See the specification:

---

# CBOR Extended Diagnostic Notation (EDN)

Abstract

   This document formalizes and consolidates the definition of the
   Extended Diagnostic Notation (EDN) of the Concise Binary Object
   Representation (CBOR), addressing implementer experience.

   Replacing EDN's previous informal descriptions, it updates RFC 8949,
   obsoleting its Section 8, and RFC 8610, obsoleting its Appendix G.

   It also specifies and uses registry-based extension points, using one
   to support text representations of epoch-based dates/times and of IP
   addresses and prefixes.

2.  Overview over CBOR Extended Diagnostic Notation (EDN)

   CBOR is a binary interchange format.  To facilitate documentation and
   debugging, and in particular to facilitate communication between
   entities cooperating in debugging, this document defines a simple
   human-readable diagnostic notation.  All actual interchange always
   happens in the binary format.

   Note that diagnostic notation truly was designed as a diagnostic
   format; it originally was not meant to be parsed.  Therefore, no
   formal definition (as in ABNF) was given in the original documents.
   Recognizing that formal grammars can aid interoperation of tools and
   usability of documents that employ EDN, Section 5 now provides ABNF
   definitions.

   EDN is a true superset of JSON as it is defined in [STD90] in
   conjunction with [RFC7493] (that is, any interoperable [RFC7493] JSON
   text also is an EDN text), extending it both to cover the greater
   expressiveness of CBOR and to increase its usability.

   EDN borrows the JSON syntax for numbers (integer and floating-point,
   Section 2.4), certain simple values (Section 2.8), UTF-8 [STD63] text
   strings, arrays, and maps (maps are called objects in JSON; the
   diagnostic notation extends JSON here by allowing any data item in
   the map key position).

   As EDN is used for truly diagnostic purposes, its implementations MAY
   support generation and possibly ingestion of EDN for CBOR data items
   that are well-formed but not valid.  It is RECOMMENDED that an
   implementation enables such usage only explicitly by configuration
   (such as an API or CLI flag).  Validity of CBOR data items is
   discussed in Section 5.3 of RFC 8949 [STD94], with basic validity
   discussed in Section 5.3.1 of RFC 8949 [STD94], and tag validity
   discussed in Section 5.3.2 of RFC 8949 [STD94].  Tag validity is more
   likely a subject for individual application-oriented extensions,
   while the two cases of basic validity (for text strings and for maps)
   are addressed in Sections 2.5.7 and 2.6.2 under the heading of
   _validity_.

   The rest of this section provides an overview over specific features
   of EDN, starting with certain common syntactical features and then
   going through kinds of CBOR data items roughly in the order of CBOR
   major types.  Any additional detailed syntax discussion needed has
   been deferred to Section 5.1.


2.3.  Encoding Indicators

   Sometimes it is useful to indicate in the diagnostic notation which
   of several alternative representations were actually used; for
   example, a data item written »1.5« by a diagnostic decoder might have
   been encoded as a half-, single-, or double-precision float.

   The convention for encoding indicators is that anything starting with
   an underscore and all immediately following characters that are
   alphanumeric or underscore is an encoding indicator, and can be
   ignored by anyone not interested in this information.  For example, _
   or _3.

   Encoding indicators are always optional.

   Encoding indicators are placed immediately to the right of the data
   item or of a syntactic feature that can stand for the data item the
   encoding of which the encoding indicator is controlling.  Table 1
   provides examples for encoding indicators used with various kinds of
   data items.

                       +====+=====================+
                       | mt | examples            |
                       +====+=====================+
                       | 0  | 1_1, 0x4711_3       |
                       +----+---------------------+
                       | 1  | -1_1                |
                       +----+---------------------+
                       | 2  | 'A'_1               |
                       +----+---------------------+
                       | 3  | "A"_1               |
                       +----+---------------------+
                       | 4  | [_1 "bar"]          |
                       +----+---------------------+
                       | 5  | {_1 "bar": 1}       |
                       +----+---------------------+
                       | 6  | 1_1(4711)           |
                       +----+---------------------+
                       | 7  | 1.5_2, 0x4711p+03_3 |
                       +----+---------------------+

                           Table 1: Examples of
                         Encoding Indicators for
                         Different Data Items (mt
                              = major type)

   (In the following, an abbreviation of the form ai=nn gives nn as the
   numeric value of the field _additional information_, the low-order 5
   bits of the initial byte: see Section 3 of RFC 8949 [STD94].  This
   field is used in encoding the "argument", i.e., the value, tag, or
   length; ai=0 to ai=23 mean that the value of the ai field immediately
   _is_ the argument, ai=24 to ai=27 mean that the argument is carried
   in 2^(ai-24) (1, 2, 4, or 8) additional bytes, and ai=31 means that
   indefinite length encoding is used.)

   An underscore followed by a decimal digit n indicates that the
   preceding item (or, for arrays and maps, the item starting with the
   preceding bracket or brace) was encoded with an additional
   information value of ai=24+n.  For example, 1.5_1 is a half-precision
   floating-point number (2^1 = 2 additional bytes or 16 bits), while
   1.5_3 is encoded as double precision (2^3 = 8 additional bytes or 64
   bits).


   The encoding indicator _ is an abbreviation of what would in full
   form be _7, which is not used.  Therefore, an underscore _ on its own
   stands for indefinite length encoding (ai=31).  (Note that this
   encoding indicator is only available behind the opening brace/bracket
   for map and array (Section 2.6.1): strings have a special syntax
   streamstring for indefinite length encoding except for the special
   cases ''_ and ""_ (Section 2.5.4).)

   The encoding indicators _0 to _3 can be used to indicate ai=24 to
   ai=27, respectively; they therefore stand for 1, 2, 4, and 8 bytes of
   additional information (ai) following the initial byte in the head of
   the data item.  (The abbreviation of _7 into _ was discussed above.
   _4 to _6 are not currently used in CBOR, but will be available if and
   when CBOR is extended to make use of ai=28 to ai=30.)

   Surprisingly, Section 8.1 of RFC 8949 [STD94] does not address ai=0
   to ai=23 — the assumption seems to have been that preferred
   serialization (Section 4.1 of RFC 8949 [STD94]) will be used when
   converting CBOR diagnostic notation to an encoded CBOR data item, so
   leaving out the encoding indicator for a data item with a preferred
   serialization will implicitly use ai=0 to ai=23 if that is possible.
   The present specification allows making this explicit:

   _i ("immediate") stands for encoding with ai=0 to ai=23, i.e., it
   indicates that the argument is encoded directly in the initial byte
   of the CBOR item.
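
   As a minimal sketch of the mapping described above, an exporter that
   wants to report the encoding actually used could derive the indicator
   from the ai field as follows.  The function name is hypothetical and
   not part of the specification.

```python
def encoding_indicator(ai: int) -> str:
    """Return the EDN encoding indicator for an additional-information value."""
    if 0 <= ai <= 23:
        return "_i"            # argument carried immediately in the initial byte
    if 24 <= ai <= 27:
        return f"_{ai - 24}"   # _0.._3: 1, 2, 4, or 8 following bytes
    if ai == 31:
        return "_"             # indefinite length (abbreviation of the unused _7)
    raise ValueError(f"ai={ai} is not currently used in CBOR")
```

   Since encoding indicators are always optional, an exporter might
   only append them when asked to, for example behind an API flag.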

   While no pressing use for further values for encoding indicators
   comes to mind, this is an extension point for EDN; Section 6.2
   defines a registry for additional values.

   Encoding Indicators are discussed in further detail in Section 2.5.4
   for indefinite length strings and in Section 2.6.1 for arrays and
   maps.

2.4.  Numbers

   In addition to JSON's decimal number literals, EDN provides
   hexadecimal, octal, and binary number literals in the usual
   C-language notation (for octal, only the form with a 0o prefix).

   Numbers composed only of digits (of the respective base) are
   interpreted as CBOR integers (major type 0/1, or where the number
   cannot be represented in this way, major type 6 with tag 2/3).  A
   leading "+" sign is a no-op, and a leading "-" sign inverts the sign
   of the number.  So 0, 000, +0 all represent the same integer zero, as
   does -0.  Similarly, 1, 001, +1 and +0001 all stand for the same
   integer one, and -1 and -0001 both designate the same integer minus
   one.

   Using a decimal point (.) and/or an exponent (e for decimal, p for
   hexadecimal) turns the number into a floating point number (major
   type 7) instead, irrespective of whether it is an integral number
   mathematically.  Note that, in floating point numbers, 0.0 is not the
   same number as -0.0, even if they are mathematically equal.

   In Table 2, all the items on a row are the same number (also shown in
   CBOR, hexadecimally), but they are distinct from items in a different
   row.

      +========================================+===================+
      | EDN                                    | CBOR hex          |
      +========================================+===================+
      | 4711, 0x1267, 0o11147, 0b1001001100111 | 19 1267 # uint    |
      +----------------------------------------+-------------------+
      | 1.5, 0.15e1, 15e-1, 0x1.8p0, 0x18p-4   | F9 3E00 # float16 |
      +----------------------------------------+-------------------+
      | 0, +0, -0                              | 00      # uint    |
      +----------------------------------------+-------------------+
      | 0.0, +0.0                              | F9 0000 # float16 |
      +----------------------------------------+-------------------+
      | -0.0                                   | F9 8000 # float16 |
      +----------------------------------------+-------------------+

      Table 2: Example Sets of Equivalent Notations for Some Numbers

   The non-finite floating-point numbers Infinity, -Infinity, and NaN
   are written exactly as in this sentence (this is also a way they can
   be written in JavaScript, although JSON does not allow them).
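
   One possible way for an exporter to emit numbers under these rules is
   sketched below.  It assumes the decoded item is already available as
   a Python int or float; edn_number is a hypothetical helper name.  The
   sketch relies on repr() of a finite Python float always containing a
   decimal point or an exponent, which keeps floats distinct from
   integers, and it emits no encoding indicators.

```python
import math

def edn_number(value) -> str:
    """Render an int or float as an EDN number literal (decimal only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are simple values, not numbers")
    if isinstance(value, int):
        return str(value)          # digits only, so it reads back as an integer
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return repr(value)             # e.g. '1.5', '2.0', '-0.0', '1e+20'
```

   Integers that do not fit major type 0/1 are printed as plain digits
   as well; per the text above, EDN interprets such literals as tag 2/3.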

   See Section 5.1, Paragraph 7, Item 3 for additional details of the
   EDN number syntax.

   (Note that literals for further number formats, e.g., for
   representing rational numbers as fractions, or for NaNs with non-zero
   payloads, can be added as application-oriented literals.  Background
   information beyond that in [STD94] about the representation of
   numbers in CBOR can be found in the informational document
   [I-D.bormann-cbor-numbers].)

2.5.  Strings

   CBOR distinguishes two kinds of strings: text strings (the bytes in
   the string constitute UTF-8 [STD63] text, major type 3), and byte
   strings (CBOR does not further characterize the bytes that constitute
   the string, major type 2).

2.5.1.  Text String Literals

   EDN notates text strings in a form compatible to that of notating
   text strings in JSON (i.e., as a double-quoted string literal), with
   a number of usability extensions.  In JSON, no control characters are
   allowed to occur directly in text string literals; if needed, they
   can be specified using escapes such as \t or \r.  In EDN, string
   literals additionally can contain newlines (LINEFEED U+000A), which
   are copied into the resulting string like other characters in the
   string literal.  To deal with variability in platform presentation of
   newlines, any carriage return characters (U+000D) that may be present
   in the EDN string literal are not copied into the resulting string
   (see Section 5.1, Paragraph 7, Item 2).  No other control characters
   can occur directly in a string literal, and the handling of escaped
   characters (\r etc.) is as in JSON.
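
   For export, a conservative approach is to escape every control
   character, including newlines and carriage returns, so that the
   output stays on one line and no carriage return is lost on
   re-ingestion.  The following is a sketch under that assumption;
   edn_text is a hypothetical name, and the choice to escape U+007F as
   well is a cautious design decision rather than something the text
   above requires.

```python
def edn_text(value: str) -> str:
    """Render a text string as a double-quoted EDN literal."""
    out = ['"']
    for ch in value:
        cp = ord(ch)
        if ch in '"\\':
            out.append("\\" + ch)          # quote and backslash need escaping
        elif ch == "\n":
            out.append("\\n")              # keep the literal on one line
        elif cp < 0x20 or cp == 0x7F:
            out.append(f"\\u{cp:04x}")     # JSON-style escape for controls
        else:
            out.append(ch)                 # everything else is copied as-is
    out.append('"')
    return "".join(out)
```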

   JSON's escape scheme for characters that are not on Unicode's basic
   multilingual plane (BMP) is cumbersome (see Section 7 of RFC 8259
   [STD90]).  EDN keeps it, but also adds the syntax \u{NNN} where NNN
   is the Unicode scalar value as a hexadecimal number.  This means the
   following are equivalent (the first o is escaped as \u{6f} for no
   particular reason):

   "D\u{6f}mino's \u{1F073} + \u{2318}"   # \u{}-escape 3 chars
   "Domino's \uD83C\uDC73 + \u2318"       # escape JSON-like
   "Domino's 🁳 + ⌘"                       # unescaped

2.5.2.  Byte String Literals

   EDN adds a number of ways to notate byte strings, some of which
   provide detailed access to the bits within those bytes (see
   Section 2.5.5).  However, quite often, byte strings carry bytes that
   can be meaningfully notated as UTF-8 text (Section 2.5.3).

2.5.3.  Single-Quoted String Literals

   Analogously to text string literals delimited by double quotes, EDN
   allows the use of single quotes (without a prefix) to express byte
   string literals with UTF-8 text; for instance, the following are
   equivalent:

   'hello world'
   h'68656c6c6f20776f726c64'

   The escaping rules of JSON strings are applied equivalently for text-
   based byte string literals, e.g., \\ stands for a single backslash
   and \' stands for a single quote.  However, to facilitate parsing, in
   single-quoted strings EDN excludes certain escaping mechanisms
   available for double-quoted strings:

   *  \/ is an escape in JSON that is available for EDN text strings as
      well to ensure all JSON texts are EDN literals.  Since EDN's
      single-quoted strings do not occur in JSON, this legacy
      compatibility feature is not available for them.

   *  \u-based escapes are not available for characters in the range
      from U+0020 to U+007e (essentially, printable ASCII).
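
   An exporter might prefer the single-quoted form when a byte string
   happens to be readable text and fall back to hexadecimal otherwise.
   The sketch below illustrates that heuristic; the function name and
   the fallback policy are assumptions, not requirements of EDN.  It
   only uses the \\ and \' escapes, so neither of the excluded escaping
   mechanisms is needed.

```python
def edn_bytes_as_text(data: bytes) -> str:
    """Render bytes as a single-quoted literal, or fall back to h'' hex."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "h'" + data.hex() + "'"
    out = []
    for ch in text:
        if ch in "'\\":
            out.append("\\" + ch)            # \' and \\ escapes
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            return "h'" + data.hex() + "'"   # control characters: hex is clearer
        else:
            out.append(ch)
    return "'" + "".join(out) + "'"
```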

   Single-quoted string literals can occur unprefixed and stand for the
   byte string that encodes its text string value (the "content"), or be
   prefixed by what looks like an application-extension prefix (see
   Section 2.1).

   In a prefixed string literal, the text content of the single-quoted
   string literal is not used directly as a byte string, but is further
   processed in a way that is defined by the meaning given to the
   prefix.  Depending on the prefix, the result of that processing can,
   but need not be, a byte string value.

   Prefixed string literals (which are always single-quoted after the
   prefix) are used both for base-encoded byte string literals (see
   Section 2.5.5) and for application-oriented extension literals (see
   Section 2.1, called app-string).  (Additional kinds of base-encoded
   string literals can be defined as application-oriented extension
   literals by registering their prefixes; there is no fundamental
   difference between the two predefined base-encoded string literal
   prefixes (h, b64) and any such potential future extension literal
   prefixes.)

2.5.4.  Encoding Indicators of Strings

   For indefinite length encoding, strings (byte and text strings) have
   a special syntax streamstring.  This is used (except for the special
   cases ''_ and ""_ below) to notate their detailed composition into
   individual "chunks" (Section 3.2.3 of RFC 8949 [STD94]), by
   representing the individual chunks in sequence within parentheses,
   each optionally followed by a comma, with an encoding indicator _
   immediately after the opening parenthesis: e.g., (_ h'0123', h'4567')
   or (_ "foo", "bar").  The overall type (byte string or text string)
   of the string is provided by the types of the individual chunks,
   which all need to be of the same type (Section 3.2.3 of RFC 8949
   [STD94]).

   For an indefinite-length string with no chunks inside, (_ ) would be
   ambiguous as to whether a byte string (encoded 0x5fff) or a text
   string (encoded 0x7fff) is meant and is therefore not used.  The
   basic forms ''_ and ""_ can be used instead and are reserved for the
   case of no chunks only --- not as short forms for the (permitted, but
   not really useful) encodings with only empty chunks, which need to be
   notated as (_ ''), (_ ""), etc., when it is desired to preserve the
   chunk structure.
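
   The chunk notation can be sketched as follows, assuming the decoder
   hands the exporter the list of chunks and the overall string type.
   The helper name is hypothetical.  Text chunks are rendered with
   json.dumps, which yields JSON string literals and therefore valid EDN
   text string literals; byte chunks are rendered in hexadecimal.

```python
import json

def edn_stream_string(chunks, is_text: bool) -> str:
    """Render an indefinite-length string from its list of chunks."""
    if not chunks:
        # No chunks at all: (_ ) would be ambiguous, so use ""_ or ''_.
        return '""_' if is_text else "''_"
    rendered = [
        json.dumps(c, ensure_ascii=False) if is_text else "h'" + c.hex() + "'"
        for c in chunks
    ]
    return "(_ " + ", ".join(rendered) + ")"
```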

2.5.5.  Base-Encoded Byte String Literals

   Besides the unprefixed byte string literals that are analogous to
   JSON text string literals, EDN provides base-encoded byte string
   literals.  These are notated as prefixed string literals that carry
   one of the base encodings [RFC4648], without padding, i.e., the base
   encoding is enclosed in a single-quoted string literal, prefixed by
   »h« for base16 or »b64« for base64 or base64url (the actual encodings
   of the latter do not overlap, so the string remains unambiguous).
   For example, the byte string consisting of the four bytes 12 34 56 78
   (given in hexadecimal here) could be written h'12345678' or
   b64'EjRWeA'.
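
   Both predefined base encodings are straightforward to produce.  The
   sketch below uses hypothetical helper names and strips the base64
   padding, as the literals carry the encoding without padding.

```python
import base64

def edn_hex(data: bytes) -> str:
    """Render bytes as an h'' (base16) literal."""
    return "h'" + data.hex() + "'"

def edn_b64(data: bytes, urlsafe: bool = False) -> str:
    """Render bytes as a b64'' literal (base64 or base64url, unpadded)."""
    encode = base64.urlsafe_b64encode if urlsafe else base64.b64encode
    return "b64'" + encode(data).decode("ascii").rstrip("=") + "'"
```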

   Examples often benefit from some blank space (spaces, line breaks) in
   byte string literals.  In certain EDN prefixed byte string literals,
   blank space is ignored; for instance, the following are equivalent:

      h'48656c6c6f20776f726c64'
      h'48 65 6c 6c 6f 20 77 6f 72 6c 64'
      h'4 86 56c 6c6f
        20776 f726c64'

   The internal syntax of prefixed single-quote literals such as h'' and
   b64'' can also allow comments as blank space (see Section 2.2).

      h'68656c6c6f20776f726c64'
      h'68 65 6c /doubled l!/ 6c 6f # hello
        20 /space/
        77 6f 72 6c 64' /world/

   Slash characters are part of the base64 classic alphabet (see Table 1
   in Section 4 of [RFC4648]), and they therefore need to be in the b64''
   set of characters that contribute to the byte string.  Therefore,
   only end-of-line comments are available in b64 byte string literals.

      b64'/base64 not a comment/ but one follows # comment'
      h'FDB6AC 7BAE27A2D69CA2699E9EDFDBBADA2779FA25 968C2C'

   These two byte string literals stand for the same byte string; the
   deliberately confusing base64 content starts with b64'/bas' which is
   the same as h'FDB6AC' and ends with b64'lows' which is the same as
   h'968C2C'.

2.5.6.  CBOR Sequence Literals

   In diagnostic notation, a sequence of zero or more CBOR data item
   literals can be enclosed in << and >>, optionally prefixed by an
   application-extension prefix; we speak of _sequence literals_. EDN
   mainly deals with individual data items, not with CBOR sequences
   [RFC8742], so the CBOR sequence represented by the sequence literal
   needs to be further processed to obtain the value of the literal.

   Prefixed sequence literals refer to the application extension (see
   Section 2.1) identified by the prefix and apply the extension to its
   sequence content, resulting in a single data item.  This data item
   may be a string or may not (always) be, depending on the definition
   of the application extension.

   An unprefixed sequence literal applies CBOR encoding to the data
   items in its content, taken as a CBOR sequence.  The value of the
   literal thus is a byte string with the encoded content; we also speak
   of _embedded CBOR_. For instance, each pair of columns in the
   following are equivalent:

      <<1>>              h'01'
      <<1, 2>>           h'0102'
      <<"hello", null>>  h'65 68656c6c6f f6'
      <<>>               h''

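   The equivalences in these examples can be checked with a deliberately
   tiny encoder.  This is only a sketch: it covers unsigned integers,
   text strings, and null, always uses preferred (shortest) heads, and
   all function names are hypothetical.

```python
def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head with the shortest possible argument."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 8 bytes")

def encode_item(item) -> bytes:
    """Encode a small subset of data items: null, text, unsigned int."""
    if item is None:
        return b"\xf6"
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return encode_head(3, len(raw)) + raw
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)
    raise TypeError("item type not covered by this sketch")

def sequence_literal_value(items) -> bytes:
    """Byte string value of an unprefixed <<...>> literal."""
    return b"".join(encode_item(i) for i in items)
```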


2.6.  Arrays and Maps

   EDN borrows the JSON syntax for arrays and maps.  (Maps are called
   objects in JSON.)

   For maps, EDN extends the JSON syntax by allowing any data item in
   the map key position (before the colon).

   JSON requires the use of a comma as a separator character between the
   elements of an array as well as between the members (key/value pairs)
   of a map.  (These commas also were required in the original
   diagnostic notation defined in [STD94] and [RFC8610].)  The separator
   commas are now optional in the places where EDN syntax allows commas.
   (Stylistically, leaving out the commas is more idiomatic when they
   occur at line breaks.)


   In addition, EDN also allows, but does not require, a trailing comma
   before the closing bracket/brace, enabling an easier to maintain
   "terminator" style of their use.

   In summary, the following eight examples are all equivalent:

   [1, 2, 3]
   [1, 2, 3,]
   [1  2  3]
   [1  2  3,]
   [1  2, 3]
   [1  2, 3,]
   [1, 2  3]
   [1, 2  3,]

   as are

   {1: "n", "x": "a"}
   {1: "n", "x": "a",}
   {1: "n"  "x": "a"}
   # etc.

      |  CDDL's comma separators in the equivalent contexts (CDDL
      |  groups) are entirely optional (and actually are terminators,
      |  which together with their optionality allows them to be used
      |  like separators as well, or even not at all).  In summary,
      |  comma use is now aligned between EDN and CDDL, in a fully
      |  backwards compatible way.

2.6.1.  Encoding Indicators of Arrays and Maps

   A single underscore can be written after the opening brace of a map
   or the opening bracket of an array to indicate that the data item was
   represented in indefinite-length format.  For example, [_ 1, 2]
   contains an indicator that an indefinite-length representation was
   used to represent the data item [1, 2].

   At the same position, encoding indicators for specifying the size of
   the array or map head for definite-length format can be used instead,
   specifically _i or _0 to _3.  For example [_0 false, true] can be
   used to specify the encoding of the array [false, true] as 98 02 f4
   f5.
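
   A sketch of how an exporter could place these indicators is given
   below.  It assumes that the elements, keys, and values have already
   been rendered to EDN text; both function names are hypothetical.

```python
def edn_array(items, indicator: str = "") -> str:
    """Render an array from already-rendered EDN items."""
    head = "[" + (indicator + " " if indicator else "")
    return head + ", ".join(items) + "]"

def edn_map(pairs, indicator: str = "") -> str:
    """Render a map from (key, value) pairs of already-rendered EDN text."""
    head = "{" + (indicator + " " if indicator else "")
    return head + ", ".join(f"{k}: {v}" for k, v in pairs) + "}"
```

   Because edn_map takes a list of pairs rather than a dictionary, it
   can also print maps with repeated keys when that is explicitly
   wanted for diagnostic purposes.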

2.6.2.  Validity of Maps

   As discussed at the start of Section 2, EDN implementations MAY
   support generation and possibly ingestion of EDN for CBOR data items
   that are well-formed but not valid (Section 5.3 of RFC 8949 [STD94]).

   For maps, this is relevant for map keys that occur more than once, as
   in:

   {1: "to", 1: "fro"}

2.7.  Tags

   A tag is written as a decimal unsigned integer for the tag number,
   followed by the tag content in parentheses; for instance, a date in
   the format specified by RFC 3339 (ISO 8601) could be notated as:

        0("2013-03-21T20:04:00Z")

   or the equivalent epoch-based time as the following:

        1(1363896240)

   The tag number can be followed by an encoding indicator giving the
   encoding of the tag head.  For example:

        1_1(1363896240)

   (assuming preferred encoding for the tag content) is encoded as

   d9 0001        # tag(1)
      1a 514b67b0 # unsigned(1363896240)
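
   The head bytes in this example can be reproduced with the following
   sketch, where a numeric indicator n forces ai=24+n and the absence of
   an indicator selects the shortest head.  The function name and
   signature are assumptions made for illustration.

```python
def head_bytes(major: int, arg: int, indicator=None) -> bytes:
    """Encode a CBOR head; indicator n in 0..3 forces ai=24+n."""
    if indicator is None:                      # preferred serialization
        if arg < 24:
            return bytes([(major << 5) | arg])
        # Smallest n such that arg fits into 2**n bytes.
        indicator = next(n for n in range(4) if arg < 1 << (8 << n))
    size = 1 << indicator                      # 1, 2, 4, or 8 bytes
    return bytes([(major << 5) | (24 + indicator)]) + arg.to_bytes(size, "big")
```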

2.8.  Simple values

   EDN uses JSON syntax for the simple values True (»true«), False
   (»false«), and Null (»null«).  Undefined is written »undefined« as in
   JavaScript.

   These and all other simple values can be given as "simple()" with the
   appropriate integer in the parentheses.  For example, »simple(42)«
   indicates major type 7, value 42, and »simple(0x14)« indicates
   »false«, as does »simple(20)« or »simple(0b10100)«.
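
   For export, the named forms are usually the more readable choice,
   with simple() as the fallback.  A minimal sketch with a hypothetical
   function name follows; it rejects the values 24 to 31, which RFC
   8949 does not allow as simple values.

```python
_NAMED_SIMPLE = {20: "false", 21: "true", 22: "null", 23: "undefined"}

def edn_simple(value: int) -> str:
    """Render a simple value, preferring the named forms."""
    if not (0 <= value <= 23 or 32 <= value <= 255):
        raise ValueError(f"{value} is not a usable simple value")
    return _NAMED_SIMPLE.get(value, f"simple({value})")
```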

3.  Application-Oriented Extension Literals

   This document extends the syntax used in diagnostic notation to also
   enable application-oriented extensions.  This section defines a
   number of application-oriented extensions.

3.1.  The "dt" Extension

   The application-extension identifier "dt" is used to notate a date/
   time literal that can be used as an Epoch-Based Date/Time as per
   Section 3.4.2 of RFC 8949 [STD94].

   The content of the literal is a single Standard Date/Time String as
   per Section 3.4.1 of RFC 8949 [STD94], as a text or byte string.

   The value of the literal is a number representing the result of a
   conversion of the given Standard Date/Time String to an Epoch-Based
   Date/Time.  If fractional seconds are given in the text (production
   time-secfrac in Figure 4), the value is a floating-point number; the
   value is an integer number otherwise.  In the all-upper-case variant
   of the app-prefix, the value is enclosed in a tag number 1.

   Each row of Table 3 shows an example of "dt" notation and equivalent
   notation not using an application-extension identifier.

             +================================+==============+
             | dt literal                     | plain EDN    |
             +================================+==============+
             | dt'1969-07-21T02:56:16Z'       | -14159024    |
             +--------------------------------+--------------+
             | dt'1969-07-21T02:56:16.0Z'     | -14159024.0  |
             +--------------------------------+--------------+
             | dt'1969-07-21T02:56:16.5Z'     | -14159023.5  |
             +--------------------------------+--------------+
             | dt<<'1969-07-21T02:56:16.5Z'>> | -14159023.5  |
             +--------------------------------+--------------+
             | dt<<"1969-07-21T02:56:16.5Z">> | -14159023.5  |
             +--------------------------------+--------------+
             | DT'1969-07-21T02:56:16Z'       | 1(-14159024) |
             +--------------------------------+--------------+

                 Table 3: dt and DT literals vs. plain EDN

   See Section 5.2.3 for an ABNF definition for the content of dt
   literals.
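
   The conversion behind the rows of the table can be sketched as
   follows.  The regular expression is a simplified stand-in for the
   ABNF, the function name is hypothetical, and leap seconds (a seconds
   field of 60) are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"(\d{4})-(\d\d)-(\d\d)[Tt](\d\d):(\d\d):(\d\d)(\.\d+)?"
    r"([Zz]|[+-]\d\d:\d\d)"
)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def dt_value(text: str):
    """Value of dt'...': int, or float if fractional seconds are given."""
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 3339 date/time string")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac, zone = m.group(7), m.group(8)
    offset = timedelta(0)
    if zone not in ("Z", "z"):
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
    # Local time is UTC plus the offset, so subtract it to get UTC.
    utc = datetime(year, month, day, hour, minute, second,
                   tzinfo=timezone.utc) - offset
    delta = utc - _EPOCH
    seconds = delta.days * 86400 + delta.seconds
    return seconds + float("0" + frac) if frac else seconds
```

   The DT variant would wrap the same value in tag number 1.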

3.2.  The "ip" Extension

   The application-extension identifier "ip" is used to notate an IP
   address literal that can be used as an IP address as per Section 3 of
   [RFC9164].

   The content of the literal is a single IPv4address or IPv6address as
   per Section 3.2.2 of [RFC3986], as a text or byte string.


   With the lower-case app-string prefix ip, the value of the literal is
   a byte string representing the binary IP address.  With the upper-
   case app-string prefix IP, the literal is such a byte string tagged
   with tag number 54, if an IPv6address is used, or tag number 52, if
   an IPv4address is used.

   As an additional case, the upper-case app-string prefix IP'' can be
   used with an IP address prefix such as 2001:db8::/56 or 192.0.2.0/24,
   with the equivalent tag as its value.  (Note that [RFC9164]
   representations of address prefixes need to implement the truncation
   of the address byte string as described in Section 4.2 of [RFC9164];
   see example below.)  For completeness, the lower-case variant
   ip'2001:db8::/56' or ip'192.0.2.0/24' stands for an unwrapped
   [56,h'20010db8'] or [24,h'c00002']; however, in this case the
   information on whether an address is IPv4 or IPv6 often needs to come
   from the context.

   Note that this application-extension provides no direct
   representation of the "Interface format" defined in Section 3.1.3 of
   [RFC9164], an address combined with an optional prefix length and an
   optional zone identifier, and therefore no way to reference a zone
   identifier at all.  (If needed, this format can be put together by
   building their structures explicitly, e.g., an interface format
   without a zone identifier can be represented as in
   52([ip'192.0.2.42',24]), or an interface format with zone identifier
   42 as in 54([ip'fe80::0202:02ff:ffff:fe03:0303',64,42]).)

   Each row of Table 4 shows an example of "ip" notation and equivalent
   notation not using an application-extension identifier.

     +====================+=========================================+
     | ip literal         | plain EDN                               |
     +====================+=========================================+
     | ip'192.0.2.42'     | h'c000022a'                             |
     +--------------------+-----------------------------------------+
     | ip<<'192.0.2.42'>> | h'c000022a'                             |
     +--------------------+-----------------------------------------+
     | IP'192.0.2.42'     | 52(h'c000022a')                         |
     +--------------------+-----------------------------------------+
     | IP'192.0.2.0/24'   | 52([24,h'c00002'])                      |
     +--------------------+-----------------------------------------+
     | ip'2001:db8::42'   | h'20010db8000000000000000000000042'     |
     +--------------------+-----------------------------------------+
     | IP'2001:db8::42'   | 54(h'20010db8000000000000000000000042') |
     +--------------------+-----------------------------------------+
     | IP'2001:db8::/64'  | 54([64,h'20010db8'])                    |
     +--------------------+-----------------------------------------+

                Table 4: ip and IP literals vs. plain EDN

   See Section 5.2.4 for an ABNF definition for the content of ip
   literals.
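
   The rows of the table can be reproduced with Python's standard
   ipaddress module.  This is a sketch only: ip_literal is a
   hypothetical name, the result is returned as plain EDN text, and
   prefixes are truncated by dropping trailing zero bytes.

```python
import ipaddress

def ip_literal(text: str, tagged: bool = False) -> str:
    """Plain EDN for ip'...' (tagged=False) or IP'...' (tagged=True)."""
    if "/" in text:
        net = ipaddress.ip_network(text, strict=False)
        # Address prefixes omit trailing zero bytes of the address.
        prefix_bytes = net.network_address.packed.rstrip(b"\x00")
        body = f"[{net.prefixlen},h'{prefix_bytes.hex()}']"
        version = net.version
    else:
        addr = ipaddress.ip_address(text)
        body = f"h'{addr.packed.hex()}'"
        version = addr.version
    if not tagged:
        return body
    return f"{54 if version == 6 else 52}({body})"
```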

3.3.  The "hash" Extension

   The application-extension identifier "hash" is used to notate the
   input to a cryptographic hash function as well as identify such a
   hash function to obtain a byte string that represents the output of
   that hash function.

   The content of the literal is a string, optionally followed by either
   an integer or a text string that identifies the hash function in the
   COSE Algorithms registry of the CBOR Object Signing and Encryption
   (COSE) registry group [IANA.cose], either by the identifier (value:
   integer or string), or, if no algorithm is registered with this
   value, by its name used in the registry.  If the second item is not
   given, the default algorithm used is -16 ("SHA-256").

   No uppercase variant prefix is defined for the application-extension
   identifier "hash".

          +===============+====================================+
          | hash literal  | plain EDN                          |
          +===============+====================================+
          | hash<<'foo'>> | h'2C26B46B68FFC68FF99B453C1D304134 |
          |               | 13422D706483BFA0F98A5E886266E7AE'  |
          +---------------+------------------------------------+
          | hash'foo'     | h'2C26B46B68FFC68FF99B453C1D304134 |
          |               | 13422D706483BFA0F98A5E886266E7AE'  |
          +---------------+------------------------------------+
          | hash<<'foo',  | h'2C26B46B68FFC68FF99B453C1D304134 |
          | -16>>         | 13422D706483BFA0F98A5E886266E7AE'  |
          +---------------+------------------------------------+
          | hash<<'foo',  | h'2C26B46B68FFC68FF99B453C1D304134 |
          | "SHA-256">>   | 13422D706483BFA0F98A5E886266E7AE'  |
          +---------------+------------------------------------+
          | hash<<'foo',  | h'F7FBBA6E0636F890E56FBBF3283E524C |
          | -44>>         | 6FA3204AE298382D624741D0DC663832   |
          |               | 6E282C41BE5E4254D8820772C5518A2C   |
          |               | 5A8C0C7F7EDA19594A7EB539453E1ED7'  |
          +---------------+------------------------------------+
          | hash<<'foo',  | h'F7FBBA6E0636F890E56FBBF3283E524C |
          | "SHA-512">>   | 6FA3204AE298382D624741D0DC663832   |
          |               | 6E282C41BE5E4254D8820772C5518A2C   |
          |               | 5A8C0C7F7EDA19594A7EB539453E1ED7'  |
          +---------------+------------------------------------+

                   Table 5: hash literals vs. plain EDN
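
   The first rows of the table can be reproduced with Python's hashlib.
   The lookup table in this sketch is an assumption that covers only the
   two algorithms used in the examples, not the whole COSE Algorithms
   registry; hash_literal is a hypothetical name.

```python
import hashlib

# Only the two algorithms from the examples; keys are COSE identifiers or names.
_COSE_HASHES = {-16: "sha256", "SHA-256": "sha256",
                -44: "sha512", "SHA-512": "sha512"}

def hash_literal(data: bytes, algorithm=-16) -> str:
    """Plain EDN (an h'' literal) for hash<<data, algorithm>>."""
    name = _COSE_HASHES[algorithm]
    return "h'" + hashlib.new(name, data).hexdigest().upper() + "'"
```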

4.  Stand-in Representations in Binary CBOR

   In some cases, an EDN consumer cannot construct actual CBOR items
   that represent the CBOR data intended for eventual interchange.  This
   document defines stand-in representations for two such cases:

   *  The EDN consumer does not know (or does not implement) an
      application-extension identifier used in the EDN document
      (Section 4.1) but wants to preserve the information for a later
      processor.

   *  The generator of some EDN intended for human consumption (such as
      in a specification document) may not want to include parts of the
      final data item, destructively replacing complete subtrees or
      possibly just parts of a lengthy string by _elisions_
      (Section 4.2).

   Implementation note: Typically, the ultimate applications will fail
   if they encounter tags unknown to them, which the ones defined in
   this section likely are.  Where chains of tools are involved in
   processing EDN, it may be useful to fail earlier than at the ultimate
   receiver in the chain unless specific processing options (e.g.,
   command line flags) are given that indicate which of these stand-ins
   are expected at this stage in the chain.

4.1.  Handling unknown application-extension identifiers

   When ingesting CBOR diagnostic notation, any application-oriented
   extension literals are usually decoded and transformed into the
   corresponding data item during ingestion.  If an application-
   extension is not known or not implemented by the ingesting process,
   this is usually an error and processing has to stop.

   However, in certain cases, it can be desirable to exceptionally carry
   an uninterpreted application-oriented extension literal in an
   ingested data item, so that its decoding can be postponed to a
   specific later stage of ingestion.

   This specification defines a CBOR Tag for this purpose: The
   Diagnostic Notation Unresolved Application-Extension Tag, tag number
   CPA999 (Section 6.5).  The content of this tag is an array of a text
   string for the application-extension identifier, and another array:

   *  For app-strings, the second array contains a single item, a text
      string containing the text notated by the single-quoted string in
      the app-string.

   *  For app-sequences, the second array contains zero or more items,
      which represent each item in the sequence contained in the app-
      sequence.

   For example, cri'https://example.com' can be represented as /CPA/
   999(["cri", ["https://example.com"]]), or hash<<"data", -44>> as
   /CPA/ 999(["hash", ["data", -44]]).

   If a stage of ingestion is not prepared to handle the Unresolved
   Application-Extension Tag, this is an error and processing has to
   stop, as if this stage had been ingesting an unknown or unimplemented
   application-extension literal itself.
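
   The stand-in structure described above can be sketched in Python as
   follows.  This is a minimal illustration, not part of the
   specification: the Tag class and both helper functions are
   hypothetical names, and 999 is the draft's placeholder tag number
   (CPA999), which may change on registration.

```python
from dataclasses import dataclass
from typing import Any

# Draft placeholder ("CPA999"); the registered number may differ.
UNRESOLVED_APP_EXT_TAG = 999


@dataclass(frozen=True)
class Tag:
    """Hypothetical in-memory model of a CBOR tag (not a real library type)."""
    number: int
    content: Any


def unresolved_app_string(prefix: str, text: str) -> Tag:
    """Stand-in for prefix'text': the second array holds the one text string."""
    return Tag(UNRESOLVED_APP_EXT_TAG, [prefix, [text]])


def unresolved_app_sequence(prefix: str, items: list) -> Tag:
    """Stand-in for prefix<<...>>: the second array holds zero or more items."""
    return Tag(UNRESOLVED_APP_EXT_TAG, [prefix, list(items)])
```

   With these helpers, unresolved_app_string("cri",
   "https://example.com") builds the structure notated above as
   999(["cri", ["https://example.com"]]).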


4.2.  Handling information deliberately elided from an EDN document

   When using EDN for exposition in a document or on a whiteboard, it is
   often useful to be able to leave out parts of an EDN document that
   are not of interest at that point of the exposition.

   To facilitate this, this specification supports the use of an
   _ellipsis_ (notated as three or more dots in a row, as in ...) to
   indicate parts of an EDN document that have been elided (and
   therefore cannot be reconstructed).

   Upon ingesting EDN as a representation of a CBOR data item for
   further processing, the occurrence of an ellipsis usually is an error
   and processing has to stop.

   However, it is useful to be able to process EDN documents with
   ellipses in the automation scripts for the documents using them.
   This specification defines a CBOR Tag that can be used in the
   ingestion for this purpose: The Diagnostic Notation Ellipsis Tag, tag
   number CPA888 (Section 6.5).  The content of this tag either is

   1.  null (indicating a data item entirely replaced by an ellipsis),
       or it is

   2.  an array, the elements of which are alternating between fragments
       of a string and the actual elisions, represented as ellipses
       carrying a null as content.

   Elisions can stand in for entire subtrees, e.g. in:

   [1, 2, ..., 3]
   { "a": 1,
     "b": ...,
     ...: ...
   }

   A single ellipsis (or key/value pair of ellipses) can imply eliding
   multiple elements in an array (members in a map); if more detailed
   control is required, a data definition language such as CDDL can be
   employed.  (Note that the stand-in form defined here does not allow
   multiple key/value pairs with an ellipsis as a key: the CBOR data
   item would not be valid.)

   Subtree elisions can be represented in a CBOR data item by using
   /CPA/888(null) as the stand-in:


   [1, 2, 888(null), 3]
   { "a": 1,
     "b": 888(null),
     888(null): 888(null)
   }

   Elisions also can be used as part of a (text or byte) string:

   { "contract": "Herewith I buy" + ... + "gned: Alice & Bob",
     "bytes_in_IRI": 'https://a.example/' + ... + '&q=Übergrößenträger',
     "signature": h'4711...0815',
   }

   The example "contract" combines string concatenation via the +
   operator (Section 5.1) with ellipses; while the example "signature"
   uses special syntax that allows the use of ellipses between the bytes
   notated _inside_ h'' literals.

   String elisions can be represented in a CBOR data item by a stand-in
   that wraps an array of string fragments alternating with ellipsis
   indicators:

   { "contract": /CPA/888(["Herewith I buy", 888(null),
                           "gned: Alice & Bob"]),
     "bytes_in_IRI": 888(['https://a.example/', 888(null),
                          '&q=Übergrößenträger']),
     "signature": 888([h'4711', 888(null), h'0815']),
   }
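
   For an exporter, the reverse direction may be of interest: turning
   an ellipsis stand-in back into ellipsis syntax.  The sketch below
   assumes a hypothetical Tag class and a caller-supplied string
   renderer; demo_render_string is a simplified stand-in for a real
   one, and 888 is the draft's placeholder tag number (CPA888).

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

ELLIPSIS_TAG = 888  # draft placeholder "CPA888"; the final number may differ


@dataclass(frozen=True)
class Tag:
    """Hypothetical in-memory model of a CBOR tag (not a real library type)."""
    number: int
    content: Any


def is_elision(item: Any) -> bool:
    """True for the 888(null) stand-in that marks one elided part."""
    return (isinstance(item, Tag) and item.number == ELLIPSIS_TAG
            and item.content is None)


def render_ellipsis_tag(tag: Tag, render_string: Callable[[Any], str]) -> str:
    """Turn an ellipsis stand-in back into EDN ellipsis syntax."""
    if tag.number != ELLIPSIS_TAG:
        raise ValueError("not an ellipsis stand-in")
    if tag.content is None:
        return "..."  # a whole data item was elided
    if not isinstance(tag.content, list):
        raise ValueError("ellipsis tag content must be null or an array")
    # String fragments alternate with elisions; join them with "+".
    return " + ".join(
        "..." if is_elision(part) else render_string(part)
        for part in tag.content
    )


def demo_render_string(value: Any) -> str:
    """Simplified string renderer used only for this example."""
    if isinstance(value, bytes):
        return "h'" + value.hex() + "'"
    return json.dumps(value, ensure_ascii=False)
```

   Note that this sketch always emits the concatenation form, so the
   "signature" stand-in above comes out as h'4711' + ... + h'0815'
   rather than h'4711...0815'; both forms should ingest to the same
   stand-in.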

   Note that the use of elisions is different from "commenting out" EDN
   text, e.g.:

   { "signature": h'4711/.../0815',
     # ...: ...
   }

   The consumer of this EDN will ignore the comments and therefore will
   have no idea after ingestion that some information has been elided;
   validation steps may then simply fail instead of being informed about
   the elisions.

5.  ABNF Definitions

   This section collects grammars in ABNF form ([STD68] as extended in
   [RFC7405]) that serve to define the syntax of EDN and some
   application-oriented literals.





Bormann                  Expires 8 January 2026                [Page 28]

Internet-Draft   CBOR Extended Diagnostic Notation (EDN)       July 2025


   Implementation note: The ABNF definitions in this section are
   intended to be useful in a Parsing Expression Grammar (PEG) parser
   interpretation (see Appendix A of [RFC8610] for an introduction into
   PEG).

5.1.  Overall ABNF Definition for Extended Diagnostic Notation

   This subsection provides an overall ABNF definition for the syntax of
   CBOR extended diagnostic notation.

   For simplicity, the internal parsing for the built-in EDN prefixes is
   specified in the same way.  ABNF definitions for h'' and b64'' are
   provided in Section 5.2.1 and Section 5.2.2.  However, the prefixes
   b32'' and h32'' are not in wide use and an ABNF definition in this
   document could therefore not be based on implementation experience.

   seq             = S [item *(MSC item) SOC]
   one-item        = S item S
   item            = map / array / tagged
                   / number / simple
                   / string / streamstring

   string1         = (tstr / bstr) spec
   string1e        = string1 / ellipsis
   ellipsis        = 3*"." ; "..." or more dots
   string          = string1e *(S "+" S string1e)

   number          = (hexfloat / hexint / octint / binint
                      / decnumber / nonfin) spec
   sign            = "+" / "-"
   decnumber       = [sign] (1*DIGIT ["." *DIGIT] / "." 1*DIGIT)
                            ["e" [sign] 1*DIGIT]

   hexfloat        = [sign] "0x" (1*HEXDIG ["." *HEXDIG] / "." 1*HEXDIG)
                            "p" [sign] 1*DIGIT
   hexint          = [sign] "0x" 1*HEXDIG
   octint          = [sign] "0o" 1*ODIGIT
   binint          = [sign] "0b" 1*BDIGIT
   nonfin          = %s"Infinity"
                   / %s"-Infinity"
                   / %s"NaN"
   simple          = %s"false"
                   / %s"true"
                   / %s"null"
                   / %s"undefined"
                   / %s"simple(" S item S ")"
   uint            = "0" / DIGIT1 *DIGIT
   tagged          = uint spec "(" S item S ")"

   app-prefix      = lcalpha *lcldh ; including h and b64
                   / ucalpha *ucldh ; tagged variant, if defined
   app-string      = app-prefix sqstr
   app-sequence    = app-prefix "<<" seq ">>"
   sqstr           = SQUOTE *single-quoted SQUOTE
   bstr            = app-string / sqstr / app-sequence / embedded
                     ; app-string/-sequence could be any type
   tstr            = DQUOTE *double-quoted DQUOTE
   embedded        = "<<" seq ">>"

   array           = "[" (specms S item *(MSC item) SOC / spec S) "]"
   map             = "{" (specms S keyp *(MSC keyp) SOC / spec S) "}"
   keyp            = item S ":" S item

   ; We allow %x09 HT in prose, but not in strings
   blank           = %x09 / %x0A / %x0D / %x20
   non-slash       = blank / %x21-2e / %x30-7F / NONASCII
   non-lf          = %x09 / %x0D / %x20-7F / NONASCII
   comment         = "/" *non-slash "/"
                   / "#" *non-lf %x0A
   ; optional space
   S               = *blank *(comment *blank)
   ; mandatory space
   MS              = (blank/comment) S
   ; mandatory comma and/or space
   MSC             = ("," S) / (MS ["," S])
   ; optional comma and/or space
   SOC             = S ["," S]

   ; check semantically that strings are either all text or all bytes
   ; note that there must be at least one string to distinguish
   streamstring    = "(_" MS string *(MSC string) SOC ")"

   spec            = ["_" *wordchar]
   specms          = ["_" *wordchar MS]

   double-quoted   = unescaped
                   / SQUOTE
                   / "\" escapable-d

   single-quoted   = unescaped
                   / DQUOTE
                   / "\" escapable-s

   escapable1      = %s"b" ; BS backspace U+0008
                   / %s"f" ; FF form feed U+000C
                   / %s"n" ; LF line feed U+000A
                   / %s"r" ; CR carriage return U+000D
                   / %s"t" ; HT horizontal tab U+0009
                   / "\"   ; \ backslash (reverse solidus) U+005C

   escapable-d     = escapable1
                   / DQUOTE
                   / "/"   ; / slash (solidus) U+002F (JSON!)
                   / (%s"u" hexchar) ;  uXXXX      U+XXXX

   escapable-s     = escapable1
                   / SQUOTE
                   / (%s"u" hexchar-s) ;  uXXXX      U+XXXX

   hexchar         = "{" (1*"0" [ hexscalar ] / hexscalar) "}"
                   / non-surrogate
                   / two-surrogate
   non-surrogate   = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG)
                   / ("D" ODIGIT 2HEXDIG )
   two-surrogate   = high-surrogate "\" %s"u" low-surrogate
   high-surrogate  = "D" ("8"/"9"/"A"/"B") 2HEXDIG
   low-surrogate   = "D" ("C"/"D"/"E"/"F") 2HEXDIG
   hexscalar       = "10" 4HEXDIG / HEXDIG1 4HEXDIG
                   / non-surrogate / 1*3HEXDIG

   ; single-quote hexchar-s: don't allow 0020..007e
   hexchar-s       = "{" (1*"0" [ hexscalar-s ] / hexscalar-s) "}"
                   / non-surrogate-s
                   / two-surrogate
   non-surrogate-s = "007F"                 ; rubout
                   / "00" ("0"/"1"/"8"/"9"/HEXDIGA) HEXDIG
                   / "0" HEXDIG1 2HEXDIG
                   / non-surrogate-1
   non-surrogate-1 = ((DIGIT1 / "A"/"B"/"C" / "E"/"F") 3HEXDIG)
                   / ("D" ODIGIT 2HEXDIG )


   hexscalar-s     = "10" 4HEXDIG / HEXDIG1 4HEXDIG
                   / non-surrogate-1 / HEXDIG1 2HEXDIG
                   / ("1"/"8"/"9"/HEXDIGA) HEXDIG
                   / "7F"
                   / HEXDIG1

   ; Note that no other C0 characters are allowed, including %x09 HT
   unescaped       = %x0A ; new line
                   / %x0D ; carriage return -- ignored on input
                   / %x20-21
                        ; omit 0x22 "
                   / %x23-26
                        ; omit 0x27 '
                   / %x28-5B
                        ; omit 0x5C \
                   / %x5D-7F
                   / NONASCII

   DQUOTE          = %x22    ; " double quote
   SQUOTE          = "'"     ; ' single quote
   DIGIT           = %x30-39 ; 0-9
   DIGIT1          = %x31-39 ; 1-9
   ODIGIT          = %x30-37 ; 0-7
   BDIGIT          = %x30-31 ; 0-1
   HEXDIGA         = "A" / "B" / "C" / "D" / "E" / "F"
   ; Note: double-quoted strings as in "A" are case-insensitive in ABNF
   HEXDIG          = DIGIT / HEXDIGA
   HEXDIG1         = DIGIT1 / HEXDIGA
   lcalpha         = %x61-7A ; a-z
   lcldh           = lcalpha / DIGIT / "-"
   ucalpha         = %x41-5A ; A-Z
   ucldh           = ucalpha / DIGIT / "-"
   ALPHA           = lcalpha / ucalpha
   wordchar        = "_" / ALPHA / DIGIT ; [_a-z0-9A-Z]
   NONASCII        = %x80-D7FF / %xE000-10FFFF
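
   As an illustration of how an exporter might apply the tstr,
   double-quoted, unescaped, and nonfin rules above, the following
   Python sketch renders text strings and floating-point values.  The
   function names are hypothetical.  It assumes the input text contains
   only Unicode scalar values (no lone surrogates), escapes LF for
   single-line output even though the grammar permits it unescaped, and
   does not emit encoding indicators (spec).

```python
import math

# Short escapes from the escapable1 rule, plus the double quote.
_SHORT_ESCAPES = {
    "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t",
    "\\": "\\\\", '"': '\\"',
}


def edn_tstr(text: str) -> str:
    """Render a text string as a double-quoted EDN literal."""
    out = ['"']
    for ch in text:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining C0 controls are not allowed unescaped.
            out.append(f"\\u{ord(ch):04X}")
        else:
            # %x20-7F (minus the quote and backslash) and NONASCII pass through.
            out.append(ch)
    out.append('"')
    return "".join(out)


def edn_float(value: float) -> str:
    """Render a float using the decnumber or nonfin productions."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    # Python's repr always includes "." or "e" for finite floats, which
    # keeps the output distinct from an integer.
    return repr(value)
```

   CR is escaped rather than passed through because the grammar notes
   that an unescaped carriage return is ignored on input.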




