Skip to content

Conversation

jku
Copy link
Member

@jku jku commented Sep 25, 2025

This should fix #1552

  • added a test workflow based on idea taken from model-transparency
  • the test immediately fails because apparently we write cp1252 on Windows (and this is different with rekor 2 checkpoints that contain an mdash)
  • I've fixed the issues by being explicit about encoding: This changes serialization on Windows. Maybe we should actually be reading in bytes but I wanted the change to be minimal here.

Some details on the impact of this issue:

  • It is possible that some bundles produced on older releases on windows are no longer verifiable on windows when using a newer release (if the bundles did have encoding issues they used to be verifiable on windows but were never verifiable on other platforms)
  • In practice it seems this issue triggers specifically with rekor v2 because of the new checkpoint format

@jku

This comment was marked as outdated.

@jku

This comment was marked as outdated.

@jku jku marked this pull request as ready for review September 25, 2025 12:49
@spencerschrock
Copy link

This should fix #1552

I noticed all the encoding changes are in _cli.py. Is this something we (model-transparency) would be hitting in our call path? Or is sigstore-python CLI used by rekor2, which we may be hitting by upgrading to sigstore-python v4?

@jku
Copy link
Member Author

jku commented Sep 25, 2025

I noticed all the encoding changes are in _cli.py. Is this something we (model-transparency) would be hitting in our call path? Or is sigstore-python CLI used by rekor2, which we may be hitting by upgrading to sigstore-python v4?

oh good point: _cli is not used by other modules so it's possible that some of your code has the same issues (maybe it was made following the _cli module here?)

@spencerschrock
Copy link

oh good point: _cli is not used by other modules so it's possible that some of your code has the same issues

Indeed, we were also reading and writing signatures without specifying an encoding. Forcing everything to utf-8 seemed to resolve our issues

Maybe we should actually be reading in bytes

model-transparency will be thinking about this too

@spencerschrock
Copy link

Maybe we should actually be reading in bytes

For the JSON bundle, I think UTF-8 is more conformant depending on how one interprets "closed ecosystem"?

https://datatracker.ietf.org/doc/html/rfc8259#section-8.1

JSON text exchanged between systems that are not part of a closed
ecosystem MUST be encoded using UTF-8 [RFC3629].

jku added 4 commits September 25, 2025 19:43
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Apparently Python really likes to use cp1252 on some platforms but our
bundles must not be platform specific.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Sep 25, 2025

To be clear, I didn't mean we should think about changing the representation on disk: utf-8 text is fine --I was just thinking that it might be clearer to read bytes and pass those to the json parser (double checking that it actually uses utf-8 in all cases).

@jku jku changed the title Cross platform test and fixes Cross platform test and UTF-8 fixes Sep 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

windows encoding issues
2 participants