
Support for IBM037 and IBM1140? #1

Open
jbowtie opened this issue Dec 29, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@jbowtie

jbowtie commented Dec 29, 2021

By my count, there are four EBCDIC code pages likely to be encountered on the internet, though the comprehensive IANA registry of character sets at https://www.iana.org/assignments/character-sets/character-sets.xhtml lists about 20 EBCDIC code pages.

Most of them are cross-referenced to the code page definitions in https://datatracker.ietf.org/doc/html/rfc1345 (which aggregates many of the original sources into a single reference document).

I'm happy to do this work for the two most common pages (IBM037 and IBM1140), since I am implementing an XML parser. However, I don't know whether you would want them in scope for this crate, behind a cargo feature, or segregated into an EBCDIC-specific crate.
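For context, IBM1140 is the euro-enabled revision of IBM037: as far as I know, the two differ only at byte 0x9F, which is the currency sign ¤ (U+00A4) in IBM037 and the euro sign € (U+20AC) in IBM1140. That suggests both pages could share one table. A minimal sketch, assuming a hypothetical and deliberately partial table (space, digits, and letters only); `ibm037_partial` and `ibm1140_partial` are illustrative names, not part of yore:

```rust
/// Hypothetical helper: a *partial* IBM037 decode table. Only space,
/// digits, Latin letters and 0x9F are filled in; every other byte is
/// None. A real table defines all 256 entries.
fn ibm037_partial() -> [Option<char>; 256] {
    let mut t = [None; 256];
    t[0x40] = Some(' ');
    t[0x9F] = Some('\u{00A4}'); // currency sign in IBM037
    // (start byte, first char, last char): digits are contiguous,
    // but letters come in three non-contiguous runs per case.
    let runs: [(usize, char, char); 7] = [
        (0xF0, '0', '9'),
        (0x81, 'a', 'i'), (0x91, 'j', 'r'), (0xA2, 's', 'z'),
        (0xC1, 'A', 'I'), (0xD1, 'J', 'R'), (0xE2, 'S', 'Z'),
    ];
    for (start, first, last) in runs {
        for (i, c) in (first..=last).enumerate() {
            t[start + i] = Some(c);
        }
    }
    t
}

/// IBM1140: the same table with the euro sign at 0x9F.
fn ibm1140_partial() -> [Option<char>; 256] {
    let mut t = ibm037_partial();
    t[0x9F] = Some('\u{20AC}');
    t
}

fn main() {
    let (a, b) = (ibm037_partial(), ibm1140_partial());
    let differing: Vec<usize> = (0..256).filter(|&i| a[i] != b[i]).collect();
    // prints "bytes that differ: [159]" (159 == 0x9F)
    println!("bytes that differ: {:?}", differing);
}
```

Deriving one table from the other keeps the two pages from drifting apart if a mapping is ever corrected.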

@bonega
Owner

bonega commented Dec 29, 2021

Hi, I am open to including them.
While looking at how to implement them, I discovered a bug in how Yore handles code pages that deviate from ASCII.
I need to fix that first.
I expect EBCDIC decoding to be noticeably slower, because we can't just memcpy bytes straight to UTF-8: in ASCII-compatible code pages, bytes below 0x80 are already valid UTF-8, but in EBCDIC even letters and digits sit at other positions.
I will give you a ping when I have implemented it.
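To illustrate the performance point: with an ASCII-compatible single-byte code page, a decoder can copy runs of bytes below 0x80 directly, since they are already valid UTF-8, and only consult a table for the high half. In EBCDIC, even 'H' is 0xC8 rather than 0x48, so every byte has to go through a lookup. This is a hypothetical sketch, not yore's implementation; the tables are illustrative (ISO-8859-1 for the high half, and a tiny IBM037 subset):

```rust
/// Decode with an ASCII-compatible single-byte code page. `high` is a
/// 128-entry table for bytes 0x80..=0xFF; lower bytes are bulk-copied.
fn decode_ascii_compatible(bytes: &[u8], high: &[char; 128]) -> String {
    let mut out = String::with_capacity(bytes.len());
    let mut rest = bytes;
    loop {
        // Length of the leading run of ASCII bytes.
        let n = rest.iter().position(|&b| b >= 0x80).unwrap_or(rest.len());
        // Cannot fail: bytes below 0x80 are valid UTF-8 as-is.
        out.push_str(std::str::from_utf8(&rest[..n]).expect("ASCII is valid UTF-8"));
        match rest.get(n) {
            Some(&b) => {
                out.push(high[(b - 0x80) as usize]);
                rest = &rest[n + 1..];
            }
            None => break,
        }
    }
    out
}

/// EBCDIC has no such shortcut: every byte is looked up.
fn decode_ebcdic(bytes: &[u8], table: &[char; 256]) -> String {
    bytes.iter().map(|&b| table[b as usize]).collect()
}

/// Illustrative high half of ISO-8859-1: byte 0x80 + i maps to U+0080 + i.
fn latin1_high() -> [char; 128] {
    let mut t = ['\0'; 128];
    for (i, c) in t.iter_mut().enumerate() {
        *c = char::from(0x80 + i as u8);
    }
    t
}

/// Hypothetical, deliberately tiny IBM037 table: only the bytes needed
/// for "Hello" and space; everything else maps to U+FFFD.
fn demo_ebcdic_table() -> [char; 256] {
    let mut t = ['\u{FFFD}'; 256];
    for (b, c) in [(0x40usize, ' '), (0xC8, 'H'), (0x85, 'e'), (0x93, 'l'), (0x96, 'o')] {
        t[b] = c;
    }
    t
}

fn main() {
    println!("{}", decode_ascii_compatible(b"caf\xE9", &latin1_high())); // café
    println!("{}", decode_ebcdic(&[0xC8, 0x85, 0x93, 0x93, 0x96], &demo_ebcdic_table())); // Hello
}
```

A per-byte table lookup is still cheap; the cost is presumably losing the bulk-copy path, which matters most for mostly-ASCII input.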

bonega added the enhancement (New feature or request) label on Dec 29, 2021
@bonega
Owner

bonega commented Jan 13, 2022

I have published an ebcdic branch.
Can you check it out?

@jbowtie
Author

jbowtie commented Jan 17, 2022

It works as expected with my test cases so far, though to be fair I'm not exercising the encode functionality.

Do you have a test suite I could build out? I could contribute some test files, or would you rather encode with iconv and decode with yore?
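A sketch of the iconv-based approach: generate reference bytes outside Rust (for example with `iconv -f UTF-8 -t IBM037`, assuming the local iconv ships that code page, as glibc's typically does), hard-code them as test vectors, and check the decoder against them. `decode_ibm037` below is a hypothetical stand-in for yore's decoder with a partial table; the byte values are the IBM037 codes for those letters, digits and space:

```rust
/// Hypothetical stand-in for the decoder under test. Partial IBM037
/// table: space, digits, letters; all other bytes decode to U+FFFD.
fn decode_ibm037(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x40 => ' ',
            0xF0..=0xF9 => char::from(b'0' + (b - 0xF0)),
            0x81..=0x89 => char::from(b'a' + (b - 0x81)),
            0x91..=0x99 => char::from(b'j' + (b - 0x91)),
            0xA2..=0xA9 => char::from(b's' + (b - 0xA2)),
            0xC1..=0xC9 => char::from(b'A' + (b - 0xC1)),
            0xD1..=0xD9 => char::from(b'J' + (b - 0xD1)),
            0xE2..=0xE9 => char::from(b'S' + (b - 0xE2)),
            _ => '\u{FFFD}',
        })
        .collect()
}

/// Returns the indices of vectors whose decoded text does not match
/// the expected string.
fn failing_vectors(decode: impl Fn(&[u8]) -> String, vectors: &[(&[u8], &str)]) -> Vec<usize> {
    vectors
        .iter()
        .enumerate()
        .filter(|&(_, &(bytes, expected))| decode(bytes) != expected)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Bytes as they might come from `iconv -f UTF-8 -t IBM037`.
    let vectors: [(&[u8], &str); 2] = [
        (&[0xC8, 0x85, 0x93, 0x93, 0x96], "Hello"),
        (&[0xE7, 0xD4, 0xD3, 0x40, 0xF1, 0xF0], "XML 10"),
    ];
    let bad = failing_vectors(decode_ibm037, &vectors);
    println!("failing vectors: {:?}", bad); // prints "failing vectors: []"
}
```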

@bonega
Owner

bonega commented Jan 25, 2022

I have fixed some bugs in a private branch and will push them shortly.
I will probably use iconv to generate some tests.
The rest of yore is tested against oem_cp and encoding_rs, but neither of them provides EBCDIC encodings.

@bonega
Owner

bonega commented Jan 25, 2022

I force-pushed the ebcdic branch: https://github.com/bonega/yore/tree/ebcdic
Enable it with the "ebcdic" feature flag.
Please open a PR with test data.
At the moment I am only running it through the invariant fuzzer.
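The thread doesn't say what yore's invariant fuzzer checks. One plausible invariant for a single-byte code page is that decoding and then re-encoding any byte string made only of mapped bytes returns the original bytes. A minimal sketch, assuming a hypothetical partial IBM037 table and a hand-rolled xorshift generator in place of a real fuzzing harness:

```rust
use std::collections::HashMap;

/// Hypothetical partial IBM037 table (space, digits, letters only).
fn table() -> [Option<char>; 256] {
    let mut t = [None; 256];
    t[0x40] = Some(' ');
    let runs: [(usize, char, char); 7] = [
        (0xF0, '0', '9'),
        (0x81, 'a', 'i'), (0x91, 'j', 'r'), (0xA2, 's', 'z'),
        (0xC1, 'A', 'I'), (0xD1, 'J', 'R'), (0xE2, 'S', 'Z'),
    ];
    for (start, first, last) in runs {
        for (i, c) in (first..=last).enumerate() {
            t[start + i] = Some(c);
        }
    }
    t
}

/// Decoding fails (None) if any byte is unmapped.
fn decode(bytes: &[u8], t: &[Option<char>; 256]) -> Option<String> {
    bytes.iter().map(|&b| t[b as usize]).collect()
}

/// Encoding fails (None) if any char has no byte in the reverse map.
fn encode(s: &str, rev: &HashMap<char, u8>) -> Option<Vec<u8>> {
    s.chars().map(|c| rev.get(&c).copied()).collect()
}

/// Checks encode(decode(x)) == x on `cases` pseudo-random inputs drawn
/// from mapped bytes. Returns the first failing input, or None.
fn fuzz_roundtrip(seed: u64, cases: usize) -> Option<Vec<u8>> {
    let t = table();
    let rev: HashMap<char, u8> = t
        .iter()
        .enumerate()
        .filter_map(|(b, &c)| c.map(|ch| (ch, b as u8)))
        .collect();
    let mapped: Vec<u8> = (0..=255u8).filter(|&b| t[b as usize].is_some()).collect();
    let mut state = seed.max(1); // xorshift64 must not start at 0
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    for _ in 0..cases {
        let len = (next() % 32) as usize;
        let input: Vec<u8> = (0..len)
            .map(|_| mapped[(next() % mapped.len() as u64) as usize])
            .collect();
        let back = decode(&input, &t).and_then(|s| encode(&s, &rev));
        if back.as_deref() != Some(&input[..]) {
            return Some(input);
        }
    }
    None
}

fn main() {
    match fuzz_roundtrip(0x5EED, 10_000) {
        None => println!("no round-trip failures"),
        Some(input) => println!("round trip failed for {:02X?}", input),
    }
}
```

The invariant only holds if the table is injective; a table that maps two bytes to the same character would make this check fail, which is part of what makes it useful.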
