8364320: String encodeUTF8 latin1 with negatives #26597
👋 Welcome back bokken! A progress list of the required criteria for merging this PR into |
@bokken This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 163 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@liach) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Benchmark on win64 Baseline:
Patch:
Webrevs
byte[] dst = StringUTF16.newBytesFor(val.length);
for (byte c : val) {
if (positives > 0) {
I wonder whether, for very short strings, the setup overhead of arraycopy is worth it.
It might be interesting to raise the threshold here to 4 or 8 for the array copy, otherwise start the loop at 0, and see whether the performance difference is detectable.
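To make the suggested experiment concrete, here is a minimal sketch of a threshold variant. It is not the code in this PR: the class name, the `minBulkCopy` parameter, and the plain counting loop are all hypothetical stand-ins chosen for illustration, and the sketch says nothing about actual performance, which would have to be measured with JMH.

```java
import java.util.Arrays;

// Hypothetical sketch of the threshold idea from the review comment above.
final class ThresholdCopySketch {

    // minBulkCopy is an assumed tuning knob; the review suggests trying 4 or 8.
    static byte[] encode(byte[] val, int minBulkCopy) {
        // Count leading non-negative (ASCII) bytes with a plain loop.
        int positives = 0;
        while (positives < val.length && val[positives] >= 0) {
            positives++;
        }

        // Worst case: every byte expands to two UTF-8 bytes.
        byte[] dst = new byte[2 * val.length];
        int start = 0;
        if (positives >= minBulkCopy) {
            // Long enough prefix: bulk copy it and skip ahead.
            System.arraycopy(val, 0, dst, 0, positives);
            start = positives;
        }
        // Short prefix: start stays 0 and the loop handles every byte.
        int dp = start;
        for (int i = start; i < val.length; i++) {
            byte c = val[i];
            if (c < 0) {
                // Latin-1 0x80..0xFF maps to a two-byte UTF-8 sequence.
                dst[dp++] = (byte) (0xC0 | ((c & 0xFF) >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3F));
            } else {
                dst[dp++] = c;
            }
        }
        return Arrays.copyOf(dst, dp);
    }
}
```

Both branches produce the same bytes; only the amount of work done by `System.arraycopy` versus the per-byte loop changes, which is the trade-off the comment asks to benchmark.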
@bokken FYI, to make JMH comparison easier, you can let JMH generate JSON reports, upload them to GitHub gists, and use https://jmh.morethan.io/ to compare the two results from two gists.
On second thought, even though Roger's concern is valid, I think the current shape is the best for this code. All usages of JavaLangAccess.inflateBytesToChars already exhibit the same pattern as the code here. A separate heuristic for this method would increase our maintenance cost for uncertain gains.
@liach / @RogerRiggs I have been experimenting locally with other options which are a bit more complex: I am continuing to evaluate various options.
As suggested on the mailing list, when encoding Latin-1 bytes to UTF-8 we can count the leading positive bytes. If a negative byte follows, we can copy all of the leading positive values to the target byte[] before processing the remaining data one byte at a time.
https://mail.openjdk.org/pipermail/core-libs-dev/2025-July/149417.html
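The approach described above can be sketched in standalone Java as follows. This is an illustration under stated assumptions, not the patch itself: the class and method names are hypothetical, `countLeadingPositives` is a plain loop standing in for whatever counting routine the JDK uses internally, and the destination is sized as the copied prefix plus two bytes per remaining input byte.

```java
import java.util.Arrays;

// Hypothetical sketch of "copy the positive prefix, then encode the rest".
final class Latin1ToUtf8Sketch {

    // Stand-in helper (an assumption): number of leading bytes that are >= 0.
    static int countLeadingPositives(byte[] val) {
        int i = 0;
        while (i < val.length && val[i] >= 0) {
            i++;
        }
        return i;
    }

    static byte[] encode(byte[] val) {
        int positives = countLeadingPositives(val);
        if (positives == val.length) {
            // Pure ASCII: the UTF-8 form is byte-for-byte identical.
            return Arrays.copyOf(val, val.length);
        }

        // Prefix is copied as is; each remaining byte may expand to two bytes.
        byte[] dst = new byte[positives + 2 * (val.length - positives)];
        if (positives > 0) {
            System.arraycopy(val, 0, dst, 0, positives);
        }

        int dp = positives;
        for (int i = positives; i < val.length; i++) {
            byte c = val[i];
            if (c < 0) {
                // Latin-1 0x80..0xFF becomes a two-byte UTF-8 sequence.
                dst[dp++] = (byte) (0xC0 | ((c & 0xFF) >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3F));
            } else {
                dst[dp++] = c;
            }
        }
        // Trim if some of the remaining bytes were ASCII.
        return dp == dst.length ? dst : Arrays.copyOf(dst, dp);
    }
}
```

For example, the Latin-1 bytes for "Hi" followed by 0xE9 would have the two ASCII bytes bulk-copied and the final byte expanded to the pair 0xC3 0xA9.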
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26597/head:pull/26597
$ git checkout pull/26597
Update a local copy of the PR:
$ git checkout pull/26597
$ git pull https://git.openjdk.org/jdk.git pull/26597/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26597
View PR using the GUI difftool:
$ git pr show -t 26597
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26597.diff
Using Webrev
Link to Webrev Comment