Skip to content

Conversation

jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Sep 3, 2025

This patch optimizes PopCount value transforms using KnownBits information.
Following are the results of the micro-benchmark included with the patch

System: 13th Gen Intel(R) Core(TM) i3-1315U

Baseline:
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s

Withopt:
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s

Kindly review and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365205: C2: Optimize popcount value computation using knownbits (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075
$ git checkout pull/27075

Update a local copy of the PR:
$ git checkout pull/27075
$ git pull https://git.openjdk.org/jdk.git pull/27075/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27075

View PR using the GUI difftool:
$ git pr show -t 27075

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27075.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 3, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 3, 2025

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8365205: C2: Optimize popcount value computation using knownbits

Reviewed-by: epeter, hgreule, qamai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 591 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 3, 2025
@openjdk
Copy link

openjdk bot commented Sep 3, 2025

@jatin-bhateja
The hotspot-compiler label was successfully added.

@jatin-bhateja jatin-bhateja marked this pull request as ready for review September 4, 2025 05:42
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 4, 2025
@mlbridge
Copy link

mlbridge bot commented Sep 4, 2025

@SirYwell
Copy link
Member

SirYwell commented Sep 4, 2025

The change looks good, but I wonder:

  • if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
  • whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
    • _zeroes has the bits set that are known to be always 0. So BitsPer<Type> - popCount(x) gives you an upper limit of how many bits might be 1. And BitsPer<Type> - popCount(_zeroes) is equivalent to popCount(~_zeroes).
    • _ones has the bits set that are known to be always 1. Trivially, popCount(_ones) is a valid lower bound.
    • The rest repeats how adjust_bits_from_unsigned_bounds works, but that's not specific to the popcount nodes.

@jatin-bhateja
Copy link
Member Author

Hi @TobiHartmann , @SirYwell , @eme64 , can you kindly verify the changes in the latest patch?

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement @jatin-bhateja , thanks for working on it :)

Comment on lines 35 to 79
@Benchmark
public int StockKernelInt() {
int res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
int constrained_i = i & 0xFFFF;
res += constrained_i;
}
return res;
}

@Benchmark
public int LogicFoldingKerenlInt() {
int res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
int constrained_i = i & 0xFFFF;
if (Integer.bitCount(constrained_i) > 16) {
throw new AssertionError("Uncommon trap");
}
res += constrained_i;
}
return res;
}

@Benchmark
public long StockKernelLong() {
long res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
long constrained_i = i & 0xFFFFFFL;
res += constrained_i;
}
return res;
}

@Benchmark
public long LogicFoldingKerenLong() {
long res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
long constrained_i = i & 0xFFFFFFL;
if (Long.bitCount(constrained_i) > 24) {
throw new AssertionError("Uncommon trap");
}
res += constrained_i;
}
return res;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the stock kernels are there to show performance if there is no op, the folding kernels you hope have the same performance. It would be nice to have one where the bitCount does not fold away, just to keep that comparison :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, on a second thought, since any benchmarks compare the performance of kernels with and without optimization it's better to do away with the stock variants and only retain folding kernels.

@eme64
Copy link
Contributor

eme64 commented Sep 18, 2025

@jatin-bhateja I'm going to be out of the office for about 3 weeks, so feel free to ask others for reviews!

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Sep 18, 2025
@openjdk
Copy link

openjdk bot commented Sep 18, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@jatin-bhateja
Copy link
Member Author

Hi @eme64 , @chhagedorn , @SirYwell , let me know if its good to land now.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm about to board my plane for a 3-week vacation, I'm leaving a last little note for the reviewers.

I think this is a really nice addition, so thanks for doing it @jatin-bhateja 😊 . Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as JDK-8367341.

Please make sure you test it, and make sure the random values generated with the Generators in test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java make sense. Currently, there is for example a 32 bit range for a 64 bit long value, which is not correct, I think.

By default, my recommendation is to not constrain the Generators ranges, unless there is a really good reason. Generators are already built to produce values close to zero at an over-proportional rate. But by not restricting we may at some point also hit cases that we did not anticipate, and catch bugs that way.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Sep 19, 2025

As I'm about to board my plane for a 3-week vacation, I'm leaving a last little note for the reviewers.

I think this is a really nice addition, so thanks for doing it @jatin-bhateja 😊 . Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as JDK-8367341.

Correct, currently KnownBits information is constrained as they are generated for limited value ranges, as discussed in
#27075 (comment)

// We use the KnownBits information from the integer types to derive how many one bits
// we have at least and at most.
// From the definition of KnownBits, we know:
// zeros: Indicates which bits must be 0: ones[i] =1 -> t[i]=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this, is ones[i] mixed up with zeros[i]? I.e., t[i]=0 if zeros[i]=1

Copy link
Member Author

@jatin-bhateja jatin-bhateja Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jatin-bhateja
Copy link
Member Author

Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.

Copy link
Member

@SirYwell SirYwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, although I'm not exactly sure about the semantics of widen and whether Type::WidenMax is the right choice here. Someone else has to look at that.

Thanks for the work!

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion, we might do this as a later RFE.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Sep 29, 2025

Looks good to me now, although I'm not exactly sure about the semantics of widen and whether Type::WidenMax is the right choice here. Someone else has to look at that.

Thanks for the work!

Hi @TobiHartmann @chhagedorn , can you kindly run this through your testing and share if its good to land in.

@TobiHartmann
Copy link
Member

I submitted testing for this and will report back once it finished.

@TobiHartmann
Copy link
Member

Testing all passed. I'll pass the review to someone else.

@eirbjo
Copy link
Contributor

eirbjo commented Oct 6, 2025

Is the core-libs label appropriate for this PR? Looks hotspot specific?

@SirYwell
Copy link
Member

SirYwell commented Oct 6, 2025

Is the core-libs label appropriate for this PR? Looks hotspot specific?

That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific.

@eirbjo
Copy link
Contributor

eirbjo commented Oct 6, 2025

That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific.

Right, makes sense. I'll go ahead and remove that label.

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Oct 6, 2025
@openjdk
Copy link

openjdk bot commented Oct 6, 2025

@eirbjo
The core-libs label was successfully removed.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Oct 6, 2025

Hi @merykitty , Can you kindly be the second reviewer?

@liach
Copy link
Member

liach commented Oct 6, 2025

FYI this patch was marked core-libs because of the benchmark addition in java/lang. I wonder if it belongs to vm/compiler instead.

Copy link
Member

@merykitty merykitty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Oct 9, 2025

Hi @TobiHartmann, we have @merykitty and @SirYwell clearance, need your approval to land this.

@jatin-bhateja
Copy link
Member Author

Hi @TobiHartmann, Looking forward to your approval here.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm back from vacation. Code looks good to me now. But running some internal testing before approval ;)

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed -> approved 👍

@jatin-bhateja Thanks for the work!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 13, 2025
@jatin-bhateja
Copy link
Member Author

Thanks @SirYwell , @eme64 ,@merykitty

@jatin-bhateja
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Oct 14, 2025

Going to push as commit 4496418.
Since your change was applied there have been 606 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 14, 2025
@openjdk openjdk bot closed this Oct 14, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 14, 2025
@openjdk
Copy link

openjdk bot commented Oct 14, 2025

@jatin-bhateja Pushed as commit 4496418.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

8 participants