-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8361582: AArch64: Some ConH values cannot be replicated with SVE #26589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
After this commit - openjdk@a49ecb2, the JTREG test - hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java fails for some of the tests which contain constant values such as - public void vectorAddConstInputFloat16() { for (int i = 0; i < LEN; ++i) { output[i] = float16ToRawShortBits(add(shortBitsToFloat16(input1[i]), FP16_CONST)); } } <The full failure log is present in the JBS ticket, thus not reproducing it here> The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates. This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node).
👋 Welcome back bkilambi! A progress list of the required criteria for merging this PR into |
@Bhavana-Kilambi This change is no longer ready for integration - check the PR body for details. |
@Bhavana-Kilambi The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking care of this.
I am still a bit confused what matches Replicate
with immH
that does not fit immH8_shift8
when Matcher::vector_length_in_bytes(n) > 16
?
Hi, thanks for your review. If the immediate value does not fit [1] jdk/src/hotspot/cpu/aarch64/aarch64.ad Line 6963 in 8ac4a88
[2] jdk/src/hotspot/cpu/aarch64/aarch64_vector.ad Line 4806 in 8ac4a88
|
Ah OK, just checking. I ran this patch on the machine where I have originally found the issue, and it seems to work. |
@Bhavana-Kilambi this pull request can not be integrated into git checkout JDK-8361582
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
Hi @shipilev @theRealAph Can I please ask for another round of review? I have addressed the review comments in the latest patch. Thanks a lot! |
test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable to me, thanks. Some nits in the test remain, but they are non-blocking.
/reviewers 2
test/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java
Outdated
Show resolved
Hide resolved
private static short[] input; | ||
private static short[] output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might give things even more chance to vectorize? Not sure, feel free to ignore.
private static short[] input; | |
private static short[] output; | |
private static final short[] INPUTE; | |
private static final short[] OUTPUT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope it's ok to not add these changes to the code. The loops are getting vectorized fine and the tests do pass on aarch64 and x86. I will consider this if there's any issue with auto vectorization in the futurre. Thanks
Thanks for your review comments and approval @shipilev. I will address your review comments in the next patchset along with any other comments. @theRealAph Would you be able to take another look at the updated patch please? Thank you in advance! |
For
We should investigate that. As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions. |
Can we/should we plug this problem in encoding first, without going too much into the optimizing the non-broken case? As it stands now, real FP16-using code can run into matcher errors in JDK 25. I would like to fix that first. |
HI @theRealAph Thanks a lot for your comment. I feel it is a good idea to modify |
Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead. |
Just a quick look at what can be done to improve the codegen -
Only on SVE machines with >16B vectors, we load from the constant table -
We could generate a similar |
Apologies, I thought I could change just the replicate backend nodes to be able to generate the |
Why not?
I don't understand. Why not do something along these lines?
|
I tried exactly that and it does generate a
Apologies for not being clear.
The destination register is an FPR. If we would want to modify this to generate a move to a scratch register instead (something similar to loadConI) then we would have to change the destination register to |
This is the part I don't understand. Why would you have to change the destination register to
|
Thanks. Yes I could do that (as I mentioned earlier in my comment), but I was trying to avoid the extra mov - |
After this commit - a49ecb2, the JTREG test -
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java
fails for some of the tests which contain constant values such as -<The full failure log is present in the JBS ticket, thus not reproducing it here>
The current code in the JDK results in the generation of sve_dup instruction for every 16-bit immediate while the acceptable range is [-128, 127] for 8-bit immediates and [-127 << 8, 128 << 8] with a multiple of 256 for 16-bit signed immediates.
This patch allows the generation of sve_dup instruction for only those 16-bit values which are within the limits as specified above and for the values which are out of range, the immediate half float value is loaded from the constant pool into a register ("loadConH" mach node) which is then replicated or broadcasted to an SVE register ("replicateHF" mach node).
Both the tests -
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java
andtest/hotspot/jtreg/compiler/c2/aarch64/TestFloat16Replicate.java
pass on 256-bit SVE machine. JTREG tests - hotspot (hotspot_all), langtools (tier1) and jdk(tier 1-3) pass on the same machine.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26589/head:pull/26589
$ git checkout pull/26589
Update a local copy of the PR:
$ git checkout pull/26589
$ git pull https://git.openjdk.org/jdk.git pull/26589/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26589
View PR using the GUI difftool:
$ git pr show -t 26589
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26589.diff
Using Webrev
Link to Webrev Comment