-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[llvm] Don't combine repeated subnormal divisors #149333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ashermancinelli
wants to merge
3
commits into
llvm:main
Choose a base branch
from
ashermancinelli:ajm/no-recip-fdiv-with-subnormals
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+32
−0
Open
Changes from 2 commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) { | |
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0))) | ||
return SDValue(); | ||
|
||
// Skip if we have subnormals, multiplying with the reciprocal will introduce | ||
// infinities. | ||
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true); | ||
if (N1CFP) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You might also want to block the transformation in cases where N1CFP is a non-splat constant vector and one of the elements is subnormal. |
||
FPClassTest FPClass = N1CFP->getValueAPF().classify(); | ||
if (FPClass == fcPosSubnormal || FPClass == fcNegSubnormal) { | ||
return SDValue(); | ||
} | ||
} | ||
|
||
// Exit early if the target does not want this transform or if there can't | ||
// possibly be enough uses of the divisor to make the transform worthwhile. | ||
unsigned MinUses = TLI.combineRepeatedFPDivisors(); | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py | ||
; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s | FileCheck %s | ||
ashermancinelli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
; Negative test: repeated FP divisor transform should bail out when the rewrite | ||
; would introduce infinities because of subnormal constant divisors. | ||
define void @two_denorm_fdivs(float %a0, float %a1, float %a2, ptr %res) { | ||
; CHECK-LABEL: two_denorm_fdivs: | ||
; CHECK: # %bb.0: # %entry | ||
; CHECK-NEXT: movss {{.*#+}} xmm0 = [1.95915678E-39,0.0E+0,0.0E+0,0.0E+0] | ||
; CHECK-NEXT: divss %xmm0, %xmm1 | ||
; CHECK-NEXT: movss %xmm1, (%rdi) | ||
; CHECK-NEXT: divss %xmm0, %xmm2 | ||
; CHECK-NEXT: movss %xmm2, 4(%rdi) | ||
; CHECK-NEXT: retq | ||
entry: | ||
%div0 = fdiv arcp float %a1, 0x37E5555500000000 | ||
store float %div0, ptr %res | ||
%ptr1 = getelementptr inbounds float, ptr %res, i64 1 | ||
%div1 = fdiv arcp float %a2, 0x37E5555500000000 | ||
store float %div1, ptr %ptr1 | ||
ret void | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of think the
arcp
flag allows this. I know it's not what the user would want, but that often happens with fast math.This change still lets the operation happen if the divisor isn't constant but is dynamically subnormal, right? It seems like if we really want to stop this, the transformation should be happening somewhere that we can call
computeKnownFPClass
and check for possible subnormals, which would likely disable the transformation in most cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check on line 18229 above seems to be skipping the
arcp
check after the legalize phase. I'm not sure why that would be the case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review 😄 In my reading of langref "arcp: Allows division to be treated as a multiplication by a reciprocal",
arcp
doesn't seem to necessitate that the compiler perform the div -> mul+recip, especially when we know at compile time that something nasty is going on (a divide by subnormal).I see this in
visitFDIV
:I defer to your judgement, but if we can be a little more friendly in our constant folding, I don't see how that's a bad thing.
To your second question: yes, this doesn't do anything for preventing the same dynamically subnormal transform. I can dig around for example uses of
computeKnownFPClass
if you think that's a profitable direction. Thanks!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I definitely wouldn't say that it's a profitable direction, just that it's the only way to handle this completely. I think
computeKnownFPClass
will report that any non-constant value could possibly be a subnormal unless there has been an explicit check in the code to prove that it isn't, so in almost all case it would just end up disabling the optimization.I agree that the
arcp
flag doesn't require the compiler to perform the optimization. There is some merit to blocking the optimization in cases where we can be reasonably certain the user wouldn't want it even if the semantics ofarcp
do allow it. It just doesn't sit right with me to allow the optimization on variables but not on constants, because then it depends on how well we've done constant propagation to get to this point, and you can get things like a bug in the user's program that disappears if we don't inline a certain function, for instance, which could be maddening to track down.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug was already quite maddening to track down 😄. Full context is that this xform on a denormal created an infinity, which became a poisoned vector lane
<poison, x, x, x>
, which became abroadcast x
with mattr=+avx2 and amove <0, x, x, x>
on mattr=-avx2, which caused the test to pass with avx2 disabled because the poison became a zero without avx2 and it took on the value ofx
with >=avx2. At least with a dynamically subnormal value, the transform does not end up poisoning anything and the fast-math chaos stops there. It's the pessimistic poisoning that was the most frustrating part of this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I've seen cases before where fast-math introduced an infinity and something else declared it to be poison (https://discourse.llvm.org/t/nnan-ninf-and-poison/56506). That is no fun.
I guess I'd have to agree that it makes sense to avoid introducing an infinity when we know we're doing so, especially if
ninf
is in effect.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a DAG version of computeKnownFPClass, but should have one (it will also always do a significantly worse job than the IR version)