Skip to content

[llvm] Don't combine repeated subnormal divisors #149333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18235,6 +18235,16 @@ SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
if (N0CFP && (N0CFP->isExactlyValue(1.0) || N0CFP->isExactlyValue(-1.0)))
return SDValue();

// Skip if we have subnormals, multiplying with the reciprocal will introduce
// infinities.
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, /* AllowUndefs */ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of think the arcp flag allows this. I know it's not what the user would want, but that often happens with fast math.

This change still lets the operation happen if the divisor isn't constant but is dynamically subnormal, right? It seems like if we really want to stop this, the transformation should be happening somewhere that we can call computeKnownFPClass and check for possible subnormals, which would likely disable the transformation in most cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on line 18229 above seems to be skipping the arcp check after the legalize phase. I'm not sure why that would be the case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there's no way to consistently disable the transform for subnormals without completely disabling arcp for non-constant denominators. I don't really want to deal with the bug report that SimplifyCFG is illegally transforming floating-point code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review 😄 In my reading of langref "arcp: Allows division to be treated as a multiplication by a reciprocal", arcp doesn't seem to necessitate that the compiler perform the div -> mul+recip, especially when we know at compile time that something nasty is going on (a divide by subnormal).

I see this in visitFDIV:

    // Only do the transform if the reciprocal is a legal fp immediate that
    // isn't too nasty (eg NaN, denormal, ...).
    if (((st == APFloat::opOK && !Recip.isDenormal()) ||

I defer to your judgement, but if we can be a little more friendly in our constant folding, I don't see how that's a bad thing.

To your second question: yes, this doesn't do anything for preventing the same dynamically subnormal transform. I can dig around for example uses of computeKnownFPClass if you think that's a profitable direction. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I definitely wouldn't say that it's a profitable direction, just that it's the only way to handle this completely. I think computeKnownFPClass will report that any non-constant value could possibly be a subnormal unless there has been an explicit check in the code to prove that it isn't, so in almost all case it would just end up disabling the optimization.

I agree that the arcp flag doesn't require the compiler to perform the optimization. There is some merit to blocking the optimization in cases where we can be reasonably certain the user wouldn't want it even if the semantics of arcp do allow it. It just doesn't sit right with me to allow the optimization on variables but not on constants, because then it depends on how well we've done constant propagation to get to this point, and you can get things like a bug in the user's program that disappears if we don't inline a certain function, for instance, which could be maddening to track down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug was already quite maddening to track down 😄. Full context is that this xform on a denormal created an infinity, which became a poisoned vector lane <poison, x, x, x>, which became a broadcast x with mattr=+avx2 and a move <0, x, x, x> on mattr=-avx2, which caused the test to pass with avx2 disabled because the poison became a zero without avx2 and it took on the value of x with >=avx2. At least with a dynamically subnormal value, the transform does not end up poisoning anything and the fast-math chaos stops there. It's the pessimistic poisoning that was the most frustrating part of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I've seen cases before where fast-math introduced an infinity and something else declared it to be poison (https://discourse.llvm.org/t/nnan-ninf-and-poison/56506). That is no fun.

I guess I'd have to agree that it makes sense to avoid introducing an infinity when we know we're doing so, especially if ninf is in effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a DAG version of computeKnownFPClass, but should have one (it will also always do a significantly worse job than the IR version)

if (N1CFP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also want to block the transformation in cases where N1CFP is a non-splat constant vector and one of the elements is subnormal.

https://godbolt.org/z/zzPT4dTP5

FPClassTest FPClass = N1CFP->getValueAPF().classify();
if (FPClass == fcPosSubnormal || FPClass == fcNegSubnormal) {
return SDValue();
}
}

// Exit early if the target does not want this transform or if there can't
// possibly be enough uses of the divisor to make the transform worthwhile.
unsigned MinUses = TLI.combineRepeatedFPDivisors();
Expand Down
22 changes: 22 additions & 0 deletions llvm/test/CodeGen/X86/repeated-fp-divisors-denorm.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s | FileCheck %s

; Negative test: repeated FP divisor transform should bail out when the rewrite
; would introduce infinities because of subnormal constant divisors.
define void @two_denorm_fdivs(float %a0, float %a1, float %a2, ptr %res) {
; CHECK-LABEL: two_denorm_fdivs:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: movss {{.*#+}} xmm0 = [1.95915678E-39,0.0E+0,0.0E+0,0.0E+0]
; CHECK-NEXT: divss %xmm0, %xmm1
; CHECK-NEXT: movss %xmm1, (%rdi)
; CHECK-NEXT: divss %xmm0, %xmm2
; CHECK-NEXT: movss %xmm2, 4(%rdi)
; CHECK-NEXT: retq
entry:
%div0 = fdiv arcp float %a1, 0x37E5555500000000
store float %div0, ptr %res
%ptr1 = getelementptr inbounds float, ptr %res, i64 1
%div1 = fdiv arcp float %a2, 0x37E5555500000000
store float %div1, ptr %ptr1
ret void
}
Loading