release/21.x: [AArch64,TTI] Disable RealUse check for vector insert/extract costs and Apple CPUs. (#146526)
Back-port #146526 (02d3738) for the 21.x release, just for Apple
CPUs. As discussed during the review, the patch was landed just after
the branch, to avoid regressions. We already did a careful performance
analysis on Apple M series CPUs with this change and are seeing
significant gains on a number of workloads, which we would like to
enable for 21.x.
Original message:
getVectorInstrCostHelper would return costs of zero for vector
inserts/extracts that move data between GPR and vector registers, if
there was no 'real' use, i.e. there was no corresponding existing
instruction.
This meant that passes like LoopVectorize and SLPVectorizer, which
are likely the main users of the interface, would underestimate the cost
of inserts/extracts that move data between GPR and vector registers,
which have non-trivial costs.
The patch removes the special case and only returns costs of zero for
lane 0 if there is no need to transfer between integer and vector
registers.
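
As an illustration of the behavior change described above, here is a
minimal standalone C++ sketch. It is a simplified model and not the actual
getVectorInstrCostHelper code: the struct, the function names and the
BaseCost value are hypothetical, and restricting the old zero-cost case
to lane 0 is an assumption of this sketch.

```cpp
// Simplified model of the cost decision; NOT the LLVM implementation.
// All names and the BaseCost value below are hypothetical.
struct VecElementQuery {
  unsigned Lane;         // lane being inserted or extracted
  bool IsIntegerElement; // integer scalars live in GPRs, so a transfer is needed
  bool HasRealUse;       // true if an existing instruction corresponds to the op
};

// Placeholder for the subtarget's base insert/extract cost (assumed value).
constexpr int BaseCost = 2;

// Before the patch (as modeled here): lane 0 is free when there is no
// 'real' use, even if the element has to move between GPR and vector
// registers.
int costBefore(const VecElementQuery &Q) {
  if (Q.Lane == 0 && (!Q.HasRealUse || !Q.IsIntegerElement))
    return 0;
  return BaseCost;
}

// After the patch (as modeled here): lane 0 is free only when no transfer
// between integer and vector registers is needed.
int costAfter(const VecElementQuery &Q) {
  if (Q.Lane == 0 && !Q.IsIntegerElement)
    return 0;
  return BaseCost;
}
```

In this model the only query whose cost changes is a lane-0 integer
element without a real use: it was free before and now gets the base cost.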
This impacts a number of SLP tests, and most of them look like general
improvements. I think the change should make things more accurate for any
AArch64 target, but if not it could also just be Apple CPU specific.
I am seeing +2% end-to-end improvements on SLP-heavy workloads.
PR: #146526