Consider an EVL tail-folded loop with a VF of 4 and a trip count of 5. With EVL tail folding, this may execute in two iterations: one with EVL=3 and one with EVL=2.
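Where an EVL=3/EVL=2 split can come from is target-specific. As a minimal Python sketch (not LLVM's actual logic), one possible schedule, assuming a target that follows the RVV `vsetvli` rule allowing an even split (vl = ceil(AVL/2) when VLMAX < AVL < 2*VLMAX):

```python
import math

def evl_schedule(trip_count, vlmax):
    # One possible EVL schedule, mimicking the RVV rule that permits
    # vl = ceil(AVL/2) when VLMAX < AVL < 2*VLMAX (an even split).
    # Other targets may simply return min(AVL, VLMAX) each iteration.
    evls = []
    remaining = trip_count
    while remaining > 0:
        if vlmax < remaining < 2 * vlmax:
            evl = math.ceil(remaining / 2)
        else:
            evl = min(remaining, vlmax)
        evls.append(evl)
        remaining -= evl
    return evls

print(evl_schedule(5, 4))  # → [3, 2], the split used in this example
```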
A header mask will come in with the form `icmp ule wide-canonical-iv, backedge-tc`.
Most recipes will be converted to VP intrinsics that take the EVL in `optimizeMaskToEVL`. This should really be thought of as an optimization, but consider a recipe that isn't handled yet, or that slips through, and so still uses the header mask.
On the first iteration, the mask will look like:
[0, 1, 2, 3] <= 4 = [T, T, T, T]
However, the recipes that were optimized to VP intrinsics will have an EVL of 3, i.e. an effective mask of [T, T, T, F].
On the second iteration, the mask will look like:
[4, 5, 6, 7] <= 4 = [T, F, F, F]
But the VP intrinsics will have an EVL of 2, i.e. a mask of [T, T, F, F].
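To make the mismatch concrete, here is a small Python sketch comparing the two masks per iteration. It assumes the wide canonical IV steps by VF as in the example above; the `header_mask`/`evl_mask` helpers are hypothetical, purely for illustration:

```python
VF = 4
TRIP_COUNT = 5
BACKEDGE_TC = TRIP_COUNT - 1  # backedge-taken count = 4

def header_mask(iv):
    # icmp ule wide-canonical-iv, backedge-tc
    return [lane <= BACKEDGE_TC for lane in range(iv, iv + VF)]

def evl_mask(evl):
    # The lanes actually enabled by a VP intrinsic with this EVL.
    return [lane < evl for lane in range(VF)]

# Iteration 0: IV=0, EVL=3; iteration 1: IV=4, EVL=2.
for iv, evl in [(0, 3), (4, 2)]:
    print(header_mask(iv), evl_mask(evl))
# → [True, True, True, True]  vs [True, True, True, False]
# → [True, False, False, False] vs [True, True, False, False]
```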
We need to convert the header masks to something of the form `icmp ult step-vector, EVL`; otherwise the recipes in a given iteration would process different numbers of elements depending on whether or not they were converted to VP intrinsics.
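A sketch of the rewritten form, reusing the values from the example above. With `icmp ult step-vector, EVL`, the mask agrees with the VP intrinsics' enabled lanes on every iteration:

```python
VF = 4

def evl_header_mask(evl):
    # icmp ult step-vector, EVL: lane i is active iff i < EVL.
    return [lane < evl for lane in range(VF)]

# Matches the VP intrinsics' enabled lanes from the example:
print(evl_header_mask(3))  # → [True, True, True, False]
print(evl_header_mask(2))  # → [True, True, False, False]
```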