Unoptimized header masks mixed with VP intrinsics may have different lengths during EVL tail folding #150197

@lukel97

As spotted by @Mel-Chen in this review comment: #149981 (comment)

Consider an EVL tail-folded loop with a VF of 4 and a trip count of 5. With EVL tail folding, the loop may execute as two iterations, one with EVL=3 and one with EVL=2.

A header mask will come in with the form icmp ule wide-canonical-iv, backedge-tc.

Most recipes are converted in optimizeMaskToEVL to VP intrinsics that use EVL. This should really be thought of as an optimisation, but consider a recipe that isn't handled yet, or that slips through, and so still uses the header mask.

With a trip count of 5, the backedge-taken count is 4, so the header mask is evaluated as icmp ule wide-canonical-iv, 4.

On the first iteration, the mask will look like:

[0, 1, 2, 3] <= 4 = [T, T, T, T]

However, the recipes that were optimized to VP intrinsics will have an EVL of 3, which is effectively a mask of [T, T, T, F].

On the second iteration, the mask will look like:

[4, 5, 6, 7] <= 4 = [T, F, F, F]

But the VP intrinsics will have an EVL of 2, which is effectively a mask of [T, T, F, F].
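To make the mismatch easier to see, here is a small Python model of the two iterations above. It is only a sketch, not LLVM code: the helper names (`header_mask`, `evl_mask`, `compare_masks`) are made up for illustration, the EVL values 3 and 2 are hard-coded from the example rather than computed, and the wide canonical IV is assumed to advance by VF each iteration, as in the `[0, 1, 2, 3]` and `[4, 5, 6, 7]` vectors shown.

```python
VF = 4
TRIP_COUNT = 5
BACKEDGE_TC = TRIP_COUNT - 1  # backedge-taken count
EVLS = [3, 2]                 # assumed per-iteration EVLs, taken from the example


def header_mask(iv_start, vf, backedge_tc):
    """Models `icmp ule wide-canonical-iv, backedge-tc` lane by lane."""
    return [iv_start + lane <= backedge_tc for lane in range(vf)]


def evl_mask(evl, vf):
    """Lanes that a VP intrinsic with the given EVL actually processes."""
    return [lane < evl for lane in range(vf)]


def compare_masks(evls, vf, backedge_tc):
    """Returns (header mask, EVL mask) pairs, one pair per vector iteration."""
    rows = []
    for iteration, evl in enumerate(evls):
        iv_start = iteration * vf  # assumption: canonical IV steps by VF
        rows.append((header_mask(iv_start, vf, backedge_tc), evl_mask(evl, vf)))
    return rows


rows = compare_masks(EVLS, VF, BACKEDGE_TC)
for iteration, (hm, em) in enumerate(rows):
    print(f"iteration {iteration}: header={hm} evl={em} match={hm == em}")
```

Both masks cover 5 elements in total, but in this model they disagree on every iteration: the header mask enables 4 lanes and then 1, while EVL enables 3 and then 2.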

We need to convert the header masks to something of the form icmp ult step-vector, EVL. Otherwise we end up processing a different number of elements per iteration depending on whether or not a given recipe was converted to a VP intrinsic.
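As a rough illustration of the proposed form, the following Python sketch models `icmp ult step-vector, EVL` for the same VF=4 example. The function name `evl_header_mask` is hypothetical and the EVL values 3 and 2 are again assumed from the example, so this only shows the intended lane behaviour, not how the transform would be implemented in VPlan.

```python
VF = 4
EVLS = [3, 2]  # assumed per-iteration EVLs, taken from the example


def evl_header_mask(evl, vf):
    """Models `icmp ult step-vector, EVL`, where step-vector is [0, 1, ..., vf-1]."""
    step_vector = list(range(vf))
    return [lane < evl for lane in step_vector]


masks = [evl_header_mask(evl, VF) for evl in EVLS]
active_lanes = [sum(mask) for mask in masks]

for evl, mask, active in zip(EVLS, masks, active_lanes):
    print(f"EVL={evl}: mask={mask} active lanes={active}")
```

In this model the number of active lanes equals EVL on each iteration, so a recipe that still uses the mask would process the same elements as one converted to a VP intrinsic.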
