[PowerPC] Add intrinsic definition for load and store with Right Length Left-justified #148873

Open: wants to merge 6 commits into base: main
25 changes: 25 additions & 0 deletions llvm/include/llvm/IR/IntrinsicsPowerPC.td
@@ -1351,6 +1351,18 @@ def int_ppc_vsx_lxvll :
def int_ppc_vsx_lxvp :
DefaultAttrsIntrinsic<[llvm_v256i1_ty], [llvm_ptr_ty],
[IntrReadMem, IntrArgMemOnly]>;
def int_ppc_vsx_lxvrl :
DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_ptr_ty, llvm_i64_ty],
Contributor

lxvrl only loads 16 bytes. Does llvm_anyvector_ty include llvm_v256i1_ty?

Maybe you can use the same definition as

def int_ppc_vsx_lxvl :
    DefaultAttrsIntrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i64_ty],
                          [IntrReadMem, IntrArgMemOnly]>;

Contributor Author (@lei137, Jul 30, 2025)

> llvm_anyvector_ty include the llvm_v256i1_ty ?

Yes, it seems to include all vector types.

> def int_ppc_vsx_lxvl : DefaultAttrsIntrinsic<[llvm_v4i32_ty]

These instructions are byte-wise loads. Using llvm_anyvector_ty will allow me to add more patterns later to support the various 16-byte vector combinations. I am only generating it for v4i32 and v2i64 for now. Is there a way to restrict this to 16-byte vectors only?
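
To make the trade-off in this thread concrete, here is a minimal IR sketch of what an `llvm_anyvector_ty` result permits, assuming the usual name mangling for overloaded intrinsics (a suffix per overloaded type). The function name `load_words` is illustrative and not part of this patch, and the behaviour noted for the `v256i1` declaration is an assumption based on the patterns in this patch covering only v4i32 and v2i64; it has not been tested here.

```llvm
; Overloaded on the result type, so each instance carries a type suffix.
declare <4 x i32> @llvm.ppc.vsx.lxvrl.v4i32(ptr, i64)
declare <2 x i64> @llvm.ppc.vsx.lxvrl.v2i64(ptr, i64)

; Also a well-formed declaration under llvm_anyvector_ty, even though the
; instruction loads at most 16 bytes. Assumption: with no matching selection
; pattern, a call to this instance would fail during instruction selection.
declare <256 x i1> @llvm.ppc.vsx.lxvrl.v256i1(ptr, i64)

define <4 x i32> @load_words(ptr %p, i64 %len) {
entry:
  %v = tail call <4 x i32> @llvm.ppc.vsx.lxvrl.v4i32(ptr %p, i64 %len)
  ret <4 x i32> %v
}
```

Fixing the type to `llvm_v4i32_ty`, as suggested above, would rule out the third declaration at the cost of requiring bitcasts for other 16-byte vector types.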

[IntrReadMem, IntrArgMemOnly]>;
def int_ppc_vsx_lxvrll :
DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_ptr_ty, llvm_i64_ty],
[IntrReadMem, IntrArgMemOnly]>;
Contributor

Ditto; refer to int_ppc_vsx_lxvll.

def int_ppc_vsx_lxvprl :
DefaultAttrsIntrinsic<[llvm_v256i1_ty], [llvm_ptr_ty, llvm_i64_ty],
[IntrReadMem, IntrArgMemOnly]>;
def int_ppc_vsx_lxvprll :
DefaultAttrsIntrinsic<[llvm_v256i1_ty], [llvm_ptr_ty, llvm_i64_ty],
[IntrReadMem, IntrArgMemOnly]>;

// Vector store.
def int_ppc_vsx_stxvw4x : Intrinsic<[], [llvm_v4i32_ty, llvm_ptr_ty],
@@ -1370,6 +1382,19 @@ def int_ppc_vsx_stxvll :
def int_ppc_vsx_stxvp :
Intrinsic<[], [llvm_v256i1_ty, llvm_ptr_ty], [IntrWriteMem,
IntrArgMemOnly]>;
def int_ppc_vsx_stxvrl :
Intrinsic<[], [llvm_anyvector_ty, llvm_ptr_ty, llvm_i64_ty],
Contributor

Ditto, refer to

  def int_ppc_vsx_stxvl :
      Intrinsic<[], [llvm_v4i32_ty, llvm_ptr_ty, llvm_i64_ty],
      [IntrWriteMem, IntrArgMemOnly]>;

[IntrWriteMem, IntrArgMemOnly]>;
def int_ppc_vsx_stxvrll :
Intrinsic<[], [llvm_anyvector_ty, llvm_ptr_ty, llvm_i64_ty],
Contributor

Ditto.

[IntrWriteMem, IntrArgMemOnly]>;
def int_ppc_vsx_stxvprl :
Intrinsic<[], [llvm_v256i1_ty, llvm_ptr_ty, llvm_i64_ty], [IntrWriteMem,
IntrArgMemOnly]>;
def int_ppc_vsx_stxvprll :
Intrinsic<[], [llvm_v256i1_ty, llvm_ptr_ty, llvm_i64_ty], [IntrWriteMem,
IntrArgMemOnly]>;

// Vector and scalar maximum.
def int_ppc_vsx_xvmaxdp : PowerPC_VSX_Vec_DDD_Intrinsic<"xvmaxdp">;
def int_ppc_vsx_xvmaxsp : PowerPC_VSX_Vec_FFF_Intrinsic<"xvmaxsp">;
20 changes: 20 additions & 0 deletions llvm/lib/Target/PowerPC/PPCInstrFuture.td
@@ -82,3 +82,23 @@ let Predicates = [HasVSX, IsISAFuture] in {
"stxvprll $XTp, $RA, $RB", IIC_LdStLFD, []>;
}
}

// Load/Store VSX Vector with Right Length Left-justified.
foreach Ty = [v4i32, v2i64] in {
def : Pat<(Ty (int_ppc_vsx_lxvrl addr:$RA, i64:$RB)),
(LXVRL memr:$RA, g8rc:$RB)>;
def : Pat<(Ty (int_ppc_vsx_lxvrll addr:$RA, i64:$RB)),
(LXVRLL $RA, $RB)>;
def : Pat<(int_ppc_vsx_stxvrl Ty:$XT, addr:$RA, i64:$RB),
(STXVRL $XT, $RA, $RB)>;
def : Pat<(int_ppc_vsx_stxvrll Ty:$XT, addr:$RA, i64:$RB),
(STXVRLL $XT, $RA, $RB)>;
}

// Load/Store VSX Vector pair with Right Length Left-justified.
def : Pat<(v256i1(int_ppc_vsx_lxvprl addr:$RA, i64:$RB)), (LXVPRL $RA, $RB)>;
def : Pat<(v256i1(int_ppc_vsx_lxvprll addr:$RA, i64:$RB)), (LXVPRLL $RA, $RB)>;
def : Pat<(int_ppc_vsx_stxvprl v256i1:$XTp, addr:$RA, i64:$RB),
(STXVPRL $XTp, $RA, $RB)>;
def : Pat<(int_ppc_vsx_stxvprll v256i1:$XTp, addr:$RA, i64:$RB),
(STXVPRLL $XTp, $RA, $RB)>;
150 changes: 150 additions & 0 deletions llvm/test/CodeGen/PowerPC/vsx-ldst-with-length.ll
@@ -0,0 +1,150 @@
; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
Contributor

Please run /update_llc_test_checks.py.

Contributor Author

The checks use regexes so that we don't need separate check prefixes for the different RUN lines, which can differ in register allocation for register pairs. Since each of these builtins generates only one specific instruction, I thought this was better than auto-generating the checks.

; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr -mcpu=future < %s | \
; RUN: FileCheck %s
; RUN: llc -verify-machineinstrs -mtriple=powerpc64-ibm-aix-xcoff \
; RUN: -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr -mcpu=future < %s | \
; RUN: FileCheck %s

; Test for load/store to/from v4i32.

define <4 x i32> @testLXVRL(ptr %a, i64 %b) {
; CHECK-LABEL: testLXVRL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvrl v2, r3, r4
; CHECK-NEXT: blr
entry:
%0 = tail call <4 x i32> @llvm.ppc.vsx.lxvrl(ptr %a, i64 %b)
ret <4 x i32> %0
}
declare <4 x i32> @llvm.ppc.vsx.lxvrl(ptr, i64)

define <4 x i32> @testLXVRLL(ptr %a, i64 %b) {
; CHECK-LABEL: testLXVRLL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvrll v2, r3, r4
; CHECK-NEXT: blr
entry:
%0 = tail call <4 x i32> @llvm.ppc.vsx.lxvrll(ptr %a, i64 %b)
ret <4 x i32> %0
}
declare <4 x i32> @llvm.ppc.vsx.lxvrll(ptr, i64)

define void @testSTXVRL(<4 x i32> %a, ptr %b, i64 %c) {
; CHECK-LABEL: testSTXVRL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: stxvrl v2, [[REG:r[0-9]+]], [[REG1:r[0-9]+]]
Contributor

It looks like you define REG and REG1 but never use them.

Contributor Author

Yeah, these can be anonymous matches.
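
For reference, a minimal sketch of the anonymous form, using FileCheck's `{{...}}` regex syntax, which matches a pattern without binding a variable. The mnemonics and the `v2` operand are taken from the checks in this test; this is an illustration of the syntax, not the final wording of the patch.

```llvm
; Anonymous regex matches: nothing is captured, so no variable goes unused.
; CHECK-NEXT: stxvrl v2, {{r[0-9]+}}, {{r[0-9]+}}
; CHECK-NEXT: stxvrll v2, {{r[0-9]+}}, {{r[0-9]+}}
```

A named capture such as `[[REG:r[0-9]+]]` is only needed when a later check refers back to the same register as `[[REG]]`.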

; CHECK: blr
entry:
tail call void @llvm.ppc.vsx.stxvrl(<4 x i32> %a, ptr %b, i64 %c)
ret void
}
declare void @llvm.ppc.vsx.stxvrl(<4 x i32>, ptr, i64)

define void @testSTXVRLL(<4 x i32> %a, ptr %b, i64 %c) {
; CHECK-LABEL: testSTXVRLL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: stxvrll v2, [[REG:r[0-9]+]], [[REG1:r[0-9]+]]
; CHECK: blr
entry:
tail call void @llvm.ppc.vsx.stxvrll(<4 x i32> %a, ptr %b, i64 %c)
ret void
}
declare void @llvm.ppc.vsx.stxvrll(<4 x i32>, ptr, i64)

; Test for load/store to/from v2i64.

define <2 x i64> @testLXVRL2(ptr %a, i64 %b) {
; CHECK-LABEL: testLXVRL2:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvrl v2, r3, r4
; CHECK-NEXT: blr
entry:
%0 = tail call <2 x i64> @llvm.ppc.vsx.lxvrl.v2i64(ptr %a, i64 %b)
ret <2 x i64> %0
}
declare <2 x i64> @llvm.ppc.vsx.lxvrl.v2i64(ptr, i64)

define <2 x i64> @testLXVRLL2(ptr %a, i64 %b) {
; CHECK-LABEL: testLXVRLL2:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvrll v2, r3, r4
; CHECK-NEXT: blr
entry:
%0 = tail call <2 x i64> @llvm.ppc.vsx.lxvrll.v2i64(ptr %a, i64 %b)
ret <2 x i64> %0
}
declare <2 x i64> @llvm.ppc.vsx.lxvrll.v2i64(ptr, i64)

define void @testSTXVRL2(<2 x i64> %a, ptr %b, i64 %c) {
; CHECK-LABEL: testSTXVRL2:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: stxvrl v2, [[REG:r[0-9]+]], [[REG1:r[0-9]+]]
; CHECK: blr
entry:
tail call void @llvm.ppc.vsx.stxvrl.v2i64(<2 x i64> %a, ptr %b, i64 %c)
ret void
}
declare void @llvm.ppc.vsx.stxvrl.v2i64(<2 x i64>, ptr, i64)

define void @testSTXVRLL2(<2 x i64> %a, ptr %b, i64 %c) {
; CHECK-LABEL: testSTXVRLL2:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: stxvrll v2, [[REG:r[0-9]+]], [[REG1:r[0-9]+]]
; CHECK: blr
entry:
tail call void @llvm.ppc.vsx.stxvrll.v2i64(<2 x i64> %a, ptr %b, i64 %c)
ret void
}
declare void @llvm.ppc.vsx.stxvrll.v2i64(<2 x i64>, ptr, i64)

; Test for load/store vector pair.

define <256 x i1> @testLXVPRL(ptr %vpp, i64 %b) {
; CHECK-LABEL: testLXVPRL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvprl vsp34, r4, r5
; CHECK: blr
entry:
%0 = tail call <256 x i1> @llvm.ppc.vsx.lxvprl(ptr %vpp, i64 %b)
ret <256 x i1> %0
}
declare <256 x i1> @llvm.ppc.vsx.lxvprl(ptr, i64)

define <256 x i1> @testLXVPRLL(ptr %vpp, i64 %b) {
; CHECK-LABEL: testLXVPRLL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxvprll vsp34, r4, r5
; CHECK: blr
entry:
%0 = tail call <256 x i1> @llvm.ppc.vsx.lxvprll(ptr %vpp, i64 %b)
ret <256 x i1> %0
}
declare <256 x i1> @llvm.ppc.vsx.lxvprll(ptr, i64)

define void @testSTXVPRL(ptr %v, ptr %vp, i64 %len) {
; CHECK-LABEL: testSTXVPRL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxv v2
; CHECK-NEXT: lxv v3
; CHECK-NEXT: stxvprl vsp34, r4, r5
; CHECK-NEXT: blr
entry:
%0 = load <256 x i1>, ptr %v, align 32
tail call void @llvm.ppc.vsx.stxvprl(<256 x i1> %0, ptr %vp, i64 %len)
ret void
}
declare void @llvm.ppc.vsx.stxvprl(<256 x i1>, ptr, i64)

define void @testSTXVPRLL(ptr %v, ptr %vp, i64 %len) {
; CHECK-LABEL: testSTXVPRLL:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: lxv v2
; CHECK-NEXT: lxv v3
; CHECK-NEXT: stxvprll vsp34, r4, r5
; CHECK-NEXT: blr
entry:
%0 = load <256 x i1>, ptr %v, align 32
tail call void @llvm.ppc.vsx.stxvprll(<256 x i1> %0, ptr %vp, i64 %len)
ret void
}
declare void @llvm.ppc.vsx.stxvprll(<256 x i1>, ptr, i64)