[AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc #151882


Merged
merged 1 commit into llvm:main on Aug 6, 2025

Conversation

hidekisaito
Contributor

@hidekisaito hidekisaito commented Aug 3, 2025

Enables AMD data center class GPUs to use memory-manager memory pooling for allocations up to ~3 GB by default, up from the "1 << 13" (8 KB) threshold that all plugin-nextgen devices otherwise use.
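For illustration, a minimal sketch of the allocation path this changes, assuming a qualifying large-memory AMD GPU; the sizes are arbitrary examples, not from the patch:

#include <omp.h>
#include <stdio.h>

int main() {
  int Dev = omp_get_default_device();
  // Before this patch, only allocations below 1 << 13 bytes were served
  // from the memory manager's pool; with it, allocations up to ~3 GB on
  // qualifying AMD GPUs are pooled as well (unless overridden via
  // LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD).
  void *A = omp_target_alloc((size_t)1 << 30, Dev); // 1 GB
  omp_target_free(A, Dev);
  // A subsequent allocation of similar size can now be satisfied from
  // the pool instead of a fresh allocation from the underlying runtime.
  void *B = omp_target_alloc((size_t)1 << 30, Dev);
  omp_target_free(B, Dev);
  printf("done\n");
  return 0;
}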

@llvmbot
Member

llvmbot commented Aug 3, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-offload

Author: None (hidekisaito)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/151882.diff

6 Files Affected:

  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+35)
  • (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+3)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+4-1)
  • (modified) offload/test/lit.cfg (+8-3)
  • (modified) offload/test/sanitizer/use_after_free_2.c (+3)
  • (added) offload/test/sanitizer/use_after_free_3.c (+35)
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index f8db9bf0ae739..94e635d94ed3b 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -2944,6 +2944,41 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
     return Plugin::success();
   }
 
+  bool checkIfCoarseGrainMemoryNearOrAbove64GB() {
+    for (AMDGPUMemoryPoolTy *Pool : AllMemoryPools) {
+      if (Pool->isGlobal() && Pool->isCoarseGrained()) {
+        uint64_t Value;
+        hsa_status_t Status =
+            Pool->getAttrRaw(HSA_AMD_MEMORY_POOL_INFO_SIZE, Value);
+        if (Status != HSA_STATUS_SUCCESS)
+          continue;
+        constexpr uint64_t Almost64Gig = 0xFF0000000;
+        if (Value >= Almost64Gig)
+          return true;
+      }
+    }
+    return false; // CoarseGrain pool w/ 64GB or more capacity not found
+  }
+
+  size_t getMemoryManagerSizeThreshold() override {
+    // Targeting high memory capacity GPUs such as
+    // MI210 or later data center GPUs.
+    if (checkIfCoarseGrainMemoryNearOrAbove64GB()) {
+      // Set GenericDeviceTy::MemoryManager's Threshold to ~3GB,
+      // if threshold is not already set by ENV var
+      // LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD.
+      // This MemoryManager is used for omp_target_alloc(), OpenMP
+      // (non-usm) map clause, etc.
+      //
+      // Ideally, this kind of pooling is best performed at
+      // a common level (e.g, user side of HSA) between OpenMP and HIP
+      // but that feature does not exist (yet).
+      constexpr size_t Almost3Gig = 3000000000u;
+      return Almost3Gig;
+    }
+    return 0;
+  }
+
   /// Envar for controlling the number of HSA queues per device. High number of
   /// queues may degrade performance.
   UInt32Envar OMPX_NumQueues;
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index 8c17a2ee07047..87c3777b0eda7 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -1090,6 +1090,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   /// Pointer to the memory manager or nullptr if not available.
   MemoryManagerTy *MemoryManager;
 
+  /// Per device setting of MemoryManager's Threshold
+  virtual size_t getMemoryManagerSizeThreshold() { return 0 /* use default */; }
+
   /// Environment variables defined by the OpenMP standard.
   Int32Envar OMP_TeamLimit;
   Int32Envar OMP_NumTeams;
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 94a050b559efe..46151cc9abce6 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -815,8 +815,11 @@ Error GenericDeviceTy::init(GenericPluginTy &Plugin) {
 
   // Enable the memory manager if required.
   auto [ThresholdMM, EnableMM] = MemoryManagerTy::getSizeThresholdFromEnv();
-  if (EnableMM)
+  if (EnableMM) {
+    if (ThresholdMM == 0)
+      ThresholdMM = getMemoryManagerSizeThreshold();
     MemoryManager = new MemoryManagerTy(*this, ThresholdMM);
+  }
 
   return Plugin::success();
 }
diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg
index 800a63bc0ee32..f3e8e9a66685e 100644
--- a/offload/test/lit.cfg
+++ b/offload/test/lit.cfg
@@ -121,6 +121,7 @@ if config.libomptarget_test_pgo:
 # For all other targets, we currently assume it is.
 supports_unified_shared_memory = True
 supports_apu = False
+supports_large_allocation_memory_pool = False
 if config.libomptarget_current_target.startswith('nvptx'):
   try:
     cuda_arch = int(config.cuda_test_arch[:3])
@@ -132,9 +133,11 @@ if config.libomptarget_current_target.startswith('nvptx'):
 elif config.libomptarget_current_target.startswith('amdgcn'):
     # amdgpu_test_arch contains a list of AMD GPUs in the system
     # only check the first one assuming that we will run the test on it.
-    if not (config.amdgpu_test_arch.startswith("gfx90a") or
-            config.amdgpu_test_arch.startswith("gfx942") or
-            config.amdgpu_test_arch.startswith("gfx950")):
+    if (config.amdgpu_test_arch.startswith("gfx90a") or
+        config.amdgpu_test_arch.startswith("gfx942") or
+        config.amdgpu_test_arch.startswith("gfx950")):
+       supports_large_allocation_memory_pool = True
+    else:
        supports_unified_shared_memory = False
     # check if AMD architecture is an APU:
     if ((config.amdgpu_test_arch.startswith("gfx942") and
@@ -144,6 +147,8 @@ if supports_unified_shared_memory:
    config.available_features.add('unified_shared_memory')
 if supports_apu:
    config.available_features.add('apu')
+if supports_large_allocation_memory_pool:
+   config.available_features.add('large_allocation_memory_pool')
 
 # Setup environment to find dynamic library at runtime
 if config.operating_system == 'Windows':
diff --git a/offload/test/sanitizer/use_after_free_2.c b/offload/test/sanitizer/use_after_free_2.c
index 02aa453d0a975..1c1e09744a750 100644
--- a/offload/test/sanitizer/use_after_free_2.c
+++ b/offload/test/sanitizer/use_after_free_2.c
@@ -10,6 +10,9 @@
 // UNSUPPORTED: s390x-ibm-linux-gnu
 // UNSUPPORTED: s390x-ibm-linux-gnu-LTO
 
+// If offload memory pooling is enabled for a large allocation, reuse error is
+// not detected. UNSUPPORTED: large_allocation_memory_pool
+
 #include <omp.h>
 
 int main() {
diff --git a/offload/test/sanitizer/use_after_free_3.c b/offload/test/sanitizer/use_after_free_3.c
new file mode 100644
index 0000000000000..fd77cff0d5c81
--- /dev/null
+++ b/offload/test/sanitizer/use_after_free_3.c
@@ -0,0 +1,35 @@
+// clang-format off
+// RUN: %libomptarget-compileopt-generic
+// RUN: %not --crash env -u LLVM_DISABLE_SYMBOLIZATION OFFLOAD_TRACK_ALLOCATION_TRACES=1 LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=1024 %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK
+// RUN: %libomptarget-run-generic 2>&1 | %fcheck-generic --check-prefixes=CHECK-PASS
+// clang-format on
+
+// If offload memory pooling is enabled for a large allocation, reuse error is
+// not detected. Run the test w/ and w/o ENV var override on memory pooling
+// threshold. REQUIRES: large_allocation_memory_pool
+
+#include <omp.h>
+#include <stdio.h>
+
+int main() {
+  int N = (1 << 30);
+  char *A = (char *)malloc(N);
+  char *P;
+#pragma omp target map(A[:N]) map(from : P)
+  {
+    P = &A[N / 2];
+    *P = 3;
+  }
+  // clang-format off
+// CHECK: OFFLOAD ERROR: memory access fault by GPU {{.*}} (agent 0x{{.*}}) at virtual address [[PTR:0x[0-9a-z]*]]. Reasons: {{.*}}
+// CHECK: Device pointer [[PTR]] points into prior host-issued allocation:
+// CHECK: Last deallocation:
+// CHECK: Last allocation of size 1073741824
+// clang-format on
+#pragma omp target
+  { *P = 5; }
+
+  // CHECK-PASS: PASS
+  printf("PASS\n");
+  return 0;
+}



github-actions bot commented Aug 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@jhuber6 jhuber6 left a comment

I'm wondering if it wouldn't be better to just expose the device's memory size and have the MemoryManager use that internally when configuring this.
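For concreteness, one possible reading of that alternative, as a rough sketch only; the capacity-based policy below is hypothetical, not an existing plugin API:

#include <cstddef>
#include <cstdint>

// Hypothetical: the MemoryManager derives its own threshold from the
// device's reported capacity instead of each plugin overriding a hook.
size_t pickThresholdFromCapacity(uint64_t CapacityBytes) {
  constexpr uint64_t Almost64Gig = 0xFF0000000; // same cutoff as this patch
  if (CapacityBytes >= Almost64Gig)
    return 3000000000u; // ~3 GB pool threshold on high-capacity devices
  return 0;             // 0 keeps the existing 1 << 13 default
}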

@hidekisaito
Contributor Author

I'm wondering if it wouldn't be better to just expose the device's memory size and have the MemoryManager use that internally when configuring this.

Which threshold to set, by default, for a given memory capacity is inherently device dependent.

Contributor

@doru1004 doru1004 left a comment

LGTM

@jhuber6
Contributor

jhuber6 commented Aug 6, 2025

I'm wondering if it wouldn't be better to just expose the device's memory size and have the MemoryManager use that internally when configuring this.

What threshold to set at what memory capacity, by default, is inherently device dependent.

I thought the memory manager already took the device into account.

@doru1004
Contributor

doru1004 commented Aug 6, 2025

@hidekisaito any chance you could add an appropriate patch description? Thanks so much :)

Contributor

@ronlieb ronlieb left a comment

LGTM

@hidekisaito
Contributor Author

I'm wondering if it wouldn't be better to just expose the device's memory size and have the MemoryManager use that internally when configuring this.

What threshold to set at what memory capacity, by default, is inherently device dependent.

I thought the memory manager already took the device into account.

No, it does not. Currently, every device under plugin-nextgen gets the memory manager enabled by default at a threshold of "1 << 13", via GenericDeviceTy::init(), with an override available through the LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD environment variable.
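Condensed, the selection logic after this patch looks roughly like this (a sketch following the PluginInterface.cpp hunk above, not verbatim):

// Sketch of GenericDeviceTy::init() threshold selection (see diff above).
auto [ThresholdMM, EnableMM] = MemoryManagerTy::getSizeThresholdFromEnv();
if (EnableMM) {         // The env var can also disable the manager entirely.
  if (ThresholdMM == 0) // Env var did not set a threshold, so ask the device:
    ThresholdMM = getMemoryManagerSizeThreshold(); // ~3 GB on qualifying AMD
                        // GPUs, else 0, which keeps the 1 << 13 default.
  MemoryManager = new MemoryManagerTy(*this, ThresholdMM);
}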

@hidekisaito
Contributor Author

@hidekisaito any chance you could add an appropriate patch description? Thanks so much :)

Done.

@hidekisaito hidekisaito merged commit 83e5a99 into llvm:main Aug 6, 2025
9 checks passed