Use ATTACH maps for array-sections/subscripts on pointers. #1

abhinavgaba · 2025-07-16T14:02:39Z

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036
[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.

With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers.

Cases that need handling:

When a class member like p is a base-pointer in a map from a member function within the same class, p is not being privatized, instead, we still try to create an implicit map of this[0:1], and access p through that, which is incorrect.

 struct S { int *p;
 void f1() {
   #pragma omp target data map(p[0:1])
      printf("%p %p\n", &p, p);
 }

Attach-style maps for declare mappers. That should be a separate PR.
use_device_addr clause does not work properly, because we don't have a proper component-list set-up for it, just one component, so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
use_device_ptr handling has some issues too. Need debugging.
Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

abhinavgaba · 2025-07-16T14:03:51Z

offload/libomptarget/interface.cpp

The libomptarget code will disappear from this PR once llvm#149036 is merged.

abhinavgaba · 2025-07-23T13:31:19Z

clang/lib/CodeGen/CGOpenMPRuntime.cpp

@@ -7096,8 +7129,8 @@ class MappableExprsHandler {
      const ValueDecl *Mapper = nullptr, bool ForDeviceAddr = false,
      const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
      ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
-          OverlappedElements = {},
-      bool AreBothBasePtrAndPteeMapped = false) const {


AreBothBaseptrAndPteeMapped was used to decide to use PTR_AND_OBJ maps for something like map(p, p[0]). We don't do that now, since we map them independently, and attach them separately.

Extend support in LLDB for WebAssembly. This PR adds a new Process plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly targets. It adds support for WebAssembly's memory model with separate address spaces, and the ability to fetch the call stack from the WebAssembly runtime. I have tested this change with the WebAssembly Micro Runtime (WAMR, https://github.com/bytecodealliance/wasm-micro-runtime) which implements a GDB debug stub and supports the qWasmCallStack packet. ``` (lldb) process connect --plugin wasm connect://localhost:4567 Process 1 stopped * thread #1, name = 'nobody', stop reason = trace frame #0: 0x40000000000001ad wasm32_args.wasm`main: -> 0x40000000000001ad <+3>: global.get 0 0x40000000000001b3 <+9>: i32.const 16 0x40000000000001b5 <+11>: i32.sub 0x40000000000001b6 <+12>: local.set 0 (lldb) b add Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c (lldb) c Process 1 resuming Process 1 stopped * thread #1, name = 'nobody', stop reason = breakpoint 1.1 frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12 1 int 2 add(int a, int b) 3 { -> 4 return a + b; 5 } 6 7 int (lldb) bt * thread #1, name = 'nobody', stop reason = breakpoint 1.1 * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12 frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12 frame #2: 0x40000000000001fe wasm32_args.wasm ``` This PR is based on an unmerged patch from Paolo Severini: https://reviews.llvm.org/D78801. I intentionally stuck to the foundations to keep this PR small. I have more PRs in the pipeline to support the other features/packets. My motivation for supporting Wasm is to support debugging Swift compiled to WebAssembly: https://www.swift.org/documentation/articles/wasm-getting-started.html

…erver (llvm#148774) Summary: There was a deadlock was introduced by [PR llvm#146441](llvm#146441) which changed `CurrentThreadIsPrivateStateThread()` to `CurrentThreadPosesAsPrivateStateThread()`. This change caused the execution path in [`ExecutionContextRef::SetTargetPtr()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L513) to now enter a code block that was previously skipped, triggering [`GetSelectedFrame()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L522) which leads to a deadlock. Thread 1 gets m_modules_mutex in [`ModuleList::AppendImpl`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Core/ModuleList.cpp#L218), Thread 3 gets m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501), but then Thread 1 waits for m_language_runtimes_mutex in [`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501) while Thread 3 waits for m_modules_mutex in [`ScanForGNUstepObjCLibraryCandidate`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Plugins/LanguageRuntime/ObjC/GNUstepObjCRuntime/GNUstepObjCRuntime.cpp#L57). This fixes the deadlock by adding a scoped block around the mutex lock before the call to the notifier, and moved the notifier call outside of the mutex-guarded section. The notifier call [`NotifyModuleAdded`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Target.cpp#L1810) should be thread-safe, since the module should be added to the `ModuleList` before the mutex is released, and the notifier doesn't modify the module list further, and the call is operates on local state and the `Target` instance. ### Deadlocked Thread backtraces: ``` * thread #3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41 frame llvm#9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8 frame llvm#10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36 frame llvm#11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9 ... frame llvm#21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48 frame llvm#22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32 frame llvm#23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3 frame llvm#24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23 frame llvm#25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41 ... * thread #1, name = 'lldb', stop reason = signal SIGSTOP * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13 /*... a bunch of mutex related bt ... */ liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19 frame llvm#8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41 frame llvm#9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36 ... frame llvm#13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19 frame llvm#14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3 ... frame llvm#19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9 frame llvm#20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23 ``` ## Test Plan: Built a hello world a.out Run server in one terminal: ``` ~/llvm/build/Debug/bin/lldb-server g :1234 a.out ``` Run client in another terminal ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" ``` Before: Client hangs indefinitely ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main" (lldb) gdb-remote 1234 ^C^C ``` After: ``` ~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3" (lldb) gdb-remote 1234 Process 837068 stopped * thread #1, name = 'a.out', stop reason = signal SIGSTOP frame #0: 0x00007ffff7fe4a60 ld-linux-x86-64.so.2`_start: -> 0x7ffff7fe4a60 <+0>: movq %rsp, %rdi 0x7ffff7fe4a63 <+3>: callq 0x7ffff7fe5780 ; _dl_start at rtld.c:522:1 ld-linux-x86-64.so.2`_dl_start_user: 0x7ffff7fe4a68 <+0>: movq %rax, %r12 0x7ffff7fe4a6b <+3>: movl 0x18067(%rip), %eax ; _dl_skip_args (lldb) b hello.cc:3 Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf (lldb) c Process 837068 resuming Process 837068 stopped * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13 1 #include <iostream> 2 3 int main() { -> 4 std::cout << "Hello World" << std::endl; 5 return 0; 6 } ```

…lvm#152156) With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (llvm#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul #2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, llvm#8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, llvm#32 add x12, x12, llvm#32 stp q0, q1, [x10, #-16] add x10, x10, llvm#32 ... ```

Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll, while also updating the input to be more compact and avoid unnecessary checks to keep auto-generated checks compact without loss of generality.

…e.td. (llvm#152547) Only set a target guard if it deviates from its default value[1]. When a target guard is set, it is automatically AND'd with its default value. This means there is no need to use SVETargetGuard="sve,bf16" because SVETargetGuard="bf16" is sufficient. [1] Defaults: SVETargetGuard="sve", SMETargetGuard="sme"

…m#153392) This introduced a 5% compile-time regression on AArch64, see https://llvm-compile-time-tracker.com/compare.php?from=b9138bde3562de5c28a239dbd303caf2406678c6&to=271688b87abe7cf45aceaff8266270a25eb7b436&stat=instructions:u. Reverts llvm#152505.

This commit optimizes `tok::isLiteral` by replacing a succession of `13` conditions with a range-based check. I am not sure whether this is allowed. I believe it is done nowhere else in the codebase ; however, I have seen range-based conditions being used with other enums. --------- Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>

…#153042) Implement a framework to make it easier to detect if evaluate::Expr<T> has certain structure.

…tests (llvm#153383) We missed several +/-0.0 comparison mismatches due to only doing equality checks

…partial reland of llvm#152505) (llvm#153398)

…lvm#146328) This is a series of patches (1/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * unifies errorless .s and .txt tests into a single file * remove .txt tests which don't have feature requirements * makes the .s tests have a roundabout run line to test both encoding and assembly See also llvm#146329, llvm#146330 and llvm#146331. --------- Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>

…139712) Updated version of llvm#134990 which was reverted because of the buildbot [openmp-offload-amdgpu-runtime-2](https://lab.llvm.org/buildbot/#/builders/10) failing. Its configuration has `LLVM_ENABLE_LIBCXX=ON` set although it does not even have libc++ installed. In addition to llvm#134990, this PR adds a check whether C++ libraries are actually available. `#include <chrono>` was chosen because this is the library that [openmp-offload-amdgpu-runtime-2 failed with](llvm#134990 (comment)). Original summary: The buidbot [flang-aarch64-libcxx](https://lab.llvm.org/buildbot/#/builders/89) is currently failing with an ABI issue. The suspected reason is that LLVMSupport.a is built using libc++, but the unittests are using the default C++ standard library, libstdc++ in this case. This predefined `llvm_gtest` target uses the LLVMSupport from `find_package(LLVM)`, which finds the libc++-built LLVMSupport. To fix, store the `LLVM_ENABLE_LIBCXX` setting in the LLVMConfig.cmake such that everything that links to LLVM libraries use the same standard library. In this case discussed in llvm/llvm-zorg#387 it was the flang-rt unittests, but other runtimes with GTest unittests should have the same issue (e.g. offload), and any external project that uses `find_package(LLVM)`. This patch fixed the problem for me locally.

…ests Avoid nested intrinsics in constexpr tests - use __32qs instead to work correctly with -fno-signed-char tests

…th -fno-signed-char tests Work with explicit signed char types to avoid signed/unsigned char truncation out of bounds warnings

Catches a typo in the _mm512_adds_epu8 constexpr test

Caused by llvm#153297. This is likely not Windows on Arm specific but the other Windows bots don't have this problem. json::parse tries to construct llvm::Expected<Message> by moving another instance of Message into it. This would normally use this constructor: ``` /// Create an Expected<T> success value from the given OtherT value, which /// must be convertible to T. template <typename OtherT> Expected(OtherT &&Val, std::enable_if_t<std::is_convertible_v<OtherT, T>> * = nullptr) <...> ``` Note that llvm::Expected does not have a T&& constructor. Presumably the authors thought the converting one would be used. Except that in our build, using clang-cl 19.1.7, Visual Studio 2022, MSVC STL 202208, somehow is_convertible_v is false for Message. If you add a static_assert to check that this is the case, it suddenly becomes convertible. As best I can understand, this is because evaluation of the properties of Message are delayed. Delayed so much that by the time the constructor is called, it's still false. So we can "fix" this by asserting that it is convertible some time before it is checked by the constructor. I'm not sure if that's a compiler problem or the way MSVC STL's variant is written. I couldn't reproduce this behaviour in smaller examples or on Linux systems. This is the least invasive fix and only touches the new lldb code, so I'm going with it. Including the whole error here because it's a strange one and maybe later someone who has a clue about this can fix it in a better way. ``` C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build>ninja [5/15] Building CXX object tools\lldb\source\Pro...CP\CMakeFiles\lldbProtocolMCP.dir\Server.cpp.obj FAILED: tools/lldb/source/Protocol/MCP/CMakeFiles/lldbProtocolMCP.dir/Server.cpp.obj C:\Users\tcwg\scoop\shims\ccache.exe C:\Users\tcwg\scoop\apps\llvm-arm64\current\bin\clang-cl.exe /nologo -TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\source\Protocol\MCP -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include -IC:\Users\tcwg\scoop\apps\python\current\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\..\clang\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\..\clang\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\source /DWIN32 /D_WINDOWS /Zc:inline /Zc:__cplusplus /Oi /Brepro /bigobj /permissive- -Werror=unguarded-availability-new /W4 -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported /Gw -Wno-vla-extension /O2 /Ob2 /DNDEBUG -std:c++17 -MD -wd4018 -wd4068 -wd4150 -wd4201 -wd4251 -wd4521 -wd4530 -wd4589 /EHs-c- /GR- /showIncludes /Fotools\lldb\source\Protocol\MCP\CMakeFiles\lldbProtocolMCP.dir\Server.cpp.obj /Fdtools\lldb\source\Protocol\MCP\CMakeFiles\lldbProtocolMCP.dir\lldbProtocolMCP.pdb -c -- C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp:9: In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include\lldb/Protocol/MCP/Server.h:12: In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include\lldb/Protocol/MCP/Protocol.h:17: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/JSON.h(938,12): error: no viable conversion from returned value of type 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to function return type 'Expected<variant<Request, Response, Notification>>' 938 | return std::move(Result); | ^~~~~~~~~~~~~~~~~ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp(57,30): note: in instantiation of function template specialization 'llvm::json::parse<std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>>' requested here 57 | auto message = llvm::json::parse<Message>(/*JSON=*/data); | ^ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(485,40): note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'const Expected<variant<Request, Response, Notification>> &' for 1st argument 485 | template <class T> class [[nodiscard]] Expected { | ^~~~~~~~ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(507,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'Error &&' for 1st argument 507 | Expected(Error &&Err) | ^ ~~~~~~~~~~~ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(521,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'ErrorSuccess' for 1st argument 521 | Expected(ErrorSuccess) = delete; | ^ ~~~~~~~~~~~~ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(539,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'Expected<variant<Request, Response, Notification>> &&' for 1st argument 539 | Expected(Expected &&Other) { moveConstruct(std::move(Other)); } | ^ ~~~~~~~~~~~~~~~~ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(526,3): note: candidate template ignored: requirement 'std::is_convertible_v<std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>, std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>>' was not satisfied [with OtherT = remove_reference_t<variant<Request, Response, Notification> &>] 526 | Expected(OtherT &&Val, | ^ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(544,3): note: candidate template ignored: could not match 'Expected' against 'variant' 544 | Expected(Expected<OtherT> &&Other, | ^ C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(552,12): note: explicit constructor is not a candidate 552 | explicit Expected( | ^ 1 error generated. [8/15] Building CXX object tools\lldb\source\Plu...nProtocolServerMCP.dir\ProtocolServerMCP.cpp.obj ninja: build stopped: subcommand failed. ```

…ned (llvm#148848) DriverKit doesn't define `os_log_error`, so fails to build. Fallback to `os_log` if on DriverKit. rdar://140295247

An atomic update expression of form x = x + a + b is technically illegal, since the right-hand side is parsed as (x+a)+b, and the atomic variable x should be an argument to the top-level +. When the type of x is integer, the result of (x+a)+b is guaranteed to be the same as x+(a+b), so instead of reporting an error, the compiler can treat (x+a)+b as x+(a+b). This PR implements this kind of reassociation for integral types, and for the two arithmetic associative/commutative operators: + and *.

…cit method call (llvm#153373) This PR introduces a mechanism to defer JIT engine initialization, enabling registration of required symbols before global constructor execution. ## Problem Modules containing `gpu.module` generate global constructors (e.g., kernel load/unload) that execute *during* engine creation. This can force premature symbol resolution, causing failures when: - Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after* creation - Global constructors exist (even if not directly using unresolved symbols, e.g., an external function declaration) - GPU modules introduce mandatory binary loading logic ## Usage ```c // Create engine without initialization MlirExecutionEngine jit = mlirExecutionEngineCreate(...); // Register required symbols mlirExecutionEngineRegisterSymbol(jit, ...); // Explicitly initialize (runs global constructors) mlirExecutionEngineInitialize(jit); ``` --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

…lvm#146329) This is a series of patches (2/4) to unify assembly/disassembly of recent AArch64 tests into a single file. The aim is to improve consistency, so that all instructions and system registers are thoroughly tested, and future test cases will be in a unified format. This patch: * removes .txt tests which have only one feature required * makes the .s tests have a roundabout run line to test both encoding and assembly * creates diagnostic tests when needed * fixes naming convention of tests See also llvm#146328, llvm#146330 and llvm#146331. Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>

Tag types like stucts or enums didn't have a declaration attached to them. The source locations are present in the IPI stream in `LF_UDT_MOD_SRC_LINE` records: ``` 0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63] udt = 0x1058, mod = 3, file = 1, line = 0 0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A] udt = 0x1253, mod = 35, file = 93, line = 17069 ``` The file is an ID in the string table `/names`: ``` ID | String 1 | '\<unknown>' 12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h' 93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h' ``` Here, we're not interested in `mod`. This would indicate which module contributed the UDT. I was looking at Rustc's PDB and found that it uses `<unknown>` for some types, so I added a check for that. This makes two DIA PDB shell tests to work with the native PDB plugin. --------- Co-authored-by: Michael Buch <michaelbuch12@gmail.com>

Fix Mac build breakage (reported by aeubanks in llvm#153142 (comment)) by including stdint.h and using uintptr_t

…gather/store_scatter (llvm#152429) Lowering transfer_read/transfer_write to load_gather/store_scatter in case the target uArch doesn't support load_nd/store_nd. The high level steps: 1. compute Strides; 2. compute Offsets; 3. collapseMemrefTo1D; 4. create Load gather or store_scatter op

) Directly emit shl instead of a multiply if VF * Step is a power-of-2. The main motivation here is to prepare the code and test for directly generating and expanding a SCEV expression of the minimum iteration count. SCEVExpander will directly emit shl for multiplies with powers-of-2. InstCombine will also performs this combine, so end-to-end this should effectively by NFC. PR: llvm#153495

The current implementation tries to (1) patch the existing readline module definition if it's already present in the inittab and (2) append our patched readline module to the inittab. The former (1) uses the non-stable Python API and I can't find a situation where this is necessary. We do this work before initialization, so for the readline module to exist, it either needs to be added by Python itself (which doesn't seem to be the case), or someone would have had to have added it without initializing.

)

`y` should be the first argument and `x` should be the second, otherwise the formula is wrong. This also matches the documentation [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-step).

…#150655) Instead of just outputting everything into the designated root folder, HTML and JSON output will be placed in html/ and json/ directories.

) A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an `int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be mapped, which is incorrect. This change fixes six such tests.

Fixing buildbot failures after PR llvm#153305, e.g. https://lab.llvm.org/buildbot/#/builders/203/builds/19861 Analysis already depends on `ProfileData`, so the transitive closure of the dependencies of `ScalarOpts` doesn't change. Also avoided an extra dependency (and very unnecessary) on `Instrumentation`. The API previously used doesn't need to live in Instrumentation to begin with, but that's something to address in a follow-up.

…nType (llvm#153646) This was a regression introduced in llvm#147835 Since this regression was never released, there are no release notes. Fixes llvm#153540

…m#145623) Emit safety guards for ptr accesses when cross partition loads exist which have a corresponding store to the same address in a different partition. This will emit the necessary ptr checks for these accesses. The test case was obtained from SuperTest, which SiFive runs regularly. We enabled LoopDistribution by default in our downstream compiler, this change was part of that enablement.

Remove extraneous argument from printf statement --------- Co-authored-by: Joachim <protze@rz.rwth-aachen.de>

The 'cfi_salt' attribute specifies a string literal that is used as a "salt" for Control-Flow Integrity (CFI) checks to distinguish between functions with the same type signature. This attribute can be applied to function declarations, function definitions, and function pointer typedefs. This attribute prevents function pointers from being replaced with pointers to functions that have a compatible type, which can be a CFI bypass vector. The attribute affects type compatibility during compilation and CFI hash generation during code generation. Attribute syntax: [[clang::cfi_salt("<salt_string>")]] GNU-style syntax: __attribute__((cfi_salt("<salt_string>"))) - The attribute takes a single string of non-NULL ASCII characters. - It only applies to function types; using it on a non-function type will generate an error. - All function declarations and the function definition must include the attribute and use identical salt values. Example usage: // Header file: #define __cfi_salt(S) __attribute__((cfi_salt(S))) // Convenient typedefs to avoid nested declarator syntax. typedef int (*fp_unsalted_t)(void); typedef int (*fp_salted_t)(void) __cfi_salt("pepper"); struct widget_ops { fp_unsalted_t init; // Regular CFI. fp_salted_t exec; // Salted CFI. fp_unsalted_t teardown; // Regular CFI. }; // bar.c file: static int bar_init(void) { ... } static int bar_salted_exec(void) __cfi_salt("pepper") { ... } static int bar_teardown(void) { ... } static struct widget_generator _generator = { .init = bar_init, .exec = bar_salted_exec, .teardown = bar_teardown, }; struct widget_generator *widget_gen = _generator; // 2nd .c file: int generate_a_widget(void) { int ret; // Called with non-salted CFI. ret = widget_gen.init(); if (ret) return ret; // Called with salted CFI. ret = widget_gen.exec(); if (ret) return ret; // Called with non-salted CFI. return widget_gen.teardown(); } Link: ClangBuiltLinux/linux#1736 Link: KSPP/linux#365 --------- Signed-off-by: Bill Wendling <morbo@google.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com>

…#153604) Like we did for the 'private' clause, this adds an easier to use helper function to add the 'firstprivate' clause + recipe to the Parallel and Serial ops.

This is NFC as this target does not have it.

llvm#153472) Use the Python limited API when building with SWIG 4.2 or later.

The stable function map could be huge for a large application. Fully loading it is slow and consumes a significant amount of memory, which is unnecessary and drastically slows down compilation especially for non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in lazy loading support for the stable function map. The detailed changes are: - `StableFunctionMap` - The map now stores entries in an `EntryStorage` struct, which includes offsets for serialized entries and a `std::once_flag` for thread-safe lazy loading. - The underlying map type is changed from `DenseMap` to `std::unordered_map` for compatibility with `std::once_flag`. - `contains()`, `size()` and `at()` are implemented to only load requested entries on demand. - Lazy Loading Mechanism - When reading indexed codegen data, if the newly-introduced `-indexed-codegen-data-lazy-loading` flag is set, the stable function map is not fully deserialized up front. The binary format for the stable function map now includes offsets and sizes to support lazy loading. - The safety of lazy loading is guarded by the once flag per function hash. This guarantees that even in a multi-threaded environment, the deserialization for a given function hash will happen exactly once. The first thread to request it performs the load, and subsequent threads will wait for it to complete before using the data. For single-threaded builds, the overhead is negligible (a single check on the once flag). For multi-threaded scenarios, users can omit the flag to retain the previous eager-loading behavior.

This patchs adds a symbol table to CIRGenFunction plus scopes and insertions to the table where we were missing them previously.

) As fmul and fmadd are so similar, their performance characteristics tend to be the same on most platforms, at least in terms of reciprocal throughputs. Processors capable of performing a given number of fmul per cycle can usually perform the same number of fma, with the extra add being relatively simple on top. This patch makes the scores of the two operations the same, which brings the throughput cost of a fma/fmuladd to 2, and the latency to 3, which are the defaults for fmul. Note that we might also want to change the throughput cost of a fmul to 1, as most processors have ample bandwidth for them, but they should still stay in-line with one another.

…ay-section-using-attach-maptype

)

This reverts commit ca4ebf9. Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b configurations. See here for a minimal reproducer: llvm#153393 (comment)

…on-using-attach-maptype

abhinavgaba commented Jul 16, 2025

View reviewed changes

offload/libomptarget/interface.cpp Outdated

Copy link

Owner Author

abhinavgaba Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The libomptarget code will disappear from this PR once llvm#149036 is merged.

abhinavgaba mentioned this pull request Jul 17, 2025

[Offload] Introduce ATTACH map-type support for pointer attachment. llvm/llvm-project#149036

Open

abhinavgaba changed the title ~~[WIP] Use ATTACH maps for array-sections/subscripts on pointers.~~ Use ATTACH maps for array-sections/subscripts on pointers. Jul 22, 2025

This was referenced Jul 22, 2025

[OpenMP] Mapping of 'middle' structures chained through '->' does not work llvm/llvm-project#141042

Open

[Clang][OpenMP] Capture mapped pointers on target by reference. llvm/llvm-project#145454

Open

abhinavgaba commented Jul 23, 2025

View reviewed changes

bchetioui and others added 21 commits August 13, 2025 11:01

Fix bazel BUILD after 7d1b9ca (llvm#153388)

e861e49

[LV] Regenerate checks for test (NFC).

10a6fd7

Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll, while also updating the input to be more compact and avoid unnecessary checks to keep auto-generated checks compact without loss of generality.

[mlir][DialectUtils] Fix div by zero crash (llvm#153380)

2fcdaba

[flang][Evaluate] Pattern matching framework for evaluate::Expr (llvm…

dc1c9d3

…#153042) Implement a framework to make it easier to detect if evaluate::Expr<T> has certain structure.

[X86] Check the exact fp bits and not just fp equality for constexpr …

11ef62f

…tests (llvm#153383) We missed several +/-0.0 comparison mismatches due to only doing equality checks

[TableGen] Avoid duplicate variable names in RuntimeLibcallsEmitter (…

9c361cc

…partial reland of llvm#152505) (llvm#153398)

[X86] avx512bw-builtins.c - avoid _mm256_setr_epi8 inside constexpr t…

11186af

…ests Avoid nested intrinsics in constexpr tests - use __32qs instead to work correctly with -fno-signed-char tests

[X86] builtin_test_helpers.h - ensure the match_v*qi matchers work wi…

5c95b83

…th -fno-signed-char tests Work with explicit signed char types to avoid signed/unsigned char truncation out of bounds warnings

[X86] avx512bw-builtins.c - add C/C++ test coverage

9ce3aff

Catches a typo in the _mm512_adds_epu8 constexpr test

[Dexter] Add DAP stepNext and stepOut support (llvm#152717)

8c7e1ab

[sanitizer_common] Use os_log for DriverKit as os_log_error is undefi…

c198c14

…ned (llvm#148848) DriverKit doesn't define `os_log_error`, so fails to build. Fallback to `os_log` if on DriverKit. rdar://140295247

thurstond and others added 30 commits August 14, 2025 11:20

[asan] Fix-forward undefined type in test from llvm#153142 (llvm#153636)

37cc010

Fix Mac build breakage (reported by aeubanks in llvm#153142 (comment)) by including stdint.h and using uintptr_t

[flang][cuda] Add interfaces for __expf and __exp10f (llvm#153633)

20a8299

[NFC] Use [[maybe_unused]] for variable used in assertion (llvm#153639

016c301

)

Fix typo in step intrinsic comment (llvm#153642)

cbfc22c

`y` should be the first argument and `x` should be the second, otherwise the formula is wrong. This also matches the documentation [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-step).

[clang-doc] place HTML/JSON output inside their own directories (llvm…

4f00704

…#150655) Instead of just outputting everything into the designated root folder, HTML and JSON output will be placed in html/ and json/ directories.

[NFC][Offload] Add missing maps to OpenMP offloading tests. (llvm#153103

2912c9c

) A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an `int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be mapped, which is incorrect. This change fixes six such tests.

[clang] fix source range computation for DeducedTemplateSpecializatio…

eeada0d

…nType (llvm#153646) This was a regression introduced in llvm#147835 Since this regression was never released, there are no release notes. Fixes llvm#153540

[AMDGPU] Increase LDS to 320K on gfx1250 (llvm#153645)

49f2093

[OpenMP] Update printf stmt in kmp_settings.cpp (llvm#152800)

5479b7e

Remove extraneous argument from printf statement --------- Co-authored-by: Joachim <protze@rz.rwth-aachen.de>

[OpenACC] Add firstprivate recipe helper methods to ACC dialect (llvm…

e5e3e4b

…#153604) Like we did for the 'private' clause, this adds an easier to use helper function to add the 'firstprivate' clause + recipe to the Parallel and Serial ops.

[AMDGPU] Encode NV bit in VIMAGE/VSAMPLE. NFC (llvm#153654)

6b316ec

This is NFC as this target does not have it.

[LV] Regenerate some more tests.

8a0c7e9

[lldb] Use the Python limited API with SWIG 4.2 or later (llvm#153119) (

52c9489

llvm#153472) Use the Python limited API when building with SWIG 4.2 or later.

[flang][cuda] Add bind names for __double2ll_rX interfaces (llvm#153660)

bad3df4

Remove debug prints. Update some tests.

63531f8

[Clang][attr] Add '-std=c11' to allow for typedef redefinition

1e9fc8e

[CIR][NFC] Add Symbol Table to CIRGenFunction (llvm#153625)

e56ae96

This patchs adds a symbol table to CIRGenFunction plus scopes and insertions to the table where we were missing them previously.

Merge branch 'libomptarget-introduce-attach-support' into map-ptr-arr…

6c3def5

…ay-section-using-attach-maptype

[flang][cuda] Add bind names for __double2ull_rX interfaces (llvm#153678

0659044

)

Revert "[SLP]Support LShr as base for copyable elements"

db5f7dc

This reverts commit ca4ebf9. Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b configurations. See here for a minimal reproducer: llvm#153393 (comment)

Merge remote-tracking branch 'upstream/main' into map-ptr-array-secti…

fb6a48c

…on-using-attach-maptype

Update copy-gaps tests.

502dbb4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use ATTACH maps for array-sections/subscripts on pointers. #1

Use ATTACH maps for array-sections/subscripts on pointers. #1

abhinavgaba commented Jul 16, 2025 •

edited

Loading

Uh oh!

abhinavgaba Jul 16, 2025

Uh oh!

abhinavgaba Jul 23, 2025

Uh oh!

Uh oh!

Use ATTACH maps for array-sections/subscripts on pointers. #1

Are you sure you want to change the base?

Use ATTACH maps for array-sections/subscripts on pointers. #1

Conversation

abhinavgaba commented Jul 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abhinavgaba Jul 16, 2025

Choose a reason for hiding this comment

Uh oh!

abhinavgaba Jul 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

abhinavgaba commented Jul 16, 2025 •

edited

Loading