Skip to content

Use ATTACH maps for array-sections/subscripts on pointers. #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4,740 commits into
base: tgt-capture-mapped-ptrs-by-ref
Choose a base branch
from

Conversation

abhinavgaba
Copy link
Owner

@abhinavgaba abhinavgaba commented Jul 16, 2025

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.


With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers.

Cases that need handling:

  • When a class member like p is a base-pointer in a map from a member function within the same class, p is not being privatized, instead, we still try to create an implicit map of this[0:1], and access p through that, which is incorrect.
 struct S { int *p;
 void f1() {
   #pragma omp target data map(p[0:1])
      printf("%p %p\n", &p, p);
 }
  • Attach-style maps for declare mappers. That should be a separate PR.
  • use_device_addr clause does not work properly, because we don't have a proper component-list set-up for it, just one component, so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
  • use_device_ptr handling has some issues too. Need debugging.
  • Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The libomptarget code will disappear from this PR once llvm#149036 is merged.

@@ -7096,8 +7129,8 @@ class MappableExprsHandler {
const ValueDecl *Mapper = nullptr, bool ForDeviceAddr = false,
const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
OverlappedElements = {},
bool AreBothBasePtrAndPteeMapped = false) const {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AreBothBaseptrAndPteeMapped was used to decide to use PTR_AND_OBJ maps for something like map(p, p[0]). We don't do that now, since we map them independently, and attach them separately.

abhinavgaba pushed a commit that referenced this pull request Aug 11, 2025
Extend support in LLDB for WebAssembly. This PR adds a new Process
plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly
targets. It adds support for WebAssembly's memory model with separate
address spaces, and the ability to fetch the call stack from the
WebAssembly runtime.

I have tested this change with the WebAssembly Micro Runtime (WAMR,
https://github.com/bytecodealliance/wasm-micro-runtime) which implements
a GDB debug stub and supports the qWasmCallStack packet.

```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread #1, name = 'nobody', stop reason = trace
    frame #0: 0x40000000000001ad
wasm32_args.wasm`main:
->  0x40000000000001ad <+3>:  global.get 0
    0x40000000000001b3 <+9>:  i32.const 16
    0x40000000000001b5 <+11>: i32.sub
    0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
    frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
   1    int
   2    add(int a, int b)
   3    {
-> 4        return a + b;
   5    }
   6
   7    int
(lldb) bt
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
  * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
    frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
    frame #2: 0x40000000000001fe wasm32_args.wasm
```

This PR is based on an unmerged patch from Paolo Severini:
https://reviews.llvm.org/D78801. I intentionally stuck to the
foundations to keep this PR small. I have more PRs in the pipeline to
support the other features/packets.

My motivation for supporting Wasm is to support debugging Swift compiled
to WebAssembly:
https://www.swift.org/documentation/articles/wasm-getting-started.html
abhinavgaba pushed a commit that referenced this pull request Aug 11, 2025
…erver (llvm#148774)

Summary:
There was a deadlock was introduced by [PR
llvm#146441](llvm#146441) which changed
`CurrentThreadIsPrivateStateThread()` to
`CurrentThreadPosesAsPrivateStateThread()`. This change caused the
execution path in
[`ExecutionContextRef::SetTargetPtr()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L513)
to now enter a code block that was previously skipped, triggering
[`GetSelectedFrame()`](https://github.com/llvm/llvm-project/blob/10b5558b61baab59c7d3dff37ffdf0861c0cc67a/lldb/source/Target/ExecutionContext.cpp#L522)
which leads to a deadlock.

Thread 1 gets m_modules_mutex in
[`ModuleList::AppendImpl`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Core/ModuleList.cpp#L218),
Thread 3 gets m_language_runtimes_mutex in
[`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501),
but then Thread 1 waits for m_language_runtimes_mutex in
[`GetLanguageRuntime`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Process.cpp#L1501)
while Thread 3 waits for m_modules_mutex in
[`ScanForGNUstepObjCLibraryCandidate`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Plugins/LanguageRuntime/ObjC/GNUstepObjCRuntime/GNUstepObjCRuntime.cpp#L57).

This fixes the deadlock by adding a scoped block around the mutex lock
before the call to the notifier, and moved the notifier call outside of
the mutex-guarded section. The notifier call
[`NotifyModuleAdded`](https://github.com/llvm/llvm-project/blob/96148f92146e5211685246722664e51ec730e7ba/lldb/source/Target/Target.cpp#L1810)
should be thread-safe, since the module should be added to the
`ModuleList` before the mutex is released, and the notifier doesn't
modify the module list further, and the call is operates on local state
and the `Target` instance.

### Deadlocked Thread backtraces:
```
* thread #3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at    futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */    
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19
    frame llvm#8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41
    frame llvm#9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8
    frame llvm#10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36
    frame llvm#11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9
    ...
    frame llvm#21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48
    frame llvm#22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32
    frame llvm#23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3
    frame llvm#24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23
    frame llvm#25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41
    ...
* thread #1, name = 'lldb', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */    
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19
    frame llvm#8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41
    frame llvm#9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36
    ...
    frame llvm#13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19
    frame llvm#14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3
    ...
    frame llvm#19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9
    frame llvm#20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23
```

## Test Plan:
Built a hello world a.out
Run server in one terminal:
```
~/llvm/build/Debug/bin/lldb-server g :1234 a.out
```
Run client in another terminal
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
```

Before:
Client hangs indefinitely
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main"
(lldb) gdb-remote 1234

^C^C
```

After:
```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
(lldb) gdb-remote 1234
Process 837068 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007ffff7fe4a60
ld-linux-x86-64.so.2`_start:
->  0x7ffff7fe4a60 <+0>: movq   %rsp, %rdi
    0x7ffff7fe4a63 <+3>: callq  0x7ffff7fe5780 ; _dl_start at rtld.c:522:1

ld-linux-x86-64.so.2`_dl_start_user:
    0x7ffff7fe4a68 <+0>: movq   %rax, %r12
    0x7ffff7fe4a6b <+3>: movl   0x18067(%rip), %eax ; _dl_skip_args
(lldb) b hello.cc:3
Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf
(lldb) c
Process 837068 resuming
Process 837068 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13
   1   	#include <iostream>
   2
   3   	int main() {
-> 4   	  std::cout << "Hello World" << std::endl;
   5   	  return 0;
   6   	}
```
abhinavgaba pushed a commit that referenced this pull request Aug 11, 2025
…lvm#152156)

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(llvm#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, llvm#8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, llvm#32
	add	    x12, x12, llvm#32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, llvm#32

...
```
bchetioui and others added 21 commits August 13, 2025 11:01
Auto-generate check lines for scalable-loop-unpredicated-body-scalar-tail.ll,
while also updating the input to be more compact and avoid unnecessary
checks to keep auto-generated checks compact without loss of
generality.
…e.td. (llvm#152547)

Only set a target guard if it deviates from its default value[1].

When a target guard is set, it is automatically AND'd with its default
value. This means there is no need to use SVETargetGuard="sve,bf16"
because SVETargetGuard="bf16" is sufficient.

[1] Defaults: SVETargetGuard="sve", SMETargetGuard="sme"
This commit optimizes `tok::isLiteral` by replacing a succession of `13`
conditions with a range-based check.

I am not sure whether this is allowed. I believe it is done nowhere else
in the codebase ; however, I have seen range-based conditions being used
with other enums.

---------

Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
…#153042)

Implement a framework to make it easier to detect if evaluate::Expr<T>
has certain structure.
…tests (llvm#153383)

We missed several +/-0.0 comparison mismatches due to only doing equality checks
…lvm#146328)

This is a series of patches (1/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * unifies errorless .s and .txt tests into a single file
 * remove .txt tests which don't have feature requirements
* makes the .s tests have a roundabout run line to test both encoding
and assembly
 
See also llvm#146329, llvm#146330 and llvm#146331.

---------

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
…139712)

Updated version of llvm#134990 which was reverted because of the buildbot
[openmp-offload-amdgpu-runtime-2](https://lab.llvm.org/buildbot/#/builders/10)
failing. Its configuration has `LLVM_ENABLE_LIBCXX=ON` set although it
does not even have libc++ installed. In addition to llvm#134990, this PR
adds a check whether C++ libraries are actually available. `#include
<chrono>` was chosen because this is the library that
[openmp-offload-amdgpu-runtime-2 failed
with](llvm#134990 (comment)).

Original summary:

The buidbot
[flang-aarch64-libcxx](https://lab.llvm.org/buildbot/#/builders/89) is
currently failing with an ABI issue. The suspected reason is that
LLVMSupport.a is built using libc++, but the unittests are using the
default C++ standard library, libstdc++ in this case. This predefined
`llvm_gtest` target uses the LLVMSupport from `find_package(LLVM)`,
which finds the libc++-built LLVMSupport.

To fix, store the `LLVM_ENABLE_LIBCXX` setting in the LLVMConfig.cmake
such that everything that links to LLVM libraries use the same standard
library. In this case discussed in
llvm/llvm-zorg#387 it was the flang-rt
unittests, but other runtimes with GTest unittests should have the same
issue (e.g. offload), and any external project that uses
`find_package(LLVM)`. This patch fixed the problem for me locally.
…ests

Avoid nested intrinsics in constexpr tests - use __32qs instead to work correctly with -fno-signed-char tests
…th -fno-signed-char tests

Work with explicit signed char types to avoid signed/unsigned char truncation out of bounds warnings
Catches a typo in the _mm512_adds_epu8 constexpr test
Caused by llvm#153297.

This is likely not Windows on Arm specific but the other Windows bots
don't have this problem.

json::parse tries to construct llvm::Expected<Message> by moving another
instance of Message into it. This would normally use this constructor:
```
  /// Create an Expected<T> success value from the given OtherT value, which
  /// must be convertible to T.
  template <typename OtherT>
  Expected(OtherT &&Val,
           std::enable_if_t<std::is_convertible_v<OtherT, T>> * = nullptr)
<...>
```
Note that llvm::Expected does not have a T&& constructor. Presumably
the authors thought the converting one would be used.

Except that in our build, using clang-cl 19.1.7, Visual Studio 2022,
MSVC STL 202208, somehow is_convertible_v is false for Message.

If you add a static_assert to check that this is the case, it suddenly
becomes convertible. As best I can understand, this is because evaluation
of the properties of Message are delayed. Delayed so much that by the time
the constructor is called, it's still false.

So we can "fix" this by asserting that it is convertible some time before
it is checked by the constructor.

I'm not sure if that's a compiler problem or the way MSVC STL's variant is
written. I couldn't reproduce this behaviour in smaller examples or on Linux
systems. This is the least invasive fix and only touches the new lldb code,
so I'm going with it.

Including the whole error here because it's a strange one and maybe later
someone who has a clue about this can fix it in a better way.

```
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build>ninja
[5/15] Building CXX object tools\lldb\source\Pro...CP\CMakeFiles\lldbProtocolMCP.dir\Server.cpp.obj
FAILED: tools/lldb/source/Protocol/MCP/CMakeFiles/lldbProtocolMCP.dir/Server.cpp.obj
C:\Users\tcwg\scoop\shims\ccache.exe C:\Users\tcwg\scoop\apps\llvm-arm64\current\bin\clang-cl.exe  /nologo -TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\source\Protocol\MCP -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include -IC:\Users\tcwg\scoop\apps\python\current\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\..\clang\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\..\clang\include -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source -IC:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\source /DWIN32 /D_WINDOWS   /Zc:inline /Zc:__cplusplus /Oi /Brepro /bigobj /permissive- -Werror=unguarded-availability-new /W4  -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported /Gw -Wno-vla-extension /O2 /Ob2 /DNDEBUG -std:c++17 -MD   -wd4018 -wd4068 -wd4150 -wd4201 -wd4251 -wd4521 -wd4530 -wd4589  /EHs-c- /GR- /showIncludes /Fotools\lldb\source\Protocol\MCP\CMakeFiles\lldbProtocolMCP.dir\Server.cpp.obj /Fdtools\lldb\source\Protocol\MCP\CMakeFiles\lldbProtocolMCP.dir\lldbProtocolMCP.pdb -c -- C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp
In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp:9:
In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include\lldb/Protocol/MCP/Server.h:12:
In file included from C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\include\lldb/Protocol/MCP/Protocol.h:17:
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/JSON.h(938,12): error: no viable conversion from returned value of type 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to function return type 'Expected<variant<Request, Response, Notification>>'
  938 |     return std::move(Result);
      |            ^~~~~~~~~~~~~~~~~
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\source\Protocol\MCP\Server.cpp(57,30): note: in instantiation of function template specialization 'llvm::json::parse<std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>>' requested here
   57 |   auto message = llvm::json::parse<Message>(/*JSON=*/data);
      |                              ^
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(485,40): note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'const Expected<variant<Request, Response, Notification>> &' for 1st argument
  485 | template <class T> class [[nodiscard]] Expected {
      |                                        ^~~~~~~~
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(507,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'Error &&' for 1st argument
  507 |   Expected(Error &&Err)
      |   ^        ~~~~~~~~~~~
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(521,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'ErrorSuccess' for 1st argument
  521 |   Expected(ErrorSuccess) = delete;
      |   ^        ~~~~~~~~~~~~
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(539,3): note: candidate constructor not viable: no known conversion from 'remove_reference_t<variant<Request, Response, Notification> &>' (aka 'std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>') to 'Expected<variant<Request, Response, Notification>> &&' for 1st argument
  539 |   Expected(Expected &&Other) { moveConstruct(std::move(Other)); }
      |   ^        ~~~~~~~~~~~~~~~~
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(526,3): note: candidate template ignored: requirement 'std::is_convertible_v<std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>, std::variant<lldb_protocol::mcp::Request, lldb_protocol::mcp::Response, lldb_protocol::mcp::Notification>>' was not satisfied [with OtherT = remove_reference_t<variant<Request, Response, Notification> &>]
  526 |   Expected(OtherT &&Val,
      |   ^
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(544,3): note: candidate template ignored: could not match 'Expected' against 'variant'
  544 |   Expected(Expected<OtherT> &&Other,
      |   ^
C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\include\llvm/Support/Error.h(552,12): note: explicit constructor is not a candidate
  552 |   explicit Expected(
      |            ^
1 error generated.
[8/15] Building CXX object tools\lldb\source\Plu...nProtocolServerMCP.dir\ProtocolServerMCP.cpp.obj
ninja: build stopped: subcommand failed.
```
…ned (llvm#148848)

DriverKit doesn't define `os_log_error`, so fails to build. Fallback to `os_log` if on DriverKit.

rdar://140295247
An atomic update expression of form
  x = x + a + b
is technically illegal, since the right-hand side is parsed as (x+a)+b,
and the atomic variable x should be an argument to the top-level +. When
the type of x is integer, the result of (x+a)+b is guaranteed to be the
same as x+(a+b), so instead of reporting an error, the compiler can
treat (x+a)+b as x+(a+b).

This PR implements this kind of reassociation for integral types, and
for the two arithmetic associative/commutative operators: + and *.
…cit method call (llvm#153373)

This PR introduces a mechanism to defer JIT engine initialization,
enabling registration of required symbols before global constructor
execution.

## Problem

Modules containing `gpu.module` generate global constructors (e.g.,
kernel load/unload) that execute *during* engine creation. This can
force premature symbol resolution, causing failures when:
- Symbols are registered via `mlirExecutionEngineRegisterSymbol` *after*
creation
- Global constructors exist (even if not directly using unresolved
symbols, e.g., an external function declaration)
   - GPU modules introduce mandatory binary loading logic

## Usage
```c
// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(...);

// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);

// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);
```

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
…lvm#146329)

This is a series of patches (2/4) to unify assembly/disassembly of
recent AArch64 tests into a single file. The aim is to improve
consistency, so that all instructions and system registers are
thoroughly tested, and future test cases will be in a unified format.

This patch:
 * removes .txt tests which have only one feature required
* makes the .s tests have a roundabout run line to test both encoding
and assembly
 * creates diagnostic tests when needed
 * fixes naming convention of tests
 
See also llvm#146328, llvm#146330 and llvm#146331.

Co-authored-by: Virginia Cangelosi <virginia.cangelosi@arm.com>
Tag types like stucts or enums didn't have a declaration attached to
them. The source locations are present in the IPI stream in
`LF_UDT_MOD_SRC_LINE` records:

```
   0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
            udt = 0x1058, mod = 3, file = 1, line = 0
   0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
            udt = 0x1253, mod = 35, file = 93, line = 17069
```

The file is an ID in the string table `/names`:

```
     ID | String
      1 | '\<unknown>'
     12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
     93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'
```

Here, we're not interested in `mod`. This would indicate which module
contributed the UDT.

I was looking at Rustc's PDB and found that it uses `<unknown>` for some
types, so I added a check for that.

This makes two DIA PDB shell tests to work with the native PDB plugin.

---------

Co-authored-by: Michael Buch <michaelbuch12@gmail.com>
thurstond and others added 30 commits August 14, 2025 11:20
Fix Mac build breakage (reported by aeubanks in
llvm#153142 (comment))
by including stdint.h and using uintptr_t
…gather/store_scatter (llvm#152429)

Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high level
steps:
  1. compute Strides;
  2. compute Offsets;
  3. collapseMemrefTo1D;
  4. create Load gather or store_scatter op
)

Directly emit shl instead of a multiply if VF * Step is a power-of-2. The
main motivation here is to prepare the code and test for directly
generating and expanding a SCEV expression of the minimum iteration
count. SCEVExpander will directly emit shl for multiplies with
powers-of-2.

InstCombine will also performs this combine, so end-to-end this should
effectively by NFC.

PR: llvm#153495
The current implementation tries to (1) patch the existing readline
module definition if it's already present in the inittab and (2) append
our patched readline module to the inittab. The former (1) uses the
non-stable Python API and I can't find a situation where this is
necessary. 

We do this work before initialization, so for the readline
module to exist, it either needs to be added by Python itself (which
doesn't seem to be the case), or someone would have had to have added it
without initializing.
`y` should be the first argument and `x` should be the second, otherwise
the formula is wrong. This also matches the documentation
[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-step).
…#150655)

Instead of just outputting everything into the designated root folder,
HTML and JSON output will be placed in html/ and json/ directories.
)

A few tests were only mapping a pointee, like: `map(pp[0][0])`, on an
`int** pp`, but expecting the pointers, like `pp`, `pp[0]` to also be
mapped, which is incorrect.

This change fixes six such tests.
Fixing buildbot failures after PR llvm#153305, e.g.
https://lab.llvm.org/buildbot/#/builders/203/builds/19861

Analysis already depends on `ProfileData`, so the transitive closure of
the dependencies of `ScalarOpts` doesn't change.

Also avoided an extra dependency (and very unnecessary) on
`Instrumentation`. The API previously used doesn't need to live in
Instrumentation to begin with, but that's something to address in a
follow-up.
…nType (llvm#153646)

This was a regression introduced in
llvm#147835

Since this regression was never released, there are no release notes.

Fixes llvm#153540
…m#145623)

Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.

The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler, this
change was part of that enablement.
Remove extraneous argument from printf statement

---------

Co-authored-by: Joachim <protze@rz.rwth-aachen.de>
The 'cfi_salt' attribute specifies a string literal that is used as a
"salt" for Control-Flow Integrity (CFI) checks to distinguish between
functions with the same type signature. This attribute can be applied
to function declarations, function definitions, and function pointer
typedefs.

This attribute prevents function pointers from being replaced with
pointers to functions that have a compatible type, which can be a CFI
bypass vector.

The attribute affects type compatibility during compilation and CFI
hash generation during code generation.

  Attribute syntax: [[clang::cfi_salt("<salt_string>")]]
  GNU-style syntax: __attribute__((cfi_salt("<salt_string>")))

- The attribute takes a single string of non-NULL ASCII characters.
- It only applies to function types; using it on a non-function type
  will generate an error.
- All function declarations and the function definition must include
  the attribute and use identical salt values.

Example usage:

  // Header file:
  #define __cfi_salt(S) __attribute__((cfi_salt(S)))

  // Convenient typedefs to avoid nested declarator syntax.
  typedef int (*fp_unsalted_t)(void);
  typedef int (*fp_salted_t)(void) __cfi_salt("pepper");

  struct widget_ops {
    fp_unsalted_t init;     // Regular CFI.
    fp_salted_t exec;       // Salted CFI.
    fp_unsalted_t teardown; // Regular CFI.
  };

  // bar.c file:
  static int bar_init(void) { ... }
  static int bar_salted_exec(void) __cfi_salt("pepper") { ... }
  static int bar_teardown(void) { ... }

  static struct widget_generator _generator = {
    .init = bar_init,
    .exec = bar_salted_exec,
    .teardown = bar_teardown,
  };

  struct widget_generator *widget_gen = _generator;

  // 2nd .c file:
  int generate_a_widget(void) {
    int ret;

    // Called with non-salted CFI.
    ret = widget_gen.init();
    if (ret)
      return ret;

    // Called with salted CFI.
    ret = widget_gen.exec();
    if (ret)
      return ret;

    // Called with non-salted CFI.
    return widget_gen.teardown();
  }

Link: ClangBuiltLinux/linux#1736
Link: KSPP/linux#365

---------

Signed-off-by: Bill Wendling <morbo@google.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
…#153604)

Like we did for the 'private' clause, this adds an easier to use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
This is NFC as this target does not have it.
llvm#153472)

Use the Python limited API when building with SWIG 4.2 or later.
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:

- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.

- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash. This guarantees that even in a multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
This patchs adds a symbol table to CIRGenFunction plus scopes and
insertions to the table where we were missing them previously.
)

As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.
This reverts commit ca4ebf9.

Causes compile-time crashes for some inputs with RVV zvl512b/zvl1024b
configurations. See here for a minimal reproducer:
llvm#153393 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.