Skip to content

Conversation

ndrewh
Copy link
Contributor

@ndrewh ndrewh commented Jul 24, 2025

Update pthread_t size used by the linux arm64 private loader. This fixes clients that may try to use pthread APIs that get the current tid.

Issue: #1569

@ndrewh
Copy link
Contributor Author

ndrewh commented Jul 24, 2025

Offsets taken from glibc in the docker image ubuntu:22.04

pthread_self returns (pthread_t *)tpidr_el0 - 1
source

004841a0    int64_t pthread_self()
004841a8        return _ReadMSR(SystemReg: tpidr_el0) - 0x7c0

__pthread_rwlock_rdlock gets the current tid
source

00482ec0    uint64_t __pthread_rwlock_rdlock(int32_t* arg1)
00482eec        int32_t x19_2
00482eec        
00482eec        if (arg1[6] == *(_ReadMSR(SystemReg: tpidr_el0) - 0x6f0))
0048303c            x19_2 = 0x23
00482eec        else
00482ef8            int32_t x5

@ndrewh ndrewh changed the title i#1569: Fix pthread_t size for arm64 private loader i#1569: Fix pthread_t size for linux a64 private loader Jul 24, 2025
@ndrewh
Copy link
Contributor Author

ndrewh commented Jul 24, 2025

Presumably this will only work on the exact libc version I tried. Should we map these offsets out as droptions?

Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution. I have some suggestions.

#ifdef X86
# define TLS_PRE_TCB_SIZE 0
#elif defined(AARCHXX)
/* FIXME i#1569: This may be wrong for AArch64! */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below was originally added for AArch32 Android, which is why this comment was here I assume. If you are only looking at AArch64 it seems best to not change the 32-bit values. To share code, it is probably best to make #defines for the byte array sizes, and then have a separate set of all 4 defines changed here only set for AArch64.

/* FIXME i#1569: This may be wrong for AArch64! */
/* Data structure to match libc pthread.
* GDB reads some slot in TLS, which is pid/tid of pthread, so we must make sure
* the size and member locations match to avoid gdb crash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here, and where these are used in privload_tls_init(), is talking about this struct layout affecting gdb use: but I don't think anyone has hit gdb issues on AArch64 so presumably that only occurs on AArch32 (and maybe RISCV since they added the same things). You hit this on private libs calling pthread routines, right? Edit this comment to include that reason to get this to match pthread?

# define TLS_PRE_TCB_SIZE sizeof(dr_pthread_t)
# define LIBC_PTHREAD_SIZE 0x4c0
# define LIBC_PTHREAD_TID_OFFSET 0x68
# define LIBC_PTHREAD_SIZE 0x7c0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the AArch64 version of these defines (see other comment on splitting), label these as Ubuntu22 libc 2.38 (right?) so it's clear what version they came from.

# define TLS_PRE_TCB_SIZE sizeof(dr_pthread_t)
# define LIBC_PTHREAD_SIZE 0x4c0
# define LIBC_PTHREAD_TID_OFFSET 0x68
# define LIBC_PTHREAD_SIZE 0x7c0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having DR get these programmatically would be best. Just like you got them from something like
objdump --disassemble=pthread_self /usr/lib/aarch64-linux-gnu/libc.so.6 for 0x7c0. Is privload_tls_init() run before we've loaded the private libc? If so we might have to delay the setup of things that depend on this. But if these are all obtainable from very simple exported functions like pthread_self then programmatically determining them should be straightforward, right?

Add a XXX comment on this (unless you want to try to implement it now)? Probably should file an issue and put the issue number here.

@ndrewh
Copy link
Contributor Author

ndrewh commented Jul 25, 2025

Ah yes I completely forgot about the existence of 32-bit arm... I will fix that.

@derekbruening Do you have any thoughts on adding a script to the build process to extract these offsets w/ objdump and insert them as defines? i.e. if DR_HOST_NOT_TARGET is not set, we set some variables HOST_LIBC_PTHREAD_SIZE and HOST_LIBC_PTHREAD_TID_OFFSET based on the system libc and use that instead (falling back to some hard-coded constants like we have now if that detection fails). Is there any precedent for something like this in dynamorio that i could copy from?

I will probably not try to implement that full solution in this PR (but it is tempting -- I think I have lost a lot of time to this every time I upgrade libc since I forget this code exists and it breaks my tool in unusual ways).

@ndrewh
Copy link
Contributor Author

ndrewh commented Jul 25, 2025

Amusingly, the way this bug manifested for me this time is that a client (which used cpython => hashlib => openssl => pthreads) was hitting a call to __pthread_rwlock_rdlock that was failing with EDEADLK... but only if I compiled DynamoRIO in release mode (-DDEBUG=OFF).

[EDEADLK]
The current thread already owns the read-write lock for writing.

I eventually came to realize that the thread ID was not in the expected location in TLS, and that the heap-filling that comes with DEBUG_MEMORY was unexpectedly making the call succeed in debug builds (since 0xAB != 0x0, the 'current owner' of the pthread_rwlock). I will not share how long I spent debugging that :)

@derekbruening
Copy link
Contributor

Ah yes I completely forgot about the existence of 32-bit arm... I will fix that.

@derekbruening Do you have any thoughts on adding a script to the build process to extract these offsets w/ objdump and insert them as defines? i.e. if DR_HOST_NOT_TARGET is not set, we set some variables HOST_LIBC_PTHREAD_SIZE and HOST_LIBC_PTHREAD_TID_OFFSET based on the system libc and use that instead (falling back to some hard-coded constants like we have now if that detection fails). Is there any precedent for something like this in dynamorio that i could copy from?

I will probably not try to implement that full solution in this PR (but it is tempting -- I think I have lost a lot of time to this every time I upgrade libc since I forget this code exists and it breaks my tool in unusual ways).

(Note that comments inside the code review scheme are nicer with actual threading, as opposed to these top-level comments where you have to quote and it's hard to keep track of what is in response to what and what has been resolved: so we generally avoid top-level comments in PRs)

Please see my comment #7569 (comment) where I propose dynamically figuring out the offsets so that one build will work anywhere. Hence the question about ordering.

@derekbruening
Copy link
Contributor

Amusingly, the way this bug manifested for me this time is that a client (which used cpython => hashlib => openssl => pthreads) was hitting a call to __pthread_rwlock_rdlock that was failing with EDEADLK... but only if I compiled DynamoRIO in release mode (-DDEBUG=OFF).

[EDEADLK]
The current thread already owns the read-write lock for writing.

I eventually came to realize that the thread ID was not in the expected location in TLS, and that the heap-filling that comes with DEBUG_MEMORY was unexpectedly making the call succeed in debug builds (since 0xAB != 0x0, the 'current owner' of the pthread_rwlock). I will not share how long I spent debugging that :)

Long-term better solution ideas are discussed with a proposal at #7312.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants