amdgpu fixes for rpi5 #6947

pepijndevos · 2025-07-08T11:45:25Z

This cleans up the commit history of the amazing work by @Coreforge to make AMD GPUs work with the Raspberry Pi 5, following instructions by @6by9: geerlingguy/raspberry-pi-pcie-devices#222 (comment) and an explanation of the different parts of the patch: geerlingguy/raspberry-pi-pcie-devices#222 (comment)

So I've made:

- one commit with all the memset changes, which to my understanding could potentially be upstreamed to mainline Linux since they are simply more correct
- one commit with a miscellaneous ttm_uncached change that may need an ifdef for ARM only
- one commit with all the volatile changes, which are not that invasive but which Coreforge suggested might need to be ifdefed as well for mainline acceptance
- one commit to allow forcing write-combine memory

What is not included at the moment is the whole alignment machinery, which to my understanding is more hacky and could be harder to get merged, or might require significant changes. I'm not sure how essential that change is, but if desired I could include it as a separate commit as well. Or maybe the Ampere version of that trap could be used. FWIW, in limited testing llama.cpp seems to work equally well without that patch applied.

Just to be clear, I don't claim any authorship or even understanding of these changes, and am just trying to grease the wheels of getting these changes upstreamed as far as they will go, making it easier to use GPUs on Raspberry Pi, which I have a big interest in: https://sanctuary-systems.com/sentinel-core/

Coreforge · 2025-07-08T14:21:33Z

The alignment trap isn't needed if all userspace programs respect the alignment requirements. I guess llama.cpp might do that, so it works without it. Xorg I found did need it, even the arm64 build (or a userspace workaround like the memcpy library). The Ampere version should work as well, although I haven't tried it. The Ampere version also covers kernelspace, while mine only covers userspace, so more cards might work with it without extra changes, but at the cost of some performance (whether that would be noticeable or not, I don't know).

popcornmix · 2025-07-11T13:03:34Z

These would be best submitted upstream where the devs who actually understand this driver can comment if the patches are correct (or could be achieved in better ways). See submitting patches.

No objection to leaving this PR here for information for other interested users, but we are unlikely to merge it as a downstream only patch. If any patches are accepted upstream we're generally happy to cherry-pick them to get them into trees sooner.

pepijndevos · 2025-07-11T13:18:47Z

The impression I got from the @6by9 comment linked above is that you could maybe guide us on what might be needed to upstream these changes.

> If you could have a first pass at tidying it up on rpi-6.12.y, and create a PR against raspberrypi/linux rpi-6.12.y, then we can give pointers on what is needed.

I'd be ~~happy~~willing to spearhead that effort, but could use some guidance from people more familiar with kernel development, because it sounds like a more legislative process than submitting a PR ;)

popcornmix · 2025-07-11T13:28:52Z

Yeah, @6by9 is probably a better guide - he's succeeded in upstreaming a number of patches recently.

pepijndevos · 2025-07-30T07:02:05Z

@6by9 any thoughts?

6by9 · 2025-08-04T14:35:44Z

I can't really give much guidance as to how upstream will respond to some of these patches because I'm not involved in the area of AMD GPUs.

Your initial summary is pretty much what I expect:

- "amdgpu: uses memset_io where applicable" should be acceptable as AIUI it's a nop on x86 platforms. It might make them have a think about the relevant APIs.
- The others are likely to get pushback, largely as they change the behaviour for x86. If they can be put behind suitable ifdefs then they might be accepted.
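
As a rough userspace illustration of how an I/O-safe fill differs from a plain memset, the sketch below issues wide stores only at naturally aligned addresses and falls back to byte stores for the unaligned head and tail. It is not the kernel's memset_io implementation and not code from this PR; `aligned_fill` is a hypothetical name, and the 8-byte width is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only (hypothetical helper, not a kernel API):
 * fill `count` bytes at `dst` with the byte value `c`, using 64-bit
 * stores only at 8-byte-aligned addresses and single-byte stores
 * everywhere else.
 */
static void aligned_fill(volatile void *dst, int c, size_t count)
{
    volatile uint8_t *p = dst;
    /* Replicate the fill byte into all eight lanes of a 64-bit word. */
    uint64_t pattern = (uint8_t)c * 0x0101010101010101ULL;

    assert(dst != NULL || count == 0);

    /* Head: byte stores until the pointer is 8-byte aligned. */
    while (count && ((uintptr_t)p & 7)) {
        *p++ = (uint8_t)c;
        count--;
    }

    /* Body: naturally aligned 64-bit stores. */
    while (count >= 8) {
        *(volatile uint64_t *)p = pattern;
        p += 8;
        count -= 8;
    }

    /* Tail: whatever is left, one byte at a time. */
    while (count--)
        *p++ = (uint8_t)c;
}
```

Calling `aligned_fill(buf + 3, 0xAB, 20)` on an 8-byte-aligned buffer writes five single bytes, one 64-bit word, and then seven single bytes. On a platform that tolerates unaligned device accesses the result is the same as a plain memset, which is why such a change is expected to be harmless on x86.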

Another step that may get you more traction is if you can show testing on other ARM platforms with PCIe, just to show that limitations are more widespread than just the Broadcom PCIe interface.

Otherwise just send it with a decent cover letter explaining what you're up to, and see what the response is.
I do recommend using b4 for prepping your patches, if for no other reason than it ensures you've run checkpatch to make the patches clean.

6by9 · 2025-08-04T14:36:49Z

Having just approved our CI tests, it has thrown up a load of checkpatch warnings. Do fix those before sending upstream as otherwise you're likely to get some fairly terse responses for sending rubbish.

Coreforge · 2025-08-04T14:44:24Z

Most of the errors/warnings are for the use of volatile. While it might be usually wrong, here, it keeps the compiler from optimizing those parts in a way that will lead to bad alignment. If there's another way to prevent that, that should work fine too, volatile just seemed the easiest to me.
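
To make the volatile point concrete, here is a minimal userspace sketch under stated assumptions: `copy_words` is a hypothetical helper, not code from the patches. Without the volatile qualifiers a compiler is free to merge the per-word accesses into wider ones or to replace the loop with a library memcpy call, and either may access the device memory with an alignment or width it cannot handle. With them, GCC and Clang in practice emit one 32-bit load and one 32-bit store per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only (hypothetical helper): copy `n` 32-bit words.
 * Every access through a volatile-qualified pointer counts as an
 * observable side effect, so the compiler may not combine, widen,
 * or drop the individual loads and stores.
 */
static void copy_words(volatile uint32_t *dst,
                       const volatile uint32_t *src, size_t n)
{
    /* Both pointers must be 4-byte aligned for the word accesses. */
    assert(((uintptr_t)dst & 3) == 0);
    assert(((uintptr_t)src & 3) == 0);

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

In kernel code, checkpatch generally steers authors towards accessor helpers such as WRITE_ONCE() or the writel() family rather than a bare volatile. Whether those fit the call sites touched by these patches is not established in this thread.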

For testing on other platforms, Ampere needed some patches which force the entire PCIe range to not be cached from what I understood, and also include an alignment trap for any unaligned access (both in userspace and in the kernel, where I'd say it's more of a workaround).

6by9 · 2025-08-04T17:29:27Z

> Most of the errors/warnings are for the use of volatile. While it might be usually wrong, here, it keeps the compiler from optimizing those parts in a way that will lead to bad alignment. If there's another way to prevent that, that should work fine too, volatile just seemed the easiest to me.

Many are, but "ERROR: Missing Signed-off-by: line(s)" and "WARNING: Missing commit description - Add an appropriate one" are definite no-nos, and "WARNING: line length of 112 exceeds 100 columns" can be avoided easily.

> For testing on other platforms, Ampere needed some patches which force the entire PCIe range to not be cached from what I understood, and also include an alignment trap for any unaligned access (both in userspace and in the kernel, where I'd say it's more of a workaround).

I have a Radxa Rock 5B here that I intended to try the patches against, but I just haven't found the time to work out the build process for it and test.

pepijndevos added 3 commits July 8, 2025 13:53

- amdgpu: uses memset_io where applicable (3b1c3fa)
- amdgpu: use ttm_uncached (e770936)
- amdgpu: mark some variables volatile (72af394)

pepijndevos force-pushed the rpi-6.12.y-gpu branch from fcca389 to 72af394 on July 8, 2025 11:54

pepijndevos mentioned this pull request Jul 8, 2025

Test GPU (AMD Radeon RX 6700 XT) geerlingguy/raspberry-pi-pcie-devices#222 (Open)

pepijndevos mentioned this pull request Jul 12, 2025

Support AMDGPU on Raspberry Pi 5 home-assistant/operating-system#4159 (Draft)

- amdgpu: add parameter to force write-combine memory (82e9e49)
