Skip to content

amdgpu fixes for rpi5 #6947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: rpi-6.12.y
Choose a base branch
from

Conversation

pepijndevos
Copy link

@pepijndevos pepijndevos commented Jul 8, 2025

This cleans up the commit history of the amazing work by @Coreforge to make AMD GPUs work with the Raspberry Pi 5, following instructions by @6by9: geerlingguy/raspberry-pi-pcie-devices#222 (comment) and an explanation of the different parts of the patch: geerlingguy/raspberry-pi-pcie-devices#222 (comment)

So I've made

  • one commit with all the memset changes that to my understanding could potentially be upstreamed to mainline linux since they are just more correct.
  • one commit with a miscellaneous ttm_uncached change that may need an ifdef for arm only
  • one commit with all the volatile changes which are not that invasive but that coreforge suggested might need to be ifdefed as well for mainline acceptance
  • one commit to allow forcing write-combine memory

What is not included at the moment is the whole alignment machinery which to my understanding is more hacky and could be harder to get merged or might require significant changes. I'm not sure how essential that change is, but if desired I could include it as a separate commit as well. Or maybe the Ampere version of that trap could be used. fwiw, it seems llama.cpp works equally well without that patch applied from limited testing.

Just to be clear, I don't claim any authorship or even understanding of these changes, and am just trying to grease the wheels of getting these changes upstreamed as far as they will go, making it easier to use GPUs on Raspberry Pi, which I have a big interest in: https://sanctuary-systems.com/sentinel-core/

@Coreforge
Copy link

The alignment trap isn't needed if all userspace programs respect the alignment requirements. I guess llama.cpp might do that, so it works without it. Xorg I found did need it, even the arm64 build (or a userspace workaround like the memcpy library). The Ampere version should work as well, although I haven't tried it. The Ampere version also covers kernelspace, while mine only covers userspace, so more cards might work with it without extra changes, but at the cost of some performance (whether that would be noticeable or not, I don't know).

@popcornmix
Copy link
Collaborator

These would be best submitted upstream where the devs who actually understand this driver can comment if the patches are correct (or could be achieved in better ways). See submitting patches.

No objection to leaving this PR here for information for other interested users, but we are unlikely to merge it as a downstream only patch. If any patches are accepted upstream we're generally happy to cherry-pick them to get them into trees sooner.

@pepijndevos
Copy link
Author

pepijndevos commented Jul 11, 2025

The impression I got from the @6by9 comment linked above is that you could maybe guide us on what might be needed to upstream these changes.

If you could have a first pass at tidying it up on rpi-6.12.y, and create a PR against raspberrypi/linux rpi-6.12.y, then we can give pointers on what is needed.

I'd be happywilling to spearhead that effort, but could use some guidance from people more familiar with kernel development, because it sounds like a more legislative process than submitting a PR ;)

@popcornmix
Copy link
Collaborator

Yeah, @6by9 is probably a better guide - he's succeeded in upstreaming a number of patches recently.

@pepijndevos
Copy link
Author

@6by9 any thoughts?

@6by9
Copy link
Contributor

6by9 commented Aug 4, 2025

I can't really give much guidance as to how upstream will respond to some of these patches because I'm not involved in the area of AMD GPUs.

Your initial summary is pretty much what I expect:

  • amdgpu: uses memset_io where applicable should be acceptable as AIUI it's a nop on x86 platforms. It might make them have a think about the relevant APIs.
  • The others are likely to get pushback, largely as they change the behaviour for x86. If they can be put behind suitable ifdefs then they might be accepted.

Another step that may get you more traction is if you can show testing on other ARM platforms with PCIe, just to show that limitations are more widespread than just the Broadcom PCIe interface.

Otherwise just send it with a decent cover letter explaining what you're up to, and see what the response is.
I do recommend using b4 for prepping your patches, if for no other reason than it ensures you've run checkpatch to make the patches clean.

@6by9
Copy link
Contributor

6by9 commented Aug 4, 2025

Having just approved our CI tests, it has thrown up a load of checkpatch warnings. Do fix those before sending upstream as otherwise you're likely to get some fairly terse responses for sending rubbish.

@Coreforge
Copy link

Most of the errors/warnings are for the use of volatile. While it might be usually wrong, here, it keeps the compiler from optimizing those parts in a way that will lead to bad alignment. If there's another way to prevent that, that should work fine too, volatile just seemed the easiest to me.

For testing on other platforms, Ampere needed some patches which force the entire PCIe range to not be cached from what I understood, and also include an alignment trap for any unaligned access (both in userspace and in the kernel, where I'd say it's more of a workaround).

@6by9
Copy link
Contributor

6by9 commented Aug 4, 2025

Most of the errors/warnings are for the use of volatile. While it might be usually wrong, here, it keeps the compiler from optimizing those parts in a way that will lead to bad alignment. If there's another way to prevent that, that should work fine too, volatile just seemed the easiest to me.

Many are, but ERROR: Missing Signed-off-by: line(s) and WARNING: Missing commit description - Add an appropriate one are definite no-nos, and WARNING: line length of 112 exceeds 100 columns can be avoided easily.

For testing on other platforms, Ampere needed some patches which force the entire PCIe range to not be cached from what I understood, and also include an alignment trap for any unaligned access (both in userspace and in the kernel, where I'd say it's more of a workaround).

I have a Radxa Rock 5B here that I intended to try the patches against, but I just haven't found the time to work out the build process for it and test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants