Performance of llama.cpp on AMD HIP/ROCm #15021
-
RX 7800 XT (Sapphire Pulse 280W)

ggml_cuda_init: found 1 ROCm devices:
build: 00131d6 (6031)

ggml_vulkan: Found 1 Vulkan devices:
build: baad948 (6056)

Notes:
-
Happy to replicate:

ggml_cuda_init: found 1 ROCm devices:
build: 9c35706 (6060)

On Linux.
-
RX 7600 XT

ggml_cuda_init: found 1 ROCm devices:
build: 9c35706 (6060)

Running on Linux 6.12.32, mainline amdgpu, ROCm 6.4.1.

ggml_vulkan: Found 1 Vulkan devices:
build: 9c35706 (6060)
-
AMD MI60. Happy to contribute.

I will post FA=1 and Vulkan results once I have time over the weekend.
-
MI100

Using `./llama-bench -m llama-2-7b.Q4_0.gguf -ngl 99 -fa 0,1 -sm none -mg 0`

build: 9c35706 (6060)

I'm running Ubuntu 24.04.2 and ROCm 6.4.1.
-
AMD Instinct MI300X

```
root@0-4-9-gpu-mi300x1-192gb-devcloud-atl1:~/llama.cpp# ./build/bin/llama-bench -m llama-2-7b.Q4_0.gguf -ngl 99 -fa 0,1
```

build: 2bf3fbf (6069)

Ref: #14640
-
Pro V620

Why does FA slow down the V620 so much? That's a question I've been trying to answer for a while now.

build: 03d4698 (6074)

Linux, ROCm 6.4.1 (will try upgrading soon)
-
Powercolor Hellhound RX 7900 XTX (400W power limit)

openSUSE Tumbleweed system with ROCm packages from
build: 5c0eb5e (6075)

Sapphire Nitro 7900 XTX (400W power limit)

In a different PC, unfortunately, because these GPUs are too chonky to fit in a regular case.
build: 9c35706 (6060)
-
This is similar to the Performance of llama.cpp on Apple Silicon M-series and Performance of llama.cpp with Vulkan threads, but for ROCm! I think it's good to consolidate and discuss our results here.
We'll be testing the Llama 2 7B model like the other threads to keep things consistent, using Q4_0 since it's simple to compute and small enough to fit on a 4 GB GPU. You can download it here.
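The link above points to the post's own copy of the file. As a sketch, one commonly used mirror of the same quantization is TheBloke's GGUF conversion on Hugging Face; the repo below is an assumption, not necessarily the post's link:

```sh
# Fetch Llama 2 7B Q4_0 (assumed mirror: TheBloke/Llama-2-7B-GGUF;
# the original post links its own copy, which may differ)
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_0.gguf
```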
Instructions
Either run the commands below or download one of our ROCm (HIP) releases. If you have multiple GPUs, please run the test on a single GPU using `-sm none -mg YOUR_GPU_NUMBER` unless the model is too big to fit in VRAM.
Share your llama-bench results along with the git hash and ROCm info string in the comments. Feel free to try other models and compare backends, but only valid runs will be placed on the scoreboard.
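The exact commands from the original post aren't reproduced here; as a rough sketch, assuming a ROCm 6.x install and the HIP build flags from llama.cpp's build guide (the gfx target below is an example you must match to your own GPU), a typical build-and-bench run looks like:

```sh
# Build llama.cpp with the HIP/ROCm backend
# (gfx1100 = RDNA3; adjust AMDGPU_TARGETS to your GPU's gfx target)
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j "$(nproc)"

# Benchmark with all layers offloaded, flash attention off and on;
# on multi-GPU systems add -sm none -mg N to pin the test to one device
./build/bin/llama-bench -m llama-2-7b.Q4_0.gguf -ngl 99 -fa 0,1
```

llama-bench prints the `build: <hash> (<number>)` line quoted in the reports above, so copy it verbatim into your comment.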
If multiple entries are posted for the same device, I'll prioritize newer commits with substantial ROCm updates; otherwise I'll pick the one with the highest overall score at my discretion. Performance may vary with driver, operating system, board manufacturer, etc., even when the chip is the same. For integrated graphics, note that your memory speed and number of channels will greatly affect your inference speed!
ROCm Scoreboard for Llama 2 7B, Q4_0 (no FA)
ROCm Scoreboard for Llama 2 7B, Q4_0 (with FA)