Use an External GPU on Raspberry Pi 5 for 4K Gaming

After I saw Pineboards' 4K Pi 5 external GPU gaming demo at Maker Faire Hanover, I decided it was time to set up my GPU test rig and check on the progress of the amdgpu Linux kernel patch for Pi OS.

GLmark2 running on Pi 5 with AMD RX 460 external GPU

I tested it out on a livestream over the weekend, but I also wanted to document the current state of the patch, how to apply it, and what's left to do to get full external GPU support on the Raspberry Pi.

Hardware setup for an external PCI Express GPU

external AMD RX 460 running on Raspberry Pi 5

There are a few different routes you can go to physically plug a graphics card into a Pi 5.

My preferred setup is this JMT External Graphics Card stand that uses Oculink with an M.2 to Oculink adapter (included). To use it, you also need an Oculink cable, and those together run $80.

On top of that (or more specifically, on top of the Pi), you need a HAT that converts the PCIe FFC connection on the Pi 5 to an M.2 slot, and my choice is the Pineboards HatDrive! Bottom, though there are tons of other options. That adds on another $20 or so.

Another option is to skip the external GPU stand entirely and mount the card directly on top of the Pi 5. You can do that with the uPCIty Lite, which costs $30 and has an open-ended x4 PCIe slot.

That takes care of the PCIe signaling—but you also need to provide adequate power.

The Pi's PCIe FFC only supports up to 5W of power output. Regardless of the HAT you choose, you'll need to provide adequate power to the slot (up to 75W), and usually also to the card you insert into it (via PCIe ATX power connectors—requirements vary by card).
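To put rough numbers on that, here's a small shell sketch that adds up the PCIe CEM power limits: 75 W through the x16 slot itself, plus 75 W for each 6-pin and 150 W for each 8-pin supplemental connector. The function name max_card_power_w is hypothetical. The result is an upper bound for sizing a power supply, not a measurement of what a given card actually draws.

```shell
# Hypothetical helper: upper bound (in watts) on what a card may draw,
# per the PCIe CEM limits: 75 W from the slot, 75 W per 6-pin and
# 150 W per 8-pin supplemental power connector.
# Usage: max_card_power_w <six_pin_count> <eight_pin_count>
max_card_power_w() {
  local six_pin="${1:-0}" eight_pin="${2:-0}"
  echo $(( 75 + 75 * six_pin + 150 * eight_pin ))
}
```

For example, a card with a single 6-pin connector (common on RX 460 boards) is capped at 150 W (max_card_power_w 1 0). None of that can come from the Pi's 5W FFC.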

For that, I'm using this LIAN LI 750W SFX PSU, which has enough capacity and cabling to power the PCIe riser (or the uPCIty's 4-pin 12V CPU power input), as well as the graphics card's supplemental PCIe power jack.

If you choose the uPCIty Lite, or some other method that (unlike the graphics card stand I'm using) doesn't have a 24-pin ATX power input, you'll also need a way to force your ATX power supply to turn on, like this ATX 24-pin Power Switch, or a jumper placed across the appropriate pins on the connector.

Choosing a card and getting PCIe Gen 3

With the PCI Express slot ready to go, you need to choose a card to go into it. After a few years of testing various cards, our little group has settled on Polaris generation AMD graphics cards.

Why? Because they're new enough to use the open source amdgpu driver in the Linux kernel, and old enough that the drivers and card details are well understood.

We had some success with older cards using the radeon driver, but both that driver and the hardware it supports are a bit too dated for any practical use with a Pi.

Nvidia hardware is right out: apart from the community nouveau driver, Nvidia provides little open source code for the parts of its drivers we'd need to modify to fix the card's quirks on the Pi's PCI Express bus.

GitHub user Coreforge and I (and now Pineboards, too) all chose the RX 460 4 GB as our test card, because it's new enough to be useful, old enough to be cheap, and uses PCI Express Gen 3, which is a perfect match for the Pi 5's bus.

Speaking of which, to force Gen 3 speed on the Pi 5's PCI Express bus, edit /boot/firmware/config.txt and add the following line at the bottom:

dtparam=pciex1_gen=3
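If you'd rather script that edit, here's a minimal sketch. It assumes GNU sed (as on Pi OS), and set_pcie_gen is a hypothetical helper name. It replaces an existing dtparam=pciex1_gen line or appends one at the bottom, so running it twice doesn't leave duplicate entries.

```shell
# Hypothetical helper (not part of Pi OS): set the PCIe generation for the
# Pi 5's external connector in config.txt. Assumes GNU sed for `sed -i`.
# Usage: set_pcie_gen <gen> [config_file]
set_pcie_gen() {
  local gen="$1" cfg="${2:-/boot/firmware/config.txt}"
  if grep -q '^dtparam=pciex1_gen=' "$cfg"; then
    # Replace the existing value instead of adding a second line.
    sed -i "s/^dtparam=pciex1_gen=.*/dtparam=pciex1_gen=${gen}/" "$cfg"
  else
    # If the file doesn't end in a newline, add one so the new line
    # isn't glued onto the last existing line.
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
      echo >> "$cfg"
    fi
    # Append at the bottom of the file.
    printf 'dtparam=pciex1_gen=%s\n' "$gen" >> "$cfg"
  fi
}
```

Writing to /boot/firmware/config.txt requires root, so try it on a copy first (cp /boot/firmware/config.txt /tmp/config.txt, then set_pcie_gen 3 /tmp/config.txt). Diff the result before running it against the real file from a root shell.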

The Pi 5's external PCI Express connector only provides a single lane (x1). Forcing Gen 3 raises its signaling rate to 8 GT/s, up from 5 GT/s at the default PCIe Gen 2 speed.
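Gen 3 is a bigger jump than "8 vs. 5 GT/s" suggests, because it also switched from 8b/10b to 128b/130b line encoding. This sketch works out the raw per-lane payload bandwidth in MB/s, per direction, ignoring packet-level protocol overhead, so real-world throughput will be somewhat lower. pcie_lane_mbps is a hypothetical helper name.

```shell
# Hypothetical helper: raw per-lane, per-direction payload bandwidth in MB/s
# for a PCIe generation, after line encoding but before packet overhead.
# MT/s * (payload bits / line bits) / 8 bits per byte = MB/s.
# Shell arithmetic is integer-only, so results round down.
pcie_lane_mbps() {
  case "$1" in
    1) echo $(( 2500 * 8 / (10 * 8) )) ;;     # 2.5 GT/s, 8b/10b
    2) echo $(( 5000 * 8 / (10 * 8) )) ;;     # 5 GT/s, 8b/10b
    3) echo $(( 8000 * 128 / (130 * 8) )) ;;  # 8 GT/s, 128b/130b
    *) echo "unsupported PCIe generation: $1" >&2; return 1 ;;
  esac
}
```

pcie_lane_mbps 2 prints 500 and pcie_lane_mbps 3 prints 984, so moving the x1 link to Gen 3 nearly doubles usable bandwidth, rather than improving it by just 60%.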

Applying the Linux kernel patch

With the hardware connected and Gen 3 speed configured, you could boot the Pi and identify the card with lspci. However, Raspberry Pi OS won't be able to use it, because the amdgpu driver isn't included in the default Pi OS kernel.
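To confirm the link actually trained at Gen 3, check the LnkSta line in sudo lspci -vv output. Without root, lspci can't read the capability registers that hold it. Here's a small sketch that extracts the negotiated speed. link_speed is a hypothetical helper. The -d 1002::0300 filter in the usage example assumes the card shows up as an AMD (vendor ID 1002) VGA-class device.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print the
# negotiated link speed from the first LnkSta line, e.g. "8GT/s" (Gen 3)
# or "5GT/s" (Gen 2). Handles both "Speed 8GT/s," and "Speed 8GT/s (ok),".
link_speed() {
  awk '/LnkSta:/ {
    for (i = 1; i <= NF; i++)
      if ($i == "Speed") { s = $(i + 1); sub(/,$/, "", s); print s; exit }
  }'
}
```

Usage: sudo lspci -vv -d 1002::0300 | link_speed should print 8GT/s with Gen 3 enabled, or 5GT/s at the default Gen 2 speed.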

Therefore, it's time to recompile the Linux kernel!

Follow Raspberry Pi's guide: Build the Linux kernel.

After the git clone step, you'll need to download and apply the patchset we've been working on to enable Polaris-generation cards on Pi 5. Assuming you're in the linux checkout directory (cd linux), run these commands:

wget -O amdgpu-pi5.patch https://github.com/geerlingguy/linux/pull/8.patch
git apply -v amdgpu-pi5.patch 

This patch was made against the 6.6.y Linux kernel, and it should apply cleanly. If it doesn't, either the patch is out of date relative to the latest 6.6.y Pi OS branch, or you checked out a different kernel release.
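To avoid a half-applied patch, you can have git check it first. This sketch runs from the top of the linux checkout. It uses make -s kernelversion to confirm you're on a 6.6.y tree, and it only applies the patch if git apply --check reports that it would apply cleanly. check_and_apply_patch is a hypothetical name.

```shell
# Hypothetical helper: run from the top of the kernel checkout.
# Usage: check_and_apply_patch [patch_file]
check_and_apply_patch() {
  local patch="${1:-amdgpu-pi5.patch}" version
  # `make kernelversion` prints the tree's version, e.g. 6.6.51.
  version="$(make -s kernelversion)" || return 1
  case "$version" in
    6.6|6.6.*) ;;
    *) echo "expected a 6.6.y tree, found $version" >&2; return 1 ;;
  esac
  # --check only reports whether the patch would apply; it changes nothing.
  git apply --check "$patch" && git apply -v "$patch"
}
```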

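Before you start recompiling the Linux kernel (following the rest of the instructions in the Pi kernel guide), you should also install Coreforge's optimized memcpy library. This isn't a kernel patch. It's a userspace shared library that gets preloaded into every program through /etc/ld.so.preload: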

wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c

gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo nano /etc/ld.so.preload

# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
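Every dynamically linked program loads the libraries listed in /etc/ld.so.preload, so a broken memcpy.so can break almost everything on the system, including sudo. It's worth trying the library on a single program first with the LD_PRELOAD environment variable (for example, LD_PRELOAD=/usr/local/lib/memcpy.so glmark2-es2). Then add it to the preload file without opening an editor. Here's a minimal sketch, where add_preload is a hypothetical helper:

```shell
# Hypothetical helper: add a library path to a preload list (by default
# /etc/ld.so.preload) only if the library exists and isn't listed yet.
# Usage: add_preload <library_path> [preload_file]
add_preload() {
  local lib="$1" list="${2:-/etc/ld.so.preload}"
  # Refuse to list a library that doesn't exist.
  if [ ! -f "$lib" ]; then
    echo "no such library: $lib" >&2
    return 1
  fi
  # Already listed on its own line? Then there's nothing to do.
  if grep -qxF "$lib" "$list" 2>/dev/null; then
    return 0
  fi
  # Start on a fresh line if the file lacks a trailing newline.
  if [ -s "$list" ] && [ -n "$(tail -c 1 "$list")" ]; then
    echo >> "$list"
  fi
  printf '%s\n' "$lib" >> "$list"
}
```

Run add_preload /usr/local/lib/memcpy.so from a root shell (sudo -i). Running it a second time won't add a duplicate line.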

Linux kernel compilation on Pi 5 - menuconfig graphics drivers

To make sure the amdgpu driver is enabled when you recompile the Linux kernel, run make menuconfig and enable AMD GPU under Device Drivers > Graphics support. menuconfig needs libncurses-dev, so install that first with sudo apt install libncurses-dev.

Then follow the rest of the guide's instructions to compile the kernel (make -j6 Image.gz modules dtbs) and install it, copying each part into place with the sudo cp commands.
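If you'd rather skip the menus, the kernel tree also includes scripts/config for toggling options non-interactively. For example, run ./scripts/config --module DRM_AMDGPU followed by make olddefconfig. Either way, it's worth confirming that the option survived dependency resolution before you start a long build. Here's a sketch, with amdgpu_config_state as a hypothetical helper:

```shell
# Hypothetical helper: print how amdgpu is set in a kernel .config
# ("y" = built in, "m" = module), or fail if it isn't enabled.
# Disabled options appear as "# CONFIG_DRM_AMDGPU is not set" and won't match.
# Usage: amdgpu_config_state [config_file]
amdgpu_config_state() {
  local cfg="${1:-.config}" line
  if ! line="$(grep '^CONFIG_DRM_AMDGPU=' "$cfg")"; then
    echo "CONFIG_DRM_AMDGPU is not enabled in $cfg" >&2
    return 1
  fi
  echo "${line#CONFIG_DRM_AMDGPU=}"
}
```

Run it in the linux directory after configuring. It should print m or y.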

The last thing you need to do is install the AMD firmware:

sudo apt install firmware-amd-graphics
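The RX 460 is a Polaris 11 card, and amdgpu loads its firmware from polaris11_* files under /lib/firmware/amdgpu. Assuming that's where the package puts them on your system, this sketch counts those files, so a result of zero means the firmware still isn't installed. polaris11_fw_count is a hypothetical helper.

```shell
# Hypothetical helper: count Polaris 11 firmware files in a directory
# (default /lib/firmware/amdgpu). Prints 0 if none are present.
# Usage: polaris11_fw_count [firmware_dir]
polaris11_fw_count() {
  local dir="${1:-/lib/firmware/amdgpu}" n=0 f
  for f in "$dir"/polaris11_*; do
    # With no matches, the glob stays literal and -e fails.
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}
```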

Now reboot, and the Pi 5 should output video through the external graphics card's HDMI, DisplayPort, or other video ports.

If it doesn't, debug over a UART serial connection to the Pi, the Pi's onboard micro HDMI output, or SSH. Use dmesg to view kernel messages; there's usually a fairly obvious error you can start searching for.
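To cut through the noise, you can filter kernel messages down to amdgpu and DRM lines that mention a failure. This sketch reads log text on standard input. gpu_errors is a hypothetical helper, and its keyword list is just a starting point.

```shell
# Hypothetical helper: keep only amdgpu/DRM log lines that look like failures.
# Usage: sudo dmesg | gpu_errors
gpu_errors() {
  grep -iE 'amdgpu|\[drm' | grep -iE 'error|fail|timeout|fault'
}
```

Depending on the kernel's dmesg_restrict setting, dmesg may need root. An empty result doesn't guarantee that everything is fine, so also skim the unfiltered sudo dmesg | grep -i amdgpu output.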

4K Gaming on Pi 5 (for real)

Now comes the fun part. The Pi 5 nominally supports 4K display output, but if you run it at 60 Hz, even normal UI elements feel a bit laggy.

With the RX 460, I get smooth 60 Hz output at 4K resolution. And if you install a game like SuperTuxKart (sudo apt install supertuxkart), you'll be able to play with all graphics settings maxed out, at 4K.

It gives me 15-20 fps at that resolution, but if I turn the graphics options down a bit, I can get 60+ fps all day. The Pi 5's internal VideoCore GPU isn't playable with maxed-out graphics settings even at 1080p!

Doom 3 running at 4K on a Raspberry Pi 5

I also installed Doom 3 with Pi-Apps, and got a solid 60 fps at 4K. The engine seems to lock the game at 60 fps, and I could get a lot more than that if I were able to unlock it, but it's been a long time since I hacked around in the console of the old id Software games...

Again, the Pi's internal GPU struggles to give a playable experience even on lower graphics settings at 1080p.

I haven't been able to install Steam using Box86/Box64 yet, but I'd like to try it with Doom Eternal and some other games I know play okay on other Arm64 platforms like my Ampere workstations (incidentally, with Nvidia GPUs like the 4070 Ti and 4090, which have better Arm64 drivers for those systems' more fully compliant PCI Express buses).

Outside of games, I ran glmark2-es2. The Pi's internal V3D graphics scored about 1800, and the external AMD RX 460 scored 2383, roughly 30% higher.

nvtop running on Raspberry Pi 5 RX 460

In a bit of a surprise, nvtop works out of the box (sudo apt install nvtop) and gives a much better overview of GPU utilization than radeontop. It even includes temperature and fan speed info, in addition to basics like clock speeds and feature utilization.
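If you'd rather script utilization checks, amdgpu also exposes a gpu_busy_percent file in sysfs under each card's device directory. This sketch prints that value for every DRM card that has one. The Pi's own display and 3D devices shouldn't provide that file, so only the AMD card should show up, though its card number isn't fixed. gpu_busy is a hypothetical helper.

```shell
# Hypothetical helper: print "cardN <busy>%" for each DRM card whose driver
# exposes device/gpu_busy_percent (amdgpu does; other drivers are skipped).
# Usage: gpu_busy [drm_sysfs_root]
gpu_busy() {
  local root="${1:-/sys/class/drm}" f card
  for f in "$root"/card[0-9]*/device/gpu_busy_percent; do
    [ -r "$f" ] || continue
    card="${f#"$root"/}"   # e.g. card2/device/gpu_busy_percent
    card="${card%%/*}"     # e.g. card2
    printf '%s %s%%\n' "$card" "$(cat "$f")"
  done
}
```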

Other GPU uses

One downside to Polaris-generation AMD graphics cards is that ROCm dropped support for them years ago, so using the RX 460 for compute is a bit tricky.

With only 4 GB of VRAM and efficiency that's a few GPU generations behind, it's not that compelling for things like LLMs or model training anyway. But one could pursue smaller models or other compute uses as an academic exercise.

The much more enticing use is for transcoding.

Linux has decent support for GPU-accelerated video encode/decode through VA-API (Video Acceleration API).

And if I'm reading the specs correctly, the RX 460 should support up to 10-bit H.264 encode/decode at up to 4K... but I haven't gotten that working on my setup just yet. Checking with vainfo gives an error:

pi@pi5-pcie:~ $ DISPLAY=:0 vainfo
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/aarch64-linux-gnu/dri/radeonsi_drv_video.so
libva info: va_openDriver() returns -1
vaInitialize failed with error code -1 (unknown libva error),exit

So far, Coreforge has focused on memory alignment fixes rather than transcoding support, but on his machine, vainfo shows support for H.264, HEVC, VC1, and MPEG2 transcoding! At this point I'm trying to figure out what's different between our systems so I can get it working.
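One thing worth ruling out is vainfo probing the wrong device. A Pi 5 with an external card has more than one DRM render node (the Pi's own GPU plus the AMD card), and vainfo can be pointed at a node directly with --display drm --device instead of going through DISPLAY. This sketch runs it against each render node in turn. vainfo_all_nodes is a hypothetical helper.

```shell
# Hypothetical helper: run vainfo against every DRM render node, bypassing X.
# Usage: vainfo_all_nodes [dri_dir]
vainfo_all_nodes() {
  local dir="${1:-/dev/dri}" node
  for node in "$dir"/renderD*; do
    [ -e "$node" ] || continue   # no render nodes: the glob stays literal
    echo "== $node"
    vainfo --display drm --device "$node"
  done
}
```

Comparing this output between two machines shows whether each one's AMD node reports the same VA-API profiles.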

If you could grab a cheap old graphics card, drop it on a Pi 5, and transcode video with it, that would make for an even more compelling low-power, quiet Arm NAS running a completely open source stack like Jellyfin + SMB + OpenZFS.

What's left?

I've been running this configuration for a couple of days, and it's been perfectly stable. That said, some applications still run into memory alignment bugs, and the driver isn't fully compatible yet.

Chromium UI freeze with external AMD GPU

The Chromium browser interface sometimes freezes when running through the external graphics card, and I'm not quite sure what's causing that. The settings menu pops up and the title bar updates (e.g. if you type "Jeff Geerling" in the location bar and hit Enter), but nothing else in the window updates.

I installed Firefox (sudo apt install firefox), and it didn't have any issues. My best guess is that Chromium uses GPU acceleration for its UI by default and hits a driver bug in that mode.
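One way to test that theory is to launch Chromium with its --disable-gpu switch and see whether the freeze goes away. In this sketch, chromium_no_gpu is a hypothetical helper. It tries both chromium-browser and chromium, since the binary name has varied between releases, and it passes any extra arguments through.

```shell
# Hypothetical helper: start Chromium with GPU acceleration disabled.
# Usage: chromium_no_gpu [extra chromium arguments...]
chromium_no_gpu() {
  local bin
  for bin in chromium-browser chromium; do
    if command -v "$bin" >/dev/null 2>&1; then
      "$bin" --disable-gpu "$@"
      return
    fi
  done
  echo "Chromium not found" >&2
  return 1
}
```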

Outside of that, it would be nice to get the kernel's amdgpu driver working with all generations of AMD GPUs, so one could use newer cards or experiment with modern ROCm on a Pi 5.

The PCI Express Gen 3 x1 bus speed is a limiting factor, but there are plenty of use cases where that's enough bandwidth.

Besides, it's just fun to push hardware to its limits. I've certainly learned a lot about PCIe, arm64, the Linux kernel, and AMD's drivers already!