NUMA Emulation speeds up Pi 5 (and other improvements)

Recently an Igalia engineer posted a NUMA Emulation patch for the Pi 5 to the Linux Kernel mailing list. He said it could improve Geekbench 6 scores by up to 6% for single-core and 18% for multi-core.

My testing didn't quite match those numbers, but I did see a significant and consistent performance increase across both Geekbench 6:

Raspberry Pi 5 Geekbench 6 Score comparison with NUMA Emulation enabled

And High Performance Linpack:

Raspberry Pi 5 HPL Gigaflops and efficiency comparison with NUMA Emulation enabled

If you want to see all the gory details of my test process and setup (and how to replicate the results), check out the issue I posted to my top500 repository: Benchmark Raspberry Pi 5 Linux kernel NUMA patch.

Update August 2024: Until this is in Pi OS proper, you can install the patch by running sudo rpi-update pulls/6273 (see this issue). You can still follow the steps below if you like, but using rpi-update means you don't have to recompile the kernel :)

Evaluating the patch is a little involved (especially if you're not familiar with compiling the Linux kernel):

  1. Download the .mbox file for the kernel patch thread.
  2. Apply it to your raspberrypi/linux checkout with git am [filename.mbox]
  3. Rebuild the Linux kernel, ensuring NUMA Emulation is enabled in the kernel config.
  4. Add numa=fake=4 to /boot/firmware/cmdline.txt before the rootwait option, and reboot.
  5. Prefix any commands you want to test with numactl, e.g.: numactl --interleave=all ./geekbench6. (Install numactl with sudo apt install -y numactl.)
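Step 4 and the follow-up check can be scripted. This is only a rough sketch, not something from my test notes: add_numa_fake and count_numa_nodes are hypothetical helper names, and the sed expression assumes rootwait is on the single cmdline.txt line and is not the very first option. If rootwait is missing, the line comes back unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel command line with numa=fake=4 inserted
# right before the rootwait option. If a numa=fake= option is already
# present, the line is printed unchanged.
add_numa_fake() {
    case " $1 " in
        *" numa=fake="*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "$1" | sed 's/ rootwait/ numa=fake=4 rootwait/' ;;
    esac
}

# Hypothetical helper: count the nodeN directories the kernel exposes in
# sysfs. Prints 0 if the directory does not exist at all.
count_numa_nodes() {
    node_dir="${1:-/sys/devices/system/node}"
    ls -d "$node_dir"/node[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

# Preview the edited command line without touching the file:
#   add_numa_fake "$(cat /boot/firmware/cmdline.txt)"
# After rebooting, see how many NUMA nodes the kernel reports:
#   count_numa_nodes
```

If the emulation took effect, I would expect count_numa_nodes to report more than one node; a result of 0 or 1 suggests the patched kernel is not running or the cmdline option was not picked up.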

It remains to be seen whether the patch will be merged. Similar NUMA emulation already exists for x86, so there is precedent. Otherwise, Raspberry Pi could maintain the code in their own Linux fork, or perhaps pull some of the memory layout changes into firmware.

Pi 1, 3+ Efficiency gains via s2idle

Separately, Stefan Wahren posted a patch for the Raspberry Pi 1 B, 3 A+, and 3 B+, implementing support for suspend-to-idle (s2idle) on those models.

Suspend-to-idle is a lightweight sleep state a computer can employ to save a little juice while it's not doing much.

In the Pi's case, at least on the Pi 1 B, this results in roughly 20% power savings while idle:

  • running but CPU idle = 1.67 W
  • suspend to idle = 1.33 W
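
The relative saving follows directly from those two readings: a drop of 0.34 W from a 1.67 W baseline. Here is a quick sketch of the arithmetic with awk; power_savings_pct is just a hypothetical helper name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percentage saved when going from $1 watts (idle)
# down to $2 watts (suspended), relative to the idle figure.
power_savings_pct() {
    awk -v idle="$1" -v suspended="$2" \
        'BEGIN { printf "%.1f\n", (idle - suspended) / idle * 100 }'
}

power_savings_pct 1.67 1.33   # prints 20.4
```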

The patch can't yet reduce the USB bus power draw (due to this issue), but if that can be solved, there may be even more upside in the future.

No word on whether this patch will make it in, but it's being actively reviewed at the time of this writing.

A2 microSD card Command Queueing support (for 2-3x faster random access)

One thing that's actually implemented on the Pi 5 now—no need for a kernel patch review—is A2 microSD card Command Queueing.

To enable it on your Pi 5, make sure your system is fully updated, add dtparam=sd_cqe to /boot/firmware/config.txt, and reboot.
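
If you would rather script the config change, something like the following should do it. Treat it as a sketch: enable_sd_cqe is a hypothetical helper name, and it simply appends the line to the end of the file, which assumes the file ends with a newline and that the end of your config.txt is not inside a conditional section that excludes the Pi 5 (if it is, put the line under [all] by hand instead).

```shell
#!/bin/sh
# Hypothetical helper: append dtparam=sd_cqe to a config file unless an
# identical line is already there, so running it twice is harmless.
enable_sd_cqe() {
    config_file="${1:-/boot/firmware/config.txt}"
    if ! grep -qx 'dtparam=sd_cqe' "$config_file"; then
        printf '%s\n' 'dtparam=sd_cqe' >> "$config_file"
    fi
}

# On the Pi, run the function as root against the real file, then reboot.
# To try it safely first, point it at a copy:
#   cp /boot/firmware/config.txt /tmp/config.txt && enable_sd_cqe /tmp/config.txt
```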

If it's working, and you have an A2 card (most older cards are either A1 or not rated at all), then you should see something like the following in dmesg logs:

mmc0: Command Queue Engine enabled, 31 tags
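
A quick way to check for that message without scrolling through the whole log is to pipe dmesg through grep. This is a small sketch; cqe_enabled is a hypothetical helper name, and it only looks for the message text quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed only if it
# contains the Command Queue Engine message quoted above.
cqe_enabled() {
    grep -q 'Command Queue Engine enabled'
}

# Usage on the Pi (dmesg may need sudo on some systems):
#   dmesg | cqe_enabled && echo "CQE is on" || echo "CQE message not found"
```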

Check my full test results here, but here's a summary of my testing with both the Raspberry Pi Diagnostics tool:

Raspberry Pi 5 A2 Command Queueing performance comparison - IOPS

...and my own disk-benchmark.sh tool using iozone:

Raspberry Pi 5 A2 Command Queueing performance comparison - Data

I have a full video on my YouTube channel that goes over everything in more detail, including why I haven't been able to test the NUMA emulation (which aims to be generic for all Arm devices) on Rockchip RK3588 boards.

Comments

Hello there. I tried to apply the .mbox patch, and I also looked into the setup details you put in the GitHub issue you linked here. When I applied it, however, git gave me "empty patch" errors. I tried enabling NUMA emulation and compiling the kernel anyway, and added the lines to /boot/firmware/config.txt and /boot/firmware/cmdline.txt, but numactl still tells me that there is no NUMA emulation available. Did I miss some steps?