Real gaming router

Running GTA: Vice City on a TP-Link TL-WDR4900 wireless router

What is it?

A TP-Link wireless router, with an external AMD Radeon GPU connected via PCIe,
running Debian Linux and playing games:

What makes this router so special?

TP-LINK’s TL-WDR4900 v1 is a very interesting WiFi router:
Instead of the typical MIPS or ARM CPUs found in normal WiFi routers, the WDR4900 features a PowerPC-based CPU by NXP.

The NXP/Freescale QorIQ P1014 CPU used in the WDR4900 is a 32-bit PowerPC e500v2 processor.
These CPUs offer a full 36-bit physical address space, plenty of performance (for a router released in 2013) and excellent PCIe controllers.

They quickly gained popularity in the OpenWrt and Freifunk communities for being cheap routers with a lot of CPU performance. Both 2.4 GHz and 5 GHz WiFi chipsets (made by Qualcomm/Atheros) are connected to the CPU via PCIe.

PCIe problems on embedded systems

PCIe cards are mapped into the host CPU’s memory space transparently. The PCIe controller in the host CPU will then send all accesses to this region to the PCIe device responsible for that memory region.

Each PCIe card can have several such mappings, also called “BARs” (Base Address Registers). The maximum size for such mappings varies between different CPUs.

In the past, even the common Raspberry Pi CM4 could only allocate 64 MiB of its address space to graphics cards:
https://github.com/raspberrypi/linux/commit/54db4b2fa4d17251c2f6e639f849b27c3b553939
Many other devices (like MIPS-based router CPUs) are limited to only 32 MiB (or less).

Basically, all modern graphics cards require the host system to have at least 128 MiB of BAR space available for communication with their driver. Newer cards like Intel ARC even require “Resizable BAR”, a marketing term for very large, 64-bit memory regions: these cards map their entire VRAM (on the order of 12+ GiB) into the host’s memory space.
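
As a quick sanity check, the size of each memory region a card requests can be read straight from sysfs on any Linux machine (lspci -v shows the same information). This is a minimal sketch of ours, not from the original write-up; the PCI address in the path is just a placeholder:

/* bar_sizes.c - print the memory regions (BARs) a PCI device requests,
 * by reading the start/end addresses the kernel exposes in sysfs.
 * Minimal sketch; the device address below is a placeholder, adjust it
 * to a real device on your system. Build: gcc -o bar_sizes bar_sizes.c
 */
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* each line of the "resource" file is: <start> <end> <flags> */
    FILE *f = fopen("/sys/bus/pci/devices/0000:03:00.0/resource", "r");
    if (!f) { perror("fopen"); return 1; }

    uint64_t start, end, flags;
    int region = 0;
    while (fscanf(f, "%" SCNx64 " %" SCNx64 " %" SCNx64,
                  &start, &end, &flags) == 3) {
        if (end > start)
            printf("region %d: %" PRIu64 " bytes\n",
                   region, end - start + 1);
        region++;
    }
    fclose(f);
    return 0;
}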

Even with sufficient BAR space, PCIe device memory might not behave in the same way as regular memory does on an x86 CPU.
This caused numerous issues when people tried to attach GPUs to a Raspberry Pi.
Similar issues (regarding memory ordering, caching, nGnRE mappings and alignment) even affect large Arm64 server CPUs, resulting in crude kernel hacks and workarounds.

Retrofitting a miniPCIe slot

From the factory, the router didn’t provide any external PCIe connectivity. To connect a graphics card, a custom miniPCIe breakout PCB was designed and wired into the router with enameled copper wire:

The PCIe traces leading from the CPU to one of the Atheros chipsets were cut and redirected to the miniPCIe slot.

U-Boot reports PCIe2 being connected to an AMD Radeon HD 7470 graphics card:

U-Boot 2010.12-svn19826 (Apr 24 2013 - 20:01:21)

CPU:   P1014, Version: 1.0, (0x80f10110)
Core:  E500, Version: 5.1, (0x80212151)
Clock Configuration:
       CPU0:800  MHz,
       CCB:400  MHz,
       DDR:333.333 MHz (666.667 MT/s data rate) (Asynchronous), IFC:100  MHz
L1:    D-cache 32 kB enabled
       I-cache 32 kB enabled
Board: P1014RDB
SPI:   ready
DRAM:  128 MiB
L2:    256 KB enabled
Using default environment

PCIe1: Root Complex of mini PCIe Slot, x1, regs @ 0xffe0a000
  01:00.0     - 168c:abcd - Network controller
PCIe1: Bus 00 - 01
PCIe2: Root Complex of PCIe Slot, x1, regs @ 0xffe09000
  03:00.0     - 1002:6778 - Display controller
  03:00.1     - 1002:aa98 - Multimedia device
PCIe2: Bus 02 - 03
In:    serial
Out:   serial
Err:   serial
Net:   initialization for Atheros AR8327/AR8328
eTSEC1
auto update firmware: is_auto_upload_firmware = 0!
Autobooting in 1 seconds
=>

Installing Debian Linux

After installing OpenWrt on the router, we already had a working kernel and userland, but the OpenWrt userland is quite limited (BusyBox, musl libc, no graphics/game libraries, etc.).

The default OpenWrt kernel was also missing the AMD graphics drivers. This was solved by compiling a custom OpenWrt tree with additional kernel modules enabled. The resulting kernel was then loaded via TFTP directly from U-Boot:

setenv ipaddr 10.42.100.4
tftpboot 0x2000000 10.42.100.60:wdr4900-nfs-openwrt.bin
bootm 0x2000000

Luckily, Debian Linux used to have a special “PowerPCSPE” architecture/port, specifically for this type of CPU (e500/e500v2). On a system with statically compiled QEMU user binaries and properly set up binfmt handlers, we can use Debian’s debootstrap tool to create a bootable userland from the mirrors:

sudo QEMU_CPU=e500v2 debootstrap --exclude=usr-is-merged --arch=powerpcspe --keyring ~/gamingrouter/debian-ports-archive-keyring-removed.gpg unstable "$TARGET" https://snapshot.debian.org/archive/debian-ports/20190518T205337Z/

debootstrap will chroot into the newly created root filesystem and simply execute binaries (post-install hooks, etc.). This works because qemu-user-static transparently handles the execution of the PowerPCSPE binaries on the amd64 host machine. The additional QEMU_CPU=e500v2 environment variable tells QEMU which CPU to emulate.

amdgpu (modern AMD) GPU

Our first experiments were done with an AMD Radeon RX 570 GPU, using the modern amdgpu graphics driver. This resulted in very weird artifacts and no (real) image right away:

After some troubleshooting, and finally installing a 32-bit x86 (i386) Linux on a different computer, we noticed that the same issue was also present on other 32-bit platforms, even regular Intel PCs. amdgpu seems to have some sort of incompatibility with 32-bit platforms.

We opened an issue for this bug, but there hasn’t been any progress so far:
https://gitlab.freedesktop.org/drm/amd/-/issues/1931

radeon (legacy AMD) GPU

With an AMD Radeon HD 7470 card, using the older radeon driver instead, things started working:

Big endian troubles

reVC (a reverse-engineered version of GTA: Vice City, with publicly available source code) was compiled for the platform. This required custom builds of premake, glfw3, glew and reVC itself.

root@gaming-router:/home/user/GTAVC# ./reVC
Segmentation fault

Oops :)
More work was required. It turns out that the game and the rendering engine (at least in the decompiled version) aren’t big-endian aware at all. Loading the game’s assets reads structs (containing offsets, sizes, counts, coordinates, etc.) directly into memory, so those structs end up holding little-endian data on a big-endian platform. This causes the game to access memory at absurd offsets and crash almost immediately.
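
To make the failure mode concrete, here is a tiny demo of ours (not part of the reVC code): the same four bytes that encode a small little-endian offset decode to an enormous value when interpreted natively on a big-endian CPU.

/* endian_demo.c - why raw little-endian file data breaks on a big-endian host.
 * Hypothetical illustration, not taken from the reVC sources.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* four bytes as they appear in an asset file: offset 0x154 stored little-endian */
    const unsigned char file_bytes[4] = { 0x54, 0x01, 0x00, 0x00 };

    uint32_t offset;
    memcpy(&offset, file_bytes, sizeof(offset));   /* what the game does: a raw struct read */

    /* little-endian host: 0x00000154 (340)
     * big-endian host:    0x54010000 (1409351680) -> an "absurd offset", instant crash */
    printf("offset as read: 0x%08x (%u)\n", (unsigned)offset, (unsigned)offset);
    return 0;
}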

We spent several days patching the game and the librw rendering engine to work correctly on big-endian machines. There were 100+ places in the source code that needed to be patched; most of the patches looked similar to these:

@@ -118,6 +136,7 @@ RwTexDictionaryGtaStreamRead1(RwStream *stream)
  assert(size == 4);
  if(RwStreamRead(stream, &numTextures, size) != size)
    return nil;
+  numTextures = le32toh(numTextures);

  texDict = RwTexDictionaryCreate();
  if(texDict == nil)
@@ -458,8 +477,8 @@ CreateTxdImageForVideoCard()
          RwStreamWrite(img, buf, num);
        }

-       dirInfo.offset = pos / CDSTREAM_SECTOR_SIZE;
-       dirInfo.size = size;
+       dirInfo.offset = htole32(pos / CDSTREAM_SECTOR_SIZE);
+       dirInfo.size = htole32(size);
        strncpy(dirInfo.name, filename, sizeof(dirInfo.name));
        pDir->AddItem(dirInfo);
        CStreaming::RemoveTxd(i);

After the game loaded some asset data using RwStreamRead(), the data in those structs needed to be converted from little-endian to the host endianness.
Things like save games, settings, etc. needed the reverse conversion, so that they are always saved as little-endian.
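
The general pattern looks roughly like this; a condensed sketch with a made-up header struct (the field names are illustrative only, not the actual reVC data structures):

/* Endianness-safe load/save pattern, sketched with a hypothetical header. */
#include <endian.h>    /* le32toh / htole32 (glibc, musl) */
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct asset_header {          /* on-disk layout: always little-endian */
    uint32_t num_entries;
    uint32_t data_offset;
    uint32_t data_size;
};

/* after a raw RwStreamRead()-style read: convert fields to host byte order */
static void header_from_disk(struct asset_header *h)
{
    h->num_entries = le32toh(h->num_entries);
    h->data_offset = le32toh(h->data_offset);
    h->data_size   = le32toh(h->data_size);
}

/* before writing save games, generated txd images, ...: convert back */
static void header_to_disk(struct asset_header *h)
{
    h->num_entries = htole32(h->num_entries);
    h->data_offset = htole32(h->data_offset);
    h->data_size   = htole32(h->data_size);
}

int main(void)
{
    /* pretend these 12 bytes were just read from an asset file */
    const unsigned char raw[12] = { 3,0,0,0,  0x00,0x08,0,0,  0x54,0x01,0,0 };
    struct asset_header h;
    memcpy(&h, raw, sizeof(h));

    header_from_disk(&h);
    printf("%" PRIu32 " entries, %" PRIu32 " bytes at offset %" PRIu32 "\n",
           h.num_entries, h.data_size, h.data_offset);

    header_to_disk(&h);          /* back to little-endian before writing out */
    return 0;
}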

We could now actually load the game, see the world and drive around in a car. Whenever a person/character was displayed, though, very strange graphical glitching occurred:

Player model glitching

When all players/NPCs were disabled, there were no glitches visible. Everything worked fine and the game was playable (as playable as it is without NPCs…).
We spent several days trying to find the bug in our code. Surely, we must’ve made some mistake when implementing big-endian support. All applicable variables, coordinates, vertices and transforms were dumped as numbers and compared with a little-endian version of the game.

Everything looked perfectly correct and we couldn’t find any further issues.
The project was stuck at this point for several months.

Wii U port

We found another port of reVC online: the Wii U port. The Wii U uses an IBM Espresso CPU, which is a PowerPC-based processor just like ours, and it also runs in big-endian mode.

We contacted Gary, the author of this Wii U port, and asked very, very nicely whether we could take a look at the big-endian-patched source code. Thanks again!

After transplanting Gary’s patches back into the regular reVC codebase (leaving all the Wii U-specific changes behind), we were able to run reVC on the TP-Link with Gary’s known-good patches…

The exact same graphical corruption happened. Huh?!

At this point, we were looking in all directions and questioning the sanity of each and every part of the system. Kernel, GPU drivers, compilers and libraries were all suspects.

PowerPC SPE isn’t a very common architecture (support for it was even removed in GCC 9), and it has very unusual floating-point extensions (different from regular PowerPC CPUs).
Disabling SPE (-mno-spe), switching to soft-float, and switching the compilation target to e500, e500v2, etc. didn’t change anything.

i386 Test

To prove we didn’t break the code, we connected the same GPU to an x86 machine (a trusty ThinkPad T430, via ExpressCard/34). We installed the same version of Debian 10, the same libraries, the same radeon driver and the same firmware, and compiled the same reVC source code for i386.

The game worked perfectly, with no corruption whatsoever.

Modern LLVM kernel

At this point, we wanted to try a newer kernel (with newer radeon drivers). GCC dropped support for PowerPC SPE, and building a modern Linux 6.7 with GCC 8 doesn’t work. LLVM/clang had just gained PowerPC SPE support though, and Linux can also be built with clang.

make LLVM=1 ARCH=powerpc OBJCOPY="~/binutils-2.42/build/binutils/objcopy" all -j 40 V=1
mkimage -C none -a 0x1200000 -e 0x1200000 -A powerpc -d arch/powerpc/boot/simpleImage.tl-wdr4900-v1 uImage12-nvme

We needed to provide our own (PowerPC-capable) version of binutils/objcopy and ld.

The other changes required to target the TP-Link WDR4900 with a mainline kernel were pretty minor:

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee202..5ce3eeb09 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -181,6 +181,7 @@ src-plat-$(CONFIG_PPC_PSERIES) += pseries-head.S
 src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
 src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
 src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
+src-plat-$(CONFIG_TL_WDR4900_V1) += simpleboot.c fixed-head.S

 src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c

@@ -351,7 +352,7 @@ image-$(CONFIG_TQM8548)                     += cuImage.tqm8548
 image-$(CONFIG_TQM8555)                        += cuImage.tqm8555
 image-$(CONFIG_TQM8560)                        += cuImage.tqm8560
 image-$(CONFIG_KSI8560)                        += cuImage.ksi8560
-
+image-$(CONFIG_TL_WDR4900_V1)          += simpleImage.tl-wdr4900-v1
 # Board ports in arch/powerpc/platform/86xx/Kconfig
 image-$(CONFIG_MVME7100)                += dtbImage.mvme7100

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 352d7de24..414216454 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -345,6 +345,11 @@ adder875-redboot)
     platformo="$object/fixed-head.o $object/redboot-8xx.o"
     binary=y
     ;;
+simpleboot-tl-wdr4900-v1)
+    platformo="$object/fixed-head.o $object/simpleboot.o"
+    link_address='0x1000000'
+    binary=y
+    ;;
 simpleboot-*)
     platformo="$object/fixed-head.o $object/simpleboot.o"
     binary=y
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 39724ff5a..80da35f85 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -968,7 +968,7 @@ _GLOBAL(__setup_ehv_ivors)
 _GLOBAL(__giveup_spe)
        addi    r3,r3,THREAD            /* want THREAD of task */
        lwz     r5,PT_REGS(r3)
-       cmpi    0,r5,0
+       PPC_LCMPI       0,r5,0
        SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
        evxor   evr6, evr6, evr6        /* clear out evr6 */
        evmwumiaa evr6, evr6, evr6      /* evr6 <- ACC = 0 * 0 + ACC */
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 9315a3b69..86ba4b5e4 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -176,6 +176,18 @@ config STX_GP3
        select CPM2
        select DEFAULT_UIMAGE

+config TL_WDR4900_V1
+    bool "TP-Link TL-WDR4900 v1"
+    select DEFAULT_UIMAGE
+    select ARCH_REQUIRE_GPIOLIB
+    select GPIO_MPC8XXX
+    select SWIOTLB
+    help
+      This option enables support for the TP-Link TL-WDR4900 v1 board.
+
+      This board is a Concurrent Dual-Band wireless router with a
+      Freescale P1014 SoC.
+
 config TQM8540
        bool "TQ Components TQM8540"
        help
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 43c34f26f..55268278d 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
 obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
 obj-$(CONFIG_FB_FSL_DIU)       += t1042rdb_diu.o
 obj-$(CONFIG_STX_GP3)    += stx_gp3.o
+obj-$(CONFIG_TL_WDR4900_V1) += tl_wdr4900_v1.o
 obj-$(CONFIG_TQM85xx)    += tqm85xx.o
 obj-$(CONFIG_PPA8548)     += ppa8548.o
 obj-$(CONFIG_SOCRATES)    += socrates.o socrates_fpga_pic.o
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2..21bc5f06b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -272,7 +272,7 @@ config TARGET_CPU
        default "e300c2" if E300C2_CPU
        default "e300c3" if E300C3_CPU
        default "G4" if G4_CPU
-       default "8540" if E500_CPU
+       default "8548" if E500_CPU
        default "e500mc" if E500MC_CPU
        default "powerpc" if POWERPC_CPU

This resulted in a bootable kernel. No change in the graphical corruption. It was very nice to get rid of the OpenWrt toolchain entirely, though.

qemu-user-static with llvmpipe

To make debugging a bit easier, we copied the root filesystem to a local amd64 machine (with qemu-user-static again) and configured an X server to run on a dummy/virtual monitor. This was then combined with x11vnc to view the dummy monitor.

Section "Device"
    Identifier  "Configured Video Device"
    Driver      "dummy"
    VideoRam    256000
EndSection

Section "Monitor"
    Identifier  "Configured Monitor"
    HorizSync   60.0 - 1000.0
    VertRefresh 60.0 - 200.0
    ModeLine    "640x480"   23.75  640 664 720 800  480 483 487 500 -hsync +vsync
              # "1920x1080" 148.50 1920 2448 2492 2640 1080 1084 1089 1125 +Hsync +Vsync
EndSection

Section "Screen"
    Identifier  "Default Screen"
    Monitor     "Configured Monitor"
    Device      "Configured Video Device"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "640x480"
    EndSubSection
EndSection

Inside the chroot (with QEMU_CPU set to e500v2), we ran Xorg, x11vnc and finally reVC:

export LIBGL_ALWAYS_SOFTWARE=true
export GALLIUM_DRIVER=llvmpipe
export DISPLAY=:2

Xorg -config /etc/xorg.conf :2 &
x11vnc -display :2 &
xrandr --output default --mode "800x600"
/home/user/GTAVC/reVC

… while this was absurdly slow (1 frame every ~20 seconds), it worked. It even worked with player models, without any graphical corruption. The main differences were:

  • QEMU emulated CPU instead of real hardware
  • llvmpipe instead of radeon / r600

We then set GALLIUM_DRIVER=llvmpipe on the real hardware. This resulted in even worse performance (about 1 frame every minute!), but it worked!
No graphical corruption was visible (after waiting almost an hour to get in-game…).

mesa update

We then set out to update mesa on the router. This required a number of dependencies to be updated as well: cmake, libglvnd, meson, drm and finally mesa were all built from scratch, either directly from git or from the latest release.

After installing the new libglvnd, drm and mesa, player rendering started to work fine on real hardware (with acceleration!). The exact cause (and the library at fault) is still unknown, but we were more than happy to have this finally resolved.

Result