Real gaming router
Update, 18th March 2024
Read Part 2 / Router overclocking for an update on performance.
What is it?
A TP-Link wireless router, with an external AMD Radeon GPU connected via PCIe,
running Debian Linux and playing games:
What makes this router so special?
TP-LINK’s TL-WDR4900 v1 is a very interesting WiFi router:
Instead of the typical MIPS or ARM CPUs found in normal WiFi routers, the WDR4900 features a PowerPC-based CPU by NXP.
The NXP/Freescale QorIQ P1014 CPU used in the WDR4900 is a 32-bit PowerPC e500v2 processor.
These CPUs offer a full 36-bit address space, a lot of performance (for a router released in 2013), and excellent PCIe controllers.
They quickly gained popularity in the OpenWrt and Freifunk communities for being cheap routers with a lot of CPU performance. Both 2.4 GHz and 5 GHz WiFi chipsets (made by Qualcomm/Atheros) are connected to the CPU via PCIe.
PCIe problems on embedded systems
PCIe cards are mapped transparently into the host CPU’s memory space. The host CPU’s PCIe controller then forwards all accesses within such a mapped region to the PCIe device responsible for it.
Each PCIe card can have several such mappings, also called “BARs” (Base Address Registers). The maximum size for such mappings varies between different CPUs.
In the past, even the common Raspberry Pi CM4 could only allocate 64 MiB of its address space to graphics cards:
https://github.com/raspberrypi/linux/commit/54db4b2fa4d17251c2f6e639f849b27c3b553939
Many other devices (like MIPS-based router CPUs) are limited to only 32 MiB (or less).
Basically, all modern graphics cards require the host system to have at least 128 MiB of BAR space available for communication with their driver. Newer cards like Intel ARC even require “Resizable BAR”, a marketing term for very large, 64-bit memory regions. These cards will map their entire VRAM (on the order of 12+ GiB) into the host’s memory space.
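As a rough, illustrative sketch (not code from this project: the driver and function names and the 128 MiB check are made up, while the PCI helpers are the standard Linux kernel API, and the 1002:6778 ID is simply the Radeon seen in the U-Boot log further down), this is roughly where a host's limited BAR window becomes a hard constraint for a driver:

/*
 * Hedged sketch only: inspect and map BAR0 of a PCI device from a
 * Linux kernel module.
 */
#include <linux/module.h>
#include <linux/pci.h>

#define MIN_BAR_SIZE (128UL * 1024 * 1024)	/* 128 MiB, needed by modern GPUs */

static int bardemo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	resource_size_t start = pci_resource_start(pdev, 0);
	resource_size_t len   = pci_resource_len(pdev, 0);
	void __iomem *regs;
	int ret;

	dev_info(&pdev->dev, "BAR0 start %pa, length %pa\n", &start, &len);

	/* If the host bridge cannot provide a large enough window, give up. */
	if (len < MIN_BAR_SIZE)
		return -ENODEV;

	ret = pcim_enable_device(pdev);
	if (ret)
		return ret;

	/* Map the whole BAR into kernel virtual address space. */
	regs = pcim_iomap(pdev, 0, 0);
	if (!regs)
		return -ENOMEM;

	return 0;
}

static const struct pci_device_id bardemo_ids[] = {
	{ PCI_DEVICE(0x1002, 0x6778) },	/* e.g. the Radeon HD 7470 used later */
	{ }
};
MODULE_DEVICE_TABLE(pci, bardemo_ids);

static struct pci_driver bardemo_driver = {
	.name     = "bar-demo",
	.id_table = bardemo_ids,
	.probe    = bardemo_probe,
};
module_pci_driver(bardemo_driver);
MODULE_LICENSE("GPL");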
Even with sufficient BAR space, PCIe device memory might not behave the same way regular memory does (as it would on an x86 CPU). This caused numerous issues when people tried to attach GPUs to a Raspberry Pi. Similar issues (regarding memory ordering, caching, nGnRE mappings and alignment) even affect large Arm64 server CPUs, resulting in crude kernel hacks and workarounds.
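As a small illustration of the ordering problem (a sketch under assumptions, not code from this project: the register offsets are invented, and regs is assumed to be I/O memory obtained via pcim_iomap() as in the sketch above):

#include <linux/io.h>
#include <linux/types.h>

#define REG_DOORBELL 0x10	/* invented register offsets */
#define REG_STATUS   0x14

static void kick_device(void __iomem *regs, u32 value)
{
	/*
	 * Risky on weakly-ordered CPUs (PowerPC, Arm): a plain store to
	 * device memory may be reordered relative to earlier memory writes,
	 * so the device can see the doorbell before the data is in place:
	 *
	 *	*(volatile u32 *)(regs + REG_DOORBELL) = value;
	 */

	/*
	 * The kernel's MMIO accessors insert the required barriers
	 * (sync/eieio on PowerPC, dmb/dsb on Arm) and handle byte order.
	 */
	iowrite32(value, regs + REG_DOORBELL);

	/* Ordered read-back of a status register. */
	(void)ioread32(regs + REG_STATUS);
}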
Retrofitting a miniPCIe slot
From the factory, the router doesn’t provide any external PCIe connectivity. To connect a graphics card, a custom miniPCIe breakout PCB was designed and wired into the router with enameled copper wire:
The PCIe traces leading from the CPU to one of the Atheros chipsets were cut and redirected to the miniPCIe slot.
U-Boot reports PCIe2 as connected to an AMD Radeon HD 7470 graphics card:
U-Boot 2010.12-svn19826 (Apr 24 2013 - 20:01:21)
CPU: P1014, Version: 1.0, (0x80f10110)
Core: E500, Version: 5.1, (0x80212151)
Clock Configuration:
CPU0:800 MHz,
CCB:400 MHz,
DDR:333.333 MHz (666.667 MT/s data rate) (Asynchronous), IFC:100 MHz
L1: D-cache 32 kB enabled
I-cache 32 kB enabled
Board: P1014RDB
SPI: ready
DRAM: 128 MiB
L2: 256 KB enabled
Using default environment
PCIe1: Root Complex of mini PCIe Slot, x1, regs @ 0xffe0a000
01:00.0 - 168c:abcd - Network controller
PCIe1: Bus 00 - 01
PCIe2: Root Complex of PCIe Slot, x1, regs @ 0xffe09000
03:00.0 - 1002:6778 - Display controller
03:00.1 - 1002:aa98 - Multimedia device
PCIe2: Bus 02 - 03
In: serial
Out: serial
Err: serial
Net: initialization for Atheros AR8327/AR8328
eTSEC1
auto update firmware: is_auto_upload_firmware = 0!
Autobooting in 1 seconds
=>
Installing Debian Linux
After installing OpenWrt on the router, we already had a working kernel and userland, but the OpenWrt userland is quite limiting (busybox, musl libc, no graphics/games libraries, etc.).
We were also missing the AMD graphics drivers in the default OpenWrt kernel. The driver situation was solved by compiling a custom OpenWrt tree with additional kernel modules enabled. This kernel was then loaded via TFTP directly from U-Boot:
setenv ipaddr 10.42.100.4
tftpboot 0x2000000 10.42.100.60:wdr4900-nfs-openwrt.bin
bootm 0x2000000
Luckily, Debian Linux used to have a special “PowerPCSPE” architecture/port, specifically for this type of CPU (e500/e500v2). On a system with statically compiled QEMU user binaries and properly set up binfmt handlers, we can use Debian’s debootstrap tool to create a bootable userland from the mirrors:
sudo QEMU_CPU=e500v2 debootstrap --exclude=usr-is-merged --arch=powerpcspe --keyring ~/gamingrouter/debian-ports-archive-keyring-removed.gpg unstable "$TARGET" https://snapshot.debian.org/archive/debian-ports/20190518T205337Z/
debootstrap will chroot into the newly created root filesystem and just execute binaries (post-install hooks, etc.). This is done by qemu-user-static transparently handling the execution of the PowerPCSPE binaries on this amd64 host machine. The additional QEMU_CPU=e500v2 environment variable tells QEMU which CPU to emulate.
amdgpu (modern AMD) GPU
Our first experiments were done with an AMD Radeon RX570 GPU, using the modern amdgpu graphics driver.
This resulted in very weird artifacts and no (real) image right away:
After doing some troubleshooting and finally installing a 32-bit x86 (i386) Linux on a different computer, we noticed that this same issue was also present on any other 32-bit platform, even regular Intel PCs. amdgpu seems to have some sort of incompatibility on 32-bit platforms.
We opened an issue for this bug, but there hasn’t been any progress so far:
https://gitlab.freedesktop.org/drm/amd/-/issues/1931
radeon (legacy AMD) GPU
With an AMD Radeon HD 7470 card, using the older radeon driver instead, things started working:
Big endian troubles
reVC (a reverse-engineered version of GTA Vice City, with the source code publicly available) was compiled for the platform. This required custom builds of premake, glfw3, glew and reVC itself.
root@gaming-router:/home/user/GTAVC# ./reVC
Segmentation fault
Oops :)
More work is required. It turns out that the game and the rendering engine (at least in the decompiled version) aren’t big-endian aware at all.
Loading the game’s assets loads structs (containing offsets, sizes, numbers, coordinates, etc.) directly into memory.
These structs then contain little-endian data on a big-endian platform. This causes the game to try to access memory at absurd offsets and crash pretty much immediately.
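As a minimal sketch of the failure mode (the struct below is hypothetical, not one of reVC’s actual structures): reading a little-endian on-disk header directly into memory on the big-endian e500 yields garbage offsets unless every field is converted.

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk asset header, always stored little-endian. */
struct asset_header {
	uint32_t offset;	/* file offset of the payload */
	uint32_t size;		/* payload size in bytes */
};

static int read_header(FILE *f, struct asset_header *hdr)
{
	if (fread(hdr, sizeof(*hdr), 1, f) != 1)
		return -1;

	/*
	 * Without these conversions, a big-endian CPU reads the on-disk
	 * bytes for 0x00001000 as 0x00100000 and then seeks to a bogus
	 * position -- the "absurd offsets" described above.
	 */
	hdr->offset = le32toh(hdr->offset);
	hdr->size   = le32toh(hdr->size);
	return 0;
}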
We spent several days patching the game and the librw rendering engine to work correctly on big-endian machines.
There were 100+ places in the source code that needed to be patched; most of the patches looked similar to these:
@@ -118,6 +136,7 @@ RwTexDictionaryGtaStreamRead1(RwStream *stream)
assert(size == 4);
if(RwStreamRead(stream, &numTextures, size) != size)
return nil;
+ numTextures = le32toh(numTextures);
texDict = RwTexDictionaryCreate();
if(texDict == nil)
@@ -458,8 +477,8 @@ CreateTxdImageForVideoCard()
RwStreamWrite(img, buf, num);
}
- dirInfo.offset = pos / CDSTREAM_SECTOR_SIZE;
- dirInfo.size = size;
+ dirInfo.offset = htole32(pos / CDSTREAM_SECTOR_SIZE);
+ dirInfo.size = htole32(size);
strncpy(dirInfo.name, filename, sizeof(dirInfo.name));
pDir->AddItem(dirInfo);
CStreaming::RemoveTxd(i);
After the game loaded some asset data using RwStreamRead(), the data loaded into the structs needed to be converted from little-endian to our host endianness.
Things like save games, settings, etc. needed the reverse mechanism, so they are always saved in little-endian instead.
We could now actually load the game, see the world, drive around in a car. Whenever a person/character was displayed, though, very strange graphical glitching would occur:
Player model glitching
Warning: Flashing Imagery
The following video contains flickering pictures/glitches. If you are at risk of a seizure due to photosensitive epilepsy or other conditions, please don’t watch the following video clip.
When all players/NPCs were disabled, there were no glitches visible. Everything worked fine, and the game was playable (as playable as it is without NPCs…).
We spent several days trying to find the bug in our code. Surely, we must’ve made some mistake when implementing big-endian support.
All applicable variables, coordinates, vertices and transforms were dumped as numbers and compared with a little-endian version of the game.
Everything looked perfectly correct and we couldn’t find any further issues.
The project was stuck at this point for several months.
Wii U port
We found another port of reVC online: the Wii U port. The Wii U uses an IBM Espresso CPU, which is a PowerPC-based processor, just like ours. It also runs in big-endian mode.
We contacted Gary, the author of this Wii U port, and asked very, very nicely if we could take a look at their big-endian-patched source code. Thanks again!
After transplanting Gary’s patches back into the regular reVC codebase (leaving all the Wii U-specific changes behind), we were able to run reVC on the TP-Link with Gary’s known-good patches…
The exact same graphical corruption happened. Huh?!
At this point, we were looking in all directions and questioning the sanity of each and every part of the system. Kernel, GPU drivers, compilers and libraries were all suspects.
PowerPC SPE isn’t a very common architecture (support for it was even removed in GCC 9), and it has very unusual floating-point extensions (different from regular PowerPC CPUs).
Disabling SPE (-mno-spe), switching to soft-float, switching the compilation targets to e500, e500v2, etc. didn’t change anything.
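For what it’s worth, a tiny compile-time check like the following could confirm which floating-point variant a given build actually targets (assuming the compiler predefines __SPE__ for the e500 SPE ABI, which is our understanding of GCC’s behaviour on this target):

#include <stdio.h>

int main(void)
{
#if defined(__SPE__)
	/* e500 SPE build: floats are handled by SPE instructions on the GPRs */
	puts("e500 SPE build");
#else
	/* no SPE: classic FPU or soft-float code generation */
	puts("no SPE");
#endif
	return 0;
}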
i386 Test
To prove we didn’t break the code, we connected the same GPU to an x86 machine (a trusty ThinkPad T430, via ExpressCard/34). We installed the same version of Debian 10, the same libraries, the same radeon driver and firmware, and compiled the same reVC source code for i386.
The game worked perfectly, with no corruption whatsoever.
Modern LLVM kernel
At this point, we wanted to try a newer kernel (with newer radeon drivers). GCC dropped support for PowerPC SPE, and building a modern Linux 6.7 with GCC 8 doesn’t work. LLVM/clang has recently gained PowerPC SPE support, though, and Linux can also be built with clang.
make LLVM=1 ARCH=powerpc OBJCOPY="~/binutils-2.42/build/binutils/objcopy" all -j 40 V=1
mkimage -C none -a 0x1200000 -e 0x1200000 -A powerpc -d arch/powerpc/boot/simpleImage.tl-wdr4900-v1 uImage12-nvme
We needed to provide our own (PowerPC-capable) version of binutils/objcopy and ld.
The other changes required to target the TP-Link WDR4900 with a mainline kernel were pretty minor:
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 968aee202..5ce3eeb09 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -181,6 +181,7 @@ src-plat-$(CONFIG_PPC_PSERIES) += pseries-head.S
src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
+src-plat-$(CONFIG_TL_WDR4900_V1) += simpleboot.c fixed-head.S
src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c
@@ -351,7 +352,7 @@ image-$(CONFIG_TQM8548) += cuImage.tqm8548
image-$(CONFIG_TQM8555) += cuImage.tqm8555
image-$(CONFIG_TQM8560) += cuImage.tqm8560
image-$(CONFIG_KSI8560) += cuImage.ksi8560
-
+image-$(CONFIG_TL_WDR4900_V1) += simpleImage.tl-wdr4900-v1
# Board ports in arch/powerpc/platform/86xx/Kconfig
image-$(CONFIG_MVME7100) += dtbImage.mvme7100
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 352d7de24..414216454 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -345,6 +345,11 @@ adder875-redboot)
platformo="$object/fixed-head.o $object/redboot-8xx.o"
binary=y
;;
+simpleboot-tl-wdr4900-v1)
+ platformo="$object/fixed-head.o $object/simpleboot.o"
+ link_address='0x1000000'
+ binary=y
+ ;;
simpleboot-*)
platformo="$object/fixed-head.o $object/simpleboot.o"
binary=y
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 39724ff5a..80da35f85 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -968,7 +968,7 @@ _GLOBAL(__setup_ehv_ivors)
_GLOBAL(__giveup_spe)
addi r3,r3,THREAD /* want THREAD of task */
lwz r5,PT_REGS(r3)
- cmpi 0,r5,0
+ PPC_LCMPI 0,r5,0
SAVE_32EVRS(0, r4, r3, THREAD_EVR0)
evxor evr6, evr6, evr6 /* clear out evr6 */
evmwumiaa evr6, evr6, evr6 /* evr6 <- ACC = 0 * 0 + ACC */
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index 9315a3b69..86ba4b5e4 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -176,6 +176,18 @@ config STX_GP3
select CPM2
select DEFAULT_UIMAGE
+config TL_WDR4900_V1
+ bool "TP-Link TL-WDR4900 v1"
+ select DEFAULT_UIMAGE
+ select ARCH_REQUIRE_GPIOLIB
+ select GPIO_MPC8XXX
+ select SWIOTLB
+ help
+ This option enables support for the TP-Link TL-WDR4900 v1 board.
+
+ This board is a Concurrent Dual-Band wireless router with a
+ Freescale P1014 SoC.
+
config TQM8540
bool "TQ Components TQM8540"
help
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 43c34f26f..55268278d 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_TWR_P102x) += twr_p102x.o
obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o
obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o
obj-$(CONFIG_STX_GP3) += stx_gp3.o
+obj-$(CONFIG_TL_WDR4900_V1) += tl_wdr4900_v1.o
obj-$(CONFIG_TQM85xx) += tqm85xx.o
obj-$(CONFIG_PPA8548) += ppa8548.o
obj-$(CONFIG_SOCRATES) += socrates.o socrates_fpga_pic.o
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index b2d8c0da2..21bc5f06b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -272,7 +272,7 @@ config TARGET_CPU
default "e300c2" if E300C2_CPU
default "e300c3" if E300C3_CPU
default "G4" if G4_CPU
- default "8540" if E500_CPU
+ default "8548" if E500_CPU
default "e500mc" if E500MC_CPU
default "powerpc" if POWERPC_CPU
This resulted in a bootable kernel, but no change in the graphical corruption. It was very nice to get rid of the OpenWrt toolchain entirely, though.
qemu-user-static with llvmpipe
To make debugging a bit easier, we copied the root filesystem to a local amd64 machine (with qemu-user-static again) and configured an X server to run on a dummy/virtual monitor. This was then combined with x11vnc to view the dummy monitor.
Section "Device"
Identifier "Configured Video Device"
Driver "dummy"
VideoRam 256000
EndSection
Section "Monitor"
Identifier "Configured Monitor"
HorizSync 60.0 - 1000.0
VertRefresh 60.0 - 200.0
ModeLine "640x480" 23.75 640 664 720 800 480 483 487 500 -hsync +vsync
# "1920x1080" 148.50 1920 2448 2492 2640 1080 1084 1089 1125 +Hsync +Vsync
EndSection
Section "Screen"
Identifier "Default Screen"
Monitor "Configured Monitor"
Device "Configured Video Device"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "640x480"
EndSubSection
EndSection
Inside the chroot (with QEMU_CPU set to e500v2), we ran Xorg, x11vnc and finally reVC:
export LIBGL_ALWAYS_SOFTWARE=true
export GALLIUM_DRIVER=llvmpipe
export DISPLAY=:2
Xorg -config /etc/xorg.conf :2 &
x11vnc -display :2 &
xrandr --output default --mode "800x600"
/home/user/GTAVC/reVC
… while this was absurdly slow (1 frame every ~20 seconds), it worked. It even worked with player models, without any graphical corruption. The main differences were:
- QEMU emulated CPU instead of real hardware
- llvmpipe instead of radeon / r600
We then set GALLIUM_DRIVER=llvmpipe on the real hardware. This resulted in even worse performance (about 1 frame every minute!), but it worked!
No graphical corruption visible (after waiting almost an hour to get in game…).
mesa update
We then set out to update mesa on the router. This required a number of dependencies to be updated as well: cmake, libglvnd, meson, drm and finally mesa were all built from scratch, either directly from git or from the latest release.
After installing the new libglvnd, drm and mesa, player rendering started to work fine on real hardware (with acceleration!). The exact issue (and the library at fault) is still unknown, but we were more than happy to have finally resolved it.