Reviving a dead Gigabyte MJ11-EC1 mainboard

Diagnosis and Fix of a dead AMD EPYC Embedded server mainboard

Used Gigabyte MJ11-EC1 mainboards are available for low prices on eBay and from other hardware vendors.
After buying a couple of MJ11-EC1 boards for a homelab cluster project, I ended up with a dead board.
The server would turn on and the BMC was reachable and usable, but the server itself would never POST.
There were no POST codes, no beeps and no signs of life whatsoever.

The memory tested OK, but I swapped in a different DIMM anyway, with no change. CPU Vcore was present and stable.
Reflashing the BIOS image via the BMC didn’t change the behaviour either.

After logging in to the BMC via SSH and manually imaging the BIOS flash chip (via the /dev/mtd7 mapping), I noticed the checksum didn’t match the image that had just been written to the flash. After re-flashing and imaging the flash a couple of times, I noticed that the same bits always read back as 0 in the BIOS image. This pointed to a defective flash IC.
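The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the exact procedure used here: it assumes the image that was flashed and the dump read back from the chip have both been copied to a workstation as files, and the function names `sha256_of` and `bits_lost` are made up for this example.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def bits_lost(expected: bytes, readback: bytes) -> list[tuple[int, int]]:
    """List (offset, mask) pairs where a bit is 1 in `expected` but 0 in `readback`."""
    if len(expected) != len(readback):
        raise ValueError("images differ in length")
    result = []
    for offset, (want, got) in enumerate(zip(expected, readback)):
        # Bits that should be set but came back cleared.
        mask = want & ~got & 0xFF
        if mask:
            result.append((offset, mask))
    return result
```

Running `bits_lost` on the dumps from several flash-and-read cycles and seeing the same offsets and masks every time is what separates a defective cell from a one-off write error. The hash only says that the two files differ, while the mask says which bits were lost.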

The original BIOS flash chip was then desoldered from the mainboard:
Close-up of a server mainboard, with MX25L12873F IC

and plugged into a MiniPro TL866 EEPROM programmer, erased completely and read back:

Screenshot of MiniPro v6.85, doing a Chip Blank test, reporting Not Empty!, address 0x06 as containing 0xBF

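Oooops! An erased NOR flash should read 0xFF everywhere, but as you can clearly see, this freshly erased chip returns 0xBF at address 0x06, so bit 6 of that byte is stuck at 0 (and there are others).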
This explains why the server didn’t POST anymore – the BIOS image was always getting corrupted.
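For anyone without the programmer software at hand, the same blank check can be approximated on a raw dump of the erased chip. This is a minimal sketch, assuming the dump is already in memory as bytes; `blank_check` is a hypothetical helper, not part of the MiniPro software.

```python
def blank_check(dump: bytes, limit: int = 16) -> list[tuple[int, int, list[int]]]:
    """Return up to `limit` (offset, value, cleared_bits) entries for bytes that are not 0xFF."""
    findings = []
    for offset, value in enumerate(dump):
        if value != 0xFF:
            # Bit positions (0 = least significant) that read as 0 after an erase.
            cleared = [bit for bit in range(8) if not (value >> bit) & 1]
            findings.append((offset, value, cleared))
            if len(findings) >= limit:
                break
    return findings
```

An empty list means the chip erased cleanly. A byte of 0xBF at offset 6, as in the screenshot above, would be reported as `(6, 0xBF, [6])`.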

The chip was replaced with a new 128 Mbit SPI flash ROM (taken from an ESP32 dev board) and lo and behold:
Server mainboard with replaced SOIC-8 IC, XMC25QH128
Monitor, showing Please wait for BMC initialize... message

The server was back to life. This was the first time in my career that I saw a NOR flash chip with dead/stuck bits, so such failures seem to be pretty rare. Still something to be aware of.