Postmortem: September 17th outage and rollback

  • 🐕 I am attempting to get the site running as fast as possible. If you are experiencing slow page load times, please report it.
The Kiwi Farms went down at approximately 5:30am EU. The database locked up unexpectedly and the site became unavailable. @CrunkLord420 correctly identified it as a disk issue, so I called in the two guys who help me manage our storage array ("RAID"). They diagnosed that all four enterprise NVMe drives in the RAID had failed simultaneously, which wiped out the entire database and all forum software. I verified we had a fresh copy of the backup and a recent-ish backup of the actual software.

I had two options moving forward:
1) Call out remote hands on a US Saturday night, have them determine whether the drives can be salvaged, and overnight parts to fix whatever has actually broken, or
2) Completely reinstall the entire forum on a different raid until we can do #1 and then move it later.

I decided to go with #2 as the faster but more laborious option, which is why we've had a data rollback.

This was very close to a total nightmare scenario where the server would be completely destroyed. I've reinstalled the Kiwi Farms so many times at this point that I know the procedure very well, and it only took about 7 hours, most of which was just waiting on the database to import over 120,000,000 post stickers.

Anyways, sorry, we're back up. Let me know if there's any weirdness. I'll get chat up in a bit.

It's shit like this that makes you the GOAT (Greatest Of All Time) of the Kiwi Farms.
 