Lessons from the Facebook crash – How to set up your web systems to reduce your risk of failure

You probably already know by now that Facebook had a giant outage yesterday which lasted about 6 hours. An ultra-simplified version of what happened: there was a misconfiguration in one of Facebook’s critical internal server systems, and everyone (both the public and Facebook’s own engineers) was locked out. Supposedly, they had to physically travel to the data center and use an angle grinder (yes, the hardware tool) to get into the server cage and regain access to the system. A layman’s analogy: Pos Malaysia headquarters had an accidental lockdown and all mail nationwide stopped, and someone had to drive there with the key to unlock the building.

What can you do to prevent this from happening to your own website?

1. Split your web components

Typically, website ownership involves 3 components – the domain, the DNS server, and the hosting server. The giant internet services companies like GoDaddy, Bluehost, Exabytes, etc. all want you to host everything with them for “convenience”, with all services under one roof. However, this actually raises your risk: if that one provider has a problem, your domain, DNS, and hosting all go down together. Never put all your eggs in one basket.

Talk to your web service professional to split these components across 3 separate providers. This way, if one fails, you can move that one component to another provider without touching the other two.

a. For domains, I recommend registering them with a large, reputable company, as this is the cardinal component of your web presence. GoDaddy, Namecheap, Domain.com, etc. are all good candidates.

b. For DNS servers, I use Cloudflare, as they have loads of other beneficial web services as well, and have a great Free tier. There are other DNS providers out there too, such as DNSME, Amazon Route 53, Sectigo, etc.

c. For hosting servers, there are a billion options out there, with a long list of variables for different use cases, so if you need actual suggestions, please comment below and I’ll try to provide advice. Generally I stay away from shared hosting and use VPS providers like Vultr or UpCloud.
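If you want a quick way to sanity-check your own setup, here is a minimal Python sketch. It is only an illustration: the function name `shared_providers` and the two example setups are made up for this post, and you would fill in the provider names yourself (from your invoices or account dashboards). It simply flags any provider that is handling more than one of your 3 components.

```python
from collections import defaultdict


def shared_providers(components):
    """Return providers that handle more than one component.

    `components` maps a component name (e.g. "domain") to the provider
    that currently handles it. Provider names are compared
    case-insensitively so "GoDaddy" and "godaddy" count as the same.
    """
    by_provider = defaultdict(list)
    for component, provider in components.items():
        by_provider[provider.strip().lower()].append(component)
    # Keep only the providers that appear for two or more components.
    return {p: sorted(c) for p, c in by_provider.items() if len(c) > 1}


# Hypothetical setups, for illustration only.
all_in_one = {"domain": "GoDaddy", "dns": "GoDaddy", "hosting": "GoDaddy"}
split = {"domain": "Namecheap", "dns": "Cloudflare", "hosting": "Vultr"}

print(shared_providers(all_in_one))  # one provider holds all 3 components
print(shared_providers(split))       # empty: nothing is shared
```

An empty result means each component sits with a different provider. Anything else is a basket with more than one egg in it.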

*You may be wondering, why didn’t Facebook just do this? Well, they’re too big to use 3rd-party tools so they had to build their own. This gives them more freedom, flexibility, and performance – but as Uncle Ben says, with great power comes great responsibility. Misconfigurations happen all the time in tech, and some are worse than others – as we saw happen yesterday.

2. Host your backups offsite, and make backups of your backups

The Facebook incident didn’t affect the data, but you can bet your ass that they have multiple redundancies and backups of their data. Question is, do you?

A common mistake among website owners is assuming that “the server guys should have a backup”. Never make this assumption. Even if they do, chances are they will charge you for restoring the data, or take a few days to do it, or the backup will be too outdated to recover your critical data. Plus, if something were to happen to them, you’re dead in the water.

For example, on 10 March 2021, OVH, a major server provider, had a fire in one of their datacenters, and some customers lost all their data permanently. The backups couldn’t be restored because they were stored in the same place, and everything burned to the ground (together with some website owners’ hopes and dreams, I’m sure).

Don’t let this happen to you.

The solution is simple – talk to your web professional to make OFFSITE BACKUPS, i.e. backups that are stored separately from the hosting server, for instance on Google Drive, Dropbox, or a hard drive at home. I would go one step further and make a secondary backup, stored elsewhere, just in case. As the saying goes, prepare the umbrella(s) before it rains.
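To make the idea concrete, here is a rough Python sketch of "backup, plus a backup of the backup". It assumes you already have a backup archive downloaded from your server, and that your offsite locations show up as ordinary folders on your computer (for example a folder synced by the Google Drive or Dropbox desktop app, or an external drive). The function names and the paths in the comments are hypothetical. It copies the archive to every destination and compares checksums, so you know each copy is identical to the original rather than just hoping it is.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_backup(archive, destinations):
    """Copy `archive` into every folder in `destinations` and verify each copy.

    Raises OSError if a copy does not match the original's checksum.
    Returns the list of paths that were written.
    """
    archive = Path(archive)
    expected = sha256_of(archive)
    copies = []
    for dest in destinations:
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        target = dest / archive.name
        shutil.copy2(archive, target)  # copy contents and timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Example call (hypothetical paths, adjust to your own machine):
# copy_backup(
#     "site-backup-2021-10-05.tar.gz",
#     ["/home/me/Dropbox/site-backups", "/mnt/external-drive/site-backups"],
# )
```

A script like this only helps if the destinations really are separate from each other and from the hosting server. Two folders on the same disk are still one basket.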

3. Take ownership of your website

If you outsource your website to a freelance web professional, never fully rely on them to take care of everything for you. Always know where your systems are (point #1 above), where your backups are (point #2 above), and learn the basics of how they work. At the end of the day, your website is your responsibility. If shit hits the fan, you want to be able to do something about it instead of letting the fan keep spinning for days and weeks.

Good luck!

Michael

About Me

I run Pixl.my, a managed hosting service in Kuala Lumpur, Malaysia.

This is a place for me to pen my thoughts.
