Everything you ever wanted to know about security policy management, and much more.
Social media giant Facebook was involved in a network outage on the 4th October 2021 that lasted for nearly six hours and took its sister platforms Instagram and WhatsApp offline.
As the story developed, it became apparent that the incident was caused by a configuration issue within Facebook’s BGP (Border Gateway Protocol), one of the systems that the internet uses to get your traffic where it needs to go as quickly as possible. The outage also cut off the company’s internal communications, along with authentication to third-party services including Google and Zoom. Some reports suggested security passes went offline, which stopped engineers from entering the building to physically reset the data center.
The impact was felt worldwide, with Downdetector recording more than 10 million problem reports, the largest number for one single incident. Facebook released an official statement following the outage stating: “Our engineering teams learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication.”
While Facebook has assured its users that no data has been lost in this process, the outage is a stark reminder of how small configuration errors can have huge, far-reaching consequences.
At the fundamental level, Facebook suffered from a lack of application availability. When a change was actioned, it caused a major chain reaction that ultimately wiped Facebook and its related services from the internet because they couldn’t see the entire lifecycle of that change and the impact it would have.
To avoid an incident like this in the future, organizations should consider a few simple steps:
The outages across Facebook’s infrastructure highlight the operational risks all organizations face around faulty configuration changes which can drastically impact application availability. Intelligent automation, thorough change management and proactive checks are key to avoid these outages.
Receive notifications of new posts by email.