A couple of weeks ago I blogged about best practices for disaster recovery. Then, last week, Amazon Web Services (AWS) suffered a system outage, caused by a typo, which led to downtime for a significant number of its customers.
Users of Netflix, Airbnb, Reddit, Tinder and IMDb, among many others, were unable to access services during the outage, while business systems including Slack and Trello were knocked offline in some regions. There were also unexpected knock-on effects: IFTTT, a web service that lets users link together services from multiple providers into one seamless operation, was knocked offline too. As a result, some "smart home" owners reported that they were unable to lock doors or switch on lights because their IoT systems had lost connectivity. The number and scale of the business services and systems affected by the outage underline just how complex and interconnected today's cloud infrastructures have become.
Public cloud providers offer an appealing infrastructure solution for businesses seeking scalability, flexibility and cost-efficiency. But, as this recent outage showed, relying on the public cloud also ties an organization's day-to-day operations, and ultimately its bottom line, to the resilience and security of systems that sit outside its own environment and therefore outside its control.
This is a very different scenario from, say, the American Airlines outage earlier this year, where a misconfigured router caused service disruptions and grounded flights. Sure, this was a damaging incident, but it was ultimately contained within American Airlines' network, and the fallout was mostly restricted to American Airlines customers. The American Airlines outage was a "traditional outage" scenario, one for which organizations can take measures to reduce the chances of an outage occurring in the first place.
In contrast, the effects of a public cloud outage can ripple outwards, spreading to the many organizations that are customers of the cloud provider and, in turn, to their own customers. And of course, those end customers may be completely unaware of the original cause, which means that their frustration, and the resulting reputational damage, may be directed at the organizations using public cloud services rather than at the cloud provider itself.
However, regardless of where your assets are stored and managed, whether in a traditional data center or in the cloud, you are still responsible for them. Entrusting them to another organization's data center doesn't mean you can stop ensuring their availability.
Any organization using public cloud services needs to be prepared for this scenario. From a technological standpoint, it is vital to have comprehensive visibility and control of the entire hybrid environment, including the public cloud, so that when a problem occurs your IT team can quickly ascertain whether it lies within your own infrastructure or not. It is also vital to have comprehensive, efficient and highly available data backup, disaster recovery and business continuity procedures in place, so that you are not relying on a public cloud provider's own timescale for getting services back online.
Regardless of whether your infrastructure is on-premises or in the cloud, it is critical to have an action plan in place so that, if something does fail, you are able to ride the waves that will inevitably come your way.
For more tips and tricks on what to do when the cloud goes down, click here.