AlgoBuzz Blog

Everything you ever wanted to know about security policy management, and much more.


The summer of network misconfigurations

2,300 flights grounded across the US, costing airlines an estimated $54 million (!!) in lost revenue and increased costs. A bank’s customers losing access to their accounts. Businesses in New England losing telephone services. A flash flood warning mistakenly issued for Washington DC... the list goes on and on.

What links all of these incidents? They are all the result of network outages during the month of July – costing millions of dollars in lost revenue and remediation costs, inconveniencing large numbers of customers, and damaging business reputations. And these are just a handful of the outages that we identified that month. Let’s have a look at some of them in more detail.

Southwest Airlines, USA

On July 20th Southwest Airlines’ reservation systems went offline, taking five days to return to ‘business as usual’, leading to the cancellation of 2,300 flights – equivalent to about 12% of its schedule over the period – and delaying a further 8,000 flights. Southwest revealed that the outage was caused by a router failure, which caused other systems to crash and slowed the airline’s systems so much that other functions became overloaded and froze up. The problems were further complicated when backup systems and the disaster recovery deployments also failed to work.

While the reasons for the router failure are unclear, we know that the majority of business downtime is caused by simple misconfigurations and faults rather than sophisticated cyberattacks. Indeed, our 2016 State of Automation survey found that 48% of organizations had experienced a network outage resulting from a manual change process. It serves as yet another reminder that even the slightest mistake in setting up a router can have a significant impact on a business.

First National Bank, South Africa

On July 24th and 25th customers of First National Bank in South Africa couldn’t access various services, and some were unable to withdraw funds during ‘payday weekend’. The bank explained that the outage was caused by an upgrade to the network.

Thousands of customers took to social media to complain about the outage, with some threatening to switch banks and others questioning why the upgrade was performed during a peak traffic time for the bank. As we recently blogged, the timing of an upgrade or update to the network needs to be carefully planned to ensure that if any issues arise, the impact is minimal. In this case, the timing of this outage once again highlights the reputational damage that organizations face if an upgrade doesn’t go to plan, particularly in key ‘trading’ periods.

Comcast, USA

In the middle of the month Comcast, whose business telephone network serves around 1 million US businesses in 39 states, suffered connectivity issues that prevented its business customers from making or receiving calls. The problem was blamed on ‘issues within the network’ and was fixed within 24 hours. The delay in fixing the problem suggests a lack of the network visibility required to respond quickly to connectivity issues – something that security policy management solutions provide.

National Weather Service, USA

“A major network issue” at the National Weather Service prevented the organization from issuing forecasts and weather warnings, and led to a false flash flood warning being issued for the Washington DC area. When explaining the fault, the National Weather Service said that it was in the process of “executing network and infrastructure upgrades that will reduce the likelihood of a similar outage in the future”.

Although the National Weather Service did not specify a link between the outage and the network upgrades, it is common for such projects to cause unexpected outages. Any change can affect existing application connectivity flows – something which solutions that automatically map applications and proactively assess the impact of application connectivity changes can alleviate.
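To make the idea of proactively assessing a change more concrete, here is a minimal sketch in Python. It is purely illustrative and does not represent how any particular product or the organizations above work: the `Flow` and `Rule` types, the first-match-wins rule model, and all addresses and application names are hypothetical assumptions. It simply compares a current and a proposed rule set and lists the application flows that the change would cut off.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    """A connectivity flow that an application depends on (hypothetical model)."""
    app: str
    src: str
    dst: str
    port: int


@dataclass(frozen=True)
class Rule:
    """A simplified allow/deny rule (hypothetical model, not a vendor format)."""
    src_net: str
    dst_net: str
    port: int
    action: str  # "allow" or "deny"


def decide(rules, flow):
    """Return the action of the first matching rule; unmatched traffic is denied."""
    for rule in rules:
        if (ip_address(flow.src) in ip_network(rule.src_net)
                and ip_address(flow.dst) in ip_network(rule.dst_net)
                and flow.port == rule.port):
            return rule.action
    return "deny"


def broken_flows(current, proposed, flows):
    """List flows that are allowed today but would be denied after the change."""
    return [f for f in flows
            if decide(current, f) == "allow" and decide(proposed, f) == "deny"]


# Made-up example data: two applications and a change that narrows one rule.
FLOWS = [
    Flow("reservations", "10.1.0.5", "10.2.0.10", 443),
    Flow("payroll", "10.1.0.9", "10.3.0.20", 5432),
]
CURRENT = [
    Rule("10.1.0.0/16", "10.2.0.0/16", 443, "allow"),
    Rule("10.1.0.0/16", "10.3.0.0/16", 5432, "allow"),
]
PROPOSED = [
    Rule("10.1.0.0/16", "10.2.0.0/16", 443, "allow"),
    Rule("10.1.1.0/24", "10.3.0.0/16", 5432, "allow"),  # source range narrowed
]

if __name__ == "__main__":
    for flow in broken_flows(CURRENT, PROPOSED, FLOWS):
        print(f"Change would break {flow.app}: {flow.src} -> {flow.dst}:{flow.port}")
```

Running a check like this before a change window flags the flows that would be lost while there is still time to revise the change. Real rule sets involve far more than this sketch models, such as port ranges, protocols, NAT and multiple devices along a path.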

So much emphasis is placed on the external threats posed to corporate networks, yet the fact is that misconfigurations and internal errors can be equally, if not more, damaging to the business.

[Author’s note: For those of you who are asking, this week’s headline news – Delta Air Lines’ outage – was the result of a power outage rather than a security device misconfiguration or connectivity issue, which is why it isn’t included in my blog post.]
