Two US airlines recently suffered major network outages that left flights grounded and caused delays for tens of thousands of passengers. In July, a router misconfiguration at United Airlines grounded more than 90 aircraft for over two hours. In a statement, the company confirmed that “a router degraded network connectivity for various applications” but did not reveal any details about the specific router that caused the issue. Then last week, American Airlines experienced a “connectivity issue” that halted flights at three of its US-based hub airports.
These incidents remind us that even the slightest change or error in configuring security policies can bring an entire organization to its knees. Yet security changes are made to devices on a daily basis, whether these are alterations to filtering rules or changes to the traffic routing – networks are in a constant state of change. For organizations the size of United Airlines and American Airlines, 200-300 change requests per week would be a typical volume. Most of these will go through without any problems, since they’re normally allowing additional traffic or eliminating traffic that is no longer in use.
Unfortunately, mistakes can and will happen through simple human error. Whenever your processes include an element of manual input, there is always a chance of someone mistyping something or accidentally deleting a file (so-called ‘fat finger syndrome’). While some of these mistakes have little impact, others can have severely damaging consequences. In fact, the majority of business downtime is caused by trivial technical faults rather than sophisticated cyberattacks, yet organizations seem to devote far more resources to preventing the latter.
So what can you do to prevent these outages and minimise their impact when they do happen? The simple answer is more automation. The less human input you have in your IT processes, the less likely it is that mistakes will occur. For example, there are systems that automatically configure routers and filtering devices without any manual keying in.
Something else that can help is automatic change monitoring, whereby an alert is issued whenever a configuration change is made. Change monitoring is a vital component of your network security defences but it’s something that many companies neglect. Sound change monitoring enables you to reduce the time it takes to diagnose and remediate technical issues, and is actually a requirement for compliance with a wide range of regulations (e.g. PCI).
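To make the idea concrete, here is a minimal sketch of change monitoring in Python. It assumes you can already fetch each device's configuration as text; the function names and the sample device are hypothetical and not tied to any particular product. It fingerprints the configuration and, if the fingerprint differs from the last known one, returns an alert containing a line-by-line diff.

```python
import difflib
import hashlib
from typing import Optional


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 digest of the configuration text."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def detect_change(device: str, previous: str, current: str) -> Optional[str]:
    """Return an alert with a unified diff if the config changed, else None.

    `device` is only a label used in the alert text (hypothetical naming).
    """
    if fingerprint(previous) == fingerprint(current):
        return None
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",
    )
    return f"ALERT: configuration changed on {device}\n" + "\n".join(diff)
```

In practice the alert would be sent to whoever is on call, and the diff is what shortens diagnosis: it shows exactly which lines changed rather than just the fact that something did.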
You can also reduce the damage caused by misconfigurations by automatically backing up network configuration files, so that if something does go wrong, you can quickly revert to a recent healthy state. Automating this process is far more dependable than relying on employees to keep regular backups.
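As a rough illustration, automated backups can be as simple as the sketch below. It assumes the configuration has already been retrieved as text and that snapshots live in a local directory; the directory layout, file naming and function names are all hypothetical. A new timestamped snapshot is written only when the configuration differs from the latest one, and the most recent snapshot can be read back for a rollback.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def latest_backup(backup_dir: Path, device: str) -> Optional[str]:
    """Return the most recent snapshot for a device, or None if there is none."""
    # Timestamped file names sort chronologically when sorted as strings.
    snapshots = sorted((backup_dir / device).glob("*.cfg"))
    return snapshots[-1].read_text(encoding="utf-8") if snapshots else None


def backup_config(backup_dir: Path, device: str, config_text: str) -> Optional[Path]:
    """Write a timestamped snapshot unless it matches the latest one.

    Returns the path of the new snapshot, or None if nothing changed.
    """
    device_dir = backup_dir / device
    device_dir.mkdir(parents=True, exist_ok=True)
    if latest_backup(backup_dir, device) == config_text:
        return None
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = device_dir / f"{stamp}.cfg"
    path.write_text(config_text, encoding="utf-8")
    return path
```

Run on a schedule, something like this removes the dependence on anyone remembering to take a copy before making a change.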
Lastly, you should have a system that’s aware of all the different business applications on your network and their connectivity flows, and that is capable of simulating traffic and checking whether the required traffic is allowed. You can then carry out an impact analysis to identify how a change of one element would affect the rest of the network. This is particularly relevant to large organizations like airlines that have hugely complex legacy IT systems and ever increasing volumes of data to contend with.
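A toy version of this kind of impact analysis might look like the following Python sketch. It assumes a deliberately simplified model: an ordered list of filtering rules where the first match wins and anything unmatched is denied, plus a list of flows that each business application needs. The `Rule` and `Flow` types, the application names and the addresses are hypothetical; real devices and policies are considerably richer.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class Rule:
    action: str       # "allow" or "deny"
    source: str       # CIDR block, e.g. "10.1.0.0/16"
    destination: str  # CIDR block
    port: int


@dataclass(frozen=True)
class Flow:
    application: str  # business application that needs this flow
    source: str       # single IP address
    destination: str  # single IP address
    port: int


def is_allowed(rules: List[Rule], flow: Flow) -> bool:
    """Simulate the flow against the rules: first match wins, default deny."""
    for rule in rules:
        if (
            ip_address(flow.source) in ip_network(rule.source)
            and ip_address(flow.destination) in ip_network(rule.destination)
            and flow.port == rule.port
        ):
            return rule.action == "allow"
    return False


def impact_of_change(current: List[Rule], proposed: List[Rule], flows: List[Flow]) -> List[str]:
    """Return applications whose flows work today but would break after the change."""
    return sorted({
        flow.application
        for flow in flows
        if is_allowed(current, flow) and not is_allowed(proposed, flow)
    })


current_rules = [
    Rule("allow", "10.1.0.0/16", "10.2.0.10/32", 1521),
    Rule("allow", "10.1.0.0/16", "10.3.0.0/24", 443),
]
# Proposed change: a new deny rule placed ahead of the existing rules.
proposed_rules = [Rule("deny", "10.1.0.0/16", "10.2.0.0/16", 1521)] + current_rules
required_flows = [
    Flow("check-in", "10.1.5.20", "10.2.0.10", 1521),
    Flow("crew-scheduling", "10.1.7.8", "10.3.0.15", 443),
]
broken = impact_of_change(current_rules, proposed_rules, required_flows)
```

Here `broken` lists the applications that the proposed rule would cut off, which is the question you want answered before the change reaches a production device rather than after.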
In today’s 24/7 digital age, large organizations like airlines simply cannot afford any downtime, such are the financial, operational and reputational ramifications. However, with manual processes, there is always the possibility of human error causing technical failures, which is why greater automation is the key to stopping your network being grounded.