2016 has seen high-profile outages hit a wide range of organizations. Delta Air Lines experienced major system outages following a power failure at its main data center in Atlanta, which resulted in 740 flights being canceled and thousands delayed. As storms hit Sydney, Australia in June, Amazon Web Services in the region went down for around 10 hours, disrupting a range of services from banking to pizza deliveries. And of course, cyberattacks also took their toll: we saw the world’s biggest ever DDoS attack, powered by the Mirai botnet, targeting Dyn, a company that manages a significant part of the internet’s domain name system infrastructure.
In this dynamic landscape, it’s more important than ever before to ensure that your organization can respond as fast as possible when a serious incident strikes. Part of this is about sophisticated forensics and analytics – being able to identify exactly what has happened and repair the damage. It is also about being able to get your systems back online as quickly as possible. In short, effective disaster recovery is a key part of your overall cybersecurity posture.
Every forward-thinking large organization must have a contingency plan in place in case its primary site is hit by a catastrophe – which, remember, could just as easily be a physical or environmental problem like a fire or flood as a cyberattack. This involves creating a disaster recovery (DR) site in another city or even another country, and replicating all the equipment that is used at the primary site. This also means replicating the primary site’s network security infrastructure.
Here’s where the difficulty comes in. Let’s imagine a primary site containing routers, firewalls, servers and so on – and a DR site set up in exactly the same way. The problem is that just installing the same equipment in the same configuration isn’t enough. All of those devices rely on security policies, and the likelihood is that these policies change on a daily or even an hourly basis, every time applications and users are added, amended or removed. So whenever you make a policy change at the primary site, you need to ensure that an equivalent change is made in the equivalent part of the DR site.
This means that you need a connection between the two sites to automatically replicate policies every time they change. How you build that connection will depend on your exact equipment and setup, and it’s not always easy.
The most straightforward scenario occurs when you have the same equipment from the same vendor at each site – and that vendor offers a unified firewall management system that lets you apply the same policy to the security devices on both sides. Make the change once in the firewall management system, and it will push the change out to each site.
More complex scenarios occur when you don’t have such a firewall management system – or if you’re using equipment from different vendors at each site. In such setups, the security policies at your two sites are not identical – they potentially even speak different languages – and if you rely only on human process to synchronize the two sides, the policies will eventually diverge. A better choice is to use an automated system to maintain the synchronization.
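To make the idea of automated synchronization a little more concrete, here is a minimal sketch of a drift check between the two sites. It assumes that the rules from each site have already been exported and normalized into a common, vendor-neutral form; the `Rule` fields and the `find_drift` function are hypothetical names chosen for illustration, not part of any vendor's API.

```python
from typing import Dict, Iterable, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical vendor-neutral firewall rule (illustrative fields only)."""
    source: str
    destination: str
    service: str
    action: str


def find_drift(primary: Iterable[Rule], dr: Iterable[Rule]) -> Dict[str, List[Rule]]:
    """Report rules that exist at one site but not at the other."""
    primary_set, dr_set = set(primary), set(dr)
    return {
        # Rules added at the primary site that never reached the DR site.
        "missing_at_dr": sorted(primary_set - dr_set),
        # Rules present only at the DR site (stale or manually added).
        "extra_at_dr": sorted(dr_set - primary_set),
    }


if __name__ == "__main__":
    primary_rules = [
        Rule("10.1.0.0/24", "10.1.5.20", "tcp/443", "allow"),
        Rule("any", "10.1.5.30", "tcp/22", "deny"),
    ]
    dr_rules = [
        Rule("10.1.0.0/24", "10.1.5.20", "tcp/443", "allow"),
    ]
    for kind, rules in find_drift(primary_rules, dr_rules).items():
        for rule in rules:
            print(kind, rule)
```

Running a check like this on a schedule, or after every change request, would flag divergence early instead of leaving it to be discovered during a failover. The hard part in practice is the normalization step, which this sketch deliberately leaves out.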
The last thing to consider is the IP addresses in place at your primary and secondary sites. Are they identical or, as is more likely, are you mapping between IP addresses on your main site and their counterparts on your DR site? In the latter case, the rule you install at the secondary site needs to be slightly different from the one at the primary site – and again, you will need some sort of automated system to maintain the correct mapping between IP addresses.
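As an illustration of that mapping, the sketch below translates the addresses in a primary-site rule into their DR-site counterparts. Everything here is an assumption made for the example: the `SUBNET_MAP` ranges are invented, each primary subnet is assumed to map onto a DR subnet of the same size, and the `SiteRule` shape is hypothetical. It uses only Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SiteRule:
    """Hypothetical rule shape; real devices carry many more fields."""
    source: str
    destination: str
    port: int
    action: str


# Assumed mapping: primary subnet -> same-sized DR subnet (example values).
SUBNET_MAP = {
    ipaddress.ip_network("10.1.0.0/16"): ipaddress.ip_network("10.2.0.0/16"),
}


def translate_address(addr: str) -> str:
    """Map a primary-site host or subnet to its DR-site counterpart."""
    net = ipaddress.ip_network(addr, strict=False)
    for primary, dr in SUBNET_MAP.items():
        if primary.version == net.version and net.subnet_of(primary):
            # Keep the same offset inside the mapped range.
            offset = int(net.network_address) - int(primary.network_address)
            new_base = dr.network_address + offset
            return str(ipaddress.ip_network(f"{new_base}/{net.prefixlen}"))
    # Addresses outside every mapped range are returned unchanged.
    return addr


def translate_rule(rule: SiteRule) -> SiteRule:
    """Build the DR-site version of a primary-site rule."""
    return replace(
        rule,
        source=translate_address(rule.source),
        destination=translate_address(rule.destination),
    )
```

With a table like this kept in one place, every rule change at the primary site can be translated mechanically before it is installed at the DR site, rather than relying on someone remembering which address corresponds to which.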
It is essential to consider all of these aspects of security policy management when building your DR site. If you neglect them, when disaster happens and you need to switch operations to your secondary site, your systems and applications won’t work as you need them to.
In 2017 and beyond, incident prevention alone will no longer be enough to ensure robust readiness for cyber threats. You also need to ensure that your incident response is as slick and unified as possible, so that when the worst happens, you can get back on your feet lightning-quick. Having your security policies configured and orchestrated across your entire organization is a key part of this.
If you’re interested in discussing this further in person, visit me at AlgoSec’s booth 1133 during RSA.
Also make sure to check out my recent “Professor Wool” whiteboard video on disaster recovery.