How many critical IT events does your organization have in a year? Recent research by analyst firm Quocirca states that on average, these events happen three times per month, with each costing over €100,000 to remediate. At that rate, the annual bill comes to more than €3.6 million, on top of the serious reputational damage to the business.
The research defined a ‘critical IT event’ as “when a business application or supporting infrastructure is down, or has a malfunction, whereby a business process is halted, or users are unable to reasonably carry out tasks and transactions.” The research found that the average time to repair these events was 6.8 hours – which goes a long way to explaining the hefty costs and business impacts associated with these types of incident. As a result, survey respondents stated that downtime had become their top IT-related concern over the past year.
In today’s complex, interconnected IT environments, a critical IT event is often caused by simple changes or errors made when configuring network devices or managing network security policies.
Security changes to filtering rules or traffic routing are part of the daily routine of network management: a large enterprise’s IT team may well process hundreds of change requests a week. The majority of these will be made without any problems, but mistakes can and do happen whenever part of the process is manual. All it can take is mistyping just one letter, as my colleague Joe DiPietro points out in his recent post.
In fact, errors such as these, or minor technical faults, cause more business downtime than sophisticated cyberattacks: our own 2016 State of Automation survey found that 48% of organizations had experienced a network outage resulting from manual change processes.
The problem is made worse by the fact that multiple groups and teams within the IT department handle these types of changes, and when changes are processed manually, one group may not always be aware of the changes that another has made. For example, say the Platform or Operating System group needs to allocate a server. Next, the server team needs to add software to that server. Then the networking team needs to create routes in the network to access the server, and finally the security team needs to apply the appropriate security policies. As you can see, even in these four relatively simple stages, there is tremendous potential for a misconfiguration or outage unless all teams are aware of the others’ activities and have a streamlined process for collaborating effectively.
So what can organizations do to help prevent these outages from happening, or at least reduce the time it takes to fix them? The answer is to use more automation to handle these increasingly complex processes, and to reduce the amount of error-prone manual input when handling changes to configurations and policies. Put simply, the less manual intervention and the fewer manual changes your network needs, the less likely it is that mistakes, and the outages they cause, will occur.
An efficient automated system will be aware of all the different business applications on your network and their respective connectivity flows. It will be capable of simulating traffic and checking whether the changes you plan to make will affect application traffic, and what the effect will be on other parts of the network. This enables IT teams to perform impact analyses on planned changes before they make them, so any negative effects can be minimized. This is especially relevant to large enterprises with complex environments comprising both on-premises and cloud applications.
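To make the idea of a pre-change impact analysis concrete, here is a minimal sketch in Python. It is not how any particular product works: the `Rule` and `Flow` types, the first-match-wins evaluation with a default deny, and the addresses and application names are all assumptions chosen for illustration. It replays each application's known connectivity flows against the current and the proposed rule sets, and reports the applications that work today but would be blocked after the change.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified filtering rule."""
    action: str            # "allow" or "deny"
    source: str            # CIDR block, e.g. "10.0.1.0/24"
    destination: str       # CIDR block
    port: Optional[int]    # None matches any port


@dataclass(frozen=True)
class Flow:
    """One connectivity flow that a business application depends on."""
    application: str
    source: str
    destination: str
    port: int


def is_allowed(flow: Flow, rules: List[Rule]) -> bool:
    """Evaluate a flow against an ordered rule list (assumed: first match wins)."""
    for rule in rules:
        if (ip_address(flow.source) in ip_network(rule.source)
                and ip_address(flow.destination) in ip_network(rule.destination)
                and rule.port in (None, flow.port)):
            return rule.action == "allow"
    return False  # assumed default: deny anything not explicitly matched


def impact_of_change(flows: List[Flow],
                     current_rules: List[Rule],
                     proposed_rules: List[Rule]) -> List[str]:
    """Return applications whose flows work today but would be blocked."""
    broken = {
        flow.application
        for flow in flows
        if is_allowed(flow, current_rules) and not is_allowed(flow, proposed_rules)
    }
    return sorted(broken)


# Illustrative data only: the proposed change mistypes one digit of a port.
current_rules = [
    Rule("allow", "10.0.1.0/24", "10.0.2.10/32", 443),
    Rule("allow", "10.0.1.0/24", "10.0.3.0/24", 5432),
]
proposed_rules = [
    Rule("allow", "10.0.1.0/24", "10.0.2.10/32", 443),
    Rule("allow", "10.0.1.0/24", "10.0.3.0/24", 5433),
]
flows = [
    Flow("web-portal", "10.0.1.5", "10.0.2.10", 443),
    Flow("billing-db", "10.0.1.7", "10.0.3.20", 5432),
]

affected = impact_of_change(flows, current_rules, proposed_rules)
```

With this sample data, the single mistyped digit would cut off the hypothetical `billing-db` application, and the check surfaces that before the change ever reaches a production device.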
The automation solution should also support automatic change monitoring, which alerts the various IT and application teams whenever a change is made to the configuration of a device or network. This helps teams to quickly identify, diagnose and remediate technical issues arising from the change. It’s also a requirement of several industry regulations, such as PCI DSS.
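As a rough illustration of change monitoring, the sketch below compares each device's current configuration with a stored baseline and produces an alert containing the changed lines. It uses only the Python standard library. The device names, the configuration syntax and the shape of the alert dictionary are assumptions made for this example, and a real system would also need to collect the configurations and route the alerts to the right teams.

```python
import difflib
import hashlib
from typing import Dict, List


def fingerprint(config_text: str) -> str:
    """Hash a configuration so unchanged devices can be skipped cheaply."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> List[str]:
    """Return only the added/removed lines of a unified diff (headers dropped)."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(),
                                fromfile="baseline", tofile="current", lineterm="")
    # Simplification: keep lines starting with +/- but skip the +++/--- headers.
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


def detect_changes(baseline: Dict[str, str], current: Dict[str, str]) -> List[dict]:
    """Compare current configs with the baseline and build one alert per change."""
    alerts = []
    for device in sorted(set(baseline) | set(current)):
        if device not in baseline:
            alerts.append({"device": device, "kind": "added", "diff": []})
        elif device not in current:
            alerts.append({"device": device, "kind": "removed", "diff": []})
        elif fingerprint(baseline[device]) != fingerprint(current[device]):
            alerts.append({"device": device, "kind": "modified",
                           "diff": changed_lines(baseline[device], current[device])})
    return alerts


# Illustrative data only: hypothetical devices and a made-up config syntax.
baseline = {
    "fw-edge-1": "permit tcp any host 10.0.2.10 eq 443\ndeny ip any any",
    "sw-core-1": "vlan 10\nvlan 20",
}
current = {
    "fw-edge-1": "permit tcp any host 10.0.2.10 eq 444\ndeny ip any any",
    "sw-core-1": "vlan 10\nvlan 20",
}

alerts = detect_changes(baseline, current)
```

Here the only alert is for the hypothetical firewall `fw-edge-1`, and its diff shows exactly which rule changed, which is the information a team needs in order to diagnose an outage quickly.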
In the Quocirca research report, analyst Bob Tarzey stated, “As IT complexity grows, critical IT events are inevitable in all organisations. The sooner these are dealt with and lessons learnt, the sooner IT staff can stop firefighting and return to delivering value.” Automating IT and network change processes is critical to helping IT teams do this, by eliminating costly human errors and their consequences.