By Matt Davies, Senior Director at Splunk

Always-on. 100% uptime. Low latency. Financial services businesses expect a lot from technology. After all, time is money. And by and large, technology tends to deliver. But even if your digital estate is running to five nines, there’s still 0.001 per cent of downtime, and that has a cost. The potential price tag of a lost trade or a missed deal can be astronomical. It’s not just the initial monetary outlay: the cost can also be less tangible, through reputational damage or the resulting loss of clients and customers. These losses have made technology downtime a bigger concern than potential security issues for businesses today – according to recent research by Quocirca.

We recently took a look at critical IT events (CIEs, or IT outages in layman’s terms) and how they affect businesses, particularly in the financial services industry. Hybrid cloud has made IT infrastructure more complex, while both companies and consumers are more reliant on technology than ever before. This can often make critical IT events all the more critical!

It could be an outage on a high-frequency trading platform. It could even be an email provider experiencing service downtime, meaning customers don’t receive their bank statements on time. Or worse still, it could be an IT outage that stops customers taking their money out of a cash machine. These events are increasingly in the public eye, causing financial and reputational damage.

CIEs do happen, and they happen all the time, across every industry. In fact, on average, companies will experience three per month. But the real test is not whether you get knocked down in the first place, but how quickly you (and your systems) get back up and running.

Recovering from one of these critical events takes almost seven hours on average and a team of 18 people. That’s a long time for a critical service like a banking application or trading platform to be down. When we say there’s a tangible cost, in the financial services industry it is, on average, £105,035 per CIE. Breaking that down, the cost to the business each time the system goes down is a pricey £84,839, with the cost to IT being £20,196. Customers have high expectations of their financial services providers, so the reputational impact can be considerable. Companies are, of course, constantly aiming to bring down the time it takes for the IT department to respond, but even a significant reduction still leaves a cost.

Add the cost to IT to that of reputational loss, and these events are costing businesses £315,608 per month and a staggering £3.8 million a year. That’s the kind of figure that’s going to make senior management sit up and listen.

The reason that businesses are more concerned about downtime than security is that these events aren’t going away – in fact, they’re growing in number. Almost half of the financial services businesses surveyed as part of Quocirca’s research said that they had suffered multiple critical IT events, which have in turn caused reputational damage to the business. The need to bounce back more quickly from these events, reduce their “criticality” and minimise the impact on the business – and ultimately the financial impact – has therefore never been stronger. Performing constant analysis, as many companies already do, is vital to reducing the likelihood that the same event will happen in future. Using machine learning for focused investigation, intelligent alerting or predictive action is a key way to get ahead of outages before they happen.

There are some barriers in the way though. For instance, although mobile devices are very active within every enterprise, they’re the devices into which companies have the least visibility. This makes finding the root cause of problems and potential downtime even more difficult. Organisations with higher levels of operational intelligence will be able to capture insights from all data sources across the business. This visibility reduces the time it takes for businesses to bounce back, and it also improves detection of future events and reduces how critical they become. As an example, UniCredit Business Integrated Solutions (UBIS) has been using Splunk Enterprise to quickly identify issues and proactively prevent incidents. With proactive monitoring in place, the customer service team has seen a significant improvement in service quality and gained new efficiencies. About 40 per cent of incidents can now be managed before becoming evident to end-users.

Because infrastructure elements vary, different skills need to be activated in response to different events. Bringing the people with these skills together quickly is a challenge, but arming them with total visibility of the infrastructure right away helps ensure success. These CIE teams can then resolve and analyse unplanned events quickly and bring the focus of IT back to innovation and driving value within the business.
