Why do we continue to see large-scale system failures within banking? A major global bank recently suffered a “technical glitch” that disrupted the payment of 275,000 customer BACS transfers. This was not an isolated episode: it followed a series of major incidents at other large banks in 2015.

We increasingly rely on 24/7 banking for direct debits, web services, Faster Payments and mobile payment services such as Barclays Pingit. Even a short-term failure in these services can have a major impact on our lives, affecting the payment of wages, state benefits or mortgage completions. As many of these services have become mainstream within banking, the IT infrastructure needed to support them has become increasingly critical to business operations. The banking system is highly complex and integrated, so a failure in any part of the supply (payment) chain (e.g. a major clearing bank) is likely to have knock-on effects at other institutions, increasing the number of customers potentially affected.

In most cases, IT hardware, supported by outsourced IT services, has become inexpensive, so why has implementing resilient IT systems become so challenging?

One of the challenges facing many of our U.K. and global banks is the ageing IT infrastructure on which they rely. Excessive maintenance costs, conflicting processes and degraded business agility are increasingly becoming strategic risks for the banks. To manage these risks, banks need to modernise their core IT infrastructure; however, according to recent research by Protiviti, fewer than a third of the financial services companies surveyed have major modernisation programmes in place. The reluctance is understandable given the considerable costs and risks associated with such projects. The cost of these change projects can run into the hundreds of millions or even billions of pounds, and they carry complex regulatory compliance requirements.

Interestingly, the push for resilient IT systems and processes has itself led to different challenges for the banks to address. Many banks replicate processing environments in order to provide real-time failovers for production systems. However, the issue with real-time replication is that incidents originating within the production environment, such as security vulnerabilities, data integrity errors or poorly written code, can replicate themselves into the failover environments. This has the potential to make the failover useless. Rolling back to a nightly backup from the day before is often no longer an option given the time taken and volume of data which could be lost.
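
The trade-off described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of any bank's actual architecture: the `Replica` class, the `run` function, the account names and the tick-based delay are all hypothetical. A replica with a delay of zero stands in for a real-time failover, and a replica with a longer delay stands in for an older copy such as a nightly backup.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical copy of production that applies each change after `delay` ticks."""
    delay: int
    state: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # entries are (due_tick, key, value)

    def receive(self, tick, key, value):
        # Queue the change; it becomes visible once `delay` ticks have passed.
        self.pending.append((tick + self.delay, key, value))

    def apply_due(self, tick):
        # Apply every queued change whose due tick has been reached.
        due = [p for p in self.pending if p[0] <= tick]
        self.pending = [p for p in self.pending if p[0] > tick]
        for _, key, value in due:
            self.state[key] = value


def run(changes, replicas):
    """Apply one (key, value) change per tick to production and forward it to each replica."""
    production = {}
    for tick, (key, value) in enumerate(changes):
        production[key] = value
        for replica in replicas:
            replica.receive(tick, key, value)
            replica.apply_due(tick)
    return production


# Illustrative data only: the final change represents a data integrity error.
changes = [("acct-1", 100), ("acct-2", 250), ("acct-1", -999999)]

realtime = Replica(delay=0)  # real-time failover
delayed = Replica(delay=2)   # older copy, e.g. a periodic backup

production = run(changes, [realtime, delayed])

print("production:", production)
print("real-time failover:", realtime.state)
print("older copy:", delayed.state)
```

In this sketch the real-time failover ends up identical to production, bad value included, whereas the older copy still holds the correct balance for `acct-1` but has not yet seen `acct-2` at all. That is the dilemma in miniature: the copy that is current shares the fault, and the copy that is clean has lost recent data.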

As a result of the increasing amount of innovation in the industry, existing system change and release cycles are no longer adequate to meet the demands of the business. Many organisations are trying to move to more agile development cycles, but this is often difficult to achieve in large banking environments, where changes can have a major impact on customers and the business. If implemented incorrectly, there is a risk that increased agility could lead to greater error rates in development and implementation, increasing the likelihood of system downtime and IT resilience issues.

Given the widespread media coverage of cyber attacks and the cyber threat to organisations, there is a common misconception that many of these system failures are caused by security incidents. Research by IBM suggests that whilst cyber security does present a risk to IT resilience, this is outweighed by human error and system failure[i].

The IT resilience challenges in the industry will present opportunities for some. Challenger banks (new entrants to the market such as Metro Bank or Virgin Money), a growing part of the market, often do not carry the same level of legacy IT environment, which gives them the opportunity to be more agile and responsive to market needs. Many are also looking at alternative business models, including outsourcing IT as a service to banking platform providers. Some of these approaches do have different risks associated with them, but the greater flexibility to develop alternative operating models gives these banks the opportunity for competitive advantage in the future. The incumbent ‘big 4’ banks in the U.K. will no doubt be looking over their shoulders at Apple Pay to see how the product develops over the coming years and what threat it poses to more conventional banking.

So how can banks most effectively try to address these issues? There is no single solution to the problem, although investment in ageing IT infrastructure and testing of resilience is obviously important. Given the size and complexity of the IT environments within even mid-sized banking organisations, focusing investment in the right places will be key. In our experience within the industry, IT risk management practices within even some of the larger global banking organisations continue to be poorly aligned to operational risk. Unless the interdependencies between the underlying IT infrastructure, business applications, and the services and business processes they support can be effectively understood and managed, it is difficult for business executives to see where to focus, making large-scale investment more challenging.
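
One way to picture the interdependencies described above is as a simple dependency map that can be queried for the business services affected by a single component failure. The following is a minimal sketch under stated assumptions: the component names (`payments-engine`, `core-db` and so on), the `DEPENDS_ON` map and the `impacted_services` function are all hypothetical, and a real bank's inventory would be far larger and maintained in dedicated tooling.

```python
from collections import deque

# Hypothetical map: each item lists the components it depends on.
DEPENDS_ON = {
    "Faster Payments": ["payments-engine"],
    "BACS transfers": ["payments-engine", "batch-scheduler"],
    "Mobile payments": ["mobile-api", "payments-engine"],
    "payments-engine": ["core-db"],
    "batch-scheduler": ["mainframe"],
    "mobile-api": ["web-cluster"],
}

# The customer-facing services that executives care about.
BUSINESS_SERVICES = {"Faster Payments", "BACS transfers", "Mobile payments"}


def impacted_services(failed_component):
    """Return the business services that depend, directly or indirectly, on the failed component."""
    # Invert the map so we can walk from a component up to whatever relies on it.
    dependants = {}
    for item, dependencies in DEPENDS_ON.items():
        for dependency in dependencies:
            dependants.setdefault(dependency, set()).add(item)

    # Breadth-first walk upwards from the failed component.
    seen = {failed_component}
    queue = deque([failed_component])
    while queue:
        current = queue.popleft()
        for parent in dependants.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    return sorted(seen & BUSINESS_SERVICES)


print(impacted_services("mainframe"))
print(impacted_services("core-db"))
```

In this illustrative map, a failure of `mainframe` touches only BACS transfers, while a failure of `core-db` reaches all three services. A view of this kind, however it is produced, is what lets investment be directed at the components whose failure would affect the most customers.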

With the increase in large-scale incidents, it is not surprising to see increased scrutiny from the financial regulators. We are seeing increased fines from the Financial Conduct Authority (FCA), and IT resilience is included within the FCA’s seven forward-looking areas of focus for 2015/16. This is particularly the case where failure has a direct impact on customers, as is evident from the 2014 fines for some well-known banks. The FCA attributed those failures to weaknesses in the banks’ IT risk management processes over system change and design.

To address these risks, executives need to understand and clearly articulate the benefits and risks associated with IT resilience programmes and then develop a manageable roadmap for the future. Without this clear approach, these system failures are likely to become increasingly common and regulatory oversight will increase.

