By Guy Warren, CEO of ITRS Group
We are now less than six months away from the FCA’s operational resilience requirements coming into force for the UK’s financial services sector, and just under two years since the parliamentary committee called on regulators to intervene following TSB’s IT meltdown.
Far from moving towards greater operational resilience in that time, businesses’ IT estates have only grown larger and more unwieldy. The rush to adapt to pandemic-enforced digital transformation has seen many rapidly move to insecure work-from-home systems, combine cloud and physical premises, and spread their estates over numerous new third-party providers with a view to slimming down their business models through outsourcing.
As a result, the financial services sector has opened its doors to new silos, operational blind spots and weaknesses, putting the industry at greater risk of IT failures and capital loss – not to mention creating huge inefficiencies.
It’s not the first time operational systems have been put to the test, but it is the first time that the C-suite will be front-and-centre if anything goes wrong. New regulations will see individual senior managers facing hefty fines among other punitive consequences.
With six months to go until the new regulations come into force, here are six top tips that all UK financial businesses should prioritise to achieve operational resilience and avoid regulatory and reputational backlash:
- Identify your transaction flows
The past fifty years have seen an astounding evolution of operations within the financial sector, layering new technologies onto legacy systems with little pause for thought.
These old IT systems are connected with new development and deployment techniques, creating unmanageably complex estates and rendering observability of transaction flows almost impossible.
To achieve operational resilience, firms must identify the paths that their key services use, then target and remove any points of weakness. That means building on modern, up-to-date software that can operate across multiple computers so that, if one fails, the rest can pick up the slack. For a long time, though, this has simply been seen as too expensive an overhaul.
Thankfully, third-party vendors are helping firms get their IT estates up to scratch quickly, consistently and affordably, without the need for a complete redo. Such vendors may be the key to allowing struggling firms – particularly smaller ones – to avoid legacy rot and move into the next era of digital transformation with minimum cost and maximum efficiency.
Of course, this is not a one-and-done process. As firms inevitably continue in their pursuit of digital transformation, they must keep in mind that constantly adding to the estate means no one is taking away yesterday’s work. Businesses must instead seek to replace or update the outdated bits. After all, it’s not digital expansion, it’s digital transformation.
They must also make sure not to rush. Over 60% of outages occur as a result of poor change management and could be avoided with more careful planning and a system to fall back on if things aren’t up and running in time.
- Get to know your performance and uptime
Businesses will soon be expected to declare the level of performance and uptime they are prepared to commit to and stick to it. This is something firms should start thinking about today as it will require significant historic data to accurately calculate.
Google has popularised Site Reliability Engineering (SRE), now the gold standard of uptime monitoring and performance delivery for internet giants and, increasingly, any firm with digital transformation ambitions. The SRE approach involves tracking data and trends over a long period to identify and quickly fix degrading performance levels, and uses both Service Level Objectives (SLOs) and Service Level Indicators (SLIs) as a two-phase early warning system so that firms are never close to breaching their Service Level Agreement (SLA).
Less digitally-native sectors like banking should follow Google’s lead and pursue an SRE approach to operations. While Google has the benefit of massive resources and an incredibly experienced team dedicated to the monitoring of this data, third-party providers can support smaller businesses with remote specialists and purpose-built software.
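As a rough illustration of the two-phase warning described above, the sketch below computes an availability SLI from request counts and compares it first with an internal SLO and then with the externally committed SLA. The thresholds, the function names and the `AvailabilityTargets` class are assumptions made for this example only; they are not taken from Google's tooling or from any regulator's guidance.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityTargets:
    slo: float  # internal objective, e.g. 0.999 (illustrative figure)
    sla: float  # externally committed level, e.g. 0.995 (illustrative figure)


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the window."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def classify(sli: float, targets: AvailabilityTargets) -> str:
    """Two-stage warning: falling below the SLO comes first, the SLA second."""
    if sli < targets.sla:
        return "sla_breach"
    if sli < targets.slo:
        return "slo_warning"
    return "ok"


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the allowed failure rate (1 - slo) still unspent, floored at 0."""
    allowed = 1.0 - slo
    spent = 1.0 - sli
    if allowed <= 0:
        # A 100% objective leaves no budget to spend.
        return 1.0 if spent <= 0 else 0.0
    return max(0.0, 1.0 - spent / allowed)
```

Because the SLO is set tighter than the SLA, an `slo_warning` gives the operations team time to act while the firm is still inside its committed service level.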
- Optimise Cloud usage
The pandemic has been an added boost to the Cloud’s upward trajectory, seeing 9 in 10 firms push cloud usage slightly or significantly higher during this period. Concurrently, the number of organisations spending at least $12 million on cloud annually has nearly doubled this year compared to last, with a whopping 30% of that spend going to waste.
This is because accurately forecasting cloud costs and demands is way more challenging than most would first assume. Far from being a like-for-like ‘lift and shift’ transition, in which the Cloud estate is mapped out as a virtual equivalent of the physical estate, moving to the Cloud requires enormous planning if it’s to be done efficiently.
A comprehensive stock take of the demand profile of business workloads is a critical first step. Firms must begin by right-sizing their estate and developing a thorough understanding of workload behaviour and demand profiles via detailed analytics.
Once a company gathers all this information, it can optimise its environment for the right workload configuration and accurately plan its monthly cloud spend based on a right-sized environment. This means more accurate instance sizes and, in the majority of cases, decreased financial input.
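The right-sizing step can be sketched in a few lines. The example below takes observed vCPU demand for a workload, finds a high percentile of that demand, adds headroom and picks the cheapest instance that covers it. The instance catalogue, its prices, the 95th-percentile choice and the 20% headroom are all hypothetical values chosen for illustration, not real provider figures or a recommended policy.

```python
import math

# Hypothetical instance catalogue: (name, vCPUs, monthly cost in USD).
# The figures are illustrative, not real provider pricing.
CATALOGUE = [
    ("small", 2, 60.0),
    ("medium", 4, 120.0),
    ("large", 8, 240.0),
    ("xlarge", 16, 480.0),
]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def right_size(vcpu_demand_samples, headroom=0.2, pct=95):
    """Pick the cheapest catalogue entry that covers the pct-th percentile
    of observed vCPU demand plus a safety headroom."""
    required = percentile(vcpu_demand_samples, pct) * (1 + headroom)
    for name, vcpus, cost in sorted(CATALOGUE, key=lambda row: row[2]):
        if vcpus >= required:
            return name, cost
    raise ValueError("demand exceeds the largest instance in the catalogue")
```

Under these assumptions, a workload provisioned on the hypothetical `large` instance whose 95th-percentile demand is 3 vCPUs would be moved to `medium`, halving its monthly cost in this toy catalogue.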
- Know your limits
Say you’ve had an issue with change management or Cloud migration and a few of your customers are experiencing IT failures. As soon as the rest of your customers get wind of this – which could be within minutes in the age of Twitter – they all rush to check their applications, compounding the issue by pushing demand beyond what the systems can handle.
In order to know for sure that the production environment is going to run properly at peak demand, pre-testing is essential to gauge what it can withstand. Firms need to not only identify the overall capacity ceiling of their systems, but specific bottlenecks and pinch points that can affect overall performance.
The right software will enable firms to model certain levels of demand on their systems. Load testing can simulate the number of users on a platform to see at what point the system will fail, so that firms can provision for it precisely.
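The idea of ramping demand until the system fails can be illustrated with a toy model. In the sketch below, `simulated_latency` is a stand-in for a real load generator: it uses the textbook M/M/1 queue formula for mean response time, with an assumed service rate of 100 requests per second. `find_capacity_ceiling` then steps the load upwards and reports the last level that stayed within a latency limit. Both function names and every number here are assumptions for the example; a real exercise would replace the probe with measurements from the production-like environment.

```python
def simulated_latency(requests_per_sec, service_rate=100.0):
    """Toy stand-in for a real load test: mean response time in seconds
    of an M/M/1 queue, W = 1 / (mu - lambda). Returns inf once saturated."""
    if requests_per_sec >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - requests_per_sec)


def find_capacity_ceiling(probe, max_latency, start=10, step=10, limit=1000):
    """Ramp load in fixed steps and return the highest tested load whose
    latency stayed within max_latency, or None if even `start` fails."""
    ceiling = None
    load = start
    while load <= limit:
        if probe(load) > max_latency:
            break  # first load level that breaches the latency limit
        ceiling = load
        load += step
    return ceiling
```

Running the same ramp against individual components, rather than the whole system, is one way to locate the specific bottlenecks and pinch points mentioned above.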
Underpinning this is the dire need for monitoring. Though the pandemic and oncoming regulations have rendered operational resilience more essential than ever, fewer businesses are now deploying security monitoring tools or undertaking any form of user monitoring compared with this time last year.
This is a trend that needs to be reversed as soon as possible if firms are to meet the March 31st deadline. With disparate data and flashing alerts all flooding in at the same time, manual processing is inadequate and the right technology is crucial. By onboarding a proactive monitoring system that encompasses physical, cloud and third-party estates, firms can suppress the white noise and home in on what’s valuable in real time, helping them predict and mitigate IT failures before they occur.
- Integrate security into operations
In 2020, four in ten businesses (39%) in the UK reported having cybersecurity breaches. Among those, around a quarter experienced them at least once a week.
As opposed to traditional conceptions of security as separate to operations, firms must begin to integrate it into their operations and operational mindset from the get-go. Everyone involved in production should be trained with equal awareness of the critical importance of cybersecurity to ensure that not a single person in the business will let in that Trojan horse. This is particularly important in a COVID-normal world where remote working is increasingly the modus operandi for many.
The new best practice approach involves Zero Trust Networks – challenging firms to provide proof for each transaction made, even inside their own data centre.
- Nominate a Chief Resilience Officer
Finally, businesses that want to get on the front foot of new senior management requirements – namely SMF24 in the UK – should look to designate a senior leader to focus solely on operational resilience so that the C-suite’s slate is clean by the time they come under scrutiny. The fact that SMF24 will be backdated to cover past indiscretions makes this all the more important to get on top of today.
Many have already made the start, with the UK’s banking and financial services sector now boasting over 60 individuals with the title Chief Resilience Officer (or equivalent) on LinkedIn.
And make sure the CRO is empowered to do what is necessary! They are on the line personally with the regulator.
The countdown is on
The resilience of IT systems no longer falls to the back office. To meet requirements on time and avoid punitive consequences, including hefty fines on individual senior managers, the C-suite must commit serious investment towards data analytics and estate monitoring technology.
Come March 31st, there will be no excuses made for shortcuts or sub-par capabilities. While it might seem costly at a time when most businesses are operating on small margins, the bottom line is this: if you say you can’t afford to prioritise the operational resilience of your systems, then you can’t afford to be in business.