By Timothy Wiffen, head of Calypso practice, and Alistair Milne, product manager, Formicary
Although such incidents are rare, there seems to have been a spate of IT problems hitting trading floors in recent years, resulting in suspended trading and trading losses, as reported in the Wall Street Journal.
In April 2014, a technical glitch at CME Group Inc. halted electronic trading in corn and other commodities, and saw the world’s largest futures exchange operator using shouting floor traders to fill orders instead of the computers that have largely replaced them. The problem was rectified after two hours but its impact was widely felt, as futures and options trading was halted in 31 different markets ranging from corn to wheat to live cattle to rainfall futures.
More recently, in June 2014, NYSE Liffe said a technical glitch halted trading in billions of euros of futures contracts tied to European money-market rates, leaving some traders frustrated for several hours.
When an exchange is impacted it is a very public affair, but every financial institution knows that problems with its internal trading platforms can be just as damaging. Any slowdown in transaction processing – whether it’s over a few minutes or several hours – can have dramatic consequences, particularly as the move towards superfast, high-volume trading continues apace.
Put simply, the quicker a problem can be identified and addressed, the less the financial impact.
It’s vital, therefore, that IT teams address technical issues, however minor, as soon as possible, before they escalate and go on to affect business performance.
Ideally, this means being aware of problems the precise moment they occur, not after the event when the causes of the issue may be harder to identify and there’ll be an ever growing number of log files to trawl through.
Easier said than done for many financial IT departments, however. Their resources are often already stretched as they try to strike a balance between accommodating evolving regulations such as EMIR or Dodd-Frank and addressing traders’ unrelenting push for reliability and speed.
This pressure means that even the most diligent systems manager often finds themselves firefighting one problem before moving on to the next. There is little time for any post mortem into why a problem occurred so that the necessary proactive measures can be put in place to avoid a similar instance occurring again.
Monitoring the monitoring
Of course, most firms will have some form of platform monitoring in place, often requiring a sizable team to manage it on a daily basis.
While such tools advise when a major problem – such as a complete outage – has occurred, by the time the error message has eventually reached the team or individual who can address it, more often than not, the damage has already been done and the business impacted. This puts the IT team on the back foot once again as they strive to investigate what’s happened and pull in the appropriate technical experts to address the problem.
Instead, having an early warning trigger that can spot minor issues as they build up can keep systems managers one step ahead of a potential problem before it becomes a reality.
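As a rough illustration of what such an early warning trigger might look like, the sketch below tracks a rolling average of trade-processing latency and raises a warning well before the level that would count as an outage. The class name, the window size and the thresholds are all assumptions made for the example, not features of any particular monitoring product.

```python
from collections import deque
from statistics import mean


class EarlyWarning:
    """Flags a sustained slowdown before it becomes an outage.

    Hypothetical example: the window size and thresholds are illustrative.
    """

    def __init__(self, window=5, warn_ms=200.0, critical_ms=1000.0):
        self.samples = deque(maxlen=window)  # keeps only the latest samples
        self.warn_ms = warn_ms
        self.critical_ms = critical_ms

    def observe(self, latency_ms):
        """Record one latency sample and return the current status."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.samples.maxlen:
            return "ok"  # not enough data to judge a trend yet
        average = mean(self.samples)
        if average >= self.critical_ms:
            return "critical"
        if average >= self.warn_ms:
            return "warning"
        return "ok"


if __name__ == "__main__":
    monitor = EarlyWarning(window=3)
    for latency in (100, 150, 180, 300, 400, 2500):
        print(latency, monitor.observe(latency))
```

Averaging over a window rather than reacting to a single slow trade is a deliberate choice here: it trades a little reaction time for fewer false alarms.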
Generic vs Bespoke
Just as trading platforms such as Calypso, Murex, Misys Summit FT or Front Arena become more sophisticated, so too must the tools required to manage them for optimum performance.
It’s only by having deep visibility into the inner workings of a platform that a systems manager stands a chance of being able to identify a higher number of technical glitches. For example, if a system has stopped processing trades, a systems manager would want to know exactly when the problem happened and what else is or isn’t working elsewhere, in order to help pinpoint the root of the problem and stop a domino effect of downtime occurring.
One option is to have a number of different generic systems in place to cover, for example, log processing and machine and system monitoring. Although such tools are supported by a strong cross-industry user base, much of it lies beyond the financial services space, and seeing a bigger, integrated picture can be a challenge.
Investigating problems using out-of-the-box monitoring systems can take considerable effort, however, as data from disparate systems and the platforms themselves needs to be manually correlated to identify the cause of a problem. As such, many firms underestimate the effort and the associated time needed to manage issues and, in turn, are shocked when they calculate the total cost of ownership for a platform.
Alternatively, bespoke, platform specific monitoring tools can be designed specifically to infiltrate the intricate aspects of a trading system and provide a single, cohesive overview of activity.
They can passively monitor platform performance on a continual basis and automatically notify the support team should a problem arise. This proactivity can even go a stage further, with different alerts being directed to appropriate teams or individuals for even faster response.
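The idea of directing different alerts to different teams can be sketched as a simple lookup. Everything named below (the alert categories, the team names and the escalation rule for critical alerts) is a hypothetical example rather than a description of how any specific tool behaves.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert category to the team that owns it.
ROUTES = {
    "database": "dba-team",
    "trade-processing": "platform-support",
    "market-data": "market-data-support",
}
DEFAULT_TEAM = "service-desk"       # used when no specific owner is known
ESCALATION_CONTACT = "on-call-manager"


@dataclass(frozen=True)
class Alert:
    category: str
    severity: str  # "warning" or "critical" in this sketch
    message: str


def route(alert):
    """Return the list of recipients for an alert."""
    recipients = [ROUTES.get(alert.category, DEFAULT_TEAM)]
    if alert.severity == "critical":
        # Critical alerts also go to the escalation contact.
        recipients.append(ESCALATION_CONTACT)
    return recipients


if __name__ == "__main__":
    print(route(Alert("database", "warning", "Connection pool near limit")))
    print(route(Alert("pricing", "critical", "Pricing engine not responding")))
```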
Because such tools are built with a deep knowledge of the domain, there is even the potential for them to include an element of intelligence, enabling them not only to predict when a problem is pending but also to suggest what could have caused it.
Additionally, automating platform monitoring in this way can free up those individuals who were previously tasked with investigation, enabling them to be redirected to more proactive activities.
Fit for the future
Having great visibility into how a platform is being used can support the broader business beyond troubleshooting, such as by providing data on system use and capacity planning for business expansion. For example, assessing how the number of users logging on may have increased over time or how database use has grown can help support IT infrastructure investment decisions.
As the reliance on technology and the number of people accessing systems within an organisation continue to expand, the pressure facing IT teams is set to get more intense, particularly as greater interaction and more complex systems bring with them an increased risk of things going wrong.
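To make the capacity planning point concrete, the sketch below fits a straight line to a series of usage figures and estimates how many more periods remain before a given capacity is reached. The function names and the sample figures are invented for illustration, and a straight-line trend is itself an assumption; real usage may grow in steps or seasonally.

```python
def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through
    values observed at periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until(values, capacity):
    """Estimate periods remaining until the trend line reaches capacity.

    Returns None when usage is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    period_at_capacity = (capacity - intercept) / slope
    return max(0.0, period_at_capacity - (len(values) - 1))


if __name__ == "__main__":
    monthly_users = [100, 110, 120, 130]  # invented sample data
    print(periods_until(monthly_users, capacity=200))
```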
Being able to predict when a platform is at risk and keeping downtime to a minimum will ultimately create time to focus on running the business, embark on innovative initiatives and move not just the IT department but the entire business forward.
About the authors:
This article is written by Timothy Wiffen and Alistair Milne of Formicary Ltd, an IT consultancy specialising in system integration for the financial services sector. Formicary is a Calypso Business & Service Partner, Murex Business Partner and LCH.Clearnet SwapClear CCP² Certified Partner. Formicary has developed CalMon, a bespoke system monitoring tool for Calypso which centrally monitors the trading platform environments and proactively responds to issues with real-time assessment, helping to reduce system failure and system downtime.