By Robert B. K. Dewar
Robert Dewar is a professor of computer science at New York University, and president of AdaCore, a company that specializes in providing tools for building reliable software (www.adacore.com).
As we all know, modern banking depends on complex software. Just as we entrust our lives to software when we board a plane, we trust our money to software when we do business with any bank today or when we invest in the stock market. So how are we doing? On planes, pretty well: as far as we know, no one has ever died from a software bug on a commercial airliner. With banks, not so well! Recently we have seen major problems. Notable among these are the NatWest meltdown that left a very large number of customers without access to their accounts, and the trading software malfunction that caused chaos on the market and cost the company half a billion dollars in half an hour. We can read of such happenings every week, and of course we know that banks are not eager to disclose problems, so undoubtedly there are major behind-the-scenes failures that we never hear about. But whenever banks lose money, it is eventually the customers who pay.
What is remarkable about these banking disasters is the tendency to dismiss them as “glitches”. For example, we read in The Telegraph: “Up to 12 million NatWest and Royal Bank of Scotland customers are still unable to pay bills or move money after a computer glitch left their accounts frozen.” We put the word “glitch” in quotes in the title for a reason. What does this word mean to you? A typical dictionary definition (this one from www.thefreedictionary.com) reads: “A minor malfunction, mishap, or technical problem”. In short, the sort of minor mistake that anyone could make at any time, and which has an air of inevitability about it. After all, who could prevent the occasional glitch? The use of the word is a way of disclaiming responsibility. At least one online dictionary notices what is going on with this word: dictionary.reference.com gives a second meaning, “Computers: any error, malfunction, or problem”. Note that “minor” has disappeared as a qualifier. I am quite sure that if I had been one of those NatWest customers, I would not have been willing to see my serious situation dismissed as minor!
So why the difference between banks and planes? And what is to be done about it? Is banking software somehow much more complex than avionics software? Definitely not! Is there some fundamental difference between software requirements for planes and banks? Not that I can see! The difference is simply one of due care and expectations. On planes, we are very aware of how critical the software is, and we have developed technologies that come very close to guaranteeing freedom from serious errors in avionics software. These include rigorous development procedures and the use of standards such as DO-178B, to which all avionics software must conform, to help guarantee reliability. We don’t make an absolute claim of 100% perfect software; even with such procedures, there have been examples of bugs found. But for sure software is not the weak link in avionics safety. Banks, on the other hand, develop software in secret using the same kind of lax procedures that bring software to your PC that crashes frequently. They are far too happy to dismiss such mishaps as glitches, fix them as quickly as possible, and hope that they can get away without causing too much mayhem!
Could we and should we expect more? To me the answer is a resounding yes! When Tesco discovers that it has been selling tainted meat to customers, it’s a major news story and the repercussions are going to be heard for a long time. But Tesco certainly does not attempt to dismiss this as a glitch. Yet when a banking software error causes major mayhem, we are much too ready to accept this kind of error as inevitable. Why can’t we demand that our banks exercise the same kind of care in writing software that we expect of aircraft manufacturers? The technologies, in the form of more reliable programming languages, more reliable procedures, rigorous testing requirements, the use of mathematical proof techniques, formalised specification and so on, are well known. Furthermore, given the huge cost of bank software failing, whether through plain bugs or through vulnerabilities to hackers bent on evil, it is probably less expensive to do things right in the first place.
So, why does the current situation continue? Partly it’s just what people are used to. Our students in universities are not trained in the techniques and tools needed for reliable software. Let’s just take one technical example that’s easy to understand. When you are testing software, it’s obviously a good idea to make sure that every line of code has been tested at least once. Furthermore, if you have a test of the form:
if Credit > 0 then
   Record_Credit;
else
   Send_Bill;
end if;
then it is obviously a good idea to test both possibilities. This is called coverage testing, and it is a standard technique required for all avionics code development. A few years ago, teaching a graduate course in programming languages at New York University, I asked my students, most of whom were professional programmers, many working at banks: “How many of you have used coverage testing in your work?” The answer: one out of about eighty students! And this is just an elementary first step in improving the reliability of code. The infamous AT&T bug of 1990, which crippled long-distance telephone service for about nine hours (back when we expected telephones to be reliable), was due to code that had never been executed. If you google for “List of Software Bugs”, you will find a Wikipedia article full of similar events.
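To make the idea concrete, here is a minimal sketch in Python of what testing both branches looks like. The function and the ledger representation are purely illustrative inventions, not taken from any real banking system; they simply mirror the Ada fragment above.

```python
# Illustrative sketch of branch coverage testing (names are hypothetical).

def process_account(credit, ledger):
    """Record a credit if positive, otherwise issue a bill."""
    if credit > 0:
        ledger.append(("credit", credit))   # the "then" branch
    else:
        ledger.append(("bill", credit))     # the "else" branch

def test_both_branches():
    # Coverage testing demands at least one test case per branch.
    ledger = []
    process_account(100, ledger)   # exercises the "then" branch
    process_account(0, ledger)     # exercises the "else" branch
    assert ledger == [("credit", 100), ("bill", 0)]

test_both_branches()
```

In practice one would not count branches by hand: a coverage tool such as coverage.py can report mechanically which lines and branches the test suite never executed, which is exactly how untested code like the AT&T example gets caught before deployment.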
Another reason, or really we should say excuse, that is sometimes given is that in banking, requirements change too rapidly to make it practical to test software and make it reliable. In the wake of the trading glitch mentioned earlier in this article, the SEC suggested the possibility of requiring trading software to be tested before it was deployed. An Op-Ed piece in The New York Times the next day claimed this was a ludicrous idea, because the development and deployment of such software was so dynamic that testing was impractical. Well, sorry! We expect banks to take care of our money. We appreciate new developments like ATMs that can scan checks, but we don’t need such bells and whistles tomorrow. We do need to be sure that our money is in reliable hands, and it is time to insist that banks clean up their act and ensure that the software we and they depend on works properly. No lesser standard of care is acceptable!