Big Data & Fraud Detection

Kiran Kalmadi and Niraj Juneja, Principal Consultants for Financial Services and Insurance at Infosys.
Fraud is on the rise. Collectively, this global criminal enterprise is hitting financial services firms with a severe one-two punch.

The first punch involves cash. Lots of it. In 2012 alone, fraud losses on UK cards totalled £388 million, a 14 per cent increase on the previous year’s total of £341 million, according to the UK Card Association. But the second part of the one-two punch might be even worse. That’s because fraud run amok can cause an immeasurable amount of harm to a bank’s reputation.

Bank fraudsters have gone digital, and keeping up with them has become a monumental task for those firms unwilling to invest in the right security tools. Smart global banks, however, are embracing the benefits of Big Data to stop criminals dead in their tracks. Banks are finding that the most effective tool to combat fraud is to develop algorithmic machine learning programs that out-fox even the most sophisticated digital criminals.

Benefits of Big Data
Machine learning begins with Big Data. True, it’s a term that means different things to different people. But at its core, the term represents the notion that to leverage data (large, varying, and fast-changing datasets) we need a new set of so-called Big Data technologies. Although these technologies are as varied as Hadoop, in-memory databases, and NoSQL, they collectively shun the idea that traditional relational databases are the gold standard for storing and querying data.

Big Data technologies enable us to extract insights either as visualisations for human consumption or as mathematical equations (what are known as predictive models) for consumption by computers. In the realm of fraud detection, Big Data technologies help us win what’s lately been a losing battle involving three very important data elements: quality, timeliness, and breadth.

Today’s cybercriminals have a leg up on all three of these vital data elements. That’s why it’s essential to develop an integrated fraud prevention plan based on a combination of real-time anomaly detection and machine learning models. Such programs can tap into massive amounts of data created inside and outside the bank, which is the key to successfully curbing fraud in today’s digital age.
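To give a flavour of what real-time anomaly detection can look like, here is a minimal Python sketch. It is not taken from any bank’s system: the class name, the z-score threshold of 3, the warm-up of five observations, and the transaction amounts are all assumptions chosen for illustration. It keeps a running mean and variance of one customer’s transaction amounts and flags any amount that sits far outside what has been seen so far.

```python
import math


class RunningAnomalyDetector:
    """Flags values far from the running mean, one observation at a time.

    Hypothetical illustration only; threshold and warm-up are assumed values.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score cut-off (assumption)
        self.warmup = warmup        # observations required before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, amount):
        """Return True if `amount` looks anomalous, then update the statistics."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            # A zero standard deviation gives no scale to compare against.
            if std > 0 and abs(amount - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and squared deviations.
        # Flagged values are folded in as well, to keep the sketch simple.
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)
        return is_anomaly


detector = RunningAnomalyDetector()
amounts = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 950.0, 23.0]  # made-up data
flags = [detector.observe(a) for a in amounts]
print(flags)  # only the 950.0 payment is flagged
```

A production system would combine many such signals with learned models rather than rely on a single statistic, but the principle of scoring each event as it arrives is the same.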

Antiquated fraud detection
Fraud detection is a predictive analytics problem. Predictive analytics techniques work by extracting patterns from past datasets to predict the future. These techniques typically assume that the future will mimic the past. To be effective, they require accurate, timely, and broad datasets. The incumbent approaches to fraud detection falter on all three of these dimensions, which makes their efficacy questionable at the very least. Let’s discuss the shortcomings on each dimension:

Quality: The datasets that banks use in fraud analysis are of poor quality because not all fraud cases were detected. What should have been identified as fraud was tagged as not-fraud, which led to sub-optimal models with poor predictive quality. A good fraud detection model is one that has a high tolerance for outlier cases and can withstand false negatives without deteriorating.

Timeliness: Unlike human consumption patterns that help predict the next product we are likely to buy, fraud patterns change with time. Fraudsters adopt new tricks as old ones start failing. As a result, there is a strong need to detect new fraud patterns as they emerge in the field and then to deploy updated models. Traditional analytical approaches typically take three to six months to develop and deploy into fraud detection systems. Set against a criminal bent on fraud, that timeframe is laughably slow.

Breadth of Data: The effectiveness of fraud detection models increases if we have more data around various scenarios. Combining varied datasets, such as logs of Internet activity, mobile interactions, and the data stored in a bank’s systems, helps build a complete picture of the people involved. Models can then improve the accuracy of their predictions. Most fraud detection systems today don’t leverage unstructured data outside a company’s systems.
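The quality problem described above can be made concrete with a small, hedged Python sketch. The ten transactions and all three label lists below are invented for illustration. The sketch scores the same model twice: once against the labels a bank recorded (where two frauds were never detected) and once against the true labels.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of 0/1 labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 1 = fraud, 0 = legitimate
true_labels     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
recorded_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # two frauds went undetected
model_flags     = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

against_recorded = precision_recall(model_flags, recorded_labels)
against_truth = precision_recall(model_flags, true_labels)

print(against_recorded)  # (0.5, 1.0)
print(against_truth)     # (0.75, 0.75)
```

Judged against the recorded labels, the model appears to catch every fraud while raising too many false alarms. Judged against the truth, it is more precise than it looked and it misses a quarter of the fraud. Mislabelled history distorts both how models are trained and how they are evaluated.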

Analytics are not created equally
Sophisticated companies are using machine learning powered by Big Data technologies to build state-of-the-art fraud models. Traditional statistical approaches and modern-day machine learning techniques are based on the same mathematics. But the terminology, culture, and toolset used in the two disciplines are so different that machine learning merits treatment as a distinct discipline in itself.

Machine learning has origins in the Artificial Intelligence world. Companies like Google and Amazon utilise machine learning to build automated predictive models.

For fraud detection, machine learning powered by Big Data technologies has a unique advantage in relation to the timeliness issue. Machine learning is essentially a group of algorithms whose predictive models improve as more data is fed into them. The algorithms learn from data and keep on improving over time (see graph). With Big Data feeding into machine learning algorithms, the efficacy of fraud models improves significantly over time. This is the complete opposite of traditional approaches, where the model deteriorates over time.
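The claim that models improve as more data arrives can be illustrated with a toy Python sketch. Everything in it is an assumption made for the example: legitimate transactions get a risk score drawn from a normal distribution centred on 0, fraudulent ones from a normal distribution centred on 3, and the "model" is just a threshold placed midway between the two sample means. It is not a real fraud model, only a way to watch accuracy rise with training set size.

```python
import math
import random
import statistics


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_accuracy(threshold, fraud_mean=3.0):
    """Accuracy of 'flag if score > threshold', assuming equal class sizes."""
    legit_correct = normal_cdf(threshold)                      # legit ~ N(0, 1)
    fraud_correct = 1.0 - normal_cdf(threshold - fraud_mean)   # fraud ~ N(3, 1)
    return 0.5 * (legit_correct + fraud_correct)


def learn_threshold(rng, n_per_class, fraud_mean=3.0):
    """'Train' by placing the threshold midway between the two sample means."""
    legit = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    fraud = [rng.gauss(fraud_mean, 1.0) for _ in range(n_per_class)]
    return (statistics.fmean(legit) + statistics.fmean(fraud)) / 2.0


def mean_accuracy(n_per_class, trials=200, seed=0):
    """Average accuracy over many independently trained toy models."""
    rng = random.Random(seed)
    scores = [exact_accuracy(learn_threshold(rng, n_per_class))
              for _ in range(trials)]
    return statistics.fmean(scores)


small = mean_accuracy(n_per_class=2)     # models trained on 4 examples
large = mean_accuracy(n_per_class=500)   # models trained on 1,000 examples
print(f"small data: {small:.3f}, large data: {large:.3f}")
```

In this synthetic setting the best achievable accuracy is about 93 per cent, and the models trained on more examples land closer to it on average. Real fraud data is far messier, so the sketch shows only the direction of the effect, not its size.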

Machine learning is also related to neural networks. Decision trees and random forest methods have also shown better resilience in handling the poor-quality datasets inherent in fraud problems. For example, experts cite random forest as the most successful technique used by data scientists in winning predictive modeling competitions hosted on

Data beats mathematics
There is only so much you can do by optimising the mathematics that goes into building predictive models. Ultimately, what really lifts the accuracy of models is more data (size and breadth). Banks collect a lot of data from customers through an array of service preferences in order to know them better. They also have systems in place to monitor or gather data on customers’ daily transactions (deposits, withdrawals, etc.). Banks can also monitor data from blogs, chat archives, feedback, survey responses and other forms of structured and unstructured data from multiple channels.

Numerous research papers (e.g., Unreasonable Effectiveness of Data) and enterprise scenarios like Google search have shown that feeding more data (size and breadth) into algorithms leads to a greater lift in model performance than spending more time optimising the models. Not surprisingly, therefore, we see a big difference when using Big Data technologies for fraud detection. The ability to run models on the whole population rather than on samples, combined with the ability to tap into non-traditional data formats available from social networks and emails, creates a capability for fraud detection that has not existed … until now.
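A short piece of arithmetic shows why running on the population rather than a sample matters for rare events like fraud. The figures in this Python sketch are hypothetical, chosen only for illustration, and it assumes each transaction is included in the sample independently with the same probability.

```python
def expected_cases_in_sample(population_cases, sample_fraction):
    """Expected number of cases that end up in a random sample."""
    return population_cases * sample_fraction


def prob_pattern_missing(population_cases, sample_fraction):
    """Chance that none of the cases land in the sample, assuming each
    transaction is sampled independently with probability sample_fraction."""
    return (1.0 - sample_fraction) ** population_cases


# Hypothetical figures, not taken from any real bank
transactions = 10_000_000
fraud_rate = 0.001            # 0.1 per cent of transactions are fraudulent
sample_fraction = 0.01        # analysts model a 1 per cent sample
rare_pattern_cases = 50       # a new fraud trick seen only 50 times so far

fraud_cases = transactions * fraud_rate                                  # about 10,000
fraud_in_sample = expected_cases_in_sample(fraud_cases, sample_fraction)  # about 100
missing = prob_pattern_missing(rare_pattern_cases, sample_fraction)       # about 0.605

print(f"fraud cases in population: {fraud_cases:.0f}")
print(f"expected fraud cases in sample: {fraud_in_sample:.0f}")
print(f"chance the new pattern is absent from the sample: {missing:.3f}")
```

Under these assumptions a 1 per cent sample keeps only about 100 of the 10,000 fraud cases, and there is roughly a 60 per cent chance that the emerging pattern does not appear in it at all. A model built on the full population sees every case.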

Global corporations measure success in a number of ways. Accountants, for instance, can demonstrate year-over-year financial gains on income statements. Then there are the intangible aspects of success, what an accountant might call goodwill. Elements like reputation and intellectual capital might take decades for a company to build up. Yet sophisticated fraudsters are damaging the reputations of once-respected banks by running circles around them in cyberspace. That’s why financial services firms need to utilise Big Data. By leveraging large, diverse, and fast-changing datasets, Big Data technologies take fraud detection leaps and bounds ahead of traditional approaches. By storing and analysing data in new ways, financial institutions can detect fraud in advance and beat criminals with a one-two punch of their own.

About the Authors:
Niraj Juneja is a Principal Consultant in Infosys’s Management Consulting division. His focus is on using data science techniques to enable better decision making for Financial Services firms. As a practitioner of analytical techniques that use Big Data technologies, Niraj believes that the traditional approaches to decision making that rely heavily on recommendations from gurus and human intuition will undergo a major shift towards “data driven” decision making enabled by Big Data technologies.

Niraj has several years of experience consulting for Fortune 100 Financial Services firms and has successfully executed large-scale, data-driven business transformation programs.

Kiran Kalmadi is a Principal Consultant in the Financial Services and Insurance (FSI) business unit and leads the FSI Research team. He has around 13 years’ experience in bespoke research and analysis for strategy development, consulting, marketing and business development.

Kiran has worked extensively in the retail banking and payments domain and has been involved in developing research-based consultative insights and analysis for business pursuits and client engagement. He has a keen interest in Social Media, Payments, Analytics, Internet, and Mobile Banking and their adoption by financial institutions.

