Emil Eifrem of Neo Technology explains why traditional relational database technologies aren’t financial services organisations’ best tools for stopping sophisticated scams

Emil Eifrem

While no fraud prevention measure can ever be 100% bullet-proof, risk managers can get a lot of help from a new approach to working with banking data: one that focuses on relationships and uncovers the hidden patterns of suspicious activity, in order to head off the fraudsters.

But when we say ‘relationships,’ isn’t that what our existing relational database management systems are doing? The answer is yes, but there’s a subtle yet critical difference. SQL-based databases are great, but essentially you have to tell them what to look for, while the technique we’re discussing starts from points of data and follows all the connections between them.

And looking at data relationships doesn’t necessarily mean gathering new or more data. The key is to look at existing data in a way that makes the underlying connections explicit.

This is achieved by a powerful, well-proven approach that is gaining increasing attention: graph databases. A growing number of enterprises, banks and financial institutions are using them to solve a variety of data problems, in particular to identify advanced fraud scenarios, and in real time, too.

A case in point: PayPal uses graph techniques to perform sophisticated fraud detection on eBay and StubHub transactions. Market watcher IDC estimates that this has already saved the company more than $700 million and also enables it to perform predictive fraud analysis.

If they act the same as a proper customer, how do you spot the electronic burglar?

Let’s explore in more detail how relationship-spotting can help in anti-fraud activities and risk management. As we know, there are various types of fraud: first-party bank fraud, insurance fraud, and e-commerce fraud, to name some of the most troublesome. First-party fraud involves criminals who apply for credit cards, loans, overdrafts and unsecured banking credit lines, but who have no intention of paying the money back. That is a serious problem for banking institutions: it’s estimated that as much as 10 to 20% of unsecured bad debt at leading US and European banks is drawn from this form of deceit.

The surprisingly large size of these losses is due to first-party fraud being hard to pick up, as fraudsters behave in the same way as legitimate customers until the day they cash in their inflated accounts and abscond with the money. At the same time, there is the relationship between the number of participants and the value of their illegal gains, where a small number of perpetrators can translate into a distressingly high number of illegitimate transactions. A fraud ring of just two individuals, sharing only a phone number and an address, can create 2² = 4 synthetic identities with fake names. With 4-5 accounts for each synthetic identity, that makes a total of around 18 accounts. Assuming an average of £4,000 in credit exposure per account, the bank’s loss could be a whopping £72,000 as a result, and perhaps a lot more.
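The arithmetic behind that example can be reproduced in a few lines of Python. This is only an illustrative sketch: the phone and address labels are made up, and 4.5 accounts per identity is simply the midpoint of the 4-5 range quoted above.

```python
from itertools import product

# Hypothetical labels: each of the two ring members contributes
# one real phone number and one real address.
phones = ["phone_A", "phone_B"]
addresses = ["address_A", "address_B"]

# Every phone/address combination can back one synthetic identity
# (a fake name attached to real, shared contact details).
synthetic_identities = list(product(phones, addresses))

accounts_per_identity = 4.5      # assumed midpoint of the 4-5 range
exposure_per_account = 4000      # average credit exposure in GBP

total_accounts = len(synthetic_identities) * accounts_per_identity
total_exposure = total_accounts * exposure_per_account

print(len(synthetic_identities), total_accounts, total_exposure)
```

Adding a third ring member, or a third shared identifier, grows the list of combinations multiplicatively, which is why small rings can do outsized damage.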

Catching fraud rings and stopping them before they cause damage is what we all want to be able to do. One reason this is still a rarity is that traditional methods of fraud detection are not geared to look for the right thing: in this case, the fraud rings that are created by shared identifiers.

What these schemes all have in common is layers of indirection that hide the crime, and that can only be uncovered via connected analysis. The good news is that while these exponential relationships are what make the schemes so damaging for banks, they are also the characteristic that makes them open to graph techniques. Standard stats-based tools, which look for things such as deviations from normal purchasing patterns, use discrete data, not the connections we’ve been discussing. Discrete methods are useful for catching fraudsters acting on their own, but fall short when it comes to collectives who may work cross-border, even cross-continent. Furthermore, such methods are prone to the notorious ‘false positive,’ which creates undesired side-effects in the form of annoyed customers and lost revenue opportunities.

Doable in SQL, yes – but not easily or cheaply

By contrast, uncovering fraud rings with traditional relational database technologies requires modelling the data as a set of tables and columns, then carrying out a series of complex joins and self-joins. Queries like that work, but are complex to build and expensive to run. Scaling them in a way that supports real-time access also poses significant technical challenges, with performance becoming exponentially worse not only as the size of the ring increases, but also as the size of the total data set grows.
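To make the self-join point concrete, here is a small sketch using Python’s built-in sqlite3 module. The account_holder table, its columns and the sample rows are all invented for illustration and are not taken from any real banking schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_holder "
    "(id INTEGER PRIMARY KEY, name TEXT, phone TEXT, address TEXT)"
)
# Hypothetical sample data: Alice and Bob share a phone number,
# Bob and Carol share an address, Dave shares nothing.
conn.executemany(
    "INSERT INTO account_holder VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "555-0101", "1 Elm St"),
        (2, "Bob", "555-0101", "9 Oak Ave"),
        (3, "Carol", "555-0199", "9 Oak Ave"),
        (4, "Dave", "555-0142", "3 Pine Rd"),
    ],
)

# One self-join finds holders who directly share an identifier.
direct_links = conn.execute("""
    SELECT a.name, b.name
    FROM account_holder a
    JOIN account_holder b
      ON a.id < b.id
     AND (a.phone = b.phone OR a.address = b.address)
    ORDER BY a.id, b.id
""").fetchall()

# Linking Alice to Carol through Bob already needs a second self-join;
# every further hop in the ring adds another one.
two_hop_links = conn.execute("""
    SELECT DISTINCT a.name, c.name
    FROM account_holder a
    JOIN account_holder b
      ON a.id <> b.id
     AND (a.phone = b.phone OR a.address = b.address)
    JOIN account_holder c
      ON c.id <> b.id AND c.id <> a.id
     AND (b.phone = c.phone OR b.address = c.address)
    WHERE a.id < c.id
""").fetchall()

print(direct_links, two_hop_links)
```

The query text has to grow with the depth of the chain being searched for, so a ring of unknown size cannot be captured by any single fixed set of joins like these.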

Existing fraud detection infrastructure can readily be augmented to support ring detection by running appropriate entity link analysis queries in a graph database, with checks run at key stages in the customer and account lifecycle, such as once a credit balance threshold is hit or when a cheque bounces.

Real-time graph traversals tied to the right kinds of events can help banks identify probable fraud rings during, or even before, the bust-out.
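As a rough illustration of what such a traversal does, the sketch below uses plain Python rather than a graph database. The records, the function names and the idea of calling ring_members when an event such as a bounced cheque fires are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical records: (account holder, phone number, address).
records = [
    ("Alice", "555-0101", "1 Elm St"),
    ("Bob", "555-0101", "9 Oak Ave"),
    ("Carol", "555-0199", "9 Oak Ave"),
    ("Dave", "555-0142", "3 Pine Rd"),
]

def build_graph(rows):
    """Link each holder to the identifiers they use, in both directions."""
    graph = defaultdict(set)
    for holder, phone, address in rows:
        for identifier in (("phone", phone), ("address", address)):
            graph[("holder", holder)].add(identifier)
            graph[identifier].add(("holder", holder))
    return graph

def ring_members(graph, start_holder):
    """Breadth-first walk from one holder across shared identifiers."""
    start = ("holder", start_holder)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(name for kind, name in seen if kind == "holder")

graph = build_graph(records)
# e.g. triggered when one of Alice's cheques bounces
print(ring_members(graph, "Alice"))
print(ring_members(graph, "Dave"))
```

The walk follows connections for as many hops as exist, so the same code finds a ring of three or of thirty without the query having to be rewritten.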

Widening access to a powerful anti-fraud tech

Another use case is bogus car insurance activity. In the UK, insurers estimate that bogus ‘whiplash’ accident claims add £144 per year to every driver’s policy. In a typical hard fraud scenario, rings of fraudsters work together to stage fake accidents complete with fake drivers, fake passengers, fake pedestrians and even fake witnesses.

Once again, graph databases can be a powerful tool in combatting this kind of multi-party fraud. As in the bank fraud example, identifying in SQL that a ring is in operation would require joining a number of tables in a complex schema (accidents, vehicles, owners, drivers, passengers, pedestrians, witnesses, providers), and joining them multiple times, once per potential role, in order to uncover the full picture. That is not an easy task in SQL, and it is easier using a graph approach: graph databases’ ability to draw relationship connections, again, provides an exciting answer here.
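One simple signal a connected view makes easy to compute is the same people turning up together in more than one accident, whatever role each played. The sketch below is illustrative only: the claim records, names and the min_shared threshold are invented, and a real investigation would weigh many more factors.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical claim records: (accident, person, role in that accident).
claims = [
    ("accident_1", "Pat", "driver"),
    ("accident_1", "Sam", "passenger"),
    ("accident_1", "Lee", "witness"),
    ("accident_2", "Sam", "driver"),
    ("accident_2", "Pat", "witness"),
    ("accident_2", "Kim", "passenger"),
    ("accident_3", "Jo", "driver"),
    ("accident_3", "Lee", "pedestrian"),
]

def repeated_pairs(rows, min_shared=2):
    """Return pairs of people who appear together in several accidents."""
    people_by_accident = defaultdict(set)
    for accident, person, _role in rows:
        people_by_accident[accident].add(person)

    accidents_by_pair = defaultdict(set)
    for accident, people in people_by_accident.items():
        for pair in combinations(sorted(people), 2):
            accidents_by_pair[pair].add(accident)

    return {
        pair: sorted(accidents)
        for pair, accidents in accidents_by_pair.items()
        if len(accidents) >= min_shared
    }

suspicious = repeated_pairs(claims)
print(suspicious)
```

Because roles are treated as a property of the link rather than as separate tables, Pat the driver and Pat the witness are recognised as the same person without a join per role.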

Clearly, graph databases are emerging as the ideal solution to finding hidden patterns, and at scale. Interestingly, they have been powering the social web for some time – but while early graph database converts like PayPal and Facebook had to build their own in-house graph data stores from scratch, off-the-shelf graph databases are now available for any size of business. Forrester Research estimates that more than one in four enterprises will be using such technology by 2017 – and no wonder given the efficacy of what they do.

To sum up, traditional database technologies, while still suitable and necessary for certain types of prevention, are not designed to detect the most elaborate fraud operations. In contrast, graph databases provide a unique ability to uncover a variety of important fraud patterns, in real time, either in groups or on an individual basis – and so are a powerful addition to any financial services firm’s security arsenal.

And with their growing availability, there’s no excuse for ignoring their potential any more.

The author is co-founder and CEO of Neo Technology, the company behind Neo4j, the world’s leading graph database (
