H.P. Bunaes, Director of Banking Practice at DataRobot
I don’t know that I’ve come across a problem better suited to machine learning than Anti Money Laundering (AML) in banking. For many other predictive applications, banks find that the availability of data for machine learning is an issue. The data may not exist, or if it does it may be of dubious quality. Not so in AML.
Banks have been collecting client information as part of their Customer Identification Programs (CIP) and Know Your Customer (KYC) programs for years.
Transactional details and suspicious activity alerts are accessible and in place. The findings and outcomes from investigations are available – specifically, whether a suspicious activity report (SAR) was filed or not. This constitutes ideal training data for machine learning which can be used to solve many of the common challenges throughout the typical AML program.
The most immediate opportunity is to screen out false positives from suspicious activity alerts. Many banks use rules-based systems for detecting suspicious activity and generating alerts. Since the downside risk of not detecting truly suspicious activity is severe, these rules tend to be conservative — better to get too many alerts than too few. The result can be an overwhelming number of alerts, the bulk of which turn out to be innocuous activity. But, it’s too risky to restrict alerts by narrowing detection rules.
Fortunately, banks can leverage their own data, and machine learning, to filter out the vast majority of false positive alerts with little or no downside risk. By definition, banks know the outcome of their investigations — whether a SAR was filed or not. Using machine learning, banks can use this historical data to train a model to screen out false positives (or at the very least, prioritise them lower) using the known outcomes. The model may learn, for example, to eliminate an alert for a particular combination of product, transaction size, KYC risk score, and location that has never resulted in a SAR.
To ease concerns about truly suspicious activity getting missed, models can be tuned so that there are zero false negatives. Even when tuned to “zero false negatives,” we have found that typically more than half of the false alarms can still be detected and eliminated. And the false positive rate on the remaining population also falls significantly.
In addition to screening false positives, a trained model can point out data features – client or transactional data for example – that are strong indicators of money laundering based on SARs generated when those patterns are evident. Using this information banks can create “smart” rules to detect suspicious activity that are unique to their client set, the product set, and the investigation outcomes. These detection rules are also far more dynamic: they will self-adjust for changes in products and client behavior as the model is periodically retrained.
Banks may also want to use these insights to validate their KYC process. Features of the data that are strong AML predictors can be built into the KYC procedure. Clients that fit these criteria, or whose product use is likely to “trip” an alert, would be subject to enhanced due diligence (a higher than normal degree of scrutiny upfront). KYC questions that are not good predictors of money laundering risk could be eliminated entirely.
Either way, banks have an ironclad defense of their detection rules and KYC process: their own data with rigorous analytics applied. In my experience, a data-based analysis is always easier to justify than an expert opinion. Expert opinions vary.
Finally, there are unsupervised machine learning algorithms that can be applied for anomaly detection. This can be a “last line of defence” to detect previously unseen activity for a client, or to identify a client acting completely unlike similar clients – unusual activity that might otherwise go undetected.
Money launderers are smart, and AML programs need to be smarter to stay one step ahead. Machine learning is a great way to do just that.
About the Author:
H.P. Bunaes leads the banking practice at DataRobot, helping banks leverage AI and machine learning for predictive analytics and data mining. H.P. has 35 years’ experience in banking, with broad banking domain knowledge and deep expertise in data and analytics. Prior to joining DataRobot, H.P. held a variety of leadership positions at SunTrust and FleetBoston. H.P. is a graduate of the Massachusetts Institute of Technology where he earned a Masters Degree in Management Information Systems and of Trinity College where he achieved a Bachelor of Science degree in Computer Science and Mechanical Engineering.