Using real-world transaction data to identify money laundering: Leveraging traditional regression and machine learning techniques

STEM Fellowship Journal (2021)

Abstract
Money laundering is a pervasive legal and economic problem that hides criminal activity. Identifying money laundering is a priority for both banks and governments; accordingly, machine learning algorithms have emerged as a possible strategy for detecting suspicious activity within financial institutions. We used traditional regression and supervised machine learning techniques to identify bank customers at an increased risk of committing money laundering. Specifically, we assessed whether model performance differed across operationalizations of the outcome (e.g., multinomial vs. binary classification) and whether including investigator-derived novel features (e.g., averages across existing features) could improve model performance. We received two proprietary datasets from Scotiabank, a large bank headquartered in Canada. The datasets included customer account information (N = 4,469) and customers’ monthly transaction histories (N = 2,827) from April 15, 2019 to April 15, 2020. We implemented traditional logistic regression, logistic regression with LASSO regularization (LASSO), K-nearest neighbours (KNN), and extreme gradient boosted models (XGBoost). Traditional logistic regression with a binary outcome, fitted with the investigator-derived novel features, performed best, with an F1 score of 0.79 and an accuracy of 0.72. Models with a binary outcome had higher accuracy than the multinomial models, but the F1 scores yielded mixed results. For KNN and XGBoost, performance changed little or worsened after the investigator-derived novel features were introduced, whereas the same features improved performance for LASSO and traditional logistic regression. Our findings demonstrate that investigators should consider different operationalizations of the outcome, where possible, and include novel features derived from existing features to potentially improve the detection of customers at risk of committing money laundering.
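To make the evaluation ideas in the abstract concrete, the following is a minimal sketch, not the authors' code, of three steps the abstract mentions: deriving a novel feature as an average of existing features, collapsing a multinomial outcome into a binary one, and scoring predictions with accuracy and F1. The risk labels ("low", "medium", "high"), the choice of which labels count as positive, the function names, and all data values are hypothetical and chosen only for illustration; the Scotiabank data and the paper's actual outcome coding are not public.

```python
from statistics import mean


def average_feature(monthly_amounts):
    """Derived feature: mean of a customer's monthly transaction amounts."""
    return mean(monthly_amounts) if monthly_amounts else 0.0


def to_binary(label, positive_labels=("medium", "high")):
    """Collapse a multinomial risk label into a binary outcome.

    Which labels count as positive is an assumption made for this sketch.
    """
    return 1 if label in positive_labels else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical true and predicted risk labels for eight customers.
risk_true = ["low", "high", "medium", "high", "low", "medium", "high", "medium"]
risk_pred = ["low", "high", "high", "medium", "medium", "medium", "high", "low"]

# Same predictions, scored against the binary operationalization.
bin_true = [to_binary(label) for label in risk_true]
bin_pred = [to_binary(label) for label in risk_pred]

multinomial_accuracy = accuracy(risk_true, risk_pred)
binary_accuracy = accuracy(bin_true, bin_pred)
binary_f1 = f1_score(bin_true, bin_pred)

# Hypothetical monthly transaction amounts for one customer.
avg_amount = average_feature([100.0, 200.0, 300.0])

print(f"multinomial accuracy: {multinomial_accuracy:.2f}")
print(f"binary accuracy:      {binary_accuracy:.2f}")
print(f"binary F1:            {binary_f1:.2f}")
print(f"average amount:       {avg_amount:.2f}")
```

In this toy data, confusing "medium" with "high" counts as an error under the multinomial outcome but not under the binary one, so binary accuracy is higher. F1 can also exceed accuracy when the positive class is the majority, because F1 ignores true negatives.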
Key words
money laundering, machine learning, traditional regression, real-world