
Top 15 Tools and Techniques

Detecting potentially fraudulent customers and transactions and projecting compelling visual insights have traditionally been the domain of rule-based models and BI tools. In this era of high-velocity, voluminous data, machine learning and advanced analytics have gradually been adopted in this space. Sanjukta Dhar of Tata Consultancy Services takes a deep dive into the 15 most prominent analytical tools and techniques that are becoming the optimal choices within financial crime and compliance technology.

Why Analytics?

Statistical models and techniques have long been used to predict anomalous transactions and fraudulent customers or to detect anomalous relationships within a financial system. With the steady uptrend in the four Vs of big data (volume, velocity, variety and veracity), real-time fraud detection has become the need of the hour. In the area of financial crime and compliance, advanced analytics is widely used for several well-defined purposes.

If we have to list the business objectives of today’s analytical systems, then the following should top the list:

  • analyzing huge volumes of data in real time
  • generating meaningful insights
  • visualizing intricate networks of entities and attributes and generating a graphical interface for it

Sample Financial Crime Use Case Candidates for Analytics

Payment fraud, identity theft, credit card fraud, money laundering transactions, insurance claim fraud, insider trading, terrorist financing, invoice fraud or trade-based money laundering

Let’s talk about some advanced analytical tools and techniques that can give our fraud detection or money-laundering detection systems a boost in terms of performance, accuracy and speed.

1. Technique: One-class support vector machine

Purpose: semi-supervised approach (typically trained on normal data only), segregating outliers

Description: One-class support vector machine tries to segregate a majority of customer data or transactional data by maximizing the distance between the separator hyperplane and most of the data points. The rest of the data points remaining between the hyperplane and the origin are the outliers.
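
As a minimal sketch, assuming scikit-learn is available (the feature values below are invented for illustration), an outlier transaction can be flagged like this:

```python
# Hedged sketch: one-class SVM flagging an outlier transaction.
# Toy features per transaction: (amount in thousands, transactions per day).
from sklearn.svm import OneClassSVM

normal = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2], [1.0, 1.9]]
suspect = [[25.0, 40.0]]  # far outside the region the model learns

# nu bounds the fraction of training points treated as outliers
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(normal)
print(model.predict(suspect))  # predict returns -1 for outliers, +1 for inliers
```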

2. Technique: Binary logistic regression

Purpose: supervised approach, classifying transactions into fraudulent or nonfraudulent categories

Description: Logistic regression is based on the maximum likelihood principle. The model fits a linear function to the logarithm of the odds (the log-odds); applying the inverse of that transformation (the sigmoid) yields an output bounded between 0 and 1. The target variable is binary, and the output is interpreted as the likelihood of a fraudulent transaction, with 0 depicting the least and 1 the maximum likelihood.
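
The log-odds idea can be made concrete with a bare-bones sketch (a single invented feature and toy labels; real work would use a library such as scikit-learn or statsmodels):

```python
import math

# Bare-bones logistic regression fit by batch gradient descent on toy data.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

X = [0.2, 0.4, 0.6, 2.0, 2.2, 2.4]  # scaled transaction amounts (invented)
y = [0, 0, 0, 1, 1, 1]              # 1 = fraudulent, 0 = legitimate

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        err = sigmoid(w * xi + b) - yi  # gradient of the log-likelihood
        grad_w += err * xi
        grad_b += err
    w -= lr * grad_w / len(X)
    b -= lr * grad_b / len(X)

score = sigmoid(w * 2.3 + b)  # bounded in (0, 1): fraud likelihood
print(round(score, 3))
```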

3. Technique: Neural network

Purpose: supervised approach, classifying customers into fraudulent or nonfraudulent categories

Description: A neural network is a complex supervised learning technique capable of handling a large amount of noise. It takes various customer behavior/demographic/transaction profile data as input, extracts features, assigns weights in its hidden layers and generates a continuous variable as a target score. Often referred to as a “black box” lacking transparency and interpretability, this approach is very fast at scoring but requires substantial training data.
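
A toy forward pass illustrates the hidden-layer mechanics (the weights below are invented, not trained, purely to show how weighted hidden units produce a score):

```python
import math

# Toy forward pass of a one-hidden-layer network with invented weights.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, hidden_w, out_w):
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, features)))
              for ws in hidden_w]                       # hidden activations
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)))

features = [0.8, 0.1, 0.5]  # e.g. scaled behavior/profile inputs (invented)
hidden_w = [[0.4, -0.2, 0.7], [0.1, 0.9, -0.3]]
out_w = [1.5, -0.8]
score = forward(features, hidden_w, out_w)  # continuous score in (0, 1)
print(round(score, 3))
```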

4. Technique: Association rule analysis

Purpose: exploring the relationship between transaction/customer attributes for greater insight

Description: Association rule or affinity analysis seeks to establish implicit relationships between a few apparently unrelated parameters (e.g., transactions, customer attributes, external entities) in the form of “if antecedent, then consequent with minimum x% support and y% confidence.”
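
Support and confidence can be computed directly on a handful of toy transactions (item names are invented; libraries such as mlxtend automate the itemset search at scale):

```python
# Minimal support/confidence computation over toy transactions to make the
# "if antecedent, then consequent" rule form concrete.
transactions = [
    {"wire_transfer", "new_payee", "high_amount"},
    {"wire_transfer", "new_payee"},
    {"card_payment", "high_amount"},
    {"wire_transfer", "new_payee", "high_amount"},
    {"card_payment"},
]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

ante, cons = {"wire_transfer", "new_payee"}, {"high_amount"}
print(support(ante | cons), confidence(ante, cons))
```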

5. Technique: K-means clustering

Purpose: unsupervised approach, grouping similar types of transactions/customers based on how similar their features are

Description: K-means clusters payment transaction data based on homogeneity in certain attributes. For example, data can be grouped by similarity of recency (time elapsed since the last transaction), frequency (number of transactions per day/week/month) and monetary (average/max/min/median amount per customer ID per account) parameters, while dissimilar transaction data is separated on the same attributes.
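
A bare-bones k = 2 sketch on invented RFM-style points shows the assign/update loop (real workloads would use a library clusterer such as scikit-learn's KMeans):

```python
import math

# Bare-bones k-means with k = 2 on toy points; a sketch only.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.1),     # low-value, frequent
          (9.0, 11.0), (8.5, 10.5), (9.2, 11.3)]  # high-value, infrequent

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

centroids = [points[0], points[3]]  # naive initialization from the data
for _ in range(10):
    # assignment step: each point joins its nearest centroid's cluster
    clusters = [[], []]
    for p in points:
        clusters[min((0, 1), key=lambda i: dist(p, centroids[i]))].append(p)
    # update step: recompute each centroid as its cluster's mean
    centroids = [
        tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
        for i, cl in enumerate(clusters)
    ]
print(centroids)
```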

6. Technique: Self-organizing maps (SOM)

Purpose: unsupervised approach, visualizing and clustering data based on unseen features

Description: SOM is a neural-network-based data visualization technique capable of projecting high-dimensional data onto visualizations such as a U-matrix or a component plane to enable implicit relationship discovery.
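
A tiny 1-D sketch shows the core mechanics (grid size, learning rate and data are invented; libraries such as MiniSom implement the full 2-D version with decaying neighborhoods):

```python
import random

# Tiny 1-D SOM sketch: a grid of 4 units trained on 1-D data, showing the
# best-matching-unit (BMU) search and the neighborhood update rule.
random.seed(0)
data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0] * 50   # two well-separated clusters
weights = [random.uniform(0, 10) for _ in range(4)]  # one weight per unit

def bmu(x):
    # index of the unit whose weight is closest to the input
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))

lr = 0.3
for x in data:
    b = bmu(x)
    for i in range(len(weights)):
        if abs(i - b) <= 1:  # move the BMU and its grid neighbors toward x
            weights[i] += lr * (x - weights[i])
print(sorted(round(w, 1) for w in weights))
```

After training, inputs from the two clusters map to different grid units, which is the property the U-matrix visualizes.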

7. Technique: Social network analytics

Purpose: visualizing related data

Description: Social network analytics uses graph theory to explore complex, evolving, atypical networks of connected entities and extract useful statistics from them. By visualizing a network of connected entities in which known fraudulent entities are marked, the entities with the highest probability of fraudulent action can be identified.
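
A naive guilt-by-association score over an invented network shows the idea (production systems use much richer graph features than neighbor counts):

```python
# Score each entity by the share of its direct neighbors already marked
# fraudulent. Names, edges and the fraud list are invented.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
         ("bob", "eve"), ("eve", "frank")]
known_fraud = {"carol", "eve"}

neighbours = {}
for a, b in edges:  # build an undirected adjacency map
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

scores = {
    n: len(nbrs & known_fraud) / len(nbrs)
    for n, nbrs in neighbours.items() if n not in known_fraud
}
print(scores)
```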

8. Tool: Bloom filter

Purpose: optimized prediction and false positive reduction where dataset is huge (e.g., an enormous KYC alert management database)

Description: A Bloom filter is a memory-efficient, fast way to test whether an entity is part of a set of entities. The beauty of this probabilistic data structure is that while it can produce a small, tunable rate of false positives, it is guaranteed to report absence correctly (i.e., 0 percent false negatives): if the filter says an element is not in the set, it definitely is not.
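
A minimal sketch follows (the bit-array size and hash scheme are illustrative only; real deployments size both for a target false-positive rate):

```python
import hashlib

# Minimal Bloom filter using k double-hashed bit positions.
class BloomFilter:
    def __init__(self, size=1024, k=4):
        self.size, self.k, self.bits = size, k, 0  # int used as a bit array

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for customer_id in ["CUST-001", "CUST-002", "CUST-003"]:
    bf.add(customer_id)
print(bf.might_contain("CUST-001"))  # True: an added item is never missed
print(bf.might_contain("CUST-999"))  # almost certainly False at this fill level
```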

9. Tool: Graph database

Purpose: Data visualization

Description: Graph databases enable real-time identification of fraud rings, provide correlation mining between linked entities and beneficial owners and offer a comprehensive visualization.
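
The kind of fraud-ring question a graph database answers can be sketched in plain Python via connected components (account names and identifiers are invented; a real deployment would express this as a graph query, e.g. in Cypher, over millions of nodes):

```python
# Accounts linked by shared identifiers, grouped via connected components.
links = [("acct1", "phone:555-0101"), ("acct2", "phone:555-0101"),
         ("acct2", "addr:12 Hill St"), ("acct3", "addr:12 Hill St"),
         ("acct4", "phone:555-0199")]

adjacency = {}
for a, b in links:  # undirected bipartite graph: accounts <-> identifiers
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

def component(start):
    # iterative depth-first search; returns the accounts in start's component
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return {n for n in seen if n.startswith("acct")}

print(sorted(component("acct1")))  # accounts sharing identifiers form a ring
```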

10. Technique: Behavioral analytics

Purpose: Predictive analytical technique

Description: Predicts customer intent from subtle behavioral features (e.g., click speed, time spent on a webpage, IP address, products purchased online, surfing behavior, etc.).
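
A toy baseline-deviation check makes this concrete (the feature, values and threshold are invented):

```python
import statistics

# Flag a session whose click speed deviates far from the customer's history.
history_clicks_per_min = [12, 14, 11, 13, 12, 15, 13]

mean = statistics.mean(history_clicks_per_min)
stdev = statistics.stdev(history_clicks_per_min)

def is_anomalous(session_clicks_per_min, threshold=3.0):
    # z-score style deviation from the customer's historical behavior
    return abs(session_clicks_per_min - mean) / stdev > threshold

print(is_anomalous(13), is_anomalous(90))
```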

11. Technique: Optical character recognition (OCR)

Purpose: predict anomalies in optical/scanned images

Description: Detects suspicious invoices/claims by training on a series of PDF/scanned data sets and extracting their features. A number of trainable machine learning libraries (e.g., Google Tesseract, OpenCV, etc.) offer built-in capabilities for feature extraction.

12. Technique: Benford’s law

Purpose: data exploration technique useful for identifying potentially fraudulent transaction data

Description: Benford’s law gives the theoretical frequency distribution of leading digits (and, in extended form, of later digit positions) that can be compared with any real-life frequency distribution to uncover potential sources of anomalies for further investigation. May be especially suitable for accounting fraud.
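
The first-digit form of the law is easy to apply directly (the amounts below are invented; a real check would run over a full ledger and use a formal goodness-of-fit test):

```python
import math
from collections import Counter

# First-digit Benford check: expected frequency of leading digit d is
# log10(1 + 1/d). A large total deviation flags the data set for review.
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

amounts = [1234.5, 1890.0, 2440.1, 110.0, 3500.0, 1020.0, 970.0,
           1505.5, 2200.0, 4100.0, 130.0, 1750.0]
leading = Counter(str(int(a))[0] for a in amounts)  # assumes amounts >= 1
observed = {d: leading.get(str(d), 0) / len(amounts) for d in range(1, 10)}

deviation = sum(abs(observed[d] - expected[d]) for d in range(1, 10))
print(round(deviation, 3))
```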

13. Technique: Integrated device data surveillance

Purpose: data exploration technique useful for identifying potentially fraudulent transaction data

Description: Exploring unstructured device data collected from multiple customer access channels, such as audio recordings (from mobile phones), video footage (from CCTV and trader cameras), browser cookies and IP addresses (from the user’s machine) and carrier IDs (from the network provider), to profile the device used by a customer and perform subsequent anomaly detection.

14. Tool: Translytical database

Purpose: a new-age, big-data-enabled, real-time data storage and analysis technique

Description: An amalgamated transactional and analytical database solution facilitating real-time storage, early warning indicators/insights and a 360° view.

15. Technique: News analytics

Purpose: Proactive customer screening and negative news search and insight generation

Description: News analytics is an NLP/text-mining-based approach where public newsfeed data is ethically scraped and early insights about fraudulent/risky/anomalous customers are generated by extracting features from news data and running a segmentation/classification model on top of it. It is another sophisticated early warning alert mechanism for operations.
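
A naive bag-of-words sketch shows the screening idea (the term list and headlines are invented; real systems use trained NLP models, not keyword lookups):

```python
import re

# Score a headline by the share of risk-related tokens it contains.
risk_terms = {"laundering", "fraud", "sanctions", "indicted", "bribery"}

def risk_score(headline):
    tokens = re.findall(r"[a-z]+", headline.lower())
    return sum(t in risk_terms for t in tokens) / max(len(tokens), 1)

headlines = [
    "Acme Corp opens new headquarters downtown",
    "Acme Corp executive indicted in laundering probe",
]
for h in headlines:
    print(round(risk_score(h), 2), h)
```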

The Way Forward

While modelling and data mining techniques are abundantly available in the market, several factors must be weighed before deploying them for a business case: data imbalance (the proportion of fraudulent to nonfraudulent records in the set), the proportion of noise/outliers, model performance, model explainability, whether an ensemble of supervised and unsupervised models would enhance performance and the availability of a consolidated view of data (transactional plus analytical plus unstructured plus third party). Also, while purely rule-based systems are passé, there will still be scenarios where deploying a combination of rule-based systems, machine learning models and judgment-based approaches is the right choice. The solution lies in the right balance of choices and the availability of the right infrastructure.

Sanjukta Dhar

Sanjukta Dhar leads the market and treasury risk management portfolio within the BFSI CRO Strategic Initiative of TCS. Dhar has played the roles of business analyst, solution architect, SME and implementation lead across multiple financial risk management system implementations for major banks and financial services firms.
