Fraud Detection and Risk Analysis

This project presents an exploratory data analysis and anomaly detection approach for identifying suspicious invoice activity within a financial services context. The goal is to uncover patterns of potential fraud or credit abuse in customer transaction data following a system integrity issue.

📌 Project Overview

A system error caused a period during which standard risk checks (internal scoring, external credit ratings, and fraud detection mechanisms) were bypassed. As a result, a large number of transactions were approved without validation. This repository contains a structured investigation of invoice-level transaction data to identify unusual patterns and behaviors that may suggest fraudulent activity.

Since no confirmed fraud cases exist, this task focuses on unsupervised anomaly detection, domain-driven insights, and exploratory methods to assess the extent of potential risk exposure.

🔍 Objectives

Analyze invoice transactions to identify irregularities or high-risk behaviors.
Define and apply custom anomaly detection criteria based on domain knowledge.
Detect potential fraud signals, such as:
- Unusual purchase volumes
- Irregular address or SSN usage
- Repeated or coordinated transactions across multiple identities
- Signs of identity misuse or synthetic identities
Develop and deploy anomaly detection models.

🧰 Tools & Technologies

Python
Pandas, NumPy for data processing
Matplotlib, Seaborn for data visualization
Scikit-learn, Isolation Forest, DBSCAN for unsupervised anomaly detection
Jupyter Notebook for reproducible and interactive analysis

📁 Project Structure

.
├── data/
│   └── invoices.csv                      # Sample anonymized invoice data
├── notebooks/
│   └── fraud_detection_analysis.ipynb    # Main analysis notebook
├── src/
│   ├── preprocessing.py                  # Data cleaning and feature engineering
│   └── anomaly_detection.py              # Custom anomaly detection functions
├── requirements.txt                      # Python dependencies
└── README.md                             # Project documentation

📊 Data Description

Each row represents a single invoice sent to a customer. Below are selected features included in the dataset:

Column Name	Description
`invoice_no`	Unique invoice ID
`email`	Anonymized customer email
`social_security_number`	Pseudonymized social security number / ID
`customer_no`	Internal customer ID
`principal_amount`	Original invoice amount
`is_debt_collection_stopped`	Binary flag indicating whether collections were stopped
`last_event`	Last status or activity on the invoice
`last_event_date`	Date of the last event
`period_start`, `period_end`	Invoice period start and end dates
`born_year`	Extracted year of birth (ambiguous 2-digit year)
Address fields	Includes city, postal code, street, and house number
`firstname`, `lastname`	Pseudonymized customer names
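
As a rough illustration of how these columns might be loaded and prepared, the sketch below reads a tiny in-memory stand-in for the invoice file. The sample rows, the helper names `load_invoices` and `expand_born_year`, and the pivot used to resolve the ambiguous 2-digit birth year are all assumptions made for this example, not the repository's actual preprocessing.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the invoice CSV (all values are made up).
SAMPLE_CSV = """invoice_no,customer_no,social_security_number,principal_amount,period_start,period_end,born_year
1001,C1,SSN_A,120.50,2024-01-01,2024-01-31,85
1002,C2,SSN_B,80.00,2024-01-01,2024-01-31,02
1003,C1,SSN_A,45.25,2024-02-01,2024-02-29,85
"""


def load_invoices(source):
    """Read invoice rows, keeping identifiers as strings and parsing period dates."""
    return pd.read_csv(
        source,
        dtype={
            "invoice_no": str,
            "customer_no": str,
            "social_security_number": str,
            "born_year": str,  # keep leading zeros such as "02"
        },
        parse_dates=["period_start", "period_end"],
    )


def expand_born_year(two_digit, pivot=25):
    """Map a 2-digit year to 4 digits; the pivot is an assumed cutoff."""
    yy = int(two_digit)
    return 2000 + yy if yy <= pivot else 1900 + yy


invoices = load_invoices(io.StringIO(SAMPLE_CSV))
invoices["born_year_full"] = invoices["born_year"].map(expand_born_year)
invoices["period_days"] = (invoices["period_end"] - invoices["period_start"]).dt.days
```

Reading identifiers as strings avoids losing leading zeros, and the pivot makes the century guess for `born_year` explicit so it can be revisited if it produces implausible ages.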

🧠 Methodology

Exploratory Analysis: Summary statistics, customer segmentation, time-based trends
Rule-Based Filters: Manually crafted signals such as:
- High purchase amounts in a short period
- Duplicate SSNs or addresses across accounts
- Unusual frequency of orders from the same city or postal code
Unsupervised Learning: Applied Isolation Forest and DBSCAN to detect outliers in multidimensional feature space
Visualization: Highlighted anomalous points and distribution shifts during the risk system outage
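
The unsupervised step could look roughly like the following sketch, which runs Isolation Forest and DBSCAN from scikit-learn on synthetic data. The two features (total amount and invoice count per customer), the planted outliers, and the parameter values `contamination=0.02`, `eps=0.8`, and `min_samples=5` are assumptions chosen for illustration; they are not taken from the notebooks.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical per-customer features: [total_amount, invoice_count].
normal = rng.normal(loc=[100.0, 3.0], scale=[15.0, 1.0], size=(200, 2))
extreme = np.array([[900.0, 40.0], [1200.0, 55.0]])  # planted outliers (rows 200, 201)

# Scale features so neither dominates the distance computations.
X = StandardScaler().fit_transform(np.vstack([normal, extreme]))

# Isolation Forest: fit_predict() returns -1 for outliers and 1 for inliers.
iso = IsolationForest(contamination=0.02, random_state=0)
iso_flags = iso.fit_predict(X) == -1

# DBSCAN: points that belong to no dense cluster receive the label -1.
db_flags = DBSCAN(eps=0.8, min_samples=5).fit_predict(X) == -1

# Rows flagged by both methods are the strongest candidates for manual review.
both = np.flatnonzero(iso_flags & db_flags)
print(both)
```

Intersecting the two flag sets is one possible design choice: Isolation Forest always flags a fixed share of rows set by `contamination`, so requiring agreement with DBSCAN's noise label tends to reduce false positives at the cost of missing subtler cases.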

🚀 Key Insights

Detection of customer groups showing bulk purchase behavior shortly after the validation checks failed.
Identification of duplicate or suspiciously similar SSNs and email patterns.
Clustered activities around certain geographic locations potentially linked to organized behavior.
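
Signals like the shared SSNs and bulk purchases above could be derived with simple pandas group-bys. The sketch below uses made-up rows with column names from the data description; the outage window, the `AMOUNT_LIMIT` threshold, and the use of `period_start` as a proxy for purchase date are assumptions for this example only.

```python
import pandas as pd

# Made-up rows using column names from the data description.
invoices = pd.DataFrame({
    "customer_no": ["C1", "C2", "C3", "C3", "C3", "C4"],
    "social_security_number": ["S1", "S1", "S2", "S2", "S2", "S3"],
    "principal_amount": [50.0, 60.0, 400.0, 450.0, 500.0, 30.0],
    "period_start": pd.to_datetime([
        "2024-03-01", "2024-03-02", "2024-03-05",
        "2024-03-06", "2024-03-07", "2024-03-10",
    ]),
})

# Signal 1: one SSN linked to more than one customer account.
accounts_per_ssn = invoices.groupby("social_security_number")["customer_no"].nunique()
shared_ssns = accounts_per_ssn[accounts_per_ssn > 1].index

# Signal 2: customer total inside an assumed outage window exceeds an assumed limit.
OUTAGE_START = pd.Timestamp("2024-03-04")  # hypothetical window start
OUTAGE_END = pd.Timestamp("2024-03-08")    # hypothetical window end
AMOUNT_LIMIT = 1000.0                      # hypothetical threshold
in_window = invoices["period_start"].between(OUTAGE_START, OUTAGE_END)
totals = invoices[in_window].groupby("customer_no")["principal_amount"].sum()
bulk_buyers = totals[totals > AMOUNT_LIMIT].index

# Attach both signals to every invoice row for later review.
flags = invoices.assign(
    shared_ssn=invoices["social_security_number"].isin(shared_ssns),
    bulk_in_outage=invoices["customer_no"].isin(bulk_buyers),
)
```

In this toy data, `S1` is shared by two accounts and customer `C3` exceeds the limit inside the window, so those rows carry the respective flags.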

✅ How to Run

Clone the repository:

git clone https://github.com/RozaAbolghasemi/Fraud-detection.git

Install the requirements:

pip install -r requirements.txt

Run the analysis: open the Jupyter notebook in the notebooks/ directory and follow the step-by-step analysis.

🧾 Disclaimer

This project is a demonstration of anomaly detection and exploratory techniques on synthetic, anonymized data. It is intended for illustrative and educational purposes in a financial risk analysis context. No real individuals can be identified, and no actual customer data is used.

