This project trains a Random Forest classifier to detect fraudulent credit card transactions in an imbalanced dataset of over 284,000 records, of which only 0.17% are fraud. After data cleaning, visualization, and a train-test split, the model was trained with 100 estimators and evaluated with a classification report. On the test set it achieved an F1-score of 0.70 and a precision of 84.3%, identifying rare fraud cases despite the class imbalance.
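The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the synthetic data from `make_classification`, the roughly 1% positive rate, the stratified 80/20 split, and `random_state=42` are all assumptions made for the example. Only the 100 estimators and the classification report come from the description.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real transactions (assumption): about 1% positives.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    weights=[0.99, 0.01],
    random_state=42,
)

# Stratify so both splits keep the same class ratio (the 80/20 split is assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 100 trees, as stated in the project description.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1; zero_division=0 avoids warnings
# if the model happens to predict no positives at all.
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
```

Stratifying the split matters with a class this rare: without it, a random split could leave the test set with very few fraud cases, which makes the reported precision and F1 unstable.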
| Technology | Purpose |
|---|---|
| Python | Primary programming language |
| Pandas | Data preprocessing and manipulation |
| Matplotlib & Seaborn | Data visualization |
| Scikit-learn | Modeling and evaluation (Random Forest) |
| Jupyter Notebook / Colab | Notebook-based development and execution |
- ⚖️ Worked with heavily imbalanced data (0.17% fraud cases)
- 🌲 Trained a Random Forest model with 100 estimators
- 📊 Achieved F1-score: 0.70, Precision: 84.3% on test set
- 📉 Visualized correlation heatmap and fraud distribution
- 📁 Included confusion matrix and classification report for evaluation
- Hands-on experience handling imbalanced datasets in classification
- Applied Random Forest with parameter tuning
- Used precision and F1-score for evaluation of rare-event prediction
- Visualized data imbalance and performance metrics using Seaborn
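To illustrate why precision and F1 are used here instead of accuracy, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the project's results. The helper names `precision_recall_f1` and `recall_from_f1` are likewise made up for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, given F1 and precision."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical counts for 1,000 test transactions (not project results).
tp, fp, fn, tn = 8, 2, 4, 986

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
# A model that never flags fraud would still reach this accuracy:
baseline_accuracy = (tn + fp) / (tp + fp + fn + tn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.3f} vs. all-negative baseline={baseline_accuracy:.3f}")

# Recall implied by the reported (rounded) F1 of 0.70 and precision of 0.843.
implied_recall = recall_from_f1(0.70, 0.843)
print(f"implied recall={implied_recall:.2f}")
```

With these counts, accuracy barely separates the classifier from one that never predicts fraud, while precision and F1 change sharply. Applying the same F1 formula to the reported figures suggests a recall near 0.60, though this is only approximate because the reported numbers are rounded.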
Sumdiboii – Machine Learning Enthusiast & Software Developer
LinkedIn – Sumedh Pimplikar
Detecting the undetectable: this project highlights the power of ensemble learning in uncovering rare fraud patterns in financial transaction data.