This project focuses on building a machine learning pipeline to predict taxi trip prices based on multiple real-world factors such as distance, time, weather, traffic conditions, and fare structure. Through robust preprocessing and regression modeling, we aim to generate accurate trip cost predictions to assist ride-hailing services, pricing engines, and transportation analytics.
Dataset: Contains 1000 real-world taxi trips with 11 features, including `Trip_Distance_km`, `Passenger_Count`, `Traffic_Conditions`, `Weather`, `Trip_Duration_Minutes`, and various fare components.
Preprocessing:
- Imputed missing values using formula-based calculations and default statistical methods
- Applied log transformation for skewed distributions
- Encoded categorical variables using LabelEncoder
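The preprocessing steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the fare column names (`Base_Fare`, `Per_Km_Rate`, `Per_Minute_Rate`, `Trip_Price`), the additive fare formula, the median/mode defaults, and the skewness threshold of 1.0 are all hypothetical, and the sample rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fill_price_from_formula(df, target="Trip_Price"):
    """Fill a missing price from the other fare parts (assumed additive formula)."""
    out = df.copy()
    estimate = (
        out["Base_Fare"]
        + out["Per_Km_Rate"] * out["Trip_Distance_km"]
        + out["Per_Minute_Rate"] * out["Trip_Duration_Minutes"]
    )
    out[target] = out[target].fillna(estimate)
    return out


def fill_defaults(df):
    """Assumed defaults: median for numeric columns, mode for categorical ones."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def log_transform_skewed(df, columns, threshold=1.0):
    """Apply log(1 + x) to the listed columns whose skewness exceeds the threshold."""
    out = df.copy()
    for col in columns:
        if out[col].skew() > threshold:
            out[col] = np.log1p(out[col])
    return out


def encode_categoricals(df):
    """Replace each non-numeric column with integer codes from LabelEncoder."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = LabelEncoder().fit_transform(out[col])
    return out


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "Trip_Distance_km": [2.0, 10.0, 5.0, 40.0, 3.0],
    "Trip_Duration_Minutes": [8.0, 25.0, np.nan, 90.0, 10.0],
    "Traffic_Conditions": ["Low", "High", None, "Medium", "Low"],
    "Weather": ["Clear", "Rain", "Clear", None, "Clear"],
    "Base_Fare": [3.0] * 5,
    "Per_Km_Rate": [1.0] * 5,
    "Per_Minute_Rate": [0.5] * 5,
    "Trip_Price": [9.0, np.nan, 12.0, 88.0, 11.0],
})

with_price = fill_price_from_formula(sample)
filled = fill_defaults(with_price)
transformed = log_transform_skewed(filled, ["Trip_Distance_km", "Trip_Duration_Minutes"])
clean = encode_categoricals(transformed)
print(clean.head())
```

Running the formula-based fill first means the statistical defaults only touch values that cannot be recomputed from other columns.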
Feature Selection:
- Used correlation thresholding to select highly relevant features
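One way correlation thresholding might look is sketched below. The target name `Trip_Price`, the 0.3 cutoff, and the synthetic data are assumptions made for this example; the README does not state which threshold or correlation measure the project uses, so Pearson correlation is used here as a common default.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target, threshold=0.3):
    """Return numeric features whose absolute Pearson correlation
    with the target is at least `threshold`."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()


# Synthetic data: price depends on distance, not on passenger count.
rng = np.random.default_rng(0)
distance = rng.uniform(1, 50, size=200)
sample = pd.DataFrame({
    "Trip_Distance_km": distance,
    "Passenger_Count": rng.integers(1, 5, size=200),
    "Trip_Price": 3.0 + 1.5 * distance + rng.normal(0, 2.0, size=200),
})

selected = select_by_correlation(sample, target="Trip_Price", threshold=0.3)
print(selected)
```

A correlation filter only captures linear, one-feature-at-a-time relationships, so tree-based models may still benefit from features it would discard.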
Models Applied:
- Ridge Regression
- XGBoost Regressor
- Random Forest Regressor
- AdaBoost Regressor
- Gradient Boosting Regressor
- Bagging Regressor
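A comparison of these regressors could be set up along the following lines. This is a minimal sketch on synthetic data from `make_regression`, so its scores will not match the project's results. The hyperparameters and the 80/20 split are assumptions, and XGBoost is included only if the `xgboost` package happens to be installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the taxi dataset (same row count, arbitrary features).
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Ridge": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Bagging": BaggingRegressor(n_estimators=20, random_state=42),
}

# XGBoost is an optional third-party dependency; skip it if unavailable.
try:
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(random_state=42)
except ImportError:
    pass

# Fit each model and record its R² on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} R^2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the R² values directly comparable.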
Best Performance:
- Gradient Boosting Regressor achieved an R² score of 0.914
Applications:
- Dynamic fare estimation for ride-hailing apps and taxi meters
- Cost prediction under varying traffic and weather scenarios
- Educational reference for regression modeling, feature engineering, and data cleaning
- Basis for building real-time APIs and fare calculators
Clone the repository
git clone https://github.com/BhaveshBhakta/Taxi-Price-Prediction-Using-ML.git
cd Taxi-Price-Prediction-Using-ML
Contributions are welcome, whether you'd like to improve model performance, add new visualizations, or integrate the project with a web interface.