A Python implementation of linear regression using both analytical (Normal Equations) and iterative (Gradient Descent) methods for predicting car prices based on mileage.
This project implements linear regression to predict car prices based on their mileage (kilometers driven). It provides two different approaches:
- Normal Equations - Analytical solution that directly computes optimal parameters
- Gradient Descent - Iterative optimization method that gradually finds optimal parameters
- Data normalization to prevent numerical overflow
- Both analytical and iterative solutions
- Progress tracking during gradient descent
- Automatic parameter denormalization
- Model weights saving to file
- Python 3.x
- NumPy
- Pandas
Install dependencies:

```bash
pip install numpy pandas
```

Run the training script with your dataset:

```bash
python3 train.py data.csv
```

Example output:

```
Iteration 0: Cost = 0.500000
Iteration 100: Cost = 0.234567
Iteration 200: Cost = 0.123456
...
Iteration 900: Cost = 0.001234
Theta 0: 8484.761234, Theta 1: -0.021234
```
The trained parameters will be saved to `weights.txt`.
```
ft_linear_regression/
├── train.py       # Main training script
├── predict.py     # Prediction script (if available)
├── data.csv       # Training dataset
├── weights.txt    # Saved model parameters (generated)
└── README.md      # This file
```
The input CSV file should contain two columns:
- `km` - Mileage in kilometers
- `price` - Car price
Example:

```
km,price
240000,3650
139800,3800
150500,4400
...
```
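A file in this format can be loaded with pandas. The inline sample below stands in for `data.csv` so the snippet is self-contained:

```python
import io
import pandas as pd

# In the real project this data would come from data.csv.
csv_text = "km,price\n240000,3650\n139800,3800\n150500,4400\n"
df = pd.read_csv(io.StringIO(csv_text))

mileage = df["km"].to_numpy(dtype=float)
price = df["price"].to_numpy(dtype=float)
```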
The model predicts price using the linear equation:
price = θ₀ + θ₁ × mileage
Where:
- θ₀ (`theta0`) - Intercept parameter
- θ₁ (`theta1`) - Slope parameter
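The linear equation above can be expressed as a small helper (the function name is illustrative, not part of the project's API):

```python
def estimate_price(mileage: float, theta0: float, theta1: float) -> float:
    """Hypothesis of the model: price = theta0 + theta1 * mileage."""
    return theta0 + theta1 * mileage

# Example with made-up parameter values (not the trained ones):
estimate_price(100000, 8000.0, -0.02)  # → 6000.0
```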
Computes optimal parameters directly using:
θ₁ = (n×Σ(xy) - Σ(x)×Σ(y)) / (n×Σ(x²) - (Σ(x))²)
θ₀ = (Σ(y) - θ₁×Σ(x)) / n

Updates parameters iteratively using:

tmp_θ₀ = learning_rate × (1/m) × Σ(errors)
tmp_θ₁ = learning_rate × (1/m) × Σ(errors × mileage)

after which both parameters are updated simultaneously: θ₀ ← θ₀ - tmp_θ₀ and θ₁ ← θ₁ - tmp_θ₁.

Where:
- `learning_rate` - Step size for parameter updates (default: 0.01)
- `errors` - Difference between predicted and actual prices
- `m` - Number of training examples
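Assuming NumPy arrays `x` (mileage) and `y` (price), both methods can be sketched as standalone functions; these mirror the formulas above but are not the project's actual implementation:

```python
import numpy as np

def normal_equations(x: np.ndarray, y: np.ndarray) -> tuple:
    # Closed-form least-squares solution for a single feature.
    n = len(x)
    theta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
        n * np.sum(x ** 2) - np.sum(x) ** 2
    )
    theta0 = (np.sum(y) - theta1 * np.sum(x)) / n
    return theta0, theta1

def gradient_descent(x: np.ndarray, y: np.ndarray,
                     learning_rate: float = 0.01,
                     iterations: int = 1000) -> tuple:
    theta0 = theta1 = 0.0
    m = len(x)
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y  # predicted minus actual
        tmp_theta0 = learning_rate * (1 / m) * np.sum(errors)
        tmp_theta1 = learning_rate * (1 / m) * np.sum(errors * x)
        theta0 -= tmp_theta0  # simultaneous update
        theta1 -= tmp_theta1
    return theta0, theta1
```

On well-scaled data both functions converge to the same line; gradient descent simply takes many small steps to get there.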
To prevent numerical overflow with large mileage values, the algorithm:
- Normalizes input data: `(x - mean) / std`
- Trains on normalized data
- Denormalizes final parameters back to original scale
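The normalization round trip can be sketched as follows (a sketch, not the project's code; `theta0_n`/`theta1_n` denote the parameters learned on normalized data):

```python
import numpy as np

def normalize(x: np.ndarray):
    # Standard score: shift by the mean, scale by the standard deviation.
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def denormalize_params(theta0_n: float, theta1_n: float,
                       mean: float, std: float):
    # Model on normalized data: y = theta0_n + theta1_n * (x - mean) / std
    # Expanding gives the parameters on the original mileage scale:
    theta1 = theta1_n / std
    theta0 = theta0_n - theta1_n * mean / std
    return theta0, theta1
```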
- `__init__(path)` - Initialize with dataset path and normalize data
- `calculate_coef()` - Compute parameters using normal equations
- `gradient_descent(learning_rate, iterations)` - Train using gradient descent
- `learning_rate` - Learning rate for gradient descent (default: 0.01)
- `iterations` - Number of training iterations (default: 1000)
```python
from train import TrainLR

# Initialize trainer
lr = TrainLR("data.csv")

# Method 1: Normal equations (fast, analytical)
lr.calculate_coef()
theta0, theta1 = lr.theta0, lr.theta1

# Method 2: Gradient descent (iterative)
theta0, theta1 = lr.gradient_descent(learning_rate=0.01, iterations=1000)
print(f"Parameters: θ₀={theta0:.6f}, θ₁={theta1:.6f}")
```

The generated `weights.txt` file contains the trained parameters in the format:
```
8484.761234
-0.021234
```

- Line 1: θ₀ (intercept)
- Line 2: θ₁ (slope)