This project focuses on classifying the species of Iris flowers using machine learning techniques. It is a beginner-friendly data science project that demonstrates the full workflow of building a classification model — from data preprocessing to model evaluation and visualization.
This notebook performs the following steps:
- Loads the Iris Dataset: Reads the `.csv` file containing sepal and petal measurements of iris flowers.
- Explores the Data: Uses visualizations (like `pairplot`) to understand relationships between features.
- Preprocesses the Dataset
  - Drops unnecessary columns (e.g., ID)
  - Encodes categorical labels into numerical form
- Splits Data into Train and Test Sets: Uses 60% of the data for training and 40% for testing.
- Trains a Machine Learning Model: Applies a Random Forest Classifier to classify the flower species.
- Evaluates the Model: Calculates accuracy, precision, recall, F1-score, and the confusion matrix.
- Visualizes Results: Plots a confusion matrix heatmap to better understand prediction performance.
- Predicts New Samples: Takes new flower measurements and predicts the species.
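
The preprocessing, splitting, training, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses scikit-learn's bundled copy of the Iris data as a stand-in for `Iris.csv`, and the `Id` column name, `random_state=42`, `n_estimators=100`, and the stratified split are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Stand-in for reading Iris.csv (assumption): build a DataFrame with an
# ID column and string species labels from scikit-learn's bundled dataset.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.insert(0, "Id", range(1, len(df) + 1))
df["Species"] = iris.target_names[iris.target.to_numpy()]

# Preprocess: drop the ID column and encode species names as integers.
df = df.drop(columns=["Id"])
encoder = LabelEncoder()
y = encoder.fit_transform(df["Species"])
X = df.drop(columns=["Species"])

# Split: 60% training, 40% testing (stratification is an added assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)

# Train the Random Forest and predict on the held-out test set.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate: accuracy, per-class precision/recall/F1, and the confusion matrix.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=encoder.classes_))
print(cm)
```

The resulting `cm` array can be passed to `seaborn.heatmap(cm, annot=True)` to draw the confusion matrix heatmap. Exact scores depend on the random seed and the split, so they may differ from the figures reported for the notebook.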
- Dataset: `Iris.csv`
- Sepal Length (cm)
- Sepal Width (cm)
- Petal Length (cm)
- Petal Width (cm)
- Species (`setosa`, `versicolor`, `virginica`)
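
As a rough sketch of loading the dataset described above, the snippet below reads a CSV with pandas and checks that the expected columns are present. The header names (`Id`, `SepalLengthCm`, and so on) are an assumption about how `Iris.csv` is laid out, `load_iris_csv` is a hypothetical helper, and the three inline rows are illustrative sample data used only so the example runs without the real file.

```python
import io

import pandas as pd

# Assumed header layout for Iris.csv; adjust to match the actual file.
EXPECTED_COLUMNS = [
    "Id",
    "SepalLengthCm",
    "SepalWidthCm",
    "PetalLengthCm",
    "PetalWidthCm",
    "Species",
]


def load_iris_csv(source):
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df


# Illustrative in-memory sample; with the real file, pass the path instead,
# e.g. load_iris_csv("Iris.csv").
sample_csv = io.StringIO(
    "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
    "1,5.1,3.5,1.4,0.2,setosa\n"
    "2,7.0,3.2,4.7,1.4,versicolor\n"
    "3,6.3,3.3,6.0,2.5,virginica\n"
)
df = load_iris_csv(sample_csv)
print(df.dtypes)
print(df["Species"].value_counts())
```

Once loaded, a call such as `seaborn.pairplot(df.drop(columns=["Id"]), hue="Species")` gives the kind of pairwise feature plot used for exploration.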
- Supervised Learning
- Classification (Random Forest)
- Label Encoding
- Train/Test Split
- Model Evaluation Metrics
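
To make the evaluation metrics concrete, here is a small worked example that derives precision, recall, F1-score, and accuracy by hand from a confusion matrix. The matrix values are hypothetical and chosen only for illustration; they are not results from this project. Rows are taken to be true classes and columns predicted classes, which matches the convention of scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
labels = ["setosa", "versicolor", "virginica"]
cm = np.array([
    [20, 0, 0],
    [0, 18, 2],
    [0, 1, 19],
])

# Correct predictions for each class sit on the diagonal.
true_positives = np.diag(cm)

# Precision: of everything predicted as class k, how much really was class k.
precision = true_positives / cm.sum(axis=0)

# Recall: of everything that really was class k, how much was found.
recall = true_positives / cm.sum(axis=1)

# F1-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy: fraction of all samples classified correctly.
accuracy = true_positives.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:<12} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```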
- Model Accuracy: ~98% on test set
- Predicted Class: Given `[6.3, 2.5, 5.0, 1.9]`, the model correctly returns the predicted iris species.
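
A prediction for a new sample like the one above might look like the following sketch. It assumes the four values are ordered as sepal length, sepal width, petal length, petal width (in cm), uses scikit-learn's bundled Iris data in place of `Iris.csv`, and trains on the full dataset for brevity rather than on the 60% training split; the hyperparameters are likewise assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Bundled Iris data as a stand-in for Iris.csv (assumption).
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Trained on all rows here only to keep the example short.
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# New measurement, assumed order: sepal length, sepal width,
# petal length, petal width (cm). Reusing the training column names
# keeps the input consistent with what the model was fitted on.
new_sample = pd.DataFrame([[6.3, 2.5, 5.0, 1.9]], columns=X.columns)

# predict returns the encoded class; map it back to the species name.
pred_code = model.predict(new_sample)[0]
pred_species = iris.target_names[pred_code]

# Class probabilities show how confident the forest is in its vote.
probabilities = model.predict_proba(new_sample)[0]
print(pred_species)
print(dict(zip(iris.target_names, probabilities.round(2))))
```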
- End-to-end implementation of a machine learning pipeline
- Hands-on experience with Scikit-learn, Seaborn, Pandas, and Matplotlib
- Understanding of data loading, visualization, model training, prediction, and evaluation

👨‍💻 Author: Zohaib Sattar, Data Scientist | Data Analyst | Machine Learning Enthusiast

- 📧 Email: zabizubi86@gmail.com
- 🔗 GitHub: https://github.com/ZohaibSattarDataAI
- 🔗 LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7340801036640473088/