Sentiment Analysis on IMDb Movie Reviews

This project applies Natural Language Processing (NLP) techniques to sentiment analysis of IMDb movie reviews. The goal is to classify each review as either positive or negative based on its content.

Overview

The sentiment analysis model uses the IMDb movie reviews dataset to classify reviews into two categories: positive and negative. The project uses TF-IDF vectorization for feature extraction and Logistic Regression for model training. The model is evaluated with accuracy, precision, recall, and F1-score.

Key Features:

Preprocessing: Text cleaning, tokenization, stopword removal, and lemmatization.
Vectorization: TF-IDF (Term Frequency-Inverse Document Frequency) for feature extraction.
Modeling: Logistic Regression for classification.
Evaluation: Accuracy, precision, recall, and F1-score on the test set.
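
The preprocessing step listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical stand-ins for the nltk stopword list, and lemmatization is left out because it needs nltk's downloaded resources.

```python
import re
from typing import List

# Tiny illustrative stopword set (assumption); the project relies on nltk's list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "this", "was", "to", "in"}

def preprocess(review: str) -> List[str]:
    """Clean a raw review and return its lowercase content tokens."""
    text = re.sub(r"<[^>]+>", " ", review)            # drop HTML tags such as <br />
    text = text.lower()                               # normalize case
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)  # words, keeping forms like "don't"
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("This movie was GREAT!<br />Loved the acting."))
```

The returned token list is what a vectorizer would then turn into numerical features.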

Technologies Used

Python 3.8+
Libraries:
- nltk (for text preprocessing)
- scikit-learn (imported as sklearn; for machine learning and model evaluation)
- pandas (for data manipulation)
- matplotlib (for visualization)
Machine Learning Algorithms: Logistic Regression
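
To show what Logistic Regression does with numerical features, here is a minimal standard-library sketch trained by batch gradient descent. It is an illustration under stated assumptions: the toy feature values, the `train` and `predict` helpers, and the learning rate are all made up for this example, and the sketch is unregularized, whereas scikit-learn's LogisticRegression uses its own solvers and applies regularization by default.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(positive) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x) -> int:
    """Return 1 (positive) when the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy features (assumption): [weight of "great", weight of "boring"] per review.
X = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
w, b = train(X, y)
print([predict(w, b, x) for x in X])
```

The learned weight for the first feature ends up positive and the second negative, which is how the model encodes that one term signals a positive review and the other a negative one.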

Data

This project uses the IMDb Movie Reviews dataset, in which each review is labeled as either positive or negative.

Training Data: Contains 12,500 positive reviews and 12,500 negative reviews.
Testing Data: Similarly balanced with 12,500 positive and 12,500 negative reviews.
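
A quick way to confirm this class balance is to count labels after loading. The sketch below assumes each split is unpacked into `pos` and `neg` subfolders of plain-text files; that layout, the `load_split` and `class_counts` helpers, and any path passed to them are assumptions for illustration, since the on-disk format is not described here.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple

def load_split(split_dir: Path) -> Tuple[List[str], List[int]]:
    """Read reviews from <split_dir>/pos and <split_dir>/neg (assumed layout)."""
    texts, labels = [], []
    for folder, label in (("pos", 1), ("neg", 0)):
        # Sort for a deterministic order across runs.
        for path in sorted((split_dir / folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels

def class_counts(labels: List[int]) -> Counter:
    """Count how many reviews carry each label (1 = positive, 0 = negative)."""
    return Counter(labels)
```

If the data matches the description above, `class_counts` on the training labels should report 12,500 for each class.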

TF-IDF vectorization transforms the review text into numerical features suitable for machine learning.
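
To make the TF-IDF arithmetic concrete, here is a small standard-library sketch. It assumes one common smoothed variant, idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which scikit-learn's TfidfVectorizer documents as its default behavior; the `tfidf` helper and the three toy documents are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (term -> weight) per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed variant, see lead-in).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        raw = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({t: v / norm for t, v in raw.items()})
    return vectors

# Toy, already-tokenized documents (assumption).
docs = [["great", "movie"], ["boring", "movie"], ["great", "acting"]]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

In the second document, "boring" receives a higher weight than "movie" because it appears in fewer documents, which is the effect TF-IDF is meant to produce.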

Installation

To run this project locally, follow these steps:

Clone the repository:

```
git clone https://github.com/YousefAlaaAli/sentiment-analysis-imdb.git
cd sentiment-analysis-imdb
```

Create a virtual environment (optional but recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

To run the sentiment analysis:

Make sure all dependencies are installed.
Run the Python script to train and evaluate the model:
```
python sentiment_analysis.py
```

When the script finishes, it reports the model's performance metrics (accuracy, precision, recall, and F1-score).

Model Evaluation

The model achieves the following performance on the test set:

Accuracy: 87.88%
Precision: 0.88 (for both positive and negative classes)
Recall: 0.88 (for both positive and negative classes)
F1-Score: 0.88 (for both positive and negative classes)

Precision, recall, and F1-score are the same for both classes, which indicates that the model performs equally well on positive and negative reviews rather than favoring one class.
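
For reference, the metrics above follow from the counts of true and false positives and negatives. The sketch below shows that arithmetic with the standard library; the `binary_metrics` helper and the eight toy labels are made up for illustration and are not the project's test results.

```python
from typing import Dict, List

def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (assumption): 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))              # metrics for the positive class
print(binary_metrics(y_true, y_pred, positive=0))  # metrics for the negative class
```

Calling the function once per class, as in the last two lines, gives the per-class precision, recall, and F1 figures of the kind reported above.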

Contributing

Contributions are welcome! Please fork the repository and submit a pull request if you have any improvements, fixes, or new features you'd like to add.

Steps to contribute:

Fork the repository
Create a new branch (git checkout -b feature-branch)
Make your changes
Commit your changes (git commit -am 'Add new feature')
Push to the branch (git push origin feature-branch)
Create a new Pull Request
