Neural-based Propaganda Detection

This repository implements a neural network classifier that detects propaganda in text. Sentences are converted into vectors using pre-trained GloVe word embeddings, and a Multi-Layer Perceptron (MLP) trained on those vectors predicts whether each sentence is "propaganda" or "non-propaganda".

Features

Pre-trained GloVe embeddings for semantic word representations.
MLP classifier implemented in PyTorch.
Pickle-based model serialization for easy inference.
Simple and efficient sentence vectorization using GloVe.
Interactive inference: enter a sentence and get its predicted label.

Technologies and Tools

1. Python Libraries

PyTorch: Deep learning framework used for building and training the MLP classifier.
Gensim: For loading and utilizing pre-trained GloVe word embeddings.
NLTK: Tokenizer for breaking sentences into words.
NumPy: Efficient numerical computations for vector operations.
Scikit-learn: For evaluating the model (precision, recall, F1-score).
Pandas: Data manipulation during preprocessing.

2. Pre-trained Embeddings

GloVe (Global Vectors for Word Representation):
- File: glove.6B.300d.txt (part of the glove.6B release, trained on Wikipedia 2014 + Gigaword 5).
- Provides 300-dimensional dense vectors for an uncased (lowercase) English vocabulary of 400K words.
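Each line of glove.6B.300d.txt is a word followed by its 300 space-separated float components. The project loads this file through Gensim; as a dependency-light alternative, the sketch below parses the text format directly with NumPy. The helper name `load_glove` and its optional `vocab` filter are hypothetical, not part of the repository.

```python
from typing import Dict, Iterable, Optional, Set

import numpy as np


def load_glove(
    lines: Iterable[str],
    dim: int = 300,
    vocab: Optional[Set[str]] = None,
) -> Dict[str, np.ndarray]:
    """Parse GloVe text-format lines ("word v1 v2 ... v_dim") into a dict.

    Hypothetical helper; the repository itself uses Gensim for this step.
    """
    embeddings: Dict[str, np.ndarray] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip empty or malformed lines
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue  # optionally keep only words that occur in the dataset
        embeddings[word] = np.array([float(v) for v in parts[1:]], dtype=np.float32)
    return embeddings


# Usage (file path assumed to be the repository root):
# with open("glove.6B.300d.txt", encoding="utf-8") as f:
#     glove = load_glove(f)
```

With Gensim 4.x, the equivalent call is `KeyedVectors.load_word2vec_format("glove.6B.300d.txt", binary=False, no_header=True)`, since GloVe files lack the word2vec header line. Filtering by `vocab` keeps memory use down when only the dataset's words are needed.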

3. Dataset

Training data is provided in train.tsv (tab-separated values).
- Contains sentences, article titles, and their corresponding labels (propaganda or non-propaganda).
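A minimal sketch of reading a TSV file like train.tsv with Pandas and producing the shuffled train/validation/test split described under Model Workflow. It assumes the file has a header row; the actual column names in train.tsv, the 80/10/10 ratios, and the function name `load_and_split` are all assumptions.

```python
import csv

import pandas as pd


def load_and_split(source, seed=42, train_frac=0.8, val_frac=0.1):
    """Read a TSV file and return shuffled train/validation/test DataFrames.

    Assumes a header row; the real column layout of train.tsv may differ.
    """
    # QUOTE_NONE stops stray quote characters inside sentences from breaking the parse.
    df = pd.read_csv(source, sep="\t", quoting=csv.QUOTE_NONE)
    # Shuffle once with a fixed seed so the split is reproducible.
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(df) * train_frac)
    n_val = int(len(df) * val_frac)
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Usage: train_df, val_df, test_df = load_and_split("train.tsv")
```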

4. File Serialization

Pickle:
- Saves the trained model weights, input parameters, and the out-of-vocabulary (OOV) vector, which stands in for words missing from GloVe, so inference can run without retraining.
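The sketch below bundles the three items listed above into a single pickled dictionary and reads it back. The key names (`state_dict`, `params`, `oov_vector`) and helper names are hypothetical; the actual layout of propoganda.pickle may differ.

```python
import pickle
from typing import Any, Dict

import numpy as np


def save_bundle(path: str, state_dict: Dict[str, Any], params: Dict[str, Any],
                oov_vector: np.ndarray) -> None:
    """Pickle everything inference needs into one file (hypothetical layout)."""
    bundle = {"state_dict": state_dict, "params": params, "oov_vector": oov_vector}
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(path: str) -> Dict[str, Any]:
    """Load a bundle written by save_bundle."""
    # Only unpickle trusted files: pickle can execute arbitrary code while loading.
    with open(path, "rb") as f:
        return pickle.load(f)
```

A PyTorch `state_dict` is an ordered dict of tensors and pickles without extra work; keeping the OOV vector in the same file guarantees that inference uses exactly the vector the model was trained with.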

Setup Instructions

1. Clone the Repository

git clone https://github.com/ash-sha/propaganda-detection.git
cd propaganda-detection

2. Install Dependencies

Ensure Python 3.8+ is installed. Then install required libraries:

pip install -r requirements.txt

3. Download Pre-trained GloVe Embeddings

Download glove.6B.zip from the GloVe website and extract glove.6B.300d.txt from the archive.
Place the file in the repository root (./glove.6B.300d.txt), as shown in the repository structure below.

4. Training

(Optional) To retrain the model from scratch, open the training notebook and run all of its cells:

train.ipynb

5. Inference

Ensure the model file (propoganda.pickle) is present in the working directory.
Open the inference notebook and run its cells:

test.ipynb

Enter a sentence when prompted; the notebook prints the predicted label:

Enter the query: The government is spreading fake news to mislead the public.
Predicted label: propaganda

Repository Structure

propaganda-detection/
│
├── glove.6B.300d.txt   # Pre-trained GloVe embeddings (downloaded separately)
├── train.tsv           # Dataset used for training
├── propoganda.pickle   # Serialized trained model
├── train.ipynb         # Notebook for training the model
├── test.ipynb          # Notebook for running inference
├── requirements.txt    # Python dependencies
├── LICENSE             # MIT License
└── README.md           # Project documentation

Model Workflow

Data Preprocessing:
- Load sentences and labels from the dataset.
- Shuffle and split data into training, validation, and testing sets.
- Normalize and vectorize sentences using GloVe embeddings.
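One common way to turn a sentence into a single GloVe vector is to average its word vectors, substituting the OOV vector for unknown words. The sketch below assumes mean pooling and lowercasing; the README does not state the exact pooling used, and the regex tokenizer is a dependency-free stand-in for the NLTK tokenizer the project uses.

```python
import re
from typing import Dict, List

import numpy as np

# Stand-in tokenizer (assumption): the project uses NLTK for this step.
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(sentence: str) -> List[str]:
    # The glove.6B vocabulary is lowercase, so normalize case before lookup.
    return _TOKEN_RE.findall(sentence.lower())


def sentence_vector(sentence: str, glove: Dict[str, np.ndarray],
                    oov_vector: np.ndarray) -> np.ndarray:
    """Average word vectors over the sentence (mean pooling is an assumption)."""
    tokens = tokenize(sentence)
    if not tokens:
        return oov_vector.copy()  # nothing to average: fall back to the OOV vector
    vectors = [glove.get(tok, oov_vector) for tok in tokens]
    return np.mean(vectors, axis=0)
```

Mean pooling keeps every sentence at a fixed 300 dimensions regardless of length, which is what a plain MLP input layer requires, at the cost of ignoring word order.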
Model Architecture:
- Multi-Layer Perceptron (MLP) with one hidden layer.
- Dropout regularization to prevent overfitting.
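A one-hidden-layer MLP with dropout can be sketched in PyTorch as follows. The hidden size (128), ReLU activation, and dropout rate (0.5) are assumptions, as is the class name `PropagandaMLP`; the input size matches the 300-dimensional GloVe vectors.

```python
import torch
from torch import nn


class PropagandaMLP(nn.Module):
    """One-hidden-layer MLP over sentence vectors (layer sizes are assumptions)."""

    def __init__(self, input_dim: int = 300, hidden_dim: int = 128,
                 num_classes: int = 2, dropout: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),                 # active in train(), disabled in eval()
            nn.Linear(hidden_dim, num_classes),  # raw logits, as CrossEntropyLoss expects
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

The output layer returns unnormalized logits rather than probabilities because `nn.CrossEntropyLoss` applies log-softmax internally.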
Training:
- Train with the Adam optimizer and cross-entropy loss.
- Evaluate on the validation set after each epoch.
- Save the model with the highest validation F1-score.
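The training steps above (Adam, cross-entropy loss, validation F1 after each epoch, keep the best model) might look like the following sketch. The function name, hyperparameter defaults, and the label encoding (1 = propaganda, used as the positive class for binary F1) are assumptions; inputs are float tensors of sentence vectors and integer label tensors.

```python
import copy

import torch
from sklearn.metrics import f1_score
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_model(model, X_train, y_train, X_val, y_val,
                epochs=10, lr=1e-3, batch_size=32, seed=0):
    """Train with Adam + cross-entropy; return the weights with the best validation F1."""
    torch.manual_seed(seed)  # makes batch shuffling reproducible
    loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_f1, best_state = -1.0, None
    for _ in range(epochs):
        model.train()  # enable dropout
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
        model.eval()  # disable dropout for evaluation
        with torch.no_grad():
            preds = model(X_val).argmax(dim=1)
        val_f1 = f1_score(y_val.numpy(), preds.numpy(), zero_division=0)
        if val_f1 > best_f1:
            # Deep-copy: state_dict() returns live references to the parameters.
            best_f1, best_state = val_f1, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_f1
```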
Inference:
- Load the trained model (propoganda.pickle).
- Use GloVe embeddings to vectorize the input query.
- Predict the class label for the input sentence.
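The inference steps can be sketched as a single function: vectorize the query, run the model in evaluation mode, and map the arg-max class index to a label. The label order (index 0 = non-propaganda, 1 = propaganda), mean pooling, regex tokenization, and the name `predict_label` are assumptions rather than the notebook's confirmed code.

```python
import re

import numpy as np
import torch
from torch import nn

# Assumed class-index order; check it against how labels were encoded during training.
LABELS = ("non-propaganda", "propaganda")


def predict_label(query: str, model: nn.Module, glove: dict,
                  oov_vector: np.ndarray, labels=LABELS) -> str:
    """Return the predicted label for one sentence (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())  # stand-in for NLTK tokenization
    vectors = [glove.get(t, oov_vector) for t in tokens] or [oov_vector]
    x = torch.tensor(np.mean(vectors, axis=0), dtype=torch.float32).unsqueeze(0)
    model.eval()  # make sure dropout is off
    with torch.no_grad():
        logits = model(x)
    return labels[int(logits.argmax(dim=1).item())]


# Usage in a notebook cell:
# query = input("Enter the query: ")
# print("Predicted label:", predict_label(query, model, glove, oov_vector))
```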

Sample Results

Input Query: The government is spreading fake news to mislead the public.
Predicted Label: propaganda

Future Work

Add support for multi-class propaganda classification (e.g., detecting specific propaganda techniques).
Improve sentence vectorization by integrating contextual embeddings (e.g., BERT, RoBERTa).
Implement a web or mobile interface for real-time predictions.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the MIT License. See LICENSE for details.
