Commit 468e0a7: adds whole project with docs (0 parents)

File tree: 199 files changed, +7670 -0 lines


README.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# Sentimento ![Logo](https://play-lh.googleusercontent.com/nAaDRtLZlshur9o3A2XS_K__4I8m_yZ0gvucECrZtGoEGq8NUWE0Zj1vsyjALBui2Q=w35) Mobile Application ![Logo](https://play-lh.googleusercontent.com/1FikpccbOFZsDc5k9x1OQegu8A53tYcY8dkk_neZiCuOcdxWjzUcF3QebE_E9UQNiW4=w40)

**Sentimento** is a social media assistant platform with a sentiment analysis feature. It is a mobile application built with Flutter on the front end, Flask on the back end, and the Multinomial Naive Bayes algorithm for the analysis itself.
[![Click here to download from Google Play](./report-documentation/doc-assets/googleplay.png)](https://play.google.com/store/apps/details?id=com.awarself.sentimento)

Undergraduate Final Year Project of **Prashant Ghimire** at

[![University Logo](./report-documentation/doc-assets/lmu.png)](https://www.londonmet.ac.uk/)
[![College Logo](./report-documentation/doc-assets/islington.png)](https://islington.edu.np/)
## Read Documentation

- [Final Report](./report-documentation/Final%20Report.pdf)
- [Risk Identification and Assessment Document](./report-documentation/Risk%20Identification%20and%20Assessment%20Document.pdf)
- [Software Requirement Specification](./report-documentation/Software%20Requirement%20Specification.pdf)
- [User Manual](./report-documentation/User%20Manual-%20Sentimento.pdf)
- [Weekly Task Information](./report-documentation/Weekly%20Task%20Information.pdf)
## Tech used

Sentimento uses a number of open-source projects to work properly:

- [Flutter] - Cross-platform application development kit
- [Flask] - Micro web framework written in Python
- [MultinomialNB] - Probabilistic classifier for discrete features
- [NumPy] - Support for large, multi-dimensional arrays and matrices
- [Tweepy] - For accessing the Twitter API
- [Google API Core] - For accessing the YouTube API
- [VS Code] - Code editor used for the project
- [GitHub] - Used for version control
- [Postman API Platform] - Used to test the built APIs
## Installation

The **Sentimento** [application] can either be installed from Google Play **or** run locally by cloning this [repository].

- To run the frontend application, you are expected to have the [Flutter] setup ready on your system.
- To run the backend application, install the Python dependencies listed in the requirements file:
```sh
cd backend-flask                 # go inside the backend dir
virtualenv env                   # create a separate environment to run in
source env/bin/activate         # start using the environment
pip install -r requirements.txt # install the required packages
flask run --port=80             # start the backend application
```
Verify the deployment by navigating to the server address in your preferred browser:

```sh
127.0.0.1:80
```
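As an extra sanity check, the index route of the backend returns a JSON map of the available endpoints (see `backend-flask/src/__init__.py`). A minimal sketch using the `requests` package, which is already pinned in requirements.txt:

```python
import requests

# A GET on the index route ("/") is enough to confirm the server is up;
# it returns the JSON map of endpoints defined in src/__init__.py.
resp = requests.get("http://127.0.0.1:80/")
resp.raise_for_status()
print(resp.json())  # e.g. {"Registration": "/register/ [POST]", ...}
```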
## License

The Sentimento project as published in this [repository] is open source, but the [application] available on Google Play is separately licensed.

### **Thank you**

Please email hello@ghimireprashant.com.np for any queries.

[//]: # (These are reference links used in the body of this note and get stripped out when the markdown processor does its job. There is no need to format nicely because it shouldn't be seen. Thanks SO - http://stackoverflow.com/questions/4823468/store-comments-in-markdown-syntax)
[Flutter]: <https://flutter.dev/>
[Flask]: <https://flask.palletsprojects.com/>
[MultinomialNB]: <https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html>
[NumPy]: <https://pypi.org/project/numpy/>
[Tweepy]: <https://pypi.org/project/tweepy/>
[Google API Core]: <https://pypi.org/project/google-api-core/>
[VS Code]: <https://code.visualstudio.com/>
[GitHub]: <https://github.com/>
[Postman API Platform]: <https://www.postman.com/>
[application]: <https://play.google.com/store/apps/details?id=com.awarself.sentimento>
[repository]: <https://github.com/aprashantz/final-year-project-undergrad>

backend-flask/.flaskenv

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
export FLASK_ENV=development
export FLASK_APP=src
export SQLALCHEMY_DB_URI=sqlite:///sentimento.db

export JWT_SECRET_KEY='assign your secret key here'

backend-flask/requirements.txt

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
astroid==2.4.2
autopep8==1.6.0
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.9
click==8.0.3
colorama==0.4.4
distlib==0.3.4
filelock==3.4.2
Flask==2.0.2
Flask-JWT-Extended==4.3.1
Flask-SQLAlchemy==2.5.1
google-api-core==2.3.2
google-api-python-client==2.33.0
google-auth==2.3.3
google-auth-httplib2==0.1.0
googleapis-common-protos==1.54.0
greenlet==1.1.2
gunicorn==20.1.0
httplib2==0.20.2
idna==3.3
isort==4.3.21
itsdangerous==2.0.1
Jinja2==3.0.3
joblib==1.1.0
lazy-object-proxy==1.4.3
MarkupSafe==2.0.1
mccabe==0.6.1
nltk==3.6.5
numpy==1.19.1
oauthlib==3.1.1
pandas==1.3.4
platformdirs==2.4.1
protobuf==3.19.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.8.0
PyJWT==2.3.0
pylint==2.5.3
pyparsing==3.0.6
python-dateutil==2.8.2
python-dotenv==0.19.2
pytz==2021.3
regex==2021.11.10
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.8
scikit-learn==1.0.1
scipy==1.7.3
six==1.15.0
sklearn==0.0
SQLAlchemy==1.4.29
textblob==0.17.1
threadpoolctl==3.0.0
toml==0.10.2
tqdm==4.62.3
tweepy==4.4.0
uritemplate==4.1.1
urllib3==1.26.7
virtualenv==20.13.0
Werkzeug==2.0.2
wrapt==1.12.1

backend-flask/src/__init__.py

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# backend in monolithic architecture
# middleware file

from flask import Flask
import os
from flask_jwt_extended import JWTManager
from src.register import register_blueprint
from src.login import login_blueprint
from src.report import report_blueprint
from src.profile import profile_blueprint
from src.vacancy import vacancy_blueprint
from src.youtube import youtube_blueprint
from src.twitter import twitter_blueprint
from src.manual import manual_blueprint
from src.database import db


def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)

    if test_config is None:
        app.config.from_mapping(
            SECRET_KEY=os.environ.get("SECRET_KEY"),
            SQLALCHEMY_DATABASE_URI=os.environ.get("SQLALCHEMY_DB_URI"),
            SQLALCHEMY_TRACK_MODIFICATIONS=False,
            JWT_SECRET_KEY=os.environ.get('JWT_SECRET_KEY')
        )
    else:
        app.config.from_mapping(test_config)

    app.register_blueprint(register_blueprint)
    app.register_blueprint(login_blueprint)
    app.register_blueprint(report_blueprint)
    app.register_blueprint(profile_blueprint)
    app.register_blueprint(vacancy_blueprint)
    app.register_blueprint(youtube_blueprint)
    app.register_blueprint(twitter_blueprint)
    app.register_blueprint(manual_blueprint)

    # index route of the backend: returns a JSON map of the endpoints
    @app.route("/")
    def index():
        return {
            "Registration": "/register/ [POST]",
            "Login": "/login/ [POST]",
            "User profile": "/profile/ [GET]",
            "YouTube comments analysis": "/red/ [POST]",
            "Tweets analysis": "/blue/ [POST]",
            "User reports": "/report/ [GET/POST]",
            "Vacancy": "/vacancy/ [GET/POST] & /vacancy/all?filter=freelancer or vacancy [GET]",
            "Sentence/Paragraph polarity": "/manual/ [POST]"}

    db.app = app
    db.init_app(app)

    JWTManager(app)

    return app
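A minimal sketch of exercising the factory above with Flask's test client; the `test_config` values here are illustrative assumptions, not the project's real settings:

```python
# Hedged usage sketch: build the app from an in-memory test config and
# hit the index route. The config values below are assumptions for
# illustration only.
from src import create_app

app = create_app(test_config={
    "SQLALCHEMY_DATABASE_URI": "sqlite:///:memory:",
    "SQLALCHEMY_TRACK_MODIFICATIONS": False,
    "JWT_SECRET_KEY": "test-secret",
    "TESTING": True,
})

with app.test_client() as client:
    response = client.get("/")   # index route listing the endpoints
    print(response.get_json())   # {"Registration": "/register/ [POST]", ...}
```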
backend-flask/src/algo/MultinomialNaiveBayes.py

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from src.algo.data_preprocessor import text_cleaner


# For production we want the shortest possible processing time, so we
# load joblib pickles of the trained model instead of recalculating on
# every request.
def production_multinomial(testing_data, layer):
    # deserialize the pickled CountVectorizer into the runtime env
    pickled_count_vectorizer = CountVectorizer()
    if layer == "sarcasm":
        pickled_count_vectorizer = joblib.load(
            'src/algo/sarcasmpickle_countvectorizer.pkl')
    if layer == "spam":
        pickled_count_vectorizer = joblib.load(
            'src/algo/spampickle_countvectorizer.pkl')
    X_test = pickled_count_vectorizer.transform(testing_data)

    # deserialize the pickled MultinomialNB model into the runtime env
    pickled_multinomial_nv = MultinomialNB()
    if layer == "sarcasm":
        pickled_multinomial_nv = joblib.load(
            'src/algo/sarcasmpickle_multinomial.pkl')
    if layer == "spam":
        pickled_multinomial_nv = joblib.load(
            'src/algo/spampickle_multinomial.pkl')
    prediction_of_each_data = pickled_multinomial_nv.predict(
        X_test).tolist()  # convert numpy array to list
    # returns a list of 1/0 items, where 1 means yes and 0 means no
    return prediction_of_each_data


# This debug function is needed to update the training model when new
# data are added to Sentimento's training datasets. It performs the
# count vectorization for the training data too, which takes longer
# than using the pickled model.
def debug_multinomial(testing_data, layer):
    training_data = None
    preprocessed_training_data = []
    training_label = []
    if layer == "spam":
        training_data = pd.read_csv('src/algo/spam_training.csv').values
        for each in training_data:
            preprocessed_training_data.append(text_cleaner(each[3]))
            training_label.append(each[4])
    if layer == "sarcasm":
        training_data = pd.read_csv('src/algo/sarcasm_training.csv').values
        for each in training_data:
            preprocessed_training_data.append(text_cleaner(each[0]))
            training_label.append(each[1])

    # count vectorizing
    cv = CountVectorizer(ngram_range=(1, 2))
    X_train = cv.fit_transform(preprocessed_training_data)

    # serialization: dump the fitted vectorizer to a pickle file so
    # production_multinomial can reuse it
    if layer == "spam":
        joblib.dump(cv, 'spampickle_countvectorizer.pkl')
    if layer == "sarcasm":
        joblib.dump(cv, 'sarcasmpickle_countvectorizer.pkl')

    X_test = cv.transform(testing_data)
    mn = MultinomialNB()
    mn.fit(X_train, training_label)

    # serialization: dump the fitted classifier for production use
    if layer == "spam":
        joblib.dump(mn, 'spampickle_multinomial.pkl')
    if layer == "sarcasm":
        joblib.dump(mn, 'sarcasmpickle_multinomial.pkl')

    prediction_of_each_data = mn.predict(X_test).tolist()
    return prediction_of_each_data
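A hedged usage sketch of `production_multinomial`; the sample texts are invented, and the call assumes the pickled `.pkl` files referenced above are present under `src/algo/`:

```python
from src.algo.MultinomialNaiveBayes import production_multinomial

# Invented sample inputs; each returned item is 1 (detected) or 0 (not).
comments = ["loved this video, thank you!",
            "get 10k followers fast, click my profile"]
spam_flags = production_multinomial(comments, layer="spam")        # e.g. [0, 1]
sarcasm_flags = production_multinomial(comments, layer="sarcasm")  # e.g. [0, 0]
print(list(zip(comments, spam_flags, sarcasm_flags)))
```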
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
from src.algo.MultinomialNaiveBayes import production_multinomial
from src.algo.data_preprocessor import get_clean_texts
from nltk.sentiment.vader import SentimentIntensityAnalyzer


class Sentimento:

    # instance attributes, constructor
    def __init__(self, testing_data):
        self.testing_data = testing_data
        self.preprocessed_testing_data = get_clean_texts(testing_data)
        self.spam_detected_list = production_multinomial(
            self.preprocessed_testing_data, layer="spam")
        self.sarcasm_detected_list = production_multinomial(
            self.preprocessed_testing_data, layer="sarcasm")

    # total number of texts taken for analysis
    def data_count(self):
        return len(self.preprocessed_testing_data)

    # total spam and sarcasm detections
    def layer_count(self):
        spam_count = 0
        sarcasm_count = 0
        layer_count = []
        for each in self.spam_detected_list:
            if each == 1:
                spam_count += 1
        for each in self.sarcasm_detected_list:
            if each == 1:
                sarcasm_count += 1
        layer_count.append(spam_count)
        layer_count.append(sarcasm_count)
        return layer_count  # index 0 is the spam count, index 1 the sarcasm count

    # dict with the overall polarity and positive/negative/neutral counts
    def overall_polarity(self):
        # compound score in [-1, -0.2] counts as negative,
        # (-0.2, 0.2) as neutral, and [0.2, 1] as positive
        positive_count = 0
        negative_count = 0
        neutral_count = 0
        overall_polarity = 0.00000
        sid = SentimentIntensityAnalyzer()
        polarity_result = {}
        compound_polarity = 0
        sum_of_all_polarity = 0
        for each in self.preprocessed_testing_data:
            compound_polarity = sid.polarity_scores(each)["compound"]
            sum_of_all_polarity += compound_polarity
            if compound_polarity <= -0.2:
                negative_count += 1
            elif compound_polarity >= 0.2:
                positive_count += 1
            else:
                neutral_count += 1
        overall_polarity = sum_of_all_polarity / self.data_count()
        polarity_result["positive_count"] = positive_count
        polarity_result["negative_count"] = negative_count
        polarity_result["neutral_count"] = neutral_count
        polarity_result["overall_polarity"] = overall_polarity
        return polarity_result
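A hedged sketch of using the `Sentimento` class above, with the class in scope; the sample texts are invented, and it assumes the VADER lexicon has been downloaded (`nltk.download('vader_lexicon')`) and the pickled models are in place:

```python
# Invented sample inputs for illustration.
analyzer = Sentimento([
    "This app is fantastic, totally recommend it!",
    "Worst update ever, do not install.",
    "Oh great, it crashed again. Just what I needed.",
])
print(analyzer.data_count())        # number of texts kept after cleaning
print(analyzer.layer_count())       # [spam_count, sarcasm_count]
print(analyzer.overall_polarity())  # counts plus mean compound polarity
```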
backend-flask/src/algo/data_preprocessor.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
from src.algo.stopwords import customStopWords
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer


# takes a list of raw texts and returns a list of clean texts
def get_clean_texts(list_of_texts):
    list_of_clean_texts = []
    for each in list_of_texts:
        cleaned_text = text_cleaner(each)
        if cleaned_text != '':
            list_of_clean_texts.append(cleaned_text)
    return list_of_clean_texts


# takes a raw text and returns a clean text
def text_cleaner(text):
    tokenizer = RegexpTokenizer(r'\w+')
    ps = PorterStemmer()
    tokenized_text = tokenizer.tokenize(text.lower())
    clean_tokenized_text = []  # words kept after filtering stopwords
    for each_token in tokenized_text:
        if each_token not in customStopWords():
            # drop stopword tokens
            clean_tokenized_text.append(each_token)
    stemmed_text = []
    for token in clean_tokenized_text:
        # append the stemmed words to the stemmed data
        stemmed_text.append(ps.stem(token))
    clean_data = " ".join(stemmed_text)  # join tokens back into one sentence
    return clean_data
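A quick illustration of the cleaning pipeline above; the exact output depends on the project's `customStopWords()` list, so the result shown is indicative only:

```python
# Indicative example: lowercase, tokenize on word characters, drop
# custom stopwords, Porter-stem, and re-join into one string.
print(text_cleaner("The runners were RUNNING quickly through the park!"))
# e.g. "runner run quickli park"
```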
