This project was developed as part of the curricular unit "Elements of Artificial Intelligence and Data Science" of the BSc in Artificial Intelligence and Data Science.
It was graded 20/20.
The objective is to develop a complete data science pipeline to predict the one-year survival of patients diagnosed with Hepatocellular Carcinoma (HCC).
The project is divided into the following main stages:
- Exploratory data analysis
  - Examination of feature types
  - Class distribution
  - Attribute-level values
- Identification of data inconsistencies
- Missing value imputation
- Data transformation and scaling
- Feature engineering
- Selection of classification algorithms
- Definition of training and testing sets
- Model performance evaluation
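The preprocessing and splitting stages above can be sketched without external libraries. This is an illustrative sketch, not the project's implementation. It assumes numeric features with missing values stored as `None` and uses synthetic rows instead of the HCC data. The helpers `stratified_split` and `MedianImputeStandardize` are hypothetical names, and median imputation with z-score scaling is an assumed choice. What the sketch shows is that imputation and scaling statistics are learned from the training set only and then reused on the test set. It also shows that a stratified split preserves each class's share of the data, which matters when the class distribution is imbalanced.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # one patient's features; None marks a missing value


def stratified_split(X: List[Row], y: List[int], test_size: float = 0.25,
                     seed: int = 42) -> Tuple[List[Row], List[Row], List[int], List[int]]:
    """Hypothetical helper: hold out test_size of each class so class proportions match."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


class MedianImputeStandardize:
    """Hypothetical transformer: median imputation followed by z-score scaling."""

    def fit(self, X: List[Row]) -> "MedianImputeStandardize":
        # Statistics come from the training rows only, to avoid test-set leakage.
        self.medians = [statistics.median(v for v in col if v is not None)
                        for col in zip(*X)]
        filled = self._impute(X)
        self.means = [statistics.fmean(col) for col in zip(*filled)]
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in zip(*filled)]
        return self

    def _impute(self, X: List[Row]) -> List[List[float]]:
        return [[m if v is None else v for v, m in zip(row, self.medians)] for row in X]

    def transform(self, X: List[Row]) -> List[List[float]]:
        return [[(v - mu) / sd for v, mu, sd in zip(row, self.means, self.stds)]
                for row in self._impute(X)]


# Synthetic stand-in data (not the HCC dataset): two features, binary label.
X = [[1.0, None], [2.0, 10.0], [3.0, 12.0], [None, 14.0],
     [5.0, 16.0], [6.0, None], [7.0, 20.0], [8.0, 22.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

X_tr, X_te, y_tr, y_te = stratified_split(X, y, test_size=0.25)
prep = MedianImputeStandardize().fit(X_tr)  # fit on training rows only
X_tr_s, X_te_s = prep.transform(X_tr), prep.transform(X_te)
```

scikit-learn provides equivalents for these steps (`SimpleImputer(strategy="median")`, `StandardScaler`, and `train_test_split` with `stratify=y`). These can be chained in a `Pipeline` so that the same fit-on-train rule also holds inside cross-validation.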
Algorithms used:
- Decision Trees
- K-Nearest Neighbors (KNN)
- Random Forest
- Gradient Boosting
- Multi-Layer Perceptron (MLP)
- Logistic Regression
- Stacking Classifier
- Support Vector Classifier (SVC)
Model results are then evaluated and interpreted through:
- Comparison of classification results using standard metrics:
  - Confusion Matrix
  - AUC/ROC
  - Precision
  - Recall
  - Accuracy
- Extraction of meaningful insights
- Explanation of model behaviors
- Recommendations for future analyses
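The threshold-based metrics and AUC listed above follow directly from their definitions for a binary outcome. The sketch below assumes that label 1 is the positive class and that the decision threshold is 0.5; the document does not state either. Its labels and scores are made-up values, not project results. AUC is computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. This fraction equals the area under the ROC curve.

```python
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Binary confusion-matrix cells, with 1 assumed to be the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == 1 and p == 1 for t, p in pairs),
        "fp": sum(t == 0 and p == 1 for t, p in pairs),
        "fn": sum(t == 1 and p == 0 for t, p in pairs),
        "tn": sum(t == 0 and p == 0 for t, p in pairs),
    }


def threshold_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision and recall derived from the confusion-matrix cells."""
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

counts = confusion_counts(y_true, y_pred)    # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
metrics = threshold_metrics(y_true, y_pred)  # accuracy, precision and recall all 2/3
auc = roc_auc(y_true, scores)                # 8 of 9 pairs ranked correctly, about 0.889
```

The pairwise AUC grows quadratically with the number of samples, which is acceptable for small datasets. `sklearn.metrics` provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score` for the same quantities.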
Ensure you have Python 3 installed.
All required libraries are listed in `requirements.txt` and can be installed with `pip install -r requirements.txt`.