A machine learning project predicting suicide risk based on multiple socio-economic and environmental factors using data mining techniques.

Athharv5/Suicide-Prediction-System

🧠 Suicide Prediction System: A Socioeconomic Analysis

This Suicide Prediction System uses data science 🧮 to analyze environmental and socioeconomic indicators (GDP, temperature, happiness index, and population) alongside national suicide rates. It identifies trends and patterns in those rates and builds predictive models to assess suicide risk. The results are intended to help organizations and mental health professionals act early to prevent suicides and improve public health outcomes. 🚑


🌐 Live Website

👉 Check out the live demo: Suicide Prediction Website


📊 Data Collection, Preparation, Cleaning & Visualizations

📥 Data Sources

  • GDP 💰
  • Weather (Temperature) 🌡️
  • Happiness Index 😊
  • Suicide Rates ⚰️
  • Population 👥

🧹 Data Preparation

  • ✅ Web scraping & API integration.
  • ✅ Missing value imputation.
  • ✅ Outlier detection & removal.
  • ✅ Dataset merging & standardization.
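
The imputation and outlier steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the README does not say which imputation strategy or outlier rule was used, so median imputation and the 1.5 × IQR rule are assumptions, as are the function names and the sample readings. It uses only the Python standard library.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_mask(values, k=1.5):
    """Return True for each value inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= v <= high for v in values]


# Made-up temperature readings with one gap and one implausible spike.
temperature = [13.0, 15.0, None, 14.5, 14.0, 95.0]

filled = impute_median(temperature)   # the gap becomes the median, 14.5
keep = iqr_mask(filled)               # flags the 95.0 reading as an outlier
cleaned = [v for v, ok in zip(filled, keep) if ok]
```

Imputing before filtering keeps the row count stable while the outlier bounds are computed; whether that order suits the real datasets is a judgment call.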

🖼️ Sample Data

Before Cleaning

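  • GDP Data

| Country | 1999 | 2000 | 2001 | 2002 |
| --- | --- | --- | --- | --- |
| Country A | 162.64 | 160.88 | 161.32 | 164.12 |

  • Weather Data

| Country | 1999 | 2000 | 2001 | 2002 |
| --- | --- | --- | --- | --- |
| Country A | 13 | 15 | 14 | 14.5 |

  • Happiness Index

| Country | Overall Rank | Life Evaluation |
| --- | --- | --- |
| Country A | 4 | 1.7666 |

  • Suicide Rates

| Country | Male | Female | Both |
| --- | --- | --- | --- |
| Country A | 4.4 | 2.4 | 3.2 |

  • Population

| Country | 1988 | 1989 | 1990 |
| --- | --- | --- | --- |
| Country A | 720859132 | 720907282 | 720889272 |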

After Cleaning

| Country | GDP | Happiness | Male Suicide | Female Suicide | Both Suicide | Population | Temperature | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Country A | 165 | 1.766 | 4.4 | 2.4 | 3.2 | 720859132 | 15 | 1988 |
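
Going from the year-per-column tables under "Before Cleaning" to one row per country and year requires a reshape and a join. The sketch below shows one way to do that with plain dictionaries; the helper names `wide_to_long` and `inner_join` are hypothetical, and the README does not state how the project actually merges its datasets. The input rows reuse the GDP and weather sample values shown above.

```python
def wide_to_long(rows, years, value_name):
    """Turn year-per-column rows into {(country, year): {value_name: value}}."""
    long_table = {}
    for row in rows:
        for year in years:
            if year in row:
                long_table[(row["Country"], int(year))] = {value_name: row[year]}
    return long_table


def inner_join(*tables):
    """Keep only (country, year) keys present in every table and merge their fields."""
    shared = set(tables[0]).intersection(*tables[1:])
    merged = []
    for country, year in sorted(shared):
        record = {"Country": country, "Year": year}
        for table in tables:
            record.update(table[(country, year)])
        merged.append(record)
    return merged


years = ["1999", "2000", "2001", "2002"]
gdp_wide = [{"Country": "Country A", "1999": 162.64, "2000": 160.88,
             "2001": 161.32, "2002": 164.12}]
temp_wide = [{"Country": "Country A", "1999": 13, "2000": 15,
              "2001": 14, "2002": 14.5}]

gdp = wide_to_long(gdp_wide, years, "GDP")
temperature = wide_to_long(temp_wide, years, "Temperature")
merged = inner_join(gdp, temperature)
```

An inner join drops any country-year pair missing from one source, which is a simple default; keeping partial rows and imputing them afterwards would be an equally reasonable choice.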

📈 Insights from Visualizations

  • 📊 Bar Charts: Suicide rates across countries.
  • 🔬 Scatter Plots: Relationship between population and suicide rates.
  • 🌡️ Heatmaps: Correlation matrix of socioeconomic factors.
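
The numbers behind a correlation heatmap are pairwise Pearson coefficients. The sketch below computes such a matrix with the standard library only, leaving the plotting step out. The column values are toy numbers chosen to exercise the code and carry no real-world meaning; the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy values only; they do not come from the project's data.
columns = {
    "GDP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Happiness": [2.0, 1.0, 4.0, 3.0, 5.0],
    "Suicide rate": [5.0, 3.0, 4.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
```

Each cell of `matrix` maps directly to one colored square in a heatmap, with the diagonal always equal to 1.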

🤖 Model Implementation

Models Used

  • 🎯 XGBoost
  • 🌲 Random Forest
  • 📍 K-Nearest Neighbors (KNN)
  • 🚀 LightGBM
  • ⚙️ Support Vector Machines (SVM)
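
To make one of the listed models concrete, here is a minimal K-Nearest Neighbors classifier written from scratch. It is a sketch under stated assumptions, not the project's implementation: the README does not name the library, the value of k, the feature scaling, or the class labels, so `k=3`, min-max scaling, and the labels "low", "medium", and "high" are all assumed. The training points are made up and carry no real-world meaning.

```python
from collections import Counter
from math import dist


def min_max_scale(rows):
    """Rescale each feature to [0, 1]; also return the scaler for new points."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    def scale(row):
        return tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))

    return [scale(r) for r in rows], scale


def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-feature training set with three assumed risk classes.
features = [(10, 7.5), (12, 7.0), (11, 7.2),
            (50, 5.0), (55, 5.5), (52, 5.2),
            (90, 3.0), (95, 2.5), (92, 3.2)]
labels = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

scaled, scale = min_max_scale(features)
prediction = knn_predict(scaled, labels, scale((93, 3.1)))
```

Scaling matters for KNN because the distance is otherwise dominated by whichever feature has the largest numeric range, such as population next to a happiness score.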

📌 Conclusion

  • ✅ KNN achieved the highest accuracy of 97%.
  • ✅ Random Forest achieved 93.9%.
  • ✅ Key predictive factors: GDP, Population Size, Happiness Index.
  • ✅ Medium-risk classification remains challenging and needs further optimization.
  • ✅ Future work includes feature engineering, additional data sources, and real-time prediction models.
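
Overall accuracy can hide weak performance on a single class, which is one way a difficulty with the medium-risk group would show up. The sketch below computes accuracy and per-class recall side by side. The labels and predictions are made up for illustration and are not the project's results; the class names "low", "medium", and "high" are assumed.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each class, the fraction of its true members that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels: one of the two medium-risk cases is misclassified as low.
y_true = ["low"] * 4 + ["medium"] * 2 + ["high"] * 4
y_pred = ["low"] * 4 + ["low", "medium"] + ["high"] * 4

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

In this toy case the overall accuracy is 0.9 while recall for the medium class is only 0.5, so reporting per-class metrics alongside accuracy gives a clearer picture.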

🚀 Future Scope

  • Improve model accuracy on the medium-risk group.
  • Add behavioral and psychological data.
  • Build a real-time prediction dashboard for public health monitoring.

📞 Contact

Atharv Kadam
📧 atharva895@gmail.com