
🔬 SciPyMasterPro: A hands-on, modular project to master SciPy for statistics, optimization, linear algebra, curve fitting, and simulations. Includes 10+ Jupyter notebooks, an interactive Streamlit app, synthetic datasets, reusable utility functions, Dockerized setup, and cheatsheets for fast recall, portfolio building, and interview prep.


SatvikPraveen/ScipyMasterPro

🧠 SciPyMasterPro

License: GPL v3 | Python | Notebooks | SciPy Focused | Streamlit | Synthetic Data | Portfolio Ready


🎯 Project Goal

SciPyMasterPro is a hands-on, deep-dive project built to master the complete range of functionality offered by SciPy. It emphasizes numerical computing, distribution fitting, hypothesis testing, optimization, and simulation through clean, synthetic data.

This project helps you build deep fluency with scipy.stats, scipy.optimize, scipy.interpolate, and scipy.linalg, along with supporting libraries.


🚀 Key Features

✅ 10 concept-driven Jupyter notebooks
✅ Interactive Streamlit web application for live statistical exploration
✅ All statistical logic done with pure SciPy (no heavy reliance on statsmodels)
✅ Modular utility functions for resampling, optimization, diagnostics
✅ Synthetic data generator for reproducible, controlled experiments
✅ Shared notebooks comparing SciPy vs Statsmodels
✅ Markdown cheatsheet and mastery checklist for fast recall
✅ Docker-ready for seamless environment setup (Jupyter + Streamlit in one container)
✅ Perfect for interview prep, portfolio building, and teaching use cases


🌱 Why Synthetic?

This project uses synthetic datasets to:

  • ✨ Focus on concepts, not domain-specific noise
  • 🔁 Enable repeatable simulation and inference
  • 🧪 Make assumption validation crystal clear
  • 📐 Generate precise shapes and edge cases needed for testing
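
As a minimal sketch of the repeatability point above, the snippet below draws a skewed sample from a seeded generator. The helper name make_skewed_sample and its parameters are assumptions made for illustration; they are not the project's actual generator API.

```python
import numpy as np
from scipy import stats


def make_skewed_sample(n=500, skew=4.0, seed=42):
    """Draw n points from a skew-normal distribution using a fixed seed.

    Hypothetical helper for illustration only; not taken from the project.
    """
    rng = np.random.default_rng(seed)
    return stats.skewnorm.rvs(a=skew, loc=0.0, scale=1.0, size=n, random_state=rng)


a = make_skewed_sample(seed=42)
b = make_skewed_sample(seed=42)
c = make_skewed_sample(seed=7)

# Same seed: identical draws, so every downstream result can be reproduced.
print("same seed, same data:", np.array_equal(a, b))
# Different seed: a fresh sample from the same known distribution.
print("different seed, same data:", np.array_equal(a, c))
# The generating process is known, so the sample can be checked against it.
print("sample skewness:", stats.skew(a))
```

Because the true distribution is chosen up front, any estimate computed later (skewness, fitted parameters, test decisions) can be compared with a known answer.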

🧱 Project Structure

SciPyMasterPro/
├── notebooks/               # Core concept notebooks (distribution fitting, optimization, etc.)
├── shared_notebooks/        # Comparison notebooks with statsmodels (PDF/ECDF, power analysis)
├── streamlit_app/           # Interactive Streamlit web app (hypothesis tests, inference tools)
├── synthetic_data/          # Scripts + outputs for synthetic datasets
├── utils/                   # Reusable code: bootstrapping, fitting, plotting, diagnostics
├── cheatsheets/             # Markdown cheatsheet + mastery checklist
├── exports/                 # All plots and tabular results from notebooks and app
│   ├── plots/
│   └── tables/
├── requirements.txt         # Main dependencies
├── requirements_dev.txt     # Full development environment
├── Dockerfile               # Docker environment for Jupyter + Streamlit
├── README.md                # This file

📘 Notebook Modules

| Notebook | Conceptual Focus |
|---|---|
| 01_descriptive_stats | Moments, trimmed stats, robust summaries |
| 02_hypothesis_tests | Parametric and nonparametric tests, assumption checks |
| 03_distribution_fitting | .fit(), .pdf(), .cdf(), MLE |
| 04_sampling_resampling | Stratified sampling, rv_discrete, Dirichlet, multinomial |
| 05_bootstrap_simulation | Manual bootstrapping, CI, distribution shape checking |
| 06_multivariate_analysis | Mahalanobis, covariance, chi², permutation tests |
| 07_optimization_minimization | Minimize functions, constraints, real-world losses |
| 08_linear_algebra_stats | SVD, eigen, least squares, matrix ops |
| 09_interpolation_curvefitting | Splines, interpolators, curve_fit() |
| 10_inference_from_raw | Inference from summary stats, sem(), interval estimation |
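
To give a flavor of the distribution-fitting topic (.fit(), .pdf(), .cdf(), MLE), here is a small sketch that fits a gamma distribution by maximum likelihood. The true parameters, sample size, and seed are arbitrary choices for this illustration and are not taken from the notebooks.

```python
import numpy as np
from scipy import stats

# Synthetic gamma data with known parameters (values chosen for illustration).
rng = np.random.default_rng(0)
true_shape, true_scale = 2.0, 3.0
data = stats.gamma.rvs(a=true_shape, scale=true_scale, size=2000, random_state=rng)

# Maximum likelihood fit; floc=0 pins the location so that only the
# shape and scale are estimated.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)
print("estimated shape and scale:", shape_hat, scale_hat)

# Freeze the fitted distribution and evaluate its PDF and CDF on a small grid.
fitted = stats.gamma(a=shape_hat, loc=loc_hat, scale=scale_hat)
grid = np.linspace(data.min(), data.max(), 5)
pdf_values = fitted.pdf(grid)
cdf_values = fitted.cdf(grid)
print("pdf on grid:", pdf_values)
print("cdf on grid:", cdf_values)

# Kolmogorov-Smirnov check against the fitted CDF. The p-value is optimistic
# here because the parameters were estimated from the same data.
ks_result = stats.kstest(data, fitted.cdf)
print("KS statistic:", ks_result.statistic)
```

With synthetic data the estimates can be compared directly with true_shape and true_scale, which is the main reason the fit is easy to validate.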

🔁 Shared Notebooks with Statsmodels

| Notebook | Topics Compared |
|---|---|
| shared_pdf_ecdf.ipynb | ECDF, fitted PDFs, visual fit quality |
| shared_statistical_power.ipynb | Manual power analysis using SciPy vs statsmodels |
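
One way a manual power analysis can be done with SciPy alone is through the noncentral t distribution. The sketch below assumes a two-sided, equal-variance two-sample t-test with equal group sizes; the function name two_sample_t_power is hypothetical and may not match what the shared notebook does.

```python
import numpy as np
from scipy import stats


def two_sample_t_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, equal-n, equal-variance two-sample t-test.

    effect_size is Cohen's d. Hypothetical helper written for illustration.
    """
    df = 2 * n_per_group - 2
    # Noncentrality parameter of the test statistic under the alternative.
    ncp = effect_size * np.sqrt(n_per_group / 2.0)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    # Probability that |T| exceeds the critical value when T is noncentral t.
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)


for n in (20, 64, 100):
    print(n, two_sample_t_power(0.5, n))
```

For comparison, statsmodels exposes the same kind of calculation through TTestIndPower().power(effect_size=..., nobs1=..., alpha=...) in statsmodels.stats.power, so the two results can be checked against each other.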

🧬 Synthetic Data Preview

| Dataset Source | Use Case |
|---|---|
| generate_normal_skewed() | Skew/kurtosis comparison and descriptive stats |
| generate_mixed_distributions() | Distribution fitting & tail analysis |
| generate_multivariate_gaussian() | Mahalanobis distance, PCA |
| generate_sample_for_optimization() | Optimization curve, cost function |
| generate_noisy_curve_fitting_data() | Model calibration and smoothing |
| generate_poisson_data() | Discrete probability testing |
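
The signatures of the generators above are not shown here, so the following sketch does not call them. It only illustrates, under assumed parameters (mean, covariance, sample size, seed), how multivariate Gaussian data can feed a Mahalanobis-distance analysis.

```python
import numpy as np
from scipy import linalg, stats

# Assumed parameters for illustration; the project's generator may differ.
rng = np.random.default_rng(1)
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
X = stats.multivariate_normal.rvs(mean=mean, cov=cov, size=1000, random_state=rng)

# Squared Mahalanobis distance of every row from the sample mean,
# using the inverse of the sample covariance matrix.
center = X.mean(axis=0)
cov_inv = linalg.inv(np.cov(X, rowvar=False))
diff = X - center
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# For Gaussian data the squared distances are approximately chi-square
# distributed with p degrees of freedom (p = number of columns).
cutoff = stats.chi2.ppf(0.975, df=X.shape[1])
outlier_fraction = np.mean(d2 > cutoff)
print("fraction beyond the 97.5% chi-square cutoff:", outlier_fraction)
```

The chi-square cutoff gives a simple rule for flagging unusual points, and with synthetic Gaussian data the flagged fraction should land near the nominal 2.5%.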

📊 Exports Example

exports/
├── plots/
│   ├── ecdf_vs_pdf.png
│   ├── bootstrap_distribution.png
│   └── optimization_convergence.png
├── tables/
│   ├── fitted_parameters_gamma.csv
│   ├── mahalanobis_distances.csv
│   └── power_curve_results.csv

✅ Cheatsheet & Mastery Checklist

📁 cheatsheets/ includes:

  • scipy_cheatsheet.md → syntax, use cases, formulas

🛠 Utilities in utils/

  • stats_tests_utils.py → Wrapper for t-tests, chi², normality tests, rank-based methods
  • distribution_utils.py → Fit, sample, evaluate PDFs/CDFs for multiple distributions
  • sim_utils.py → Bootstrap, permutation tests, resampling utilities
  • viz_utils.py → ECDF, diagnostic plots, confidence bands, linear algebra plots
  • inference_utils.py → Compute SEM, confidence intervals, t-tests from summary stats
  • linear_algebra_utils.py → Matrix generation, eigen decomposition, SVD, least squares solutions
  • optimization_utils.py → Solve constrained and unconstrained optimization problems
  • pdf_ecdf_utils.py → Manual ECDF computation, PDF–ECDF overlays, fit quality visualization
  • power_utils.py → Statistical power analysis, effect size estimation, sample size planning
  • interpolation_utils.py → Curve fitting, splines, polynomial interpolation
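
As a rough idea of what a resampling utility of this kind might look like, here is a percentile bootstrap confidence interval next to a t-interval built from sem(). The function name bootstrap_ci and its parameters are assumptions for this sketch and are not the actual contents of sim_utils.py or inference_utils.py.

```python
import numpy as np
from scipy import stats


def bootstrap_ci(sample, statistic=np.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a one-sample statistic.

    Hypothetical helper written for illustration.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    # Each row of idx is one resample of the data, drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_stats = np.apply_along_axis(statistic, 1, sample[idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(boot_stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper


# Skewed synthetic data (exponential with scale 2.0, chosen for illustration).
data = stats.expon.rvs(scale=2.0, size=200, random_state=np.random.default_rng(123))

boot_lo, boot_hi = bootstrap_ci(data)
# Classical t-interval for the mean, based on the standard error of the mean.
t_lo, t_hi = stats.t.interval(0.95, df=data.size - 1, loc=data.mean(), scale=stats.sem(data))

print("bootstrap CI:", boot_lo, boot_hi)
print("t interval:  ", t_lo, t_hi)
```

SciPy 1.7 and later also ship scipy.stats.bootstrap, which covers the same ground with more interval methods; the manual version is mainly useful for seeing each step.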

All results are exported to exports/ automatically, tagged with a timestamp or version.


📦 Installation Instructions

# Clone repo
git clone https://github.com/SatvikPraveen/SciPyMasterPro.git
cd SciPyMasterPro

# Create virtualenv
python3 -m venv scipy_env
source scipy_env/bin/activate

# Install dependencies
pip install -r requirements.txt

🐳 Docker Setup

Build Docker image:

docker build -t scipy-masterpro .

Run Streamlit app:

docker run -p 8501:8501 scipy-masterpro

Run JupyterLab:

docker run -p 8888:8888 scipy-masterpro

Run both Streamlit + Jupyter in background:

docker run -d -p 8501:8501 -p 8888:8888 scipy-masterpro

💼 Portfolio Impact

This project was designed to:

  • ✅ Fill gaps from statsmodels and NumPy
  • ✅ Build working fluency with SciPy's major submodules
  • ✅ Provide clean synthetic demonstrations of core stats ideas
  • ✅ Enable faster recall via organized notebooks, exports, and cheatsheets
  • ✅ Become your go-to resource for reviewing stats & optimization in interviews

📜 License

This project is licensed under the GNU General Public License v3.0.

You are free to use, study, share, and modify this project under the terms of the GPLv3. Contributions are welcome and must also be licensed under GPLv3.


🙌 Acknowledgements

Thanks to the contributors of the SciPy ecosystem, especially the authors behind scipy.stats, scipy.optimize, and scipy.linalg, for making scientific computing accessible and extensible in Python.


🔗 Related Projects

