SciPyMasterPro is a hands-on, deep-dive project built to master the wide range of functionality offered by SciPy. It emphasizes numerical computing, distribution fitting, hypothesis testing, optimization, and simulation, all demonstrated on clean synthetic data.
This project helps you build deep fluency with `scipy.stats`, `scipy.optimize`, `scipy.interpolate`, and `scipy.linalg`, as well as supporting libraries.
- 10 concept-driven Jupyter notebooks
- Interactive Streamlit web application for live statistical exploration
- All statistical logic implemented in pure SciPy (no heavy reliance on statsmodels)
- Modular utility functions for resampling, optimization, and diagnostics
- Synthetic data generator for reproducible, controlled experiments
- Shared notebooks comparing SciPy and statsmodels
- Markdown cheatsheet and mastery checklist for fast recall
- Docker-ready for seamless environment setup (Jupyter + Streamlit in one container)
- Well suited to interview prep, portfolio building, and teaching
This project uses synthetic datasets to:
- Focus on concepts, not domain-specific noise
- Enable repeatable simulation and inference
- Make assumption validation crystal clear
- Generate the precise shapes and edge cases needed for testing
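To make the "repeatable simulation" point concrete, here is a minimal sketch assuming NumPy's `default_rng` seeding. `make_samples` is a hypothetical helper for illustration, not one of the project's own generators. Seeding one generator and passing it to each `rvs()` call makes the data identical across runs, while the chosen distributions fix the shapes (a symmetric normal sample versus a right-skewed `skewnorm` sample).

```python
import numpy as np
from scipy import stats

def make_samples(n=5_000, seed=42):
    """Hypothetical helper: one symmetric and one right-skewed sample from a single seed."""
    rng = np.random.default_rng(seed)
    normal = stats.norm.rvs(loc=0.0, scale=1.0, size=n, random_state=rng)
    skewed = stats.skewnorm.rvs(a=8.0, size=n, random_state=rng)
    return normal, skewed

# Same seed -> identical arrays on every run
normal_a, skewed_a = make_samples(seed=42)
normal_b, skewed_b = make_samples(seed=42)
print("reproducible:", np.array_equal(normal_a, normal_b) and np.array_equal(skewed_a, skewed_b))

# Shapes are controlled by construction: near-zero skew vs. clearly positive skew
for label, sample in [("normal", normal_a), ("skewed", skewed_a)]:
    print(f"{label}: skew={stats.skew(sample):.3f}, excess kurtosis={stats.kurtosis(sample):.3f}")
```

Changing only the seed gives a fresh but equally controlled dataset, which is what makes edge-case tests repeatable.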
SciPyMasterPro/
├── notebooks/ # Core concept notebooks (distribution fitting, optimization, etc.)
├── shared_notebooks/ # Comparison notebooks with statsmodels (PDF/ECDF, power analysis)
├── streamlit_app/ # Interactive Streamlit web app (hypothesis tests, inference tools)
├── synthetic_data/ # Scripts + outputs for synthetic datasets
├── utils/ # Reusable code: bootstrapping, fitting, plotting, diagnostics
├── cheatsheets/ # Markdown cheatsheet + mastery checklist
├── exports/ # All plots and tabular results from notebooks and app
│   ├── plots/
│   └── tables/
├── requirements.txt # Main dependencies
├── requirements_dev.txt # Full development environment
├── Dockerfile # Docker environment for Jupyter + Streamlit
└── README.md # This file
| Notebook | Conceptual Focus |
|---|---|
| 01_descriptive_stats | Moments, trimmed stats, robust summaries |
| 02_hypothesis_tests | Parametric and nonparametric tests, assumption checks |
| 03_distribution_fitting | `.fit()`, `.pdf()`, `.cdf()`, MLE |
| 04_sampling_resampling | Stratified sampling, `rv_discrete`, Dirichlet, multinomial |
| 05_bootstrap_simulation | Manual bootstrapping, CI, distribution shape checking |
| 06_multivariate_analysis | Mahalanobis, covariance, chi², permutation tests |
| 07_optimization_minimization | Minimize functions, constraints, real-world losses |
| 08_linear_algebra_stats | SVD, eigen, least squares, matrix ops |
| 09_interpolation_curvefitting | Splines, interpolators, `curve_fit()` |
| 10_inference_from_raw | Inference from summary stats, `sem()`, interval estimation |
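The `.fit()` / `.pdf()` / `.cdf()` workflow listed for 03_distribution_fitting can be sketched as follows. This is an illustrative example, not the notebook's code: it draws gamma data with known parameters, recovers them by maximum likelihood with `floc=0` (assuming the location is known to be zero), and checks the fit with a Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_shape, true_scale = 2.5, 1.8
data = stats.gamma.rvs(a=true_shape, scale=true_scale, size=4_000, random_state=rng)

# Maximum-likelihood fit; floc=0 pins the location so only shape and scale are estimated
shape_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0)

# Freeze the fitted model and evaluate .pdf() / .cdf() on a grid (e.g. for overlays)
fitted = stats.gamma(shape_hat, loc=loc_hat, scale=scale_hat)
x = np.linspace(0, data.max(), 200)
pdf_vals = fitted.pdf(x)
cdf_vals = fitted.cdf(x)

# Kolmogorov-Smirnov distance between the data and the fitted CDF
ks = stats.kstest(data, fitted.cdf)
print(f"shape={shape_hat:.3f}, scale={scale_hat:.3f}, KS D={ks.statistic:.4f}, p={ks.pvalue:.3f}")
```

Because the parameters are estimated from the same sample, the KS p-value here is optimistic; a parametric bootstrap of the KS statistic is the usual remedy when a calibrated p-value matters.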
| Notebook | Topics Compared |
|---|---|
| shared_pdf_ecdf.ipynb | ECDF, fitted PDFs, visual fit quality |
| shared_statistical_power.ipynb | Manual power analysis using SciPy vs statsmodels |
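For a manual power analysis, one common SciPy-only approach (assumed here, not taken from `shared_statistical_power.ipynb`) uses the noncentral t distribution. For a two-sided, equal-variance two-sample t-test with n observations per group and effect size d (Cohen's d), the test statistic under the alternative follows `stats.nct` with df = 2n - 2 and noncentrality d·√(n/2). The function name below is hypothetical.

```python
import numpy as np
from scipy import stats

def two_sample_t_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, equal-variance two-sample t-test for Cohen's d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2)          # noncentrality parameter under H1
    t_crit = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value under H0
    # P(|T| > t_crit) when T follows a noncentral t distribution
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

power = two_sample_t_power(0.5, 64)
print(f"power for d=0.5, n=64 per group: {power:.3f}")
```

For d = 0.5 and 64 per group this should land near the textbook 80% power, which makes statsmodels' `TTestIndPower().power()` a convenient cross-check.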
| Dataset Source | Use Case |
|---|---|
| generate_normal_skewed() | Skew/kurtosis comparison and descriptive stats |
| generate_mixed_distributions() | Distribution fitting & tail analysis |
| generate_multivariate_gaussian() | Mahalanobis distance, PCA |
| generate_sample_for_optimization() | Optimization curve, cost function |
| generate_noisy_curve_fitting_data() | Model calibration and smoothing |
| generate_poisson_data() | Discrete probability testing |
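As an illustration of the model-calibration use case, the sketch below builds noisy data from a known exponential-decay model and recovers its parameters with `scipy.optimize.curve_fit`. `make_noisy_decay` is a hypothetical stand-in, not the project's `generate_noisy_curve_fitting_data()`, whose signature is not documented here.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(x, amplitude, rate, offset):
    """Model to calibrate: amplitude * exp(-rate * x) + offset."""
    return amplitude * np.exp(-rate * x) + offset

def make_noisy_decay(n=200, noise_sd=0.05, seed=7):
    """Hypothetical generator: known decay curve plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 5, n)
    y = exp_decay(x, 2.0, 1.3, 0.5) + rng.normal(0.0, noise_sd, size=n)
    return x, y

x, y = make_noisy_decay()
# p0 gives the optimizer a starting point; pcov estimates the parameter covariance
popt, pcov = curve_fit(exp_decay, x, y, p0=[1.0, 1.0, 0.0])
perr = np.sqrt(np.diag(pcov))  # 1-sigma standard errors
for name, value, err in zip(["amplitude", "rate", "offset"], popt, perr):
    print(f"{name}: {value:.3f} ± {err:.3f}")
```

Because the true parameters (2.0, 1.3, 0.5) are known, the synthetic setup lets you check directly whether the fitted values and their standard errors are plausible.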
exports/
├── plots/
│   ├── ecdf_vs_pdf.png
│   ├── bootstrap_distribution.png
│   └── optimization_convergence.png
└── tables/
    ├── fitted_parameters_gamma.csv
    ├── mahalanobis_distances.csv
    └── power_curve_results.csv
`cheatsheets/` includes:
- `scipy_cheatsheet.md`: syntax, use cases, formulas
`utils/` includes:
- `stats_tests_utils.py`: wrappers for t-tests, chi², normality tests, and rank-based methods
- `distribution_utils.py`: fit, sample, and evaluate PDFs/CDFs for multiple distributions
- `sim_utils.py`: bootstrap, permutation tests, and resampling utilities
- `viz_utils.py`: ECDF, diagnostic plots, confidence bands, and linear algebra plots
- `inference_utils.py`: compute SEM, confidence intervals, and t-tests from summary stats
- `linear_algebra_utils.py`: matrix generation, eigendecomposition, SVD, and least-squares solutions
- `optimization_utils.py`: solve constrained and unconstrained optimization problems
- `pdf_ecdf_utils.py`: manual ECDF computation, PDF/ECDF overlays, and fit-quality visualization
- `power_utils.py`: statistical power analysis, effect size estimation, and sample size planning
- `interpolation_utils.py`: curve fitting, splines, and polynomial interpolation
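A resampling helper of the kind described for `sim_utils.py` might look like the following percentile-bootstrap sketch. The function name and defaults are hypothetical, not the module's actual code. SciPy also ships `scipy.stats.bootstrap` (percentile, basic, and BCa intervals), which is worth comparing against a manual version.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_resamples=5_000, confidence=0.95, seed=0):
    """Percentile bootstrap CI; `stat` must accept an `axis` argument (e.g. np.mean, np.median)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row of idx indexes one resample, drawn with replacement, of the original size
    idx = rng.integers(0, data.size, size=(n_resamples, data.size))
    boot_stats = stat(data[idx], axis=1)
    alpha = 1 - confidence
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi, boot_stats

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=300)
lo, hi, boot_means = bootstrap_ci(sample)
print(f"95% percentile CI for the mean: ({lo:.3f}, {hi:.3f})")
```

The returned `boot_stats` array is also what you would plot to check the shape of the bootstrap distribution.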
All results are exported automatically to `exports/`, with timestamps for version tracking.
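One way to implement timestamped exports is sketched below using only the standard library. It assumes a plain CSV layout and a `name_YYYYMMDD_HHMMSS.csv` naming scheme; both are assumptions, and `export_table` is a hypothetical name rather than the project's helper.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

def export_table(rows, name, out_dir="exports/tables"):
    """Write a list of dicts to a timestamped CSV (hypothetical helper) and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out / f"{name}_{stamp}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

# Demo in a temporary directory so nothing is written to the real exports/ folder
with tempfile.TemporaryDirectory() as tmp:
    demo_path = export_table([{"statistic": "example", "value": 1.0}], "demo_table", out_dir=tmp)
    demo_text = demo_path.read_text()
print(demo_path.name)
```

Embedding the timestamp in the filename keeps earlier runs instead of overwriting them.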
# Clone repo
git clone https://github.com/SatvikPraveen/SciPyMasterPro.git
cd SciPyMasterPro
# Create virtualenv
python3 -m venv scipy_env
source scipy_env/bin/activate
# Install dependencies
pip install -r requirements.txt
Build Docker image:
docker build -t scipy-masterpro .
Run Streamlit app:
docker run -p 8501:8501 scipy-masterpro
Run JupyterLab:
docker run -p 8888:8888 scipy-masterpro
Run both Streamlit + Jupyter in background:
docker run -d -p 8501:8501 -p 8888:8888 scipy-masterpro
This project was designed to:
- Fill gaps left by statsmodels and NumPy
- Build working fluency with SciPy's major submodules
- Provide clean synthetic demonstrations of core stats ideas
- Enable faster recall via organized notebooks, exports, and cheatsheets
- Become your go-to resource for reviewing stats and optimization in interviews
This project is licensed under the GNU General Public License v3.0.
You are free to use, study, share, and modify this project under the terms of the GPLv3. Contributions are welcome and must also be licensed under GPLv3.
Thanks to the contributors of the SciPy ecosystem, especially the authors behind `scipy.stats`, `scipy.optimize`, and `scipy.linalg`, for making scientific computing accessible and extensible in Python.
- PandasPlayground: data manipulation workflows with pandas
- NumPyMasterPro: deep dive into vectorization and broadcasting
- StatsmodelsMasterPro: modeling & inference with statsmodels
- SeabornMasterPro: statistical plotting with Seaborn
- PlotlyVizPro: interactive dashboards with Plotly