This repository contains notebooks and functions for running cluster analysis on biomass products. The data can be clustered by either location or product. To run the analysis, see the template files for a walkthrough, or the previously completed analyses for further functionality.
The repository includes the following files and directories:
- `By_Prod\`: contains files used in an initial clustering attempt by locations
- `By_Loc\`: contains files used in an initial clustering attempt by products
- `clustering_explorations`: contains notebooks with clustering results for various locations, as well as a simulation exercise
- `data`: contains all the datasets used for the clustering
- `Prod_Clustering_Template.ipynb`: template demonstrating how to run and visualize the clustering
- `Clusters_by_n.ipynb`: exploration of different parameters in the clustering algorithms to find appropriate values
- `similarity_metric_template.ipynb`: template for finding product similarity using Hamming distance
- `utils.py`: contains functions used for data preparation, clustering, and analysis
- `utils_char_analysis.py`: contains functions used for computing product similarity metrics
- `README.md`: this file, providing an overview of the repository and instructions for running the code
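For readers unfamiliar with the Hamming distance mentioned for the similarity template, here is a minimal sketch of the idea. It is not the code in `utils_char_analysis.py`. The product names and binary characteristic vectors are hypothetical, and the distance is normalized to the fraction of differing positions, which is the same convention `scipy.spatial.distance.hamming` uses.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Return the fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical binary characteristic vectors (1 = product has the characteristic).
products = {
    "product_a": [1, 0, 1, 1],
    "product_b": [1, 1, 1, 0],
    "product_c": [0, 0, 0, 0],
}

# Pairwise distances between every pair of products.
pairwise = {
    (p, q): hamming_distance(products[p], products[q])
    for p, q in combinations(products, 2)
}

for pair, dist in pairwise.items():
    print(pair, dist)
```

A smaller value means the two products share more characteristics; a value of 0 means their vectors are identical.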
The code requires the following dependencies:
- Python 3.x
- NumPy
- scikit-learn (`sklearn`)
- Pandas
- seaborn
- matplotlib
- scipy
- Functions imported from `utils.py` and `utils_char_analysis.py` (included in this repository)
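
One quick way to confirm that the third-party packages are available before opening the notebooks is sketched below. This helper is not part of the repository, and the list of import names is an assumption based on the dependencies listed in this README.

```python
from importlib.util import find_spec

# Import names assumed from the dependency list (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "sklearn", "pandas", "seaborn", "matplotlib", "scipy"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```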