Unlike other assignments, this project is best formulated as a story. Included in the project folder are the following:
--the data used (the dataset is small, but the code is scalable to larger projects)
--an ipynb notebook that displays, describes, and models the data
--a pdf detailing the data, project design, and results
Make sure pyspark and numpy are installed where you plan to clone the project, and also Seaborn if you wish to view the data visually.
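One way to confirm the dependencies above before opening the notebook is a short Python check. This is only a sketch: the helper name `missing_packages` is made up for illustration, and it assumes the packages are importable under the names `pyspark`, `numpy`, and `seaborn`.

```python
import importlib.util

# Packages named in this README; seaborn is only needed for the plots.
REQUIRED = ["pyspark", "numpy"]
OPTIONAL = ["seaborn"]


def missing_packages(names):
    """Return the names that cannot be found by the current Python interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing_required = missing_packages(REQUIRED)
missing_optional = missing_packages(OPTIONAL)

if missing_required:
    print("Missing required packages:", ", ".join(missing_required))
else:
    print("All required packages found.")

if missing_optional:
    print("Missing optional packages (plots only):", ", ".join(missing_optional))
```

Run it with the same Python interpreter that will run the notebook, since packages installed in a different environment will not be detected.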
- Clone the project
- cd into the project folder
- Run pyspark to open the notebook.
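By default the `pyspark` command starts an interactive shell rather than a notebook. A common way to make it open Jupyter is to set the `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` environment variables first. The sketch below assumes that setup, and assumes Jupyter is installed and `pyspark` is on your PATH; the function names `notebook_env` and `launch_notebook` are hypothetical and not part of the project.

```python
import os
import subprocess


def notebook_env(base=None):
    """Copy an environment and point the PySpark driver at Jupyter Notebook."""
    env = dict(os.environ if base is None else base)
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"        # assumption: Jupyter is installed
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"  # start the notebook server
    return env


def launch_notebook():
    """Run `pyspark` from the current directory; blocks until Jupyter exits."""
    return subprocess.run(["pyspark"], env=notebook_env(), check=True)
```

Calling `launch_notebook()` from the project folder should start Jupyter there, after which the ipynb file can be opened from the browser. If your environment is already configured to open notebooks through `pyspark`, this step is unnecessary.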