Project Title: Investigation of Methods for Resolving Statistical Noise and Understanding Correlation Structure in High-Dimensional Data
This project, conducted as part of the ELEN90094 – Large Data Methods & Applications course at The University of Melbourne, investigates methods for resolving statistical noise and understanding correlation structures in high-dimensional data, with a focus on financial stock return data. The analysis is divided into two stages:
- Stage 1: An in-depth study of data analysis methods for identifying true correlation information in complex, high-dimensional datasets. This stage involves reviewing and critiquing the approaches presented in the research paper by V. Plerou et al. (Physical Review E, 2002), specifically methods for investigating correlation patterns in financial data.
- Stage 2: Application of the methods studied in Stage 1, along with additional methods covered in the course, to real-world financial stock-return datasets from periods before and after the emergence of COVID-19. The analysis aims to distinguish true correlations from noise, quantify the statistical noise, and reveal structured correlation patterns (see the sketch after this list).
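To give a flavour of this kind of noise/signal separation, the sketch below compares the eigenvalue spectrum of an empirical correlation matrix against the Marchenko-Pastur bounds from random matrix theory, in the spirit of the Plerou et al. approach. It is a minimal illustration only: the synthetic data, dimensions, and variable names are placeholders and are not taken from the project code.

```python
import numpy as np

# Illustrative dimensions: N stocks observed over T days (placeholders, not the project's data).
N, T = 98, 750
rng = np.random.default_rng(0)
returns = rng.standard_normal((T, N))          # synthetic i.i.d. "log-returns"

# Standardise each stock's return series, then form the empirical correlation matrix.
z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
C = (z.T @ z) / T

# Marchenko-Pastur bounds for a purely random correlation matrix with Q = T/N.
Q = T / N
lam_min = (1 - np.sqrt(1 / Q)) ** 2
lam_max = (1 + np.sqrt(1 / Q)) ** 2

eigvals = np.linalg.eigvalsh(C)
signal = eigvals[eigvals > lam_max]            # eigenvalues above the noise band
print(f"MP band: [{lam_min:.3f}, {lam_max:.3f}]")
print(f"{signal.size} eigenvalue(s) exceed the upper bound (potential true correlation structure)")
```

Eigenvalues falling inside the Marchenko-Pastur band are consistent with pure noise; eigenvalues well above the upper bound point to genuine correlation structure, such as a market-wide mode or sector groupings.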
The project produces a comprehensive report and accompanying code documenting the methods, experiments, and results. The results provide insights into the dependencies and correlation structures in financial data, and the project also examines the stability and properties of the estimated correlation matrices.
Key Components:
- Literature review of high-dimensional data analysis methods.
- Application and interpretation of these methods on real-world financial data.
- Documentation of numerical experiments and their results.
- Insights into noise resolution and correlation structures in complex datasets.
The full report is available in Final report_GoupA.pdf.
The R code is stored in the file "Formatted code.Rmd", an R Notebook. It is divided into two sections: Plots/Results and Functions. Run the Functions section first and the Plots/Results section second; this order keeps the IDE uncluttered.
To set up and run the Python analysis pipeline:
- Open data_analysis_pipeline.ipynb.
- Select Kernel.
- Select Python Environments.
- Select Create Python Environment.
- Choose Venv -> Python 3.12.7.
- Follow the on-screen prompts; when asked to select the requirements file, choose requirements.txt and install all the required packages.
The project requires Python 3.12.7 and the following Python packages (a quick import check is shown after the list):
pandas
numpy
matplotlib
scipy
scikit-learn
jupyter
notebook
ipykernel
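As a quick sanity check after creating the environment, the snippet below tries to import each package listed above and prints its version. It is only a convenience check, not part of the project's pipeline; note that scikit-learn is imported under the name sklearn.

```python
import importlib

# Packages listed above; scikit-learn is imported under the name "sklearn".
required = ["pandas", "numpy", "matplotlib", "scipy", "sklearn",
            "jupyter", "notebook", "ipykernel"]

for name in required:
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", "version unknown")
        print(f"{name}: OK ({version})")
    except ImportError:
        print(f"{name}: MISSING - install it with pip")
```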
If you need to manually install additional packages, you can do so using:
pip install <package-name>
The project uses the following datasets:
- Stock_metadata.csv: Metadata table for the stocks.
- Data_PreCovid_20170101_20200109.csv: Daily log-returns for each stock during the pre-COVID period (2017-01-01 to 2020-01-09).
- Data_PostCovid_20200110_20221231.csv: Daily log-returns for each stock during the post-COVID period (2020-01-10 to 2022-12-31).
Each dataset contains information for 98 stocks from the US market.
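A minimal sketch of loading the datasets with pandas is shown below. It assumes the CSV files sit in the working directory and can be read directly with pandas.read_csv; the exact column layout (for example, whether a date column is present) is an assumption, so treat it as a starting point rather than the project's actual loading code.

```python
import pandas as pd

# Assumed layout: one row per trading day, one column per stock (plus possibly a date column).
meta = pd.read_csv("Stock_metadata.csv")
pre = pd.read_csv("Data_PreCovid_20170101_20200109.csv")
post = pd.read_csv("Data_PostCovid_20200110_20221231.csv")

# Quick sanity checks on dimensions and content.
print(meta.shape, pre.shape, post.shape)
print(pre.head())
```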