This project replicates the data collection and analysis process described in the paper "CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software" by Guru Bhandari, Amara Naseer, and Leon Moonen (reference). CVEfixes automatically collects Common Vulnerabilities and Exposures (CVEs) and the code changes that fix them from open-source projects, producing a curated dataset for security research.
CVEfixes retrieves vulnerability records from the National Vulnerability Database (NVD), classifies each vulnerability using the Common Weakness Enumeration (CWE), and links each CVE to the commit(s) that fix it in the corresponding Free and Open Source Software (FOSS) repository. The resulting dataset supports measurement studies of vulnerability introduction, mitigation, and lifecycle in open-source software.
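To make the CVE-to-fix link concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the CVEfixes database is a SQLite file containing a `fixes` table (`cve_id`, `hash`, `repo_url`) and a `cwe_classification` table (`cve_id`, `cwe_id`). These names reflect our understanding of the CVEfixes schema and should be checked against your copy; the helpers `open_readonly` and `fixes_for_cve` are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_readonly(path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only.

    mode=ro makes a wrong path fail with OperationalError instead of
    silently creating a new, empty database file.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def fixes_for_cve(con: sqlite3.Connection, cve_id: str) -> dict:
    """Return the CWE IDs and fixing commits recorded for one CVE.

    Assumed (hypothetical) schema: fixes(cve_id, hash, repo_url) and
    cwe_classification(cve_id, cwe_id). Verify against your database.
    """
    # A CVE may be classified under more than one CWE.
    cwes = [
        row[0]
        for row in con.execute(
            "SELECT DISTINCT cwe_id FROM cwe_classification "
            "WHERE cve_id = ? ORDER BY cwe_id",
            (cve_id,),
        )
    ]
    # A CVE may also be fixed by several commits, possibly in several repositories.
    commits = con.execute(
        "SELECT DISTINCT repo_url, hash FROM fixes "
        "WHERE cve_id = ? ORDER BY repo_url, hash",
        (cve_id,),
    ).fetchall()
    return {"cve_id": cve_id, "cwes": cwes, "fixes": commits}
```

For example, `fixes_for_cve(open_readonly("CVEfixes.db"), "CVE-2021-44228")` (the file name and CVE ID are placeholders) returns the CWE IDs and `(repo_url, hash)` pairs stored for that CVE; an unknown ID yields empty lists rather than an error.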
DuckDB is an in-process SQL OLAP (Online Analytical Processing) database management system. It is designed for efficient analytical queries on large datasets and uses a columnar storage model. DuckDB can query SQLite database files directly through its `sqlite` extension and provides a Python API, which makes it a good fit for exploring the large CVEfixes dataset from Python or a Jupyter notebook.
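The following sketch shows one way to open a CVEfixes SQLite file from DuckDB and compute a simple statistic (distinct CVEs per CWE). It uses DuckDB's `sqlite` extension with `ATTACH ... (TYPE sqlite)`; the table and column names (`cwe_classification`, `cve_id`, `cwe_id`) and the helper names `open_with_duckdb` and `cves_per_cwe` are assumptions for illustration, not code taken from `cve_stats.ipynb`.

```python
def cves_per_cwe(con, limit: int = 10) -> list:
    """Count distinct CVEs per CWE category, most frequent first.

    `con` may be a DuckDB connection with the CVEfixes database attached,
    or any connection whose execute(sql).fetchall() works (e.g. sqlite3).
    Assumes a cwe_classification(cve_id, cwe_id) table.
    """
    sql = (
        "SELECT cwe_id, COUNT(DISTINCT cve_id) AS n_cves "
        "FROM cwe_classification "
        "GROUP BY cwe_id "
        "ORDER BY n_cves DESC, cwe_id "  # ties broken by CWE ID for stable output
        f"LIMIT {int(limit)}"
    )
    return con.execute(sql).fetchall()


def open_with_duckdb(db_path: str):
    """Attach a CVEfixes SQLite file to an in-memory DuckDB session."""
    import duckdb  # imported here so cves_per_cwe has no hard DuckDB dependency

    con = duckdb.connect()  # in-memory DuckDB database
    con.execute("INSTALL sqlite")  # downloads the extension on first use
    con.execute("LOAD sqlite")
    escaped = db_path.replace("'", "''")  # escape quotes for the SQL string literal
    con.execute(f"ATTACH '{escaped}' AS cvefixes (TYPE sqlite)")
    con.execute("USE cvefixes")  # resolve unqualified table names in the attached file
    return con
```

With a local copy of the database (the path is a placeholder), `cves_per_cwe(open_with_duckdb("CVEfixes.db"), limit=5)` returns `(cwe_id, n_cves)` tuples. Keeping the query in plain SQL that both engines accept means the same helper can also run against a `sqlite3` connection, which is handy for quick checks without DuckDB installed.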
- `cve_stats.ipynb`: Jupyter notebook for data analysis and table reproduction using DuckDB on a CVEfixes database collected on December 6, 2023.
- Python 3.8+
- DuckDB (`pip install duckdb`)
- CVEfixes database
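Before opening the notebook, a quick check along these lines can confirm that the requirements are met. This is a hypothetical helper, not part of the project; the database path you pass in is whatever location you stored your CVEfixes copy at.

```python
import importlib.util
import sys
from pathlib import Path


def check_requirements(db_path: str) -> list:
    """Return a list of unmet requirements; an empty list means ready to go."""
    problems = []
    # Python 3.8+ is required.
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    # find_spec checks whether duckdb is importable without importing it.
    if importlib.util.find_spec("duckdb") is None:
        problems.append("DuckDB not installed: run `pip install duckdb`")
    # The CVEfixes database must be available locally.
    if not Path(db_path).is_file():
        problems.append(f"CVEfixes database not found at {db_path}")
    return problems
```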