This project explores video game sales data through data preprocessing, exploratory data analysis (EDA), and unsupervised learning techniques. The primary goal is to uncover insights into sales trends and classify video games based on their market performance.
- Data Preprocessing: Handle missing values, remove duplicates, and ensure data integrity.
- Exploratory Data Analysis (EDA): Gain insights into sales distributions, regional trends, and top-performing games.
- Clustering with K-Means: Group video games based on sales performance to identify common characteristics.
- Dimensionality Reduction (PCA): Optimize high-dimensional data for better visualization and interpretation.
- Programming Language: Python
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Machine Learning Techniques: K-Means Clustering, Principal Component Analysis (PCA)
The dataset used in this analysis contains sales data of video games across different regions, including North America, Europe, and Japan. It includes features such as game title, genre, platform, and sales figures.
- Identification of sales trends and influential factors in video game success.
- Classification of video games based on their market performance.
- Actionable insights that can aid in strategic decision-making for game developers and publishers.
- Clone this repository.
- Install the required dependencies.
- Run the Jupyter Notebook to explore the dataset and execute the analysis.
- Expanding the dataset to include more recent sales figures.
- Incorporating additional clustering techniques for comparison.
- Enhancing visualizations for deeper insights.