An end-to-end data analysis project on Walmart sales using Python, SQL (MySQL & PostgreSQL), and Power BI. Includes data cleaning, feature engineering, complex SQL queries, and insightful dashboards to uncover trends, revenue patterns, and customer behavior for business decision-making.
This project is an end-to-end data analysis solution aimed at extracting valuable business insights from Walmart’s sales data. By integrating Python for data cleaning and feature engineering, SQL (MySQL and PostgreSQL) for querying, and Power BI for visualization, we deliver a robust data pipeline and dashboard that addresses key business questions.
It serves as an ideal case study for aspiring Data Analysts and professionals looking to improve skills in data manipulation, SQL querying, dashboard reporting, and analytical problem-solving.
- Languages: Python (v3.8+), SQL
- Databases: MySQL, PostgreSQL
- Libraries:
pandas
,numpy
,sqlalchemy
,mysql-connector-python
,psycopg2
- Visualization: Power BI
- Environment: Visual Studio Code
- APIs: Kaggle API for dataset download
- Python 3.8+
- MySQL and PostgreSQL setup
- Kaggle API access
- Required libraries installed :
pandas
,numpy
,sqlalchemy
,mysql-connector-python
,psycopg2
- Organized project structure using VS Code.
- Created dedicated folders for scripts, data, and notebooks.
- Retrieved
kaggle.json
API token from Kaggle. - Configured access via CLI to directly download datasets.
- Source: Walmart Sales Dataset.
- Downloaded into
data/
directory for easy access.
pip install pandas numpy sqlalchemy mysql-connector-python psycopg2
- Data loaded into Pandas DataFrames for initial analysis.
- Performed
.info()
,.describe()
, and.head()
checks. - Identified data types, outliers, and missing values.
- Removed duplicates.
- Handled missing values using drop/fill techniques.
- Standardized date and currency formats.
- Converted appropriate columns to
datetime
,float
, etc.
- Prepared dataset for SQL-based aggregations.
- Used
SQLAlchemy
to connect to both databases. - Automated table creation and data insertion.
- Verified data via initial SQL checks.
Follow for SQL Business Queries : Business SQL Queries
Answered key questions like:
- Which branches and product categories generate the most revenue?
- What are the top-selling products and cities?
- Which time periods have peak sales?
- What are the trends in payment methods and ratings?
Each SQL script includes:
- Problem statement
- Query logic
- Observations and conclusions
-
Visualized insights in Power BI.
-
Created dynamic reports highlighting:
- Sales by branch and product type
- Customer purchase behavior
- Time-based sales patterns
- Branches with the highest revenue identified.
- Top-performing product categories and best-selling items revealed.
- High-margin categories and cities analyzed for profitability.
- Peak shopping times, preferred payment modes, and customer satisfaction trends visualized.
Here are 3 potential future upgrades:
- Real-time Streaming Dashboard – Integrate with APIs to track live sales metrics.
- Predictive Analytics – Apply machine learning models for sales forecasting.
📦 Walmart-Revenue-Trends/
├── data/ # Raw data files
├── notebooks/ # Jupyter analysis files
├── scripts/ # Python scripts for ETL and SQL loading
├── dashboards/ # Power BI .pbix files
├── sql/ # SQL queries used for analysis
└── README.md # Project documentation (this file)