Skip to content

Yash49hdfmj/Enterprise-Retail-Analytics-using-Gold-Layer-SQL-EDA-

Repository files navigation

Data Warehouse Analysis using SQL Server

Project Summary

This project involves performing comprehensive Exploratory Data Analysis (EDA) on a simulated retail data warehouse modeled in a star schema format. Using SQL Server Management Studio (SSMS) and T-SQL, the project aims to uncover valuable business insights by querying fact and dimension tables.

The goal was to derive actionable intelligence around customer behavior, product performance, regional trends, and sales timelines — setting the stage for more advanced analytics such as BI dashboarding, forecasting, and segmentation.


Objectives

  • Understand the structure and volume of the retail data warehouse.
  • Analyze key business entities: customers, products, sales transactions, and time.
  • Generate analytical views of business magnitude and performance.
  • Apply ranking and aggregation techniques to identify top/bottom contributors.
  • Enable data-driven decision-making by providing clear metrics.

Tools & Technologies

  • SQL Server 2019
  • T-SQL
  • SSMS (SQL Server Management Studio)
  • Star Schema Modeling
  • Aggregate and Window Functions

Dataset Schema (Star Schema)

The database is designed in a star schema, featuring:

  • Fact Table: fact_sales Contains transactional records of orders with measures like sales, quantity, price, and order_date.

  • Dimension Tables:

    • dim_customers – Customer demographics, location, birthdate, etc.
    • dim_products – Product hierarchy including category, subcategory, product name, and cost.
    • Additional date-related fields (e.g., order_date, ship_date) embedded in the fact table.

1. Database Exploration

The analysis begins with exploring the structure of the database using:

SELECT * FROM INFORMATION_SCHEMA.TABLES;
SELECT * FROM INFORMATION_SCHEMA.COLUMNS;

Findings:

  • 6+ key tables
  • ~100+ columns
  • Clear star-schema relationships with proper foreign key design.

2. Dimension Analysis

Purpose: Understand the static attributes of the business (customers, products, geography).

Highlights:

  • Identified unique countries customers are located in:

    SELECT DISTINCT country FROM dim_customers;
  • Explored product hierarchy:

    • 3 main categories: Accessories, Bikes, Clothing, and Components.
    • 295 distinct products across subcategories.
  • Breakdown of products by category:

    SELECT category, COUNT(product_id) FROM dim_products GROUP BY category;

3. Date-Based Analysis

Used MIN, MAX, and DATEDIFF to analyze order timelines:

Findings:

  • First Order Date: Earliest transaction in the dataset.
  • Last Order Date: Most recent transaction.
  • Time Span Covered: Over 4 years of transactional data.
SELECT MIN(order_date), MAX(order_date), DATEDIFF(YEAR, MIN(order_date), MAX(order_date)) FROM fact_sales;

Also analyzed customer age range:

SELECT MIN(birth_date), MAX(birth_date) FROM dim_customers;
  • Oldest customer age: ~70+ years
  • Youngest customer age: ~18 years

4. Measure-Based Aggregation

Analyzed the core quantitative metrics using SQL aggregation functions:

SELECT SUM(sales), SUM(quantity), AVG(price), COUNT(DISTINCT order_number) FROM fact_sales;

Key Metrics:

  • Total Sales: ₹23M+
  • Total Quantity Sold: 400K+
  • Average Price: ₹58.3
  • Total Orders (Distinct): ~27K
  • Total Products: 295
  • Total Customers: 18K
  • Active (ordering) Customers: 100%

Generated a consolidated business summary report using UNION ALL:

SELECT 'TOTAL SALES', SUM(sales) FROM fact_sales
UNION ALL
SELECT 'TOTAL QUANTITY', SUM(quantity) ...

5. Magnitude Analysis

Purpose: Compare aggregate metrics across dimensions (e.g., revenue by category, products by gender, etc.).

Key Queries:

  • Revenue by Product Category

    SELECT category, SUM(sales) FROM fact_sales JOIN dim_products GROUP BY category;
  • Customer Count by Country/Gender

    SELECT country, COUNT(*) FROM dim_customers GROUP BY country;
  • Average Cost by Product Category

    SELECT category, AVG(product_cost) FROM dim_products GROUP BY category;
  • Quantity Distribution by Country

    SELECT country, SUM(quantity) FROM fact_sales JOIN dim_customers GROUP BY country;

Business Observations:

  • Bikes dominate revenue contributions.
  • United States leads in customer count and order volume.
  • Male customers slightly outnumber female customers.
  • Some categories have higher average costs but lower sales volumes.

6. Ranking Analysis (Advanced SQL)

Used SQL window functions to identify top and bottom performers in a business context.

a. Top 5 Products by Revenue

SELECT product_name, ROW_NUMBER() OVER (ORDER BY SUM(sales) DESC) AS rank FROM fact_sales ...

b. Bottom 5 Products by Revenue

... OVER (ORDER BY SUM(sales) ASC)

c. Top 10 Customers by Revenue

SELECT TOP 10 customer_id, SUM(sales) ...

d. 3 Customers with Fewest Orders

SELECT TOP 3 customer_id, COUNT(DISTINCT order_number) ...

Insights:

  • The top 5 products contribute disproportionately to revenue.
  • A few customers have placed very high-value orders.
  • Some customers exist with only one or two purchases — useful for churn modeling.

7. High vs Low Cardinality Dimensions

Identified which dimensions are high or low in cardinality to aid future OLAP modeling:

  • Low cardinality:

    • gender, country, category (Useful for slicing data in dashboards)
  • High cardinality:

    • product_name, customer_id (Important for indexing and efficient storage)

8. Final Takeaways and Next Steps

What Was Achieved:

  • Complete understanding of data warehouse structure and content.
  • Quantitative profiles of products, customers, and time-based activity.
  • Ranking insights to identify sales leaders and underperformers.
  • Built a solid foundation for advanced use cases.

Future Scope:

  • Business Intelligence (BI): Create Power BI/Tableau dashboards using pre-aggregated views.
  • Predictive Analytics: Feed insights into ML models for churn, upselling, and demand forecasting.
  • OLAP Cubes: Create aggregations across time and region dimensions for fast reporting.

GitHub Repository

Pre-Requisite Medallion Architecture Project Access full SQL code and documentation: 🔗 GitHub: SQL Data Warehouse Project


Final Word

This project demonstrates the real-world application of advanced SQL in a data warehousing environment. By leveraging analytical thinking, dimensional modeling, and SQL optimization techniques, we produced valuable insights for decision-makers.

If you're interested in data modeling, warehousing, or SQL performance tuning — this project provides a strong foundation in those areas.

About

No description or website provided.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages