
supakunz/Book-Revenue-Pipeline-GCP


Book Sales End-To-End Data Engineering Project on Google Cloud Platform

An end-to-end data engineering project that deploys an ETL pipeline on Google Cloud Platform, uses BigQuery for data analysis, and uses Looker Studio to build an insights dashboard.

Architecture

[Figure: Project architecture]

Technology Stack

Languages:

  • Python
  • SQL

Google Cloud Platform:

  • Google Cloud Storage
  • Cloud Composer
  • BigQuery
  • Looker Studio

Data Storage

The raw data and output files are too large to store in the repository, so they are stored on Google Drive.

Data Modeling

[Figure: Data model]

ETL Pipeline

[Figure: ETL pipeline]
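The README does not spell out the transform step, so here is a minimal, standard-library sketch of one plausible version: join each sales row to the conversion rate for its date and multiply the price by that rate. Everything specific here is an assumption rather than the author's code: the column names (timestamp, Price, date, conversion_rate), the "$"-prefixed price strings, the ISO date format, and the function names load_rates and transform are all hypothetical. In the real pipeline the three paths would presumably be MYSQL_OUTPUT_PATH, CONVERSION_RATE_OUTPUT_PATH and FINAL_OUTPUT_PATH; on Cloud Composer, /home/airflow/gcs/data/ is the environment bucket's data/ folder as seen from the Airflow workers.

```python
import csv


def load_rates(rates_path):
    """Read the conversion-rate CSV into a {date string: rate} dict.

    Assumes hypothetical columns "date" (YYYY-MM-DD) and "conversion_rate".
    """
    with open(rates_path, newline="") as f:
        return {row["date"]: float(row["conversion_rate"]) for row in csv.DictReader(f)}


def transform(sales_path, rates_path, output_path):
    """Join sales rows to the daily rate and write the converted price.

    Returns the number of rows written. Column names are assumptions.
    """
    rates = load_rates(rates_path)
    written = 0
    with open(sales_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["conversion_rate", "converted_price"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # Assumes an ISO timestamp such as "2021-04-01 10:00:00"; keep the date part.
            day = row["timestamp"][:10]
            rate = rates.get(day)
            if rate is None:
                continue  # no rate for this day: drop the row (inner-join behaviour)
            # Assumes prices like "$10.00"; strip the currency symbol before parsing.
            price = float(row["Price"].replace("$", "").strip())
            row["conversion_rate"] = rate
            row["converted_price"] = round(price * rate, 2)
            writer.writerow(row)
            written += 1
    return written
```

Rows with no matching rate are dropped in this sketch (inner-join behaviour); keeping them with an empty converted price (left-join behaviour) would be an equally reasonable choice, depending on how the dashboard should treat missing rates.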

Output

  • The final Looker Studio dashboard is available here: View Dashboard. Note: the dashboard reads data from a static CSV file exported from BigQuery, not from a live BigQuery connection.

❄️ Setup

  1. Clone this repository:
git clone https://github.com/supakunz/Book-Revenue-Pipeline-GCP.git
  2. Navigate to the project folder and set up the environment variables:
cd Book-Revenue-Pipeline-GCP
  • Create a .env file in the root directory.

  • Add the following variables to the .env file, replacing the placeholder values with your own:

MYSQL_CONNECTION=mysql_default  # corresponding file in Data Storage: data_audible_data_merged.csv
CONVERSION_RATE_URL=<your_api_url>  # corresponding file in Data Storage: data_conversion_rate.csv
MYSQL_OUTPUT_PATH=/home/airflow/gcs/data/audible_data_merged.csv
CONVERSION_RATE_OUTPUT_PATH=/home/airflow/gcs/data/conversion_rate.csv
FINAL_OUTPUT_PATH=/home/airflow/gcs/data/output.csv
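A small check like the following can catch a missing variable before any pipeline task runs. It is a hypothetical helper (load_pipeline_config is not part of the repository) and assumes the .env values have already been loaded into the process environment, for example with python-dotenv's load_dotenv() or as Cloud Composer environment variables. The variable names are the ones listed in the .env example.

```python
import os

# Variable names taken from the .env example; the helper itself is hypothetical.
REQUIRED_VARS = (
    "MYSQL_CONNECTION",
    "CONVERSION_RATE_URL",
    "MYSQL_OUTPUT_PATH",
    "CONVERSION_RATE_OUTPUT_PATH",
    "FINAL_OUTPUT_PATH",
)


def load_pipeline_config(environ=None):
    """Return the pipeline settings as a dict, failing fast if any are missing or empty.

    Reads from os.environ by default; pass a mapping to override (useful in tests).
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Usage, assuming python-dotenv is installed: call load_dotenv() first, then config = load_pipeline_config(), and read values such as config["FINAL_OUTPUT_PATH"].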

Contact

Supakun Thata (supakunt.thata@gmail.com)
