This project implements a simple but production-ready ETL pipeline that reads multiple CSV sales files, transforms the data, stores it in a SQLite database, and exports a monthly summary.
- Load multiple CSV files from a folder
- Clean and convert date columns
- Extract and add a `month` column
- Save the full dataset to a SQLite database (`sales_data.db`)
- Generate a monthly and category sales summary to CSV (`summary.csv`)
- Track every ETL step using Python's built-in `logging` module
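The load and transform steps listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names `load_csv_folder` and `add_month` and the `date` column name are assumptions made for illustration.

```python
import glob
import os

import pandas as pd


def load_csv_folder(folder: str) -> pd.DataFrame:
    """Read every *.csv file in `folder` and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


def add_month(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Parse `date_col`, drop rows whose date cannot be parsed,
    and add a `month` column formatted as YYYY-MM.

    The `date` column name is a hypothetical default; adjust it to
    match the real CSV headers.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    out = out.dropna(subset=[date_col]).reset_index(drop=True)
    out["month"] = out[date_col].dt.strftime("%Y-%m")
    return out
```

Calling `add_month(load_csv_folder("data"))` would then return the combined, cleaned dataset, assuming the raw files live in `data/` as shown in the project layout.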
etl_multi_csv_project/
├── data/ # Raw input CSV files
├── etl/ # All ETL scripts (load, process, save, export)
├── output/ # Generated database, summary, and log files
├── main.py # Main script to run the ETL pipeline
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Install dependencies:
pip install -r requirements.txt
- Run the pipeline:
python main.py
- `sales_data.db`: cleaned and combined data saved to SQLite
- `summary.csv`: total sales per month and category
- `etl.log`: ETL process logs
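As an illustration of how the database and summary outputs might be produced, here is a hedged sketch. The table name `sales`, the `category` and `sales` column names, and both function names are assumptions. The project lists SQLAlchemy as a dependency; this sketch uses the standard-library `sqlite3` module instead so that it stays self-contained.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def save_to_sqlite(df: pd.DataFrame, db_path: str, table: str = "sales") -> None:
    """Write the full dataset to a SQLite file, replacing any existing table."""
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()


def monthly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total sales per month and category.

    Assumes hypothetical `month`, `category`, and `sales` columns.
    """
    return (
        df.groupby(["month", "category"], as_index=False)["sales"]
        .sum()
        .rename(columns={"sales": "total_sales"})
    )
```

The summary can then be exported with `monthly_summary(df).to_csv("output/summary.csv", index=False)`, assuming the `output/` folder from the project layout exists.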
- pandas
- SQLAlchemy
- logging (built-in)
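Since step tracking relies only on the built-in `logging` module, a file logger along these lines would be enough. This is a sketch under assumptions: the logger name `etl`, the helper name `get_etl_logger`, and the message format are illustrative and not taken from the project.

```python
import logging


def get_etl_logger(log_path: str = "etl.log") -> logging.Logger:
    """Return a logger that appends timestamped messages to `log_path`."""
    logger = logging.getLogger("etl")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Attach the file handler only once so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Each pipeline stage could then record its progress with calls such as `logger.info("loaded %d rows", len(df))`.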
Khairu Ikramendra
Available for freelance dashboard & data analytics projects.
Let’s connect on LinkedIn or explore more on Upwork!
MIT License — feel free to use and modify for your own projects.
This project is suitable for practicing ETL, documenting data projects, and building the foundation for automated dashboards and reports.