Project based on the application of Azure Databricks.
This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.
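As a rough illustration of the Spark leg of such a pipeline, the sketch below reads the vehicle stream from Kafka and writes each micro-batch to MongoDB. The topic name, broker address, record schema, and MongoDB Spark connector options are all assumptions for illustration, not details taken from the repository.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

# Hypothetical schema for the UXSIM-generated vehicle records
schema = (StructType()
          .add("vehicle_id", StringType())
          .add("speed", DoubleType())
          .add("route", StringType()))

spark = SparkSession.builder.appName("vehicle-stream").getOrCreate()

# Read raw JSON messages from Kafka (broker address and topic name are assumptions)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "vehicle_data")
       .load())

parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

# Write each micro-batch to MongoDB; connector format/options assume the
# MongoDB Spark connector (v10+) is on the classpath
query = (parsed.writeStream
         .foreachBatch(lambda batch_df, _: batch_df.write
                       .format("mongodb")
                       .option("connection.uri", "mongodb://localhost:27017")
                       .option("database", "traffic")
                       .option("collection", "processed_vehicles")
                       .mode("append")
                       .save())
         .start())
query.awaitTermination()
```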
This project performs data wrangling, analysis, and visualization, as well as machine learning prediction of user churn for a hypothetical music app, using PySpark.
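A hedged sketch of what such a churn model might look like with PySpark ML; the feature columns, label name, and dataset path are hypothetical, not taken from the project.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("churn").getOrCreate()

# Hypothetical per-user feature table with a binary "churn" label
df = spark.read.parquet("user_features.parquet")

assembler = VectorAssembler(
    inputCols=["num_sessions", "avg_session_length", "thumbs_down", "days_active"],
    outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
lr = LogisticRegression(labelCol="churn", featuresCol="features")

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, scaler, lr]).fit(train)

auc = BinaryClassificationEvaluator(labelCol="churn").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```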
End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.
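A simplified sketch of how the Bronze-Silver-Gold layering might look in PySpark, assuming Delta Lake is available; the paths, column names, and aggregation are illustrative, not the project's actual schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, to_date

spark = SparkSession.builder.appName("medallion").getOrCreate()

# Bronze: land the raw Olist orders file as-is (path and file format are assumptions)
bronze = spark.read.option("header", True).csv("/mnt/bronze/olist_orders.csv")
bronze.write.mode("overwrite").format("delta").save("/mnt/bronze/orders")

# Silver: typed, de-duplicated records
silver = (spark.read.format("delta").load("/mnt/bronze/orders")
          .dropDuplicates(["order_id"])
          .withColumn("order_date", to_date(col("order_purchase_timestamp"))))
silver.write.mode("overwrite").format("delta").save("/mnt/silver/orders")

# Gold: analytics-ready aggregate (orders per customer per day)
gold = (silver.groupBy("customer_id", "order_date")
        .agg(count("order_id").alias("orders")))
gold.write.mode("overwrite").format("delta").save("/mnt/gold/orders_daily")
```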
Contains the code and examples for my article on Medium, which introduces the English SDK for Apache Spark, showcasing how to combine the power of Apache Spark with large language models (LLMs).
NBA shot predictions with PySpark and SparkML
A repository focused on using high-performance parallel pipelines to perform ETL across various data sources.