You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.
This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.
This project performed data wrangling, analysis, visualization as well as machine learning prediction on a hypothetical music app's user churn with pyspark.
ontains the code and examples for my article on Medium, which introduces the English SDK for Apache Spark, showcasing how to combine the power of Apache Spark with large language models (LLMs)