This project presents a scalable end-to-end data pipeline designed for processing and analyzing car sales data using the Azure Cloud and Databricks ecosystem.
The pipeline ingests car sales data from a github repository into Azure SQL Database, processes it through a multi-layered transformation pipeline in Databricks, and outputs a structured star schema suitable for downstream reporting.
It automates:
- Data ingestion from GitHub β Azure SQL DB
- ETL transformations using Delta Lake (Bronze β Silver β Gold)
- Dimensional modeling with fact and dimension tables
- Visualization/reporting via Power BI
The architecture follows a modern lakehouse pattern, enabling scalable and secure data engineering.
This pipeline automates the ingestion phase, pulling data from GitHub and loading it into Azure SQL Database.
- Lastload and current_load lookups help manage incremental load logic.
- Data is copied into the Bronze Layer (raw zone).
- A stored procedure updates the watermark after successful ingestion.
The Databricks pipeline implements the medallion architecture using Delta Live Tables:
| Layer | Description |
|---|---|
| Silver | Cleaned and normalized data from the bronze layer |
| Gold | Dimension and fact tables ready for analytics/reporting |
The project consists of the following transformation notebooks:
cars_sales_project/
βββ Silver_Notebook.python # Initial silver layer transformations
βββ Gold DimBranch.python # Branch dimension table
βββ Gold DimDate.python # Date dimension table
βββ Gold DimDealer.python # Dealer dimension table
βββ Gold DimModel.python # Model dimension table
βββ Gold_fact_sales.python # Fact table for car sales
Each notebook processes data from the silver layer and applies business logic for gold layer generation.
- Azure Data Factory β Orchestrates data movement and ingestion
- Azure SQL Database β Temporary data storage post-ingestion
- Azure Data Lake Gen2 β Scalable and secure data storage layer
- Databricks (Delta Live Tables) β Data transformation & processing
- Delta Lake Format β Supports ACID transactions & versioning
- Power BI β Reporting and visualization layer
- GitHub β Source control for dataset and notebooks
The Gold Layer produces a Star Schema:
- Central
Fact_Salestable - Dimensions:
Dim_Branch,Dim_Dealer,Dim_Model,Dim_Date

