- # Tickit Data Lake: Building a Data Lake Using an Orchestrator + AWS Resources
+ # Tickit Data Pipeline
## Overview
Welcome to the Tickit Data Lake project! The Tickit Data Lake project demonstrates the construction
- of a scalable and robust 3-tier data lake on AWS, leveraging the power of Apache Airflow for orchestration
+ of a scalable and robust data pipeline, leveraging the power of Apache Airflow for orchestration
and automation. This project provides a practical example of building a modern data pipeline capable of
handling the extraction, loading, and transformation (ELT) of batch data, specifically designed to support
- the analytical needs of a business using the Tickit Dataset.
+ the analytical needs of a business using the Tickit Dataset as a case study.
## Key Features and Technologies:
@@ -14,15 +14,12 @@ and managing the entire data pipeline. It defines the workflow as a Directed Ac
dependencies between tasks are correctly handled. Airflow's robust features enable task retries, logging,
and alerting, ensuring pipeline reliability.
- - AWS Integration: The project seamlessly integrates with various AWS resources, including:
- 1. EC2: Reliable and highly available computing for running the orchestrator.
-
- 2. S3: Scalable object storage for the Bronze, Silver, and Gold layers.
-
- 3. Redshift: Scalable data warehouse used for providing a high-performance analytical database.
+ - Integration of Multiple Data Sources: The project seamlessly integrates with various data sources including:
+ 1. On-premises SQL and NoSQL databases
+ 2. Cloud-hosted SQL and NoSQL databases
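
The DAG ordering and task retries described in the feature list can be illustrated with a small sketch. This is not Airflow code and does not reflect the project's actual DAG. It uses only the Python standard library (`graphlib`, Python 3.9+) to show the idea, and the names `run_pipeline`, `extract`, `load`, and `transform` are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks: maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of tasks it depends on.
    Returns the execution order and the number of attempts per task.
    """
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    attempts = {}
    for name in order:
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception as exc:
                print(f"{name}: attempt {attempt} failed ({exc})")
                if attempt == max_retries + 1:
                    raise  # retries exhausted, fail the whole run


    return order, attempts


# Hypothetical ELT tasks; `load` fails once to simulate a transient error.
calls = {"load": 0}


def extract():
    pass


def load():
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("transient connection error")


def transform():
    pass


tasks = {"extract": extract, "load": load, "transform": transform}
dependencies = {"load": {"extract"}, "transform": {"load"}}
order, attempts = run_pipeline(tasks, dependencies)
```

Here `load` succeeds on its second attempt, so the run completes in the order extract, load, transform. In Airflow, the equivalent behavior would come from declaring task dependencies in the DAG and setting a retry count on the tasks, with the scheduler handling execution.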
## Value
- This project serves as a valuable example of building a modern data lake on AWS using Airflow, showcasing best
+ This project serves as a valuable example of building modern data pipelines using Airflow, showcasing best
practices for data ingestion, processing, and transformation. It provides a solid foundation for building a
robust data platform to support a wide range of analytical needs.