Repository Structure #1

Merged: 16 commits, Mar 20, 2025

jibbs1703
Owner

This PR:

  • merges the changes in project structure
  • creates the client to connect to the on-prem NoSQL database
  • adds tests for the connection
  • adds pre-commit and GitHub Actions configuration files.

Included an overview of the project in the README.md file. Started on scripts for accessing AWS S3 and AWS Glue through the Python SDK.
Updated the project description in the README.md file. Started the AWS Redshift script and continued work on the scripts for accessing AWS S3 and AWS Glue through the Python SDK.
Added more methods to the AWS S3, Glue, and Redshift scripts. Added a PySpark script for extracting data from the S3 source to the Bronze tier of the data lake.
Made changes to the extraction scripts and reorganized the project directory to suit the three data lake tiers.
Made changes to the extraction scripts for each of the tables from the source database.
Refactored the extraction scripts package by creating a function that dynamically extracts all tables from the database by iterating through the table names. Removed the extraction scripts package after confirming that the extraction function worked.
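For readers who want a picture of the refactor described in the commit above, here is a minimal sketch of a single function that loops over table names instead of keeping one script per table. It is not the code from this PR: `extract_all_tables`, `read_table`, and `write_table` are hypothetical names, and plain dictionaries stand in for the source database and the destination.

```python
from typing import Callable, Dict, Iterable, List

Row = dict


def extract_all_tables(
    table_names: Iterable[str],
    read_table: Callable[[str], List[Row]],
    write_table: Callable[[str, List[Row]], None],
) -> Dict[str, int]:
    """Copy every named table from a source to a destination.

    `read_table` and `write_table` are injected callables (hypothetical),
    so the loop does not depend on any particular database client.
    Returns the number of rows copied per table.
    """
    row_counts: Dict[str, int] = {}
    for name in table_names:
        rows = read_table(name)      # pull one table from the source
        write_table(name, rows)      # land it in the destination
        row_counts[name] = len(rows)
    return row_counts


# In-memory stand-ins for the source database and the destination (assumptions).
source: Dict[str, List[Row]] = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 10}],
}
destination: Dict[str, List[Row]] = {}

counts = extract_all_tables(source.keys(), source.__getitem__, destination.__setitem__)
```

Passing the reader and writer in as callables keeps the loop testable without a live database, which is one possible reason a per-table scripts package becomes unnecessary after such a refactor.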
Created separate modules for running the ETL pipeline as either a Python shell job or a Glue job, depending on the size of the data to be processed from the source S3 bucket.
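The size-based choice in the commit above could be sketched roughly as follows. This is an illustration only: the PR does not state a threshold or how sizes are measured, so `SIZE_THRESHOLD_BYTES`, `choose_runner`, and the returned labels are all assumptions.

```python
from typing import Iterable

# Hypothetical cutoff; the PR does not state an actual threshold.
SIZE_THRESHOLD_BYTES = 1024 ** 3  # 1 GiB


def choose_runner(
    object_sizes: Iterable[int],
    threshold_bytes: int = SIZE_THRESHOLD_BYTES,
) -> str:
    """Pick a runner label from the total size of the source objects.

    `object_sizes` is an iterable of object sizes in bytes, for example
    collected from a bucket listing. The labels are illustrative only.
    """
    total_bytes = sum(object_sizes)
    return "glue_job" if total_bytes > threshold_bytes else "python_shell"


small = choose_runner([10_000, 250_000])
large = choose_runner([800 * 1024 ** 2, 700 * 1024 ** 2])
```

Keeping the decision in one small function means the two pipeline modules can stay separate while the caller decides which one to trigger.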
Separated the DAG script into a Glue DAG and a Python shell DAG, reflecting the future project direction.
Updated the Python shell DAG script to show how the data was extracted to the source bucket and how it would be moved from the source bucket to the bronze tier. Also updated the README.md file with the project outline going forward.
jibbs1703 self-assigned this on Mar 20, 2025
jibbs1703 merged commit b6e4096 into main on Mar 20, 2025
2 checks passed
jibbs1703 deleted the feature/structure-repository branch on March 20, 2025 at 21:19