Commit 212a371

restructured repository folder structure to reflect new project direction
1 parent 96300c5 commit 212a371

29 files changed: 117 additions, 969 deletions
File renamed without changes.

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -6,6 +6,9 @@ __pycache__/
 # C extensions
 *.so
 
+# Ruff
+.ruff_cache/
+
 # Distribution / packaging
 .Python
 build/
@@ -123,7 +126,7 @@ celerybeat.pid
 
 # Environments
 config.ini
-aws_resources/.env
+.env
 .venv
 env/
 venv/
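
The second hunk swaps a path-anchored pattern (`aws_resources/.env`) for a bare one (`.env`), which widens what gets ignored. The sketch below illustrates that difference with a deliberately simplified matcher. It handles only plain file patterns (no negation, no `**`, no trailing `/`), and `is_ignored` is a hypothetical helper, not part of Git or of this repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(pattern: str, path: str) -> bool:
    """Simplified gitignore check for plain file patterns only."""
    if "/" in pattern:
        # A pattern containing a slash is anchored to the directory
        # that holds the .gitignore file, so compare the whole path.
        return fnmatchcase(path, pattern)
    # A pattern without a slash can match a path component at any depth.
    return any(fnmatchcase(part, pattern) for part in PurePosixPath(path).parts)


if __name__ == "__main__":
    for pattern in ("aws_resources/.env", ".env"):
        for path in (".env", "aws_resources/.env"):
            print(f"{pattern!r} ignores {path!r}: {is_ignored(pattern, path)}")
```

Under these simplified rules the old pattern covers only `aws_resources/.env`, while the new one also covers a `.env` file at the repository root or in any other folder.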

.pre-commit-config.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.9.7
+    hooks:
+      - id: ruff
+        args: [ --fix ]
+      - id: ruff-format
+
+  - repo: https://github.com/sqlfluff/sqlfluff
+    rev: 2.3.3
+    hooks:
+      - id: sqlfluff-lint
+        args: ["--dialect", "mysql"]
+      - id: sqlfluff-fix
+        args: ["--dialect", "mysql"]
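
To make the four hooks concrete, the sketch below maps each hook id to an approximate command line and prints it for inspection. The mapping is an assumption: the hooks' real entry points may pass extra flags, and `HOOK_COMMANDS` and `build_command` are hypothetical names used only for illustration. Nothing is executed here.

```python
import shlex

# Approximate CLI equivalents of the hooks in .pre-commit-config.yaml.
# Assumption: the actual hook entry points may add further flags.
HOOK_COMMANDS = {
    "ruff": ["ruff", "check", "--fix"],
    "ruff-format": ["ruff", "format"],
    "sqlfluff-lint": ["sqlfluff", "lint", "--dialect", "mysql"],
    "sqlfluff-fix": ["sqlfluff", "fix", "--dialect", "mysql"],
}


def build_command(hook_id: str, paths: list[str]) -> list[str]:
    """Return the approximate argv for one hook applied to the given paths."""
    return HOOK_COMMANDS[hook_id] + list(paths)


if __name__ == "__main__":
    # "src" and "sql" are placeholder directories, not paths from the commit.
    for hook_id in HOOK_COMMANDS:
        target = ["sql"] if hook_id.startswith("sqlfluff") else ["src"]
        print(f"{hook_id}: {shlex.join(build_command(hook_id, target))}")
```

In normal use the hooks are driven by pre-commit itself (`pre-commit install`, then `pre-commit run --all-files`) rather than invoked by hand.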

.sqlfluff

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+[sqlfluff]
+dialect = mysql
+
+[sqlfluff:rules]
+max_line_length = 100
+capitalisation.policy = consistent
+
+[sqlfluff:rules:L010]
+capitalisation_policy = upper
+
+[sqlfluff:rules:L011]
+capitalisation_policy = lower
+
+[sqlfluff:rules:L014]
+capitalisation_policy = lower
+
+[sqlfluff:rules:L016]
+forbid_multiline = True
+
+[sqlfluff:rules:L018]
+require_aliases = True
+
+[sqlfluff:rules:L019]
+comma_style = leading
+
+[sqlfluff:rules:L022]
+aliasing = explicit
+
+[sqlfluff:rules:L025]
+force_enable = True
+
+[sqlfluff:rules:L030]
+require_final_semicolon = True
+
+# Formatting settings
+[sqlfluff:format]
+indent_width = 4
+tab_space_size = 4
+reindent_aligned = True
+strip_whitespace_lines = True
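
Because `.sqlfluff` uses INI syntax, its per-rule sections can be listed with Python's standard `configparser`. The sketch below does that for an excerpt of the file. It is only a way to inspect what the file declares. It does not show how SQLFluff itself loads or validates these options, and `rule_settings` is a hypothetical helper.

```python
import configparser

# Excerpt of the .sqlfluff file added in this commit.
SQLFLUFF_TEXT = """
[sqlfluff]
dialect = mysql

[sqlfluff:rules]
max_line_length = 100

[sqlfluff:rules:L010]
capitalisation_policy = upper

[sqlfluff:rules:L019]
comma_style = leading
"""


def rule_settings(text: str) -> dict[str, dict[str, str]]:
    """Map each rule code (e.g. 'L010') to the options set in its section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prefix = "sqlfluff:rules:"
    return {
        section[len(prefix):]: dict(parser[section])
        for section in parser.sections()
        if section.startswith(prefix)
    }


if __name__ == "__main__":
    for code, options in rule_settings(SQLFLUFF_TEXT).items():
        print(code, options)
```

Reading the real file instead of the excerpt only requires passing its contents, for example `rule_settings(open(".sqlfluff").read())`.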

README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ This pipeline can be modified to source data from various external inputs includ
 application logs, databases, and mobile applications. The steps in the pipeline can be performed using either
 the Python shell or Pyspark jobs.
 
-In this project, raw, untransformed data resides in external databases and is initially extracted as .csv files
+In this project, raw, untransformed data resides in on-premises NOSQL databases and is initially extracted as .csv files
 into a bronze tier S3 bucket. The pipeline works on the raw data, processing it, and subsequently storing it in
 the appropriate data lake tier as determined by business requirements. The tiers are represented as folders within
 a single S3 bucket for this project. However, each tier should be given a dedicated bucket (as it is in production
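
The README text describes tiers as folders inside one S3 bucket, which amounts to using the tier name as a key prefix. The sketch below shows one way to build such keys. Only "bronze" appears in the README. The other tier names, the file name, and the helpers `tier_key` and `promote` are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# "bronze" comes from the README; "silver" and "gold" are assumed tier names.
TIERS = ("bronze", "silver", "gold")


def tier_key(tier: str, filename: str) -> str:
    """Build the object key for a file stored under a tier folder."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return str(PurePosixPath(tier) / filename)


def promote(key: str, target_tier: str) -> str:
    """Return the key the same object would have under another tier."""
    _, *rest = PurePosixPath(key).parts  # drop the current tier folder
    return tier_key(target_tier, "/".join(rest))


if __name__ == "__main__":
    raw = tier_key("bronze", "orders.csv")  # hypothetical extract file
    print(raw)
    print(promote(raw, "silver"))
```

With a dedicated bucket per tier, as the README recommends for production, the tier would select the bucket name instead of the key prefix, and the rest of the key could stay unchanged.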
