
Commit c31a034

Proof of concept for using DBT
1 parent deeefb7 commit c31a034

27 files changed, +739 -101 lines changed

.secrets.baseline

Lines changed: 2 additions & 2 deletions
@@ -160,7 +160,7 @@
         "filename": "docker-compose.yml",
         "hashed_secret": "3cf2012487b086bba2adb3386d69c2ab67a268b6",
         "is_verified": false,
-        "line_number": 54
+        "line_number": 55
       }
     ],
     "iframe_without_js.html": [
@@ -207,5 +207,5 @@
       }
     ]
   },
-  "generated_at": "2025-01-29T18:47:34Z"
+  "generated_at": "2025-02-14T10:44:30Z"
 }

airflow-requirements.in

Lines changed: 5 additions & 3 deletions
@@ -1,13 +1,15 @@
 --constraint=requirements.txt
 
-apache-airflow==2.10.4
 apache-airflow-providers-postgres
+apache-airflow==2.10.4
+dbt-core==1.9.2
+dbt-postgres==1.9
 fuzzywuzzy
 # related to https://github.com/pandas-dev/pandas/issues/57049 because sqlalchemy & numpy should be < 2.0
 pandas==2.1.4
 pyproj
 python-decouple
 ratelimit
+scikit-learn==1.3.2
 shortuuid
-unidecode
-scikit-learn==1.3.2
+unidecode

airflow-requirements.txt

Lines changed: 338 additions & 91 deletions
Large diffs are not rendered by default.

airflow-scheduler.Dockerfile

Lines changed: 13 additions & 1 deletion
@@ -35,12 +35,24 @@ COPY ./core/ /opt/airflow/core/
 COPY ./qfdmo/ /opt/airflow/qfdmo/
 COPY ./qfdmd/ /opt/airflow/qfdmd/
 COPY ./data/ /opt/airflow/data/
+COPY ./dbt/ /opt/airflow/dbt/
 COPY ./dsfr_hacks/ /opt/airflow/dsfr_hacks/
 
 # Standard Airflow
 COPY ./dags/ /opt/airflow/dags/
 COPY ./config/ /opt/airflow/config/
 COPY ./plugins/ /opt/airflow/plugins/
-RUN mkdir -p /opt/airflow/logs/
+
+WORKDIR /opt/airflow/dbt
+USER 0
+RUN chown -R ${AIRFLOW_UID:-50000}:0 /opt/airflow/dbt
+USER ${AIRFLOW_UID:-50000}:0
+
+# RUN mkdir -p /opt/airflow/.dbt/logs
+# ENV DBT_LOG_PATH=/opt/airflow/.dbt/logs/dbt.log
+ENV DBT_PROFILES_DIR=/opt/airflow/dbt
+ENV DBT_PROJECT_DIR=/opt/airflow/dbt
+
+RUN dbt deps
 
 CMD ["scheduler"]

dags/.env.template

Lines changed: 8 additions & 0 deletions
@@ -30,3 +30,11 @@ AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK='true'
 _PIP_ADDITIONAL_REQUIREMENTS=${_PIP_ADDITIONAL_REQUIREMENTS:-}
 AIRFLOW_CONN_QFDMO-DJANGO-DB='postgres://qfdmo:qfdmo@lvao-db:5432/qfdmo' # pragma: allowlist secret
 DATABASE_URL=postgis://qfdmo:qfdmo@lvao-db:5432/qfdmo # pragma: allowlist secret
+
+# DBT env vars
+POSTGRES_HOST=lvao-db
+POSTGRES_PORT=5432
+POSTGRES_USER=qfdmo
+POSTGRES_PASSWORD=qfdmo
+POSTGRES_DB=qfdmo
+POSTGRES_SCHEMA=public
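
Review note: the `dbt_test` profile that consumes these variables is not shown in this excerpt, so the mapping below is an assumption rather than the commit's actual configuration. As a minimal Python sketch, the hypothetical helper `dbt_postgres_target` checks that the six variables are present and arranges them into the fields a dbt Postgres target usually carries (`host`, `port`, `user`, `password`, `dbname`, `schema`):

```python
import os
from typing import Mapping

# Variable names taken from the dags/.env.template hunk.
REQUIRED_VARS = (
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_SCHEMA",
)


def dbt_postgres_target(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: build a Postgres target dict from env vars.

    Raises KeyError listing every missing or empty variable, so a
    misconfigured container fails before dbt is invoked.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "type": "postgres",
        "host": env["POSTGRES_HOST"],
        "port": int(env["POSTGRES_PORT"]),  # dbt expects an integer port
        "user": env["POSTGRES_USER"],
        "password": env["POSTGRES_PASSWORD"],
        "dbname": env["POSTGRES_DB"],
        "schema": env["POSTGRES_SCHEMA"],
    }


# Values copied from the template; real deployments would override them.
example_env = {
    "POSTGRES_HOST": "lvao-db",
    "POSTGRES_PORT": "5432",
    "POSTGRES_USER": "qfdmo",
    "POSTGRES_PASSWORD": "qfdmo",  # pragma: allowlist secret
    "POSTGRES_DB": "qfdmo",
    "POSTGRES_SCHEMA": "public",
}
target = dbt_postgres_target(example_env)
```

A check like this could run at container start-up so that a missing variable is reported by name instead of surfacing later as a dbt connection error.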
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+from datetime import datetime, timedelta
+
+from airflow import DAG
+from airflow.operators.bash import BashOperator
+
+default_args = {
+    "owner": "airflow",
+    "depends_on_past": False,
+    "start_date": datetime(2025, 2, 14),
+    "email_on_failure": False,
+    "email_on_retry": False,
+    "retries": 1,
+    "retry_delay": timedelta(minutes=5),
+}
+
+with DAG("dbt_dag", default_args=default_args) as dag:
+
+    run_dbt_model = BashOperator(
+        task_id="run_dbt_model",
+        bash_command=(
+            "cd /opt/airflow/dbt/ && dbt run --select qfdmo.exhaustive_acteur_views"
+        ),
+        dag=dag,
+    )
+    test_dbt_model = BashOperator(
+        task_id="test_dbt_model",
+        bash_command=(
+            "cd /opt/airflow/dbt/ && dbt test --select qfdmo.exhaustive_acteur_views"
+        ),
+        dag=dag,
+    )
+    run_dbt_model >> test_dbt_model
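
Review note: the two `bash_command` strings in this DAG differ only in the dbt subcommand. A minimal sketch of one way to build them from a single place, assuming a hypothetical helper named `dbt_command` (not part of the commit) and the project directory used in the DAG:

```python
import shlex

# Project directory as used by the DAG and the Dockerfile in this commit.
DBT_PROJECT_DIR = "/opt/airflow/dbt"


def dbt_command(
    subcommand: str, select: str, project_dir: str = DBT_PROJECT_DIR
) -> str:
    """Hypothetical helper: build the shell line passed to a BashOperator.

    Arguments are shell-quoted so that a selector or path containing
    spaces or shell metacharacters cannot break the command.
    """
    if subcommand not in ("run", "test"):
        raise ValueError(f"unsupported dbt subcommand: {subcommand!r}")
    return (
        f"cd {shlex.quote(project_dir)} && "
        f"dbt {subcommand} --select {shlex.quote(select)}"
    )


run_cmd = dbt_command("run", "qfdmo.exhaustive_acteur_views")
test_cmd = dbt_command("test", "qfdmo.exhaustive_acteur_views")
```

Each resulting string could then be passed as `bash_command` to the corresponding `BashOperator`, which keeps the selector and the project path defined once.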

dbt/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+
+target/
+dbt_packages/
+logs/
+.user.yml

dbt/README.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Proof of concept: using dbt for data management
+
+## Installation
+
+At the project root:
+
+```sh
+pip install -r airflow-requirements.txt -r requirements.txt -r dev-requirements.txt
+```
+
+Then, in the dbt folder:
+
+```sh
+cd dbt
+dbt deps
+```
+
+## Usage
+
+Run dbt from the dbt folder.
+The `--select` option runs a single set of models; see [dbt_project.yml](./dbt_project.yml).
+
+```sh
+dbt run --select qfdmo.exhaustive_acteur_views
+```
+
+Run the tests:
+
+```sh
+dbt test --select qfdmo.exhaustive_acteur_views
+```
+
+### Resources:
+- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
+- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers
+- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support
+- Find [dbt events](https://events.getdbt.com) near you
+- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices

dbt/analyses/.gitkeep

Whitespace-only changes.

dbt/dbt_project.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+
+# Name your project! Project names should contain only lowercase characters
+# and underscores. A good package name should reflect your organization's
+# name or the intended use of these models
+name: 'qfdmo'
+version: '1.0.0'
+
+# This setting configures which "profile" dbt uses for this project.
+profile: 'dbt_test'
+
+# These configurations specify where dbt should look for different types of files.
+# The `model-paths` config, for example, states that models in this project can be
+# found in the "models/" directory. You probably won't need to change these!
+model-paths: ["models"]
+analysis-paths: ["analyses"]
+test-paths: ["tests"]
+seed-paths: ["seeds"]
+macro-paths: ["macros"]
+snapshot-paths: ["snapshots"]
+
+clean-targets:
+  - "target"
+  - "dbt_packages"
+
+models:
+  qfdmo:
+    exhaustive_acteur_views:
+      materialized: view
+
+    opendata_acteur_views:
+      materialized: view
