🌆 [ENRICH] BAN: old cities -> new cities #1534
Merged
47 commits
0f405be RGPD: anonymize actor names: init commit (maxcorbeau)
6125181 add Django data template (maxcorbeau)
f7f3136 Django data migration (maxcorbeau)
95a6f3d refactor dags/rgpd -> dags/enrich, use dbt (maxcorbeau)
0edfa4a add new config & RGPD test (maxcorbeau)
a806be6 RGPD model, tests & migration (maxcorbeau)
4633021 working v1 (maxcorbeau)
1d8a11d start refactor to factor out RGPD + closures (maxcorbeau)
b746d10 start refactor and progress toward business decision (maxcorbeau)
4c7c03b suggestions: start of creation (maxcorbeau)
d48b1e3 suggestions: almost working (maxcorbeau)
4a45d91 suggestions: passing tests (maxcorbeau)
b75ccaa refactor DAG with dbt & suggestions (maxcorbeau)
4dcc24b dbt: cleanup & sampling (maxcorbeau)
2713d89 DAG & Admin UI working (maxcorbeau)
3cb369e create_as_child, street number, UI (maxcorbeau)
7f7dde2 refactor models & tests (maxcorbeau)
2a84f56 add failure tolerance (maxcorbeau)
de99c1a udfs: normalization excluding short words, cleanup (maxcorbeau)
1ddd4ee remove RGPD code (maxcorbeau)
9d221c8 cont. RGPD removal & move data_reconstruct (maxcorbeau)
55a72d9 recreate Django migration + fix imports broken by rebase (maxcorbeau)
89f468d fix imports + duplicated data_serialize (maxcorbeau)
91295e2 cont. del rgpd, fix acteurs model & tests (maxcorbeau)
354a541 remove prints (maxcorbeau)
16aede7 drop changes in restore script (maxcorbeau)
836d577 cont. fix import script (maxcorbeau)
6a15cc6 rename replace -> suggest (maxcorbeau)
fc01b2b cohort: simplification, label only (maxcorbeau)
b67bb03 fix migration after rebase (maxcorbeau)
c92cb7f context handling + del villes marts (maxcorbeau)
5d7482b context handling (maxcorbeau)
30d0d2c dbt & Django admin integration (maxcorbeau)
0318022 BAN: start of dbt models (maxcorbeau)
0b79a29 fix filter on base (maxcorbeau)
ba803a7 dbt marts & Airflow DAG (maxcorbeau)
ed1608d Airflow DAG (maxcorbeau)
f8736bd rebase + add migration & tests (maxcorbeau)
2409b1a fix after rebase (maxcorbeau)
2a2b129 fix after rebase 2 (maxcorbeau)
c9cd15c fix after rebase 3 (maxcorbeau)
33315cb fix after rebase 4 (maxcorbeau)
8df023d del dummy file that went unnoticed (maxcorbeau)
f6258bc fix after rebase (maxcorbeau)
1c10dfc fix after rebase (rgpd) (maxcorbeau)
bf5e989 Django migration (maxcorbeau)
a0c9c09 small improvement to dbt command (maxcorbeau)
@@ -0,0 +1,63 @@
"""DAG to suggest city corrections based on BAN data"""

from airflow import DAG
from enrich.config import (
    COHORTS,
    DBT,
    TASKS,
    EnrichActeursVillesConfig,
)
from enrich.tasks.airflow_logic.enrich_config_create_task import (
    enrich_config_create_task,
)
from enrich.tasks.airflow_logic.enrich_dbt_model_suggest_task import (
    enrich_dbt_model_suggest_task,
)
from enrich.tasks.airflow_logic.enrich_dbt_models_refresh_task import (
    enrich_dbt_models_refresh_task,
)
from shared.config import CATCHUPS, SCHEDULES, START_DATES, config_to_airflow_params

with DAG(
    dag_id="enrich_acteurs_villes",
    dag_display_name="🌆 Enrichir - Acteurs Villes",
    default_args={
        "owner": "airflow",
        "depends_on_past": False,
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
    },
    description=("Un DAG pour suggérer des corrections de villes"),
    tags=["annuaire", "entreprises", "ae", "acteurs", "juridique"],
    schedule=SCHEDULES.NONE,
    catchup=CATCHUPS.AWLAYS_FALSE,
    start_date=START_DATES.YESTERDAY,
    params=config_to_airflow_params(
        EnrichActeursVillesConfig(
            dbt_models_refresh=True,
            dbt_models_refresh_command=(
                "dbt build --select tag:marts,tag:enrich,tag:villes"
            ),
            filter_equals__acteur_statut="ACTIF",
        )
    ),
) as dag:
    # Instantiation
    config = enrich_config_create_task(dag)
    dbt_refresh = enrich_dbt_models_refresh_task(dag)
    suggest_typo = enrich_dbt_model_suggest_task(
        dag,
        task_id=TASKS.ENRICH_VILLES_TYPO,
        cohort=COHORTS.VILLES_TYPO,
        dbt_model_name=DBT.MARTS_ENRICH_VILLES_TYPO,
    )
    suggest_new = enrich_dbt_model_suggest_task(
        dag,
        task_id=TASKS.ENRICH_VILLES_NEW,
        cohort=COHORTS.VILLES_NEW,
        dbt_model_name=DBT.MARTS_ENRICH_VILLES_NEW,
    )
    config >> dbt_refresh  # type: ignore
    dbt_refresh >> suggest_typo  # type: ignore
    dbt_refresh >> suggest_new  # type: ignore
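
The refresh command in this DAG passes `tag:marts,tag:enrich,tag:villes` to `--select`. In dbt's node selection syntax, comma-separated criteria inside one argument are intersected, while space-separated arguments are unioned, so only models carrying all three tags should be rebuilt. The sketch below mimics that set logic in plain Python to make the difference visible. The model names and their tags are hypothetical, chosen for illustration only, and `select_models` is not how dbt itself is implemented.

```python
# Hypothetical models and tags, for illustration only (not taken from the PR).
MODEL_TAGS = {
    "marts_enrich_villes_typo": {"marts", "enrich", "villes"},
    "marts_enrich_villes_new": {"marts", "enrich", "villes"},
    "marts_enrich_acteurs_closed": {"marts", "enrich", "closed"},
    "int_ban_villes": {"intermediate", "ban", "villes"},
}


def select_models(selector: str, model_tags: dict[str, set[str]]) -> list[str]:
    """Mimic dbt tag selection: spaces mean union, commas mean intersection."""
    selected: set[str] = set()
    for group in selector.split():  # space-separated groups are unioned
        wanted = {criterion.removeprefix("tag:") for criterion in group.split(",")}
        # A model matches a group only if it carries every tag listed in it
        selected |= {name for name, tags in model_tags.items() if wanted <= tags}
    return sorted(selected)


if __name__ == "__main__":
    # Intersection: only the models tagged marts AND enrich AND villes
    print(select_models("tag:marts,tag:enrich,tag:villes", MODEL_TAGS))
    # Union: every model tagged marts OR villes
    print(select_models("tag:marts tag:villes", MODEL_TAGS))
```

Writing the selector with spaces instead of commas would widen the refresh to every model carrying any one of the tags, which is why the comma form suits a DAG that only needs the city marts.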
2 changes: 1 addition & 1 deletion
dags_unit_tests/enrich/config/test_enrich_acteurs_closed_config.py
100 changes: 100 additions & 0 deletions
dags_unit_tests/enrich/tasks/test_enrich_suggestions_cities.py
@@ -0,0 +1,100 @@
import pandas as pd
import pytest
from enrich.config import COHORTS, COLS
from enrich.tasks.business_logic.enrich_dbt_model_to_suggestions import (
    enrich_dbt_model_to_suggestions,
)


@pytest.mark.django_db
class TestEnrichSuggestionsCities:

    @pytest.fixture
    def df_new(self):
        return pd.DataFrame(
            {
                COLS.SUGGEST_COHORT: [COHORTS.VILLES_NEW] * 2,
                COLS.SUGGEST_VILLE: ["new town 1", "new town 2"],
                COLS.ACTEUR_ID: ["new1", "new2"],
                COLS.ACTEUR_VILLE: ["old town 1", "old town 2"],
            }
        )

    @pytest.fixture
    def df_typo(self):
        return pd.DataFrame(
            {
                COLS.SUGGEST_COHORT: [COHORTS.VILLES_TYPO] * 2,
                COLS.SUGGEST_VILLE: ["Paris", "Laval"],
                COLS.ACTEUR_ID: ["typo1", "typo2"],
                COLS.ACTEUR_VILLE: ["Pâris", "Lâval"],
            }
        )

    @pytest.fixture
    def acteurs(self, df_new, df_typo):
        # Creating acteurs as presence required to apply changes
        from unit_tests.qfdmo.acteur_factory import ActeurFactory

        for _, row in pd.concat([df_new, df_typo]).iterrows():
            ActeurFactory(
                identifiant_unique=row[COLS.ACTEUR_ID],
                ville=row[COLS.ACTEUR_VILLE],
            )

    def test_cohort_new(self, acteurs, df_new):
        from data.models.suggestion import Suggestion, SuggestionCohorte
        from qfdmo.models import RevisionActeur

        # Write suggestions to DB
        enrich_dbt_model_to_suggestions(
            df=df_new,
            cohort=COHORTS.VILLES_NEW,
            identifiant_action="test_new",
            dry_run=False,
        )

        # Check suggestions have been written to DB
        cohort = SuggestionCohorte.objects.get(identifiant_action="test_new")
        suggestions = Suggestion.objects.filter(suggestion_cohorte=cohort)
        assert len(suggestions) == 2

        # Apply suggestions
        for suggestion in suggestions:
            suggestion.apply()

        # Verify changes
        # 2 revisions should be created but not parent
        new1 = RevisionActeur.objects.get(pk="new1")
        assert new1.ville == "new town 1"

        new2 = RevisionActeur.objects.get(pk="new2")
        assert new2.ville == "new town 2"

    def test_cohort_typo(self, acteurs, df_typo):
        from data.models.suggestion import Suggestion, SuggestionCohorte
        from qfdmo.models import RevisionActeur

        # Write suggestions to DB
        enrich_dbt_model_to_suggestions(
            df=df_typo,
            cohort=COHORTS.VILLES_TYPO,
            identifiant_action="test_typo",
            dry_run=False,
        )

        # Check suggestions have been written to DB
        cohort = SuggestionCohorte.objects.get(identifiant_action="test_typo")
        suggestions = Suggestion.objects.filter(suggestion_cohorte=cohort)
        assert len(suggestions) == 2

        # Apply suggestions
        for suggestion in suggestions:
            suggestion.apply()

        # Verify changes
        typo1 = RevisionActeur.objects.get(pk="typo1")
        assert typo1.ville == "Paris"

        typo2 = RevisionActeur.objects.get(pk="typo2")
        assert typo2.ville == "Laval"
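
The typo fixtures in this test pair accented variants (`Pâris`, `Lâval`) with the expected spelling, while the "new" fixtures pair genuinely different names. One way such a distinction could be drawn is by comparing names after normalization. The sketch below is a minimal illustration using only the standard library; `normalize_city` and `is_typo` are hypothetical helpers and are not the dbt models or UDFs this PR relies on.

```python
import unicodedata


def normalize_city(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def is_typo(acteur_ville: str, ban_ville: str) -> bool:
    """True when the names differ as written but match once normalized."""
    if acteur_ville == ban_ville:
        return False
    return normalize_city(acteur_ville) == normalize_city(ban_ville)


if __name__ == "__main__":
    # Pairs mirror the fixtures: (current acteur city, suggested city)
    for current, suggested in [("Pâris", "Paris"), ("old town 1", "new town 1")]:
        print(current, "->", suggested, "typo:", is_typo(current, suggested))
```

Under this assumption, a pair that fails `is_typo` but still differs would be a candidate for the "new city" cohort, which needs a separate source of truth (here, BAN) to confirm the renaming.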
35 changes: 35 additions & 0 deletions
data/migrations/0013_alter_suggestioncohorte_type_action.py
@@ -0,0 +1,35 @@
# Generated by Django 5.1.6 on 2025-04-28 05:22

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("data", "0012_alter_suggestioncohorte_type_action"),
    ]

    operations = [
        migrations.AlterField(
            model_name="suggestioncohorte",
            name="type_action",
            field=models.CharField(
                blank=True,
                choices=[
                    ("CRAWL_URLS", "🔗 URLs scannées"),
                    ("ENRICH_ACTEURS_CLOSED", "🚪 Acteurs fermés"),
                    ("ENRICH_ACTEURS_RGPD", "🕵 Anonymisation RGPD"),
                    ("ENRICH_ACTEURS_VILLES_TYPO", "🏙️ Acteurs villes typographiques"),
                    ("ENRICH_ACTEURS_VILLES_NEW", "🏙️ Acteurs villes nouvelles"),
                    ("CLUSTERING", "regroupement/déduplication des acteurs"),
                    ("SOURCE_AJOUT", "ingestion de source de données - nouveau acteur"),
                    (
                        "SOURCE_MODIFICATION",
                        "ingestion de source de données - modification d'acteur existant",
                    ),
                    ("SOURCE_SUPRESSION", "ingestion de source de données"),
                ],
                max_length=50,
            ),
        ),
    ]
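
The migration stores the action type in a `CharField` with `max_length=50`, so every choice key has to fit within that limit, including the two `ENRICH_ACTEURS_VILLES_*` keys added here. A quick standalone check a reviewer might run is sketched below, with the keys copied from the migration; `keys_exceeding` is a hypothetical helper and is not part of the PR.

```python
# Choice keys copied from the migration above
TYPE_ACTION_KEYS = [
    "CRAWL_URLS",
    "ENRICH_ACTEURS_CLOSED",
    "ENRICH_ACTEURS_RGPD",
    "ENRICH_ACTEURS_VILLES_TYPO",
    "ENRICH_ACTEURS_VILLES_NEW",
    "CLUSTERING",
    "SOURCE_AJOUT",
    "SOURCE_MODIFICATION",
    "SOURCE_SUPRESSION",
]
MAX_LENGTH = 50  # matches max_length in the migration


def keys_exceeding(keys: list[str], max_length: int) -> list[str]:
    """Return the keys that would not fit in a column of max_length characters."""
    return [key for key in keys if len(key) > max_length]


if __name__ == "__main__":
    too_long = keys_exceeding(TYPE_ACTION_KEYS, MAX_LENGTH)
    print("keys over the limit:", too_long)
```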