Commit 928dfd0

feat: taxonomy patch instead of re-generating (#554)
The goal is to be able to patch taxonomy text files instead of regenerating them completely. This avoids a large number of changes unrelated to the modifications a contributor actually made. To do this, we:
* add a "modified" property to entries (to track modified entries)
* treat a parent id change as a modification of the entry (the entry must be regenerated in the file so that its parent ids are right)
* track the line locations of entries (to know where to change the original file)
* keep removed entries (because they must be removed from the original file), prefixing their type label with REMOVED (REMOVED_ENTRY, for example)
* add integration tests
* put new entries next to their parents, or at the end of the file

Relates to: #541 and #366
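The patching approach described above can be sketched in Python. This is a minimal illustration, not the parser's code: it assumes entries are blank-line-separated blocks, that each existing entry knows its original `(start, end)` line range, that removed entries keep a `REMOVED_`-prefixed type, and that new entries have no range yet. `Entry`, `locate_blocks` and `patch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical in-memory entry; field names are illustrative, not the editor's schema."""

    id: str
    lines: List[str]               # current text of the entry block
    start: Optional[int] = None    # first line of the block in the original file (0-based)
    end: Optional[int] = None      # index just past the block's last line
    modified: bool = False
    node_type: str = "ENTRY"       # becomes "REMOVED_ENTRY" when deleted in the editor
    parent_id: Optional[str] = None

    @property
    def removed(self) -> bool:
        return self.node_type.startswith("REMOVED_")


def locate_blocks(text_lines: List[str]) -> List[Tuple[int, int]]:
    """Return the (start, end) line range of every blank-line-separated block."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, line in enumerate(text_lines):
        if line.strip() and start is None:
            start = i
        elif not line.strip() and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(text_lines)))
    return ranges


def patch(original: List[str], entries: List[Entry]) -> List[str]:
    """Rewrite only modified or removed entries and insert new ones after their parent."""
    by_id = {e.id: e for e in entries}
    # New entries have no location yet: anchor them after their parent's block, else at the end.
    inserts: Dict[int, List[Entry]] = {}
    for e in entries:
        if e.start is None and not e.removed:
            parent = by_id.get(e.parent_id) if e.parent_id else None
            if parent is not None and parent.end is not None and not parent.removed:
                anchor = parent.end
            else:
                anchor = len(original)
            inserts.setdefault(anchor, []).append(e)
    # Existing entries that must be regenerated or dropped, keyed by their first line.
    touched = {e.start: e for e in entries if e.start is not None and (e.modified or e.removed)}
    out: List[str] = []
    i = 0
    while True:
        for new in inserts.get(i, []):
            out.extend(["", *new.lines])
        if i >= len(original):
            break
        e = touched.get(i)
        if e is None:
            out.append(original[i])  # untouched line: copied verbatim
            i += 1
        elif e.removed:
            i = e.end
            if i < len(original) and not original[i].strip():
                i += 1  # also drop the blank separator that followed the entry
            elif out and not out[-1].strip():
                out.pop()  # last block removed: drop the separator before it
        else:
            out.extend(e.lines)  # regenerate this entry only
            i = e.end
    return out
```

Because untouched lines are copied verbatim, a diff between the patched output and the original file only shows the entries that were actually edited, added or removed.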
1 parent 11505af commit 928dfd0

24 files changed: +1447 / -154 lines

.github/workflows/github-projects-for-openfoodfacts-design.yml

Lines changed: 6 additions & 6 deletions
@@ -59,13 +59,13 @@ jobs:
           project-url: https://github.com/orgs/openfoodfacts/projects/5 # Add issue to the folksonomy project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: 🏷️ Folksonomy Project
-          label-operator: OR
+          label-operator: OR
       - uses: actions/add-to-project@main
         with:
           project-url: https://github.com/orgs/openfoodfacts/projects/44 # Add issue to the data quality project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: 🧽 Data quality
-          label-operator: OR
+          label-operator: OR
       - uses: actions/add-to-project@main
         with:
           project-url: https://github.com/orgs/openfoodfacts/projects/82 # Add issue to the search project
@@ -77,19 +77,19 @@ jobs:
           project-url: https://github.com/orgs/openfoodfacts/projects/41 # Add issue to the producer platform project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: 🏭 Producers Platform
-          label-operator: OR
+          label-operator: OR
       - uses: actions/add-to-project@main
         with:
           project-url: https://github.com/orgs/openfoodfacts/projects/19 # Add issue to the infrastructure project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: infrastructure
-          label-operator: OR
+          label-operator: OR
       - uses: actions/add-to-project@main
         with:
           project-url: https://github.com/orgs/openfoodfacts/projects/92 # Add issue to the Nutri-Score project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: 🚦 Nutri-Score
-          label-operator: OR
+          label-operator: OR
       - uses: actions/add-to-project@main
         with:
           project-url: https://github.com/orgs/openfoodfacts/projects/132 # Add issue to the Top upvoted issues board
@@ -107,4 +107,4 @@ jobs:
           project-url: https://github.com/orgs/openfoodfacts/projects/35 # Add issue to the ♿️ accessibility project
           github-token: ${{ secrets.ADD_TO_PROJECT_PAT }}
           labeled: ♿️ accessibility
-          label-operator: OR
+          label-operator: OR

Makefile

Lines changed: 9 additions & 4 deletions
@@ -99,9 +99,14 @@ local_config_quality: ## Run on lint configuration files
 
 build: ## Build docker images
 	@echo "🍜 Building docker images"
-	${DOCKER_COMPOSE} build
+	${DOCKER_COMPOSE} build ${args}
 	@echo "🍜 Project setup done"
 
+backend_poetry_update: ## Update poetry.lock file
+	@echo "🍜 Updating poetry.lock file"
+	${DOCKER_COMPOSE} run --user root --rm taxonomy_api bash -c "pip install poetry==1.4.2 && poetry lock --no-update"
+
+
 up: ## Run the project
 	@echo "🍜 Running project (ctrl+C to stop)"
 	@echo "🍜 The React app will be available on http://ui.taxonomy.localhost:8091"
@@ -177,11 +182,11 @@ config_quality: ## Run quality checks on configuration files
 
 tests: backend_tests ## Run all tests
 
-backend_tests: ## Run python tests
+backend_tests: ## Run python tests, you might provide additional arguments with args="…"
 	@echo "🍜 Running python tests"
 	${DOCKER_COMPOSE_TEST} up -d neo4j
-	${DOCKER_COMPOSE_TEST} run --rm taxonomy_api pytest /parser
-	${DOCKER_COMPOSE_TEST} run --rm taxonomy_api pytest /code/tests
+	${DOCKER_COMPOSE_TEST} run --rm taxonomy_api pytest /parser ${args}
+	${DOCKER_COMPOSE_TEST} run --rm taxonomy_api pytest /code/tests ${args}
 	${DOCKER_COMPOSE_TEST} stop neo4j
 
 checks: quality tests ## Run all checks (quality + tests)

backend/editor/api.py

Lines changed: 11 additions & 11 deletions
@@ -184,7 +184,7 @@ async def find_all_nodes(response: Response, branch: str, taxonomy_name: str):
     Get all nodes within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    all_nodes = await taxonomy.get_all_nodes("")
+    all_nodes = await taxonomy.get_all_nodes()
     return all_nodes
 
 
@@ -235,7 +235,7 @@ async def find_one_synonym(response: Response, branch: str, taxonomy_name: str,
     Get synonym corresponding to id within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    one_synonym = await taxonomy.get_nodes("SYNONYMS", synonym)
+    one_synonym = await taxonomy.get_nodes(NodeType.SYNONYMS, synonym)
 
     check_single(one_synonym)
 
@@ -248,7 +248,7 @@ async def find_all_synonyms(response: Response, branch: str, taxonomy_name: str)
     Get all synonyms within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    all_synonyms = await taxonomy.get_all_nodes("SYNONYMS")
+    all_synonyms = await taxonomy.get_all_nodes(NodeType.SYNONYMS)
     return all_synonyms
 
 
@@ -258,7 +258,7 @@ async def find_one_stopword(response: Response, branch: str, taxonomy_name: str,
     Get stopword corresponding to id within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    one_stopword = await taxonomy.get_nodes("STOPWORDS", stopword)
+    one_stopword = await taxonomy.get_nodes(NodeType.STOPWORDS, stopword)
 
     check_single(one_stopword)
 
@@ -271,7 +271,7 @@ async def find_all_stopwords(response: Response, branch: str, taxonomy_name: str
     Get all stopwords within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    all_stopwords = await taxonomy.get_all_nodes("STOPWORDS")
+    all_stopwords = await taxonomy.get_all_nodes(NodeType.STOPWORDS)
     return all_stopwords
 
 
@@ -281,7 +281,7 @@ async def find_header(response: Response, branch: str, taxonomy_name: str):
     Get __header__ within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    header = await taxonomy.get_nodes("TEXT", "__header__")
+    header = await taxonomy.get_nodes(NodeType.TEXT, "__header__")
     return header[0]
 
 
@@ -291,7 +291,7 @@ async def find_footer(response: Response, branch: str, taxonomy_name: str):
     Get __footer__ within taxonomy
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    footer = await taxonomy.get_nodes("TEXT", "__footer__")
+    footer = await taxonomy.get_nodes(NodeType.TEXT, "__footer__")
     return footer[0]
 
 
@@ -395,7 +395,7 @@ async def upload_taxonomy(
     """
     taxonomy = TaxonomyGraph(branch, taxonomy_name)
     if not taxonomy.is_valid_branch_name():
-        raise HTTPException(status_code=422, detail="branch_name: Enter a valid branch name!")
+        raise HTTPException(status_code=422, detail="branch_name: Enter a valid branch name!")
     if await taxonomy.does_project_exist():
         raise HTTPException(status_code=409, detail="Project already exists!")
     if not await taxonomy.is_branch_unique(from_github=False):
@@ -431,7 +431,7 @@ async def edit_entry(request: Request, branch: str, taxonomy_name: str, entry: s
     incoming_data = await request.json()
     incoming_data["id"] = entry
     new_entry = EntryNode(**incoming_data)
-    updated_entry = await taxonomy.update_node("ENTRY", new_entry)
+    updated_entry = await taxonomy.update_node(NodeType.ENTRY, new_entry)
     return updated_entry
 
 
@@ -467,5 +467,5 @@ async def delete_project(branch: str, taxonomy_name: str):
     """
     Delete a project
     """
-    taxonomy = TaxonomyGraph(branch, taxonomy_name)
-    await project_controller.delete_project(taxonomy.project_name)
+    project_id = project_controller.get_project_id(branch, taxonomy_name)
+    await project_controller.delete_project(project_id)
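The handlers above now pass `NodeType` members instead of bare strings such as `"SYNONYMS"`. The enum's definition is not part of this diff, so the sketch below assumes a `str`-mixin `Enum` whose values equal the Neo4j labels; `label_filter` is a hypothetical helper showing one way an optional argument can replace the empty-string sentinel that `get_all_nodes("")` used, and `"p_food_main"` is an illustrative project id built with the `"p_" + taxonomy_name + "_" + branch_name` convention.

```python
from enum import Enum
from typing import Optional


class NodeType(str, Enum):
    """Assumed shape of the enum: each value is also the Neo4j label of that node type."""

    TEXT = "TEXT"
    SYNONYMS = "SYNONYMS"
    STOPWORDS = "STOPWORDS"
    ENTRY = "ENTRY"


def label_filter(project_id: str, node_type: Optional[NodeType] = None) -> str:
    """Hypothetical helper: omitting node_type matches every node of the project."""
    labels = project_id if node_type is None else f"{project_id}:{node_type.value}"
    return f"MATCH (n:{labels}) RETURN n"


print(label_filter("p_food_main", NodeType.SYNONYMS))  # MATCH (n:p_food_main:SYNONYMS) RETURN n
```

The `str` mixin keeps comparisons with plain strings working (`NodeType.SYNONYMS == "SYNONYMS"`), while `.value` is used explicitly in the f-string because the way f-strings format mixed-in enum members has differed across Python versions.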

backend/editor/controllers/node_controller.py

Lines changed: 4 additions & 1 deletion
@@ -1,3 +1,4 @@
+import datetime
 import logging
 
 from openfoodfacts_taxonomy_parser import utils as parser_utils
@@ -12,7 +13,7 @@
 async def delete_project_nodes(project_id: str):
     """
     Remove all nodes for project.
-    This includes entries, stopwords, synonyms and errors
+    This includes entries, stopwords, synonyms, errors and removed entries
     """
 
     query = f"""
@@ -38,6 +39,8 @@ async def create_entry_node(
         "id": language_code + ":" + normalized_name,
         f"tags_{language_code}": [name],
         f"tags_ids_{language_code}": [normalized_name],
+        "modified": datetime.datetime.now().timestamp(),
+        "is_external": False,
     }
     params = {"entry_node": entry_node_data}

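New entries are now created with a `modified` timestamp, and according to the commit message a parent id change also counts as a modification. The sketch below illustrates both under the assumption that entries are plain dicts: `new_entry_node` reuses the dict shape from the diff (taking `normalized_name` as an argument instead of computing it with `parser_utils`), while `set_parents` and its `parents` mapping are hypothetical stand-ins for the graph's parent relationships.

```python
import datetime
from typing import Any, Dict, List


def new_entry_node(language_code: str, name: str, normalized_name: str) -> Dict[str, Any]:
    """Same fields as entry_node_data in create_entry_node."""
    return {
        "id": language_code + ":" + normalized_name,
        f"tags_{language_code}": [name],
        f"tags_ids_{language_code}": [normalized_name],
        "modified": datetime.datetime.now().timestamp(),  # new entries must be written out
        "is_external": False,
    }


def set_parents(entry: Dict[str, Any], parents: Dict[str, List[str]], new_parent_ids: List[str]) -> bool:
    """Hypothetical helper: update the entry's parents and bump "modified" only if they changed."""
    old_parent_ids = parents.get(entry["id"], [])
    if old_parent_ids == new_parent_ids:
        return False  # nothing changed, so the entry's text in the file can stay as-is
    parents[entry["id"]] = list(new_parent_ids)
    # The "< parent" lines live in the entry's own block, so the block has to be regenerated.
    entry["modified"] = datetime.datetime.now().timestamp()
    return True
```

Recording a timestamp rather than a boolean also lets later code tell which entries changed after a given point in time, if that is ever needed.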
backend/editor/controllers/project_controller.py

Lines changed: 43 additions & 0 deletions
@@ -4,6 +4,10 @@
 from .utils.result_utils import get_unique_record
 
 
+def get_project_id(branch_name: str, taxonomy_name: str) -> str:
+    return "p_" + taxonomy_name + "_" + branch_name
+
+
 async def get_project(project_id: str) -> Project:
     """
     Get project by id
@@ -78,3 +82,42 @@ async def delete_project(project_id: str):
     params = {"project_id": project_id}
     await get_current_transaction().run(query, params)
     await delete_project_nodes(project_id)
+
+
+async def clone_project(source_branch: str, taxonomy_name: str, target_branch: str):
+    """
+    Clone a project using a new branch name
+
+    Currently used for tests only.
+    """
+    source_id = get_project_id(source_branch, taxonomy_name)
+    target_id = get_project_id(target_branch, taxonomy_name)
+    # clone project node
+    query = """
+        MATCH (p:PROJECT {id: $project_id})
+        WITH p
+        CALL apoc.refactor.cloneNodes([p], true, ['id', 'branch'] )
+        YIELD output as new_node
+        WITH new_node
+        SET new_node.created_at = datetime(),
+            new_node.branch_name = $target_branch,
+            new_node.id = $target_id
+        RETURN new_node
+    """
+    params = {
+        "project_id": source_id,
+        "target_branch": target_branch,
+        "target_id": get_project_id(target_branch, taxonomy_name),
+    }
+    await get_current_transaction().run(query, params)
+    # clone nodes thanks to apoc.refactor.cloneSubgraph
+    query = f"""
+        MATCH (n:{source_id})
+        WITH collect(n) AS source_nodes
+        CALL apoc.refactor.cloneSubgraph(source_nodes)
+        YIELD output as new_node
+        WITH new_node
+        REMOVE new_node:{source_id}
+        SET new_node:{target_id}
+    """
+    await get_current_transaction().run(query)
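`clone_project` first copies the project node with `apoc.refactor.cloneNodes`, then passes every node carrying the source project label to `apoc.refactor.cloneSubgraph` (which, when no relationship list is given, also clones the relationships between those nodes) and finally swaps the project label on the copies. The pure-Python model below only illustrates that second step on an in-memory graph; the dict/tuple graph representation, the `clone_project_graph` helper, the `is_child_of` relationship name and the key scheme for copies are assumptions, not the editor's code.

```python
from typing import Any, Dict, List, Tuple

Graph = Dict[str, Dict[str, Any]]  # node key -> {"labels": set of labels, "props": dict}
Rel = Tuple[str, str, str]         # (from node key, relationship type, to node key)


def get_project_id(branch_name: str, taxonomy_name: str) -> str:
    # Same convention as the diff; the id is also used as a label on the project's nodes.
    return "p_" + taxonomy_name + "_" + branch_name


def clone_project_graph(
    nodes: Graph, rels: List[Rel], source_id: str, target_id: str
) -> Tuple[Graph, List[Rel]]:
    """Copy nodes labelled source_id, swap that label for target_id, and copy the
    relationships whose two ends were both copied (illustrative model only)."""
    mapping: Dict[str, str] = {}
    new_nodes: Graph = {}
    for key, node in nodes.items():
        if source_id in node["labels"]:
            new_key = f"{target_id}/{key}"  # hypothetical key scheme for the copies
            new_nodes[new_key] = {
                "labels": (node["labels"] - {source_id}) | {target_id},
                "props": dict(node["props"]),
            }
            mapping[key] = new_key
    # Relationships to nodes outside the source project are not copied.
    new_rels = [(mapping[a], t, mapping[b]) for a, t, b in rels if a in mapping and b in mapping]
    return new_nodes, new_rels
```

Using the project id as a node label is what makes this cheap in Neo4j: one `MATCH (n:{source_id})` selects the whole project, and relabelling the copies is enough to attach them to the new branch.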
