Add natural translation for DSL #574

Open
wants to merge 113 commits into
base: master
Changes from 14 commits
Commits
d1a544b
Made first version for translation using !natural_language
BrentBlanckaert Dec 11, 2024
d0cf7de
forgot to push actual file
BrentBlanckaert Dec 11, 2024
0334e1f
fixed linting
BrentBlanckaert Dec 11, 2024
c1114bc
fixed pyright issue
BrentBlanckaert Dec 11, 2024
c61b563
add test for unit-test
BrentBlanckaert Dec 11, 2024
145deae
Fixed some bugs and wrote another test for io
BrentBlanckaert Dec 12, 2024
e14c758
setup main
BrentBlanckaert Dec 12, 2024
65fb097
Made a small fix
BrentBlanckaert Dec 12, 2024
e941ef6
Tested an extra edge case
BrentBlanckaert Dec 13, 2024
eef397b
Cleaned up code and added extra cases.
BrentBlanckaert Dec 13, 2024
30bcdcc
Started on usage with translation table.
BrentBlanckaert Dec 13, 2024
4230003
Added support for translation-table in global scope, tab-scope and co…
BrentBlanckaert Dec 14, 2024
1ddef15
Cleaned up code and fixed pyright issue
BrentBlanckaert Dec 15, 2024
5dabc80
fixed tests and added more
BrentBlanckaert Dec 15, 2024
69a77d3
fixed some small issues
BrentBlanckaert Dec 15, 2024
eb62f92
made some small fixes
BrentBlanckaert Dec 17, 2024
6e258cf
wrote an extra test
BrentBlanckaert Dec 17, 2024
1db1db2
fix spelling mistake
BrentBlanckaert Dec 17, 2024
4dedd8c
fixed linting issue
BrentBlanckaert Dec 17, 2024
b9786c2
increasing test coverage
BrentBlanckaert Dec 17, 2024
c63e391
removed some redundant code
BrentBlanckaert Dec 19, 2024
a35c49a
Adding a few comments
BrentBlanckaert Dec 19, 2024
10b0eb3
Cleaned up code some more and added extra cases for input and output …
BrentBlanckaert Dec 28, 2024
3539227
Updated statement/expression case and added programmingLanguageMap fo…
BrentBlanckaert Dec 28, 2024
e1180f9
started adding new JSON schema
BrentBlanckaert Dec 28, 2024
466ec16
Made some changes to schema
BrentBlanckaert Dec 28, 2024
49079a0
fixed some bugs in the schema
BrentBlanckaert Dec 29, 2024
4174beb
fixed some bugs and fixed the tests
BrentBlanckaert Dec 29, 2024
1a79a82
fixed an edge case and made an extra test for it.
BrentBlanckaert Dec 29, 2024
81ce161
added the actual writing to a file.
BrentBlanckaert Dec 29, 2024
64a00cd
changed formatter to jinja
BrentBlanckaert Jan 4, 2025
88a2ede
small cleanup
BrentBlanckaert Jan 4, 2025
e6e548c
moved tests to new file
BrentBlanckaert Jan 31, 2025
95f8524
Small cleanup
BrentBlanckaert Jan 31, 2025
9ac66cb
fix isort
BrentBlanckaert Jan 31, 2025
b7012d3
got rid of usage of instanceof
BrentBlanckaert Feb 18, 2025
f630a05
fixed test
BrentBlanckaert Feb 19, 2025
a5d7f61
Made small variable name change
BrentBlanckaert Feb 19, 2025
1b9775d
rewrote the pre-processor
BrentBlanckaert Feb 22, 2025
fe62210
fixed tests
BrentBlanckaert Feb 22, 2025
dec7057
removed some prints
BrentBlanckaert Feb 22, 2025
94ea109
cleaned up the code some more
BrentBlanckaert Feb 23, 2025
c75d4a7
fixed linting issue and removed more redundant code.
BrentBlanckaert Feb 23, 2025
7dac7d6
Removed some checks that are no longer used
BrentBlanckaert Feb 23, 2025
86c6421
Added some comments
BrentBlanckaert Feb 23, 2025
e2d9d69
avoided use of instanceof as much as possible
BrentBlanckaert Feb 23, 2025
9a540a6
Fixed linting and typing
BrentBlanckaert Feb 23, 2025
c7d5787
removed an unused field.
BrentBlanckaert Feb 23, 2025
7681932
Fixed small bug
BrentBlanckaert Feb 24, 2025
820c36a
Changed a name
BrentBlanckaert Feb 24, 2025
eecc7b1
Merged all translations maps immediately
BrentBlanckaert Feb 27, 2025
14e16db
re-added validator
BrentBlanckaert Mar 4, 2025
885aa48
Fixed typing issues
BrentBlanckaert Mar 4, 2025
d06a55c
added test for error handling
BrentBlanckaert Mar 4, 2025
b7c8c34
added an extra test for syntax errors
BrentBlanckaert Mar 4, 2025
470e28c
fix linting
BrentBlanckaert Mar 4, 2025
94a766f
Added immediate link from preprocessor to tested.
BrentBlanckaert Mar 7, 2025
4629134
Fixed bug for a lot of tests
BrentBlanckaert Mar 7, 2025
f9d311d
Small cleanup
BrentBlanckaert Mar 7, 2025
f7c65cb
made a few more tests
BrentBlanckaert Mar 7, 2025
a539053
fixed linting issues
BrentBlanckaert Mar 7, 2025
c72190c
fixed small issue regarding lists of tabs
BrentBlanckaert Mar 12, 2025
0959919
removed field that serves no purpose anymore
BrentBlanckaert Mar 12, 2025
8aee618
fixed linting
BrentBlanckaert Mar 12, 2025
f668535
Found another edge case that wasn't covered
BrentBlanckaert Mar 12, 2025
b62bfc3
using pythonic code
BrentBlanckaert Mar 12, 2025
9f4a84f
Fixed a small bug with the nat_lang_indicators
BrentBlanckaert Mar 12, 2025
23d9b01
Added another check
BrentBlanckaert Mar 12, 2025
e78c518
Fixed linting
BrentBlanckaert Mar 12, 2025
3ed7491
forgot a tab
BrentBlanckaert Mar 12, 2025
2027194
fixed problem when no translations are used
BrentBlanckaert Mar 14, 2025
9055b90
small optimization
BrentBlanckaert Mar 14, 2025
3f20d10
Made attempt for stdout/stderr
BrentBlanckaert Mar 18, 2025
7952b2d
made new version for translations
BrentBlanckaert Mar 20, 2025
3e9f27a
used a different way to convert to yamlObject
BrentBlanckaert Mar 20, 2025
fc6dbc5
cleaned up code
BrentBlanckaert Mar 21, 2025
cbcb263
test why the tests don't work in GitHub
BrentBlanckaert Mar 21, 2025
16f5527
adding more prints
BrentBlanckaert Mar 21, 2025
f4c55d7
adding more prints
BrentBlanckaert Mar 21, 2025
d9c115b
trying something else
BrentBlanckaert Mar 21, 2025
77fbf13
revert back
BrentBlanckaert Mar 21, 2025
35cf415
added test and cleaned up some of the code
BrentBlanckaert Mar 21, 2025
d2fb379
added extra test for conversion to yamlObject
BrentBlanckaert Mar 21, 2025
9ff38b5
made an extra test
BrentBlanckaert Mar 21, 2025
361aa5c
made one more test
BrentBlanckaert Mar 21, 2025
b81e152
fix linting
BrentBlanckaert Mar 21, 2025
fc7258e
Major cutback of TESTed in translation
BrentBlanckaert Mar 25, 2025
18fbd02
forgot to push actual new file
BrentBlanckaert Mar 25, 2025
e86d48a
split up tests
BrentBlanckaert Mar 26, 2025
4289910
Fixed linting and typing issues
BrentBlanckaert Mar 26, 2025
33ba90f
Fixed ALL linting issues
BrentBlanckaert Mar 26, 2025
204f50a
applied all changes regarding jinja2
BrentBlanckaert Mar 27, 2025
9c83b4d
Added an extra test
BrentBlanckaert Mar 28, 2025
29b3a45
removed conversion to yamlObject
BrentBlanckaert Mar 31, 2025
4f0a0ca
remove unused import
BrentBlanckaert Mar 31, 2025
84fb41f
got rid of the !programming_language tag for this PR
BrentBlanckaert Mar 31, 2025
57fcc0e
Made an extra test
BrentBlanckaert Mar 31, 2025
c7cde3d
Fix linting
BrentBlanckaert Mar 31, 2025
79210fa
Give warning for each missing key
BrentBlanckaert Apr 3, 2025
d62862a
Merge branch 'master' of https://github.com/dodona-edu/universal-judge
BrentBlanckaert Apr 7, 2025
2c9bc06
added a bit of documentation and did some last changes to Json schema
BrentBlanckaert Apr 14, 2025
502f51d
removing redundant line
BrentBlanckaert Apr 14, 2025
52537f6
fixed linting
BrentBlanckaert Apr 14, 2025
511d6cf
cleaned up code some more and added asserts
BrentBlanckaert Apr 14, 2025
71c9ee5
merging with master
BrentBlanckaert Apr 20, 2025
c11df03
changing main
BrentBlanckaert Apr 23, 2025
f327e19
fixed a few issues
BrentBlanckaert Apr 25, 2025
5cd9e15
remove unused import
BrentBlanckaert Apr 25, 2025
1d40776
fixing typing issue
BrentBlanckaert Apr 25, 2025
3a97b15
made few changes
BrentBlanckaert Apr 28, 2025
bb3438e
removed default from arguments
BrentBlanckaert Apr 28, 2025
f09f925
changed some of the names
BrentBlanckaert May 2, 2025
46ea4cc
fixed linting
BrentBlanckaert May 2, 2025
24 changes: 23 additions & 1 deletion tested/dsl/translate_parser.py
@@ -88,9 +88,22 @@ class ReturnOracle(dict):
     pass


+class NaturalLanguageMap(dict):
+    pass
+
+
 OptionDict = dict[str, int | bool]
 YamlObject = (
-    YamlDict | list | bool | float | int | str | None | ExpressionString | ReturnOracle
+    YamlDict
+    | list
+    | bool
+    | float
+    | int
+    | str
+    | None
+    | ExpressionString
+    | ReturnOracle
+    | NaturalLanguageMap
 )


@@ -138,6 +151,14 @@ def _return_oracle(loader: yaml.Loader, node: yaml.Node) -> ReturnOracle:
     return ReturnOracle(result)


+def _natural_language_map(loader: yaml.Loader, node: yaml.Node) -> NaturalLanguageMap:
+    result = _parse_yaml_value(loader, node)
+    assert isinstance(
+        result, dict
+    ), f"A natural language map must be an object, got {result} which is a {type(result)}."
+    return NaturalLanguageMap(result)
+
+
 def _parse_yaml(yaml_stream: str) -> YamlObject:
     """
     Parse a string or stream to YAML.
@@ -148,6 +169,7 @@ def _parse_yaml(yaml_stream: str) -> YamlObject:
             yaml.add_constructor("!" + actual_type, _custom_type_constructors, loader)
     yaml.add_constructor("!expression", _expression_string, loader)
     yaml.add_constructor("!oracle", _return_oracle, loader)
+    yaml.add_constructor("!natural_language", _natural_language_map, loader)

     try:
         return yaml.load(yaml_stream, loader)
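
Reviewer note: a minimal, standalone sketch of how a `!natural_language` tag can be turned into a marker `dict` subclass with PyYAML. It is not the code from this PR: the PR delegates to `_parse_yaml_value` and uses TESTed's own loader, whereas this sketch assumes a private `SafeLoader` subclass (`SketchLoader`, a hypothetical name) and PyYAML's `construct_mapping`. The YAML document is invented for illustration.

```python
import yaml


class NaturalLanguageMap(dict):
    """Marker type for mappings tagged with !natural_language."""


def natural_language_map(loader: yaml.SafeLoader, node: yaml.Node) -> NaturalLanguageMap:
    # Simplification compared to the PR: only mapping nodes are accepted and
    # PyYAML itself builds the underlying dict.
    assert isinstance(
        node, yaml.MappingNode
    ), f"A natural language map must be an object, got a {type(node)}."
    return NaturalLanguageMap(loader.construct_mapping(node, deep=True))


class SketchLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so the tag is not registered globally."""


yaml.add_constructor("!natural_language", natural_language_map, SketchLoader)

# Invented example document: one key whose value differs per natural language.
document = """
description: !natural_language
  en: "Ten is ten"
  nl: "Tien is tien"
"""

parsed = yaml.load(document, SketchLoader)
chosen = parsed["description"]["nl"]
```

Because the tagged value arrives as a `NaturalLanguageMap` rather than a plain `dict`, later stages can tell "pick one entry per language" apart from ordinary mappings with a single `isinstance` check.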
328 changes: 328 additions & 0 deletions tested/nat_translation.py
@@ -0,0 +1,328 @@
import re
import sys
from typing import cast

import yaml

from tested.dsl.translate_parser import (
    ExpressionString,
    NaturalLanguageMap,
    ReturnOracle,
    YamlDict,
    YamlObject,
    _parse_yaml,
    _validate_dsl,
    _validate_testcase_combinations,
)


def parse_value(value: list | str | int | float | dict, flattened_stack: dict):
    if isinstance(value, str):
        return format_string(value, flattened_stack)
    elif isinstance(value, dict):
        return {k: parse_value(v, flattened_stack) for k, v in value.items()}
    elif isinstance(value, list):
        return [parse_value(v, flattened_stack) for v in value]

    return value


def get_replacement(language: str, translation_stack: list, match: re.Match) -> str:
    word = match.group(1)
    # Search from the innermost scope outwards for the first map defining the word.
    # Iterating (instead of indexing with a negative counter) avoids an IndexError
    # when the stack is empty or the word is not defined in any scope.
    for scope in reversed(translation_stack):
        if word in scope:
            translations = scope[word]
            assert language in translations
            return translations[language]

    # No scope defines the word, so it is returned untranslated.
    return word


def flatten_stack(translation_stack: list, language: str) -> dict:
    flattened = {}
    for d in translation_stack:
        flattened.update({k: v[language] for k, v in d.items() if language in v})
    return flattened


def format_string(string: str, flattened) -> str:
    return string.format(**flattened)


def translate_io(
    io_object: YamlObject, key: str, language: str, flat_stack: dict
) -> str | dict:
    if isinstance(io_object, NaturalLanguageMap):
        assert language in io_object
        io_object = io_object[language]
    elif isinstance(io_object, dict):
        data = io_object[key]
        if isinstance(data, dict):
            assert language in data
            data = data[language]
        assert isinstance(data, str)
        io_object[key] = format_string(data, flat_stack)

    # Perform translation based on the translation stack.
    if isinstance(io_object, str):
        return format_string(io_object, flat_stack)

    return io_object


def translate_testcase(
    testcase: YamlDict, language: str, translation_stack: list
) -> YamlDict:
    _validate_testcase_combinations(testcase)
    flat_stack = flatten_stack(translation_stack, language)

    key_to_set = "statement" if "statement" in testcase else "expression"
    if (expr_stmt := testcase.get(key_to_set)) is not None:
        # Must use !natural_language
        if isinstance(expr_stmt, NaturalLanguageMap):
            assert language in expr_stmt
            expr_stmt = expr_stmt[language]

        # Perform translation based on the translation stack.
        if isinstance(expr_stmt, dict):
            testcase[key_to_set] = {
                k: format_string(cast(str, v), flat_stack) for k, v in expr_stmt.items()
            }
        elif isinstance(expr_stmt, str):
            testcase[key_to_set] = format_string(expr_stmt, flat_stack)

    else:
        if (stdin_stmt := testcase.get("stdin")) is not None:
            if isinstance(stdin_stmt, dict):
                assert language in stdin_stmt
                stdin_stmt = stdin_stmt[language]

            # Perform translation based on the translation stack.
            assert isinstance(stdin_stmt, str)
            testcase["stdin"] = format_string(stdin_stmt, flat_stack)

        arguments = testcase.get("arguments", [])
        if isinstance(arguments, dict):
            assert language in arguments
            arguments = arguments[language]

        # Perform translation based on the translation stack.
        assert isinstance(arguments, list)
        testcase["arguments"] = [
            format_string(str(arg), flat_stack) for arg in arguments
        ]

    if (stdout := testcase.get("stdout")) is not None:
        # Must use !natural_language
        testcase["stdout"] = translate_io(stdout, "data", language, flat_stack)

    if (file := testcase.get("file")) is not None:
        # Must use !natural_language
        if isinstance(file, NaturalLanguageMap):
            assert language in file
            testcase["file"] = file[language]
        # TODO: SHOULD I ADD SUPPORT FOR TRANSLATION STACK HERE?
    if (stderr := testcase.get("stderr")) is not None:
        testcase["stderr"] = translate_io(stderr, "data", language, flat_stack)

    if (exception := testcase.get("exception")) is not None:
        testcase["exception"] = translate_io(exception, "message", language, flat_stack)

    if (result := testcase.get("return")) is not None:
        if isinstance(result, ReturnOracle):
            arguments = result.get("arguments", [])
            if isinstance(arguments, dict):
                assert language in arguments
                arguments = arguments[language]

            # Perform translation based on the translation stack.
            result["arguments"] = [
                format_string(str(arg), flat_stack) for arg in arguments
            ]

            value = result.get("value")
            # Must use !natural_language
            if isinstance(value, NaturalLanguageMap):
                assert language in value
                value = value[language]

            assert isinstance(value, str)
            result["value"] = parse_value(value, flat_stack)

        elif isinstance(result, NaturalLanguageMap):
            # Must use !natural_language
            assert language in result
            result = result[language]

        if isinstance(result, str):
            result = parse_value(result, flat_stack)

        testcase["return"] = result

    if (description := testcase.get("description")) is not None:
        # Must use !natural_language
        if isinstance(description, NaturalLanguageMap):
            assert language in description
            description = description[language]

        if isinstance(description, dict):
            dd = description["description"]
            if isinstance(dd, dict):
                assert language in dd
                dd = dd[language]

            if isinstance(dd, str):
                description["description"] = format_string(dd, flat_stack)

        testcase["description"] = description

    return testcase


def translate_testcases(
    testcases: list, language: str, translation_stack: list
) -> list:
    result = []
    for testcase in testcases:
        assert isinstance(testcase, dict)
        result.append(translate_testcase(testcase, language, translation_stack))

    return result


def translate_contexts(contexts: list, language: str, translation_stack: list) -> list:
    result = []
    for context in contexts:
        assert isinstance(context, dict)
        if "translation" in context:
            translation_stack.append(context["translation"])

        key_to_set = "script" if "script" in context else "testcases"
        raw_testcases = context.get(key_to_set)
        assert isinstance(raw_testcases, list)
        context[key_to_set] = translate_testcases(
            raw_testcases, language, translation_stack
        )
        if "files" in context:
            files = context.get("files")
            if isinstance(files, NaturalLanguageMap):
                assert language in files
                context["files"] = files[language]
        result.append(context)
        if "translation" in context:
            translation_stack.pop()
            context.pop("translation")

    return result


def translate_tab(tab: YamlDict, language: str, translation_stack: list) -> YamlDict:
    key_to_set = "unit" if "unit" in tab else "tab"
    name = tab.get(key_to_set)

    if isinstance(name, dict):
        assert language in name
        name = name[language]

    assert isinstance(name, str)
    tab[key_to_set] = format_string(name, flatten_stack(translation_stack, language))

    # The tab can have testcases or contexts.
    if "contexts" in tab:
        assert isinstance(tab["contexts"], list)
        tab["contexts"] = translate_contexts(
            tab["contexts"], language, translation_stack
        )
    elif "cases" in tab:
        assert "unit" in tab
        # We have testcases N.S. / contexts O.S.
        assert isinstance(tab["cases"], list)
        tab["cases"] = translate_contexts(tab["cases"], language, translation_stack)
    elif "testcases" in tab:
        # We have scripts N.S. / testcases O.S.
        assert "tab" in tab
        assert isinstance(tab["testcases"], list)
        tab["testcases"] = translate_testcases(
            tab["testcases"], language, translation_stack
        )
    else:
        assert "scripts" in tab
        assert isinstance(tab["scripts"], list)
        tab["scripts"] = translate_testcases(
            tab["scripts"], language, translation_stack
        )
    return tab


def translate_tabs(dsl_list: list, language: str, translation_stack=None) -> list:
    if translation_stack is None:
        translation_stack = []

    result = []
    for tab in dsl_list:
        assert isinstance(tab, dict)

        if "translation" in tab:
            translation_stack.append(tab["translation"])

        result.append(translate_tab(tab, language, translation_stack))
        if "translation" in tab:
            translation_stack.pop()
            tab.pop("translation")

    return result


def translate_dsl(dsl_object: YamlObject, language: str) -> YamlObject:
    if isinstance(dsl_object, list):
        return translate_tabs(dsl_object, language)
    else:
        assert isinstance(dsl_object, dict)
        key_to_set = "units" if "units" in dsl_object else "tabs"
        tab_list = dsl_object.get(key_to_set)
        assert isinstance(tab_list, list)
        translation_stack = []
        if "translation" in dsl_object:
            translation_stack.append(dsl_object["translation"])
            dsl_object.pop("translation")
        dsl_object[key_to_set] = translate_tabs(tab_list, language, translation_stack)
        return dsl_object


def parse_yaml(yaml_path: str) -> YamlObject:
    with open(yaml_path, "r") as stream:
        result = _parse_yaml(stream.read())

    return result


def convert_to_yaml(yaml_object: YamlObject) -> str:
    def oracle_representer(dumper, data):
        return dumper.represent_mapping("!oracle", data)

    def expression_representer(dumper, data):
        return dumper.represent_scalar("!expression", data)

    # Register the representers for the ReturnOracle and ExpressionString objects.
    yaml.add_representer(ReturnOracle, oracle_representer)
    yaml.add_representer(ExpressionString, expression_representer)
    return yaml.dump(yaml_object, sort_keys=False)


if __name__ == "__main__":
    n = len(sys.argv)
    # sys.argv[0] is the script itself, so two arguments means at least three entries.
    assert n > 2, "Expected at least two arguments (path to YAML file and language)."

    path = sys.argv[1]
    lang = sys.argv[2]
    new_yaml = parse_yaml(path)
    translated_dsl = translate_dsl(new_yaml, lang)
    yaml_string = convert_to_yaml(translated_dsl)
    print(yaml_string)
    _validate_dsl(_parse_yaml(yaml_string))
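
Reviewer note: a minimal sketch of the scoping behaviour that `flatten_stack` and `format_string` appear to implement in this diff. Translation maps are pushed per scope (global, tab, context), flattened for one language with inner scopes overriding outer ones, and then substituted with `str.format`. The translation data and template strings below are invented for illustration; only the flattening logic mirrors the diff.

```python
def flatten_stack(translation_stack: list[dict], language: str) -> dict[str, str]:
    """Merge all scopes for one language; later (inner) scopes win."""
    flattened: dict[str, str] = {}
    for scope in translation_stack:
        flattened.update(
            {key: options[language] for key, options in scope.items() if language in options}
        )
    return flattened


# Hypothetical stack: a global scope followed by a tab-level scope.
stack = [
    {
        "result": {"en": "result", "nl": "resultaat"},
        "select": {"en": "select", "nl": "selecteer"},
    },
    {"result": {"en": "outcome", "nl": "uitkomst"}},
]

flat = flatten_stack(stack, "nl")

# The tab-level "result" overrides the global one; "select" falls through.
translated = "{select}('{result}')".format(**flat)

# Literal braces must be doubled, because str.format treats single braces
# as placeholders.
escaped = "{{{select}}}".format(**flat)
```

One consequence of relying on `str.format` is that test-suite strings containing literal braces (for example set or dict literals in an expression) have to be escaped as `{{` and `}}`, and a placeholder without a translation for the chosen language raises `KeyError` in this sketch.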