This repo contains the code for the IST-Unbabel 2021 submission to the Explainable Quality Estimation Shared Task.
Preprocess the entire MLQE-PE dataset into the shared task format and download/preprocess the test set:
mkdir data
git clone https://github.com/sheffieldnlp/mlqe-pe
bash preprocess_mlqepe.sh
python3 preprocess_mlqepe.py --input-dir mlqe-pe/data/ --output-dir data/
bash download_and_preprocess_test_data.sh
rm -rf mlqe-pe
Install the dependencies and the package:
pip install -r requirements.txt
pip install -e .
To train a model, pass a config file via -f:
python3 cli.py train -f configs/xlmr-adapters-shared-task-mlqepe-all-all.yaml
See more config files in the config/ folder. PyTorch models can be found in the model/ folder.
To evaluate sentence-level models:
python3 scripts/evaluate_sentence_level.py train --testset data/ro-en/dev --checkpoint path/to/model.ckpt
For word-level models:
python3 scripts/evaluate_word_level.py train --testset data/ro-en/dev --checkpoint path/to/model.ckpt
For baseline explainers (gradient, leave-one-out, etc.), use explain.py. For example:
python3 scripts/explain.py \
--testset data/ro-en/dev \
--checkpoint path/to/model.ckpt \
--explainer ig \
--save experiments/explanations/roen_ig/ \
--batch-size 1
For extracting attention, use explain_attn.py. For example:
python3 scripts/explain_attn.py \
--testset data/ro-en/dev \
--checkpoint path/to/model.ckpt \
--save experiments/explanations/roen_attn \
--batch-size 1
Several folders will be created, each with a name prefixed by the path passed via the --save flag:
- one for the entire model (average layers via scalar mix)
- one for each layer (average heads)
- one for each head (average the "rows" in the attention map)
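The three aggregation levels listed above can be illustrated with a minimal pure-Python sketch. It is not the repository's implementation: the function names, the nested-list layout (layers x heads x rows x columns), and the mixing weights are assumptions made for illustration, and the scalar mix is simplified to a normalized weighted sum over layers.

```python
from typing import List

Matrix = List[List[float]]  # one attention map: rows are queries, columns are keys


def average_rows(attn: Matrix) -> List[float]:
    """Per-head scores: average the rows of the attention map, one score per column."""
    n_rows = len(attn)
    n_cols = len(attn[0])
    return [sum(row[j] for row in attn) / n_rows for j in range(n_cols)]


def average_vectors(vectors: List[List[float]]) -> List[float]:
    """Per-layer scores: element-wise mean of the per-head score vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def scalar_mix(layer_scores: List[List[float]], weights: List[float]) -> List[float]:
    """Model-level scores: weighted sum over layers (weights are normalized to sum to 1)."""
    total = sum(weights)
    return [
        sum(w / total * scores[j] for w, scores in zip(weights, layer_scores))
        for j in range(len(layer_scores[0]))
    ]


# Toy input (hypothetical): 2 layers x 2 heads, each head a 2x2 attention map.
attention = [
    [[[0.5, 0.5], [0.1, 0.9]], [[1.0, 0.0], [0.5, 0.5]]],
    [[[0.2, 0.8], [0.2, 0.8]], [[0.4, 0.6], [0.6, 0.4]]],
]

head_scores = [[average_rows(head) for head in layer] for layer in attention]
layer_scores = [average_vectors(heads) for heads in head_scores]
model_scores = scalar_mix(layer_scores, weights=[0.5, 0.5])

print("first head:", head_scores[0][0])
print("first layer:", layer_scores[0])
print("entire model:", model_scores)
```

In a trained model the mixing weights would be learned rather than fixed; the equal weights here only keep the example easy to follow.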
Moreover, to get explanations in terms of attention * norm(values), add these flags:
--norm-attention \
--norm-strategy weighted_norm
We also provide scripts for extracting explanations with other methods, e.g., DiffMask and Attention Flow/Rollout.
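As a rough illustration of the attention * norm(values) idea mentioned above, the sketch below scales each attention weight by the L2 norm of the value vector it attends to. The function names and the toy numbers are hypothetical, and the sketch does not claim to reproduce what the weighted_norm strategy computes in this repository.

```python
import math
from typing import List


def value_norms(values: List[List[float]]) -> List[float]:
    """L2 norm of each value vector (one vector per attended token)."""
    return [math.sqrt(sum(x * x for x in vector)) for vector in values]


def norm_weighted_attention(
    attn: List[List[float]], values: List[List[float]]
) -> List[List[float]]:
    """Scale attn[i][j] by the norm of value vector j."""
    norms = value_norms(values)
    return [[a * n for a, n in zip(row, norms)] for row in attn]


# Toy input (hypothetical): 2 query tokens attending over 2 key tokens.
attn = [[0.5, 0.5], [0.1, 0.9]]
values = [[3.0, 4.0], [0.0, 1.0]]  # norms are 5.0 and 1.0

weighted = norm_weighted_attention(attn, values)
print(weighted)
```

In the first row both tokens receive the same attention weight, yet the first token ends up with the larger score because its value vector has the larger norm. This is the effect that motivates weighting attention by value norms.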
To evaluate explanations, use the script evaluate_explanations.py. For example:
python3 scripts/evaluate_explanations.py \
--gold_sentence_scores_fname data/et-en/dev.da \
--gold_explanations_fname_mt data/et-en/dev.tgt-tags \
--gold_explanations_fname_src data/et-en/dev.src-tags \
--model_sentence_scores_fname experiments/explanations/eten_attn_head_18_3/sentence_scores.txt \
--model_explanations_fname_mt experiments/explanations/eten_attn_head_18_3/mt_scores.txt \
--model_explanations_fname_src experiments/explanations/eten_attn_head_18_3/source_scores.txt \
--model_fp_mask_mt experiments/explanations/eten_attn_head_18_3/mt_fp_mask.txt \
--model_fp_mask_src experiments/explanations/eten_attn_head_18_3/source_fp_mask.txt \
--reduction sum \
--transform none
The --reduction flag specifies how to aggregate word-piece scores: none, first, sum, mean, max.
The flag --transform can be:
- pre: apply sigmoid(abs(.)) element-wise to each score BEFORE aggregating word pieces
- pos: apply sigmoid(abs(.)) element-wise to each score AFTER aggregating word pieces
- none: do not apply any transformation
This transformation can be useful for explainers that may return negative values. The sigmoid itself does not affect the results, since it is monotonic and the metrics are based on ranking. It is nonetheless useful to have scores between 0 and 1 for thresholding, e.g., to compute accuracy.
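The interplay of --reduction and --transform described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code in the scripts: the function names are hypothetical, and it assumes that the "fp mask" marks the first word piece of each word with 1 and every continuation piece with 0.

```python
import math
from typing import List


def sigmoid_abs(x: float) -> float:
    """sigmoid(abs(x)): maps any score into the interval [0.5, 1)."""
    return 1.0 / (1.0 + math.exp(-abs(x)))


def group_by_first_piece(scores: List[float], fp_mask: List[int]) -> List[List[float]]:
    """Group word-piece scores into words; a 1 in fp_mask starts a new word (assumption)."""
    groups: List[List[float]] = []
    for score, is_first in zip(scores, fp_mask):
        if is_first or not groups:
            groups.append([score])
        else:
            groups[-1].append(score)
    return groups


REDUCERS = {
    "first": lambda xs: xs[0],
    "sum": sum,
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
}


def aggregate(scores, fp_mask, reduction="sum", transform="none"):
    """Apply the optional transform before ('pre') or after ('pos') the reduction."""
    if reduction != "none" and reduction not in REDUCERS:
        raise ValueError(f"unknown reduction: {reduction}")
    if transform == "pre":
        scores = [sigmoid_abs(s) for s in scores]
    if reduction == "none":
        out = list(scores)  # keep one score per word piece
    else:
        out = [REDUCERS[reduction](g) for g in group_by_first_piece(scores, fp_mask)]
    if transform == "pos":
        out = [sigmoid_abs(s) for s in out]
    return out


def ranking(xs: List[float]) -> List[int]:
    """Indices of xs sorted from lowest to highest score."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


# Toy input (hypothetical): four word pieces forming three words.
piece_scores = [0.2, -1.0, 0.5, 0.3]
fp_mask = [1, 0, 1, 1]

print(aggregate(piece_scores, fp_mask, reduction="sum", transform="none"))
print(aggregate(piece_scores, fp_mask, reduction="sum", transform="pos"))

# The sigmoid is monotonic, so it leaves the ranking of abs(score) unchanged.
abs_scores = [abs(s) for s in piece_scores]
assert ranking(abs_scores) == ranking([sigmoid_abs(s) for s in piece_scores])
```

Note that the abs(.) part can change the ranking when scores are negative; only the sigmoid step is rank-preserving.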
Aggregate subword units:
python3 scripts/aggregate_explanations.py \
--model_explanations_dname experiments/explanations/roen_ig/ \
--reduction sum \
--transform none
Create a metadata.txt file and zip all files. The following script does all of this:
python3 scripts/prepare_submission.py \
--explainer experiments/explanations/roen_ig/ \
--save submission.zip \
--team "Team Name" \
--track "constrained" \
--desc "Simple description of the model + explainer."
A file called submission.zip will be created in the working directory, containing the ig explanations for ro-en.
@inproceedings{treviso-etal-2021-ist,
title = "{IST}-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task",
author = "Treviso, Marcos and
Guerreiro, Nuno M. and
Rei, Ricardo and
Martins, Andr{\'e} F. T.",
booktitle = "Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.eval4nlp-1.14",
pages = "133--145",
}
License: MIT.