Code for the paper "Boosting Document-Level Relation Extraction by Mining and Injecting Logical Rules", accepted to the EMNLP 2022 main conference.
For simplicity, we only supply the code for MILR with the strong backbone ATLOP, evaluated on the DWIE dataset. Applying MILR to other backbones and datasets is similar. This code is adapted from the repository of ATLOP; thanks for their excellent work.
In addition, we provide the predictions used for the analysis in our paper, as well as the mined rules whose confidence exceeds the threshold minC.
- Python (tested on 3.6)
- apex==0.9.10dev
- cvxpy==1.1.18
- dill==0.3.4
- gurobipy==9.5.1 (note that installation via pip may not work; please request an evaluation license or a free academic license of Gurobi; more instructions can be found in link)
- matplotlib==3.3.1
- numpy==1.19.2
- opt_einsum==3.3.0
- pandas==1.1.3
- scipy==1.2.0
- tqdm==4.50.0
- transformers==3.4.0
- ujson==4.0.2
- wandb==0.10.32
- torch==1.6.0
We also export enviroment.yaml and requirements.txt.
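The exported files can be used to set up the environment directly. A minimal sketch, assuming conda (or pip) is installed and the commands are run from the repository root:

```shell
# Recreate the conda environment from the exported file
conda env create -f enviroment.yaml

# Or install the pinned packages into an existing Python 3.6 environment
pip install -r requirements.txt
```

Remember that gurobipy additionally needs a valid license, as noted above.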
The training and development sets of the DocRED dataset can be downloaded at link, and the test set used in MILR can be downloaded at link. The DWIE dataset can be obtained by following the instructions in LogiRE. We also uploaded the processed datasets to the EMNLP 2022 START Conference Manager.
The expected structure of files is:
```
ATLOP+MILR
|-- dataset_dwie
| |-- train_annotated.json
| |-- dev.json
| |-- test.json
| |-- meta
| | |-- ner2id.json
| | |-- rel2id.json
| | |-- vec.npy
| | |-- word2id.json
|-- dataset_docred
| |-- train_annotated.json
| |-- dev.json
| |-- test.json
| |-- rel_info.json
| |-- meta
| | |-- ner2id.json
| | |-- rel2id.json
| | |-- vec.npy
| | |-- word2id.json
| | |-- char_vec.npy
| | |-- char2id.json
```
Download BERT-base-uncased at link and put the downloaded files into ./PLM/bert-base-uncased. The expected structure of files is:
```
ATLOP+MILR
|-- PLM
| |-- bert-base-uncased
| | |-- config.json
| | |-- pytorch_model.bin
| | |-- vocab.txt
```
We supply the rules mined on DWIE and DocRED in ./mined_rules. The structure of files is:
```
ATLOP+MILR
|-- mined_rules
| |-- rule_docred.txt
| |-- rule_dwie.txt
```
Examples are as follows:
['in1', 'in0'] -> in0 : 1.0 means in0(h,t) ← in1(h,z) ⋀ in0(z,t), whose confidence is 1.0.
['anti_based_in2', 'based_in0'] -> in0 : 1.0 means in0(h,t) ← based_in2(z,h) ⋀ based_in0(z,t), whose confidence is 1.0. The prefix anti_ marks a body relation with its arguments inverted.
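These lines can be parsed mechanically. A minimal sketch, assuming exactly the format shown above (the helper is ours, not part of the released code):

```python
import ast


def parse_rule(line):
    """Parse one mined-rule line, e.g.
        ['in1', 'in0'] -> in0 : 1.0
    into (body_relations, head_relation, confidence). A body relation
    prefixed with anti_ denotes the inverse, i.e. anti_r(h,z) = r(z,h)."""
    body_str, rest = line.split("->")
    head, conf = rest.rsplit(":", 1)
    return ast.literal_eval(body_str.strip()), head.strip(), float(conf)
```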
We supply predictions produced by ATLOP, ATLOP+LogiRE, and ATLOP+MILR. The structure of files is:
```
ATLOP+MILR
|-- results_for_dwie
| |-- result_ATLOP_dev.json
| |-- result_ATLOP_test.json
| |-- result_LogiRE_test.json
| |-- result_MILR_dev.json
| |-- result_MILR_test.json
|-- results_for_docred
| |-- result_ATLOP_test.json
| |-- result_MILR_test.json
```
We supply trained ATLOP and ATLOP+MILR models on the DWIE dataset at link and link, respectively. Please download the trained models and put them into ./trained_model/. The expected structure of files is:
```
ATLOP+MILR
|-- trained_model
| |-- model_ATLOP_DWIE.pth
| |-- model_MILR_DWIE.pth
```
We also provide log samples in ./logs/. These samples cover the training and inference of ATLOP+MILR and the inference of ATLOP.
>> sh scripts/MILR_train_DWIE.sh # for training; if the trained model has been downloaded, this step can be skipped

The classification loss, consistency regularization loss, total loss, and evaluation results on the dev set are synced to the wandb dashboard.
>> sh scripts/MILR_evaluate_DWIE.sh # for inference

The program will generate a test file ./results_for_dwie/result_MILR.json in the official evaluation format. In addition, the log with the evaluation results will be dumped to ./logs/MILR_DWIE_evaluation.out.
Attention: There may be a bug when wandb syncs to the cloud. If this happens, try running wandb offline in the terminal. More instructions can be found in link.
>> sh scripts/ATLOP_evaluate_DWIE.sh # for inference

The program will generate a test file ./results_for_dwie/result_ATLOP.json in the official evaluation format. In addition, the log with the evaluation results will be dumped to ./logs/ATLOP_DWIE_evaluation.out.