
MILR

Code for the paper "Boosting Document-Level Relation Extraction by Mining and Injecting Logical Rules", accepted to the EMNLP 2022 main conference.

For simplicity, we only supply the code with the strong backbone ATLOP, tested on the DWIE dataset; MILR with other backbones and datasets is similar. This code is adapted from the repository of ATLOP. Thanks for their excellent work.

In addition, we provide the predictions used for analysis in our paper, as well as the mined rules whose confidence exceeds the threshold minC.

Requirements

  • Python (tested on 3.6)

  • apex==0.9.10dev

  • cvxpy==1.1.18

  • dill==0.3.4

  • gurobipy==9.5.1 (Note that installation via pip may not work. Please request an evaluation license or a free academic license of Gurobi. More instructions can be found in link.)

  • matplotlib==3.3.1

  • numpy==1.19.2

  • opt_einsum==3.3.0

  • pandas==1.1.3

  • scipy==1.2.0

  • tqdm==4.50.0

  • transformers==3.4.0

  • ujson==4.0.2

  • wandb==0.10.32

  • torch==1.6.0

    We also provide the exported enviroment.yaml and requirements.txt.

Dataset

The training and development sets of the DocRED dataset can be downloaded at link, and the test set used in MILR can be downloaded at link. The DWIE dataset can be obtained by following the instructions in LogiRE. We also uploaded the processed datasets to the EMNLP 2022 START Conference Manager.

The expected structure of files is:

```
ATLOP+MILR
 |-- dataset_dwie
 |    |-- train_annotated.json        
 |    |-- dev.json
 |    |-- test.json
 |    |-- meta
 |    |    |-- ner2id.json        
 |    |    |-- rel2id.json
 |    |    |-- vec.npy
 |    |    |-- word2id.json
 |-- dataset_docred
 |    |-- train_annotated.json        
 |    |-- dev.json
 |    |-- test.json
 |    |-- rel_info.json
 |    |-- meta
 |    |    |-- ner2id.json        
 |    |    |-- rel2id.json
 |    |    |-- vec.npy
 |    |    |-- word2id.json
 |    |    |-- char_vec.npy
 |    |    |-- char2id.json
```
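Before training, it can save time to check that the layout above is in place. A minimal sketch (the helper below is ours, not part of the repository) that reports missing files for the DWIE layout:

```python
from pathlib import Path

# Expected files under dataset_dwie/, mirroring the tree above.
EXPECTED_DWIE = [
    "train_annotated.json", "dev.json", "test.json",
    "meta/ner2id.json", "meta/rel2id.json",
    "meta/vec.npy", "meta/word2id.json",
]

def missing_files(root, expected):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]

missing = missing_files("dataset_dwie", EXPECTED_DWIE)
if missing:
    print("Missing dataset files:", missing)
```

The same check can be run for `dataset_docred` with the corresponding file list.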

Pre-Trained Language Model

Download BERT-base-uncased at link and put the downloaded files into ./PLM/bert-base-uncased. The expected structure of files is:

```
ATLOP+MILR
 |-- PLM
 |    |-- bert-base-uncased
 |    |    |-- config.json        
 |    |    |-- pytorch_model.bin
 |    |    |-- vocab.txt
```

Mined Rules

We supply the mined rules on DWIE and DocRED in ./mined_rules. The structure of files is:

```
ATLOP+MILR
 |-- mined_rules
 |    |-- rule_docred.txt
 |    |-- rule_dwie.txt
```

Examples are as follows:

```
['in1', 'in0'] -> in0 : 1.0
['anti_based_in2', 'based_in0'] -> in0 : 1.0
```

The first line means in0(h,t) ← in1(h,z) ⋀ in0(z,t), with confidence 1.0. The second means in0(h,t) ← based_in2(z,h) ⋀ based_in0(z,t), with confidence 1.0; the anti_ prefix swaps the arguments of the body relation.
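Each line thus encodes a two-atom Horn rule together with its confidence. A minimal sketch (these helpers are ours, not part of the repository) that parses such a line and forward-applies the rule to a set of known triples; the entity names are made up for illustration:

```python
import ast

def parse_rule(line):
    """Parse a mined-rule line like "['in1', 'in0'] -> in0 : 1.0"."""
    lhs, rhs = line.split("->")
    body = ast.literal_eval(lhs.strip())   # list of body relation names
    head, conf = rhs.split(":")
    return body, head.strip(), float(conf)

def apply_rule(body, head, triples):
    """Forward-chain a rule head(h,t) <- body[0](h,z) AND body[1](z,t).

    A body relation prefixed with 'anti_' matches with its arguments swapped,
    as in the second example rule above.
    """
    def match(rel, facts):
        if rel.startswith("anti_"):
            return {(t, h) for r, h, t in facts if r == rel[len("anti_"):]}
        return {(h, t) for r, h, t in facts if r == rel}
    first = match(body[0], triples)
    second = match(body[1], triples)
    return {(head, h, t) for h, z1 in first for z2, t in second if z1 == z2}

body, head, conf = parse_rule("['in1', 'in0'] -> in0 : 1.0")
triples = {("in1", "Ghent", "East Flanders"), ("in0", "East Flanders", "Belgium")}
new = apply_rule(body, head, triples)  # infers in0(Ghent, Belgium)
```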

Predictions Produced by Trained Models

We supply predictions produced by ATLOP, ATLOP+LogiRE, and ATLOP+MILR. The structure of files is:

```
ATLOP+MILR
 |-- results_for_dwie
 |    |-- result_ATLOP_dev.json
 |    |-- result_ATLOP_test.json
 |    |-- result_LogiRE_test.json
 |    |-- result_MILR_dev.json
 |    |-- result_MILR_test.json
 |-- results_for_docred
 |    |-- result_ATLOP_test.json
 |    |-- result_MILR_test.json
```
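For analysis, such prediction files can be loaded and compared pairwise. A small sketch, assuming the official DocRED-style submission format (a JSON list of records with "title", "h_idx", "t_idx", and "r" fields; the helper names are ours):

```python
import json

def load_predictions(path):
    """Load a predictions file into a set of (title, h_idx, t_idx, r) tuples.

    Assumes the official DocRED-style format: a JSON list of dicts with
    "title", "h_idx", "t_idx", and "r" keys.
    """
    with open(path) as f:
        return {(p["title"], p["h_idx"], p["t_idx"], p["r"]) for p in json.load(f)}

def f1_score(pred, gold):
    """Micro F1 over exact (title, h_idx, t_idx, r) matches."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, `f1_score(load_predictions("results_for_dwie/result_MILR_test.json"), gold)` scores one file against a gold set loaded the same way.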

Trained Models

We supply trained ATLOP and ATLOP+MILR models on the DWIE dataset at link and link, respectively. Please download the trained models and put them into ./trained_model/. The expected structure of files is:

```
ATLOP+MILR
 |-- trained_model
 |    |-- model_ATLOP_DWIE.pth
 |    |-- model_MILR_DWIE.pth
```

Log Samples

We also provide log samples in ./logs/. These samples involve the training and inference of ATLOP+MILR and the inference of ATLOP.

Training and Evaluation of ATLOP+MILR

>> sh scripts/MILR_train_DWIE.sh  # for training; skip this step if the trained model has been downloaded

The classification loss, consistency regularization loss, total loss, and evaluation results on the dev set are synced to the wandb dashboard.
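The exact form of the consistency regularization loss is defined in the paper, not in this README. As a purely illustrative sketch, a mined rule head(h,t) ← b1(h,z) ⋀ b2(z,t) with confidence c can be softened into a hinge penalty that fires when the predicted head probability falls below the confidence-weighted conjunction of the body probabilities (product t-norm here; this is our simplification, not necessarily the paper's formulation):

```python
def rule_penalty(p_head, p_body, confidence):
    """Illustrative hinge-style consistency penalty (not the paper's exact loss).

    The rule is treated as violated when the head probability is lower than
    the confidence-weighted product t-norm of its body-atom probabilities.
    """
    conjunction = 1.0
    for p in p_body:
        conjunction *= p
    return max(0.0, confidence * conjunction - p_head)

# Strongly predicted body atoms with a weakly predicted head incur a penalty:
penalty = rule_penalty(p_head=0.2, p_body=[0.9, 0.95], confidence=1.0)
```

Summing such penalties over mined rules and adding the total to the classification loss is one way a rule-consistency term can be combined with the base objective.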

>> sh scripts/MILR_evaluate_DWIE.sh  # for inference

The program will generate a test file ./results_for_dwie/result_MILR.json in the official evaluation format. In addition, a log of the evaluation results will be written to ./logs/MILR_DWIE_evaluation.out.

Attention: wandb may fail to sync to the cloud. If this happens, run wandb offline in the terminal. More instructions can be found at link.

Evaluation of ATLOP

>> sh scripts/ATLOP_evaluate_DWIE.sh  # for inference 

The program will generate a test file ./results_for_dwie/result_ATLOP.json in the official evaluation format. In addition, a log of the evaluation results will be written to ./logs/ATLOP_DWIE_evaluation.out.
