This repository contains the code for reproducing the results reported in the paper "Zero-Shot Information Extraction to Enhancea Knowledge Graph Describing Silk Textiles" at the LaTeCH-CLfL 2021 workshop co-located with EMNLP 2021.
install -r requirements.txt
or
conda install --file requirements.txt
First, you need to download two language-specific sub-graphs (English and Spanish) based on the ConceptNet Knowledge Graph hosted on Zenodo. The code in the notebook (see 4. Notebooks) expects them to be in a folder named "neighborhoods", but this can be changed.
2. Queries
Query the SILKNOW Knowledge graph on https://data.silknow.org/sparql by copy-pasting the content of thes SPARQL file in the folder "queries". Set "Results Format" to "CSV" before clicking on "Exectute Query" for each query.
The files are named after language and property type, for example English and material: en_material.sparql. The queries can be adjusted, but they are set up as in the paper, which means that per file only records of specific museums and properties get exported from the SILKNOW Knowledge Graph. The property values are based on concept URIs from the SILKNOW Thesaurus. "http://data.silknow.org/vocabulary/627" stands for "Gold thread" e.g.
The resulting CSVs have several columns: "obj" for the object URI, "museum" for the museum URI, "text" for the textual description and a last one for the property group, which corresponds to the class label.
3. Scripts
For each language and property combination of the queries there is a preprocessing python script in the folder "scripts" that needs to be run for every query output respectively. If your file names are different, adjust them inside the code.
These scripts do some basic formatting operations and make sure that one row represents one museum object. If you want to perform a test/train split it is recommended to do it after this step.
4. Notebooks
Run the notebooks for each property / language combination respectively. The notebooks contain all relevant code and show the results at the bottom. Each one of them also produces another CSV file with the respective predictions.
@inproceedings{schleider-troncy-2021-zero,
title = "Zero-Shot Information Extraction to Enhance a Knowledge Graph Describing Silk Textiles",
author = "Schleider, Thomas and
Troncy, Raphael",
booktitle = "Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic (online)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.latechclfl-1.16",
pages = "138--146",
}