Georgetown University Multilayer Corpus for NER | Entity Extraction

Description

The GUM corpus was collected and annotated at Georgetown University.

For more information, see the LICENSE and the following publication: Zeldes, Amir (2016) "The GUM Corpus: Creating Multilayer Resources in the Classroom". Language Resources and Evaluation.

This version of the corpus has been modified from the original to focus on NER over a set of 22 classes, and it has been converted to an IOB2/CoNLL-like format.

Entries

train: 44111 entries | test: 18236 entries

URL

https://github.com/amir-zeldes/gum

File Format

text - CoNLL - IOB2

Column	Description
token	a string feature
ner_tag	a classification label, 23 classes
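
As a rough illustration of the two-column layout, the sketch below reads such text into sentences of (token, ner_tag) pairs. It assumes the columns are separated by whitespace, as in the example in this README, and that sentences are separated by blank lines, which is the usual CoNLL convention but is not stated here. The function name `read_conll` is hypothetical and not part of the corpus.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group two-column lines into sentences of (token, ner_tag) pairs.

    Assumption: a blank line ends a sentence (common CoNLL convention).
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last run of whitespace: the tag is the final column.
        token, ner_tag = line.rsplit(None, 1)
        current.append((token, ner_tag))
    if current:
        sentences.append(current)
    return sentences
```

To read a file, pass an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to one of the files under `data`. A line with only one column raises a `ValueError` in this sketch, which may be useful for spotting malformed rows.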

Example

    Pacific	B-person
    Standard	I-person
    owner	I-person
    ,	O
    Jonathan	B-person
    M.	I-person
    Stan	I-person
    ,	O
    displays	O
    the	O
    Santorum	B-substance
    cocktail	I-substance
    drink	I-substance
    as	O
    a	B-object
    finished	I-object
    product	I-object
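
In IOB2, a `B-` tag opens an entity, the following `I-` tags with the same class continue it, and `O` marks tokens outside any entity. A minimal sketch of turning a tag sequence into entity spans follows. The function name `iob2_spans` is hypothetical, and treating a stray `I-` tag as the start of a new entity is a lenient choice made for this sketch, not something the corpus specifies.

```python
from typing import List, Optional, Tuple


def iob2_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) spans, with end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues and label is not None:
            # An O tag, a B- tag, or an I- tag of another class closes the span.
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            # B- opens a span; a stray I- is leniently treated the same way.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Tags copied from the example above.
tags = [
    "B-person", "I-person", "I-person", "O",
    "B-person", "I-person", "I-person", "O",
    "O", "O",
    "B-substance", "I-substance", "I-substance", "O",
    "B-object", "I-object", "I-object",
]
print(iob2_spans(tags))
```

For the example sentence this yields four spans: two `person` spans, one `substance` span, and one `object` span.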
