lexnlp.extract.en.geoentities.get_geoentity_annotations returning the wrong location indexes #40

Open
@Ra-you

Description

>>> import lexnlp.extract.en.geoentities
>>> text = "This Contract (“Contract”) is entered into by and between the City of Detroit, a Michigan municipal corporation"
>>> for geoentity in lexnlp.extract.en.geoentities.get_geoentity_annotations(text, _CONFIG):
...     print(geoentity)
...
Michigan [geoentity] at (86..95), loc: en

Currently, get_geoentity_annotations returns the wrong location indexes, as shown in the example above. The correct output should be Michigan [geoentity] at (82..91), loc: en. I noticed that this behavior occurs when the text variable contains punctuation marks: each time the get_geoentity_annotations parser encounters a punctuation mark (e.g. a comma, "(" or ")"), the location index is incremented by 2. As a result, any geoentity that occurs before the first punctuation mark gets the right location indexes, while any geoentity that occurs after one gets the wrong location indexes.
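
A reported span can be cross-checked against the raw string without lexnlp. The sketch below is only an illustration: find_spans and check_annotation are hypothetical helper names, not part of the lexnlp API, and the offsets they produce depend on the exact characters in the string (each curly quote counts as one character in Python), so they may differ from the figures quoted above if the original text was not identical.

```python
import re


def find_spans(text, name):
    """Return (start, end) offsets of every literal occurrence of name in text."""
    return [m.span() for m in re.finditer(re.escape(name), text)]


def check_annotation(text, name, start, end):
    """Return True if text[start:end], ignoring surrounding whitespace, equals name."""
    return text[start:end].strip() == name


text = (
    "This Contract (“Contract”) is entered into by and between "
    "the City of Detroit, a Michigan municipal corporation"
)

# Offsets computed directly from the string, for comparison with the annotation.
for start, end in find_spans(text, "Michigan"):
    print(f"Michigan at ({start}..{end}) -> {text[start:end]!r}")

# The span reported by the annotation in this issue does not slice back to the entity.
print(check_annotation(text, "Michigan", 86, 95))
```

Slicing the input with the reported coordinates is a quick way to confirm whether an annotation's offsets are shifted, whatever the cause of the shift turns out to be.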
