Some NER tools do not mark multi-word NEs #1234

reckart · 2018-05-23T09:47:24Z

Some NER tools, such as the CoreNlpNamedEntityRecognizer, mark every token individually as an NE instead of creating a single multi-token NE.

IMHO the default behavior should be that adjacent NEs with the same label are joined, unless the model uses a BIO-like encoding, in which case the BIO markers should be respected.
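
To make the proposed default concrete, here is a minimal sketch of the joining logic in Python. It is not code from any of the affected components: `Span`, `merge_token_labels`, the `bio` flag, and the `"O"` outside marker are all hypothetical names chosen for illustration, and the input is assumed to be one label string per token.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """A named entity covering tokens [begin, end) with a single label."""
    begin: int
    end: int
    label: str


def merge_token_labels(labels: List[str], bio: bool = False,
                       outside: str = "O") -> List[Span]:
    """Join per-token NE labels into multi-token spans.

    Hypothetical helper: without BIO markers, adjacent tokens with the
    same label are joined. With bio=True, a "B-" prefix always starts a
    new span and an "I-" prefix continues an open span of the same label.
    """
    spans: List[Span] = []
    open_span: Optional[Span] = None
    for i, raw in enumerate(labels):
        if raw == outside:
            # A token outside any entity closes the current span.
            open_span = None
            continue
        if bio and raw[:2] in ("B-", "I-"):
            prefix, label = raw[0], raw[2:]
        else:
            prefix, label = None, raw
        continues = (
            open_span is not None
            and open_span.label == label
            and prefix != "B"
        )
        if continues:
            open_span.end = i + 1
        else:
            # Also covers an "I-" tag with no matching open span.
            open_span = Span(i, i + 1, label)
            spans.append(open_span)
    return spans
```

For the tokens "John Smith lives in New York .", the per-token labels `PERSON PERSON O O LOCATION LOCATION O` would yield two spans instead of four. Note that without BIO markers this sketch cannot tell two directly adjacent entities of the same type apart, which is why the BIO markers should take precedence whenever the model provides them.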

Also, the unit tests for the NER tools should be changed to include a multi-word NE, e.g. by changing "John" in the current unit tests to "John Smith".

CoGrOO Named Entity Recognizer
CoreNLP Named Entity Recognizer (old API)
CoreNLP Named Entity Recognizer
Illinois CCG Named Entity Recognizer
LingPipe Named Entity Recognizer
NLP4J Named Entity Recognizer
OpenNLP Named Entity Recognizer

reckart added the 🐛Bug, Module-opennlp, Module-cogroo, Module-lingpipe, Module-corenlp, Module-nlp4j, and Module-lbj labels May 23, 2018

reckart added this to the 1.10.0 milestone May 23, 2018

reckart modified the milestones: 1.10.0, 1.11.0 Jul 28, 2018

reckart modified the milestones: 1.11.0, 1.12.0 Feb 12, 2019

reckart modified the milestones: 2.1.0, Backlog Sep 8, 2019

jcklie mentioned this issue Nov 22, 2019

Adjacent named entities from CoreNLP not merged #1430

Closed

reckart modified the milestones: Feature backlog, Bug backlog Jan 18, 2021
