Some NER tools do not mark multi-word NEs #1234
Labels
🐛Bug
Module-cogroo
Module-corenlp
Module-lbj
Module-lingpipe
Module-nlp4j
Module-opennlp
Some NER tools, such as the CoreNlpNamedEntityRecognizer, mark every token individually as an NE instead of creating a single multi-token NE. IMHO, the default behavior should be to join adjacent tokens that carry the same label, unless the model uses a BIO-like encoding, in which case the BIO markers should be respected.
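A minimal sketch of the proposed merging logic, written in Java like the modules listed above. Everything here is hypothetical and not an existing API of CoreNLP or any other listed tool: the class NeMerger, the record Span, the methods mergeSameLabel and mergeBio, and the assumed tag format (one label string per token, "O" for tokens outside any NE, "B-X"/"I-X" for BIO). Spans are token index ranges [begin, end). Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any NER library): turns per-token NE labels into spans.
public class NeMerger {

    /** A named-entity span over token indices [begin, end). */
    public record Span(int begin, int end, String label) {}

    /** Assumed label for tokens outside any NE. */
    static final String OUTSIDE = "O";

    /** Default behavior: join adjacent tokens that carry the same non-"O" label. */
    public static List<Span> mergeSameLabel(List<String> labels) {
        List<Span> spans = new ArrayList<>();
        int start = -1;
        String current = null;
        // Iterate one step past the end; the extra "O" sentinel closes a trailing span.
        for (int i = 0; i <= labels.size(); i++) {
            String label = i < labels.size() ? labels.get(i) : OUTSIDE;
            if (current != null && !label.equals(current)) {
                spans.add(new Span(start, i, current));
                current = null;
            }
            if (current == null && !label.equals(OUTSIDE)) {
                current = label;
                start = i;
            }
        }
        return spans;
    }

    /** BIO encoding: "B-X" always starts a new span; "I-X" continues an open span of type X. */
    public static List<Span> mergeBio(List<String> labels) {
        List<Span> spans = new ArrayList<>();
        int start = -1;
        String current = null;
        for (int i = 0; i <= labels.size(); i++) {
            String tag = i < labels.size() ? labels.get(i) : OUTSIDE;
            // Entity type for B-/I- tags; null for "O" or any other tag.
            String type = (tag.startsWith("B-") || tag.startsWith("I-")) ? tag.substring(2) : null;
            boolean continues = type != null && tag.startsWith("I-") && type.equals(current);
            if (current != null && !continues) {
                spans.add(new Span(start, i, current));
                current = null;
            }
            // Opens a span on "B-X"; a stray "I-X" with no open X span is treated like "B-X".
            if (current == null && type != null) {
                current = type;
                start = i;
            }
        }
        return spans;
    }
}
```

For the tokens "John Smith visited Berlin" labeled PER PER O LOC, mergeSameLabel yields one PER span over tokens 0 to 2 and one LOC span over token 3. The BIO variant matters when two entities of the same type are adjacent: B-LOC I-LOC B-LOC gives two LOC spans with mergeBio, whereas joining by label alone would wrongly produce one.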
Also, the unit tests for the NER tools should be changed to include a multi-word NE, e.g. change "John" in the current unit tests into "John Smith".