
Tagging format #1

Open
blester125 opened this issue Nov 2, 2019 · 3 comments
Comments
@blester125

In most NER datasets there is some sort of span labeling scheme, where prefixes like B- or I- are used to separate adjacent mentions of the same type.

In this data it looks like no span labeling scheme is used:

the	O
Mearns	LEXICON
Glacigenic	LEXICON
Subgroup	LEXICON
of	O

Are there no mentions in the datasets that touch, or am I missing some strategy that delimits them?

@jeromemassot

Hi blester125,
In fact the B- and I- notations are there for multi-token entities, i.e. when a particular entity is made of several tokens. If two distinct entities with the same label follow each other, each of them should be tagged with the B- prefix.

So I can understand why this notation was not reproduced in the lexicon, which is only a glossary.

Mapping from the lexicon entries to the B-/I- notation is quite easy: for each entry, split the term on spaces, then prefix the first token with B- and the following ones with I-.
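This procedure could be sketched in a few lines of Python (`to_bio` is an illustrative helper name, not part of any existing codebase, and `LEXICON` stands in for whatever label the entry carries):

```python
def to_bio(term, label="LEXICON"):
    """Split a lexicon term on spaces and emit (token, BIO-tag) pairs:
    the first token gets the B- prefix, the rest get I-."""
    tokens = term.split(" ")
    return [(tok, ("B-" if i == 0 else "I-") + label)
            for i, tok in enumerate(tokens)]

print(to_bio("Mearns Glacigenic Subgroup"))
# [('Mearns', 'B-LEXICON'), ('Glacigenic', 'I-LEXICON'), ('Subgroup', 'I-LEXICON')]
```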

Best regards
Jerome

@metazool
Contributor

metazool commented Oct 27, 2020

Thank you, I missed this discussion. The annotation format is the one CoreNLP suggests here:

https://stanfordnlp.github.io/CoreNLP/ner.html#training-or-retraining-new-models

There must be an assumption that tagged tokens, if not separated by an O-tagged token, are part of a contiguous entity. That's the assumption made by the brat JavaScript renderer on the CoreNLP server's visual output. It won't always hold semantically, will it? I've never looked into the underlying LSTM.

I would be glad to hear of alternative, more sophisticated approaches!
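The contiguity assumption described above can be sketched as follows (`group_spans` is a hypothetical helper, not part of CoreNLP or brat): any run of identically-labelled, non-O tokens is collapsed into a single entity span, which is exactly where two genuinely distinct adjacent mentions would be merged incorrectly.

```python
def group_spans(tagged):
    """Collapse runs of identically-tagged, non-O tokens into single
    (text, label) spans, per the contiguity assumption."""
    spans = []
    current_tokens, current_label = [], None
    for token, label in tagged:
        if label != "O" and label == current_label:
            current_tokens.append(token)  # extend the current run
        else:
            if current_label is not None and current_label != "O":
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
    if current_label is not None and current_label != "O":
        spans.append((" ".join(current_tokens), current_label))
    return spans

sample = [("the", "O"), ("Mearns", "LEXICON"), ("Glacigenic", "LEXICON"),
          ("Subgroup", "LEXICON"), ("of", "O")]
print(group_spans(sample))
# [('Mearns Glacigenic Subgroup', 'LEXICON')]
```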

@metazool
Contributor

As for the source references from which the annotated sentences were extracted (during an unrelated project in the early 2000s), many but not all of them are available as JP2 scans under an Open Government Licence. The list of sources is here: https://github.com/BritishGeologicalSurvey/geo-ner-model/blob/main/REFERENCES.md
