This folder contains examples and best practices, written in Jupyter notebooks, for building Named
Entity Recognition models. We use the
utility scripts in the utils_nlp folder to speed up data preprocessing and model building for NER.
The models can be used in a wide variety of applications, such as
information extraction and filtering. It also plays an important role in other NLP tasks like
question answering and text summarization.
Currently, we focus on fine-tuning pre-trained BERT
model. We plan to continue adding state-of-the-art models as they come up and welcome community
contributions.
Named Entity Recognition (NER) is the task of detecting and classifying real-world objects mentioned in text. Common named entities include person names, locations, organizations, etc. The state-of-the art NER methods include combining Long Short-Term Memory neural network with Conditional Random Field (LSTM-CRF) and pretrained language models like BERT.
NER usually involves assigning an entity label to each word in a sentence as shown in the figure below.
- O: Not an entity
- I-LOC: Location
- I-ORG: Organization
- I-PER: Person
There are a few standard labeling schemes and you can find the details here. The data can also be labeled with custom entities as required by the use case.
Notebook | Environment | Description | Dataset | Language |
---|---|---|---|---|
BERT | Local | Fine-tune a pretrained BERT model for token classification. | wikigold | English |