- Dictionary-based
- Regex-based
- Model-based
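For the first two approaches, a minimal sketch might look like the following; the gazetteer entries, regex pattern, and labels are invented for the example and are not from any particular library.

```python
import re

# Minimal dictionary- and regex-based NER sketch (illustrative values only).
GAZETTEER = {"Joe Biden": "PER", "United States": "GPE"}
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # simple ISO-date pattern

def tag_entities(text):
    spans = []
    # dictionary (gazetteer) lookup
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            spans.append((m.start(), m.end(), label))
    # regex match for dates
    for m in DATE_PATTERN.finditer(text):
        spans.append((m.start(), m.end(), "DATE"))
    return sorted(spans)

print(tag_entities("Joe Biden visited the United States Senate on 2021-01-20."))
# [(0, 9, 'PER'), (22, 35, 'GPE'), (46, 56, 'DATE')]
```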
Sequence-to-sequence model -- a sequence of tokens goes in, a sequence of IOB tags comes out.

| tokens | Joe | Biden | is | the | president | of | the | United | States | . |
|--------|-----|-------|----|-----|-----------|----|-----|--------|--------|---|
| tags | B-PER | I-PER | O | O | O | O | O | B-GPE | I-GPE | O |
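As a rough illustration of how token-level IOB tags map back to entity spans, here is a small helper; the function name is mine, introduced just for this example.

```python
def iob_to_spans(tokens, tags):
    """Collapse token-level IOB tags into (entity_text, label) pairs."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                      # close any open entity
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)            # continue the open entity
        else:                                # "O" closes any open entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = "Joe Biden is the president of the United States .".split()
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "B-GPE", "I-GPE", "O"]
print(iob_to_spans(tokens, tags))  # [('Joe Biden', 'PER'), ('United States', 'GPE')]
```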
Popular NER models
- Hidden Markov Model
- Linear CRF (Conditional Random Field)
- BiLSTM-CRF
- Transformers
Hidden Markov Model
- Markov assumption: Event at time t can be predicted from events at times (t-1, ..., t-N) where N is small.
- Uses engineered word-level features.
- Features from recent past used to predict tag at position t.
- Most probable tag sequence assigned to the token sequence using the Viterbi algorithm (a decoding sketch follows below).
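To make the Viterbi step concrete, here is a minimal decoding sketch over toy score tables; the transition and emission probabilities are invented for the example and do not come from a trained HMM.

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (T, K) log-probs of each tag at each position;
    transition: (K, K) log-probs of tag j following tag i.
    Returns the most probable tag-index sequence."""
    T, K = emission.shape
    score = np.full((T, K), -np.inf)
    backptr = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        # score of ending at tag k = best previous tag + transition + emission
        cand = score[t - 1][:, None] + transition + emission[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

# toy example: 2 tags over 3 tokens
emission = np.log(np.array([[0.9, 0.1], [0.1, 0.9], [0.3, 0.7]]))
transition = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))
print(viterbi(emission, transition))  # [0, 1, 1] for this toy input
```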
Linear Chain CRF (Conditional Random Field)
- Graphical model that calculates the conditional probability of a tag sequence c = (c1, ..., cN) given an observed token sequence o = (o1, ..., oN).
- Considers neighboring tokens and tags when making a prediction.
- Still uses features generated through feature engineering (see the feature sketch below).
A Linear chain Conditional Random Fields model. (Image Source: Building a Named Entity Recognition model using a BiLSTM-CRF network)
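One way to experiment with a feature-engineered linear-chain CRF is the sklearn-crfsuite package; the feature set and the tiny training sentence below are illustrative only, not the features used in the figure's model.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def word_features(sent, i):
    """Hand-engineered features for the token at position i."""
    word = sent[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # features from neighboring tokens give the CRF local context
    if i > 0:
        feats["prev.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats

train_sents = [["Joe", "Biden", "is", "the", "president", "."]]
train_tags = [["B-PER", "I-PER", "O", "O", "O", "O"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))  # one predicted tag sequence per sentence
```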
BiLSTM-CRF
- Popular neural architecture for NER.
- End-to-end model; features are inferred during training.
- Recurrent network (LSTM) considers all time steps prior to the current time step (LHS context).
- Bidirectional LSTM considers LHS and RHS context together, providing neighborhood context information.
- Output of both LSTM directions is fed to a linear chain CRF, which effectively acts as an attention mechanism (a model sketch follows the figure below).
Architecture of a BiLSTM-CRF Model. (Image Source: Building a Named Entity Recognition model using a BiLSTM-CRF network)
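A stripped-down PyTorch sketch of the BiLSTM part of this architecture is shown below; the layer sizes are arbitrary, and the CRF layer that would sit on top of the emission scores (available in packages such as pytorch-crf) is only indicated in a comment.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional LSTM sees left and right context at every position
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # per-token emission scores over the tag vocabulary; a linear-chain
        # CRF layer would consume these emissions and score whole tag sequences
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, embed_dim)
        context, _ = self.lstm(embedded)       # (batch, seq, 2*hidden_dim)
        return self.emissions(context)         # (batch, seq, num_tags)

model = BiLSTMTagger(vocab_size=10_000, num_tags=9)
scores = model(torch.randint(0, 10_000, (2, 12)))  # 2 sentences, 12 tokens each
print(scores.shape)  # torch.Size([2, 12, 9])
```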
Transformers
- NER is modeled as a token classification task; HuggingFace provides `XXXForTokenClassification` models to do NER.
- Input token sequence is padded with [CLS] and [SEP] tokens, and subword tokenized.
- Transformer Encoder (BERT) is used to generate embeddings; the output is sent to a shared Linear layer to produce logits across the tag vocabulary.
- Self-attention plus the linear layer serves as the equivalent of a CRF head (a short usage sketch follows the figure below).
Architecture of a Transformer based NER Model (Image Source: Tuning Multilingual Transformers for Named Entity Recognition on Slavic Languages)
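A short HuggingFace sketch along these lines is given below; the checkpoint name is just one publicly available NER model, not necessarily the one used in the figure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "dslim/bert-base-NER"  # example public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Joe Biden is the president of the United States."
inputs = tokenizer(text, return_tensors="pt")  # adds [CLS]/[SEP], subword tokenizes

with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, num_tags)

pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[int(pred)])
```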