Techniques for Named Entity Recognition

  • Dictionary Based
  • Regex based
  • Model based
    • Sequence-to-sequence model -- a sequence of tokens goes in, a sequence of IOB tags comes out (see the sketch after this list).

      tokens: Joe    Biden  is  the  president  of  the  United  States  .
      tags:   B-PER  I-PER  O   O    O          O   O    B-GPE   I-GPE   O
    • Popular NER models

      • Hidden Markov Model
      • Linear CRF (Conditional Random Field)
      • BiLSTM-CRF
      • Transformers
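
A minimal sketch of dictionary (gazetteer) based tagging that produces the IOB representation shown above; the gazetteer entries, entity types, and function name are invented for illustration.

```python
# Dictionary-based NER: greedy longest-match lookup of token n-grams against a
# small gazetteer, emitting IOB tags. Entries here are illustrative only.
GAZETTEER = {
    ("Joe", "Biden"): "PER",
    ("United", "States"): "GPE",
}

def tag_with_dictionary(tokens, gazetteer=GAZETTEER, max_len=3):
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest n-gram first so "United States" beats "United".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            ngram = tuple(tokens[i:i + n])
            if ngram in gazetteer:
                match = (n, gazetteer[ngram])
                break
        if match:
            n, label = match
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            i += n
        else:
            i += 1
    return tags

tokens = "Joe Biden is the president of the United States .".split()
print(list(zip(tokens, tag_with_dictionary(tokens))))
# [('Joe', 'B-PER'), ('Biden', 'I-PER'), ('is', 'O'), ..., ('United', 'B-GPE'), ('States', 'I-GPE'), ('.', 'O')]
```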

Hidden Markov Model

  • Markov assumption: Event at time t can be predicted from events at times (t-1, ..., t-N) where N is small.
  • Uses engineered word level features.
  • Features from recent past used to predict tag at position t.
  • Most probable tag sequence assigned to the token sequence using the Viterbi algorithm (see the sketch below).
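
A minimal sketch of Viterbi decoding for a first-order HMM tagger in plain Python; the tag set, the transition/start probabilities, and the emission heuristic are toy values invented for illustration, not learned parameters.

```python
import math

# Toy tag set; a real HMM tagger would use the full IOB tag vocabulary.
TAGS = ["O", "B-PER", "I-PER"]

def viterbi(tokens, start_logp, trans_logp, emit_logp):
    """Return the most probable tag sequence under a first-order HMM."""
    # best[i][t] = best log-probability of any tag path ending in tag t at position i
    best = [{t: start_logp[t] + emit_logp(tokens[0], t) for t in TAGS}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in TAGS:
            # Pick the previous tag that maximizes the path score into tag t.
            prev = max(TAGS, key=lambda p: best[i - 1][p] + trans_logp[p][t])
            best[i][t] = best[i - 1][prev] + trans_logp[prev][t] + emit_logp(tokens[i], t)
            back[i][t] = prev
    # Backtrack from the best final tag.
    last = max(TAGS, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Hand-made log-probabilities (illustrative only): capitalized words look like
# names, and I-PER is only likely after another PER tag.
start = {"O": math.log(0.8), "B-PER": math.log(0.2), "I-PER": math.log(1e-6)}
trans = {
    "O":     {"O": math.log(0.8), "B-PER": math.log(0.2), "I-PER": math.log(1e-6)},
    "B-PER": {"O": math.log(0.4), "B-PER": math.log(0.1), "I-PER": math.log(0.5)},
    "I-PER": {"O": math.log(0.5), "B-PER": math.log(0.1), "I-PER": math.log(0.4)},
}

def emit(word, tag):
    looks_like_name = word[0].isupper()
    if tag == "O":
        return math.log(0.1 if looks_like_name else 0.9)
    return math.log(0.7 if looks_like_name else 0.05)

print(viterbi("Joe Biden spoke .".split(), start, trans, emit))
# -> ['B-PER', 'I-PER', 'O', 'O']
```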

Linear Chain CRF

  • Graphical model that calculates the conditional probability of a tag sequence c = (c1, ..., cN) given an observed token sequence o = (o1, ..., oN).
  • Considers neighboring tags when making a prediction.
  • Still uses features generated through feature engineering (see the sketch after the figure below).

A linear chain Conditional Random Field model. (Image Source: Building a Named Entity Recognition model using a BiLSTM-CRF network)
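
A minimal training sketch, assuming the third-party sklearn-crfsuite package; the feature functions, hyperparameters, and tiny toy training set below are illustrative only.

```python
import sklearn_crfsuite  # third-party package (assumed installed)

def word2features(tokens, i):
    """Hand-engineered features for the token at position i."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Features of the left/right neighbors supply neighborhood context.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats

def sent2features(tokens):
    return [word2features(tokens, i) for i in range(len(tokens))]

# Tiny toy training set (illustrative); real data would come from an annotated
# corpus such as CoNLL-2003.
train_sents = [
    ("Joe Biden is the president of the United States .".split(),
     ["B-PER", "I-PER", "O", "O", "O", "O", "O", "B-GPE", "I-GPE", "O"]),
    ("Paris is in France .".split(),
     ["B-GPE", "O", "O", "B-GPE", "O"]),
]
X_train = [sent2features(tokens) for tokens, tags in train_sents]
y_train = [tags for tokens, tags in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict([sent2features("Joe Biden visited Paris .".split())])[0])
```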


BiLSTM-CRF

  • Popular neural architecture for NER.
  • End-to-end model, features are inferred during training.
  • Recurrent Network (LSTM) considers all time steps prior to current time step (LHS context).
  • Bidirectional LSTM considers LHS and RHS context together, provides neighborhood context information.
  • Output of both LSTMs is fed to a linear chain CRF; the combination effectively acts as an attention mechanism (see the sketch after the figure below).

Architecture of a BiLSTM-CRF Model. (Image Source: Building a Named Entity Recognition model using a BiLSTM-CRF network)
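
A minimal PyTorch sketch of the architecture, assuming the third-party pytorch-crf package for the CRF layer; the vocabulary size, tag count, and dimensions are placeholder values.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package (assumed installed)

class BiLSTMCRF(nn.Module):
    """Embeddings -> BiLSTM (left + right context) -> linear emissions -> CRF."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Bidirectional LSTM: each direction gets half the hidden size.
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.emissions = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        x = self.embedding(token_ids)
        x, _ = self.lstm(x)
        emissions = self.emissions(x)
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi decoding over the CRF.
        return self.crf.decode(emissions, mask=mask)

# Toy usage with placeholder sizes: batch of 2 sentences, max length 10.
model = BiLSTMCRF(vocab_size=5000, num_tags=9)
token_ids = torch.randint(1, 5000, (2, 10))
tags = torch.randint(0, 9, (2, 10))
loss = model(token_ids, tags)    # scalar training loss
predicted = model(token_ids)     # list of predicted tag-id sequences
```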


Transformer based NER

  • Modeled as a token classification task; HuggingFace provides XXXForTokenClassification models (e.g., BertForTokenClassification) to do NER.
  • Input token sequence is wrapped with the special [CLS] and [SEP] tokens, and subword tokenized.
  • Transformer Encoder (BERT) is used to generate embeddings; the output is sent to a shared Linear layer to produce logits across the tag vocabulary (see the usage sketch after the figure below).
  • Self-attention plus linear layer serves as equivalent of CRF head.

Architecture of a Transformer based NER Model (Image Source: Tuning Multilingual Transformers for Named Entity Recognition on Slavic Languages)
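
A minimal usage sketch with the HuggingFace transformers library; the checkpoint name used here is one publicly available BERT model fine-tuned for token classification, chosen for illustration, and any *ForTokenClassification model would work the same way.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Illustrative checkpoint of a BERT encoder fine-tuned for token classification.
name = "dslim/bert-base-NER"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)

# Subword tokenization adds the special [CLS] and [SEP] tokens automatically.
inputs = tokenizer("Joe Biden is the president of the United States.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_tags)

# Argmax over the tag vocabulary gives one tag id per subword token.
pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, tag_id in zip(tokens, pred_ids):
    print(tok, model.config.id2label[int(tag_id)])
# e.g.  [CLS] O   Joe B-PER   Biden I-PER   ...   United B-LOC   States I-LOC
```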