Semantic Role Labelling For Hindi

GitHub Repo Link

https://github.com/rahcode7/SRL

Semantic Role Labelling

Semantic Role Labelling (SRL) is a natural language processing task that assigns labels to the words of a sentence to indicate the semantic role each one plays. These labels help recover the meaning of the sentence and, more importantly, show how each word contributes to that meaning. The task essentially comes down to identifying the arguments associated with the predicate (the main verb) of the sentence and assigning each of them a specific role.

Example


[Figure: SRL example]

The example above shows three distinct role labels (Agent, Theme, and Location), along with the labelled predicate. Using these labels, we can answer the question "Who did what to whom, and where?"
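Before a sequence model can learn these roles, span annotations like the ones in the figure are often converted into one tag per token. The sketch below assumes the common BIO scheme (B- begins an argument span, I- continues it, O marks tokens outside every argument) and a hypothetical English sentence rather than the one in the figure; this README does not say which tagging scheme the repository uses.

```python
from typing import List, Tuple

# An argument span: (first token index, index one past the last token, role label).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping labelled spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span overlaps an earlier span: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical sentence; "V" marks the predicate, as in PropBank-style data.
tokens = ["Mary", "hung", "the", "picture", "on", "the", "wall"]
spans = [(0, 1, "Agent"), (1, 2, "V"), (2, 4, "Theme"), (4, 7, "Location")]
for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

Rejecting overlapping spans keeps each token's tag unambiguous; a predicate with nested or discontinuous arguments would need a richer encoding.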

Some of the more common labels are:

  • Agent
  • Experiencer
  • Theme
  • Result
  • Location

A longer (though still not exhaustive) list of labels can be found in these slides.

Problem Classification


The problem can be further decomposed into the following subtasks:

Predicate Detection: Finding the predicate in a given sentence.

Predicate Sense Disambiguation: Determining which sense of the detected predicate is used.

Argument Identification: Identifying the arguments of the predicate, given its sense.

Argument Classification: Assigning role labels to the identified arguments.
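These four subtasks can be chained so that each stage's output feeds the next. The sketch below is a skeleton only: every stage is a hypothetical stand-in (a POS-tag lookup for predicates, a fixed sense id, every other token as a candidate argument, and a caller-supplied classifier), meant to show the data flow rather than how this repository implements any stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# classify(tokens, predicate_index, candidate_index) -> role label, or "O" for "not an argument"
Classifier = Callable[[List[str], int, int], str]


@dataclass
class Frame:
    predicate_index: int
    sense: str
    arguments: Dict[int, str] = field(default_factory=dict)  # token index -> role label


def detect_predicates(pos_tags: List[str]) -> List[int]:
    # 1. Predicate detection (stand-in): every token tagged VERB is a predicate.
    return [i for i, tag in enumerate(pos_tags) if tag == "VERB"]


def disambiguate_sense(tokens: List[str], pred: int) -> str:
    # 2. Sense disambiguation (stand-in): always pick sense 01 of the surface form.
    # A real system would look up senses for the verb's lemma.
    return f"{tokens[pred]}.01"


def identify_arguments(tokens: List[str], pred: int) -> List[int]:
    # 3. Argument identification (stand-in): every other token is a candidate.
    return [i for i in range(len(tokens)) if i != pred]


def run_pipeline(tokens: List[str], pos_tags: List[str], classify: Classifier) -> List[Frame]:
    frames = []
    for pred in detect_predicates(pos_tags):
        frame = Frame(pred, disambiguate_sense(tokens, pred))
        for arg in identify_arguments(tokens, pred):
            # 4. Argument classification: keep only candidates that receive a role.
            label = classify(tokens, pred, arg)
            if label != "O":
                frame.arguments[arg] = label
        frames.append(frame)
    return frames


# Usage with the hypothetical sentence "राम ने किताब पढ़ी" ("Ram read the book")
# and a lookup table standing in for a trained classifier.
tokens = ["राम", "ने", "किताब", "पढ़ी"]
pos_tags = ["PROPN", "ADP", "NOUN", "VERB"]
gold_roles = {0: "ARG0", 2: "ARG1"}
frames = run_pipeline(tokens, pos_tags, lambda toks, p, a: gold_roles.get(a, "O"))
print(frames)
```

Keeping the classifier as a parameter lets the same pipeline run with a rule-based stub, a logistic regression model, or a neural model.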

Hindi


For Semantic Role Labelling in Hindi, we label words with the following roles:

| Label | Description |
|---|---|
| ARG0 | Agent, Experiencer, or doer |
| ARG1 | Patient or Theme |
| ARG2 | Beneficiary |
| ARG3 | Instrument |
| ARG2-ATR | Attribute or Quality |
| ARG2-LOC | Physical Location |
| ARG2-GOL | Goal |
| ARG2-SOU | Source |
| ARGM-PRX | Noun-Verb Construction |
| ARGM-ADV | Adverb |
| ARGM-DIR | Direction |
| ARGM-EXT | Extent or Comparison |
| ARGM-MNR | Manner |
| ARGM-PRP | Purpose |
| ARGM-DIS | Discourse |
| ARGM-LOC | Abstract Location |
| ARGM-MNS | Means |
| ARGM-NEG | Negation |
| ARGM-TMP | Time |
| ARGM-CAU | Cause or Reason |

The labels above are taken from this paper.
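A classifier over these roles needs a fixed mapping between label strings and output indices. A minimal sketch, assuming an extra "O" class for tokens that fill no role (the table itself does not define one); the index order actually used in the repository is not documented here.

```python
# Role labels from the Hindi label table, in table order.
HINDI_SRL_LABELS = [
    "ARG0", "ARG1", "ARG2", "ARG3",
    "ARG2-ATR", "ARG2-LOC", "ARG2-GOL", "ARG2-SOU",
    "ARGM-PRX", "ARGM-ADV", "ARGM-DIR", "ARGM-EXT", "ARGM-MNR",
    "ARGM-PRP", "ARGM-DIS", "ARGM-LOC", "ARGM-MNS", "ARGM-NEG",
    "ARGM-TMP", "ARGM-CAU",
]

NO_ROLE = "O"  # assumption: class for tokens that are not arguments of the predicate

label2id = {label: i for i, label in enumerate([NO_ROLE] + HINDI_SRL_LABELS)}
id2label = {i: label for label, i in label2id.items()}


def encode(labels):
    """Map role labels to integer ids, failing loudly on labels outside the table."""
    unknown = sorted(set(labels).difference(label2id))
    if unknown:
        raise KeyError(f"unknown labels: {unknown}")
    return [label2id[label] for label in labels]


print(len(label2id), encode(["O", "ARG0", "ARGM-NEG"]))
```

Raising on unknown labels catches annotation typos early instead of silently mapping them to some default class.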

Code Walkthrough

Requirements

  • torch==1.8.1
  • numpy==1.19.5
  • matplotlib==3.3.4
  • seaborn==0.11.1
  • pandas==1.1.5
  • scikit_learn==0.24.2
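To confirm that an environment matches these pins, a small check like the one below may help. It is a sketch, not part of the repository: it needs Python 3.8+ for importlib.metadata, looks packages up by their distribution names (so scikit_learn is written scikit-learn), and ignores local version suffixes such as the +cu111 carried by CUDA builds of torch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pinned versions from the Requirements list.
PINS = {
    "torch": "1.8.1",
    "numpy": "1.19.5",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.1",
    "pandas": "1.1.5",
    "scikit-learn": "0.24.2",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems[name] = f"not installed (want {wanted})"
        # Drop a local version label such as "+cu111" before comparing.
        elif found.split("+")[0] != wanted:
            problems[name] = f"found {found}, want {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_pins(PINS).items():
        print(f"{name}: {problem}")
```

The lookup function is a parameter so the comparison logic can be tested without the pinned packages installed.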

Train a model

Download dataset

1. Download the dataset from https://drive.google.com/drive/folders/1JLrZ0HgvuXKZL7PhB5Y89V_4BzIp7ucV?usp=sharing
2. Unzip the file and place it in the data/processed folder.
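Before training, it can help to confirm that the data landed where the scripts expect it. A minimal sketch, assuming it is run from the repository root; the file names inside the archive are not listed in this README, so the hypothetical helper below only checks that data/processed exists and is not empty.

```python
from pathlib import Path
from typing import List


def list_processed_files(root: str = ".") -> List[str]:
    """Return the files under <root>/data/processed, raising if the folder is missing or empty."""
    data_dir = Path(root) / "data" / "processed"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"{data_dir} does not exist; unzip the dataset there first")
    files = sorted(p.relative_to(data_dir).as_posix() for p in data_dir.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"{data_dir} is empty; unzip the dataset there first")
    return files
```

Calling `print(list_processed_files())` from the repository root lists what was unzipped, which makes a misplaced extra folder level easy to spot.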

Run the following commands:
cd src

1. For the logistic regression model:

python classifier_base.py

2. For the bidirectional LSTM model:

python SRL_NN_train.py --EMBEDDING_DIM=300 --NUM_HIDDEN_NODES=100 --epochs=50 --batchsize=64 --learning_rate=0.001
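This README does not describe the features that classifier_base.py feeds to its logistic regression model. As an illustration only, a feature-based argument classifier for Hindi might look at the candidate word, the predicate, their relative position, and the postposition that follows the candidate, since case markers such as ने (ergative), को (dative/accusative), में (locative) and से (instrumental/ablative) are strong cues for roles like ARG0 or the location labels. The hypothetical function below builds one feature dict per candidate; dicts in this shape can be vectorised with scikit-learn's DictVectorizer and passed to LogisticRegression.

```python
from typing import Dict, List

# Common Hindi postpositions (case markers); an illustrative, not exhaustive, set.
POSTPOSITIONS = {"ने", "को", "से", "में", "पर", "का", "की", "के"}


def argument_features(tokens: List[str], pred: int, arg: int) -> Dict[str, object]:
    """Hypothetical features for classifying token `arg` relative to predicate `pred`."""
    next_tok = tokens[arg + 1] if arg + 1 < len(tokens) else "<END>"
    return {
        "word": tokens[arg],
        "predicate": tokens[pred],
        "position": "before" if arg < pred else "after",
        "distance": abs(pred - arg),
        "next_postposition": next_tok if next_tok in POSTPOSITIONS else "NONE",
    }


# Hypothetical sentence "राम ने किताब पढ़ी" ("Ram read the book"); the predicate is token 3.
tokens = ["राम", "ने", "किताब", "पढ़ी"]
print(argument_features(tokens, pred=3, arg=0))
```

String-valued features become one-hot indicators once vectorised, while `distance` stays numeric.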

Hyperparameters

The following table lists the optimization/training hyperparameters for the BiLSTM-based SRL model:

| Name | Type | Description | Default value |
|---|---|---|---|
| learning_rate | float | Initial learning rate | 0.001 |
| EMBEDDING_DIM | int | Dimensionality of the word vectors | 300 |
| NUM_HIDDEN_NODES | int | Number of hidden nodes in the neural model | 100 |
| epochs | int | Number of passes over the training dataset | 50 |
| batchsize | int | Number of training examples sampled per batch | 64 |
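The flags in the training command map one-to-one onto this table. A sketch of how such options could be declared with argparse, using the table's names, types and defaults; the parsing code in SRL_NN_train.py is not shown in this README, so treat this as an assumption rather than the script's actual code.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse training hyperparameters; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train a BiLSTM SRL model (sketch).")
    parser.add_argument("--learning_rate", type=float, default=0.001, help="Initial learning rate")
    parser.add_argument("--EMBEDDING_DIM", type=int, default=300, help="Dimensionality of the word vectors")
    parser.add_argument("--NUM_HIDDEN_NODES", type=int, default=100, help="Hidden units in the LSTM")
    parser.add_argument("--epochs", type=int, default=50, help="Passes over the training set")
    parser.add_argument("--batchsize", type=int, default=64, help="Training examples per batch")
    return parser.parse_args(argv)


args = parse_args(["--EMBEDDING_DIM=300", "--NUM_HIDDEN_NODES=100", "--epochs=50"])
print(args)
```

argparse accepts both `--epochs=50` and `--epochs 50`, so the command shown in the training step parses unchanged.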

Model Checkpoints/Trained models

The trained model is stored at models/srl_hindi_bilstm_50e.pth.
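This README does not say whether models/srl_hindi_bilstm_50e.pth holds a full pickled model or only a state_dict, nor what the network looks like beyond its hyperparameters. The sketch below assumes a state_dict and a simple token-level tagger (embedding → bidirectional LSTM → linear layer over role labels) sized with the defaults EMBEDDING_DIM=300 and NUM_HIDDEN_NODES=100. The class name, padding index, vocabulary size and label count are hypothetical, a real SRL tagger usually also receives a predicate-indicator input (omitted here), and load_state_dict will raise an error if the architecture does not match the saved weights.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Hypothetical token-level SRL tagger: embeddings -> BiLSTM -> per-token label scores."""

    def __init__(self, vocab_size: int, num_labels: int,
                 embedding_dim: int = 300, hidden_dim: int = 100):
        super().__init__()
        # Assumption: index 0 is reserved for padding.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward hidden states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> scores: (batch, seq_len, num_labels)
        embedded = self.embedding(token_ids)
        outputs, _ = self.lstm(embedded)
        return self.classifier(outputs)


def load_tagger(path: str, vocab_size: int, num_labels: int) -> BiLSTMTagger:
    """Rebuild the model and load a state_dict checkpoint into it."""
    model = BiLSTMTagger(vocab_size, num_labels)
    # map_location="cpu" lets a GPU-trained checkpoint load on a CPU-only machine.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # switch to inference mode
    return model
```

With the correct vocab_size and num_labels, `load_tagger("models/srl_hindi_bilstm_50e.pth", vocab_size, num_labels)(token_ids).argmax(-1)` would give one predicted label id per token.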

Results

Final Model: Bidirectional LSTM
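Per-label precision, recall and F1 are a common way to report argument classification quality. A generic sketch for token-level predictions, assuming "O" marks tokens with no role and is excluded from the scores; it is not necessarily how the results for this model were computed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_prf(gold: List[str], pred: List[str],
                  ignore: str = "O") -> Dict[str, Tuple[float, float, float]]:
    """Token-level (precision, recall, F1) for every label except `ignore`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the gold label g
    scores = {}
    for label in sorted((set(gold) | set(pred)) - {ignore}):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


gold = ["ARG0", "O", "ARG1", "ARG1", "ARGM-LOC"]
pred = ["ARG0", "O", "ARG1", "O", "ARG1"]
for label, (p, r, f) in per_label_prf(gold, pred).items():
    print(f"{label}\tP={p:.2f}\tR={r:.2f}\tF1={f:.2f}")
```

Labels that never appear in either list are skipped, and a label with no predictions gets precision 0.0 rather than a division error.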