
[WIP] Train Vietnamese Dependency Parsing

Vu Anh edited this page Jul 1, 2021 · 3 revisions

Training Data

VLSP 2020 Dataset

Train: 8151 sentences, Test: 1122 sentences

Models Description

download

Input vectors

The input vector is composed of two parts: the word embedding and the CharLSTM word representation vector of

Biaffine Attention Mechanism

Compute the score of a dependency via biaffine attention:
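The scoring formula itself is missing from this page. As a rough illustration only, the sketch below follows the common biaffine arc scorer, s(i, j) = [dep_i; 1]^T W head_j, where the appended 1 gives each candidate head a bias term. The function name, tensor shapes and random inputs are assumptions made for this example, not the trained model's actual code.

```python
import numpy as np

def biaffine_arc_scores(dep, head, weight):
    """Score every (dependent, head) pair with a biaffine form.

    dep:    (n, d) dependent-side token representations
    head:   (n, d) head-side token representations
    weight: (d + 1, d) biaffine weight; the extra row acts as a head bias
    Returns an (n, n) matrix where scores[i, j] scores token j as head of token i.
    """
    ones = np.ones((dep.shape[0], 1))
    dep_with_bias = np.concatenate([dep, ones], axis=1)  # (n, d + 1)
    return dep_with_bias @ weight @ head.T

# Toy inputs (hypothetical sizes, random values)
rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
dep = rng.normal(size=(n_tokens, dim))
head = rng.normal(size=(n_tokens, dim))
weight = rng.normal(size=(dim + 1, dim))

scores = biaffine_arc_scores(dep, head, weight)
predicted_heads = scores.argmax(axis=1)  # greedy head choice per token
```

Greedy argmax is used here only to keep the sketch short; it does not guarantee a well-formed tree.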

Parameter settings

Model parameters

Component          Hyper-Parameter   Value
Embedding (BERT)   n_bert_layers     4
                   dimension         768
LSTM Encoder       n_lstm_hidden     400
                   n_lstm_layers     3
                   lstm_dropout      0.33
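For reference, the model parameters above could be collected in one config object, as in this minimal sketch. The class name `ParserConfig`, the field `bert_dim` (standing in for "dimension") and the `bidirectional` flag are hypothetical; the page does not say whether the LSTM encoder is bidirectional, so that default is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParserConfig:
    # Values copied from the model parameters table
    n_bert_layers: int = 4
    bert_dim: int = 768          # "dimension" in the table
    n_lstm_hidden: int = 400
    n_lstm_layers: int = 3
    lstm_dropout: float = 0.33
    # Assumption: not stated on this page
    bidirectional: bool = True

    @property
    def encoder_output_dim(self) -> int:
        """Size of each token vector leaving the LSTM encoder."""
        return self.n_lstm_hidden * (2 if self.bidirectional else 1)

config = ParserConfig()
```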

Training Parameters

Hyper-Parameter Value
optimizer Adam

Choosing the right batch_size (5000) helps a lot.
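A value of 5000 suggests that batch_size may count tokens per batch rather than sentences, although this page does not say so. Under that assumption, token-budget batching could look like the sketch below; the function name and toy data are made up for illustration.

```python
def batch_by_token_budget(sentences, max_tokens=5000):
    """Group sentences so each batch holds at most max_tokens tokens.

    Sentences are sorted by length so that batches contain similar lengths.
    A single sentence longer than max_tokens still gets its own batch.
    """
    batches, current, current_tokens = [], [], 0
    for sentence in sorted(sentences, key=len):
        if current and current_tokens + len(sentence) > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += len(sentence)
    if current:
        batches.append(current)
    return batches

# Toy example with a small budget so the grouping is easy to follow
toy_sentences = [["w"] * n for n in (3, 7, 2, 5, 4)]
batches = batch_by_token_budget(toy_sentences, max_tokens=10)
batch_token_counts = [sum(len(s) for s in batch) for batch in batches]
```

With a token budget, batches of short sentences contain many sentences and batches of long sentences contain few, which keeps memory use per step roughly even.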

Notes

  • Using wandb logs is very handy. We can easily watch logs and loss graphs with nearly zero setup.
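As a minimal sketch of that setup, the snippet below logs one loss value per epoch with `wandb.init` and `wandb.log`. The project name and loss values are placeholders, and the run stays in wandb's disabled mode unless `enabled=True`, so nothing is uploaded by default. If wandb is not installed, the function only returns the records.

```python
try:
    import wandb
except ImportError:  # keep the sketch usable without wandb installed
    wandb = None

def log_training(losses, project="vi-dependency-parsing", enabled=False):
    """Log one loss per epoch to wandb and return the logged records."""
    run = None
    if wandb is not None:
        mode = "online" if enabled else "disabled"
        run = wandb.init(project=project, mode=mode)
    history = []
    for epoch, loss in enumerate(losses, start=1):
        record = {"epoch": epoch, "loss": loss}
        history.append(record)
        if run is not None:
            wandb.log(record)  # appears as a loss curve in the wandb UI
    if run is not None:
        wandb.finish()
    return history

history = log_training([0.9, 0.5, 0.3])  # placeholder loss values
```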