[WIP] Train Vietnamese Dependency Parsing

VLSP 2020 Dataset

Train: 8151 sentences, Test: 1122 sentences

download

Input vectors

The input vector is composed of two parts: the word embedding and the CharLSTM word representation vector of

Biaffine Attention Mechanism

Compute the score of a dependency via biaffine attention:
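
As a rough illustration of the scoring step, the sketch below assumes the common biaffine formulation in which a bias term is appended to the dependent vector, so that score(i, j) = [dep_i; 1]^T W head_j. The function names, the toy vectors, and the weight matrix are hypothetical and not taken from this project; a real parser would do the same thing with batched tensor operations on the encoder outputs.

```python
def biaffine_scores(dep, head, weight):
    """Return scores[i][j] = [dep_i; 1]^T W head_j.

    dep, head: lists of vectors (one per token), each of size d.
    weight:    (d + 1) x d matrix; its last row acts as a head-only bias.
    All names here are illustrative assumptions.
    """
    scores = []
    for d in dep:
        d_aug = d + [1.0]  # append the bias feature to the dependent vector
        # left = d_aug^T W, a vector with one entry per head dimension
        left = [
            sum(d_aug[k] * weight[k][m] for k in range(len(d_aug)))
            for m in range(len(weight[0]))
        ]
        # dot the intermediate vector with every candidate head vector
        scores.append([sum(l * h for l, h in zip(left, hj)) for hj in head])
    return scores


def predict_heads(scores):
    """Greedy choice: for each dependent, pick the highest-scoring head."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy example with two tokens and 2-dimensional representations.
dep = [[1.0, 0.0], [0.0, 1.0]]
head = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]]  # identity plus a bias row

scores = biaffine_scores(dep, head, weight)
heads = predict_heads(scores)
```

Note that the greedy argmax does not guarantee a well-formed tree; parsers usually apply a tree-decoding algorithm on top of these scores.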

Parameter settings

Model parameters

	Component	Hyper-Parameter	Value
Embedding	BERT	n_bert_layers	4
Embedding	BERT	dimension	768
LSTM	Encoder	n_lstm_hidden	400
LSTM	Encoder	n_lstm_layers	3
LSTM	Encoder	lstm_dropout	0.33

Training Parameters

Hyper-Parameter	Value
optimizer	Adam

Choosing the right batch_size (5000) helps a lot.
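
A value as large as 5000 only makes sense for a training set of 8151 sentences if the batch size is counted in tokens rather than sentences. That is an assumption here, not something these notes state. Under that assumption, a minimal sketch of token-based batching could look like the following; `batch_by_tokens` is a hypothetical helper, not part of any library.

```python
def batch_by_tokens(sentences, batch_size=5000):
    """Group sentences so that each batch holds at most `batch_size` tokens.

    Assumption: batch_size counts tokens, not sentences.
    A single sentence longer than batch_size still gets its own batch.
    """
    batches, current, n_tokens = [], [], 0
    # Sorting by length keeps similarly sized sentences together,
    # which reduces padding inside each batch.
    for sent in sorted(sentences, key=len):
        if current and n_tokens + len(sent) > batch_size:
            batches.append(current)
            current, n_tokens = [], 0
        current.append(sent)
        n_tokens += len(sent)
    if current:
        batches.append(current)
    return batches


# Toy example with a tiny token budget of 5 tokens per batch.
toy = [["a"] * 3, ["b"] * 2, ["c"] * 4]
toy_batches = batch_by_tokens(toy, batch_size=5)
```

With a token budget, batches of short sentences contain many sentences and batches of long sentences contain few, so memory use per step stays roughly constant.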

Logging with wandb is very handy: we can watch the logs and loss curves with nearly zero setup.
