[WIP] Train Vietnamese Dependency Parsing
Vu Anh edited this page Jul 1, 2021 · 3 revisions
VLSP 2020 Dataset
Train: 8151 sentences, Test: 1122 sentences
Input vectors
The input vector is composed of two parts: the word embedding and the CharLSTM word representation vector of
Biaffine Attention Mechanism
Compute the score of a dependency via biaffine attention:
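The page does not spell the formula out, so the following is only a minimal sketch. It assumes the deep biaffine formulation of Dozat and Manning, in which the score of word `j` being the head of word `i` is `s[i, j] = dep[i]ᵀ U head[j] + head[j]ᵀ u`. The names `dep`, `head`, `U`, `u` and the function `biaffine_arc_scores` are illustrative and not taken from this project's code.

```python
import numpy as np

def biaffine_arc_scores(dep, head, U, u):
    """Score every (dependent, head) pair.

    dep, head: (n, d) dependent-role and head-role vectors for n words.
    U: (d, d) bilinear weight; u: (d,) head-only bias weight.
    Returns an (n, n) matrix where scores[i, j] scores word j as head of word i.
    """
    bilinear = dep @ U @ head.T   # (n, n) pairwise bilinear term
    head_bias = head @ u          # (n,) prior of each word being a head
    return bilinear + head_bias[None, :]

# Toy example with random vectors (assumed sizes, not the model's real ones).
rng = np.random.default_rng(0)
n, d = 5, 8
dep = rng.normal(size=(n, d))
head = rng.normal(size=(n, d))
U = rng.normal(size=(d, d))
u = rng.normal(size=d)

scores = biaffine_arc_scores(dep, head, U, u)
predicted_heads = scores.argmax(axis=1)  # greedy head choice per word
```

Greedy `argmax` decoding is used here only for brevity; it does not guarantee a well-formed tree.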
Parameter settings
Model parameters
Component | Hyper-Parameter | Value |
---|---|---|
Embedding (BERT) | n_bert_layers | 4 |
Embedding (BERT) | dimension | 768 |
LSTM (Encoder) | n_lstm_hidden | 400 |
LSTM (Encoder) | n_lstm_layers | 3 |
LSTM (Encoder) | lstm_dropout | 0.33 |
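For reference, the model parameters above could be kept in one place as a small config object. This is a sketch only: the class name `ParserConfig` and the field name `bert_dim` are hypothetical, and the values are copied from the table.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ParserConfig:
    # Embedding (BERT)
    n_bert_layers: int = 4
    bert_dim: int = 768        # listed as "dimension" in the table
    # LSTM (Encoder)
    n_lstm_hidden: int = 400
    n_lstm_layers: int = 3
    lstm_dropout: float = 0.33

config = ParserConfig()
config_dict = asdict(config)   # plain dict, convenient for logging
```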
Training Parameters
Hyper-Parameter | Value |
---|---|
optimizer | Adam |
- Choosing the right batch_size (5000) helps a lot.
- wandb logs are very handy: we can watch logs and loss graphs with nearly zero setup.
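On the batch_size note: the page does not say what the 5000 counts. The sketch below assumes it is a token budget per batch rather than a number of sentences, which is an assumption and may not match the actual training code. The helper `batch_by_tokens` is hypothetical.

```python
from typing import Iterable, Iterator, List

def batch_by_tokens(
    sentences: Iterable[List[str]], max_tokens: int = 5000
) -> Iterator[List[List[str]]]:
    """Group sentences so each batch holds at most max_tokens tokens.

    A single sentence longer than max_tokens still forms its own batch.
    """
    batch: List[List[str]] = []
    n_tokens = 0
    for sent in sentences:
        # Close the current batch if adding this sentence would exceed the budget.
        if batch and n_tokens + len(sent) > max_tokens:
            yield batch
            batch, n_tokens = [], 0
        batch.append(sent)
        n_tokens += len(sent)
    if batch:
        yield batch

# Toy example with a small budget so the grouping is easy to follow.
sentences = [["w"] * n for n in (3, 4, 2, 5, 1)]
batches = list(batch_by_tokens(sentences, max_tokens=7))
batch_lengths = [[len(s) for s in b] for b in batches]
```

With a token budget, batches of short sentences contain more sentences than batches of long ones, so the amount of computation per step stays roughly even.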