
# 1605 BigVocab Experiments

Changing the concept of the vocabulary: rather than building a trainable embedding matrix from all words in the training set, build it only from the top 100 tokens of the training set, and substitute non-trainable GloVe embeddings for all other words (including words that never appear in the training set). See also

https://github.com/brmson/dataset-sts/issues/20

This was mainly motivated by the observation that Argus HypEv works much better when (erroneously) using a vocabulary built on a different split than the current training set.
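
The scheme boils down to two lookup paths, as in the minimal sketch below. This is illustrative Python only, not the actual pysts/dataset-sts code; `build_embeddings`, `embed_token` and all variable names are made up for the example. The top-k training tokens index into a small trainable matrix, while every other word falls back to a frozen GloVe vector.

```python
import numpy as np
from collections import Counter

def build_embeddings(train_tokens, glove, dim=300, top_k=100):
    """train_tokens: iterable of training-set tokens;
    glove: dict mapping word -> np.ndarray of shape (dim,), assumed preloaded."""
    counts = Counter(train_tokens)
    vocab = [w for w, _ in counts.most_common(top_k)]
    word2idx = {w: i for i, w in enumerate(vocab)}

    # Small trainable matrix: one row per top-k token, initialised from
    # GloVe where available and randomly otherwise.
    trainable = np.random.normal(scale=0.1, size=(len(vocab), dim))
    for w, i in word2idx.items():
        if w in glove:
            trainable[i] = glove[w]
    return word2idx, trainable

def embed_token(w, word2idx, trainable, glove, dim=300):
    # Trainable row for top-k tokens; frozen GloVe vector for everything
    # else (even words never seen in training); zeros as a last resort.
    if w in word2idx:
        return trainable[word2idx[w]]
    return glove.get(w, np.zeros(dim))
```

In the experiment names below, EP100 corresponds to the setup described above, with EP20 and EP1000 varying the pruning size.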

## HypEv Experiments

### Argus

Before:

| Model                    | trn QAcc  | val QAcc  | val QF1   | tst QAcc  | tst QF1   | settings
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| avg                      | 0.931244  | 0.797530  | 0.728479  | 0.731408  | 0.649600  | (defaults)
|                          | ±0.012570 | ±0.006695 | ±0.012416 | ±0.007907 | ±0.013410 |
| DAN                      | 0.949085  | 0.827096  | 0.750504  | 0.742484  | 0.666239  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5
|                          | ±0.013475 | ±0.015297 | ±0.028354 | ±0.008980 | ±0.018475 |
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| rnn                      | 0.901008  | 0.854416  | 0.782354  | 0.798259  | 0.742293  | (defaults)
|                          | ±0.018453 | ±0.009075 | ±0.015912 | ±0.011856 | ±0.018040 |
| cnn                      | 0.902398  | 0.857410  | 0.791902  | 0.796677  | 0.741328  | (defaults)
|                          | ±0.019215 | ±0.005197 | ±0.009707 | ±0.010855 | ±0.019413 |
| rnncnn                   | 0.915025  | 0.852171  | 0.782774  | 0.779668  | 0.708510  | (defaults)
|                          | ±0.023084 | ±0.009620 | ±0.016334 | ±0.014759 | ±0.022262 |
| attn1511                 | 0.853626  | 0.842066  | 0.772648  | 0.812500  | 0.770903  | sdim=2
|                          | ±0.010105 | ±0.006757 | ±0.011771 | ±0.008588 | ±0.017540 |

After:

| Model                    | trn QAcc  | val QAcc  | val QF1   | tst QAcc  | tst QF1   | settings
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| avg                      | 0.626815  | 0.670659  | nan       | 0.621308  | nan       | (defaults)
|                          | ±0.024750 | ±0.020524 | ±nan      | ±0.026126 | ±nan      |
| DAN                      | 0.913809  | 0.848303  | 0.787701  | 0.799578  | 0.754505  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5
|                          | ±0.020632 | ±0.012912 | ±0.028049 | ±0.019803 | ±0.029174 |
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| rnn                      | 0.930491  | 0.859281  | 0.793455  | 0.806962  | 0.763644  | (defaults)
|                          | ±0.044004 | ±0.007907 | ±0.009231 | ±0.020562 | ±0.030651 |
| cnn                      | 0.920297  | 0.862275  | 0.801795  | 0.819620  | 0.763181  | (defaults)
|                          | ±0.030030 | ±0.017017 | ±0.033116 | ±0.024479 | ±0.061008 |
| rnncnn                   | 0.922768  | 0.861277  | 0.804602  | 0.812236  | 0.765567  | (defaults)
|                          | ±0.040065 | ±0.009881 | ±0.017815 | ±0.014686 | ±0.025850 |
| attn1511                 | 0.869632  | 0.841317  | 0.787862  | 0.812236  | 0.777503  | (defaults)
|                          | ±0.011013 | ±0.009426 | ±0.015477 | ±0.009129 | ±0.022301 |

There is a slight improvement, though it does not match what we observed with the original (erroneous) vocabulary change.

Pruning size:

6x R_rg_2a51BV_EP100_mask - 0.836327 (95% [0.827690, 0.844964]):

6x R_rg_2a51BV_EP1000_mask - 0.818363 (95% [0.799982, 0.836744]):

11290398.arien.ics.muni.cz.R_rg_2a51BV_EP1000_mask etc.
[0.838323, 0.802395, 0.844311, 0.814371, 0.796407, 0.814371, ]

6x R_rg_2a51BV_EP20_mask - 0.836327 (95% [0.833365, 0.839289]):

11290400.arien.ics.muni.cz.R_rg_2a51BV_EP20_mask etc.
[0.838323, 0.838323, 0.838323, 0.838323, 0.832335, 0.832335, ]

No effect.
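
For reference, the per-run lists above (and in the STS section below) are summarised as a mean with a 95% interval. A minimal sketch of such a summary, assuming a Student-t confidence interval of the mean; the exact formula used by the dataset-sts evaluation tools may differ slightly:

```python
import numpy as np
from scipy import stats

# Per-run test accuracies of R_rg_2a51BV_EP20_mask, copied from above.
runs = [0.838323, 0.838323, 0.838323, 0.838323, 0.832335, 0.832335]

mean = np.mean(runs)
sem = stats.sem(runs)  # standard error of the mean (ddof=1)
lo, hi = stats.t.interval(0.95, len(runs) - 1, loc=mean, scale=sem)
print('%dx - %f (95%% [%f, %f])' % (len(runs), mean, lo, hi))
```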

### Other

Other experiments with BV_EP100 on HypEv are documented in 1605EightGrade.

## STS Experiments

The popular sanity check:

Baseline R_ss_2rnncnn val 0.705950 ±0.005099.

16x R_ss_2rnncnnBV_EP100 - 0.703722 (95% [0.699193, 0.708251]):

11297565.arien.ics.muni.cz.R_ss_2rnncnnBV_EP100 etc.
[0.714254, 0.701655, 0.705811, 0.696601, 0.694212, 0.695169, 0.701795, 0.706860, 0.703687, 0.711265, 0.698890, 0.694599, 0.718736, 0.716835, 0.689257, 0.709924, ]