
# 1605 BigVocab Experiments

Changing the concept of the vocabulary: rather than building a trainable embedding matrix from all words in the training set, build it only from the top 100 tokens of the training set, and substitute non-trainable GloVe embeddings for all other words (including words that never appear in the training set). See also

https://github.com/brmson/dataset-sts/issues/20

This was mainly motivated by the observation that Argus HypEv works much better when (erroneously) using a vocabulary built on a different split than the current training set.
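
The scheme boils down to two lookup paths, as in the minimal sketch below. This is illustrative Python only, not the actual pysts/dataset-sts code; `build_embeddings`, `embed_token` and all variable names are made up for the example. The top-k training tokens index into a small trainable matrix, while every other word falls back to a frozen GloVe vector.

```python
import numpy as np
from collections import Counter

def build_embeddings(train_tokens, glove, dim=300, top_k=100):
    """train_tokens: iterable of training-set tokens;
    glove: dict mapping word -> np.ndarray of shape (dim,), assumed preloaded."""
    counts = Counter(train_tokens)
    vocab = [w for w, _ in counts.most_common(top_k)]
    word2idx = {w: i for i, w in enumerate(vocab)}

    # Small trainable matrix: one row per top-k token, initialised from
    # GloVe where available and randomly otherwise.
    trainable = np.random.normal(scale=0.1, size=(len(vocab), dim))
    for w, i in word2idx.items():
        if w in glove:
            trainable[i] = glove[w]
    return word2idx, trainable

def embed_token(w, word2idx, trainable, glove, dim=300):
    # Trainable row for top-k tokens; frozen GloVe vector for everything
    # else (even words never seen in training); zeros as a last resort.
    if w in word2idx:
        return trainable[word2idx[w]]
    return glove.get(w, np.zeros(dim))
```

In the experiment names below, EP100 corresponds to the setup described above, with EP20 and EP1000 varying the pruning size.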

## HypEv Experiments

### Argus

Before:

| Model                    | trn QAcc  | val QAcc  | val QF1   | tst QAcc  | tst QF1   | settings
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| avg                      | 0.931244  | 0.797530  | 0.728479  | 0.731408  | 0.649600  | (defaults)
|                          | ±0.012570 | ±0.006695 | ±0.012416 | ±0.007907 | ±0.013410 |
| DAN                      | 0.949085  | 0.827096  | 0.750504  | 0.742484  | 0.666239  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5
|                          | ±0.013475 | ±0.015297 | ±0.028354 | ±0.008980 | ±0.018475 |
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| rnn                      | 0.901008  | 0.854416  | 0.782354  | 0.798259  | 0.742293  | (defaults)
|                          | ±0.018453 | ±0.009075 | ±0.015912 | ±0.011856 | ±0.018040 |
| cnn                      | 0.902398  | 0.857410  | 0.791902  | 0.796677  | 0.741328  | (defaults)
|                          | ±0.019215 | ±0.005197 | ±0.009707 | ±0.010855 | ±0.019413 |
| rnncnn                   | 0.915025  | 0.852171  | 0.782774  | 0.779668  | 0.708510  | (defaults)
|                          | ±0.023084 | ±0.009620 | ±0.016334 | ±0.014759 | ±0.022262 |
| attn1511                 | 0.853626  | 0.842066  | 0.772648  | 0.812500  | 0.770903  | sdim=2
|                          | ±0.010105 | ±0.006757 | ±0.011771 | ±0.008588 | ±0.017540 |

After:

| Model                    | trn QAcc  | val QAcc  | val QF1   | tst QAcc  | tst QF1   | settings
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| avg                      | 0.626815  | 0.670659  | nan       | 0.621308  | nan       | (defaults)
|                          | ±0.024750 | ±0.020524 | ±nan      | ±0.026126 | ±nan      |
| DAN                      | 0.913809  | 0.848303  | 0.787701  | 0.799578  | 0.754505  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu' l2reg=1e-5
|                          | ±0.020632 | ±0.012912 | ±0.028049 | ±0.019803 | ±0.029174 |
|--------------------------|-----------|-----------|-----------|-----------|-----------|---------
| rnn                      | 0.930491  | 0.859281  | 0.793455  | 0.806962  | 0.763644  | (defaults)
|                          | ±0.044004 | ±0.007907 | ±0.009231 | ±0.020562 | ±0.030651 |
| cnn                      | 0.920297  | 0.862275  | 0.801795  | 0.819620  | 0.763181  | (defaults)
|                          | ±0.030030 | ±0.017017 | ±0.033116 | ±0.024479 | ±0.061008 |
| rnncnn                   | 0.922768  | 0.861277  | 0.804602  | 0.812236  | 0.765567  | (defaults)
|                          | ±0.040065 | ±0.009881 | ±0.017815 | ±0.014686 | ±0.025850 |
| attn1511                 | 0.869632  | 0.841317  | 0.787862  | 0.812236  | 0.777503  | (defaults)
|                          | ±0.011013 | ±0.009426 | ±0.015477 | ±0.009129 | ±0.022301 |

There is a slight improvement, though it does not match what we observed with the original (erroneous) vocabulary change.

Pruning size:

6x R_rg_2a51BV_EP100_mask - 0.836327 (95% [0.827690, 0.844964]):

6x R_rg_2a51BV_EP1000_mask - 0.818363 (95% [0.799982, 0.836744]):

11290398.arien.ics.muni.cz.R_rg_2a51BV_EP1000_mask etc.
[0.838323, 0.802395, 0.844311, 0.814371, 0.796407, 0.814371, ]

6x R_rg_2a51BV_EP20_mask - 0.836327 (95% [0.833365, 0.839289]):

11290400.arien.ics.muni.cz.R_rg_2a51BV_EP20_mask etc.
[0.838323, 0.838323, 0.838323, 0.838323, 0.832335, 0.832335, ]

No effect.
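
For reference, the per-run lists above (and in the STS section below) are summarised as a mean with a 95% interval. A minimal sketch of such a summary, assuming a Student-t confidence interval of the mean; the exact formula used by the dataset-sts evaluation tools may differ slightly:

```python
import numpy as np
from scipy import stats

# Per-run test accuracies of R_rg_2a51BV_EP20_mask, copied from above.
runs = [0.838323, 0.838323, 0.838323, 0.838323, 0.832335, 0.832335]

mean = np.mean(runs)
sem = stats.sem(runs)  # standard error of the mean (ddof=1)
lo, hi = stats.t.interval(0.95, len(runs) - 1, loc=mean, scale=sem)
print('%dx - %f (95%% [%f, %f])' % (len(runs), mean, lo, hi))
```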

### Other

Other experiments with BV_EP100 on HypEv are documented in 1605EightGrade.

## STS Experiments

The popular sanity check:

Baseline R_ss_2rnncnn val 0.705950 ±0.005099.

16x R_ss_2rnncnnBV_EP100 - 0.703722 (95% [0.699193, 0.708251]):

11297565.arien.ics.muni.cz.R_ss_2rnncnnBV_EP100 etc.
[0.714254, 0.701655, 0.705811, 0.696601, 0.694212, 0.695169, 0.701795, 0.706860, 0.703687, 0.711265, 0.698890, 0.694599, 0.718736, 0.716835, 0.689257, 0.709924, ]