
Question on table 1 in ACL 2021 paper #6

Open · jin8 opened this issue Jul 8, 2021 · 1 comment

jin8 commented Jul 8, 2021

Hi, I have a question about the dataset.
Section 4.1 (Setups) mentions that in the unsupervised experiment setting, unlabeled texts from STS12-16 + STSb + SICK-R are used for training.
I have looked through the dataset files, and the number of unlabeled samples I found for STS16 (the last row) does not match your Table 1: I found 8002.
I could not find any mention in the paper of expanding the dataset by concatenating it with itself, but the numbers make sense if I double the labeled (train/valid/test) samples and the unlabeled samples.
Did you double the size of the train/valid/test samples? And does performance suffer if the data is not doubled?

Thank you,
Jin

yym6472 (Owner) commented Jul 8, 2021

Hi, I'm not sure where you got your dataset files, but we obtained all STS datasets through the SentEval toolkit; you can also download them with the script data/get_transfer_data.bash.
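For reference, a quick sanity check after running that script (a sketch only; the directory below matches SentEval's usual download layout, but adjust it to your checkout):

```python
import glob
import os

# Illustrative path: where SentEval's download script typically places STS16.
STS16_DIR = "data/downstream/STS/STS16-en-test"

# List the sentence-pair (STS.input.*) and gold-score (STS.gs.*) files to
# confirm the download before counting samples.
for path in sorted(glob.glob(os.path.join(STS16_DIR, "*.txt"))):
    print(path)
```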

For STS15 and STS16, we noticed that the original datasets contain many unannotated sentence pairs (i.e., the paired texts are provided but the similarity score is missing). In the unsupervised experiments we also use the unlabeled texts from those unannotated samples to fine-tune our model (I suspect this is why your numbers don't match). For STS16, there are 9183 text pairs in total, yielding 2 * 9183 = 18366 unlabeled texts (each pair contains two sentences). However, only 1379 of those pairs are annotated, and we use those 1379 samples to test the trained model.
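To make the counting concrete, here is a minimal sketch of how those numbers arise. It assumes SentEval's STS16 file layout, where each line of an STS.input.*.txt file is a tab-separated sentence pair and the matching line of the STS.gs.*.txt file holds the similarity score (blank when the pair is unannotated); the path is illustrative:

```python
import glob
import os

STS16_DIR = "data/downstream/STS/STS16-en-test"  # illustrative path

total_pairs = 0
annotated_pairs = 0
for input_path in sorted(glob.glob(os.path.join(STS16_DIR, "STS.input.*.txt"))):
    gs_path = input_path.replace("STS.input.", "STS.gs.")
    with open(input_path, encoding="utf-8") as pairs, open(gs_path, encoding="utf-8") as scores:
        for _pair, score in zip(pairs, scores):
            total_pairs += 1
            if score.strip():  # a non-empty gold line means the pair is annotated
                annotated_pairs += 1

# Every pair contributes two sentences to the unsupervised training pool.
print("unlabeled texts:", 2 * total_pairs)                     # expected: 18366
print("annotated pairs (used for testing):", annotated_pairs)  # expected: 1379
```

Under those assumptions, the counts should come out to 18366 unlabeled texts and 1379 annotated test pairs for STS16.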

We did not run experiments that sample only one sentence from each text pair, but I would guess the results would not suffer much. As the few-shot experiments show (Figure 6), performance remains comparable when training on 10000 texts (about 11% of the full dataset).
