
Question on table 1 in ACL 2021 paper #6

Open · jin8 opened this issue Jul 8, 2021 · 1 comment

jin8 commented Jul 8, 2021

Hi, I have a question about the dataset.
Section 4.1 (Setups) mentions that in the unsupervised experiment setting, unlabeled texts from STS12-16 + STSb + SICK-R are used for training.
I have looked through the dataset files, and the number of unlabeled samples I found for STS16 (the last row) does not match your Table 1: I found 8002.
I could not find any mention in the paper of expanding the dataset by concatenating it with itself, but the numbers make sense if I double the labeled (train/valid/test) samples and the unlabeled samples.
Did you double the size of the train/valid/test samples? And does performance suffer if the data is not doubled?

Thank you,
Jin

yym6472 (Owner) commented Jul 8, 2021

Hi, I'm not sure where you got your dataset files, but we obtained all STS datasets through the SentEval toolkit; you can also download them with the script data/get_transfer_data.bash.
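For reference, a quick sanity check after running that script (a sketch only; the directory below matches SentEval's usual download layout, but adjust it to your checkout):

```python
import glob
import os

# Illustrative path: where SentEval's download script typically places STS16.
STS16_DIR = "data/downstream/STS/STS16-en-test"

# List the sentence-pair (STS.input.*) and gold-score (STS.gs.*) files to
# confirm the download before counting samples.
for path in sorted(glob.glob(os.path.join(STS16_DIR, "*.txt"))):
    print(path)
```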

For STS15 and STS16, we noticed that the original datasets contain many unannotated sentence pairs (i.e., the paired texts are provided but the similarity score is missing). In the unsupervised experiments we also use the unlabeled texts from those unannotated samples to fine-tune our model (I suspect this is why your numbers don't match). For STS16, there are 9183 text pairs in total, yielding 2 * 9183 = 18366 unlabeled texts (each pair contains two sentences). However, only 1379 of those pairs are annotated, and we use those 1379 samples to test the trained model.
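To make the counting concrete, here is a minimal sketch of how those numbers arise. It assumes SentEval's STS16 file layout, where each line of an STS.input.*.txt file is a tab-separated sentence pair and the matching line of the STS.gs.*.txt file holds the similarity score (blank when the pair is unannotated); the path is illustrative:

```python
import glob
import os

STS16_DIR = "data/downstream/STS/STS16-en-test"  # illustrative path

total_pairs = 0
annotated_pairs = 0
for input_path in sorted(glob.glob(os.path.join(STS16_DIR, "STS.input.*.txt"))):
    gs_path = input_path.replace("STS.input.", "STS.gs.")
    with open(input_path, encoding="utf-8") as pairs, open(gs_path, encoding="utf-8") as scores:
        for _pair, score in zip(pairs, scores):
            total_pairs += 1
            if score.strip():  # a non-empty gold line means the pair is annotated
                annotated_pairs += 1

# Every pair contributes two sentences to the unsupervised training pool.
print("unlabeled texts:", 2 * total_pairs)                     # expected: 18366
print("annotated pairs (used for testing):", annotated_pairs)  # expected: 1379
```

Under those assumptions, the counts should come out to 18366 unlabeled texts and 1379 annotated test pairs for STS16.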

We did not run experiments that sample only one sentence from each text pair, but I would guess the results would not suffer much. As the few-shot experiments show (Figure 6), performance remains comparable when training on 10000 texts (about 11% of the full dataset).
