Regarding the BiLSTM baseline model stated in the PAWS paper #3
Hi, sorry for the delay. Could you please specify which number in the paper you would like to compare to, and whether you got a lower or a higher accuracy number? Regarding our model architecture, it's a standard BiLSTM with dropout = 0.2, hidden size = 256, activation = ReLU, using the first/last state vector of the forward/backward LSTM, and GloVe embeddings. What's your model configuration?
I am currently using a self-trained embedding, a BiLSTM, the last state vector, concatenation, and a dense layer as the last layer. If what you stated is the case, where does cosine similarity come in? I am comparing my model against what's stated on page 8 of the paper, where the BiLSTM achieved 86.3 accuracy and 91.6 AUC.
Just to be more precise, we take the state at the last token for the forward LSTM, and the state at the first token for the backward LSTM. We concatenate the two states and add a dense layer to project them to the required dimension (256).
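The architecture described in the two comments above (dropout 0.2, hidden size 256, ReLU, last-token forward state and first-token backward state concatenated, then a dense projection) could be sketched roughly as follows. This is an illustrative PyTorch sketch, not the authors' actual code; the class name, vocabulary handling, and the randomly initialized embedding table (a stand-in for pretrained GloVe vectors) are all assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Sketch of a BiLSTM sentence encoder with the configuration
    described above. Illustrative only, not the paper's implementation."""

    def __init__(self, vocab_size, embed_dim=300, hidden=256):
        super().__init__()
        self.hidden = hidden
        # Stand-in for pretrained GloVe vectors (normally loaded from file).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.dropout = nn.Dropout(0.2)
        self.lstm = nn.LSTM(embed_dim, hidden, bidirectional=True,
                            batch_first=True)
        # Dense layer projecting the concatenated states to 256 dims.
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, token_ids):
        x = self.dropout(self.embedding(token_ids))
        out, _ = self.lstm(x)                  # (batch, seq, 2 * hidden)
        fwd_last = out[:, -1, :self.hidden]    # forward LSTM, last token
        bwd_first = out[:, 0, self.hidden:]    # backward LSTM, first token
        sent = torch.cat([fwd_last, bwd_first], dim=-1)
        return torch.relu(self.proj(sent))     # ReLU, output dim 256
```

Note that dropout is applied to the embeddings here, since `nn.LSTM`'s own `dropout` argument only takes effect between stacked layers; where exactly the paper's model applies dropout is not specified in this thread.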
Thanks!
If I read the PAWS paper correctly, it states that BiLSTM + cosine similarity is one of the baseline models used to evaluate the PAWS dataset. I tried to reproduce the experiment with a BiLSTM + cosine similarity model I designed, but my accuracy is still quite far from the number reported in the paper. Is there somewhere I can see how you defined the BiLSTM + cosine similarity model? It would be really helpful for my current study on paraphrase identification. Thanks in advance!
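For reference, the cosine-similarity head asked about here would, under a common formulation, simply score the two encoded sentence vectors against each other. A minimal sketch (the function name and the idea of a tuned decision threshold are assumptions, not details from the paper):

```python
import numpy as np

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two sentence vectors, as a
    BiLSTM + cosine-similarity paraphrase baseline might score a
    pair (sketch; not the paper's actual code)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

# Identical vectors score ~1.0; orthogonal vectors score ~0.0.
# A pair would typically be labeled a paraphrase when the score
# exceeds a threshold tuned on development data.
```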