Update from pytorch-transformers to transformers library #61
Thanks! Did you do anything to verify that this doesn't affect metrics or otherwise break anything?
I ran it on a smaller subset of the dataset and ran the interact script, and it seemed to be fine. I'm currently training it on the full dataset. It'll take a while on my VM, but I'll report back once that finishes and I run the ConvAI2 evaluation scripts!
@sshleifer I trained with the parameters:
Do we need anything else to verify metrics?
Are those metrics ~equivalent to those in the preview here? Sorry I'm being so lazy, mostly focused on main repo :)
No worries, the TensorBoard metrics are pretty close:
results from my PR
👍
@acarrera94 were you able to test these changes with
When I try to interact with a model trained using
I think the
Good question. I didn’t have enough memory to train GPT-2 on my VM, but if you have a checkpoint I can download, I’d be happy to take a look at it.
@acarrera94 you don't need to be able to train GPT-2 to figure out that this is a bug. Do something like this in your Python interpreter:
Clearly, the logits is a 3-dimensional tensor for

cc @sshleifer since I think I remember you mentioning in one of your past pull requests that you were unable to interact with GPT-2 in the code as it currently stands -- UPDATE: I found it, this was the one where you mentioned this: #29

also cc @KasparPeterson since you were the one who introduced this fix in the first place in this pull request: #6

I think some changes were made to
- Updated dependencies and imports to use `transformers`.
- Updated `ignore_index` and ignored values in tensors to be -100, same as the default in `transformers` and `pytorch`.
- Updated checkpoint save to use correct path #60