
Update from pytorch-transformers to transformers library #61

Merged: 3 commits into huggingface:master, Feb 27, 2020

Conversation

@andr-ec (Contributor) commented Feb 21, 2020

updated dependencies and imports to use transformers.
updated ignore_index and ignored values in tensors to -100, the same as the default in transformers and PyTorch (see the sketch below).
updated checkpoint saving to use the correct path #60
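
For context on the -100 convention: PyTorch's cross-entropy loss skips target positions equal to ignore_index, whose default is -100, and transformers uses the same sentinel for masked label positions. A minimal sketch with hypothetical shapes (not the repository's code):

import torch
import torch.nn.functional as F

# hypothetical shapes for illustration: batch 1, sequence length 4, vocab size 10
lm_logits = torch.randn(1, 4, 10)
lm_labels = torch.tensor([[5, -100, -100, 7]])  # -100 marks positions to skip

# cross_entropy ignores targets equal to ignore_index; -100 is also its default
loss = F.cross_entropy(lm_logits.view(-1, 10), lm_labels.view(-1), ignore_index=-100)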

@andr-ec requested a review from sshleifer, February 21, 2020 18:57
@sshleifer (Contributor)

Thanks! Did you do anything to verify that this doesn't affect metrics or otherwise break anything?

@andr-ec (Contributor, Author) commented Feb 21, 2020

I ran it on a smaller subset of the dataset and ran the interact script; everything seemed fine. I'm currently training on the full dataset. It'll take a while on my VM, but I'll report back once that finishes and I run the ConvAI2 evaluation scripts!

@andr-ec (Contributor, Author) commented Feb 22, 2020

@sshleifer I trained with the parameters:
--model="openai-gpt" --device="cuda" --n_epochs=1
The evaluation script finished with:

[ Finished evaluating tasks ['convai2:self'] using datatype valid ]
{'exs': 7801, 'hits@1': 0.75, 'hits@5': 0.961, 'hits@10': 0.992, 'hits@100': 1.0}
============================
FINAL Hits@1: 0.75

@andr-ec (Contributor, Author) commented Feb 24, 2020

Do we need anything else to verify the metrics?

@sshleifer (Contributor) commented Feb 25, 2020

Are those metrics ~equivalent to those in the preview in #30?

Sorry I'm being so lazy, mostly focused on main repo :)

@andr-ec (Contributor, Author) commented Feb 27, 2020

No worries, the TensorBoard metrics are pretty close.

Results from your PR (#30):

{'accuracy': 0.7466991411357519,
'average_accuracy': 0.7466991411357519,
'average_nll': 2.6821035040007972,
'average_ppl': 14.615805388160778,
'nll': 2.6821035040007972}

Results from my PR:

{'accuracy': 0.7496474981307983,
'average_accuracy': 0.7496474981307983,
'average_nll': 2.6389272212982178,
'average_ppl': 13.99817943572998,
'nll': 2.6389272212982178}
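
As a quick consistency check, average_ppl is just exp(average_nll), so both result sets are internally consistent:

import math

print(math.exp(2.6821035040007972))  # 14.615805388160778, the average_ppl above
print(math.exp(2.6389272212982178))  # 13.99817943572998, the average_ppl above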

@sshleifer merged commit 16074b2 into huggingface:master on Feb 27, 2020
@julien-c (Member)

👍

@g-karthik

@acarrera94 were you able to test these changes with gpt2? Specifically, did you try testing interact.py with gpt2?

@g-karthik commented Feb 27, 2020

When I try to interact with a model trained using gpt2, I get the following error:

    logits = logits[0, -1, :] / args.temperature
IndexError: too many indices for tensor of dimension 2

I think the logits = logits[0] fix is not applicable with the latest transformers; it's an artifact of pytorch_pretrained_bert and can be removed.

@acarrera94 @julien-c @thomwolf
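
To make the shape issue concrete, a minimal sketch of slicing last-token logits from a transformers GPT-2 output (variable names and the temperature value are illustrative stand-ins, not the repository's interact.py):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = torch.tensor([tokenizer.encode("hello how are you")])  # shape [1, seq_len]
with torch.no_grad():
    outputs = model(input_ids)  # a tuple; outputs[0] holds the LM logits
logits = outputs[0]             # shape [1, seq_len, vocab_size], i.e. already 3-D
next_token_logits = logits[0, -1, :] / 0.7  # the last-token slice works directly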

@andr-ec (Contributor, Author) commented Feb 27, 2020

Good question. I didn't have enough memory to train gpt2 on my VM, but if you have a checkpoint I can download, I'd be happy to take a look at it.

@g-karthik commented Feb 27, 2020

@acarrera94 you don't need to be able to train GPT-2 to figure out that this is a bug.

Do something like this in your Python interpreter:

>>> import pytorch_pretrained_bert
>>> from pytorch_pretrained_bert import GPT2LMHeadModel, GPT2Tokenizer
>>> print(pytorch_pretrained_bert.__version__)
0.6.2
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
>>> input = "hello how are you"
>>> input_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(input))
>>> import torch
>>> input_ids = torch.tensor(input_ids)
>>> input_ids = input_ids.view(1,-1)
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])
>>> import transformers
>>> from transformers import GPT2LMHeadModel
>>> print(transformers.__version__)
2.3.0
>>> model = GPT2LMHeadModel.from_pretrained("gpt2")
>>> outputs = model(input_ids)
>>> outputs[0].shape
torch.Size([1, 4, 50257])

Clearly, the logits tensor is 3-dimensional for gpt2 in both library versions. So the logits = logits[0] fix won't work for gpt2, even with pytorch_pretrained_bert 0.6.2.

cc @sshleifer since I remember you mentioning in one of your past pull requests that you were unable to interact with GPT-2 in the code as it currently stands -- UPDATE: I found it, this was the one where you mentioned it: #29

also cc @KasparPeterson since you were the one who introduced this fix in the first place in this pull request: #6

I think some changes were made to pytorch_pretrained_bert 0.6.2 after @KasparPeterson introduced this fix, rendering it unnecessary. Presumably the logits were a 4-dimensional tensor for gpt2 at the time, which is what made the fix necessary. @thomwolf can confirm, since he approved Kaspar's pull request (#6).

gorkemgoknar pushed a commit to gorkemgoknar/transfer-learning-conv-ai that referenced this pull request on Dec 21, 2020:

* updated dependencies, updated ignore_index and ignored values in tensors
* removed idea project files
* set transformers library version, updated additional special tokens to list