Plan to publish the paper? #3
Umm, I don't mind a paper, but I'm wondering: is this paper-worthy? Also, if we can write a paper, how would we go about it?
Maybe it lacks a little creativity.
Yes, and num_train_epochs is 2.
I really thought about coming back to this and improving it further, but I became a bit lazy.
Regarding tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws"): I don't see the code that saves the tokenizer?
Yeah, it's the same as the t5-base tokenizer.
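For anyone who wants to persist the tokenizer alongside the model themselves, here is a minimal sketch using the standard save_pretrained API; the output directory name is illustrative, not from the repo:

from transformers import AutoTokenizer, T5ForConditionalGeneration

# the paraphrase model reuses the stock t5-base tokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("Vamsi/T5_Paraphrase_Paws")

# saving both into the same directory keeps them loadable as a pair;
# "paraphrase_model" is a hypothetical output path
model.save_pretrained("paraphrase_model")
tokenizer.save_pretrained("paraphrase_model")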
If you are using a different dataset, you have to change the path to it in the T5FineTuner class, in the train_dataloader and val_dataloader methods.
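As a rough illustration of where that path lives, here is a hedged sketch of what those methods typically look like in a PyTorch Lightning fine-tuner. T5FineTuner's real internals may differ, and ParaphraseDataset and the data paths below are hypothetical:

import pytorch_lightning as pl
from torch.utils.data import DataLoader

class T5FineTuner(pl.LightningModule):
    # ... __init__, training_step, etc. omitted ...

    def train_dataloader(self):
        # the dataset path is hard-coded here -- point it at your own file
        dataset = ParaphraseDataset(self.tokenizer, data_path="data/train.csv")  # hypothetical names
        return DataLoader(dataset, batch_size=self.hparams.train_batch_size, shuffle=True)

    def val_dataloader(self):
        # same for the validation split
        dataset = ParaphraseDataset(self.tokenizer, data_path="data/val.csv")  # hypothetical names
        return DataLoader(dataset, batch_size=self.hparams.eval_batch_size)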
No, I don't think that's an issue. It should be self only.
Let me figure this out. Right now I'm working on another project, so it'll take some time for me to get back into this.
Are you sure it is base? When I use the t5-base tokenizer I get an error; when I use the t5-small tokenizer it works fine.
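For what it's worth, t5-small and t5-base ship the same SentencePiece vocabulary, so a base-vs-small difference would be surprising. One quick sanity check is to compare each tokenizer's size against the model's embedding table; a sketch using only stock transformers APIs:

from transformers import T5Tokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("Vamsi/T5_Paraphrase_Paws")

for name in ("t5-small", "t5-base"):
    tok = T5Tokenizer.from_pretrained(name)
    # every token id the tokenizer can emit must fit inside the embedding table
    print(name, "tokenizer size:", len(tok),
          "model embeddings:", model.get_input_embeddings().num_embeddings)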
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # t5-base works
model = T5ForConditionalGeneration.from_pretrained("Vamsi/T5_Paraphrase_Paws")

sentence = "This is something which i cannot understand at all"
text = "paraphrase: " + sentence  # task prefix the model was fine-tuned with

encoding = tokenizer(text, padding=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"], encoding["attention_mask"]

# sample 5 paraphrases with top-k / top-p sampling
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5,
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
This is great work applying a pre-trained model to the paraphrasing task!
Do you have plans to publish a paper?