The paper includes both a maximization step and a minimization step, and contains two loss functions. So do we have to run distillation.py twice?
What does the Quick start in the README try to do?
And could you please provide the pseudo-training samples?
When I run it, I get an error: --config: command not found. But it's already written in the code, so why does this error occur?
It also prints the following paragraph, and I don't know whether I need to do anything about it:
"Some weights of the model checkpoint at bert-base-uncased were not used when initializing Bert_For_Att_output_MLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
This IS expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)."
I'm sorry for not getting back to you sooner.
Here are the responses:
In distillation.py, we maximize and minimize simultaneously; "policy_loss" represents the maximization loss. Thus, you don't have to run distillation.py twice.
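As a rough sketch of what "simultaneously" means here (toy modules and a placeholder policy term, not the repository's actual models or losses), both objectives can be folded into a single backward pass; negating the term that should be maximized is one common way to do this, though whether distillation.py combines its two losses exactly this way is an assumption:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the student/teacher; the real repo uses BERT models.
student = torch.nn.Linear(16, 16)
teacher = torch.nn.Linear(16, 16)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def training_step(batch):
    """One joint update: the distillation term is minimized while the
    "policy" term is maximized (folded in here by negating it)."""
    student_out = student(batch)
    with torch.no_grad():
        teacher_out = teacher(batch)

    distill_loss = F.mse_loss(student_out, teacher_out)  # minimized
    policy_loss = student_out.pow(2).mean()              # placeholder term to be maximized

    loss = distill_loss - policy_loss  # one backward pass covers both objectives
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

training_step(torch.randn(4, 16))
```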
First, run preprocess.py to preprocess your own pre-training corpus (such as Wikipedia), then run distillation.py.
We have already uploaded dummy datasets to this repository: the "./data/dummy.pickle" file is the preprocessed dataset built from the raw data in "./raw_data/dummy.txt".
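If you want to peek at what preprocessing produces, the dummy file can be loaded like any ordinary pickle (this assumes nothing custom is needed to deserialize it; the exact structure of the stored contents is determined by preprocess.py):

```python
import pickle

# Quick sanity check of the preprocessed dummy dataset shipped with the repo.
with open("./data/dummy.pickle", "rb") as f:
    data = pickle.load(f)

print(type(data))
if hasattr(data, "__len__"):
    print(len(data), "items")
```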
I'm not sure, but it could be a Hugging Face version issue. Please install transformers==4.2.2 and try again. As for the checkpoint warning message, you can ignore it: we don't use ['cls.seq_relationship.weight', 'cls.seq_relationship.bias'] in our training, so it should have no effect on running our code.
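A quick way to confirm your environment matches that pin before launching the script (the pip command in the comment is the standard way to install a specific release):

```python
# Check the installed Hugging Face Transformers version.
# If it differs, reinstall with: pip install transformers==4.2.2
import transformers

assert transformers.__version__ == "4.2.2", (
    f"Expected transformers 4.2.2, found {transformers.__version__}"
)
```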