
Some questions that are not clear when reproducing the paper #1

Open
xbzjsj opened this issue May 5, 2023 · 1 comment

Comments

@xbzjsj

xbzjsj commented May 5, 2023

  • The paper includes both a maximization step and a minimization step, and therefore two loss functions. Does that mean we have to run distillation.py twice?

  • What does the Quick start in the README do?

  • Could you please provide the pseudo-training samples?

  • When I run it, I get the error "--config: command not found". The option is already defined in the code, so why does this error occur?
    The run also prints the following message, and I don't know whether I need to do anything about it:
    "Some weights of the model checkpoint at bert-base-uncased were not used when initializing Bert_For_Att_output_MLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
    This IS expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    This IS NOT expected if you are initializing Bert_For_Att_output_MLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)."

@JunhoKim94
Owner

I'm sorry for not getting back to you sooner.
Here are the responses:

  • In distillation.py, the maximization and minimization objectives are optimized simultaneously; "policy_loss" represents the maximization term. So you don't have to run distillation.py twice (a minimal sketch of this combined update follows below).
  • First, run preprocess.py to preprocess your own pre-training corpus (such as Wikipedia), then run distillation.py.
  • We have already uploaded the dummy datasets to this repository: the "./data/dummy.pickle" file is the preprocessed version of the raw data ("./raw_data/dummy.txt").
  • I'm not sure, but this could be a Hugging Face version issue; please install transformers==4.2.2 and try again. As for the checkpoint warning message, you can safely ignore it: we don't use ['cls.seq_relationship.weight', 'cls.seq_relationship.bias'] in our training, so it has no effect on running our code (a small loading example follows this list).
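
For readers following along, here is a minimal sketch of how such a combined min-max update could look, assuming the model's forward pass returns both loss terms; the `training_step` function and the `max_weight` coefficient are hypothetical and not taken from the repository's distillation.py:

```python
def training_step(model, batch, optimizer, max_weight=1.0):
    """One combined update: minimize the distillation term while maximizing
    the policy term. `max_weight` is a hypothetical balancing coefficient."""
    # Assumed interface: the model returns both loss terms for the batch.
    distill_loss, policy_loss = model(batch)

    # Maximizing policy_loss is equivalent to minimizing its negation, so a
    # single backward pass and optimizer step cover both objectives at once.
    total_loss = distill_loss - max_weight * policy_loss

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```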
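
Regarding the warning text itself, it is the standard Hugging Face message emitted whenever a checkpoint contains weights that the new model class does not use. As a small illustration (using the plain BertModel encoder rather than the repository's Bert_For_Att_output_MLM class), loading bert-base-uncased into any model without the next-sentence-prediction head produces the same benign message:

```python
from transformers import BertModel

# Loading only the encoder from the full pre-trained checkpoint triggers the
# same kind of warning: the next-sentence-prediction weights
# ('cls.seq_relationship.weight' / 'cls.seq_relationship.bias') exist in the
# checkpoint but are simply not used by the new model, which is expected and
# harmless here.
encoder = BertModel.from_pretrained("bert-base-uncased")
```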
