Workflow for continuing pretraining from an existing pretrained model with my own data #9

Open
652994331 opened this issue Nov 17, 2020 · 0 comments

Comments

@652994331

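Hello, could you describe the workflow for continuing pretraining on my own data, starting from an existing model such as the Chinese pretrained model from HIT (Harbin Institute of Technology)? Going by my own understanding, I ran prepare_lm_data_ngrams.py, which produced many electra_file_x.json and electra_file_x_metrics.json files under data-dir/corpus/train. The electra_file_x files are empty, and the metrics files show example:0 max_sequence_length:128. P.S. My raw pretraining files are in txt format, placed in the data-dir/corpus/train directory, and follow the BERT pretraining data format. Thanks.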
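
To help rule out a problem with the input, the raw txt files can be checked against the usual BERT pretraining layout (one sentence per line, a blank line between documents). This is a minimal sketch under that assumption; `summarize_bert_corpus` and `summarize_directory` are hypothetical helper names, not part of this repository, and the code does not reproduce what prepare_lm_data_ngrams.py does internally.

```python
from pathlib import Path


def summarize_bert_corpus(text):
    """Count documents and sentences in BERT-style pretraining text.

    Assumes one sentence per line and blank lines between documents.
    """
    documents = []
    current = []
    for line in text.splitlines():
        sentence = line.strip()
        if sentence:
            # Non-empty line: one sentence of the current document.
            current.append(sentence)
        elif current:
            # Blank line: close the current document, if any.
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return len(documents), sum(len(doc) for doc in documents)


def summarize_directory(corpus_dir):
    """Return {file name: (documents, sentences)} for each .txt file."""
    return {
        path.name: summarize_bert_corpus(path.read_text(encoding="utf-8"))
        for path in sorted(Path(corpus_dir).glob("*.txt"))
    }


if __name__ == "__main__":
    # The directory below is the one mentioned in this issue.
    for name, (docs, sents) in summarize_directory("data-dir/corpus/train").items():
        print(f"{name}: {docs} documents, {sents} sentences")
```

If every file reports zero documents, or no txt files are listed at all, the input files or their location would be the first thing to look at; non-zero counts would suggest the cause lies elsewhere.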
