[FEATURE]: Continued pre-training on long texts #4974
Describe the feature

How should I construct a dataset for continued pre-training on a large amount of long text with Colossal-LLaMA-2-7B?

Reply:

You can refer to LLaMA-2-32k, as well as LongLoRA (released in October 2023) and its accompanying datasets, Long-Data-Collection and Long Alpaca. A minimal dataset-construction sketch follows this reply.
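A common way to build such a dataset is to tokenize each long document, concatenate the token streams, and split them into fixed-length blocks (packing). Below is a minimal sketch of that approach using Hugging Face `datasets` and `transformers`; it is not the official Colossal-LLaMA-2 preparation script. The checkpoint name `hpcai-tech/Colossal-LLaMA-2-7b-base`, the input file `long_texts.jsonl` with a `"text"` field, and the 4096-token block size are all assumptions to adjust for your own corpus and training config.

```python
# Minimal sketch: pack long documents into fixed-length blocks for
# continued pre-training. All file names, the checkpoint name, and the
# block size are assumptions, not details taken from this issue.
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_NAME = "hpcai-tech/Colossal-LLaMA-2-7b-base"  # assumed checkpoint
BLOCK_SIZE = 4096  # assumed target context length in tokens

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
raw = load_dataset("json", data_files="long_texts.jsonl", split="train")

def tokenize(batch):
    # Append EOS after each document so boundaries survive concatenation.
    return {
        "input_ids": [
            tokenizer.encode(text) + [tokenizer.eos_token_id]
            for text in batch["text"]
        ]
    }

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

def pack(batch):
    # Concatenate all token streams, then cut into BLOCK_SIZE-sized blocks;
    # the trailing remainder shorter than one block is dropped.
    joined = [tok for ids in batch["input_ids"] for tok in ids]
    total = (len(joined) // BLOCK_SIZE) * BLOCK_SIZE
    blocks = [joined[i : i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]
    # For causal LM pre-training, labels are a copy of the inputs.
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}

packed = tokenized.map(pack, batched=True, remove_columns=tokenized.column_names)
packed.to_json("packed_pretrain_dataset.jsonl")
```

Packing keeps every training sample at the full context length, which is what you want when the goal is to teach the model longer contexts; appending EOS between documents preserves boundaries so the model does not learn spurious cross-document continuations.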