
[FEATURE]: Continued pre-training on long text #4974

Open
linchen111 opened this issue Oct 25, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@linchen111
Describe the feature

How should I construct a dataset for continued pre-training on a large amount of long text with colossal-llama2-7B?

@linchen111 linchen111 added the enhancement New feature or request label Oct 25, 2023

@linchen111
Author

May I ask how to construct a dataset when I need to train a large amount of long text on colossal-llama2-7B?

@Orion-Zheng
Contributor

You can refer to LLaMA-2-32k, as well as LongLoRA (released in October 2023) and its Long-Data-Collection and LongAlpaca datasets.
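
For illustration, a common way to prepare such long-text data for continued pre-training is to tokenize each document, concatenate them with an EOS separator, and slice the stream into fixed-length token blocks. The sketch below is only an assumption of how this could look (the `"text"` field name, the 4096-token block size, the output format, and the `hpcai-tech/Colossal-LLaMA-2-7b-base` tokenizer path are illustrative choices, not the repository's official preprocessing pipeline):

```python
# Minimal sketch: pack long documents into fixed-length token blocks for
# continued pre-training. Field names, block size, and paths are assumptions.
import json
from transformers import AutoTokenizer

BLOCK_SIZE = 4096  # assumed target context length
tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/Colossal-LLaMA-2-7b-base")

def pack_documents(jsonl_path: str, out_path: str) -> None:
    """Tokenize each long document, append an EOS token, concatenate,
    then emit fixed-length blocks of input_ids as JSON lines."""
    buffer = []
    with open(jsonl_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            doc = json.loads(line)["text"]  # assumed field name
            ids = tokenizer(doc, add_special_tokens=False)["input_ids"]
            buffer.extend(ids + [tokenizer.eos_token_id])
            # Emit every full block; keep the remainder for the next document.
            while len(buffer) >= BLOCK_SIZE:
                block, buffer = buffer[:BLOCK_SIZE], buffer[BLOCK_SIZE:]
                fout.write(json.dumps({"input_ids": block}) + "\n")

pack_documents("long_text_corpus.jsonl", "packed_pretrain_data.jsonl")
```

Packing like this avoids wasting context on padding, which matters when most documents are much longer than the model's original context window.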

