
[FEATURE]: Continued pre-training on long text #4974

Open
linchen111 opened this issue Oct 25, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@linchen111
Describe the feature

How should I construct a dataset for continued pre-training on a large amount of long text with colossal-llama2-7B?

@linchen111 linchen111 added the enhancement New feature or request label Oct 25, 2023

@linchen111
Author

May I ask how to construct a dataset when I need to train a large amount of long text on colossal-llama2-7B?

@Orion-Zheng
Contributor

You can refer to LLaMA-2-32k, as well as LongLoRA (released in October 2023) and its Long-Data-Collection and LongAlpaca datasets.
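
For illustration, a common way to prepare such long-text data for continued pre-training is to tokenize each document, concatenate them with an EOS separator, and slice the stream into fixed-length token blocks. The sketch below is only an assumption of how this could look (the `"text"` field name, the 4096-token block size, the output format, and the `hpcai-tech/Colossal-LLaMA-2-7b-base` tokenizer path are illustrative choices, not the repository's official preprocessing pipeline):

```python
# Minimal sketch: pack long documents into fixed-length token blocks for
# continued pre-training. Field names, block size, and paths are assumptions.
import json
from transformers import AutoTokenizer

BLOCK_SIZE = 4096  # assumed target context length
tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/Colossal-LLaMA-2-7b-base")

def pack_documents(jsonl_path: str, out_path: str) -> None:
    """Tokenize each long document, append an EOS token, concatenate,
    then emit fixed-length blocks of input_ids as JSON lines."""
    buffer = []
    with open(jsonl_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            doc = json.loads(line)["text"]  # assumed field name
            ids = tokenizer(doc, add_special_tokens=False)["input_ids"]
            buffer.extend(ids + [tokenizer.eos_token_id])
            # Emit every full block; keep the remainder for the next document.
            while len(buffer) >= BLOCK_SIZE:
                block, buffer = buffer[:BLOCK_SIZE], buffer[BLOCK_SIZE:]
                fout.write(json.dumps({"input_ids": block}) + "\n")

pack_documents("long_text_corpus.jsonl", "packed_pretrain_data.jsonl")
```

Packing like this avoids wasting context on padding, which matters when most documents are much longer than the model's original context window.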

