Continue pretraining an instruction-fine-tuned LLM like Qwen2.5-7B-Instruct #1405
Comments
Not sure if I understand this correctly, but I have fine-tuned a lot of models, both base and instruct versions, with no problems. The quality is actually better than what I got when tuning Gemini Flash in Vertex AI for my use case. The only concern is that your goal is to teach the model new information, which requires a lot of data and a high LoRA rank to avoid overfitting. Still much, much better than a full fine-tune.

If your dataset is not huge, you can use a larger model with the raw text you have to generate an instruction dataset and then train on that directly. I have done something like this before: I wanted my model to learn Deno 2, since it's new and current LLMs don't know about it. So I scraped the documentation, the blog posts, and some files from their GitHub, then used Claude 3.5 Haiku to generate a list of prompts and Sonnet to answer them, both with context caching to reduce cost and latency. The whole process cost less than $5.

If the text is larger than 200k tokens and won't fit Claude's context window, you can use Gemini 1.5 Pro, which supports up to two million tokens and also supports caching. It's much cheaper to use a good model with context caching than to run your own. There are even simpler methods with fewer steps that don't require a huge model like Sonnet or Gemini, but the quality of the dataset and the time saved were not worth the extra code I would need to write.
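The docs-to-instruction-dataset pipeline described above could be sketched roughly as follows. This is a hypothetical illustration, not the commenter's actual code: `chunk_text` and `build_request` are made-up helper names, and the model id string is an assumption. The request shape follows the Anthropic Messages API, where `cache_control: {"type": "ephemeral"}` on a system block enables prompt caching so repeated calls over the same docs chunk are cheap.

```python
# Hypothetical sketch of: scrape docs -> chunk -> generate prompts with a
# cheap model using prompt caching. Helper names are illustrative.
import json


def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split scraped documentation into chunks that fit comfortably in context."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = current + "\n\n" + p if current else p
    if current:
        chunks.append(current)
    return chunks


def build_request(docs_chunk: str, n_prompts: int = 20) -> dict:
    """Build a Messages API request that caches the docs chunk so repeated
    calls over the same chunk hit the prompt cache."""
    return {
        "model": "claude-3-5-haiku-latest",  # assumed model id
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": "Write training prompts grounded in these docs:\n\n" + docs_chunk,
                "cache_control": {"type": "ephemeral"},  # enables prompt caching
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": f"Generate {n_prompts} diverse user questions "
                           "answerable from the docs, one per line.",
            }
        ],
    }


# Usage (requires the anthropic package and ANTHROPIC_API_KEY; not run here):
# import anthropic
# client = anthropic.Anthropic()
# resp = client.messages.create(**build_request(chunk_text(raw_docs)[0]))
```
A second pass would feed each generated question back (with the same cached docs block) to a stronger model for answers, yielding prompt/response pairs to train on directly.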
@omarbadran What's the metric you use to check that the model is learning correctly and not overfitting? I fine-tuned it on another curated dataset of 30k samples, which did improve the accuracy, but it still wasn't great. This was with both Unsloth and LlamaFactory. Did you pre-train your models or fine-tune on the labelled data?
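One common signal for the question above is to track held-out loss next to training loss during fine-tuning: eval loss flat or rising while train loss keeps falling is the usual overfitting symptom. A minimal sketch (the helper names `perplexity` and `looks_overfit` are illustrative, not from any library):

```python
# Sketch: detect the classic overfitting pattern from checkpoint loss curves.
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean token-level negative log-likelihood."""
    return math.exp(mean_nll)


def looks_overfit(train_losses: list[float], eval_losses: list[float],
                  patience: int = 2) -> bool:
    """True if eval loss has risen for `patience` consecutive checkpoints
    while train loss kept dropping over the same checkpoints."""
    rising = 0
    for i in range(1, len(eval_losses)):
        if eval_losses[i] > eval_losses[i - 1] and train_losses[i] < train_losses[i - 1]:
            rising += 1
            if rising >= patience:
                return True
        else:
            rising = 0
    return False
```
Both Unsloth and LlamaFactory can log eval loss per checkpoint if you pass an eval split, so this check needs no extra instrumentation beyond reading the logs.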
@geo47 You can do it on instruct models, but I would advise against it if it's raw text - a trick is to at the end do […]

@omarbadran Fair points - if the dataset is small, generally the best advice is to merge datasets from the open source world, or create some synthetic data. Large datasets are generally better (>10K).

@Tejaswgupta Did you use […]
Hello,
I would like to know if it's possible to continue pretraining an LLM on raw text when the model has already been instruction-fine-tuned, like Qwen2.5-7B-Instruct.
Would this affect its ability to follow instructions?
The best strategy I am considering is to continue pre-training the instruction-fine-tuned version of an LLM on raw text, and then fine-tune on an instruction task to refresh the instruction-following behavior.
Please guide! Thanks
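One data-prep detail for the two-stage strategy described above: mixing a small fraction of instruction-formatted samples into the raw-text stage can reduce how much chat behavior degrades before the refresh stage. A minimal, hypothetical sketch (the helper names are made up; the EOS string and the ChatML tags shown are illustrative of Qwen2.5's template, but the real tokens should come from the tokenizer):

```python
# Sketch: interleave mostly raw text with a small fraction of chat-formatted
# samples for the continued-pretraining stage. All names are illustrative.
import random


def format_raw(example: dict) -> str:
    # Continued pretraining: plain text terminated with an EOS string.
    # The actual EOS token is tokenizer-specific; "</s>" is a placeholder.
    return example["text"] + "</s>"


def format_chat(example: dict) -> str:
    # Qwen2.5-Instruct uses a ChatML-style template; shown schematically.
    return (f"<|im_start|>user\n{example['prompt']}<|im_end|>\n"
            f"<|im_start|>assistant\n{example['response']}<|im_end|>")


def mix_corpora(raw: list[dict], chat: list[dict],
                chat_fraction: float = 0.05, seed: int = 0) -> list[str]:
    """Combine all raw-text samples with a small sample of chat data,
    then shuffle so the instruction samples are spread through training."""
    rng = random.Random(seed)
    k = max(1, int(chat_fraction * len(raw)))
    mixed = [format_raw(r) for r in raw] + [format_chat(c) for c in rng.sample(chat, k)]
    rng.shuffle(mixed)
    return mixed
```
The second stage (instruction fine-tuning on the merged model) would then train only on chat-formatted data, typically masking the loss on user turns.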