support sft training on d2l #100
Conversation
execute with "python -m example.rlhf.supervised_finetuning_d2l"
"from nltk.tokenize import word_tokenize\n", | ||
"\n", | ||
"# from openai import openai_object\n", | ||
"openai.api_key = \"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\"" |
let's remove openai.api_key.
could you please help describe why this is called auto-rater.ipynb? It looks like this file is used for generating the QA dataset.
This script for auto evaluation is incomplete, and it also contains redundant code from the previous QA generation code. Since the latest version of pykoi/uniflow already has an auto rater, shall we remove my related commits?
let's rename this file to supervised_finetuning_demo_d2l.py to indicate this is a demo file.
okay
# run supervised finetuning
from peft import LoraConfig
config = RLHFConfig(base_model_path="mistralai/Mistral-7B-Instruct-v0.1",
                    dataset_type="local_csv", dataset_name="data/chapter22_trnvalfromseed_data_processed.csv",
qq: I do not see this file in the data folder. Is it missing?
I am not sure if I should add these data files.
r=512,
lora_alpha=1024,
lora_dropout=0.05,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # optionally also "gate_proj", "up_proj", "down_proj", "lm_head"
qq: what is the target_modules parameter used for?
It specifies the modules (e.g., which components in the attention layers) that are updated during PEFT training.
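For context, a minimal sketch of how these values reach PEFT (the model name and hyperparameters mirror the diff above; treat this as an illustration rather than the repo's exact wiring):

# Only the named sub-modules receive trainable low-rank adapters; the rest
# of the base model stays frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

lora_config = LoraConfig(
    r=512,            # rank of the low-rank update matrices
    lora_alpha=1024,  # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only the targeted projections are trainable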
batch["labels"] = labels

return batch
nit: add a newline at the end of the file. If you have not done so, please set up your dev environment following https://www.notion.so/goldpiggy/Python-Linter-and-formatter-Setup-30fb3b81f0904af889832e4c697c5ec9?pvs=4
Thanks! I resolved it and ran pylint on other files as well.
self.model.resize_token_embeddings(len(self.tokenizer))

# dh: try the customized data collator that only predicts the answer part
data_collator = DataCollatorForCompletionOnlyLM(
qq: shall we make this configurable so we avoid breaking the old way of running the code?
done
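The final flag name is not shown in this thread, so the following is only a hypothetical sketch of what "configurable" could look like (the config field and the collator's constructor arguments are assumptions, and DataCollatorForCompletionOnlyLM is the customized class added in this PR, assumed to be in scope):

from typing import Optional

def select_data_collator(config, tokenizer) -> Optional[object]:
    # Hypothetical flag: keep the previous default collator unless the
    # completion-only collator is explicitly enabled in the config.
    if getattr(config, "use_completion_only_collator", False):
        # Constructor arguments are assumed for illustration only.
        return DataCollatorForCompletionOnlyLM(tokenizer=tokenizer, mlm=False)
    return None  # the trainer falls back to its old default behaviour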
packing=True,
data_collator=data_collator,
qq: could you please explain in the PR description why we added this data_collator when we did not need it before?
My understanding is that this is for training the instruction-following objective by masking out the query, instead of the plain causal language modeling (next-token prediction) objective over the full sequence.
okay
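Concretely, the collator keeps the full sequence as model input but sets the label of every prompt token to -100, which the cross-entropy loss in transformers ignores, so only the response tokens contribute to the loss. A toy illustration (independent of the actual collator code; the token ids are made up):

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

prompt_ids = [101, 2054, 2003, 1037]  # example token ids for the query
response_ids = [3231, 102]            # example token ids for the answer (+ eos)

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

assert len(input_ids) == len(labels)
# input_ids: [101, 2054, 2003, 1037, 3231, 102]
# labels:    [-100, -100, -100, -100, 3231, 102]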
@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
        train_dataset = ConstantLengthDataset(
            tokenizer,
            dataset["train"],
-           formatting_func=self.prepare_sample_text,
+           formatting_func=self.prepare_d2l_text,
qq: same as my comments above, we should make this configurable to maintain the old functionality.
done
@@ -0,0 +1,45 @@
"""Demo for the supervised fine tuning.

python -m example.rlhf.supervised_finetuning_demo
nit: it should be python -m example.rlhf.supervised_finetuning_demo_d2l.
corrected
from typing import Any, Dict, List, Tuple, Union
from transformers import DataCollatorForLanguageModeling
import numpy as np

class DataCollatorForCompletionOnlyLM(DataCollatorForLanguageModeling):
qq: in this example, https://huggingface.co/docs/trl/sft_trainer#advanced-usage, it directly imports from trl import SFTTrainer, DataCollatorForCompletionOnlyLM. It looks like DataCollatorForCompletionOnlyLM does what we want here, masking the question out of the training objective, and trl already has an implementation. I am curious why we wrote our own version of DataCollatorForCompletionOnlyLM.
It is Rachel and Yunfan's customized implementation.
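For reference, the pattern from the linked trl docs looks roughly like this (the model and dataset names are the docs' examples, not this repo's):

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

def formatting_prompts_func(example):
    # Build one prompt string per row; the response template below must
    # appear verbatim in every formatted example.
    texts = []
    for i in range(len(example["instruction"])):
        texts.append(
            f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        )
    return texts

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,  # the docs note packing must be disabled with this collator
)
trainer.train()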
INTRO_BLURB = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request."
)
qq: is INTRO_BLURB needed for SFT? Unless the user always puts it in their system prompt, I am wondering what happens if they forget to include it; it might hurt inference performance.
This is Yunfei's prompt, and I kept it in order to reproduce his result in pykoi. I agree with the concern about the case you mentioned.
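To illustrate the train/inference mismatch concern, here is a hypothetical Alpaca-style template (the exact format and field names used in this repo are not shown in the diff):

INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

def build_prompt(instruction: str) -> str:
    # Hypothetical helper: the same function should build both the
    # fine-tuning prompts and the inference prompts, otherwise the model
    # is queried with a format it was never trained on.
    return f"{INTRO_BLURB}\n\n### Instruction:\n{instruction}\n\n### Response:\n"

print(build_prompt("Summarize chapter 22."))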
@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
        train_dataset = ConstantLengthDataset(
One caveat here is that ConstantLengthDataset always packs your dataset into seq_length-sized chunks, breaking one coherent c + q + a into multiple data points if len(c + q + a) > seq_length, whereas the DataCollatorForCompletionOnlyLM implementation for SFTTrainer masks the query and trains an objective on the response only. However, I am a bit confused: your dataset is then not prepared to train on the response only (masking out the query) but still uses the causal language model objective for next-token prediction.
Does the collator mask out the query?
Thanks! I will have a look over the weekend. I have to work on the DPO and the post writing.
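To make the packing caveat above concrete, here is a toy illustration (not the trl implementation) of how constant-length packing can split one context + question + answer example across chunk boundaries:

seq_length = 8

examples = [
    list(range(1, 6)),   # c + q + a of example 1 (5 tokens)
    list(range(6, 13)),  # c + q + a of example 2 (7 tokens)
]

# Packing concatenates all examples into one token stream and cuts it into
# fixed-size blocks, so example 2's answer can land in a different block
# than its question, which is at odds with completion-only masking.
stream = [tok for ex in examples for tok in ex]
chunks = [stream[i:i + seq_length] for i in range(0, len(stream), seq_length)]

for chunk in chunks:
    print(chunk)
# [1, 2, 3, 4, 5, 6, 7, 8]  <- example 2 starts mid-block
# [9, 10, 11, 12]           <- example 2's answer continues here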
Out of memory on a single 24 GB GPU during training.