support sft training on d2l #100
Closed
wants to merge 3 commits
Conversation

llauraa23
Collaborator

Out of memory on a single 24 GB GPU during training.

llauraa23 requested a review from goldmermaid as a code owner on January 9, 2024.
"from nltk.tokenize import word_tokenize\n",
"\n",
"# from openai import openai_object\n",
"openai.api_key = \"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\""
Collaborator

let's remove openai.api_key.

Collaborator

Could you please describe why this is called auto-rater.ipynb? It looks like this file is used for generating a QA dataset.

Collaborator Author

This script for auto evaluation is incomplete, and it also contains redundant code from the earlier QA generation code. Since the latest version of pykoi/uniflow already has an auto rater, shall we remove my related commits?

Collaborator

let's rename this file to supervised_finetuning_demo_d2l.py to indicate this is a demo file.

Collaborator Author

okay

# run supervised finetuning
from peft import LoraConfig
config = RLHFConfig(base_model_path="mistralai/Mistral-7B-Instruct-v0.1",
dataset_type="local_csv", dataset_name="data/chapter22_trnvalfromseed_data_processed.csv",
Collaborator

qq: I do not see this file in the data folder. Is it missing?

Collaborator Author

I am not sure if I should add these data files.

r=512,
lora_alpha=1024,
lora_dropout=0.05,
target_modules=["q_proj","k_proj","v_proj","o_proj",], # "gate_proj","up_proj","down_proj",], #"lm_head",],
Collaborator

qq: what is the target_modules parameter used for?

Collaborator Author

It specifies the modules (e.g., which components of the attention layers) that get updated when we use PEFT training; the rest of the model stays frozen.
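To make that concrete, here is a minimal sketch of how such a config is passed to peft; the surrounding model-loading calls are commented out, and the variable names are illustrative rather than taken from this PR:

from peft import LoraConfig, get_peft_model

# Only the listed modules receive LoRA adapters; all other weights stay frozen.
lora_config = LoraConfig(
    r=512,
    lora_alpha=1024,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
# peft_model = get_peft_model(model, lora_config)
# peft_model.print_trainable_parameters()  # shows that only the targeted modules are trainable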


batch["labels"] = labels

return batch
Collaborator

nit: add a new line at the end of the file. If you have not done so, please set up your dev environment following https://www.notion.so/goldpiggy/Python-Linter-and-formatter-Setup-30fb3b81f0904af889832e4c697c5ec9?pvs=4

Collaborator Author

Thanks! I resolved it and ran pylint on other files as well.

self.model.resize_token_embeddings(len(self.tokenizer))

# dh: try the customized data collator that only predicts the answer part
data_collator = DataCollatorForCompletionOnlyLM(
Collaborator

qq: shall we make this configurable to avoid breaking the existing way of running the code?

Collaborator Author

done

packing=True,
data_collator=data_collator,
Collaborator

qq: could you please explain in the PR description why we added this data_collator when we did not need it before?

Collaborator

I see, this is for training with the instruction-following objective by masking out the query, rather than the plain causal language modeling objective over all next tokens.

Collaborator Author

okay
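To illustrate the difference, here is a toy sketch of what completion-only masking does to the labels; the token ids and prompt length are made up, not taken from this PR's collator:

import torch

# Pretend the first 5 tokens are the query/prompt and the last 3 are the answer.
input_ids = torch.tensor([[11, 12, 13, 14, 15, 101, 102, 103]])
labels = input_ids.clone()

prompt_len = 5
labels[:, :prompt_len] = -100  # tokens labeled -100 are ignored by the cross-entropy loss

# Plain causal LM training would keep labels == input_ids, so the loss also covers
# the query; with the mask, only the answer tokens contribute to the loss.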

@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
train_dataset = ConstantLengthDataset(
tokenizer,
dataset["train"],
formatting_func=self.prepare_sample_text,
formatting_func=self.prepare_d2l_text,
Collaborator

qq: same as my comments above, we should make this configurable to maintain the old functionality.

Collaborator Author

done

@@ -0,0 +1,45 @@
"""Demo for the supervised fine tuning.

python -m example.rlhf.supervised_finetuning_demo
Collaborator

nit: it should be python -m example.rlhf.supervised_finetuning_demo_d2l.py.

Collaborator Author

corrected

from typing import Any, Dict, List, Tuple, Union
from transformers import DataCollatorForLanguageModeling
import numpy as np
class DataCollatorForCompletionOnlyLM(DataCollatorForLanguageModeling):
Collaborator

qq: in this example, https://huggingface.co/docs/trl/sft_trainer#advanced-usage, it looks like it directly imports SFTTrainer and DataCollatorForCompletionOnlyLM from trl.

It looks like DataCollatorForCompletionOnlyLM does exactly what we want here, i.e., masking the question out of the training objective, and trl already has an implementation. I am curious why we wrote our own version of DataCollatorForCompletionOnlyLM.

Collaborator Author

It is Rachel and Yunfan's customized implementation.
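For reference, the usage pattern from the trl docs linked above looks roughly like this; the response template string is a placeholder, not the one used in this repo:

from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

# Everything before the response template is masked out of the loss;
# only the tokens after "### Answer:" are trained on.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:",
    tokenizer=tokenizer,
)

If I read the trl docs correctly, this collator is meant to be used with packing disabled, which is relevant to the packing=True setting quoted below.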

Comment on lines +189 to +191
INTRO_BLURB = (
"Below is an instruction that describes a task. Write a response that appropriately completes the request."
)
Collaborator

qq: is INTRO_BLURB needed for SFT?

Unless users always include this as part of their system prompt, I am wondering what happens if a user forgets to include it; it might hurt inference performance.

Collaborator Author

This is Yunfei's prompt and I kept it in order to reproduce his result in pykoi. I agree with the case you mentioned.
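For context, the blurb reads like the standard Alpaca-style preamble; a hypothetical formatting helper (names are illustrative, not from this PR) would be roughly:

INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

def format_prompt(instruction: str, response: str) -> str:
    # If the same preamble is not reproduced at inference time, the model sees
    # prompts from a different distribution than the one it was fine-tuned on.
    return (
        f"{INTRO_BLURB}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )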

@@ -208,15 +287,15 @@ def create_datasets(self, tokenizer, args):
train_dataset = ConstantLengthDataset(
Collaborator

One caveat here is that ConstantLengthDataset always packs your dataset into chunks of seq_length tokens, breaking one coherent c + q + a example into multiple data points whenever len(c + q + a) > seq_length.

Meanwhile, the DataCollatorForCompletionOnlyLM implementation for SFTTrainer masks the query and trains the objective on the response only.

However, I am a bit confused: with this setup your dataset is not prepared for response-only training (masking out the query) but is still trained with the causal language modeling objective over all next tokens.

Collaborator Author

Does the collator mask out the query?
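To visualize the caveat about ConstantLengthDataset, here is a toy sketch of fixed-length packing; the numbers are made up for illustration:

# One coherent context + question + answer example, tokenized to 10 tokens.
example = list(range(10))
seq_length = 4

# Packing slices the token stream into fixed-size chunks, so a single example
# can be split across several training samples, and the chunk boundaries ignore
# where the question ends and the answer begins.
chunks = [example[i:i + seq_length] for i in range(0, len(example), seq_length)]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]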

@CambioML
Collaborator

@llauraa23
Collaborator Author

@llauraa23 Take a look at huggingface/trl#1083 (comment) and https://huggingface.co/docs/trl/sft_trainer#advanced-usage

Thanks! I will have a look over the weekend. I have to work on the DPO and the post writing first.

@CambioML
Collaborator

Closing this PR because the work is also included in #101. The follow-up is to split #101 into two PRs, one for SFT and one for DPO.

CambioML closed this on Jan 26, 2024.