
Train on completions only by fixing the collator inquiry #1396

Open
hessaAlawwad opened this issue Dec 7, 2024 · 1 comment

@hessaAlawwad

Hello,

I was wondering whether I could use DataCollatorForCompletionOnlyLM to train the Llama 3.2 Vision model on the completions only.
Something like passing a response template and the tokenizer, as in this code:

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

I see that the provided code uses data_collator = UnslothVisionDataCollator(model, tokenizer) and indicates that it is a must-use. Can I inspect and edit it to serve my purpose, which is computing the loss only on the generated tokens?
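
For context, here is a minimal, text-only sketch of how DataCollatorForCompletionOnlyLM is typically wired into TRL's SFTTrainer. The model name, toy dataset, and config values are placeholder assumptions, not from this thread, and exact argument names can vary across TRL versions:

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

model_name = "facebook/opt-350m"  # placeholder text-only model, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy dataset whose "text" column already contains the response template.
dataset = Dataset.from_dict({
    "text": [
        "### Question: What is 2 + 2?\n ### Answer: 4",
        "### Question: Name a prime number.\n ### Answer: 7",
    ]
})

# Every token before " ### Answer:" gets label -100, so the loss is
# computed only on the answer (completion) tokens.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    data_collator=collator,
    args=SFTConfig(output_dir="outputs", dataset_text_field="text", packing=False),
)
trainer.train()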

@danielhanchen
Contributor

Apologies for the delay! You're better off using our conversational notebook, which masks the instructions out: https://colab.research.google.com/drive/1T5-zKWM_5OD21QHwXHiV9ixTRR7k3iB9?usp=sharing

from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
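
As a rough usage sketch along the lines of that notebook: the model name, chat template, toy dataset, and training arguments below are illustrative assumptions, and exact SFTTrainer arguments vary across TRL versions.

from datasets import Dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, train_on_responses_only

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",  # text model; the vision setup differs
    max_seq_length = 2048,
    load_in_4bit = True,
)
tokenizer = get_chat_template(tokenizer, chat_template = "llama-3.1")

# Toy chat data rendered into a "text" column with the Llama 3 chat template.
conversations = [
    [{"role": "user", "content": "What is 2 + 2?"},
     {"role": "assistant", "content": "4"}],
]
dataset = Dataset.from_dict({
    "text": [tokenizer.apply_chat_template(c, tokenize = False) for c in conversations]
})

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(output_dir = "outputs", dataset_text_field = "text",
                     per_device_train_batch_size = 1, max_steps = 10),
)

# Labels outside the assistant turns become -100, so the loss is computed
# on the response tokens only.
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
trainer.train()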
