SFT for D2L. DPO #101

llauraa23 · 2024-01-24T08:17:59Z

Implements SFT training that also supports the D2L application, inference-time evaluation of the fine-tuned model, and DPO training (ongoing).

execute with "python -m example.rlhf.supervised_finetuning_d2l"

Temporarily use all entries in the dataset as training dataset (i.e., no eval)
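For reviewers: one of the commits in this PR stores the evaluation questions and generated answers in a CSV file. The snippet below is only a minimal sketch of that idea using the standard library. The function name `save_eval_results` and the column names are hypothetical and are not taken from the pykoi code in this PR.

```python
import csv


def save_eval_results(questions, answers, path):
    """Write question/answer pairs to a CSV file (hypothetical helper)."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must have the same length")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names are an assumption made for this illustration.
        writer.writerow(["question", "generated_answer"])
        writer.writerows(zip(questions, answers))
    return path
```

Calling `save_eval_results(["What is SFT?"], ["Supervised fine-tuning."], "eval.csv")` would produce a two-row file (header plus one pair) that can be inspected by hand or loaded for later rating.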


CambioML · 2024-01-26T04:38:47Z

pykoi/rlhf/rl_finetuning_dpo.py

@@ -0,0 +1,263 @@
+# The code is adapted from Huggingface.


rename to dpo.py

llauraa23 and others added 11 commits December 4, 2023 23:34

auto rater, sample data and prompt engineering

eb4878f

Merge branch 'CambioML:main' into main

5c9cb03

merge conflict

6e9f880

support supervised fine tuning on d2l.

1ab4f32

execute with "python -m example.rlhf.supervised_finetuning_d2l"

resolve merge conflicts on gpu96

9daf69c

Merge branch 'main' of https://github.com/llauraa23/pykoi

878f44e

support training multiple epochs in sft.

1428aba

Temporarily use all entries in the dataset as training dataset (i.e., no eval)

implement evaluation of fine-tuned models with pykoi pipeline

880fefc

When evaluating the SFT model, store the questions and generated answers into a csv file

58f946c

code cleanup for d2l demo. In SFT, make the data collator, formatting function, and whether to disable evaluation configurable

beeedaa

DPO training on d2l data. Version 0

04b9fa5
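For context on the last commit: the snippet below is a sketch of the standard per-example DPO objective, written in plain Python so it can be read without the training stack. It is not the code in `pykoi/rlhf/rl_finetuning_dpo.py`. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    Hypothetical helper; beta=0.1 is an illustrative default, not a
    value taken from this PR.
    """
    # Log-ratio of policy to reference model for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, and it equals `log(2)` when policy and reference agree exactly.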

llauraa23 requested a review from goldmermaid as a code owner January 24, 2024 08:18

CambioML reviewed Jan 26, 2024


CambioML mentioned this pull request Jan 26, 2024

support sft training on d2l #100

Closed

CambioML closed this Feb 14, 2024
