Modify step 3 of RLHF: support using fine-tuned models from steps 1 and 2. #50

llauraa23 · 2023-09-10T20:36:51Z

Step 3 of RLHF now supports using the policy model trained in step 1 and the reward model trained in step 2.
Because single-GPU training has limited memory, steps 1 and 2 use dolly-v2-3b as the base model.
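
For readers skimming the thread, here is a rough sketch of the wiring described above. It does not use pykoi itself: `SketchRLHFConfig` is a stand-in dataclass that models only the fields named in this PR (`base_model_path`, `reward_model_path`, `dataset_type`), the two checkpoint directories are hypothetical, and the assumption that step 3 reads the step 1 policy through `base_model_path` is mine, not something this thread confirms.

```python
from dataclasses import dataclass


# Stand-in for pykoi.RLHFConfig (assumption: only the fields named in
# this PR are modeled; the real class and its defaults may differ).
@dataclass
class SketchRLHFConfig:
    base_model_path: str = "databricks/dolly-v2-3b"
    reward_model_path: str = "databricks/dolly-v2-3b"
    dataset_type: str = "local_db"


# Hypothetical directories where steps 1 and 2 would save their checkpoints.
STEP1_OUTPUT = "./models/sft_dolly_v2_3b"
STEP2_OUTPUT = "./models/rw_dolly_v2_3b"


def step3_config(policy_dir: str, reward_dir: str) -> SketchRLHFConfig:
    """Build a step 3 config that starts from the fine-tuned checkpoints."""
    return SketchRLHFConfig(
        base_model_path=policy_dir,    # policy model from step 1 (assumed field)
        reward_model_path=reward_dir,  # reward model from step 2
    )


config = step3_config(STEP1_OUTPUT, STEP2_OUTPUT)
print(config.base_model_path, config.reward_model_path)
```

The point of the sketch is only that step 3 receives two separate paths, one per earlier step, instead of loading the same pretrained base model for both roles.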


goldmermaid · 2023-09-10T20:47:15Z

example/rlhf/supervised_finetuning_demo.py

@@ -23,5 +23,6 @@

 # run supervised finetuning
 config = pykoi.RLHFConfig(base_model_path="elinas/llama-7b-hf-transformers-4.29", dataset_type="local_db")


Update the code to:

config = pykoi.RLHFConfig(base_model_path="databricks/dolly-v2-3b", dataset_type="local_db")

goldmermaid · 2023-09-10T20:51:07Z

example/rlhf/demo_rw_finetuning.py

@@ -24,5 +24,6 @@
 # run reward model finetuning
 # config = pykoi.RLHFConfig(dataset_type="local_db")
 config = pykoi.RLHFConfig()
+config.base_model_path = "databricks/dolly-v2-3b"


Update to:

config.reward_model_path = "databricks/dolly-v2-3b"

goldmermaid

LGTM!

llauraa23 added 2 commits September 10, 2023 12:17

In step3 of RLHF, support using policy model trained in step1, and reward model trained in step2. Due to limited memory with single GPU training, use dolly-v2-3b for base models in step 1 and 2.

2f42914

merge conflict

c48ae85

goldmermaid reviewed Sep 10, 2023


resolve comment on config model path.

6896163

goldmermaid reviewed Sep 10, 2023


goldmermaid merged commit ef820de into CambioML:main Sep 10, 2023
1 check passed
