Modify step 3 of RLHF: support using fine-tuned models from steps 1 and 2. #50

Merged: 3 commits into CambioML:main on Sep 10, 2023

llauraa23 (Collaborator)

In step 3 of RLHF, support using the policy model trained in step 1 and the reward model trained in step 2.
Because single-GPU training has limited memory, use dolly-v2-3b as the base model in steps 1 and 2.
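
The wiring described above can be sketched roughly as follows. This is only an illustration, not pykoi's API: `StepConfig`, `build_step3_config`, `output_dir`, and the checkpoint directories are hypothetical names made up for the sketch. Only the field names `base_model_path` and `reward_model_path` and the model id `databricks/dolly-v2-3b` come from this PR.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the settings discussed in this PR.
# It is NOT pykoi.RLHFConfig; only the two *_model_path field names appear in the PR.
@dataclass
class StepConfig:
    base_model_path: str = "databricks/dolly-v2-3b"
    reward_model_path: str = "databricks/dolly-v2-3b"
    output_dir: str = "./output"  # assumed: where a step saves its trained model


def build_step3_config(step1: StepConfig, step2: StepConfig) -> StepConfig:
    """Point step 3 at the models saved by steps 1 and 2."""
    return StepConfig(
        base_model_path=step1.output_dir,    # policy starts from the step 1 model
        reward_model_path=step2.output_dir,  # rewards come from the step 2 model
        output_dir="./models/rlhf_step3",    # hypothetical path
    )


# Hypothetical checkpoint directories for steps 1 and 2.
step1 = StepConfig(output_dir="./models/rlhf_step1_sft")
step2 = StepConfig(output_dir="./models/rlhf_step2_rw")
step3 = build_step3_config(step1, step2)
print(step3.base_model_path, step3.reward_model_path)
```

The point of the sketch is only the data flow: steps 1 and 2 both start from the same small base model, and step 3 reads their saved outputs rather than the original base model.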

@@ -23,5 +23,6 @@

# run supervised finetuning
config = pykoi.RLHFConfig(base_model_path="elinas/llama-7b-hf-transformers-4.29", dataset_type="local_db")
Review comment (Collaborator):

Update the code to:

config = pykoi.RLHFConfig(base_model_path="databricks/dolly-v2-3b", dataset_type="local_db")

@@ -24,5 +24,6 @@
# run reward model finetuning
# config = pykoi.RLHFConfig(dataset_type="local_db")
config = pykoi.RLHFConfig()
config.base_model_path = "databricks/dolly-v2-3b"
Review comment (Collaborator):

Update to:

config.reward_model_path = "databricks/dolly-v2-3b"

goldmermaid (Collaborator) left a comment:

LGTM!

goldmermaid merged commit ef820de into CambioML:main on Sep 10, 2023
1 check passed
2 participants