
Commit

chore: address feedbacks
ayulockin committed Nov 27, 2023
1 parent b8cdd49 commit 382e63f
Showing 3 changed files with 55 additions and 34 deletions.
89 changes: 55 additions & 34 deletions colabs/openai/Fine_tune_OpenAI_with_Weights_and_Biases.ipynb
@@ -59,11 +59,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional: Fine-tune ChatGPT-3.5\n",
"\n",
"It's always more fun to experiment with your own projects so if you have already used the openai API to fine-tune an OpenAI model, just skip this section.\n",
"\n",
"Otherwise let's fine-tune ChatGPT-3.5 on a legal dataset!"
"In this colab notebook, we will be finetuning GPT 3.5 model on the [LegalBench](https://hazyresearch.stanford.edu/legalbench/) dataset. The notebook will show how to prepare and validate the dataset, upload it to OpenAI and setup a fine-tune job. Finally, the notebook shows how to use the `WandbLogger`."
]
},
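For reference, a minimal sketch of the fine-tuning flow the intro cell describes, using the `openai` >= 1.x client; the training file name `legalbench_train.jsonl` and the `gpt-3.5-turbo` base model are illustrative placeholders, not values taken from the notebook:

```python
from openai import OpenAI
from wandb.integration.openai import WandbLogger

client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment

# Upload the prepared JSONL training file to OpenAI (placeholder file name).
train_file = client.files.create(
    file=open("legalbench_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Create a fine-tuning job on top of the GPT-3.5 base model.
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
)

# Once the job completes, sync its metrics, data, and model metadata to W&B.
WandbLogger.sync(fine_tune_job_id=job.id, project="OpenAI-Fine-Tune")
```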
{
@@ -101,7 +97,7 @@
"source": [
"Initialize the OpenAI client\n",
"\n",
"You can add the api key to your environment variable by doing `export OPENAI_API_KEY='YOUR_API_KEY'` in your terminal."
"You can add the api key to your environment variable by doing `os.environ['OPENAI_API_KEY'] = \"sk-....\"`."
]
},
{
@@ -110,7 +106,9 @@
"metadata": {},
"outputs": [],
"source": [
"client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])"
"# Uncomment the line below and set your OpenAI API Key.\n",
"# os.environ['OPENAI_API_KEY'] = \"sk-....\" \n",
"client = OpenAI()"
]
},
{
@@ -126,7 +124,9 @@
"metadata": {},
"outputs": [],
"source": [
"from wandb.integration.openai import WandbLogger"
"from wandb.integration.openai import WandbLogger\n",
"\n",
"WANDB_PROJECT = \"OpenAI-Fine-Tune\""
]
},
{
@@ -142,7 +142,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -594,7 +594,7 @@
"metadata": {},
"outputs": [],
"source": [
"WandbLogger.sync(fine_tune_job_id=ft_job_id, openai_client=client)"
"WandbLogger.sync(fine_tune_job_id=ft_job_id, project=WANDB_PROJECT)"
]
},
{
@@ -613,23 +613,23 @@
"source": [
"The fine-tuning job is now successfully synced to Weights and Biases. Click on the URL above to open the [W&B run page](https://docs.wandb.ai/guides/app/pages/run-page). The following will be logged to W&B:\n",
"\n",
"- Training and validation metrics\n",
"#### Training and validation metrics\n",
"\n",
"![image.png](assets/metrics.png)\n",
"\n",
"- Training and validation data as W&B Tables\n",
"#### Training and validation data as W&B Tables\n",
"\n",
"![image.png](assets/datatable.png)\n",
"\n",
"- The data and model artifacts for version control (go to the overview tab)\n",
"#### The data and model artifacts for version control (go to the overview tab)\n",
"\n",
"![image.png](assets/artifacts.png)\n",
"\n",
"- The configuration and hyperparameters (go to the overview tab)\n",
"#### The configuration and hyperparameters (go to the overview tab)\n",
"\n",
"![image.png](assets/configs.png)\n",
"\n",
"- The data and model DAG\n",
"#### The data and model DAG\n",
"\n",
"![image.png](assets/dag.png)"
]
@@ -650,20 +650,39 @@
"Let's generate a few inference samples and log them to W&B and see how the performance compares to a baseline ChatGPT-3.5 model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be evaluating using the validation dataset. In the overview tab of the run page, find the \"validation_files\" in the Artifact Inputs section. Clicking on it will take you to the artifacts page. Copy the artifact URI (full name) as shown in the image below.\n",
"\n",
"![image](assets/select_artifact_uri.png)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"WANDB_PROJECT = \"OpenAI-Fine-Tune\""
"run = wandb.init(\n",
" project=WANDB_PROJECT,\n",
" job_type='eval'\n",
")\n",
"\n",
"VALIDATION_FILE_ARTIFACT_URI = '<entity>/<project>/valid-file-*' # REPLACE THIS WITH YOUR OWN ARTIFACT URI\n",
"\n",
"artifact_valid = run.use_artifact(\n",
" VALIDATION_FILE_ARTIFACT_URI,\n",
" type='validation_files'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be evaluating using the validation dataset. In the overview tab of the run page, find the \"validation_files\" in the Artifact Inputs section. The code snippet below, download the logged validation data and prepare a pandas dataframe from it."
"The code snippet below, download the logged validation data and prepare a pandas dataframe from it."
]
},
{
@@ -672,16 +691,6 @@
"metadata": {},
"outputs": [],
"source": [
"run = wandb.init(\n",
" project=WANDB_PROJECT,\n",
" job_type='eval'\n",
")\n",
"\n",
"artifact_valid = run.use_artifact(\n",
" 'ayut/OpenAI-Fine-Tune/valid-file-z2xYlp21ljsfc7mXBcX1Jimg:v0', # REPLACE THIS WITH YOUR OWN ARTIFACT URI\n",
" type='validation_files'\n",
")\n",
"\n",
"artifact_valid_path = artifact_valid.download()\n",
"print(\"Downloaded the validation data at: \", artifact_valid_path)\n",
"\n",
@@ -707,7 +716,7 @@
"{\"messages\": [{\"role\": \"system\", \"content\": \"some system prompt\"}, {\"role\": \"user\", \"content\": \"some user prompt\"}, {\"role\": \"assistant\", \"content\": \"completion text\"}]}\n",
"```\n",
"\n",
"For evaluation we don't need to pack the `{\"role\": \"assistant\", \"content\": \"completition text\"}` in `messages`."
"For evaluation we don't need to pack the `{\"role\": \"assistant\", \"content\": \"completition text\"}` in `messages` as this is meant to be generated by GPT 3.5."
]
},
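As a sketch of how the evaluation loop could look given this format: keep only the system and user messages, send them to the fine-tuned model, and log the generations to a W&B Table. This assumes the validation examples are available as a list of dicts named `records` (as in the sketch above) and that `fine_tuned_model` has already been read from the model metadata, which the notebook does a few cells later; the table columns are illustrative, not the notebook's exact schema:

```python
import wandb

eval_table = wandb.Table(columns=["system_prompt", "user_prompt", "completion"])

for record in records[:10]:  # a handful of validation examples
    # Keep only the system and user turns; the assistant turn is what the model should produce.
    prompt_messages = [m for m in record["messages"] if m["role"] != "assistant"]

    response = client.chat.completions.create(
        model=fine_tuned_model,  # e.g. "ft:gpt-3.5-turbo:..." from model_metadata
        messages=prompt_messages,
        temperature=0,
    )
    completion = response.choices[0].message.content

    eval_table.add_data(
        prompt_messages[0]["content"],
        prompt_messages[1]["content"],
        completion,
    )

run.log({"eval_samples": eval_table})
```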
{
@@ -735,19 +744,31 @@
"source": [
"### Run evaluation on the Fine-Tuned Model\n",
"\n",
"Next up we will get the fine-tuned model's id from the logged `model_metadata`. In the overview tab of the run page, find the \"model\" in the Artifact Outputs section."
"Next up we will get the fine-tuned model's id from the logged `model_metadata`. In the overview tab of the run page, find the \"model\" in the Artifact Outputs section. Clicking on it will take you to the artifacts page. Copy the artifact URI (full name) as shown in the image below.\n",
"\n",
"![image](assets/select_model_artifact.png)"
]
},
{
"cell_type": "code",
"execution_count": 116,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"MODEL_ARTIFACT_URI = '<entity>/<project>/model_metadata:v*' # REPLACE THIS WITH YOUR OWN ARTIFACT URI\n",
"\n",
"model_artifact = run.use_artifact(\n",
" 'ayut/OpenAI-Fine-Tune/model_metadata:v5', # REPLACE THIS WITH YOUR OWN ARTIFACT URI\n",
" MODEL_ARTIFACT_URI,\n",
" type='model'\n",
")\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"model_metadata_path = model_artifact.download()\n",
"print(\"Downloaded the validation data at: \", model_metadata_path)\n",
"\n",
@@ -765,7 +786,7 @@
"outputs": [],
"source": [
"fine_tuned_model = model_metadata[\"fine_tuned_model\"]\n",
"client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])"
"client = OpenAI()"
]
},
{
Binary file added colabs/openai/assets/select_artifact_uri.png
Binary file added colabs/openai/assets/select_model_artifact.png
