add lm-eval-harness colab
ayulockin committed Feb 8, 2024
1 parent 25bd733 commit b00969a
Showing 1 changed file with 153 additions and 0 deletions.
@@ -0,0 +1,153 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "d6YuliF8qfsy"
},
"source": [
"<img src=\"https://wandb.me/logo-im-png\" width=\"400\" alt=\"Weights & Biases\" />\n",
"<!--- @wandbcode{lm-eval-harness} -->\n",
"\n",
"# Visualizing Results in Weights and Biases\n",
"\n",
"<!--- @wandbcode{lm-eval-harness-colab} -->\n",
"\n",
"With the Weights and Biases integration, you can now spend more time extracting deeper insights into your evaluation results. The integration is designed to streamline the process of logging and visualizing experiment results using the Weights & Biases (W&B) platform.\n",
"\n",
"The integration provides functionality to\n",
"\n",
"- automatically log the evaluation results,\n",
"- log the samples as W&B Tables for easy visualization,\n",
"- log the `results.json` file as an artifact for version control,\n",
"- log the `<task_name>_eval_samples.json` file if the samples are logged,\n",
"- generate a comprehensive report for analysis and visualization with all the important metrics,\n",
"- log task and CLI configs,\n",
"- and more out of the box, such as the command used to run the evaluation, GPU/CPU counts, timestamp, etc.\n",
"\n",
"The integration is super easy to use with the eval harness. Let's see how!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tJRBdlTrqgVg"
},
"outputs": [],
"source": [
"# Install this project if you do not already have it.\n",
"# This is all that needs to be installed to start using Weights and Biases.\n",
"\n",
"!git clone -b wandb-logging https://github.com/ayulockin/lm-evaluation-harness\n",
"%cd lm-evaluation-harness\n",
"!pip -qq install -e .[wandb]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Eb4eGgJIxCki"
},
"outputs": [],
"source": [
"# Running the eval may fail with the error `no module named transformers.cache_utils`.\n",
"# Installing transformers from source resolves the issue.\n",
"# Ref: https://huggingface.co/DiscoResearch/mixtral-7b-8expert/discussions/9#6576edcd0370e52e3b2c0620\n",
"!pip uninstall -y transformers\n",
"!pip install git+https://github.com/huggingface/transformers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5c-t1T2vrxQw"
},
"source": [
"# Run the Eval Harness\n",
"\n",
"Run the eval harness as usual, adding the `--wandb_args` flag. This flag takes the arguments used to initialize a wandb run ([wandb.init](https://docs.wandb.ai/ref/python/init)) as a comma-separated string of `key=value` pairs.\n",
"\n",
"If the `wandb_args` flag is used, the metrics and all other goodness will be automatically logged to Weights and Biases. In the stdout, you will find a link to the W&B run page as well as a link to the generated report."
]
},
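{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the comma-separated format concrete, here is a minimal sketch of how such a string could map onto keyword arguments for `wandb.init`. The helper `parse_wandb_args` and the run name `phi-2-demo` are hypothetical and only for illustration; the harness's own parsing may differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: `parse_wandb_args` is a hypothetical helper,\n",
"# not part of lm-eval or wandb.\n",
"def parse_wandb_args(arg_string):\n",
"    # Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}.\n",
"    kwargs = {}\n",
"    for pair in arg_string.split(','):\n",
"        if not pair:\n",
"            continue  # skip empty segments such as a trailing comma\n",
"        key, _, value = pair.partition('=')\n",
"        kwargs[key.strip()] = value.strip()\n",
"    return kwargs\n",
"\n",
"# `project` and `name` are real `wandb.init` parameters; the values are examples.\n",
"wandb_kwargs = parse_wandb_args('project=lm-eval-harness-integration,name=phi-2-demo')\n",
"print(wandb_kwargs)\n",
"# A run could then be started with: wandb.init(**wandb_kwargs)"
]
},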
{
"cell_type": "markdown",
"metadata": {
"id": "YkeORhFXry8o"
},
"source": [
"## Set your API Key\n",
"\n",
"Before you can use W&B, you need to authenticate your machine with an API key. Visit https://wandb.ai/authorize to get one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E7YTd5OUrpnO"
},
"outputs": [],
"source": [
"import wandb\n",
"wandb.login()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_L5yCLtGt5Tu"
},
"source": [
"> Note that if you are using the command line, you can simply authenticate your machine by running `wandb login` in your terminal. For more info, check out the [documentation](https://docs.wandb.ai/quickstart#2-log-in-to-wb)."
]
},
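{
"cell_type": "markdown",
"metadata": {},
"source": [
"For non-interactive sessions, one option is to supply the key through the `WANDB_API_KEY` environment variable, which `wandb` reads at login. The cell below is a minimal sketch that assumes the variable was set beforehand; how you set it (shell export, Colab secrets, etc.) is up to you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import wandb\n",
"\n",
"# Assumption: WANDB_API_KEY was exported before this cell runs.\n",
"if os.environ.get('WANDB_API_KEY'):\n",
"    wandb.login()  # uses the key from the environment instead of prompting\n",
"else:\n",
"    print('WANDB_API_KEY is not set; run wandb.login() to enter the key interactively.')"
]
},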
{
"cell_type": "markdown",
"metadata": {
"id": "R4wGEB7ot7OZ"
},
"source": [
"## Run and log to W&B"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fkMGw712sg8f"
},
"outputs": [],
"source": [
"!lm_eval \\\n",
" --model hf \\\n",
" --model_args pretrained=microsoft/phi-2,trust_remote_code=True \\\n",
" --tasks hellaswag,mmlu_abstract_algebra \\\n",
" --device cuda:0 \\\n",
" --batch_size 8 \\\n",
" --output_path output/phi-2 \\\n",
" --limit 10 \\\n",
" --wandb_args project=lm-eval-harness-integration \\\n",
" --log_samples"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "V100",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
