# 2.3.17 Satellite: lm evaluation harness
- Handle: `lmeval`
- URL: -
This project provides a unified framework to test generative language models on a large number of different evaluation tasks.
```bash
# [Optional] pre-build the image
harbor build lmeval
```
> [!NOTE]
> You can use either the `lmeval` or the `lm_eval` alias for invoking the service. These docs will use `lmeval`.
```bash
# [Optional] see the help
harbor lmeval --help

# [Optional] see available tasks
harbor lmeval --tasks list
```
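The task list is long; piping it through `grep` is a handy way to find a specific task family (plain shell filtering, not an `lmeval` feature):

```bash
# Filter the task list for GSM8K variants
harbor lmeval --tasks list | grep -i gsm
```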
After the configuration is complete (see below), run evals with:
```bash
harbor lmeval --tasks gsm8k --limit 10
```
Harbor's `lmeval` service is mainly intended to be run against OpenAI-compatible APIs, rather than running models on its own, although the latter is also possible. In other words, the workflows are optimised for running `lm_eval` with `--model local-completions`, which is also the default.
```bash
# local-completions is the default.
# See other supported "types":
# https://github.com/EleutherAI/lm-evaluation-harness/blob/main/README.md#model-apis-and-inference-servers
harbor lmeval type

# Get/set the model and the API URL
harbor lmeval model
harbor lmeval model meta-llama/Meta-Llama-3-8B-Instruct
harbor lmeval api
harbor lmeval api $(harbor url -i vllm)
```
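The endpoint doesn't have to be a Harbor-managed service: any OpenAI-compatible completions API reachable from inside the container should work. A sketch with an illustrative address:

```bash
# Point lmeval at an external OpenAI-compatible endpoint
# (the IP and port here are illustrative placeholders)
harbor lmeval api http://192.168.1.10:8000/v1/completions
```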
> [!TIP]
> `harbor url` works when a given service is running; `-i` retrieves the URL to be used from within the service containers.
The commands above are aliases for `harbor lmeval args get/set model/base_url` respectively. You can manage the rest of the `--model_args` in a dictionary format as follows:
```bash
# See the help on working with dict args
harbor lmeval args -h

# See current args
harbor lmeval args ls
harbor lmeval args

# Get an arg value
harbor lmeval args get model

# Set an arg value
harbor lmeval args set model $(harbor vllm model)
```
Here is a list of sample args from the official docs:
# "model" is the model ID that is sent to the API
harbor lmeval args set model llama3.1:8b
# "base_url" is the URL of the API, from _within_
# the container, use "harbor url -i" to get it
harbor lmeval args set base_url $(harbor url -i ollama)
# "num_concurrent" is the number of concurrent
# requests to make to the completion API
harbor lmeval args set num_concurrent 4
# "tokenized_requests" is a boolean flag that
harbor lmeval args set tokenized_requests False
# "tokenizer" is the name of the tokenizer to use
harbor lmeval args set tokenizer gpt2
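For reference, these dict args correspond to `lm_eval`'s comma-separated `--model_args` string. A rough sketch of the invocation assembled from the values above (the exact command Harbor runs may differ, and the `base_url` shown is an assumed internal address; use `harbor url -i ollama` for the real one):

```bash
# Approximate raw lm_eval equivalent of the configuration above;
# base_url is an assumed placeholder for the in-container ollama URL
lm_eval \
  --model local-completions \
  --model_args model=llama3.1:8b,base_url=http://ollama:11434/v1/completions,num_concurrent=4,tokenized_requests=False,tokenizer=gpt2 \
  --tasks gsm8k \
  --limit 10
```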
Harbor mounts the cache and results folders from your host into the `lmeval` container:
```bash
# Open the cache folder in the file manager
harbor lmeval cache

# Open the results folder in the file manager
harbor lmeval results
```
By default, the results and cache are kept separately for every distinct `harbor lmeval model` you set.
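`lm_eval` stores its output as JSON. Assuming the upstream harness's standard `results_*.json` layout, you can summarize scores with `jq` (the file path below is an illustrative placeholder; use `harbor lmeval results` to locate the actual folder):

```bash
# Print the per-task metrics from a results file
# (path is an illustrative placeholder)
jq '.results' results/results_2024-08-01T12-00-00.json
```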
> [!NOTE]
> When running with `ollama`, `llamacpp`, and other services that do not use the HuggingFace repo specifier (`user/model`), you'll need to manually point `lmeval` to the correct tokenizer. Most typically, you can use an official repo of the base model as the tokenizer specifier.
```bash
# Start ollama
harbor up

# Pick the model to test
harbor ollama ls

# Configure lmeval
harbor lmeval model llama3.1:8b
harbor lmeval api $(harbor url -i ollama)/v1/completions
harbor lmeval args set tokenizer meta-llama/Meta-Llama-3-8B-Instruct

# Run the eval
harbor lmeval --tasks gsm8k --limit 10
```
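Optionally, you can first sanity-check that ollama actually exposes the model over its OpenAI-compatible API, mirroring the `llamacpp` example below:

```bash
# List the model IDs served by ollama's OpenAI-compatible endpoint
curl -s $(harbor url ollama)/v1/models | jq -r '.data[].id'
```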
```bash
# Start with llamacpp
harbor up llamacpp

# Pick the model to test
curl -s $(harbor url llamacpp)/v1/models | jq -r '.data[].id'

# Configure lmeval
harbor lmeval model <model id>
harbor lmeval api $(harbor url -i llamacpp)/v1/completions
harbor lmeval args set tokenizer <hf repo id>

# Run the eval
harbor lmeval --tasks gsm8k --limit 10
```
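For completeness, here is a sketch of the same flow against `vllm`. Since vLLM serves models under their HuggingFace repo IDs, no tokenizer override should be needed (this assumes `vllm` is already configured with a model):

```bash
# Start vllm
harbor up vllm

# Configure lmeval to match the model vllm is serving
harbor lmeval model $(harbor vllm model)
harbor lmeval api $(harbor url -i vllm)/v1/completions

# Run the eval
harbor lmeval --tasks gsm8k --limit 10
```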