# How to add models to clembench

This guide covers how to add support for any chat-trained model to clembench. Models can be run locally, meaning on the machine on which the clembench benchmark itself is run (as currently implemented for HuggingFace models), or accessed via a remote API (as currently implemented for OpenAI and other proprietary models).

## Contents

- [Overview](#overview)
- [Adding a new backend](#adding-a-new-backend)
- [Adding a model to the model registry](#adding-a-model-to-the-model-registry)
- [Testing the added model](#testing-the-added-model)

## Overview

To add support for a new model, go through the following steps:

- Check if there is an already implemented backend that can handle the model. Every model needs a backend that holds the required inference or remote API code. One backend can handle any number of similar models using the same remote API request or inference code, requiring only a small amount of additional data per model. Already implemented backends can be found in the `backends` directory, with file names ending in `_api.py`. Supported models and the corresponding backends are listed in the model registry, `backends/model_registry.json`. See the model registry readme for more information on the model registry.
- If there is no implemented backend that supports your model, you have to implement one. See [Adding a new backend](#adding-a-new-backend).
- If there is an implemented backend that supports your model, you need to add a new model entry to the model registry. See [Adding a model to the model registry](#adding-a-model-to-the-model-registry).
- Test your model (and backend, if you implemented a new one) by running hellogame. See [Testing the added model](#testing-the-added-model).

## Adding a new backend

The backend is responsible for calling local or remote models (via an API).

1. Add a file ending in `_api.py` to the `backends` directory, e.g. `mybackend_api.py`.
2. In that file, implement your backend class, which needs to extend `backends.Backend`, e.g. `class MyBackend(backends.Backend)`.
3. (Optional) Add an entry for your backend in `key.json`.

The framework automatically scans the `backends` folder for all files ending in `_api.py` and consults the model registry to make models available for benchmarking.

**Important**: All backends must return a `(prompt, response, response_text)` tuple, where:

- `prompt` is the exact object that was passed to the LLM (if the object has more structure, keep it as is; do not return only the message string)
- `response` is the exact object that was returned by the LLM (again, do not change this object in any way)
- `response_text` is only the message generated by the LLM, as a string

The first two are logged into the `requests.json` file generated by the game master and are used to inspect whether the actual inputs and outputs are correct.
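
To make this concrete, here is a minimal sketch of such a backend. The helper `call_my_remote_api` and the response schema are hypothetical, and the exact abstract methods to override may differ between clembench versions, so check `backends/__init__.py` for the interface your version expects:

```python
# mybackend_api.py -- a hypothetical sketch, not the definitive interface.
import backends


def call_my_remote_api(prompt):
    """Hypothetical stand-in for your actual API client or inference code."""
    raise NotImplementedError


class MyBackend(backends.Backend):
    """Backend that forwards chat messages to a (hypothetical) remote API."""

    def generate_response(self, messages, model):
        # `messages` is the chat history as role/content dicts.
        # Keep the full prompt object; do not reduce it to a plain string.
        prompt = {"model": model, "messages": messages}
        response = call_my_remote_api(prompt)
        # Extract only the generated message; adjust to your API's schema.
        response_text = response["choices"][0]["message"]["content"]
        return prompt, response, response_text
```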

## Adding a model to the model registry

Adding a model to the registry can be as simple as adding an entry with the model's name and the backend that handles it, but the model entry can hold more data to be used by a backend.
For example, to add support for a new OpenAI model available via the OpenAI API, adding a simple entry like this is enough:

```json
{
  "model_name": "GPT-5-Einstein",
  "model_id": "GPT-5-Einstein",
  "backend": "openai"
}
```

This assumes the hypothetical new model is named GPT-5-Einstein and is referred to by that string in API requests.
Add the entry to `backends/model_registry.json`, making sure that it is properly separated by a comma and placed inside the JSON list.
**Important**: The order of entries in the model registry matters! Models can be accessed by incomplete specifications (matched against the data in the model entries), and if multiple entries match a partial specification, the first matching entry is used to load or access the model.
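
The first-match behavior can be pictured with the following snippet (illustrative only, not the framework's actual lookup code):

```python
import json


def lookup(registry, **partial_spec):
    """Return the first registry entry whose fields match the partial spec."""
    for entry in registry:
        if all(entry.get(key) == value for key, value in partial_spec.items()):
            return entry
    raise KeyError(f"no entry matches {partial_spec}")


with open("backends/model_registry.json") as registry_file:
    registry = json.load(registry_file)

# If several entries share a model_name, the earliest entry in the list wins:
entry = lookup(registry, model_name="GPT-5-Einstein")
```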

### HuggingFace models

This section explains how to add an LLM hosted on the HuggingFace (HF) model repository to the model registry, making it available for the local HuggingFace backend of clembench. Due to the variety of models available via HuggingFace, model registry entries for these models can hold an extensive amount of additional data used by the backend for inference.
Each model hosted on HuggingFace is identified by its model ID, which is the combination of the uploader's username and the individual model name.
For example: for the OpenChat 3.5 model, the model ID is `openchat/openchat_3.5`, as `openchat` is the uploader's user name and `openchat_3.5` is the model name.
This model ID is all that is needed to access ungated models hosted on HuggingFace.
Accessing gated models, like Meta's Llama2, requires an HF API access key/token, which you can acquire via your user profile on the HF website. Make sure that the HF account used to acquire the token has been granted access to the gated model you want to add. This API key needs to be added to `key.json` in the clembench root directory to be available for loading gated model data.
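
As a quick sanity check outside of clembench, you can verify that your token grants access to a gated model by loading just its tokenizer. This is a sketch; `token=` is the keyword argument in recent `transformers` versions (older versions used `use_auth_token=`):

```python
from transformers import AutoTokenizer

# Fails with an authorization error if the account behind the token has
# not been granted access to the gated repository.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/llama-2-7b-hf",
                                          token="hf_...")  # your HF token
```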

#### Workflow

##### I. Check the model card on HuggingFace

You should thoroughly read the model card of the model you want to add to be informed about its individual characteristics. It's also a good idea to look at the community tab of the model repository to see if there are common issues with the model.

##### II. Check the model's tokenizer

The clembench HuggingFace local backend relies on the `transformers` library, and indirectly on the `tokenizers` library, for model-dependent input tokenization. It also relies on the chat template utility of the libraries' tokenizer classes. The first step is therefore to make sure that a candidate model hosted on HuggingFace has the required configuration to be used with the clembench backend.
To perform a preliminary compatibility check, run `python3 backends/initial_hf_check.py -m <MODEL ID>`.
For example: `python3 backends/initial_hf_check.py -m openchat/openchat_3.5` checks the OpenChat 3.5 model.
The `initial_hf_check.py` script shows the applied template and warns about common issues, but it does not cover all edge cases. It also takes the flag `-i` to show the tokenizer's information and `-t` to show the configured chat template as a jinja string, which can be a useful starting point for a custom template for the model.
The initial check script applies the same preprocessing as the backend.
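
In essence, the check verifies that the model's tokenizer can render a chat history through the `transformers` chat template machinery, roughly like this (a simplified sketch, not the script's actual code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5")
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
    {"role": "user", "content": "How are you?"},
]
# Produces odd output or raises an error if the tokenizer lacks a usable
# chat template:
print(tokenizer.apply_chat_template(messages, tokenize=False,
                                    add_generation_prompt=True))
```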

##### III. Add the model's information to the model registry

Open `backends/hf_local_models.json` in your editor of choice. This file contains entries for all models supported by the `huggingface_local` backend. To make a new model available, an entry for it needs to be added to this registry.

###### Basic model entry

A minimal model entry contains the model name, the backend that handles it, the model's HF ID, a boolean that determines whether a premade chat template will be loaded from HF, and the EOS regular expression to be culled from its outputs:

```json
{
  "model_name": "Mistral-7B-Instruct-v0.1",
  "backend": "huggingface_local",
  "huggingface_id": "mistralai/Mistral-7B-Instruct-v0.1",
  "premade_chat_template": true,
  "eos_to_cull": "</s>"
}
```

###### Chat template

If the model to be added passed the initial check without any issue, use `"premade_chat_template": true` in its registry entry. This indicates that the model's tokenizer properly applies a chat template that works without any further editing.
If it does not pass the check or otherwise requires chat template changes, the entry must contain `"premade_chat_template": false` and include the custom chat template to be used, as a jinja2 string.
For example:

```json
{
  "model_name": "sheep-duck-llama-2-70b-v1.1",
  "backend": "huggingface_local",
  "huggingface_id": "Riiid/sheep-duck-llama-2-70b-v1.1",
  "premade_chat_template": false,
  "custom_chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{{ '### User:\\n' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'system' %}{{ '### System:\\n' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ '### Assistant:\\n' + message['content'] + '\\n\\n' }}{% endif %}{% if loop.last %}{{ '### Assistant:\\n' }}{% endif %}{% endfor %}",
  "eos_to_cull": "</s>"
}
```
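
Before committing a custom template to the registry, you can try it out directly, since `apply_chat_template` accepts a template override. A sketch, using the template from the example entry above:

```python
from transformers import AutoTokenizer

custom_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{{ '### User:\\n' + message['content'] + '\\n\\n' }}"
    "{% elif message['role'] == 'system' %}"
    "{{ '### System:\\n' + message['content'] + '\\n\\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '### Assistant:\\n' + message['content'] + '\\n\\n' }}"
    "{% endif %}"
    "{% if loop.last %}{{ '### Assistant:\\n' }}{% endif %}"
    "{% endfor %}"
)
tokenizer = AutoTokenizer.from_pretrained("Riiid/sheep-duck-llama-2-70b-v1.1")
messages = [{"role": "user", "content": "Hello!"}]
# Renders the chat history with the candidate template instead of the
# tokenizer's built-in one:
print(tokenizer.apply_chat_template(messages, chat_template=custom_template,
                                    tokenize=False))
```
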
###### Slow tokenizer handling

If the model requires the use of the 'slow' tokenizer class (which should be noted on the model card), the model entry must contain `"slow_tokenizer": true`.
For example:

```json
{
  "model_name": "SUS-Chat-34B",
  "backend": "huggingface_local",
  "huggingface_id": "SUSTech/SUS-Chat-34B",
  "premade_chat_template": false,
  "custom_chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{{ '### Human: ' + message['content'] + '\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ '### Assistant: ' + message['content'] }}{% endif %}{% if loop.last %}{{ '### Assistant: ' }}{% endif %}{% endfor %}",
  "slow_tokenizer": true,
  "eos_to_cull": "<|endoftext|>"
}
```
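
With this flag set, the backend presumably loads the slow tokenizer class, i.e. the equivalent of:

```python
from transformers import AutoTokenizer

# use_fast=False selects the 'slow' (Python) tokenizer implementation.
tokenizer = AutoTokenizer.from_pretrained("SUSTech/SUS-Chat-34B",
                                          use_fast=False)
```
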
###### Output split string

The model to be added might use an uncommon tokenizer, which can lead to discrepancies between the prompt and the decoded model output, requiring the output to be split before it can be properly handled by clembench. In this case, the string that precedes the model output proper needs to be contained in the model entry. (This will likely be discovered when testing the model.)
For example:

```json
{
  "model_name": "Yi-34B-Chat",
  "backend": "huggingface_local",
  "huggingface_id": "01-ai/Yi-34B-Chat",
  "premade_chat_template": false,
  "custom_chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = true %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "slow_tokenizer": true,
  "output_split_prefix": "assistant\n",
  "eos_to_cull": "<|im_end|>"
}
```
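
What the backend does with `output_split_prefix` amounts to something like the following (illustrative, not the exact backend code):

```python
# The decoded output repeats part of the prompt; keep only what follows
# the configured prefix. EOS culling happens separately.
decoded = "<|im_start|>assistant\nHello there!<|im_end|>"
output_split_prefix = "assistant\n"
response_text = decoded.split(output_split_prefix, 1)[-1]
# response_text == "Hello there!<|im_end|>"
```
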
###### HF API key requirement

If the model to be added is gated, the model entry must contain `"requires_api_key": true`. Make sure that `key.json` exists and contains a viable HF API access key when the model is to be used.
For example:

```json
{
  "model_name": "llama-2-7b-hf",
  "backend": "huggingface_local",
  "requires_api_key": true,
  "huggingface_id": "meta-llama/llama-2-7b-hf",
  "premade_chat_template": true,
  "eos_to_cull": "</s>"
}
```

###### Further model registry information

See the model registry readme for more information on the model registry.

## Testing the added model

### Run HelloGame

Run clembench with the hellogame clemgame; see the corresponding documentation for how to do this.
This produces interactions and requests files in JSON format in the results directory. The specific files can be found in the episode subdirectories under `results/<MODEL NAME>/hellogame/0_greet_en/`.

### Check requests files

The requests file of each episode contains the prompts given to the model and its outputs.
Check the `modified_prompt_object` values for proper application of the chat template.
Then check that text was generated and that the model outputs match the `modified_prompt_object` preceding the generated text.
Finally, check whether the model output ends with an EOS string. This string needs to be culled, as noted above; proper culling is checked in the next step.

### Check interactions files

The interactions files contain the processed outputs in the form in which they are relevant to clembench.
Model replies in the interactions files should not contain any model-specific EOS token strings.
Check whether the model replies end in an EOS string. If they do, add this exact string as the `eos_to_cull` value in the model's registry entry, as shown above.
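
Culling itself amounts to stripping the configured EOS string from the end of the reply, roughly like this (a sketch of the mechanism, not the backend's exact code):

```python
import re

# Strip the model-specific EOS string from the end of the reply before
# it reaches the game.
reply = "Hello there!</s>"
cleaned = re.sub(r"</s>$", "", reply)
# cleaned == "Hello there!"
```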

### Repeat after changes

If you made any changes to the code after the first test, run the test again and check the files to make sure that they now have proper contents.

## Share your code

If you have successfully run the tests above, open a pull request to the clembench repository.
You can also run the full benchmark with your added model if you have the necessary hardware available. If you do, please share the results by contributing them to the clembench-runs repository.