From f843e2198013a73a347d161f466cf5a96e37f3fb Mon Sep 17 00:00:00 2001
From: Michael Gruner
Date: Sat, 1 Jun 2024 16:31:54 -0700
Subject: [PATCH] Add mistral-inference documentation

---
 packages/llm/mistral-inference/docs.md | 262 +++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100644 packages/llm/mistral-inference/docs.md

diff --git a/packages/llm/mistral-inference/docs.md b/packages/llm/mistral-inference/docs.md
new file mode 100644
index 000000000..f6cfc6f21
--- /dev/null
+++ b/packages/llm/mistral-inference/docs.md
@@ -0,0 +1,262 @@

The [Mistral Inference](https://github.com/mistralai/mistral-inference) project contains minimal code to run Mistral's 7B, 8x7B, 8x22B and Codestral 22B models.

Blog 7B: [https://mistral.ai/news/announcing-mistral-7b/](https://mistral.ai/news/announcing-mistral-7b/)\
Blog 8x7B: [https://mistral.ai/news/mixtral-of-experts/](https://mistral.ai/news/mixtral-of-experts/)\
Blog 8x22B: [https://mistral.ai/news/mixtral-8x22b/](https://mistral.ai/news/mixtral-8x22b/)\
Blog Codestral 22B: [https://mistral.ai/news/codestral/](https://mistral.ai/news/codestral/)

## Model download

| Name | Download | md5sum |
|-------------|-------|-------|
| 7B Instruct | https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar | `80b71fcb6416085bcb4efad86dfb4d52` |
| 8x7B Instruct | https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar (**Updated model coming soon!**) | `8e2d3930145dc43d3084396f49d38a3f` |
| 8x22B Instruct | https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar | `471a02a6902706a2f1e44a693813855b` |
| 7B Base | https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-v0.3.tar | `0663b293810d7571dad25dae2f2a5806` |
| 8x7B | **Updated model coming soon!** | - |
| 8x22B | https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-v0.3.tar | `a2fa75117174f87d1197e3a4eb50371a` |
| Codestral 22B | https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar | `a5661f2f6c6ee4d6820a2f68db934c5d` |

Note:
- **Important**:
  - `mixtral-8x22B-Instruct-v0.3.tar` is exactly the same as [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), only stored in `.safetensors` format
  - `mixtral-8x22B-v0.3.tar` is the same as [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1), but has an extended vocabulary of 32768 tokens.
  - `codestral-22B-v0.1.tar` has a custom non-commercial license, called the [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licenses/MNPL-0.1.md)
- All of the models listed above support function calling. For example, Mistral 7B Base/Instruct v3 is a minor update to Mistral 7B Base/Instruct v2, with the addition of function calling capabilities.
- The "coming soon" models will include function calling as well.
- You can download previous versions of the models from the [docs](https://docs.mistral.ai/getting-started/open_weight_models/#downloading).
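Each archive can be verified against the `md5sum` column above before extracting. A quick check for the 7B Instruct archive (assuming GNU coreutils' `md5sum` and that the tar was downloaded to the current directory):

```sh
# Compare the download against the checksum listed in the table above
echo "80b71fcb6416085bcb4efad86dfb4d52  mistral-7B-Instruct-v0.3.tar" | md5sum -c -
# prints "mistral-7B-Instruct-v0.3.tar: OK" on a good download
```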
Create a local folder to store the models:

```sh
export MISTRAL_MODEL=$HOME/mistral_models
mkdir -p $MISTRAL_MODEL
```

Download any of the links above and extract the content, *e.g.*:

```sh
export M7B_DIR=$MISTRAL_MODEL/7B_instruct
wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
mkdir -p $M7B_DIR
tar -xf mistral-7B-Instruct-v0.3.tar -C $M7B_DIR
```

or

```sh
export M8x7B_DIR=$MISTRAL_MODEL/8x7b_instruct
wget https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar
mkdir -p $M8x7B_DIR
tar -xf Mixtral-8x7B-v0.1-Instruct.tar -C $M8x7B_DIR
```

## Usage

The following sections give an overview of how to run the models from the command-line interface (CLI) or directly within Python.

### CLI

- **Demo**

To test that a model works in your setup, you can run the `mistral-demo` command. The 7B models can be tested on a single GPU as follows:

```sh
mistral-demo $M7B_DIR
```

Large models, such as **8x7B** and **8x22B**, have to be run in a multi-GPU setup. For these models, you can use the following command:

```sh
torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR
```

*Note*: Change `--nproc-per-node` to more GPUs if available.

- **Chat**

To interactively chat with the models, you can make use of the `mistral-chat` command:

```sh
mistral-chat $M7B_DIR --instruct
```

For large models, you can again make use of `torchrun`:

```sh
torchrun --nproc-per-node 2 --no-python mistral-chat $M8x7B_DIR --instruct
```

*Note*: Change `--nproc-per-node` to more GPUs if necessary (*e.g.* for 8x22B).

- **Chat as Code Assistant**

To use [Codestral](https://mistral.ai/news/codestral/) as a coding assistant, you can run the following command using `mistral-chat`. Make sure `$M22B_CODESTRAL` is set to a valid path to the downloaded Codestral folder, *e.g.* `$HOME/mistral_models/Codestral-22B-v0.1` (a download sketch is given at the end of this section):

```sh
mistral-chat $M22B_CODESTRAL --instruct --max_tokens 256
```

If you prompt it with *"Write me a function that computes fibonacci in Rust"*, the model should generate something along the following lines:

```sh
Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}

This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.
```

You can continue chatting afterwards, *e.g.* with *"Translate it to Python"*.
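As referenced in the code-assistant section above, `$M22B_CODESTRAL` must point to an extracted Codestral folder. A minimal download-and-extract sketch following the same pattern as the other models (the folder name is just an example):

```sh
export M22B_CODESTRAL=$MISTRAL_MODEL/Codestral-22B-v0.1
wget https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar
mkdir -p $M22B_CODESTRAL
tar -xf codestral-22B-v0.1.tar -C $M22B_CODESTRAL
```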
### Python

- *Instruction Following*:

```py
from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file("./mistral_7b_instruct/tokenizer.model.v3")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral_7b_instruct")  # change to extracted model dir

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

- *Function Calling*:

```py
# Reuses `tokenizer` and `model` from the instruction-following example above.
from mistral_common.protocol.instruct.tool_calls import Function, Tool

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

- *Fill-in-the-middle (FIM)*:

Make sure to have `mistral-common >= 1.2.0` installed:

```sh
pip install --upgrade mistral-common
```

You can simulate code-completion in-filling as follows.

```py
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.from_model("codestral-22b")
model = Transformer.from_folder("./mistral_22b_codestral")  # change to extracted model dir

prefix = """def add("""
suffix = """ return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

# Everything before the suffix is the generated middle section
middle = result.split(suffix)[0].strip()
print(middle)
```
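The instruct examples above are single-turn, but `ChatCompletionRequest` simply takes a list of messages, so a conversation can be threaded by appending previous replies. A minimal multi-turn sketch, reusing `tokenizer` and `model` from the *Instruction Following* example (the `AssistantMessage` usage here is an illustrative assumption, not an official recipe):

```py
from mistral_common.protocol.instruct.messages import AssistantMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Thread the conversation: the previous user turn, the model's reply
# (`result` from the instruction-following example), and a follow-up question.
completion_request = ChatCompletionRequest(
    messages=[
        UserMessage(content="Explain Machine Learning to me in a nutshell."),
        AssistantMessage(content=result),
        UserMessage(content="Now give me one concrete, everyday example."),
    ]
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```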
### One-file-ref

If you want a self-contained implementation, look at `one_file_ref.py`, or run it with

```sh
python -m one_file_ref $M7B_DIR
```

which should give something along the following lines:

```
This is a test of the emergency broadcast system. This is only a test.

If this were a real emergency, you would be told what to do.

This is a test
=====================
This is another test of the new blogging software. I’m not sure if I’m going to keep it or not.
I’m not sure if I’m going to keep
=====================
This is a third test, mistral AI is very good at testing. 🙂

This is a third test, mistral AI is very good at testing. 🙂

This
=====================
```

**Note**: To run self-contained implementations, you need to do a local installation (a sketch is given at the end of this document).

## Additional Documentation

This documentation, along with additional information, can be found in [Mistral Inference's GitHub repository](https://github.com/mistralai/mistral-inference/tree/main).
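The local installation mentioned in the One-file-ref note above is typically done from a clone of that repository. A minimal sketch, assuming an editable `pip` install from source works in your environment:

```sh
# Install mistral-inference from a local checkout
git clone https://github.com/mistralai/mistral-inference.git
cd mistral-inference
pip install -e .
```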