diff --git a/docs/deployment/laplateforme/organization.mdx b/docs/deployment/laplateforme/organization.mdx
index 807724e..d15b66d 100644
--- a/docs/deployment/laplateforme/organization.mdx
+++ b/docs/deployment/laplateforme/organization.mdx
@@ -15,7 +15,7 @@ This ensures that the model is accessible and usable by all authorized team memb
 ## Create a workspace
 
-When you first join La Plateform, you can either create or join a workspace.
+When you first join La Plateforme, you can either create or join a workspace.
 
 Click on "Create workspace" to create and set up your workspace.
@@ -40,4 +40,4 @@ To invite members to your organization, navigate to "Workspace - Members"
 and click "Invite a new member".
 
-
\ No newline at end of file
+
diff --git a/docs/deployment/self-deployment/vllm.mdx b/docs/deployment/self-deployment/vllm.mdx
index c7af9b4..4c2519b 100644
--- a/docs/deployment/self-deployment/vllm.mdx
+++ b/docs/deployment/self-deployment/vllm.mdx
@@ -74,6 +74,37 @@ batch inference workloads.
 ```
+
+
+
+
+    ```python
+    from vllm import LLM
+    from vllm.sampling_params import SamplingParams
+
+    model_name = "mistralai/Mistral-Small-Instruct-2409"
+    sampling_params = SamplingParams(max_tokens=8192)
+
+    llm = LLM(
+        model=model_name,
+        tokenizer_mode="mistral",
+        load_format="mistral",
+        config_format="mistral",
+    )
+
+    messages = [
+        {
+            "role": "user",
+            "content": "Who is the best French painter? Answer with detailed explanations.",
+        }
+    ]
+
+    res = llm.chat(messages=messages, sampling_params=sampling_params)
+    print(res[0].outputs[0].text)
+    ```
+
+
 Suppose you want to caption the following images:
@@ -181,6 +212,64 @@ allowing you to directly reuse existing code relying on the OpenAI API.
+
+
+
+
+    Start the inference server to deploy your model, e.g. for Mistral Small:
+
+    ```bash
+    vllm serve mistralai/Mistral-Small-Instruct-2409 \
+        --tokenizer_mode mistral \
+        --config_format mistral \
+        --load_format mistral
+    ```
+
+    You can now run inference requests with text input:
+
+
+
+    ```bash
+    curl --location 'http://localhost:8000/v1/chat/completions' \
+        --header 'Content-Type: application/json' \
+        --header 'Authorization: Bearer token' \
+        --data '{
+            "model": "mistralai/Mistral-Small-Instruct-2409",
+            "messages": [
+              {
+                "role": "user",
+                "content": "Who is the best French painter? Answer in one short sentence."
+              }
+            ]
+          }'
+    ```
+
+
+    ```python
+    import httpx
+
+    url = 'http://localhost:8000/v1/chat/completions'
+    headers = {
+        'Content-Type': 'application/json',
+        'Authorization': 'Bearer token'
+    }
+    data = {
+        "model": "mistralai/Mistral-Small-Instruct-2409",
+        "messages": [
+            {
+                "role": "user",
+                "content": "Who is the best French painter? Answer in one short sentence."
+            }
+        ]
+    }
+
+    response = httpx.post(url, headers=headers, json=data)
+
+    print(response.json())
+    ```
+
+
@@ -296,6 +385,22 @@ the project's official Docker image (see more details in the
     --config_format mistral
 ```
+
+
+```bash
+docker run --runtime nvidia --gpus all \
+    -v ~/.cache/huggingface:/root/.cache/huggingface \
+    --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
+    -p 8000:8000 \
+    --ipc=host \
+    vllm/vllm-openai:latest \
+    --model mistralai/Mistral-Small-Instruct-2409 \
+    --tokenizer_mode mistral \
+    --load_format mistral \
+    --config_format mistral
+```
+
+
 ```bash
 docker run --runtime nvidia --gpus all \
diff --git a/docs/getting-started/models.mdx b/docs/getting-started/models.mdx
index 0af0307..6c697ac 100644
--- a/docs/getting-started/models.mdx
+++ b/docs/getting-started/models.mdx
@@ -39,9 +39,9 @@ Mistral provides two types of models: free models and premier models.
 | Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
 |--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Mistral 7B | :heavy_check_mark: Apache2 |:heavy_check_mark: |Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
+| Mistral 7B | :heavy_check_mark: Apache2 |:heavy_check_mark: | Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
 | Mixtral 8x7B |:heavy_check_mark: Apache2 | :heavy_check_mark: |Our first sparse mixture-of-experts released December 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`| v0.1|
-| Mixtral 8x22B |:heavy_check_mark: Apache2 | :heavy_check_mark: |Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
+| Mixtral 8x22B |:heavy_check_mark: Apache2 | :heavy_check_mark: | Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
 
 ## API versioning
@@ -84,10 +84,14 @@ This guide will explore the performance and cost trade-offs, and discuss how to
 Today, Mistral models are behind many LLM applications at scale. Here is a brief overview on the types of use cases we see along with their respective Mistral model:
 
-1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) are powered by Mistral Small.
-2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral 8x22B.
+1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) can be powered by Mistral Nemo.
+2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral Small.
 3) Complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents) are powered by Mistral Large.
 
+Our legacy models can be replaced by our more recent, higher-quality models. If you are considering an upgrade, the following general guidance may help:
+- Mistral Nemo currently outperforms Mistral 7B and is more cost-effective.
+- Mistral Small currently outperforms Mixtral 8x7B and is more cost-effective.
+- Mistral Large currently outperforms Mixtral 8x22B while maintaining a comparable price-performance ratio.
 ### Performance and cost trade-offs
diff --git a/docs/guides/finetuning_sections/_03_e2e_examples.md b/docs/guides/finetuning_sections/_03_e2e_examples.md
index 94a9628..64afd32 100644
--- a/docs/guides/finetuning_sections/_03_e2e_examples.md
+++ b/docs/guides/finetuning_sections/_03_e2e_examples.md
@@ -8,7 +8,7 @@ import TabItem from '@theme/TabItem';
 
-You can fine-tune Mistral’s open-weights models Mistral 7B and Mistral Small via Mistral API. Follow the steps below using Mistral's fine-tuning API.
+You can fine-tune all of Mistral’s models via the Mistral API. Follow the steps below using Mistral's fine-tuning API.
 
 ### Prepare dataset
 In this example, let’s use the [ultrachat_200k dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k). We load a chunk of the data into Pandas Dataframes, split the data into training and validation, and save the data into the required `jsonl` format for fine-tuning.
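The dataset-preparation step described in the fine-tuning hunk above (load chat data into Pandas DataFrames, split into training and validation, write the required `jsonl` format) can be sketched as follows. This is a minimal illustration, not the guide's exact code: the sample rows stand in for the ultrachat_200k chunk, and the split fraction and output file names are assumptions.

```python
import pandas as pd

# Stand-in for a chunk of ultrachat_200k: each row holds a "messages" list
# of chat turns, the shape the fine-tuning API expects per jsonl record.
df = pd.DataFrame({
    "messages": [
        [{"role": "user", "content": f"question {i}"},
         {"role": "assistant", "content": f"answer {i}"}]
        for i in range(10)
    ]
})

# Shuffle, then hold out a small validation slice (fraction is illustrative).
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
n_eval = max(1, int(len(df) * 0.05))
df_eval, df_train = df.iloc[:n_eval], df.iloc[n_eval:]

# Write jsonl: one JSON object per line, each with a "messages" key.
df_train.to_json("ultrachat_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("ultrachat_chunk_eval.jsonl", orient="records", lines=True)
```

The resulting files can then be uploaded to the fine-tuning API as the training and validation sets.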