Update docs to v0.0.83 #135

Closed · wants to merge 2 commits
docs/deployment/laplateforme/organization.mdx (2 additions, 2 deletions)
@@ -15,7 +15,7 @@ This ensures that the model is accessible and usable by all authorized team members.

## Create a workspace

- When you first join La Plateform, you can either create or join a workspace.
+ When you first join La Plateforme, you can either create or join a workspace.
Click on "Create workspace" to create and set up your workspace.

<img src="/img/org_join.png" width="80%"/>
@@ -40,4 +40,4 @@ To invite members to your organization, navigate to "Workspace - Members"
and click "Invite a new member".


<img src="/img/org_invite2.png" width="75%"/>
<img src="/img/org_invite2.png" width="75%"/>
docs/deployment/self-deployment/vllm.mdx (105 additions, 0 deletions)
@@ -74,6 +74,37 @@ batch inference workloads.
```

</TabItem>

<TabItem value="vllm-batch-small" label="Text input (Mistral Small)">

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Mistral-Small-Instruct-2409"
sampling_params = SamplingParams(max_tokens=8192)

# Load the model using Mistral's native tokenizer, weight, and config formats.
llm = LLM(
    model=model_name,
    tokenizer_mode="mistral",
    load_format="mistral",
    config_format="mistral",
)

messages = [
    {
        "role": "user",
        "content": "Who is the best French painter? Answer with detailed explanations.",
    }
]

# Run offline batch inference and print the first completion.
res = llm.chat(messages=messages, sampling_params=sampling_params)
print(res[0].outputs[0].text)
```

</TabItem>

<TabItem value="vllm-batch-pixtral" label="Image + text input (Pixtral-12B)">
Suppose you want to caption the following images:
<center>
@@ -181,6 +212,64 @@ allowing you to directly reuse existing code relying on the OpenAI API.
</TabItem>
</Tabs>

</TabItem>

<TabItem value="vllm-server-text-small" label="Text input (Mistral Small)">
Start the inference server to deploy your model, e.g. for Mistral Small:

```bash
vllm serve mistralai/Mistral-Small-Instruct-2409 \
    --tokenizer_mode mistral \
    --config_format mistral \
    --load_format mistral
```

You can now run inference requests with text input:

<Tabs>
<TabItem value="vllm-infer-small-curl" label="cURL">
```bash
curl --location 'http://localhost:8000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer token' \
    --data '{
        "model": "mistralai/Mistral-Small-Instruct-2409",
        "messages": [
          {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence."
          }
        ]
    }'
```
</TabItem>
<TabItem value="vllm-infer-small-python" label="Python">
```python
import httpx

url = 'http://localhost:8000/v1/chat/completions'
headers = {
    'Content-Type': 'application/json',
    # 'token' is a placeholder; supply a real key only if your server enforces one.
    'Authorization': 'Bearer token'
}
data = {
    "model": "mistralai/Mistral-Small-Instruct-2409",
    "messages": [
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence."
        }
    ]
}

# Send the chat completion request to the local vLLM server.
response = httpx.post(url, headers=headers, json=data)
print(response.json())
```
</TabItem>
</Tabs>

</TabItem>

<TabItem value="vllm-server-mm" label="Image + text input (Pixtral-12B)">
@@ -296,6 +385,22 @@ the project's official Docker image (see more details in the
--config_format mistral
```
</TabItem>

<TabItem value="vllm-docker-small" label="Mistral Small">
```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-Small-Instruct-2409 \
    --tokenizer_mode mistral \
    --load_format mistral \
    --config_format mistral
```
</TabItem>

<TabItem value="vllm-docker-pixtral" label="Pixtral-12B">
```bash
docker run --runtime nvidia --gpus all \
docs/getting-started/models.mdx (8 additions, 4 deletions)
@@ -39,9 +39,9 @@ Mistral provides two types of models: free models and premier models.

| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
- | Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
+ | Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: | Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |Our first sparse mixture-of-experts released December 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`| v0.1|
- | Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
+ | Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|


## API versioning
@@ -84,10 +84,14 @@ This guide will explore the performance and cost trade-offs, and discuss how to

Today, Mistral models are behind many LLM applications at scale. Here is a brief overview on the types of use cases we see along with their respective Mistral model:

- 1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) are powered by Mistral Small.
- 2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral 8x22B.
+ 1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) can be powered by Mistral Nemo.
+ 2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral Small.
3) Complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents) are powered by Mistral Large.

+ Our Legacy models can currently be replaced by our more recent, high-quality models. If you are considering an upgrade, here are some general comments that may assist you (see the sketch after this list):
+ - Mistral Nemo currently outperforms Mistral 7B and is more cost-effective.
+ - Mistral Small currently outperforms Mixtral 8x7B and is more cost-effective.
+ - Mistral Large currently outperforms Mixtral 8x22B while maintaining the same price ratio.
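As a rough illustration of such an upgrade, here is a minimal sketch in which only the model identifier changes. It assumes the v1 `mistralai` Python client and a `MISTRAL_API_KEY` environment variable; the aliases shown are examples, not a prescribed mapping.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# The legacy call used model="open-mixtral-8x7b"; upgrading means swapping
# in a newer alias. Everything else about the request stays the same.
response = client.chat.complete(
    model="mistral-small-latest",  # illustrative replacement alias
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket: 'My invoice total is wrong.'",
        }
    ],
)
print(response.choices[0].message.content)
```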

### Performance and cost trade-offs

docs/guides/finetuning_sections/_03_e2e_examples.md (1 addition, 1 deletion)
@@ -8,7 +8,7 @@ import TabItem from '@theme/TabItem';
</a>


- You can fine-tune Mistral’s open-weights models Mistral 7B and Mistral Small via Mistral API. Follow the steps below using Mistral's fine-tuning API.
+ You can fine-tune all of Mistral’s models via Mistral API. Follow the steps below using Mistral's fine-tuning API.

### Prepare dataset
In this example, let’s use the [ultrachat_200k dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k). We load a chunk of the data into pandas DataFrames, split it into training and validation sets, and save it in the required `jsonl` format for fine-tuning, as sketched below.
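A minimal sketch of this step, assuming `pandas` with `huggingface_hub` installed so that `hf://` paths resolve; the parquet shard name below is illustrative, so check the dataset repository for the actual file names:

```python
import pandas as pd

# NOTE: illustrative shard name; look up the real file names in the dataset repo.
df = pd.read_parquet(
    "hf://datasets/HuggingFaceH4/ultrachat_200k/data/test_gen-00000-of-00001.parquet"
)

# Hold out a small fraction of rows for validation.
df_train = df.sample(frac=0.995, random_state=200)
df_eval = df.drop(df_train.index)

# The fine-tuning API expects one JSON record per line (jsonl).
df_train.to_json("ultrachat_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("ultrachat_chunk_eval.jsonl", orient="records", lines=True)
```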