Update docs to v0.0.83 #135

Closed · wants to merge 2 commits
docs/deployment/laplateforme/organization.mdx (2 additions, 2 deletions)
@@ -15,7 +15,7 @@ This ensures that the model is accessible and usable by all authorized team members.

## Create a workspace

- When you first join La Plateform, you can either create or join a workspace.
+ When you first join La Plateforme, you can either create or join a workspace.
Click on "Create workspace" to create and set up your workspace.

<img src="/img/org_join.png" width="80%"/>
@@ -40,4 +40,4 @@ To invite members to your organization, navigate to "Workspace - Members"
and click "Invite a new member".


<img src="/img/org_invite2.png" width="75%"/>
<img src="/img/org_invite2.png" width="75%"/>
docs/deployment/self-deployment/vllm.mdx (105 additions, 0 deletions)
@@ -74,6 +74,37 @@ batch inference workloads.
```

</TabItem>

<TabItem value="vllm-batch-small" label="Text input (Mistral Small)">

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Mistral-Small-Instruct-2409"
sampling_params = SamplingParams(max_tokens=8192)

# Load the model using Mistral's native tokenizer, weight, and config formats.
llm = LLM(
    model=model_name,
    tokenizer_mode="mistral",
    load_format="mistral",
    config_format="mistral",
)

messages = [
    {
        "role": "user",
        "content": "Who is the best French painter? Answer with detailed explanations.",
    }
]

# Run offline batch inference and print the first completion.
res = llm.chat(messages=messages, sampling_params=sampling_params)
print(res[0].outputs[0].text)
```

</TabItem>

<TabItem value="vllm-batch-pixtral" label="Image + text input (Pixtral-12B)">
Suppose you want to caption the following images:
<center>
@@ -181,6 +212,64 @@ allowing you to directly reuse existing code relying on the OpenAI API.
</TabItem>
</Tabs>

</TabItem>

<TabItem value="vllm-server-text-small" label="Text input (Mistral Small)">
Start the inference server to deploy your model, e.g. for Mistral Small:

```bash
vllm serve mistralai/Mistral-Small-Instruct-2409 \
    --tokenizer_mode mistral \
    --config_format mistral \
    --load_format mistral
```

You can now run inference requests with text input:

<Tabs>
<TabItem value="vllm-infer-small-curl" label="cURL">
```bash
curl --location 'http://localhost:8000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer token' \
    --data '{
        "model": "mistralai/Mistral-Small-Instruct-2409",
        "messages": [
          {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence."
          }
        ]
    }'
```
</TabItem>
<TabItem value="vllm-infer-small-python" label="Python">
```python
import httpx

url = 'http://localhost:8000/v1/chat/completions'
headers = {
    'Content-Type': 'application/json',
    # 'token' is a placeholder; supply a real key only if your server enforces one.
    'Authorization': 'Bearer token'
}
data = {
    "model": "mistralai/Mistral-Small-Instruct-2409",
    "messages": [
        {
            "role": "user",
            "content": "Who is the best French painter? Answer in one short sentence."
        }
    ]
}

# Send the chat completion request to the local vLLM server.
response = httpx.post(url, headers=headers, json=data)
print(response.json())
```
</TabItem>
</Tabs>

</TabItem>

<TabItem value="vllm-server-mm" label="Image + text input (Pixtral-12B)">
@@ -296,6 +385,22 @@ the project's official Docker image (see more details in the
--config_format mistral
```
</TabItem>

<TabItem value="vllm-docker-small" label="Mistral Small">
```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-Small-Instruct-2409 \
    --tokenizer_mode mistral \
    --load_format mistral \
    --config_format mistral
```
</TabItem>

<TabItem value="vllm-docker-pixtral" label="Pixtral-12B">
```bash
docker run --runtime nvidia --gpus all \
docs/getting-started/models.mdx (8 additions, 4 deletions)
@@ -39,9 +39,9 @@ Mistral provides two types of models: free models and premier models.

| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
- | Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
+ | Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: | Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |Our first sparse mixture-of-experts released December 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`| v0.1|
- | Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |Our first dense model released September 2023. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
+ | Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | Our best open source model to date released April 2024. Learn more on our [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|


## API versioning
@@ -84,10 +84,14 @@ This guide will explore the performance and cost trade-offs, and discuss how to

Today, Mistral models are behind many LLM applications at scale. Here is a brief overview on the types of use cases we see along with their respective Mistral model:

- 1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) are powered by Mistral Small.
- 2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral 8x22B.
+ 1) Simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) can be powered by Mistral Nemo.
+ 2) Intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) are powered by Mistral Small.
3) Complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents) are powered by Mistral Large.

+ Our Legacy models can currently be replaced by our more recent, high-quality models. If you are considering an upgrade, here are some general comments that may assist you (see the sketch after this list):
+ - Mistral Nemo currently outperforms Mistral 7B and is more cost-effective.
+ - Mistral Small currently outperforms Mixtral 8x7B and is more cost-effective.
+ - Mistral Large currently outperforms Mixtral 8x22B while maintaining the same price ratio.
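As a rough illustration of such an upgrade, here is a minimal sketch in which only the model identifier changes. It assumes the v1 `mistralai` Python client and a `MISTRAL_API_KEY` environment variable; the aliases shown are examples, not a prescribed mapping.

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# The legacy call used model="open-mixtral-8x7b"; upgrading means swapping
# in a newer alias. Everything else about the request stays the same.
response = client.chat.complete(
    model="mistral-small-latest",  # illustrative replacement alias
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket: 'My invoice total is wrong.'",
        }
    ],
)
print(response.choices[0].message.content)
```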

### Performance and cost trade-offs

docs/guides/finetuning_sections/_03_e2e_examples.md (1 addition, 1 deletion)
@@ -8,7 +8,7 @@ import TabItem from '@theme/TabItem';
</a>


- You can fine-tune Mistral’s open-weights models Mistral 7B and Mistral Small via Mistral API. Follow the steps below using Mistral's fine-tuning API.
+ You can fine-tune all of Mistral’s models via Mistral API. Follow the steps below using Mistral's fine-tuning API.

### Prepare dataset
In this example, let’s use the [ultrachat_200k dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k). We load a chunk of the data into pandas DataFrames, split it into training and validation sets, and save it in the required `jsonl` format for fine-tuning, as sketched below.
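A minimal sketch of this step, assuming `pandas` with `huggingface_hub` installed so that `hf://` paths resolve; the parquet shard name below is illustrative, so check the dataset repository for the actual file names:

```python
import pandas as pd

# NOTE: illustrative shard name; look up the real file names in the dataset repo.
df = pd.read_parquet(
    "hf://datasets/HuggingFaceH4/ultrachat_200k/data/test_gen-00000-of-00001.parquet"
)

# Hold out a small fraction of rows for validation.
df_train = df.sample(frac=0.995, random_state=200)
df_eval = df.drop(df_train.index)

# The fine-tuning API expects one JSON record per line (jsonl).
df_train.to_json("ultrachat_chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("ultrachat_chunk_eval.jsonl", orient="records", lines=True)
```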