diff --git a/docs/capabilities/batch.md b/docs/capabilities/batch.md index e19053f..570158a 100644 --- a/docs/capabilities/batch.md +++ b/docs/capabilities/batch.md @@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem'; A batch is composed of a list of API requests. The structure of an individual request includes: -- A unique `custom_id` for identifying each request and referening results after completion +- A unique `custom_id` for identifying each request and referencing results after completion - A `body` object with message information Here's an example of how to structure a batch request: diff --git a/docs/getting-started/changelog.mdx b/docs/getting-started/changelog.mdx index 25812e4..c895297 100644 --- a/docs/getting-started/changelog.mdx +++ b/docs/getting-started/changelog.mdx @@ -21,6 +21,9 @@ November 6, 2024 - `frequency_penalty`: penalizes the repetition of words based on their frequency in the generated text - `n`: number of completions to return for each request, input tokens are only billed once. +November 6, 2024 +- We downscaled the temperature parameter of `pixtral-12b`, `ministral-3b-2410`, and `ministral-8b-2410` by a multiplier of 0.43 to improve consistency and quality and to unify model behavior. + October 9, 2024 - We released Ministral 3B (`ministral-3b-2410`) and Ministral 8B (`ministral-8b-2410`). diff --git a/docs/getting-started/glossary.mdx b/docs/getting-started/glossary.mdx index 57cecd2..3a11d5f 100644 --- a/docs/getting-started/glossary.mdx +++ b/docs/getting-started/glossary.mdx @@ -84,3 +84,6 @@ allowing the model to understand and generate language more effectively. Mistral AI Embeddings API offers cutting-edge, state-of-the-art embeddings for text, which can be used for many NLP tasks. Check out our [Embeddings](/capabilities/embeddings) guide to learn more. + +## Temperature +Temperature is a fundamental sampling parameter in LLMs that controls the randomness and diversity of the generated outputs. Lower Temperature values result in more deterministic and accurate responses, while higher values introduce more creativity and randomness. This parameter affects the softmax function, which normalizes logits into a probability distribution. Higher Temperatures flatten the distribution, making less likely tokens more probable, while lower Temperatures sharpen the distribution, favoring the most likely tokens. Adjusting the Temperature allows for tailoring the model's behavior to suit different applications, such as requiring high accuracy for tasks like mathematics or classification, or enhancing creativity for tasks like brainstorming or writing novels. Balancing creativity and coherence is crucial, as increasing Temperature can also introduce inaccuracies. Some models, such as `pixtral-12b`, `ministral-3b-2410`, `ministral-8b-2410`, and `open-mistral-nemo`, have a factor of 0.43 applied to the temperature when used via our services, to better align with how it impacts other models and unify model behavior.
diff --git a/docs/guides/contribute/_category_.json b/docs/guides/contribute/_category_.json index d4427cc..9ea6367 100644 --- a/docs/guides/contribute/_category_.json +++ b/docs/guides/contribute/_category_.json @@ -1,6 +1,6 @@ { "label": "How to contribute", - "position": 1.9, + "position": 2, "link": { "type": "doc", "id": "contribute_overview" diff --git a/docs/guides/contribute/overview.md b/docs/guides/contribute/overview.md index c5cc776..d357e23 100644 --- a/docs/guides/contribute/overview.md +++ b/docs/guides/contribute/overview.md @@ -4,7 +4,6 @@ title: Contribute slug: overview --- - # How to contribute Thank you for your interest in contributing to Mistral AI. We welcome everyone who wishes to contribute and we appreciate your time and effort! diff --git a/docs/guides/evaluation.md b/docs/guides/evaluation.md index d5e918b..80befdf 100644 --- a/docs/guides/evaluation.md +++ b/docs/guides/evaluation.md @@ -1,7 +1,7 @@ --- id: evaluation title: Evaluation -sidebar_position: 1.6 +sidebar_position: 1.7 --- diff --git a/docs/guides/finetuning.mdx b/docs/guides/finetuning.mdx index f2e8735..1ab9f75 100644 --- a/docs/guides/finetuning.mdx +++ b/docs/guides/finetuning.mdx @@ -1,7 +1,7 @@ --- id: finetuning title: Fine-tuning -sidebar_position: 1.5 +sidebar_position: 1.6 --- :::warning[ ] There's a monthly storage fee of $2 for each model. For more detailed pricing information, please visit our [pricing page](https://mistral.ai/technology/#pricing). diff --git a/docs/guides/observability.md b/docs/guides/observability.md index 63c181d..59565e2 100644 --- a/docs/guides/observability.md +++ b/docs/guides/observability.md @@ -2,7 +2,7 @@ id: observability title: Observability slug: observability -sidebar_position: 1.7 +sidebar_position: 1.8 --- ## Why observability? diff --git a/docs/guides/other-resources.mdx b/docs/guides/other-resources.mdx index 1b59d66..ff69fb4 100644 --- a/docs/guides/other-resources.mdx +++ b/docs/guides/other-resources.mdx @@ -2,7 +2,7 @@ id: other_resources title: Other resources slug: resources -sidebar_position: 1.8 +sidebar_position: 1.9 --- Visit the [Mistral AI Cookbook](https://github.com/mistralai/cookbook) for additional inspiration, diff --git a/docs/guides/sampling.md b/docs/guides/sampling.md new file mode 100644 index 0000000..44b9ea2 --- /dev/null +++ b/docs/guides/sampling.md @@ -0,0 +1,464 @@ +--- +id: sampling +title: Sampling +sidebar_position: 1.5 +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +# Sampling: Overview of our sampling settings + +Here, we will discuss the sampling settings that influence the output of Large Language Models (LLMs). This guide covers parameters such as **Temperature**, **N**, **Top P**, **Presence Penalty**, and **Frequency Penalty**, and explains how to adjust them. Whether you aim to generate creative content or ensure accurate responses, understanding these settings is key. + +Let's explore each parameter and learn how to fine-tune LLM outputs effectively. + +
+ +## N Completions + +**N** represents the number of completions to return for each request. This parameter is useful when you want to generate multiple responses for a single input. Each completion will be a unique response generated by the model, providing a variety of outputs to choose from. + +### Key Points + +- **Multiple Responses**: By setting `N` to a value greater than 1, you can get multiple responses for the same input. +- **Cost Efficiency**: Input tokens are only billed once, regardless of the number of completions requested. This makes it cost-effective to explore different possibilities. + +### Example + +Here's an example of how to use the `N` parameter in the API: + +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model=model, + messages=[ + { + "role": "user", + "content": "What is the best mythical creature? Answer with a single word.", + }, + ], + temperature = 1, # Must be greater than 0 so the completions are not all identical; higher values increase randomness and diversity + n = 10 # Number of completions to return +) + +for choice in chat_response.choices: + print(choice.message.content) +``` + +### Output + +``` +Phoenix. +Dragon +Dragon +Unicorn +Unicorn +Phoenix +Unicorn +Dragon +Dragon. +Unicorn +``` + +In this example, the model generates 10 responses for the same input prompt. This allows you to see a variety of possible answers and choose the one that best fits your needs. + +
+ +
+ +## Temperature + +**Temperature** in Large Language Models (LLMs) controls output diversity. Lower values make the model more deterministic, focusing on likely responses for accuracy. Higher values increase creativity and diversity. During text generation, LLMs predict tokens with associated probabilities using a softmax function. Temperature scales these probabilities: higher temperatures flatten the distribution, making outputs more varied, while lower temperatures amplify differences, favoring more likely tokens. + +## Visualization + +To better understand the underlying principle and the impact it has on the probability distribution, here is a visualization of Temperature using a simple prompt: *"What is the best mythical creature? Answer with a single word."* + +
+*Figure: Barplot example comparing the distribution with different `Temperature` values and the top 5 tokens using Mistral 7B at 4 bits precision.*
+ +**Temperature** significantly affects the probability distribution in LLMs. At a Temperature of 0, the model always outputs the most likely token, e.g., "**Dragon**". Increasing the Temperature to 0.2 introduces variability, allowing for tokens like "**Un**" (as in "**Un**icorn"). Further increases reveal more diverse tokens: the third token might still be "**Drag**" (for "**Drag**on"), but the fourth could start "**Peg**asus", and the fifth, "**Phoenix**". Higher Temperatures make less likely tokens more probable, enhancing the diversity of the model's output. + +## API +You can easily set a temperature value via our clients; let's experiment with our API. +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model = model, + messages = [ + { + "role": "user", + "content": "What is the best mythical creature? Answer with a single word.", + }, + ], + temperature = 0.1, + n = 10 +) + +for choice in chat_response.choices: + print(choice.message.content) +``` +``` +Dragon +Dragon +Dragon +Dragon +Dragon +Dragon +Dragon +Dragon +Dragon +Dragon +``` +The model answered mostly with Dragon! Let's try a higher temperature to get more diverse outputs by setting `temperature = 1`. +``` +Unicorn +Dragon +Phoenix +Unicorn +Dragon +Phoenix. +Dragon. +Phoenix +Dragon +Unicorn. +``` + +The outputs are now much more diverse, with the model answering with a different creature more frequently: we get "Dragon", "Unicorn", and "Phoenix". + +## The Best Temperature + +There's no one-size-fits-all Temperature for all use cases, but some guidelines can help you find the best value for your application. + +### Determinism + +- **Requirements**: Tasks needing consistent, accurate responses, such as Mathematics, Classification, Healthcare, or Reasoning. +- **Temperature**: Use very low values, sometimes slightly above 0 to add a touch of variety. + +For example, a classification agent should use a Temperature of 0 to always pick the best token. A math chat assistant might use very low Temperature values to avoid repetition while maintaining accuracy. + +### Creativity + +- **Requirements**: Tasks needing diverse, unique text, like brainstorming, writing novels, creating slogans, or roleplaying. +- **Temperature**: Use high values, but avoid excessively high Temperatures to prevent randomness and nonsense outputs. + +Consider the trade-off: higher Temperatures increase creativity but may decrease quality and accuracy. + +
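+To make the softmax scaling described above concrete, here is a minimal, self-contained sketch of how Temperature reshapes a toy distribution over candidate tokens. The logit values are illustrative only; this is not our server-side implementation:
+
+```py
+import math
+
+# Toy logits for the next token (illustrative values, not real model outputs)
+logits = {"Dragon": 5.0, "Unicorn": 4.2, "Phoenix": 3.8, "Pegasus": 2.5}
+
+def softmax_with_temperature(logits, temperature):
+    # Divide each logit by the Temperature, then normalize with a softmax
+    scaled = {token: value / temperature for token, value in logits.items()}
+    max_value = max(scaled.values())  # subtract the max for numerical stability
+    exps = {token: math.exp(value - max_value) for token, value in scaled.items()}
+    total = sum(exps.values())
+    return {token: value / total for token, value in exps.items()}
+
+for temperature in (0.2, 1.0, 2.0):
+    probs = softmax_with_temperature(logits, temperature)
+    print(temperature, {token: round(p, 2) for token, p in probs.items()})
+```
+
+At a low Temperature almost all of the probability mass lands on "Dragon", while higher Temperatures flatten the distribution and give the other creatures a realistic chance of being sampled.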
+ +
+ +## Top P + +**Top P** is a setting that limits the tokens considered by a language model based on a probability threshold. It helps focus on the most likely tokens, improving output quality. + +## Visualization + +For these examples, we set the Temperature first, then apply a Top P of 50%. Note that a Temperature of 0 is deterministic, making Top P irrelevant in that case. + +The process is as follows: +1. Apply the Temperature. +2. Use Top P (0.5) to keep only the most likely tokens, up to a cumulative probability of 50%. +3. Renormalize the probabilities of the remaining tokens. + +We will visualize the token probability distribution across different temperature values for the question: +- "What is the best mythical creature? Answer with a single word." + +
+*Figure: Different Temperature values and the top 5 tokens using Mistral 7B at 4 bits precision.*
+ +
+*Figure: Top P considers only the top tokens until reaching 50% probability.*
+ +
+*Figure: Other tokens' probabilities are set to 0, and the remaining tokens' probabilities are adjusted.*
+ +Top P ensures that only high-quality tokens are considered, maintaining output quality by excluding unlikely tokens. It's challenging to balance Temperature and Top P, so it's recommended to fix one and adjust the other. However, you should experiment to find the best settings for your use case! + +### To Summarize +1. **Role of Top P**: Top P limits the tokens considered based on a probability threshold, focusing on the most likely tokens to improve output quality. +2. **Interaction with Temperature**: Top P is applied after Temperature. +3. **Impact on Outputs**: Top P avoids considering very unlikely tokens, maintaining output quality and coherence. +4. **Balancing Temperature and Top P**: It's challenging to balance both. Start by fixing one parameter and adjusting the other, then experiment to find the optimal settings. + +### Example + +Here's an example of how to use the `Top P` parameter with our Python client: + +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model=model, + messages=[ + { + "role": "user", + "content": "What is the best mythical creature? Answer with a single word.", + }, + ], + temperature=1, + top_p=0.5, + n=10 +) + +for choice in chat_response.choices: + print(choice.message.content) +``` + +### Output + +``` +Unicorn +Unicorn +Unicorn +Unicorn +Dragon +Unicorn +Dragon +Dragon +Dragon +Dragon +``` + +### Output Table + +| Temperature 0.1 | Temperature 1 | Temperature 1 & Top P 50% | |:-----------------:|:-------------:|:-------------------------:| | Dragon | Unicorn | Unicorn | | Dragon | Dragon | Unicorn | | Dragon | Phoenix | Unicorn | | Dragon | Unicorn | Unicorn | | Dragon | Dragon | Dragon | | Dragon | Phoenix. | Unicorn | | Dragon | Dragon. | Dragon | | Dragon | Phoenix | Dragon | | Dragon | Dragon | Dragon | | Dragon | Unicorn. | Dragon | + +In this example, the model generates a response considering only the top tokens that cumulatively reach a 50% probability threshold. This ensures that the output keeps some diversity while still using only the best tokens; in this case, only 2 tokens reach the 50% threshold. + +
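+For intuition on the filtering step itself, here is a minimal sketch of Top P (nucleus) filtering on a toy distribution. The probabilities are illustrative only; this is not our server-side implementation:
+
+```py
+import random
+
+# Toy next-token distribution after Temperature has been applied (illustrative values)
+probs = {"Unicorn": 0.34, "Dragon": 0.30, "Phoenix": 0.22, "Pegasus": 0.09, "Griffin": 0.05}
+
+def top_p_filter(probs, top_p):
+    # Keep the most likely tokens until their cumulative probability reaches top_p
+    kept, cumulative = {}, 0.0
+    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
+        kept[token] = p
+        cumulative += p
+        if cumulative >= top_p:
+            break
+    # Renormalize the surviving tokens so their probabilities sum to 1
+    total = sum(kept.values())
+    return {token: p / total for token, p in kept.items()}
+
+filtered = top_p_filter(probs, top_p=0.5)
+print(filtered)  # Only "Unicorn" and "Dragon" survive, with renormalized probabilities
+print(random.choices(list(filtered), weights=list(filtered.values()), k=10))
+```
+
+With `top_p=0.5`, only the two most likely creatures survive the cut, which matches the behavior seen in the output table above.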
+ +
+ +## Presence/Frequency Penalty + +### Presence Penalty + +**Presence Penalty** determines how much the model penalizes the repetition of words or phrases. It encourages the model to use a wider variety of words and phrases, making the output more diverse and creative. + +- **Range**: [-2, 2] +- **Default**: 0 + +A higher presence penalty encourages the model to avoid repeating words or phrases that have already appeared in the output, ensuring a more varied and creative text. + +The presence penalty specifically is a **one-time adjustment** applied to all tokens that have been used at least once. It reduces the likelihood of repeating any token that has already appeared. This encourages the model to use a diverse range of tokens, promoting creativity and variety in the output. + +### Frequency Penalty + +**Frequency Penalty** is a parameter that penalizes the repetition of words based on their frequency in the generated text. It helps to promote diversity and reduce repetition in the output. + +- **Range**: [-2, 2] +- **Default**: 0 + +A higher frequency penalty discourages the model from repeating words that have already appeared frequently in the output. This ensures that the generated text is more varied and less repetitive. + +The frequency penalty specifically is a value that increases with the frequency of a token's appearance in the generated text: **a cumulative penalty**, so the more often a token is sampled, the higher the penalty becomes, further reducing the likelihood of repeating tokens that already appear frequently. + +### Differences Between Presence Penalty and Frequency Penalty + +- **Presence Penalty**: This is a one-off additive contribution that applies to all tokens that have been sampled at least once. It encourages the model to include a diverse range of tokens in the generated text. +- **Frequency Penalty**: This is a contribution that is proportional to how often a particular token has already been sampled. It discourages the model from repeating the same words or phrases too frequently within the generated text. + +Both parameters can be tweaked to shape the quality and diversity of the generated text. The best values for these parameters can differ based on the specific task and the desired outcome. + + + + +### Example Without Presence Penalty + +Here's an example of how the output looks without the `Presence Penalty` parameter: + +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model=model, + messages=[ + {"role": "user", + "content": "List 10 possible titles for a fantasy book. Give a list only."} + ], + temperature=0 +) + +print(chat_response.choices[0].message.content) +``` + +### Output Without Presence Penalty + +``` +1. "The Shattered Crown" +2. "Whispers of the Old Magic" +3. "Echoes of the Forgotten Realm" +4. "The Chronicles of the Silver Moon" +5. "The Enchanted Forest's Secret" +6. "The Last Dragon's Legacy" +7. "The Shadowed Path" +8. "The Song of the Siren's Call" +9. "The Lost City of the Stars" +10. "The Whispering Winds of Destiny" +``` + + +### Example With Presence Penalty + +Here's an example of how to use the `Presence Penalty` parameter in the API: + +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model=model, + messages=[ + {"role": "user", + "content": "List 10 possible titles for a fantasy book. Give a list only."} + ], + temperature=0, + presence_penalty=2 +) + +print(chat_response.choices[0].message.content) +``` + +### Output With Presence Penalty + +``` +1. "The Shattered Crown" +2. "Whispers of the Old Magic" +3. "Echoes of Eternity" +4. "Shadows of the Forgotten Realm" +5. "Chronicles of the Enchanted Forest" +6. "The Last Dragon's Roar" +7. "Mysteries of the Hidden City" +8. "Legends of the Lost Kingdom" +9. "The Whispering Winds" +10. "The Unseen War" +``` + +> The output list is already slightly different from the first one, since the presence penalty affects tokens that have already appeared. For instance, the token `The` occurs less often than it does without the presence penalty. + + + + + +### Example With Frequency Penalty + +Here's an example of how to use the `Frequency Penalty` parameter in the API: + +```py +import os +from mistralai import Mistral + +api_key = os.environ["MISTRAL_API_KEY"] +model = "ministral-3b-latest" + +client = Mistral(api_key=api_key) + +chat_response = client.chat.complete( + model=model, + messages=[ + {"role": "user", + "content": "List 10 possible titles for a fantasy book. Give a list only."} + ], + temperature=0, + frequency_penalty=2 +) + +print(chat_response.choices[0].message.content) +``` + +### Output With Frequency Penalty + +``` +1. "The Shattered Crown" +2. "Whispers of the Old Magic" +3. "Echoes of Eternity" +4. "The Forgotten Realm" +5. "Shadows of the Lost City" +6. "Chronicles of the Enchanted Forest" +7. The Last Dragon's Roar +8."The Veil Between Worlds" +9."The Song of the Siren's Call" +10."Legends in Stone" +``` + +> The output is even more diverse than before. However, notice that after the 7th item of the list, tokens such as `_"` and single quotation marks are also heavily affected; this shows how much stronger the impact of the frequency penalty becomes over the long run, since it is a cumulative penalty. + + + + +**Penalties are sensitive parameters that can have a significant impact on long-context and long-output queries. They can also help avoid the highly repetitive loops that the model may otherwise fall into, making them valuable parameters.** + +
diff --git a/static/img/barplot.png b/static/img/barplot.png new file mode 100644 index 0000000..d8dd70b Binary files /dev/null and b/static/img/barplot.png differ diff --git a/static/img/top_barplot.png b/static/img/top_barplot.png new file mode 100644 index 0000000..0ebcc50 Binary files /dev/null and b/static/img/top_barplot.png differ diff --git a/static/img/top_barplot_black.png b/static/img/top_barplot_black.png new file mode 100644 index 0000000..876e489 Binary files /dev/null and b/static/img/top_barplot_black.png differ diff --git a/static/img/top_barplot_final.png b/static/img/top_barplot_final.png new file mode 100644 index 0000000..7e11049 Binary files /dev/null and b/static/img/top_barplot_final.png differ diff --git a/version.txt b/version.txt index ba072ea..c8fe2be 100644 --- a/version.txt +++ b/version.txt @@ -1 +1 @@ -v0.0.107 +v0.0.15