Commit a4f766c
GitHub Actions committed Sep 17, 2024 (1 parent: c71d46a)
Showing 11 changed files with 6,341 additions and 7,368 deletions.
@@ -0,0 +1,374 @@
---
id: vision
title: Vision
sidebar_position: 2.21
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Our latest Pixtral 12B introduces vision capabilities, enabling it to analyze images and provide insights based on visual content in addition to text. This multimodal approach opens up new possibilities for applications that require both textual and visual understanding.

## Passing an Image URL
If the image is hosted online, you can simply provide the URL of the image in the request. This method is straightforward and does not require any encoding.

<Tabs>
  <TabItem value="python" label="python" default>

```python
import os
from mistralai import Mistral

# Retrieve the API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]

# Specify model
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)
```

  </TabItem>
  <TabItem value="typescript" label="typescript">

```typescript
import { Mistral } from "@mistralai/mistralai";

const apiKey = process.env.MISTRAL_API_KEY;

const mistral = new Mistral({apiKey: apiKey});

const chatResponse = await mistral.chat.complete({
  model: "pixtral-12b-2409",
  messages: [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
        }
      ]
    }
  ]
});

// Print the text content of the response
console.log(chatResponse.choices[0].message.content);
```

  </TabItem>
  <TabItem value="curl" label="curl">

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
```

  </TabItem>
</Tabs>

## Passing a Base64 Encoded Image
If you have one or more images stored locally, you can pass them to the model in base64-encoded form. Base64 converts binary data into text that can be embedded directly in a JSON request body. The encoded string is sent as a data URL in the `image_url` field.

```py
import base64
import os
import sys
from mistralai import Mistral

def encode_image(image_path):
    """Encode the image to base64, or return None if it cannot be read."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None
    except Exception as e:  # Any other read or encoding failure
        print(f"Error: {e}")
        return None

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string; stop here if the image could not be encoded
base64_image = encode_image(image_path)
if base64_image is None:
    sys.exit(1)

# Retrieve the API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]

# Specify model
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": f"data:image/jpeg;base64,{base64_image}"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)
```
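
The example above hardcodes `image/jpeg` in the data URL. As a minimal sketch, assuming you want the media type to follow the file extension, the helper below builds the data URL for any of the formats listed in the FAQ on this page. The name `to_data_url` and the extension table are illustrative assumptions, not part of the `mistralai` client.

```python
import base64
from pathlib import Path

# Assumption: media types for the extensions listed in the FAQ on this page.
MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}


def to_data_url(image_path: str) -> str:
    """Return a base64 data URL for a local image (hypothetical helper)."""
    path = Path(image_path)
    mime_type = MIME_BY_SUFFIX.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {path.suffix!r}")
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `image_url` value in the message, in place of the f-string in the example above.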

## Use cases
<details>
<summary><b>Understand charts</b></summary>

![](https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg)

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": "https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg"
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
```

Model output:
```
The chart is a bar chart titled 'France's Social Divide,' comparing socio-economic indicators between disadvantaged areas and the whole of France. It comprises two sections: the first section includes three bar groups representing the percentage of people part of the working-class, unemployment rate, and percentage of 16-25-year-olds not in school and unemployed. The second section includes three bar groups representing median monthly income, poverty rate, and households living in overcrowded housing. Each bar group contains two bars: one for disadvantaged areas (red) and one for the whole of France (blue). The data indicate that disadvantaged areas have higher percentages of working-class individuals (33.5% vs. 14.5%), unemployment (18.1% vs. 7.3%), and young people not in school and unemployed (25.2% vs. 12.9%). They also show a lower median monthly income (€1,168 vs. €1,822), a higher poverty rate (43.3% vs. 15.5%), and a higher percentage of households living in overcrowded housing (22.0% vs. 8.7%). The chart highlights significant disparities in socio-economic conditions between disadvantaged areas and the rest of France, emphasizing the challenges faced by these communities.
```

</details>

<details>
<summary><b>Compare images</b></summary>

![](https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg)

![](https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg)

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "what are the differences between two images?"
          },
          {
            "type": "image_url",
            "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
```

Model output:
```
The first image features the Eiffel Tower surrounded by snow-covered trees and pathways, with a clear view of the tower's intricate iron lattice structure. The second image shows the Eiffel Tower in the background of a large, outdoor stadium filled with spectators, with a red tennis court in the center. The most notable differences are the setting - one is a winter scene with snow, while the other is a summer scene with a crowd at a sporting event. The mood of the first image is serene and quiet, whereas the second image conveys a lively and energetic atmosphere. These differences highlight the versatility of the Eiffel Tower as a landmark that can be enjoyed in various contexts and seasons.
```

</details>
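
The request above places several `image_url` entries in one user message. As a rough sketch of the same pattern in Python, the helper below assembles such a message from a prompt and a list of image URLs. The function name `build_user_message` is a hypothetical convenience, not part of the `mistralai` client. It uses the plain-string form of `image_url` shown in the earlier examples.

```python
from typing import Any


def build_user_message(prompt: str, image_urls: list[str]) -> dict[str, Any]:
    """Build one user message with a text part followed by image parts.

    Hypothetical helper: it only assembles the dictionary and sends nothing.
    """
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each image becomes its own content part, in the order given.
        content.append({"type": "image_url", "image_url": url})
    return {"role": "user", "content": content}


message = build_user_message(
    "what are the differences between two images?",
    [
        "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
        "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg",
    ],
)
```

The resulting `message` can be passed as `messages=[message]` to `client.chat.complete`, as in the Python example earlier on this page.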

<details>
<summary><b>Transcribe receipts</b></summary>

![](https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg)

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "transcribe this receipt"
          },
          {
            "type": "image_url",
            "image_url": "https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg"
          }
        ]
      }
    ]
  }'
```

Model output:
```
\nDine-In\n\nReceipt Details\nDate: 02-Apr-2022\nTime: 5:01:56 PM\nCashier: Raul\n\nItems Purchased:\n1 Empanada - Beef $3.00\n1 Empanada - Cheese $3.00\n1 Empanada - Chicken $3.00\n1 Tallarin Huancaina Lomo Saltado $19.99\n1 1/2 Pisco Sour $15.00\n\nSubtotal $43.99\nLocal Taxes (5.5%) $2.42\nTotal $46.41\n\nMessage: IMMIGRANTS MAKE AMERICA GREAT THEY ALSO COOKED YOUR FOOD AND SERVED YOU TODAY GOD BLESS YOU\n\nOrder ID: D0BQZ3R656MDC\n\nLinks:\n- Online Ordering: https://clover.com/r/D0BQZ3R656MDC\n- Clover Privacy Policy: https://clover.com/privacy\n
```

</details>

<details>
<summary><b>Transcribe old documents</b></summary>

![](https://ciir.cs.umass.edu/irdemo/hw-demo/page_example.jpg)

```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "transcribe this"
          },
          {
            "type": "image_url",
            "image_url": "https://ciir.cs.umass.edu/irdemo/hw-demo/page_example.jpg"
          }
        ]
      }
    ]
  }'
```

Model output:
```
# Letters Orders and Instructions December 1855\n\n**Hoag's Company, if any opportunity offers.**\n\nYou are to be particularly exact and careful in these pagineries, that there is no disgrace meet between the Returns and you Pay Roll, or those who will be strict examining into it hereafter.\n\nI am & c.\n\n*[Signed]*\nEff.
```

</details>

## FAQ
- Can I fine-tune the image capabilities in Pixtral 12B?

    No, we do not currently support fine-tuning the image capabilities of Pixtral 12B.

- Can I use Pixtral 12B to generate images?

    No, Pixtral 12B is designed to understand and analyze images, not to generate them.

- What types of image files are supported?

    We currently support the following image formats:

    - PNG (.png)
    - JPEG (.jpeg and .jpg)
    - WEBP (.webp)
    - Non-animated GIF with only one frame (.gif)

- Is there a limit to the size of the image?

    The current file size limit is 10 MB.

- What's the maximum number of images per request?

    The maximum number of images per request via the API is 8.

- What is the rate limit for Pixtral 12B?

    For information on rate limits, please visit https://console.mistral.ai/limits/.
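
The format, size, and count limits above can be checked locally before sending a request. The sketch below is one possible pre-flight check, not an official validator. The name `check_images` is hypothetical, and the 10 MB limit is assumed to mean 10,000,000 bytes. It does not detect animated GIFs, which would need an image library.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".png", ".jpeg", ".jpg", ".webp", ".gif"}
MAX_IMAGE_BYTES = 10 * 1000 * 1000  # Assumption: 10 MB read as decimal megabytes
MAX_IMAGES_PER_REQUEST = 8


def check_images(paths: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(paths) > MAX_IMAGES_PER_REQUEST:
        problems.append(
            f"{len(paths)} images given, at most {MAX_IMAGES_PER_REQUEST} allowed per request"
        )
    for p in paths:
        path = Path(p)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            problems.append(f"{p}: unsupported format")
        elif path.stat().st_size > MAX_IMAGE_BYTES:
            problems.append(f"{p}: larger than the file size limit")
    return problems
```

Calling `check_images(["path_to_your_image.jpg"])` before encoding the file returns an empty list when nothing needs attention.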