Tekkenizer
The new Tekkenizer class is based on OpenAI's tiktoken and supports the new Mistral-Nemo model.
Tekkenizer always uses tokenizer version 3 or higher.
Examples:
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer = MistralTokenizer.from_model("...")
Function calling (just like before)
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
# Load Mistral tokenizer
model_name = "..."
tokenizer = MistralTokenizer.from_model(model_name)
# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the user's location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text
# Count the number of tokens
print(len(tokens))
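A token count like the one above is commonly used to check that a prompt fits within a model's context window. The following is a minimal sketch of such a check, not part of mistral_common: the helper `fits_in_context` and its parameters `max_context` and `reserved_for_output` are hypothetical names, and the numbers in the usage lines are placeholders rather than the limits of any particular model.

```python
from typing import Sequence


def fits_in_context(
    tokens: Sequence[int],
    max_context: int,
    reserved_for_output: int = 0,
) -> bool:
    """Return True if the prompt leaves room for the reserved output tokens.

    Hypothetical helper, not part of mistral_common. `tokens` is any sequence
    of token ids, such as `tokenized.tokens` from the example above.
    """
    if max_context <= 0:
        raise ValueError("max_context must be positive")
    if reserved_for_output < 0:
        raise ValueError("reserved_for_output must be non-negative")
    # The prompt plus the space set aside for the reply must fit the window.
    return len(tokens) + reserved_for_output <= max_context


# Placeholder values for illustration only (not real model limits).
example_tokens = list(range(120))
print(fits_in_context(example_tokens, max_context=128))
print(fits_in_context(example_tokens, max_context=128, reserved_for_output=16))
```

In practice, `example_tokens` would be replaced by `tokenized.tokens`, and `max_context` by the context length documented for the model in use.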
What's Changed
- v1.3.0 by @patrickvonplaten in #27
Full Changelog: v1.3.0...v1.3.1