I am working on function calling, but I am facing a challenge in the decode step: the output has not been consistent. Sometimes it produces the required parameters, and other times it raises a special-token-policy error, despite the policy being set to ignore (`1`).
Here is my snippet, using Nemo-Instruct-2407:
```python
model = Transformer.from_folder(setup.mistral_models_path)

# tokenizer = MistralTokenizer.from_file(f"{setup.mistral_models_path}/tekken.json")
tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer.special_token_policy = 1

get_flow_definition = {
    "type": "function",
    "function": {
        "name": "get_flow",
        "description": "Get a flow using the id",
        "parameters": {
            "type": "object",
            "properties": {
                "flow_id": {
                    "type": "string",
                    "description": "id of the flow",
                },
            },
            "required": ["flow_id"],
        },
    },
}

trial = [get_flow_definition]
readyTools = []
for tool in trial:
    if isinstance(tool, dict) and "function" in tool:
        newTool = Tool(
            function=Function(
                name=tool["function"]["name"],
                description=tool["function"]["description"],
                parameters=tool["function"]["parameters"],
            )
        )
        readyTools.append(newTool)
print(readyTools)

prompt = "can you get me a flow?"
messages = [UserMessage(content=prompt)]
completion_request = ChatCompletionRequest(
    tools=readyTools,
    messages=messages,
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=1024,
    temperature=0.35,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.decode(out_tokens[0])
print(result)
```
The issue is that the output fluctuates; most of the time it produces the error below:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 48
     46 out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
     47 tokenizer.special_token_policy = 1
---> 48 result = tokenizer.decode(out_tokens[0])
     49 print(result)

File ~/mambaforge/envs/env/lib/python3.11/site-packages/mistral_common/tokens/tokenizers/mistral.py:148, in MistralTokenizer.decode(self, tokens)
    147 def decode(self, tokens: List[int]) -> str:
--> 148     return self.instruct_tokenizer.decode(tokens)

File ~/mambaforge/envs/env/lib/python3.11/site-packages/mistral_common/tokens/tokenizers/sentencepiece.py:200, in InstructTokenizerBase.decode(self, tokens)
    199 def decode(self, tokens: List[int]) -> str:
--> 200     return self.tokenizer.decode(tokens)

File ~/mambaforge/envs/env/lib/python3.11/site-packages/mistral_common/tokens/tokenizers/tekken.py:234, in Tekkenizer.decode(self, tokens)
    233 def decode(self, tokens: List[int]) -> str:
--> 234     return "".join(self._decode_all(tokens, special_token_policy=self._special_token_policy))

File ~/mambaforge/envs/env/lib/python3.11/site-packages/mistral_common/tokens/tokenizers/tekken.py:203, in Tekkenizer._decode_all(self, tokens, special_token_policy)
    201 if is_special:
    202     if special_token_policy == SpecialTokenPolicy.RAISE:
--> 203         raise ValueError(
    204             f"Decoding `tokens` that contain special tokens ({list(group)}) is not allowed. \n"
    205             "Either make sure `tokens` do not include any special tokens or, "
    206             "if you want to decode `tokens` that includes special tokens, "
    207             "change the tokenizer's special token policy to IGNORE or KEEP: \n"
    208             "```\nfrom mistral_common.tokens.tokenizers.mistral import MistralTokenizer"
    209             "\nfrom mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy"
    210             "\n\ntokenizer = MistralTokenizer.v3(is_tekken=True)"
    211             "\ntokenizer.special_token_policy = SpecialTokenPolicy.IGNORE # or SpecialTokenPolicy.KEEP"
    212             "\n```"
    213         )
    214     elif special_token_policy == SpecialTokenPolicy.KEEP:
    215         decoded.extend(self._all_special_tokens[t] for t in group)

ValueError: Decoding `tokens` that contain special tokens ([9]) is not allowed.
Either make sure `tokens` do not include any special tokens or, if you want to decode `tokens` that includes special tokens, change the tokenizer's special token policy to IGNORE or KEEP:

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy

tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer.special_token_policy = SpecialTokenPolicy.IGNORE  # or SpecialTokenPolicy.KEEP
```
### Expected Behavior
The expected behavior is for decoding to produce the tool call with the required `flow_id` parameter.
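For illustration, when decoding succeeds the tool-call text is typically a JSON list of call objects that can be parsed back into the arguments. The exact shape shown below is an assumption for the sketch, not the guaranteed model output:

```python
import json

# Hypothetical decoded output for a get_flow tool call
# (shape assumed for illustration, not guaranteed by the model).
decoded = '[{"name": "get_flow", "arguments": {"flow_id": "abc-123"}}]'

calls = json.loads(decoded)
flow_id = calls[0]["arguments"]["flow_id"]
print(flow_id)  # abc-123
```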
### Additional Context
_No response_
### Suggested Solutions
_No response_
It consistently worked when I manually removed the special token that causes this issue (`[9]`) with the line `filtered_tokens = [token for token in out_tokens[0] if token != 9]`.
I still think we need a proper solution for this, if possible.
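A more general version of that workaround would be to drop every id in the special-token range rather than hardcoding `9`. The sketch below is self-contained; in real code the special-token count should be read from the tokenizer itself (e.g. the inner Tekkenizer), so `NUM_SPECIAL` here is only a stand-in:

```python
# Sketch: drop all special-token ids before decoding, instead of hardcoding 9.
# Tekken-style tokenizers reserve the lowest ids for special tokens; NUM_SPECIAL
# is an assumed stand-in here -- read the real count from your tokenizer.
NUM_SPECIAL = 1000  # assumption for illustration

def strip_special(tokens, num_special=NUM_SPECIAL):
    """Return only the non-special token ids."""
    return [t for t in tokens if t >= num_special]

print(strip_special([9, 1005, 1200, 3]))  # [1005, 1200]
```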
Thanks for the issue! Can you please make sure to post a fully reproducible code snippet that I can copy-paste into a Python shell and run correctly?
For the above snippet, I don't know exactly where you took the model weights from. Also, `Transformer` and `MistralTokenizer` are not imported, so the snippet is not runnable.
Can you try to post a complete code snippet, please?
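For reference, the imports missing from the original snippet would likely be along these lines. The module paths are assumptions based on mistral-inference / mistral-common layouts and should be verified against the installed versions:

```python
# Assumed import paths for the names used in the snippet above; verify
# against your installed mistral-inference / mistral-common versions.
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
```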