
Error when converting llama3.2-1B tokenizer.model to tokenizer.bin #5913

Open
a8nova opened this issue Oct 5, 2024 · 2 comments
Comments

a8nova commented Oct 5, 2024

🐛 Describe the bug

When following the instructions in https://github.com/pytorch/executorch/blob/main/examples/demo-apps/apple_ios/LLaMA/docs/delegates/xnnpack_README.md, converting the Llama 3.2-1B tokenizer.model to tokenizer.bin fails with the error below:

(et_xnnpack) /content/HI/executorch# python -m extension.llm.tokenizer.tokenizer -t /root/.llama/checkpoints/Llama3.2-1B/tokenizer.model -o tokenizer.bin
Traceback (most recent call last):
  File "/usr/local/envs/et_xnnpack/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/envs/et_xnnpack/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/HI/executorch/extension/llm/tokenizer/tokenizer.py", line 140, in <module>
    t = Tokenizer(args.tokenizer_model)
  File "/content/HI/executorch/extension/llm/tokenizer/tokenizer.py", line 26, in __init__
    self.sp_model = SentencePieceProcessor(model_file=model_path)
  File "/usr/local/envs/et_xnnpack/lib/python3.10/site-packages/sentencepiece/__init__.py", line 468, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/envs/et_xnnpack/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/envs/et_xnnpack/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /root/.llama/checkpoints/Llama3.2-1B/tokenizer.model
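The RuntimeError comes from SentencePiece failing to parse the file as a ModelProto. Llama 3 models ship a tiktoken-style BPE ranks file (plain text, one base64-encoded token plus an integer rank per line) rather than a SentencePiece protobuf, which would explain the parse failure. A minimal sketch to check which format a tokenizer.model file is (the `looks_like_tiktoken` helper is hypothetical, not part of ExecuTorch):

```python
import base64

def looks_like_tiktoken(path):
    """Heuristic: a tiktoken ranks file starts with '<base64 token> <rank>' lines,
    while a SentencePiece model is a binary protobuf. Returns True for the former."""
    try:
        with open(path, "rb") as f:
            fields = f.readline().split()
        # Expect exactly two whitespace-separated fields: base64 token, integer rank.
        return (
            len(fields) == 2
            and fields[1].isdigit()
            and base64.b64decode(fields[0], validate=True) is not None
        )
    except Exception:
        # Binary protobuf bytes typically fail the split/decode checks above.
        return False
```

If this returns True for your tokenizer.model, the SentencePiece-based converter is simply the wrong tool for that file.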
a8nova changed the title from "Error when converting tokenizer.model to bin" to "Error when converting llama3.2-1B tokenizer.model to tokenizer.bin" on Oct 5, 2024
HSANGLEE commented Oct 8, 2024

Dear @a8nova

As far as I know, starting with Llama 3 you can use the tokenizer.model file directly; converting it to tokenizer.bin is no longer needed.

a8nova commented Oct 9, 2024

Thank you, @HSANGLEE. I am currently blocked by #5840.
