- Download the `onnxruntime` release package from the GitHub releases page (https://github.com/microsoft/onnxruntime/releases).
- Extract the package to `onnx-llm/3rd_party/onnxruntime`.
- Compile the project.
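```bash
# Run from the onnx-llm repository root (example: macOS arm64, onnxruntime v1.19.2)
wget https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-osx-arm64-1.19.2.tgz
tar -xvf onnxruntime-osx-arm64-1.19.2.tgz
# BSD mv on macOS has no -T flag; a plain mv renames the directory
# as long as 3rd_party/onnxruntime does not exist yet
mkdir -p 3rd_party
mv onnxruntime-osx-arm64-1.19.2 3rd_party/onnxruntime
mkdir build && cd build
cmake ..
make -j
```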
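The commands above hard-code the macOS arm64 package. As a minimal sketch for other hosts, the helper below picks an archive name from `uname` output, assuming release assets follow the `onnxruntime-<platform>-<version>.tgz` pattern seen in the URL above. Only `osx-arm64` is confirmed by these steps; the `osx-x86_64`, `linux-x64` and `linux-aarch64` platform strings are assumptions to check against the release page, and `ort_package` is a hypothetical helper, not part of onnx-llm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map host OS/CPU to an onnxruntime release archive name.
# Assumes assets are named onnxruntime-<platform>-<version>.tgz; only osx-arm64
# is confirmed by the build steps, the other platform strings are assumptions.
ort_package() {
  local os="$1" arch="$2" version="$3" platform
  case "$os/$arch" in
    Darwin/arm64)               platform="osx-arm64" ;;
    Darwin/x86_64)              platform="osx-x86_64" ;;
    Linux/x86_64)               platform="linux-x64" ;;
    Linux/aarch64|Linux/arm64)  platform="linux-aarch64" ;;
    *) echo "unsupported host: $os/$arch" >&2; return 1 ;;
  esac
  echo "onnxruntime-${platform}-${version}.tgz"
}

# Usage: print the download URL for the current machine.
ORT_VERSION=1.19.2
if pkg="$(ort_package "$(uname -s)" "$(uname -m)" "$ORT_VERSION")"; then
  echo "https://github.com/microsoft/onnxruntime/releases/download/v${ORT_VERSION}/${pkg}"
fi
```

The extracted directory can then be moved into `3rd_party/onnxruntime` exactly as in the macOS commands.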
- Export the model using llm-export.
- Run onnx-llm the same way as mnn-llm, for example:
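```
(base) ➜ build git:(main) ✗ ./cli_demo qwen2-0.5b-instruct/config.json ../resource/prompt.txt
model path is ../../llm-export/model/config.json
load tokenizer
tokenizer_type = 3
load tokenizer Done
load ../../llm-export/model/llm.onnx ... Load Module Done!
prompt file is ../resource/prompt.txt
Hello! How can I assist you today?
我是来自阿里云的超大规模语言模型,我叫通义千问。
很抱歉,作为AI助手,我无法实时获取和显示当前的天气信息。建议您查看当地的气象预报或应用中的天气查询功能来获取准确的信息。
#################################
prompt tokens num = 36
decode tokens num = 64
prefill time = 0.32 s
decode time = 2.00 s
prefill speed = 112.66 tok/s
decode speed = 32.07 tok/s
##################################
```

In English, the two Chinese replies read: "I am a large-scale language model from Alibaba Cloud, and my name is Tongyi Qianwen." and "Sorry, as an AI assistant, I cannot fetch and display current weather information in real time. I suggest checking your local weather forecast or the weather feature of an app for accurate information."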
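The reported speeds are token counts divided by wall-clock time: 36 / 0.32 s ≈ 112 tok/s for prefill and 64 / 2.00 s = 32 tok/s for decode. The small gap to the printed 112.66 and 32.07 tok/s is most likely because the logged times are rounded to two decimals. A minimal sketch that recomputes both rates from a saved log, assuming the `key = value` summary format shown above; `recompute_speed` is a hypothetical helper, not part of onnx-llm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute throughput (tokens / seconds) from the
# cli_demo summary block. Assumes lines like "prefill time = 0.32 s".
recompute_speed() {
  awk -F' = ' '
    /^prompt tokens num/ { prompt = $2 + 0 }
    /^decode tokens num/ { decode = $2 + 0 }
    /^prefill time/      { split($2, a, " "); prefill_t = a[1] + 0 }
    /^decode time/       { split($2, a, " "); decode_t = a[1] + 0 }
    END {
      if (prefill_t > 0) printf "prefill speed = %.2f tok/s\n", prompt / prefill_t
      if (decode_t > 0)  printf "decode speed = %.2f tok/s\n", decode / decode_t
    }
  '
}

# Usage: feed the summary lines copied from the run above
# (a whole saved log works too; unmatched lines are ignored).
printf '%s\n' \
  "prompt tokens num = 36" \
  "decode tokens num = 64" \
  "prefill time = 0.32 s" \
  "decode time = 2.00 s" | recompute_speed
```

For the run above this prints `prefill speed = 112.50 tok/s` and `decode speed = 32.00 tok/s`, matching the logged values up to that rounding.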