Meta-Llama-3-70B-Instruct #470
Hi, does this project support accelerating Meta-Llama-3-70B-Instruct? I followed your method to accelerate Meta-Llama-3-70B-Instruct, but every reply consists only of backslashes.
Comments
Are you using the int4 model? int4 precision doesn't seem to be enough for this model; you could try int4g (int4 grouped quantization).
Thanks, that solved it. The model works after switching to int4 grouped quantization.
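For anyone hitting the same backslash-only output, a minimal conversion sketch is below. It assumes the `llm.from_hf` / `save` interface from the fastllm README; the dtype string "int4g" for grouped int4 quantization and the Hugging Face model id are assumptions, so check the project documentation for the exact names.

```python
# Minimal sketch (assumptions noted): convert the HF checkpoint to a fastllm
# .flm file using int4 grouped quantization instead of plain int4.
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastllm_pytools import llm

path = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed HF model id
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)

# "int4g" (grouped int4) is assumed to be the dtype name for the quantization
# suggested above; verify it against the fastllm documentation.
flm_model = llm.from_hf(model, tokenizer, dtype="int4g")
flm_model.save("fastllm_int4g_70B.flm")  # file name taken from the command used in this thread
```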
One more question: I run the model with ./main -p fastllm_int4g_70B.flm, and after asking a single question it keeps replying endlessly unless I terminate the program with Ctrl+C. How can I ask questions continuously in an interactive session?
When running this model you probably need to specify --eos_token "<|eot_id|>", because the eos_token defined inside the model is not that token (the official code specifies it the same way).
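For reference, combining the two pieces of information above, the full invocation would look something like ./main -p fastllm_int4g_70B.flm --eos_token "<|eot_id|>" (only the -p and --eos_token options quoted in this thread are used; any other options you pass stay unchanged).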
Thanks, the problem is solved after adding that option. Thank you very much!