Hi, thank you all for the very nice work. I would like to use this project to reproduce the evaluation results reported for Qwen 2.5 and Llama 3.1, but I have run into several problems:
the performance of Llama-3.1-8B-Instruct on gsm8k_cot_zeroshot
the performance of Qwen2.5-14B-Instruct on gsm8k_cot_zeroshot
Both scores are much lower than the originally reported numbers, especially for Llama-3.1-8B-Instruct.
I run the evaluation directly with a command like:
lm_eval --model hf --tasks gsm8k_cot_zeroshot --model_args pretrained=/pathto/Qwen2.5-14B-Instruct,parallelize=True --batch_size 64 --output_path ./results --log_samples
Is there anything wrong with this setup? Should I change the dtype, batch_size, chat template, max_tokens, or other settings?
Besides that, when I use this command to run gsm8k, the program gets stuck after loading the model and never starts the evaluation.
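A common cause of lower-than-reported scores for instruct-tuned models in lm-evaluation-harness is evaluating without the model's chat template. As a sketch (flag names follow recent harness releases; verify them against `lm_eval --help` for your installed version), a variant of the command above that applies the chat template and pins the dtype might look like:

```shell
# Hedged variant of the original command, not a confirmed fix:
# --apply_chat_template wraps prompts in the model's chat format,
# and dtype=bfloat16 pins precision instead of relying on the default.
lm_eval --model hf \
  --tasks gsm8k_cot_zeroshot \
  --model_args pretrained=/pathto/Qwen2.5-14B-Instruct,parallelize=True,dtype=bfloat16 \
  --apply_chat_template \
  --batch_size 64 \
  --output_path ./results \
  --log_samples
```

Whether this closes the gap depends on how the original Qwen 2.5 and Llama 3.1 reports formatted their prompts, which the papers' evaluation settings would need to confirm.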