I'd like to ask how to do what the service at https://llmbench.ai/align/submit does: given a CSV file of LLM inference results, use CritiqueLLM to score and evaluate them.
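Roughly, what I have in mind is something like the sketch below. The prompt template, CSV column names, and model path are all placeholders I made up; the actual prompt and loading details used by the website are exactly what I'm asking about:

```python
# Hypothetical sketch: score a CSV of model outputs with CritiqueLLM locally.
# PROMPT_TEMPLATE, the CSV column names ("question", "answer"), and MODEL_PATH
# are assumptions -- replace them with the official prompt/format once known.
import csv
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/path/to/CritiqueLLM"  # placeholder; loading details depend on the checkpoint

# Assumed judge prompt asking for a final score in a parseable format.
PROMPT_TEMPLATE = (
    "请针对以下问题评价模型回复的质量,并在最后给出 1-10 的综合得分,"
    "格式为 {{'综合得分': x}}。\n\n问题:{question}\n\n模型回复:{answer}"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, trust_remote_code=True, torch_dtype=torch.float16, device_map="auto"
)

with open("results.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    prompt = PROMPT_TEMPLATE.format(question=row["question"], answer=row["answer"])
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, then pull out the final score.
    critique = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    match = re.search(r"['\"]?综合得分['\"]?\s*[::]\s*(\d+(?:\.\d+)?)", critique)
    row["score"] = float(match.group(1)) if match else None
```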
Also, when I run inference and evaluation with CritiqueLLM through OpenCompass, only 616 of the 683 items are parsed successfully, so it seems CritiqueLLM's instruction following for the output format may not be very strong. Could you share which prompt and parsing method the https://llmbench.ai/align/submit site uses?
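A more lenient fallback parser might recover some of the 67 failures. The pattern list below is only a guess at how CritiqueLLM phrases its final rating and would need adjusting after inspecting the actual unparsed responses:

```python
# Hypothetical fallback parser for judge outputs that the default
# OpenCompass post-processor failed on. The patterns are assumptions
# about common phrasings of the final rating, not the official format.
import re
from typing import Optional

PATTERNS = [
    r"['\"]?综合得分['\"]?\s*[::]\s*(\d+(?:\.\d+)?)",  # {'综合得分': 8}
    r"综合得分为?\s*(\d+(?:\.\d+)?)",                   # 综合得分为 8
    r"\[\[?(\d+(?:\.\d+)?)\]?\]",                       # [[8]] or [8]
    r"(\d+(?:\.\d+)?)\s*分\s*$",                         # trailing "8分"
]

def parse_score(critique: str) -> Optional[float]:
    """Try each pattern in turn; return the first score in [1, 10]."""
    for pat in PATTERNS:
        m = re.search(pat, critique.strip())
        if m:
            score = float(m.group(1))
            if 1 <= score <= 10:
                return score
    return None  # genuinely unparseable; count it as a failure
```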
Attaching the OpenCompass config:
```python
from mmengine.config import read_base

with read_base():
    from .datasets.subjective.alignbench.alignbench_judgeby_critiquellm import alignbench_datasets

from opencompass.models import HuggingFaceCausalLM, HuggingFace, HuggingFaceChatGLM3, OpenAI
from opencompass.models.openai_api import OpenAIAllesAPIN
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.partitioners.sub_naive import SubjectiveNaivePartitioner
from opencompass.partitioners.sub_size import SubjectiveSizePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners import SlurmSequentialRunner
from opencompass.tasks import OpenICLInferTask
from opencompass.tasks.subjective_eval import SubjectiveEvalTask
from opencompass.summarizers import AlignmentBenchSummarizer

# ------------- Inference Stage ----------------------------------------
# For subjective evaluation, we often set do_sample for models
from opencompass.models import VLLM

_meta_template = dict(
    round=[
        dict(role="HUMAN", begin='<|im_start|>user\n', end='<|im_end|>\n'),
        dict(role="BOT", begin="<|im_start|>assistant\n", end='<|im_end|>\n', generate=True),
    ],
    eos_token_id=151645,
)

GPU_NUMS = 4
stop_list = ['<|im_end|>', '</s>', '<|endoftext|>']

models = [
    dict(
        type=VLLM,
        abbr='xxx',
        path='xxx',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        stop_words=stop_list,
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

datasets = [*alignbench_datasets]

# ------------- Evaluation Stage ----------------------------------------

## ------------- JudgeLLM Configuration
api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
)

judge_models = [
    dict(
        type=VLLM,
        abbr='CritiqueLLM',
        path='/xxx/models/CritiqueLLM',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

## ------------- Evaluation Configuration
eval = dict(
    partitioner=dict(type=SubjectiveNaivePartitioner, models=models, judge_models=judge_models),
    runner=dict(type=LocalRunner, max_num_workers=16, task=dict(type=SubjectiveEvalTask)),
)

summarizer = dict(type=AlignmentBenchSummarizer)
work_dir = 'outputs/alignment_bench/'
```