add groq tool use models benchmark (#134)
tybalex authored Jul 17, 2024
1 parent 447803c commit 57f9a85
Showing 2 changed files with 22 additions and 0 deletions.
docs/docs/benchmark.mdx: 2 changes (2 additions, 0 deletions)
@@ -32,6 +32,8 @@ Some of the LLMs above require using custom libraries to post-process LLM genera

`functionary-small-v2.5` and `functionary-medium-v3.0` models are tested using [MeetKai's functionary](https://github.com/MeetKai/functionary?tab=readme-ov-file#setup) with the vLLM framework. For each model, we ran the benchmark with functionary's `Grammar Sampling` feature both enabled and disabled, and report the higher of the two scores. The `functionary-small-v2.5` model scored higher than the `functionary-medium-v3.0` model, primarily because the medium model hallucinated more often in some of our more advanced test cases.

`groq/Llama-3-Groq-8B-Tool-Use` and `groq/Llama-3-Groq-70B-Tool-Use` are tested using [Groq's API](https://console.groq.com/docs/tool-use).
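For readers reproducing the Groq runs: Groq exposes an OpenAI-compatible chat-completions endpoint that accepts a `tools` list. The sketch below only builds a request body. The endpoint URL reflects Groq's OpenAI-compatible base URL, while the `get_weather` tool, the `buildToolUseRequest` helper, and the `GROQ_API_KEY` variable name are hypothetical, and this is not necessarily how the benchmark harness called the API.

```javascript
// Assumption: Groq's OpenAI-compatible chat-completions endpoint.
const GROQ_CHAT_URL = 'https://api.groq.com/openai/v1/chat/completions';

// Hypothetical example tool, described with a JSON Schema for its arguments.
// It is not one of the benchmark's test cases.
const getWeatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city.',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
      },
      required: ['city'],
    },
  },
};

// Hypothetical helper: build an OpenAI-style request body that lets the
// model decide whether to call one of the supplied tools.
function buildToolUseRequest(model, userMessage, tools) {
  return {
    model,
    messages: [{ role: 'user', content: userMessage }],
    tools,
    tool_choice: 'auto',
  };
}

// Usage (needs network access and an API key, assumed here to live in the
// GROQ_API_KEY environment variable):
// const res = await fetch(GROQ_CHAT_URL, {
//   method: 'POST',
//   headers: {
//     'Content-Type': 'application/json',
//     Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
//   },
//   body: JSON.stringify(
//     buildToolUseRequest(modelId, 'What is the weather in Paris?', [getWeatherTool]),
//   ),
// });
```

If the model decides to call the tool, an OpenAI-compatible response carries the call in `tool_calls` on the assistant message rather than in plain text.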

:::::

`Nexusflow/NexusRaven-V2-13B` and `gorilla-llm/gorilla-openfunctions-v2` don't accept tool observations (the output of running a tool or function after the LLM calls it), so for these models we appended the observation to the prompt instead.
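One way to picture this workaround is to flatten the conversation into plain text, rendering each tool result as an extra line the model reads before it continues. The prompt template below (the role prefixes and the `Observation:` label) and the `flattenMessages` helper are assumptions for illustration; the template actually used for NexusRaven and Gorilla is not shown in this commit.

```javascript
// Hypothetical helper: render a chat transcript as a single prompt string for
// models that cannot take a separate "tool" message. The role labels and the
// "Observation:" prefix are an assumed template, not the benchmark's own.
function flattenMessages(messages) {
  return messages
    .map(({ role, content }) => {
      if (role === 'tool') {
        // Tool output is appended to the prompt as plain text.
        return `Observation: ${content}`;
      }
      // Capitalize the role name, e.g. "user" -> "User".
      const label = role.charAt(0).toUpperCase() + role.slice(1);
      return `${label}: ${content}`;
    })
    .join('\n');
}

const transcript = [
  { role: 'user', content: 'What is the weather in Paris?' },
  { role: 'assistant', content: 'get_weather(city="Paris")' },
  { role: 'tool', content: '{"tempC": 21, "condition": "sunny"}' },
];

// The flattened text is sent as the next prompt so the model can use the result.
const prompt = flattenMessages(transcript);
console.log(prompt);
```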
docs/src/components/BenchmarkTable.js: 20 changes (20 additions, 0 deletions)
@@ -229,6 +229,26 @@ const data = [
gsm8k: '66.11',
math: '20.54',
mtBench:'7.09',
},
{
model: 'groq/Llama-3-Groq-8B-Tool-Use',
params: 8.03,
functionCalling: '45.70%',
mmlu: '-',
gpqa: '-',
gsm8k: '-',
math: '-',
mtBench:'-',
},
{
model: 'groq/Llama-3-Groq-70B-Tool-Use',
params: 70.6,
functionCalling: '74.29%',
mmlu: '-',
gpqa: '-',
gsm8k: '-',
math: '-',
mtBench:'-',
}
];
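The score fields in `data` are strings, with `'-'` marking benchmarks that were not run for a model (as for both Groq entries). Before sorting or charting, the strings can be converted to numbers, as in the minimal sketch below. `parseScore` and `rankByFunctionCalling` are hypothetical helpers, not part of `BenchmarkTable.js`.

```javascript
// Hypothetical helper: convert a score string such as '74.29%' or '7.09'
// to a number, returning null for '-' (benchmark not run) or bad input.
function parseScore(value) {
  if (value === '-' || value === undefined || value === null) return null;
  const n = Number.parseFloat(String(value).replace('%', ''));
  return Number.isNaN(n) ? null : n;
}

// Hypothetical helper: keep rows that have a function-calling score and
// sort them from highest to lowest.
function rankByFunctionCalling(rows) {
  return rows
    .map((row) => ({ model: row.model, score: parseScore(row.functionCalling) }))
    .filter((r) => r.score !== null)
    .sort((a, b) => b.score - a.score);
}

// Rows copied from the two entries added in this commit (subset of fields).
const rows = [
  { model: 'groq/Llama-3-Groq-8B-Tool-Use', functionCalling: '45.70%', mmlu: '-' },
  { model: 'groq/Llama-3-Groq-70B-Tool-Use', functionCalling: '74.29%', mmlu: '-' },
];

// The 70B model ranks first.
console.log(rankByFunctionCalling(rows));
```

Returning `null` rather than `0` for `'-'` keeps "not measured" distinct from a genuine zero score.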
