
update hammer handler and add Hammer2.0 model #667

Merged: 8 commits merged into ShishirPatil:main, Oct 15, 2024

Conversation

@linqq9 (Contributor) commented Sep 30, 2024

Hello, we have updated the Hammer handler and added the Hammer2.0 series models, including Hammer2.0-7b, Hammer2.0-3b, Hammer2.0-1.5b and Hammer2.0-0.5b. The performance on BFCL-V3 is as follows:

| Model | Overall Acc | Non-live AST | Non-live Exec | Live AST | Multi Turn Acc | Relevance | Irrelevance |
|--------------------------------|-------------|--------------|---------------|----------|----------------|-----------|-------------|
| MadeAgents/Hammer2.0-7b (FC) | 56.60 | 90.15 | 82.64 | 68.68 | 15.75 | 92.68 | 68.20 |
| MadeAgents/Hammer2.0-1.5b (FC) | 51.94 | 84.31 | 81.80 | 63.17 | 11.38 | 92.68 | 61.83 |
| MadeAgents/Hammer2.0-3b (FC) | 49.88 | 86.77 | 80.25 | 66.06 | 0.50 | 92.68 | 68.59 |
| MadeAgents/Hammer2.0-0.5b (FC) | 39.51 | 67.00 | 65.73 | 51.62 | 0.00 | 87.80 | 67.00 |

@ShishirPatil (Owner) commented:

Thank you for the PR, @linqq9! We will review this PR tomorrow PST :)

@HuanzhiMao (Collaborator) left a comment:

Hey @linqq9, thanks for the PR!

Questions regarding the _format_prompt function:

  1. In these lines, why do we remove all those <|im_start|> <|im_end|> tags? The chat template for Hammer2.0 on Hugging Face does include them.
  2. What happened to the system prompts here? Previously they were concatenated into the task instruction section, but now it seems they are all thrown away.
  3. Why do we need to special-case the situation when the length of the prompt is 2 (code here)?
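For reference, the tags in question come from a ChatML-style chat template (the format used by Qwen-family models); a rough illustration of that rendering is sketched below, though it is not necessarily Hammer2.0's exact template:

```python
# Illustrative only: a ChatML-style rendering of a message list;
# not necessarily Hammer2.0's exact chat template.
def render_chatml(messages: list[dict]) -> str:
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Trailing generation prompt so the model continues as the assistant.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"
```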

@linqq9 (Contributor, Author) commented Oct 6, 2024

> Hey @linqq9, thanks for the PR!
>
> Questions regarding the _format_prompt function:
>
>   1. In these lines, why do we remove all those <|im_start|> <|im_end|> tags? The chat template for Hammer2.0 on Hugging Face does include them.
>   2. What happened to the system prompts here? Previously they were concatenated into the task instruction section, but now it seems they are all thrown away.
>   3. Why do we need to special-case the situation when the length of the prompt is 2 (code here)?

Hi @HuanzhiMao, thanks for your questions. Here are my responses:

  1. Regarding the <|im_start|> <|im_end|> tags: in _query_prompting we use the client.chat.completions.create function, which already applies these special tokens to the user-provided content.
  2. As for the system prompts, the system content is later included in the generated prompt content, so there is no need to add it redundantly and it was removed.
  3. We use a prompt length of 2 to determine whether historical information is included. When the length of the prompt is 2 (system and user), there is no history; if it is greater than 2, historical turns are present (see the sketch below).
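For illustration, a minimal sketch of how such a length check can distinguish a fresh turn from one that carries history (hypothetical names; not the actual handler code):

```python
# Hypothetical sketch only -- not the BFCL handler implementation.
def has_history(messages: list[dict]) -> bool:
    # A fresh request holds exactly a system message and a user message;
    # anything longer means prior assistant/tool turns were appended.
    return len(messages) > 2

messages = [
    {"role": "system", "content": "You are a function-calling assistant."},
    {"role": "user", "content": "What is the weather in Berlin?"},
]
print(has_history(messages))  # False -> no historical information yet
```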

@linqq9 requested a review from HuanzhiMao, October 7, 2024 15:14
@linqq9 (Contributor, Author) commented Oct 9, 2024

Hi @HuanzhiMao, do you have any other questions?

@HuanzhiMao (Collaborator) commented:

I will submit a PR to your branch.

@linqq9 (Contributor, Author) commented Oct 10, 2024

> I will submit a PR to your branch.

ok, thanks!

@HuanzhiMao (Collaborator) commented:

Hi @linqq9, I have submitted a PR to your branch: MadeAgents#2
Things I changed:

  1. Let HammerHandler inherit from OSSHandler instead, and copy over any necessary decoding logic from the SalesforceHandler. This simplifies things a lot.
  2. Change to use the completions endpoint instead of chat.completions. This won't affect the score.
  3. Since Hammer doesn't take a user-supplied system message, we turn any system message into a user message (only the message role changes). This affects the live categories, and is also what we do for other models in similar situations.
  4. Since Hammer has its own system prompt, we don't need to add the default BFCL system prompt in _pre_query_processing_prompting.

P.S. After the change, the score is roughly the same as what you reported for MadeAgents/Hammer2.0-7b (FC).
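For illustration, point 3 above amounts to something like the following sketch (assuming OpenAI-style message dicts; the helper name is hypothetical and this is not the exact BFCL code):

```python
# Hypothetical sketch of the system-to-user role conversion; not the
# exact BFCL handler implementation.
def demote_system_messages(messages: list[dict]) -> list[dict]:
    # Hammer does not accept a user-supplied system message, so any
    # system message is re-labeled as a user message; the content is
    # left untouched.
    return [
        {**m, "role": "user"} if m.get("role") == "system" else m
        for m in messages
    ]
```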

@HuanzhiMao (Collaborator) left a comment:

LGTM

@linqq9 (Contributor, Author) commented Oct 11, 2024

> Hi @linqq9, I have submitted a PR to your branch: MadeAgents#2
> Things I changed:
>
>   1. Let HammerHandler inherit from OSSHandler instead, and copy over any necessary decoding logic from the SalesforceHandler. This simplifies things a lot.
>   2. Change to use the completions endpoint instead of chat.completions. This won't affect the score.
>   3. Since Hammer doesn't take a user-supplied system message, we turn any system message into a user message (only the message role changes). This affects the live categories, and is also what we do for other models in similar situations.
>   4. Since Hammer has its own system prompt, we don't need to add the default BFCL system prompt in _pre_query_processing_prompting.
>
> P.S. After the change, the score is roughly the same as what you reported for MadeAgents/Hammer2.0-7b (FC).

@HuanzhiMao Thank you for your modification!

@ShishirPatil ShishirPatil merged commit 79c50ab into ShishirPatil:main Oct 15, 2024
ShishirPatil pushed a commit that referenced this pull request Oct 21, 2024
This PR updates the leaderboard to reflect the changes in score due to
the following PR merges:

1. #660 
2. #661
3. #683
4. #679
5. #708 
6. #709
7. #701
8. #657 
9. #658 
10. #640 
11. #653
12. #642 
13. #696 
14. #667

Close #662.

Note: Some models (like `firefunction`, `functionary`,
`microsoft/phi`) are not included in this leaderboard update because we
don't have all the entries generated. We will add them back once the
full results are generated.
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
Hello, we have updated the Hammer handler and added the Hammer2.0 series
models, including
[Hammer2.0-7b](https://huggingface.co/MadeAgents/Hammer2.0-7b),
[Hammer2.0-3b](https://huggingface.co/MadeAgents/Hammer2.0-3b),
[Hammer2.0-1.5b](https://huggingface.co/MadeAgents/Hammer2.0-1.5b) and
[Hammer2.0-0.5b](https://huggingface.co/MadeAgents/Hammer2.0-0.5b). The
performance on BFCL-V3 is as follows:

| Model | Overall Acc | Non-live AST | Non-live Exec | Live AST | Multi Turn Acc | Relevance | Irrelevance |
|--------------------------------|-------------|--------------|---------------|----------|----------------|-----------|-------------|
| MadeAgents/Hammer2.0-7b (FC) | 56.60 | 90.15 | 82.64 | 68.68 | 15.75 | 92.68 | 68.20 |
| MadeAgents/Hammer2.0-1.5b (FC) | 51.94 | 84.31 | 81.80 | 63.17 | 11.38 | 92.68 | 61.83 |
| MadeAgents/Hammer2.0-3b (FC) | 49.88 | 86.77 | 80.25 | 66.06 | 0.50 | 92.68 | 68.59 |
| MadeAgents/Hammer2.0-0.5b (FC) | 39.51 | 67.00 | 65.73 | 51.62 | 0.00 | 87.80 | 67.00 |

---------

Co-authored-by: linqiqiang1 <[email protected]>
Co-authored-by: Huanzhi (Hans) Mao <[email protected]>