update hammer handler and add Hammer2.0 model #667
Thank you for the PR, @linqq9! We will review this PR tomorrow (PST). :)
Hey @linqq9, thanks for the PR!

I have some questions about the `_format_prompt` function:

- In these lines, why do we remove all the `<|im_start|>` and `<|im_end|>` tags? The chat template for Hammer2.0 on Hugging Face does include them.
- What happened to the system prompts here? Previously they were concatenated into the task instruction section, but now it seems they are all thrown away.
- Why do we need to special-case the situation where the length of the prompt is 2 (code here)?
@HuanzhiMao Hi, thanks for your questions. Here are my responses:

Hi @HuanzhiMao, do you have any other questions?

I will submit a PR to your branch.

OK, thanks!

Hi @linqq9, I have submitted a PR to your branch: MadeAgents#2

P.S. After the change, the score is roughly the same as you reported for
LGTM
@HuanzhiMao Thank you for your modifications!
This PR updates the leaderboard to reflect the score changes caused by the following merged PRs:

1. #660
2. #661
3. #683
4. #679
5. #708
6. #709
7. #701
8. #657
9. #658
10. #640
11. #653
12. #642
13. #696
14. #667

Close #662.

Note: Some models (such as `firefunction`, `functionary`, and `microsoft/phi`) are not included in this leaderboard update because not all of their entries have been generated yet. We will add them back once the full results are available.
Hello, we have updated the Hammer handler and added the Hammer2.0 series models: [Hammer2.0-7b](https://huggingface.co/MadeAgents/Hammer2.0-7b), [Hammer2.0-3b](https://huggingface.co/MadeAgents/Hammer2.0-3b), [Hammer2.0-1.5b](https://huggingface.co/MadeAgents/Hammer2.0-1.5b) and [Hammer2.0-0.5b](https://huggingface.co/MadeAgents/Hammer2.0-0.5b). The performance on BFCL-V3 is as follows:

| Model | Overall Acc | Non-live AST | Non-live Exec | Live AST | Multi Turn Acc | Relevance | Irrelevance |
|--------------------------------|-------------|--------------|---------------|----------|----------------|-----------|-------------|
| MadeAgents/Hammer2.0-7b (FC) | 56.60 | 90.15 | 82.64 | 68.68 | 15.75 | 92.68 | 68.20 |
| MadeAgents/Hammer2.0-1.5b (FC) | 51.94 | 84.31 | 81.80 | 63.17 | 11.38 | 92.68 | 61.83 |
| MadeAgents/Hammer2.0-3b (FC) | 49.88 | 86.77 | 80.25 | 66.06 | 0.50 | 92.68 | 68.59 |
| MadeAgents/Hammer2.0-0.5b (FC) | 39.51 | 67.00 | 65.73 | 51.62 | 0.00 | 87.80 | 67.00 |

---------

Co-authored-by: linqiqiang1 <[email protected]>
Co-authored-by: Huanzhi (Hans) Mao <[email protected]>