[BFCL] Add ToolACE handler for BFCL-v3 #653
Apologies for the long delay. I will definitely take a look after the ICLR deadline.
Thanks! Good luck with your submission~
Thanks for the PR @XuHwang. Given that the `Team-ACE/ToolACE-8B` model uses the same prompt format and the same `decode_ast` logic as `LlamaHandler`, I just let it use the `LlamaHandler` instead; it should not have its own `decode_exec` method. (FYI, using the default `decode_exec` will actually boost your model performance.)
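To illustrate the idea of reusing an existing handler rather than subclassing it, here is a minimal sketch. Everything in it is a hypothetical stand-in: the `LlamaHandler` class body, the `HANDLER_MAP` registry, and the `build_handler` helper are assumptions made for illustration and are not the actual BFCL code.

```python
# Hypothetical stand-in for the real handler class; the actual BFCL
# LlamaHandler is not shown in this thread.
class LlamaHandler:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def decode_ast(self, raw_output: str) -> str:
        # Placeholder decoding step; the real logic lives in the real handler.
        return raw_output.strip()


# Hypothetical registry mapping a model name to a handler class.
# The ToolACE model points at LlamaHandler directly, with no subclass
# and no custom decode method of its own.
HANDLER_MAP = {
    "Team-ACE/ToolACE-8B": LlamaHandler,
}


def build_handler(model_name: str) -> LlamaHandler:
    """Look up the handler class for a model and instantiate it."""
    return HANDLER_MAP[model_name](model_name)


handler = build_handler("Team-ACE/ToolACE-8B")
print(type(handler).__name__, handler.model_name)
```

With this arrangement, any later fix to the shared handler's decoding applies to every model mapped to it, which is the motivation given in the review comment.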
This PR updates the leaderboard to reflect the score changes due to the following PR merges:

1. #660
2. #661
3. #683
4. #679
5. #708
6. #709
7. #701
8. #657
9. #658
10. #640
11. #653
12. #642
13. #696
14. #667

Close #662.

Note: Some models (like `firefunction`, `functionary`, `microsoft/phi`) are not included in this leaderboard update because we don't have all the entries generated. We will add them back once the full results are generated.
This PR adds the handler for the [ToolACE](https://huggingface.co/Team-ACE/ToolACE-8B) model, which fine-tunes the LLaMA-3.1-8B-Instruct model on the [ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) dataset and achieves strong results in function calling.

We have adapted our handler to be compatible with BFCL version 3.

Here are the results of the version evaluated on our machine (4 × V100 32GB). We also found that the results vary across machines.

| **Rank** | **Overall Acc** | **Non-Live AST Acc** | **Non-Live Simple AST** | **Non-Live Multiple AST** | **Non-Live Parallel AST** | **Non-Live Parallel Multiple AST** | **Non-Live Exec Acc** | **Non-Live Simple Exec** | **Non-Live Multiple Exec** | **Non-Live Parallel Exec** | **Non-Live Parallel Multiple Exec** | **Live Acc** | **Live Simple AST** | **Live Multiple AST** | **Live Parallel AST** | **Live Parallel Multiple AST** | **Multi Turn Acc** | **Multi Turn Base** | **Multi Turn Miss Func** | **Multi Turn Miss Param** | **Multi Turn Long Context** | **Multi Turn Composite** | **Relevance Detection** | **Irrelevance Detection** |
|----------|-----------------|----------------------|-------------------------|---------------------------|---------------------------|------------------------------------|-----------------------|--------------------------|----------------------------|----------------------------|-------------------------------------|--------------|---------------------|-----------------------|-----------------------|--------------------------------|--------------------|---------------------|--------------------------|---------------------------|-----------------------------|--------------------------|-------------------------|---------------------------|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |

Thanks for your efforts in maintaining such a wonderful leaderboard. We need your help (@HuanzhiMao, @CharlieJCJ) in adding our model to the leaderboard. Thanks a lot~

---------

Co-authored-by: Huanzhi (Hans) Mao <[email protected]>
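As a quick sanity check on the reported numbers, the sketch below recomputes the two Non-Live aggregate columns as unweighted means of their four subcategory columns. This is only an arithmetic observation about the figures in the results above, not a statement of how BFCL computes its aggregates; the dictionary keys and variable names are made up for illustration.

```python
from statistics import mean

# Subcategory accuracies (in percent) copied from the reported results.
non_live_ast = {
    "simple": 80.58,
    "multiple": 95.00,
    "parallel": 91.00,
    "parallel_multiple": 90.50,
}
non_live_exec = {
    "simple": 98.29,
    "multiple": 94.00,
    "parallel": 88.00,
    "parallel_multiple": 80.00,
}

# Unweighted mean of the four subcategories, rounded to two decimals.
ast_avg = round(mean(non_live_ast.values()), 2)
exec_avg = round(mean(non_live_exec.values()), 2)

# Compare against the reported aggregate columns.
print("Non-Live AST Acc matches reported 89.27:", ast_avg == 89.27)
print("Non-Live Exec Acc matches reported 90.07:", exec_avg == 90.07)
```

The same check does not hold for every aggregate: the unweighted mean of the four Live columns is 73.32%, not the reported 73.21%, so that column is presumably weighted in some other way.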