[BFCL] Add ToolACE handler for BFCL-v3 #653

Merged: 7 commits, Oct 5, 2024

@XuHwang XuHwang commented Sep 23, 2024

This PR adds a handler for the ToolACE model, which fine-tunes LLaMA-3.1-8B-Instruct on the ToolACE dataset and achieves strong scores in function calling.

We have adapted our handler to be compatible with BFCL version 3.

Here are the results as evaluated on our machine (4 × V100-32GB). We also found that the results vary across machines.

| Rank | Overall Acc | Non-Live AST Acc | Non-Live Simple AST | Non-Live Multiple AST | Non-Live Parallel AST | Non-Live Parallel Multiple AST | Non-Live Exec Acc | Non-Live Simple Exec | Non-Live Multiple Exec | Non-Live Parallel Exec | Non-Live Parallel Multiple Exec | Live Acc | Live Simple AST | Live Multiple AST | Live Parallel AST | Live Parallel Multiple AST | Multi Turn Acc | Multi Turn Base | Multi Turn Miss Func | Multi Turn Miss Param | Multi Turn Long Context | Multi Turn Composite | Relevance Detection | Irrelevance Detection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |
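
As a quick sanity check on the numbers above, three of the aggregate columns (Non-Live AST Acc, Non-Live Exec Acc, Multi Turn Acc) agree with the unweighted mean of their four reported subcategories to within rounding. This is an observation from the reported values only, not a statement of how the leaderboard computes its aggregates; Live Acc does not match an unweighted mean (about 73.32 vs. the reported 73.21), so it is left out of the dictionary. The helper name `matches` and the 0.01 tolerance are choices made for this sketch.

```python
from statistics import mean

# Reported aggregate and its four subcategory scores, copied from the table.
groups = {
    "Non-Live AST Acc": (89.27, [80.58, 95.00, 91.00, 90.50]),
    "Non-Live Exec Acc": (90.07, [98.29, 94.00, 88.00, 80.00]),
    "Multi Turn Acc": (14.37, [21.50, 6.50, 17.50, 12.00]),
}


def matches(reported: float, parts: list, tol: float = 0.01) -> bool:
    """Return True if the unweighted mean of `parts` is within `tol` of `reported`."""
    return abs(mean(parts) - reported) <= tol


results = {name: matches(reported, parts) for name, (reported, parts) in groups.items()}

for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'differs'}")
```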

Thanks for your efforts in maintaining such a wonderful leaderboard. We would appreciate your help (@HuanzhiMao, @CharlieJCJ) in adding our model to the leaderboard. Thanks a lot!

@XuHwang XuHwang mentioned this pull request Sep 23, 2024
@HuanzhiMao (Collaborator) commented

Apologies for the long delay. I will definitely take a look after the ICLR deadline.

XuHwang commented Sep 29, 2024

> Apologies for the long delay. I will definitely take a look after the ICLR deadline.

Thanks! Good luck with your submission~

@HuanzhiMao HuanzhiMao left a comment

Thanks for the PR, @XuHwang. Given that the Team-ACE/ToolACE-8B model uses the same prompt format and the same `decode_ast` logic as `LlamaHandler`, I have switched it to use `LlamaHandler` directly; it should not have its own `decode_exec` method.
(FYI, using the default `decode_exec` will actually boost your model's performance.)
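
To illustrate the idea in this review comment, here is a minimal sketch of reusing an existing handler through a name-to-handler mapping instead of adding a new subclass. Everything except the model name `Team-ACE/ToolACE-8B` and the method names `decode_ast` and `decode_exec` is hypothetical: `BaseHandler`, `HANDLER_MAP`, `build_handler`, and the placeholder parsing are assumptions for illustration and do not reflect the actual BFCL code.

```python
class BaseHandler:
    """Hypothetical base class; the real BFCL base handler may differ."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def decode_exec(self, result: str) -> list:
        # Assumed default behaviour for this sketch: one call string per non-empty line.
        return [line.strip() for line in result.splitlines() if line.strip()]


class LlamaHandler(BaseHandler):
    """Stand-in for the existing Llama handler (not the real implementation)."""

    def decode_ast(self, result: str) -> str:
        # Placeholder parsing, assumed for illustration only.
        return result.strip()


# Hypothetical registry: ToolACE is mapped straight to LlamaHandler,
# so it inherits the default decode_exec rather than overriding it.
HANDLER_MAP = {
    "Team-ACE/ToolACE-8B": LlamaHandler,
}


def build_handler(model_name: str) -> BaseHandler:
    """Look up the handler class for a model name and instantiate it."""
    return HANDLER_MAP[model_name](model_name)


handler = build_handler("Team-ACE/ToolACE-8B")
print(type(handler).__name__)
```

With this arrangement, supporting a model that shares a prompt format with an existing one only requires a new mapping entry, and any later fix to the shared handler applies to both models.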

@ShishirPatil ShishirPatil merged commit 7c0efb1 into ShishirPatil:main Oct 5, 2024
@HuanzhiMao HuanzhiMao added the BFCL-New Model Add New Model to BFCL label Oct 5, 2024
ShishirPatil pushed a commit that referenced this pull request Oct 21, 2024
This PR updates the leaderboard to reflect the changes in scores due to
the following PR merges:

1. #660 
2. #661
3. #683
4. #679
5. #708 
6. #709
7. #701
8. #657 
9. #658 
10. #640 
11. #653
12. #642 
13. #696 
14. #667

Close #662.

Note: Some models (like `firefunction`, `functionary`, and
`microsoft/phi`) are not included in this leaderboard update because not
all of their entries have been generated yet. We will add them back once
the full results are available.
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
This PR adds a handler for the
[ToolACE](https://huggingface.co/Team-ACE/ToolACE-8B) model, which
fine-tunes LLaMA-3.1-8B-Instruct on the
[ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) dataset and
achieves strong scores in function calling.

We have adapted our handler to be compatible with BFCL version 3.

Here are the results as evaluated on our machine (4 × V100-32GB). We
also found that the results vary across machines.

| **Rank** | **Overall Acc** | **Non-Live AST Acc** | **Non-Live Simple AST** | **Non-Live Multiple AST** | **Non-Live Parallel AST** | **Non-Live Parallel Multiple AST** | **Non-Live Exec Acc** | **Non-Live Simple Exec** | **Non-Live Multiple Exec** | **Non-Live Parallel Exec** | **Non-Live Parallel Multiple Exec** | **Live Acc** | **Live Simple AST** | **Live Multiple AST** | **Live Parallel AST** | **Live Parallel Multiple AST** | **Multi Turn Acc** | **Multi Turn Base** | **Multi Turn Miss Func** | **Multi Turn Miss Param** | **Multi Turn Long Context** | **Multi Turn Composite** | **Relevance Detection** | **Irrelevance Detection** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |


Thanks for your efforts in maintaining such a wonderful leaderboard. We
would appreciate your help (@HuanzhiMao, @CharlieJCJ) in adding our
model to the leaderboard. Thanks a lot!

---------

Co-authored-by: Huanzhi (Hans) Mao <[email protected]>