[BFCL] Add ToolACE handler for BFCL-v3 #653

XuHwang · 2024-09-23T17:30:50Z

This PR adds a handler for the [ToolACE](https://huggingface.co/Team-ACE/ToolACE-8B) model, which fine-tunes LLaMA-3.1-8B-Instruct on the [ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) dataset and achieves strong results in function calling.

We have adapted the handler to be compatible with BFCL v3.

Here are the results from an evaluation on our machine (4 × V100-32GB). We also found that the results vary across machines.

| Rank | Overall Acc | Non-Live AST Acc | Non-Live Simple AST | Non-Live Multiple AST | Non-Live Parallel AST | Non-Live Parallel Multiple AST | Non-Live Exec Acc | Non-Live Simple Exec | Non-Live Multiple Exec | Non-Live Parallel Exec | Non-Live Parallel Multiple Exec | Live Acc | Live Simple AST | Live Multiple AST | Live Parallel AST | Live Parallel Multiple AST | Multi Turn Acc | Multi Turn Base | Multi Turn Miss Func | Multi Turn Miss Param | Multi Turn Long Context | Multi Turn Composite | Relevance Detection | Irrelevance Detection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |
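
As a quick sanity check on the table, the two Non-Live aggregate columns agree with the unweighted mean of their four subcategory scores. The sketch below only reproduces that arithmetic from the reported numbers; it is an observation about this table, not a description of how the leaderboard computes its aggregates, and the variable names are illustrative.

```python
from statistics import mean

# Subcategory scores copied from the results table above (percent).
non_live_ast = {"Simple": 80.58, "Multiple": 95.00, "Parallel": 91.00, "Parallel Multiple": 90.50}
non_live_exec = {"Simple": 98.29, "Multiple": 94.00, "Parallel": 88.00, "Parallel Multiple": 80.00}

# Unweighted mean of the four subcategories, rounded to two decimals.
ast_mean = round(mean(non_live_ast.values()), 2)
exec_mean = round(mean(non_live_exec.values()), 2)

print(f"Non-Live AST mean:  {ast_mean:.2f}%")   # reported: 89.27%
print(f"Non-Live Exec mean: {exec_mean:.2f}%")  # reported: 90.07%
```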

Thank you for maintaining such a wonderful leaderboard. We would appreciate your help (@HuanzhiMao, @CharlieJCJ) in adding our model to the leaderboard. Thanks a lot～

HuanzhiMao · 2024-09-29T09:42:54Z

Apologies for the long delay. I will definitely take a look after the ICLR deadline.

XuHwang · 2024-09-29T09:46:31Z

> Apologies for the long delay. I will definitely take a look after the ICLR deadline.

Thanks! Good luck with your submission~

HuanzhiMao

Thanks for the PR, @XuHwang. Given that the Team-ACE/ToolACE-8B model uses the same prompt format and the same `decode_ast` logic as `LlamaHandler`, I just let it use `LlamaHandler` instead; it should not have its own `decode_exec` method.
(FYI, using the default `decode_exec` will actually boost your model's performance.)
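
The idea of reusing an existing handler instead of adding a new one can be sketched as follows. This is a minimal illustration, not BFCL's actual code: `LlamaLikeHandler`, `HANDLER_FOR_MODEL`, and the assumed output format (a Python-style list of calls such as `[f(a=1)]`) are all hypothetical stand-ins.

```python
import ast


class LlamaLikeHandler:
    """Hypothetical stand-in for a shared handler; not the real LlamaHandler."""

    def decode_ast(self, output: str) -> list[dict]:
        # Assumption: the model emits a Python-style list of calls, e.g. "[f(a=1)]".
        tree = ast.parse(output.strip(), mode="eval")
        calls = tree.body.elts if isinstance(tree.body, ast.List) else [tree.body]
        decoded = []
        for call in calls:
            if not isinstance(call, ast.Call):
                raise ValueError("expected a function call")
            name = ast.unparse(call.func)  # requires Python 3.9+
            # Only keyword arguments with literal values are handled in this sketch.
            params = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
            decoded.append({name: params})
        return decoded

    def decode_exec(self, output: str) -> list[str]:
        # Rebuild executable call strings from the structured form.
        exec_calls = []
        for call in self.decode_ast(output):
            for name, params in call.items():
                args = ", ".join(f"{key}={value!r}" for key, value in params.items())
                exec_calls.append(f"{name}({args})")
        return exec_calls


# Hypothetical registry: the new model name points at the existing handler class,
# so no model-specific decode_ast or decode_exec is needed.
HANDLER_FOR_MODEL = {"Team-ACE/ToolACE-8B": LlamaLikeHandler}

handler = HANDLER_FOR_MODEL["Team-ACE/ToolACE-8B"]()
sample = "[get_weather(city='Paris', days=3)]"
structured = handler.decode_ast(sample)
executable = handler.decode_exec(sample)
```

With one shared class behind the mapping, any later fix to the default decoding applies to every model registered against it.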

This PR updates the leaderboard to reflect the score changes from merging the following PRs:

1. #660
2. #661
3. #683
4. #679
5. #708
6. #709
7. #701
8. #657
9. #658
10. #640
11. #653
12. #642
13. #696
14. #667

Closes #662.

Note: Some models (such as `firefunction`, `functionary`, and `microsoft/phi`) are not included in this leaderboard update because not all of their entries have been generated. We will add them back once the full results are available.

Add ToolACE handler

bf1b973

XuHwang mentioned this pull request Sep 23, 2024

Add ToolACE handler #619

Closed

XuHwang added 2 commits September 26, 2024 11:45

Merge branch 'main' into toolace

fcbada4

Merge branch 'main' into toolace

1e777b6

HuanzhiMao added 2 commits October 4, 2024 18:17

let toolace use llama handler; update change log

d625ca6

Merge branch 'main' into toolace

ac30ec8

HuanzhiMao approved these changes Oct 5, 2024

HuanzhiMao mentioned this pull request Oct 5, 2024

[BFCL] Leaderboard Update, 10/21/2024 #672

Merged

HuanzhiMao added 2 commits October 4, 2024 19:25

update supported model list

b219044

Merge branch 'main' into toolace

9f45de0

ShishirPatil merged commit 7c0efb1 into ShishirPatil:main Oct 5, 2024

HuanzhiMao added the "BFCL-New Model" label (Add New Model to BFCL) Oct 5, 2024
