GoEx and Berkeley Function Calling Leaderboard Updates
😍 v0.3 release 🚀
Highlights
⚡️ Released GoEx: a runtime that provides abstractions for safe execution of LLM-generated code, APIs, actions, etc.
🏆 Updates to the Berkeley Function Calling Leaderboard (aka Berkeley Tool Calling Leaderboard): newer models including GPT-4o, gemini-flash and 1.5-pro, Hermes-2-Pro, etc. All models are now measured on P95 and P99 latency and on cost, in addition to accuracy.
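For readers unfamiliar with tail-latency metrics, here is a minimal sketch of how P95 and P99 can be computed from a list of per-request latencies. It uses the nearest-rank definition, which is an assumption: these notes do not say which percentile method the leaderboard uses. The `percentile` helper and the sample latencies are hypothetical and for illustration only.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (assumed method, not necessarily the leaderboard's).

    Returns the smallest sample such that at least pct% of the samples
    are less than or equal to it.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the sample that covers pct% of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical per-request latencies in seconds.
latencies = [0.8, 1.1, 0.9, 1.3, 4.2, 1.0, 0.7, 1.2, 9.5, 1.1]

p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
print(f"P95 = {p95}s, P99 = {p99}s")
```

With only ten samples both percentiles land on the slowest request, which shows why tail metrics need a reasonably large number of measurements to be informative.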
What's Changed
- Fix Typos in Evaluation Script and System Prompt. Identify Errors in a Dataset by @zuxin666 in #335
- BFCL April 8th Release by @HuanzhiMao in #330
- Initial goex commit by @ShishirPatil in #336
- BFCL April 9th Release (Dataset Bug Fix) by @HuanzhiMao in #338
- BFCL April 10th Release (API Sanity Check) by @HuanzhiMao in #339
- Add Support for NousResearch/Hermes-2-Pro-Mistral-7B Function Calling by @Fanjia-Yan in #327
- Update raft.py with default `p` to match paper by @ShishirPatil in #353
- GoEx Import Issues by @royh02 in #354
- BFCL April 11th Patch. Add Latency Statistics. by @HuanzhiMao in #347
- GoEx Gitignore User Credentials by @royh02 in #344
- Fix Circular Import Issue for BFCL evaluation pipeline by @HuanzhiMao in #356
- Added Docker to README by @Noppapon in #355
- [Bug fix] Add Hermes-2-Pro-Mistral-7B model to UNDERSCORE_TO_DOT to parse API properly by @JasonZhu1313 in #364
- Update requirements.txt by @viniciuslazzari in #343
- Fix script argument by @ricklamers in #367
- BFCL April 16th Release by @HuanzhiMao in #366
- Log error messages from API validation by @eitanturok in #369
- Update .gitignore by @eitanturok in #370
- BFCL April 18th Release (Pipeline only) by @HuanzhiMao in #375
- Add missing argument to `OSSHandler`'s `_format_prompt` function by @eitanturok in #373
- Add FC + Prompt for Cohere command-r-plus by @harry-cohere in #350
- BFCL April 19th Release (Dataset & Pipeline) by @HuanzhiMao in #377
- Azure OpenAI support in raft.py by @cedricvidal in #381
- BFCL April 25th Release (New Models) by @HuanzhiMao in #386
- Colored logging configuration + displaying progress in logs by @cedricvidal in #384
- BFCL April 27th Release (Bug Fix in Cost/Latency Calculation) by @HuanzhiMao in #390
- BFCL April 28th Release (New Model: snowflake/arctic) by @Fanjia-Yan in #397
- RAFT Recovery Mode for interruptions by @kaiwen129 in #410
- Small corrections to possible_answers for simple test category by @aastroza in #405
- BFCL May 6th Release (Dataset Bug Fix) by @HuanzhiMao in #412
- RAFT DevContainer for GitHub Codespaces by @cedricvidal in #379
- RAFT Add support for configuring separate completion and embedding endpoints + pytest by @cedricvidal in #396
- RAFT Fix arbitrary code execution vulnerability in checkpoint feature by @cedricvidal in #415
- handle parallel function calls from gemini by @vandyxiaowei in #406
- RAFT Support for chat and completion model formats by @cedricvidal in #417
- [RAFT] Edit encode prompt to include `<ANSWER>:` tag in label by @kaiwen129 in #422
- [BFCL] Patch Gemini Handler by @HuanzhiMao in #421
- BFCL May 14th Release (GPT-4o and Gemini) by @Fanjia-Yan in #426
- [BFCL] update tree_sitter version in requirements.txt by @justinwangx in #433
- Fix indentation in leaderboard README by @polm-stability in #449
- Fix breaking changes due to updated Anthropic SDK by @eitanturok in #452
New Contributors
- @zuxin666 made their first contribution in #335
- @JasonZhu1313 made their first contribution in #364
- @ricklamers made their first contribution in #367
- @eitanturok made their first contribution in #369
- @harry-cohere made their first contribution in #350
- @cedricvidal made their first contribution in #381
- @aastroza made their first contribution in #405
- @vandyxiaowei made their first contribution in #406
- @justinwangx made their first contribution in #433
- @polm-stability made their first contribution in #449
Full Changelog: v0.2...v0.3