Releases: tatsu-lab/alpaca_eval
Release v0.6.5
What's Changed
- Add Llama-3-Instruct-8B-WPO-HB-v2 to AlpacaEval by @wzhouad in #377
- [ENH] add llama 3.1 by @YannDubs in #378
- [ENH] add example for LLama 3 vllm by @YannDubs in #381
- Add Infinity-Instruct-7M-0729-Llama3_1-70B, Infinity-Instruct-7M-0729-Llama3_1-8B, Infinity-Instruct-7M-0729-mistral-7B to AlpacaEval by @cszhengyh in #383
- Add gemma-2-9b-it-WPO-HB to AlpacaEval by @wzhouad in #384
- Add link to gemma-2-9b-it-WPO-HB by @wzhouad in #385
- Change the name of the Infinity-Instruct-7M-0729-Models to Infinity-Instruct-7M-Gen-Models by @cszhengyh in #387
- Add blendaxai-gm-l3-v35 to AlpacaEval by @ym-blendax-ai in #389
- [ENH] OpenAI use tools instead of functions by @YannDubs in #391
- [ENH] enable base_dir to be a list by @YannDubs in #392
- [ENH] add mistral v0.3, Qwen2 70b, gpt4 mini by @YannDubs in #393
New Contributors
- @wzhouad made their first contribution in #377
- @ym-blendax-ai made their first contribution in #389
Full Changelog: v0.6.4...v0.6.5
Release v0.6.4
What's Changed
- Add SPPO-Llama-3-Instruct-8B-PairRM to AlpacaEval by @Edward-Sun in #354
- Add Infinity-Instruct-3M-0613-Llama3-70B to AlpacaEval by @cszhengyh in #358
- Add SPPO-Gemma-2-9B-It-PairRM to AlpacaEval by @angelahzyuan in #359
- Add Infinity-Instruct-3M-0625-Models to AlpacaEval by @cszhengyh in #364
- Add Higgs Llama3-70B V2 Results by @sxjscience in #367
- Added Ghost 8B Beta (d0x5) model by @lh0x00 in #366
- Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval by @xiamengzhou in #368
- [ENH] add CI test for unwanted files by @YannDubs in #369
- update model links by @xiamengzhou in #370
- [ENH] add the code to compute instruction_following by @YannDubs in #371
- [ENH] adding simplified glm by @YannDubs in #372
- [BUG] backward compatibility vllm do_sample -> use_beam_search by @YannDubs in #373
New Contributors
- @angelahzyuan made their first contribution in #359
- @sxjscience made their first contribution in #367
Full Changelog: v0.6.3...v0.6.4
Release v0.6.3
What's Changed
- Add the evaluation result for our latest model by @hendrydong in #286
- Add Ghost 7B Alpha to AlpacaEval by @lh0x00 in #288
- Add link for FsfairX-Zephyr-Chat-v0.1 by @hendrydong in #289
- add Qwen1.5-110B-Chat self-report results by @Lukeming-tsinghua in #291
- [ENH] verifying all the qwens by @YannDubs in #292
- Enable analyzing evaluators/annotators on data without multiple generator models by @rdnfn in #293
- Add Storm-7B to AlpacaEval by @yifan123 in #294
- Use verified by default by @YannDubs in #297
- Add SPPO-Mistral7B-PairRM to AlpacaEval by @Edward-Sun in #298
- Add ExPO results to AlpacaEval by @chujiezheng in #299
- Fix typo in README.md by @tongyx361 in #302
- Add Yi-Large Preview to AlpacaEval by @HyperdriveHustle in #304
- Add Mistral-7B+RAHF-DUAL+LoRA to AlpacaEval by @LiuAmber in #307
- [verified] Yi-large by @YannDubs in #309
- [ADD] GPT4-o by @YannDubs in #311
- [ENH] add LC SEM by @YannDubs in #317
- llama3 evaluator by @zhuang-li in #314
- Update README.md by @zhuang-li in #315
- [CLEAN] move evaluators lb llama3 by @YannDubs in #318
- [ENH] vicuna 1.5 by @YannDubs in #319
- Add Llama-3-Instruct-8B-SimPO to AlpacaEval by @xiamengzhou in #320
- [ENH] Use multi threading instead of processing by @YannDubs in #321
- Add Aligner 2B+GPT-4 Turbo (04/09) Results by @AlignInc in #324
- Add REBEL-Llama-3-8B-Instruct to AlpacaEval by @ZhaolinGao in #326
- [ENH&BUG] improve VLLM by @YannDubs in #330
- Add ExPO + Llama-3-Instruct-8B-SimPO results by @chujiezheng in #331
- fix model link by @chujiezheng in #332
- Add merlinite-7B-AOT to AlpacaEval by @imelnyk in #334
- [BUG] fix bs in VLLM and add chatml by @YannDubs in #338
- Add Together-MoA, Together-MoA-Lite to AlpacaEval by @IsThatYou in #342
- Add Nanbeige2-16B-Chat to AlpacaEval by @yuani114 in #345
- Add claude-3-5-sonnet-20240620 to AlpacaEval by @MarjovanLier in #348
- [BUG] trust repo alpaca_eval by @YannDubs in #349
- Add OpenPipe Mixture of Agents model to Alpaca Eval by @saum7800 in #347
- Add Storm-7B, Storm-7B (best-of-64) to AlpacaEval by @yifan123 in #344
- Add Infinity-Instruct-3M-0613-Mistral-7B to AlpacaEval by @cszhengyh in #351
New Contributors
- @hendrydong made their first contribution in #286
- @lh0x00 made their first contribution in #288
- @yifan123 made their first contribution in #294
- @Edward-Sun made their first contribution in #298
- @chujiezheng made their first contribution in #299
- @tongyx361 made their first contribution in #302
- @LiuAmber made their first contribution in #307
- @zhuang-li made their first contribution in #314
- @xiamengzhou made their first contribution in #320
- @ZhaolinGao made their first contribution in #326
- @imelnyk made their first contribution in #334
- @IsThatYou made their first contribution in #342
- @MarjovanLier made their first contribution in #348
- @saum7800 made their first contribution in #347
- @cszhengyh made their first contribution in #351
Full Changelog: v0.6.2...v0.6.3
Release v0.6.2
What's Changed
- [BUG] backward compatibility with AF by @YannDubs in #278
- Add Nanbeige-Plus-Chat-v0.1 to AlpacaEval by @yuani114 in #279
- Update README.md by @Dominic789654 in #280
- [BUG] revert to GPT4 preview 1106 by @YannDubs in #283
- Add support for analyzing evaluators with custom cross-annotations by @rdnfn in #281
- [ENH] llama3 by @YannDubs in #285
New Contributors
- @Dominic789654 made their first contribution in #280
- @rdnfn made their first contribution in #281
Full Changelog: v0.6.1...v0.6.2
Release v0.6.1
What's Changed
- Add Aligner-2B+Qwen1.5-72B-Chat & Aligner-2B+Claude3 Opus to AlpacaEval by @AlignInc in #259
- Supplement for Aligner by @AlignInc in #261
- Add Ein-70B-v0.1 to AlpacaEval by @bin-bi in #262
- Add TempNet-LLaMA2-Chat to AlpacaEval by @xumao-nju in #264
- Add Conifer-7B-DPO to AlpacaEval by @liulixin29 in #267
- Updating link to a super fast demo! by @kyleliang919 in #268
- Add Nanbeige2-8B-Chat to AlpacaEval by @yuani114 in #274
- [ENH] adding dbrx and gpt4 turbo by @YannDubs in #275
New Contributors
- @AlignInc made their first contribution in #259
- @bin-bi made their first contribution in #262
- @xumao-nju made their first contribution in #264
- @liulixin29 made their first contribution in #267
- @yuani114 made their first contribution in #274
Full Changelog: v0.6...v0.6.1
Release v0.6
What's Changed
- [DATA] Add Gemma by @YannDubs in #242
- [NOTEBOOK] adding final length correction notebook. by @YannDubs in #244
- add Mistral-7B-ReMax-v0.1 by @liziniu in #245
- [ENH] add claude 3 by @YannDubs in #247
- [ENH] add contextual by @YannDubs in #250
- [ENH] add mistral large by @YannDubs in #251
- Add Samba-CoE-v0.2 to AlpacaEval by @kyleliang919 in #253
- Add Samba-CoE-v0.2-best-of-16 to AlpacaEval by @kyleliang919 in #256
- Add Mistral-ORPO-Beta to AlpacaEval by @jiwooya1000 in #257
- Yann/length correction by @YannDubs in #258
New Contributors
- @liziniu made their first contribution in #245
- @kyleliang919 made their first contribution in #253
- @jiwooya1000 made their first contribution in #257
Full Changelog: v0.5.4...v0.6
Release v0.5.4
What's Changed
- Add Qwen1.5-72B-Chat to AlpacaEval by @Lukeming-tsinghua in #226
- Add claude-instant-1.2, deepseek-llm-67b-chat, wizardlm-70b, Qwen-14B-Chat (config + outputs without annotations) by @gblazex in #228
- [DATA] Adding annotations for the arena models by @YannDubs in #229
- Update README.md - Add missing "Y" to "ou" by @yoderj in #230
- [DEV] Analyzing length-controlled metrics. by @YannDubs in #231
- [DOC] add annotation interpretation by @YannDubs in #232
- [DATA] add results from the Arena openai models by @YannDubs in #234
- update ELO for llama-2-13b-chat-hf by @gblazex in #235
- [NOTEBOOK] add length-corrected GLM by @YannDubs in #237
- [ENH] add inverse mapper to make sure in and out types are the same by @YannDubs in #240
- [ENH] update to allow AF to use AE by @YannDubs in #241
New Contributors
- @Lukeming-tsinghua made their first contribution in #226
- @yoderj made their first contribution in #230
Full Changelog: v0.5.3...v0.5.4
Release v0.5.3
What's Changed
- [ENH] add mistral-medium by @YannDubs in #205
- [ENH] add internlm2-chat-20b-ppo by @C1rN09 in #207
- prettify "pretty_name" of internlm2 by @C1rN09 in #208
- [ENH] add outputs & configs from dolphin 2.2.1 by @YannDubs in #209
- Add PairRM 0.4B + Yi-34B-Chat to AlpacaEval 2.0 by @jdf-prog in #210
- dolphin 2.1.1 configs.yaml by @gblazex in #212
- Update README.md (small typo) by @xwinxu in #213
- [TEST]: fix ordering of df by @YannDubs in #214
- Add Snorkel-Mistral-PairRM-DPO (best-of-16) to Alpaca Eval 2.0 by @viethoangtranduong in #215
- update InternLM2 chat template by @C1rN09 in #216
- Add Starling-LM-7B-alpha, vicuna-13b-v1.5, vicuna-7b-v1.5 to AlpacaEval (config + outputs without annotations) by @gblazex in #217
- [RES] add 3 models for arena correlations by @YannDubs in #218
- Add xwinlm-70b-v0.3 to AlpacaEval by @nbl97 in #221
- [ENH] add referenced_models locally by @YannDubs in #224
New Contributors
- @C1rN09 made their first contribution in #207
- @gblazex made their first contribution in #212
- @xwinxu made their first contribution in #213
- @viethoangtranduong made their first contribution in #215
Full Changelog: v0.5.2...v0.5.3