Releases · tatsu-lab/alpaca_eval

17 Aug 23:39

github-actions

v0.6.5

2990c4d

Release v0.6.5 Latest

What's Changed

Add Llama-3-Instruct-8B-WPO-HB-v2 to AlpacaEval by @wzhouad in #377
[ENH] add llama 3.1 by @YannDubs in #378
[ENH] add example for LLama 3 vllm by @YannDubs in #381
Add Infinity-Instruct-7M-0729-Llama3_1-70B, Infinity-Instruct-7M-0729-Llama3_1-8B, Infinity-Instruct-7M-0729-mistral-7B to AlpacaEval by @cszhengyh in #383
Add gemma-2-9b-it-WPO-HB to AlpacaEval by @wzhouad in #384
Add link to gemma-2-9b-it-WPO-HB by @wzhouad in #385
Change the name of the Infinity-Instruct-7M-0729-Models to Infinity-Instruct-7M-Gen-Models by @cszhengyh in #387
Add blendaxai-gm-l3-v35 to AlpacaEval by @ym-blendax-ai in #389
[ENH] OpenAI use tools instead of functions by @YannDubs in #391
[ENH] enable base_dir to be a list by @YannDubs in #392
[ENH] add mistral v0.3, Qwen2 70b, gpt4 mini by @YannDubs in #393

New Contributors

@wzhouad made their first contribution in #377
@ym-blendax-ai made their first contribution in #389

Full Changelog: v0.6.4...v0.6.5

Contributors

wzhouad, YannDubs, and 2 other contributors

18 Jul 18:01

github-actions

v0.6.4

d7747c3

Release v0.6.4

What's Changed

Add SPPO-Llama-3-Instruct-8B-PairRM to AlpacaEval by @Edward-Sun in #354
Add Infinity-Instruct-3M-0613-Llama3-70B to AlpacaEval by @cszhengyh in #358
Add SPPO-Gemma-2-9B-It-PairRM to AlpacaEval by @angelahzyuan in #359
Add Infinity-Instruct-3M-0625-Models to AlpacaEval by @cszhengyh in #364
Add Higgs Llama3-70B V2 Results by @sxjscience in #367
Added Ghost 8B Beta (d0x5) model by @lh0x00 in #366
Add gemma-2-9b-it-SimPO and gemma-2-9b-it-DPO to AlpacaEval by @xiamengzhou in #368
[ENH] add CI test for unwanted files by @YannDubs in #369
update model links by @xiamengzhou in #370
[ENH] add the code to compute instruction_following by @YannDubs in #371
[ENH] adding simplified glm by @YannDubs in #372
[BUG] backward compatibility vllm do_sample -> use_beam_search by @YannDubs in #373

New Contributors

@angelahzyuan made their first contribution in #359
@sxjscience made their first contribution in #367

Full Changelog: v0.6.3...v0.6.4

Contributors

sxjscience, lh0x00, and 5 other contributors

24 Jun 00:58

github-actions

v0.6.3

16199f3

Release v0.6.3

What's Changed

Add the evaluation result for our latest model by @hendrydong in #286
Add Ghost 7B Alpha to AlpacaEval by @lh0x00 in #288
Add link for FsfairX-Zephyr-Chat-v0.1 by @hendrydong in #289
add Qwen1.5-110B-Chat self-report results by @Lukeming-tsinghua in #291
[ENH] verifying all the qwens by @YannDubs in #292
Enable analyzing evaluators/annotators on data without multiple generator models by @rdnfn in #293
Add Storm-7B to AlpacaEval by @yifan123 in #294
Use verified by default by @YannDubs in #297
Add SPPO-Mistral7B-PairRM to AlpacaEval by @Edward-Sun in #298
Add ExPO results to AlpacaEval by @chujiezheng in #299
Fix typo in README.md by @tongyx361 in #302
Add Yi-Large Preview to AlpacaEval by @HyperdriveHustle in #304
Add Mistral-7B+RAHF-DUAL+LoRA to AlpacaEval by @LiuAmber in #307
[verified] Yi-large by @YannDubs in #309
[ADD] GPT4-o by @YannDubs in #311
[ENH] add LC SEM by @YannDubs in #317
llama3 evaluator by @zhuang-li in #314
Update README.md by @zhuang-li in #315
[CLEAN] move evaluators lb llama3 by @YannDubs in #318
[ENH] vicuna 1.5 by @YannDubs in #319
Add Llama-3-Instruct-8B-SimPO to AlpacaEval by @xiamengzhou in #320
[ENH] Use multi threading instead of processing by @YannDubs in #321
Add Aligner 2B+GPT-4 Turbo (04/09) Results by @AlignInc in #324
Add REBEL-Llama-3-8B-Instruct to AlpacaEval by @ZhaolinGao in #326
[ENH&BUG] improve VLLM by @YannDubs in #330
Add ExPO + Llama-3-Instruct-8B-SimPO results by @chujiezheng in #331
fix model link by @chujiezheng in #332
Add merlinite-7B-AOT to AlpacaEval by @imelnyk in #334
[BUG] fix bs in VLLM and add chatml by @YannDubs in #338
Add Together-MoA, Together-MoA-Lite to AlpacaEval by @IsThatYou in #342
Add Nanbeige2-16B-Chat to AlpacaEval by @yuani114 in #345
Add claude-3-5-sonnet-20240620 to AlpacaEval by @MarjovanLier in #348
[BUG] trust repo alpaca_eval by @YannDubs in #349
Add OpenPipe Mixture of Agents model to Alpaca Eval by @saum7800 in #347
Add Storm-7B, Storm-7B (best-of-64) to AlpacaEval by @yifan123 in #344
Add Infinity-Instruct-3M-0613-Mistral-7B to AlpacaEval by @cszhengyh in #351

New Contributors

@hendrydong made their first contribution in #286
@lh0x00 made their first contribution in #288
@yifan123 made their first contribution in #294
@Edward-Sun made their first contribution in #298
@chujiezheng made their first contribution in #299
@tongyx361 made their first contribution in #302
@LiuAmber made their first contribution in #307
@zhuang-li made their first contribution in #314
@xiamengzhou made their first contribution in #320
@ZhaolinGao made their first contribution in #326
@imelnyk made their first contribution in #334
@IsThatYou made their first contribution in #342
@MarjovanLier made their first contribution in #348
@saum7800 made their first contribution in #347
@cszhengyh made their first contribution in #351

Full Changelog: v0.6.2...v0.6.3

Contributors

MarjovanLier, imelnyk, and 19 other contributors

19 Apr 06:28

github-actions

v0.6.2

46ca37b

Release v0.6.2

What's Changed

[BUG] backward compatibility with AF by @YannDubs in #278
Add Nanbeige-Plus-Chat-v0.1 to AlpacaEval by @yuani114 in #279
Update README.md by @Dominic789654 in #280
[BUG] revert to GPT4 preview 1106 by @YannDubs in #283
Add support for analyzing evaluators with custom cross-annotations by @rdnfn in #281
[ENH] llama3 by @YannDubs in #285

New Contributors

@Dominic789654 made their first contribution in #280
@rdnfn made their first contribution in #281

Full Changelog: v0.6.1...v0.6.2

Contributors

YannDubs, yuani114, and 2 other contributors

13 Apr 05:40

github-actions

v0.6.1

26b6af7

Release v0.6.1

What's Changed

Add Aligner-2B+Qwen1.5-72B-Chat & Aligner-2B+Claude3 Opus to AlpacaEval by @AlignInc in #259
Supplement for Aligner by @AlignInc in #261
Add Ein-70B-v0.1 to AlpacaEval by @bin-bi in #262
Add TempNet-LLaMA2-Chat to AlpacaEval by @xumao-nju in #264
Add Conifer-7B-DPO to AlpacaEval by @liulixin29 in #267
Updating link to a super fast demo! by @kyleliang919 in #268
Add Nanbeige2-8B-Chat to AlpacaEval by @yuani114 in #274
[ENH] adding dbrx and gpt4 turbo by @YannDubs in #275

New Contributors

@AlignInc made their first contribution in #259
@bin-bi made their first contribution in #262
@xumao-nju made their first contribution in #264
@liulixin29 made their first contribution in #267
@yuani114 made their first contribution in #274

Full Changelog: v0.6...v0.6.1

Contributors

bin-bi, kyleliang919, and 5 other contributors

20 Mar 02:50

github-actions

v0.6

f5046ae

Release v0.6

What's Changed

[DATA] Add Gemma by @YannDubs in #242
[NOTEBOOK] adding final length correction notebook. by @YannDubs in #244
add Mistral-7B-ReMax-v0.1 by @liziniu in #245
[ENH] add claude 3 by @YannDubs in #247
[ENH] add contextual by @YannDubs in #250
[ENH] add mistral large by @YannDubs in #251
Add Samba-CoE-v0.2 to AlpacaEval by @kyleliang919 in #253
Add Samba-CoE-v0.2-best-of-16 to AlpacaEval by @kyleliang919 in #256
Add Mistral-ORPO-Beta to AlpacaEval by @jiwooya1000 in #257
Yann/length correction by @YannDubs in #258

New Contributors

@liziniu made their first contribution in #245
@kyleliang919 made their first contribution in #253
@jiwooya1000 made their first contribution in #257

Full Changelog: v0.5.4...v0.6

Contributors

jiwooya1000, kyleliang919, and 2 other contributors

24 Feb 08:56

github-actions

v0.5.4

3c43e9d

Release v0.5.4

What's Changed

Add Qwen1.5-72B-Chat to AlpacaEval by @Lukeming-tsinghua in #226
Add claude-instant-1.2, deepseek-llm-67b-chat, wizardlm-70b, Qwen-14B-Chat (config + outputs without annotations) by @gblazex in #228
[DATA] Adding annotations for the arena models by @YannDubs in #229
Update README.md - Add missing "Y" to "ou" by @yoderj in #230
[DEV] Analyzing length-controlled metrics. by @YannDubs in #231
[DOC] add annotation interpretation by @YannDubs in #232
[DATA] add results from the Arena openai models by @YannDubs in #234
update ELO for llama-2-13b-chat-hf by @gblazex in #235
[NOTEBOOK] add length-corrected GLM by @YannDubs in #237
[ENH] add inverse mapper to make sure in and out types are the same by @YannDubs in #240
[ENH] update to allow AF to use AE by @YannDubs in #241

New Contributors

@Lukeming-tsinghua made their first contribution in #226
@yoderj made their first contribution in #230

Full Changelog: v0.5.3...v0.5.4

Contributors

gblazex, yoderj, and 2 other contributors

01 Feb 08:54

github-actions

v0.5.3

8779373

Release v0.5.3

What's Changed

[ENH] add mistral-medium by @YannDubs in #205
[ENH] add internlm2-chat-20b-ppo by @C1rN09 in #207
prettify "pretty_name" of internlm2 by @C1rN09 in #208
[ENH] add outputs & configs from dolphin 2.2.1 by @YannDubs in #209
Add PairRM 0.4B + Yi-34B-Chat to AlpacaEval 2.0 by @jdf-prog in #210
dolphin 2.1.1 configs.yaml by @gblazex in #212
Update README.md (small typo) by @xwinxu in #213
[TEST]: fix ordering of df by @YannDubs in #214
Add Snorkel-Mistral-PairRM-DPO (best-of-16) to Alpaca Eval 2.0 by @viethoangtranduong in #215
update InternLM2 chat template by @C1rN09 in #216
Add Starling-LM-7B-alpha, vicuna-13b-v1.5, vicuna-7b-v1.5 to AlpacaEval (config + outputs without annotations) by @gblazex in #217
[RES] add 3 models for arena correlations by @YannDubs in #218
Add xwinlm-70b-v0.3 to AlpacaEval by @nbl97 in #221
[ENH] add referenced_models locally by @YannDubs in #224

New Contributors

@C1rN09 made their first contribution in #207
@gblazex made their first contribution in #212
@xwinxu made their first contribution in #213
@viethoangtranduong made their first contribution in #215

Full Changelog: v0.5.2...v0.5.3

Contributors

gblazex, YannDubs, and 5 other contributors

10 Jan 23:57

github-actions

v0.5.2

83e91f3

Release v0.5.2

What's Changed

[BUG] force openai >1.5.0 by @YannDubs in #202
[WIP] precompute all leaderboard for AE2 by @YannDubs in #199
[ENH] add OpenHermes by @YannDubs in #203

Full Changelog: v0.5.1...v0.5.2

Contributors

YannDubs

10 Jan 06:16

github-actions

v0.5.1

91a903f

Release v0.5.1

What's Changed

[BUG] fix no OAI org id set by @YannDubs in #200

Full Changelog: v0.5.0...v0.5.1

Contributors

YannDubs
