diff --git a/README.md b/README.md
index da353ebf..92597b5d 100644
--- a/README.md
+++ b/README.md
@@ -139,6 +139,9 @@ To cite [L2MAC](https://openreview.net/forum?id=EhrzQwsV4K) in publications, ple
 2. We further evaluated L2MAC on the standard **HumanEval benchmark** and observe that it achieves a state-of-the-art score of [90.2% Pass@1](https://paperswithcode.com/sota/code-generation-on-humaneval).
 3. L2MAC also works for general-purpose extensive text-based tasks, such as writing an [entire book from a single prompt](https://samholt.github.io/L2MAC/guide/use_cases/gallery.html#entire-book-italian-pasta-recipe-book).
 
+![HumanEval](docs/public/images/human_eval.png)
+

LLM-Automatic Computer (L2MAC) achieves strong performance on the HumanEval coding benchmark and currently ranks third among AI coding agents on the industry-standard HumanEval leaderboard.

+
 ### In depth-comparison to AutoGPT and GPT-4
 
 #### Can L2MAC correctly perform task-oriented context management?
diff --git a/docs/guide/get_started/comparison_to_autogpt.md b/docs/guide/get_started/comparison_to_autogpt.md
index 6d4ab153..41036220 100644
--- a/docs/guide/get_started/comparison_to_autogpt.md
+++ b/docs/guide/get_started/comparison_to_autogpt.md
@@ -7,6 +7,9 @@
 2. We further evaluated L2MAC on the standard **HumanEval benchmark** and observe that it achieves a state-of-the-art score of [90.2% Pass@1](https://paperswithcode.com/sota/code-generation-on-humaneval).
 3. L2MAC also works for general-purpose extensive text-based tasks, such as writing an [entire book from a single prompt](https://samholt.github.io/L2MAC/guide/use_cases/gallery.html#entire-book-italian-pasta-recipe-book).
 
+![HumanEval](/images/human_eval.png)
+

LLM-Automatic Computer (L2MAC) achieves strong performance on the HumanEval coding benchmark and currently ranks third among AI coding agents on the industry-standard HumanEval leaderboard.

+
 # In depth-comparison to AutoGPT and GPT-4
 
 ## Can L2MAC correctly perform task-oriented context management?
diff --git a/docs/public/images/human_eval.png b/docs/public/images/human_eval.png
new file mode 100644
index 00000000..4b1f7346
Binary files /dev/null and b/docs/public/images/human_eval.png differ