update readme
Yunnglin committed Aug 21, 2024
1 parent ba5870c commit e82fbbc
Showing 16 changed files with 460 additions and 725 deletions.
553 changes: 185 additions & 368 deletions README.md

Large diffs are not rendered by default.

517 changes: 179 additions & 338 deletions README_zh.md

Large diffs are not rendered by default.

Binary file removed docs/en/_static/images/evalscope.jpeg
Binary file not shown.
Binary file added docs/en/_static/images/evalscope_logo.png
39 changes: 39 additions & 0 deletions docs/en/best_practice/experiments.md
@@ -0,0 +1,39 @@

# Experiments

## [MMLU](https://modelscope.cn/datasets/modelscope/mmlu/summary)

### Settings: (Split: test, Total num: 13985, 0-shot)

| Model | Revision | Precision | Humanities | STEM | SocialScience | Other | WeightedAvg | Target | Delta |
|--------------------------------------------------------------------------------------------------|----------|-----------|-------------|------------|---------------|---------|-------------|-------------|--------|
| [Baichuan2-7B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary) | v1.0.2 | fp16 | 0.4111 | 0.3807 | 0.5233 | 0.504 | 0.4506 | - | |
| [Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-chat/summary) | v1.0.4 | fp16 | 0.4439 | 0.374 | 0.5524 | 0.5458 | 0.4762 | - | |
| [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary) | v1.0.12 | fp16 | 0.3834 | 0.3413 | 0.4708 | 0.4445 | 0.4077 | 0.4546(CoT) | -4.69% |
| [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary) | v1.0.1 | fp16 | 0.5435 | 0.5087 | 0.7227 | 0.6471 | 0.5992 | 0.614 | -1.48% |
| [internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | v1.0.1 | fp16 | 0.4005 | 0.3547 | 0.4953 | 0.4796 | 0.4297 | - | |
| [Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary) | v1.0.2 | fp16 | 0.4371 | 0.3887 | 0.5579 | 0.5437 | 0.4778 | - | |
| [Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary) | v1.0.2 | fp16 | 0.3146 | 0.3037 | 0.4134 | 0.3885 | 0.3509 | - | |
| [Qwen-14B-Chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary) | v1.0.6 | bf16 | 0.5326 | 0.5397 | 0.7184 | 0.6859 | 0.6102 | - | |
| [Qwen-7B](https://modelscope.cn/models/qwen/Qwen-7B/summary) | v1.1.6 | bf16 | 0.387 | 0.4 | 0.5403 | 0.5139 | 0.4527 | - | |
| [Qwen-7B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary) | v1.1.6 | int8 | 0.4322 | 0.4277 | 0.6088 | 0.5778 | 0.5035 | - | |

- Target -- the score officially reported for the model on this dataset
- Delta -- the measured score (WeightedAvg here; Avg in the 5-shot table) minus the Target score, in percentage points
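The Delta column can be reproduced directly from the table. The sketch below (not part of EvalScope) uses the chatglm2-6b row from the 0-shot table as sample data:

```python
# Hypothetical sketch showing how Delta relates the measured score to the
# officially reported Target score, in percentage points.
weighted_avg = 0.4077  # WeightedAvg for chatglm2-6b (0-shot table)
target = 0.4546        # officially reported score (CoT)

# Delta = (measured - target) * 100, rounded to two decimals
delta = round((weighted_avg - target) * 100, 2)
print(f"{delta:+.2f}%")  # -4.69%
```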


### Settings: (Split: test, Total num: 13985, 5-shot)

| Model | Revision | Precision | Humanities | STEM | SocialScience | Other | WeightedAvg | Avg | Target | Delta |
|---------------------|----------|-----------|------------|--------|---------------|--------|-------------|--------|--------------------|---------|
| Baichuan2-7B-Base | v1.0.2 | fp16 | 0.4295 | 0.398 | 0.5736 | 0.5325 | 0.4781 | 0.4918 | 0.5416 (official) | -4.98% |
| Baichuan2-7B-Chat | v1.0.4 | fp16 | 0.4344 | 0.3937 | 0.5814 | 0.5462 | 0.4837 | 0.5029 | 0.5293 (official) | -2.64% |
| chatglm2-6b | v1.0.12 | fp16 | 0.3941 | 0.376 | 0.4897 | 0.4706 | 0.4288 | 0.4442 | - | - |
| chatglm3-6b-base | v1.0.1 | fp16 | 0.5356 | 0.4847 | 0.7175 | 0.6273 | 0.5857 | 0.5995 | - | - |
| internlm-chat-7b | v1.0.1 | fp16 | 0.4171 | 0.3903 | 0.5772 | 0.5493 | 0.4769 | 0.4876 | - | - |
| Llama-2-13b-ms | v1.0.2 | fp16 | 0.484 | 0.4133 | 0.6157 | 0.5809 | 0.5201 | 0.5327 | 0.548 (official) | -1.53% |
| Llama-2-7b-ms | v1.0.2 | fp16 | 0.3747 | 0.3363 | 0.4372 | 0.4514 | 0.3979 | 0.4089 | 0.453 (official) | -4.41% |
| Qwen-14B-Chat | v1.0.6 | bf16 | 0.574 | 0.553 | 0.7403 | 0.684 | 0.6313 | 0.6414 | 0.646 (official) | -0.46% |
| Qwen-7B | v1.1.6 | bf16 | 0.4587 | 0.426 | 0.6078 | 0.5629 | 0.5084 | 0.5151 | 0.567 (official) | -5.2% |
| Qwen-7B-Chat-Int8 | v1.1.6 | int8 | 0.4697 | 0.4383 | 0.6284 | 0.5967 | 0.5271 | 0.5347 | 0.554 (official) | -1.93% |

4 changes: 2 additions & 2 deletions docs/en/get_started/installation.md
@@ -1,6 +1,6 @@
# Installation

## Install Using pip
## Method 1: Install Using pip
We recommend using conda to manage your environment and installing dependencies with pip:

1. Create a conda environment (optional)
@@ -31,7 +31,7 @@ from llmuses import ...
```
````

## Install from Source
## Method 2: Install from Source
1. Download the source code
```shell
git clone https://github.com/modelscope/evalscope.git
13 changes: 6 additions & 7 deletions docs/en/index.rst
@@ -1,18 +1,16 @@
.. image:: _static/images/evalscope.jpeg
.. image:: _static/images/evalscope_logo.png

.. _pypi_downloads: https://pypi.org/project/evalscope
.. _github_pr: https://github.com/modelscope/evalscope/pulls

.. raw:: html

<p align="center">
<a href="`_pypi_downloads`">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/evalscope">
<a href="https://badge.fury.io/py/evalscope"><img src="https://badge.fury.io/py/evalscope.svg" alt="PyPI version" height="18"></a>
<a href="https://pypi.org/project/evalscope"><img alt="PyPI - Downloads" src="https://static.pepy.tech/badge/evalscope">
</a>
<a href="`_github_pr`">
<img src="https://img.shields.io/badge/PR-welcome-55EB99.svg"></a>
<a href='https://evalscope.readthedocs.io/zh-cn/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/evalscope/badge/?version=latest' alt='Documentation Status' />
<a href='https://evalscope.readthedocs.io/en/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/evalscope-en/badge/?version=latest' alt='Documentation Status' />
</a>
</p>

@@ -76,6 +74,7 @@ We always welcome users' PRs and Issues to improve EvalScope.
:caption: Best Practices

best_practice/swift_integration.md
best_practice/experiments.md

Index and Tables
==================
2 changes: 1 addition & 1 deletion docs/en/user_guides/offline_evaluation.md
@@ -1,4 +1,4 @@
# Offline Environment Evaluation
# Offline Evaluation

By default, datasets are hosted on [ModelScope](https://modelscope.cn/datasets), which requires an internet connection to load. However, if you find yourself in an environment without internet access, you can use local datasets. Follow the steps below:

2 changes: 1 addition & 1 deletion docs/en/user_guides/vlmevalkit_backend.md
@@ -1,4 +1,4 @@
# VLMEvalKit Evaluation Backend
# VLMEvalKit Backend

To facilitate use of the VLMEvalKit evaluation backend, we have customized the VLMEvalKit source code and named the result `ms-vlmeval`. This version encapsulates the configuration and execution of evaluation tasks, is installable from PyPI, and lets users launch lightweight VLMEvalKit evaluation tasks through EvalScope. In addition, we support API-based evaluation using the OpenAI API format; multi-modal model services can be deployed with ModelScope [swift](https://github.com/modelscope/swift).
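As a rough sketch of what "OpenAI API format" means here, a chat request to such a deployed multi-modal service might be shaped as below. The model name, image URL, and endpoint path are hypothetical placeholders, not values defined by EvalScope or swift:

```python
import json

# Sketch of an OpenAI-format chat request body for a multi-modal model
# service (e.g. one deployed with ModelScope swift). All concrete values
# here are illustrative placeholders.
payload = {
    "model": "qwen-vl-chat",  # hypothetical served model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
}

# This body would typically be POSTed as JSON to the service's
# /v1/chat/completions endpoint.
print(json.dumps(payload, indent=2))
```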

Binary file removed docs/zh/_static/images/evalscope.jpeg
Binary file not shown.
Binary file added docs/zh/_static/images/evalscope_logo.png
39 changes: 39 additions & 0 deletions docs/zh/best_practice/experiments.md
@@ -0,0 +1,39 @@

# Experiments and Reports

## [MMLU](https://modelscope.cn/datasets/modelscope/mmlu/summary)

### Settings: (Split: test, Total num: 13985, 0-shot)

| Model | Revision | Precision | Humanities | STEM | SocialScience | Other | WeightedAvg | Target | Delta |
|--------------------------------------------------------------------------------------------------|----------|-----------|-------------|------------|---------------|---------|-------------|-------------|--------|
| [Baichuan2-7B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary) | v1.0.2 | fp16 | 0.4111 | 0.3807 | 0.5233 | 0.504 | 0.4506 | - | |
| [Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-chat/summary) | v1.0.4 | fp16 | 0.4439 | 0.374 | 0.5524 | 0.5458 | 0.4762 | - | |
| [chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary) | v1.0.12 | fp16 | 0.3834 | 0.3413 | 0.4708 | 0.4445 | 0.4077 | 0.4546(CoT) | -4.69% |
| [chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary) | v1.0.1 | fp16 | 0.5435 | 0.5087 | 0.7227 | 0.6471 | 0.5992 | 0.614 | -1.48% |
| [internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary) | v1.0.1 | fp16 | 0.4005 | 0.3547 | 0.4953 | 0.4796 | 0.4297 | - | |
| [Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary) | v1.0.2 | fp16 | 0.4371 | 0.3887 | 0.5579 | 0.5437 | 0.4778 | - | |
| [Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary) | v1.0.2 | fp16 | 0.3146 | 0.3037 | 0.4134 | 0.3885 | 0.3509 | - | |
| [Qwen-14B-Chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary) | v1.0.6 | bf16 | 0.5326 | 0.5397 | 0.7184 | 0.6859 | 0.6102 | - | |
| [Qwen-7B](https://modelscope.cn/models/qwen/Qwen-7B/summary) | v1.1.6 | bf16 | 0.387 | 0.4 | 0.5403 | 0.5139 | 0.4527 | - | |
| [Qwen-7B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary) | v1.1.6 | int8 | 0.4322 | 0.4277 | 0.6088 | 0.5778 | 0.5035 | - | |

- Target -- the score officially reported for the model on this dataset
- Delta -- the measured score (WeightedAvg here; Avg in the 5-shot table) minus the Target score, in percentage points


### Settings: (Split: test, Total num: 13985, 5-shot)

| Model | Revision | Precision | Humanities | STEM | SocialScience | Other | WeightedAvg | Avg | Target | Delta |
|---------------------|----------|-----------|------------|--------|---------------|--------|-------------|--------|--------------------|---------|
| Baichuan2-7B-Base | v1.0.2 | fp16 | 0.4295 | 0.398 | 0.5736 | 0.5325 | 0.4781 | 0.4918 | 0.5416 (official) | -4.98% |
| Baichuan2-7B-Chat | v1.0.4 | fp16 | 0.4344 | 0.3937 | 0.5814 | 0.5462 | 0.4837 | 0.5029 | 0.5293 (official) | -2.64% |
| chatglm2-6b | v1.0.12 | fp16 | 0.3941 | 0.376 | 0.4897 | 0.4706 | 0.4288 | 0.4442 | - | - |
| chatglm3-6b-base | v1.0.1 | fp16 | 0.5356 | 0.4847 | 0.7175 | 0.6273 | 0.5857 | 0.5995 | - | - |
| internlm-chat-7b | v1.0.1 | fp16 | 0.4171 | 0.3903 | 0.5772 | 0.5493 | 0.4769 | 0.4876 | - | - |
| Llama-2-13b-ms | v1.0.2 | fp16 | 0.484 | 0.4133 | 0.6157 | 0.5809 | 0.5201 | 0.5327 | 0.548 (official) | -1.53% |
| Llama-2-7b-ms | v1.0.2 | fp16 | 0.3747 | 0.3363 | 0.4372 | 0.4514 | 0.3979 | 0.4089 | 0.453 (official) | -4.41% |
| Qwen-14B-Chat | v1.0.6 | bf16 | 0.574 | 0.553 | 0.7403 | 0.684 | 0.6313 | 0.6414 | 0.646 (official) | -0.46% |
| Qwen-7B | v1.1.6 | bf16 | 0.4587 | 0.426 | 0.6078 | 0.5629 | 0.5084 | 0.5151 | 0.567 (official) | -5.2% |
| Qwen-7B-Chat-Int8 | v1.1.6 | int8 | 0.4697 | 0.4383 | 0.6284 | 0.5967 | 0.5271 | 0.5347 | 0.554 (official) | -1.93% |

4 changes: 2 additions & 2 deletions docs/zh/get_started/installation.md
@@ -1,6 +1,6 @@
# Installation

## Install Using pip
## Method 1: Install Using pip
We recommend using conda to manage your environment and installing dependencies with pip:
1. Create a conda environment (optional)
```shell
@@ -32,7 +32,7 @@ from llmuses import ...
````


## Install from Source
## Method 2: Install from Source
1. Download the source code
```shell
git clone https://github.com/modelscope/evalscope.git
12 changes: 6 additions & 6 deletions docs/zh/index.rst
@@ -1,18 +1,17 @@
.. image:: _static/images/evalscope.jpeg
.. image:: _static/images/evalscope_logo.png

.. _pypi_downloads: https://pypi.org/project/evalscope
.. _github_pr: https://github.com/modelscope/evalscope/pulls

.. raw:: html

<p align="center">
<a href="`_pypi_downloads`">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/evalscope">
<a href="https://badge.fury.io/py/evalscope"><img src="https://badge.fury.io/py/evalscope.svg" alt="PyPI version" height="18"></a>
<a href="https://pypi.org/project/evalscope"><img alt="PyPI - Downloads" src="https://static.pepy.tech/badge/evalscope">
</a>
<a href="`_github_pr`">
<img src="https://img.shields.io/badge/PR-welcome-55EB99.svg"></a>
<a href="https://github.com/modelscope/evalscope/pulls"><img src="https://img.shields.io/badge/PR-welcome-55EB99.svg"></a>
<a href='https://evalscope.readthedocs.io/zh-cn/latest/?badge=latest'>
<img src='https://readthedocs.org/projects/evalscope/badge/?version=latest' alt='Documentation Status' />
<img src='https://readthedocs.org/projects/evalscope/badge/?version=latest' alt='Documentation Status' />
</a>
</p>

@@ -78,6 +77,7 @@ EvalScope Getting Started
:caption: Best Practices

best_practice/swift_integration.md
best_practice/experiments.md

Index and Tables
==================
Binary file removed resources/evalscope.jpeg
Binary file not shown.
Binary file removed resources/evalscope_framework.png
Binary file not shown.
