title | emoji | python_version | app_file | sdk | sdk_version | pinned | tags | colorFrom | colorTo
---|---|---|---|---|---|---|---|---|---
ML.ENERGY Leaderboard | ⚡ | 3.9 | app.py | gradio | 3.39.0 | true | | black | black
How much energy do LLMs consume?
This README focuses on explaining how to run the benchmark yourself. The actual leaderboard is here: https://ml.energy/leaderboard.
We instrumented Hugging Face TGI so that it measures and returns GPU energy consumption. Then, our controller server receives user prompts from the Gradio app, selects two models randomly, and streams model responses back with energy consumption.
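As a rough illustration of the "selects two models randomly" step, here is a minimal sketch. The model list and the `pick_model_pair` helper are hypothetical names chosen for this example; the actual controller code may choose models differently.

```python
import random

# Hypothetical list of deployed models; the controller's real registry is not shown in this README.
AVAILABLE_MODELS = ["lmsys/vicuna-7B", "lmsys/vicuna-13B", "databricks/dolly-v2-12b"]


def pick_model_pair(models, rng=random):
    """Return two distinct models chosen uniformly at random."""
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    # sample() draws without replacement, so the two models always differ.
    return rng.sample(models, 2)


pair = pick_model_pair(AVAILABLE_MODELS)
print(pair)
```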
- For models that are directly accessible on the Hugging Face Hub, you don't need to do anything.
- For other models, convert them to Hugging Face format and put them in /data/leaderboard/weights/lmsys/vicuna-13B, for example. The last two path components (e.g., lmsys/vicuna-13B) are taken as the name of the model.
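The naming rule above (last two path components become the model name) can be sketched as follows. The `model_name_from_path` helper is a hypothetical name used only for illustration and is not necessarily how the benchmark script implements it.

```python
from pathlib import PurePosixPath


def model_name_from_path(model_path: str) -> str:
    """Join the last two path components, e.g. 'lmsys/vicuna-13B'."""
    # Drop the root marker so that '/a' is not mistaken for two components.
    parts = [p for p in PurePosixPath(model_path).parts if p != "/"]
    if len(parts) < 2:
        raise ValueError(f"expected at least two path components: {model_path!r}")
    return "/".join(parts[-2:])


# A local weights directory and a Hub-style ID both reduce to 'org/model'.
print(model_name_from_path("/data/leaderboard/weights/lmsys/vicuna-13B"))
print(model_name_from_path("databricks/dolly-v2-12b"))
```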
A pre-built Docker image is published with the tag mlenergy/leaderboard:latest (Dockerfile).
$ docker run -it \
    --name leaderboard0 \
    --gpus '"device=0"' \
    -v /path/to/your/data/dir:/data/leaderboard \
    -v $(pwd):/workspace/leaderboard \
    mlenergy/leaderboard:latest bash
The container internally expects weights to be inside /data/leaderboard/weights (e.g., /data/leaderboard/weights/lmsys/vicuna-7B), and sets the Hugging Face cache directory to /data/leaderboard/hfcache.
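To check which locally stored models the container would see, one option is to walk the weights directory two levels deep. This is a minimal sketch that assumes the org/model layout described above; `list_local_models` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def list_local_models(weights_dir="/data/leaderboard/weights"):
    """Return 'org/model' names for every two-level subdirectory of weights_dir."""
    root = Path(weights_dir)
    if not root.is_dir():
        # Nothing mounted at this path (e.g., running outside the container).
        return []
    names = []
    for org_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for model_dir in sorted(p for p in org_dir.iterdir() if p.is_dir()):
            names.append(f"{org_dir.name}/{model_dir.name}")
    return names


print(list_local_models())
```

Outside the container, pass the host-side path instead, i.e. the weights subdirectory of whatever directory you mount to /data/leaderboard.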
If needed, the repository should be mounted to /workspace/leaderboard to override the copy of the repository inside the container.
We run benchmarks across multiple nodes and GPUs with Pegasus. Take a look at pegasus/ for details.
You can still run benchmarks without Pegasus like this:
$ docker exec leaderboard0 python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
$ docker exec leaderboard0 python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
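If you want to run several models back to back without Pegasus, a small driver script can generate the same commands. The sketch below only builds and prints them; it uses just the --model-path and --input-file flags shown above, and the `benchmark_command` helper and the model list are assumptions made for this example.

```python
import shlex

CONTAINER = "leaderboard0"  # container name from the docker run example
INPUT_FILE = "sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json"


def benchmark_command(model_path, container=CONTAINER, input_file=INPUT_FILE):
    """Build the docker exec argument list for one benchmark run."""
    return [
        "docker", "exec", container,
        "python", "scripts/benchmark.py",
        "--model-path", model_path,
        "--input-file", input_file,
    ]


# Example model list: one local weights directory and one Hub model ID.
models = [
    "/data/leaderboard/weights/lmsys/vicuna-13B",
    "databricks/dolly-v2-12b",
]

for model in models:
    cmd = benchmark_command(model)
    # Printed for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Running the models sequentially keeps a single GPU dedicated to one benchmark at a time, which is likely what you want when measuring energy.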