Merge pull request bytedance#122 from bytedance/fix_llm_perf
add llm_perf doc.
suisiyuan authored Nov 19, 2024
2 parents a0faed3 + 801f99b commit 85e7840
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions byte_infer_perf/llm_perf/README.md
@@ -24,6 +24,12 @@ You can run the following command to automate all steps with the chatglm2 model on the GPU backend
python3 byte_infer_perf/llm_perf/launch.py --hardware_type GPU --task chatglm2-torch-fp16-6b
```

## Split model
Splitting the model is needed when it is too large to fit on a single GPU. Except for chatglm2-6b, models should be split manually using the corresponding `split_model.py` script under `backends/GPU/model_impl`, e.g. `split_mixtral.py`. `chatglm2-6b` is split automatically online.

After splitting the model, you will find a subdirectory `TP8` (for tp_size=8) under the model directory.
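The actual split scripts handle model-specific weight layouts; the snippet below is only an illustrative sketch of the underlying idea of tensor parallelism: each large weight matrix is partitioned along one dimension into `tp_size` shards, one per GPU. The function name and shapes here are hypothetical and are not the repo's API.

```python
# Illustrative sketch (NOT the actual split_mixtral.py logic): partition a
# weight matrix into tp_size shards along a chosen dimension, one shard per
# tensor-parallel rank.
import numpy as np

def split_weight(weight, tp_size, dim):
    """Split a weight matrix into tp_size equal shards along axis `dim`."""
    assert weight.shape[dim] % tp_size == 0, "dim must divide evenly by tp_size"
    return np.split(weight, tp_size, axis=dim)

# Example: a toy 16x8 projection weight split column-wise for tp_size=8;
# each rank would load one (16, 1) shard from its TP8 subdirectory.
w = np.arange(16 * 8, dtype=np.float32).reshape(16, 8)
shards = split_weight(w, tp_size=8, dim=1)
print(len(shards), shards[0].shape)  # 8 shards of shape (16, 1)
```

Concatenating the shards back along the same dimension recovers the original weight, which is why the split can be done offline once and reused at serving time.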


## Test accuracy (single query with a specified prompt)
Launch a server running mixtral-8x22b (tp_size=8, max_batch_size=8) with the following command:
```shell
...
```
