Update README.md
mrwyattii authored Jan 19, 2024
1 parent 7956420 commit 1ac843a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blogs/deepspeed-fastgen/2024-01-19/README.md
@@ -29,7 +29,7 @@ Today, we are happy to share that we are improving DeepSpeed-FastGen along three

- **Performance Optimizations**

-We drastically reduced the scheduling overhead of Dynamic SplitFuse and increased the efficiency of token sampling. As a result, we see higher throughput and lower latency, particularly when handling concurrent requests from many clients. We demonstrate the performance optimizations with benchmarks and evaluation of DeepSpeed-FastGen against vLLM for the newly added model families. The benchmark results can be seen in [Performance Evaluation](#performance-evaluation) and the benchmark code is available at [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii).
+We drastically reduced the scheduling overhead of Dynamic SplitFuse and increased the efficiency of token sampling. As a result, we see higher throughput and lower latency, particularly when handling concurrent requests from many clients. We demonstrate the performance optimizations with benchmarks and evaluation of DeepSpeed-FastGen against vLLM for the newly added model families. The benchmark results can be seen in [Performance Evaluation](#performance-optimizations) and the benchmark code is available at [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii).

- **Feature Enhancements**
