Automated dashboard for tracking and comparing performance metrics between Shortfin LLM Server and SGLang Server.
This project collects daily performance metrics from two LLM servers:
- Shortfin LLM Server with SGLang frontend integration
- SGLang's native LLM server (baseline)
The following metrics are collected for each server at request rates of 1, 2, 4, 8, 16, and 32:
- Median E2E Latency (ms)
- Median TTFT (Time to First Token)
- Median ITL (Inter-Token Latency)
- Request Throughput (req/s)
- Benchmark Duration (s)
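As a rough illustration of how the metrics listed above might be pulled out of a benchmark result, the sketch below reads JSON Lines text and keeps only the tracked fields. The key names in `METRIC_KEYS` are assumptions made for this example, not confirmed by this project; adjust them to whatever keys your benchmark output actually uses. The helper `extract_metrics` is likewise hypothetical.

```python
import json

# Assumed field names (hypothetical): change these to match the keys
# that actually appear in your benchmark output.
METRIC_KEYS = [
    "median_e2e_latency_ms",
    "median_ttft_ms",
    "median_itl_ms",
    "request_throughput",
    "duration",
]


def extract_metrics(jsonl_text):
    """Return one dict of tracked metrics per non-empty JSON line."""
    rows = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Missing keys become None instead of raising, so a partially
        # filled record still produces a row.
        rows.append({key: record.get(key) for key in METRIC_KEYS})
    return rows
```

Fields that are not in `METRIC_KEYS` are dropped, which keeps the rows uniform regardless of what else the benchmark tool writes.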
Data is stored in JSON Lines files, named according to the pattern:
{server}_{date}_{request_rate}.jsonl
Example:
shortfin_10_1.jsonl
shortfin_10_2.jsonl
...
sglang_10_1.jsonl
sglang_10_2.jsonl
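The naming pattern above can be parsed back into its parts when loading files for the dashboard. The following is a minimal sketch, not the project's own loader: `parse_run_filename` and `RunFile` are hypothetical names, and the date component is kept as an opaque string because its exact format is not specified here.

```python
import re
from typing import NamedTuple

# Matches {server}_{date}_{request_rate}.jsonl. The date is captured as
# any run of non-underscore characters (an assumption for this sketch).
FILENAME_RE = re.compile(
    r"^(?P<server>[a-z]+)_(?P<date>[^_]+)_(?P<rate>\d+)\.jsonl$"
)


class RunFile(NamedTuple):
    server: str
    date: str
    request_rate: int


def parse_run_filename(name):
    """Split a result filename into server, date, and request rate."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected result filename: {name!r}")
    return RunFile(match["server"], match["date"], int(match["rate"]))
```

For example, `parse_run_filename("shortfin_10_1.jsonl")` yields the server `shortfin`, the date string `10`, and a request rate of `1`.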
- Runs nightly via CI
- Data refresh rate: Daily
- Initial retention period: 3 months (configurable)
- Integrated with existing Grafana instance
- Tracks performance improvements over time
- Compares Shortfin vs SGLang server performance
- Track Shortfin server improvements
- Benchmark against SGLang baseline
- Identify performance trends and regressions
- Shortfin LLM with SGLang Documentation
- Performance test results can be viewed at the above link
- Automate data collection
- Additional metrics collection
- Enhanced visualization options