This PR implements a Triton + TensorRT-LLM GPU serving solution for the Whisper Large encoder + Qwen2 1.5B model:
Decoding the Aishell 1 test set on an A10 GPU:
%WER = 0.69
Errors: 55 insertions, 83 deletions, 589 substitutions, over 104765 reference words (104093 correct)
processing time: 230.089 seconds (0.06 hours)
latency_variance: 4.07
latency_50_percentile_ms: 243.73
latency_90_percentile_ms: 340.68
latency_95_percentile_ms: 370.07
latency_99_percentile_ms: 422.93
average_latency_ms: 252.50
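
As a sanity check on the numbers above, the reported %WER can be reproduced from the error counts. This is a minimal sketch that assumes the standard definition WER = (insertions + deletions + substitutions) / reference words; the function name `wer_percent` is illustrative and not part of this PR.

```python
def wer_percent(insertions: int, deletions: int, substitutions: int,
                reference_words: int) -> float:
    """Word error rate in percent, assuming the standard (I + D + S) / N definition."""
    return 100.0 * (insertions + deletions + substitutions) / reference_words


# Counts copied from the decoding summary above.
insertions, deletions, substitutions = 55, 83, 589
reference_words = 104765

wer = wer_percent(insertions, deletions, substitutions, reference_words)

# Correct words are the reference words that were neither deleted nor substituted.
correct = reference_words - deletions - substitutions

print(f"%WER = {wer:.2f}")
print(f"correct = {correct}")
```

Both values agree with the summary: the error counts sum to 727 over 104765 reference words, and the derived correct-word count matches the reported 104093.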