Replies: 4 comments
-
I think we can already compute stats for input string length / output string length / time. Should that be sufficient for tabby's optimization purposes?
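If only string lengths and wall-clock time are recorded, a rough throughput figure can still be derived. A minimal sketch in Rust, assuming a ~4-characters-per-token rule of thumb (an illustrative heuristic, not tabby's actual tokenizer):

```rust
// Heuristic: roughly 4 characters per token for code/English text.
// This ratio is an assumption for illustration only.
fn estimate_tokens(s: &str) -> usize {
    (s.chars().count() + 3) / 4 // round up
}

fn main() {
    let output = "fn add(a: i32, b: i32) -> i32 { a + b }";
    let elapsed_secs = 0.5; // hypothetical generation time from the server logs
    let tokens = estimate_tokens(output);
    println!("~{:.1} tok/s (estimated)", tokens as f64 / elapsed_secs);
}
```

This only approximates throughput; real token counts from the tokenizer would be more accurate, which is what the rest of this thread argues for.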
-
@wsxiaoys yeah, but as you know,
-
Yes - that's why I emphasized that
-
Is there a way (besides the tabby server logs) to get a sense of inference performance (i.e., tokens/s) as part of the response, for benchmarking? Since I am new to this space, could you please suggest better alternatives for benchmarking the server's performance?
-
tabby/crates/tabby-inference/src/lib.rs, lines 23 to 31 in 99d49a9
The current TextGeneration trait is simple, but it doesn't expose the statistics we need for monitoring and optimizing the inference server. For example, input_token_length and output_token_length stats are really important for measuring the inference server's throughput. My initial idea would be something like this:
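(The original snippet was not captured in this thread. As a rough illustration of the idea, here is a hedged sketch of what a stats-carrying result type might look like; the names `GenerationStats`, `input_token_length`, `output_token_length`, and `latency_ms` are hypothetical, not tabby's actual API.)

```rust
// Hypothetical per-request stats returned alongside the generated text,
// so callers can measure throughput without parsing server logs.
pub struct GenerationStats {
    pub input_token_length: usize,
    pub output_token_length: usize,
    pub latency_ms: u128,
}

impl GenerationStats {
    /// Output tokens per second: the throughput figure discussed above.
    pub fn tokens_per_second(&self) -> f64 {
        if self.latency_ms == 0 {
            return 0.0;
        }
        self.output_token_length as f64 / (self.latency_ms as f64 / 1000.0)
    }
}

fn main() {
    let stats = GenerationStats {
        input_token_length: 128,
        output_token_length: 64,
        latency_ms: 2000,
    };
    assert_eq!(stats.tokens_per_second(), 32.0);
    println!("{:.1} tok/s", stats.tokens_per_second());
}
```

A `TextGeneration` method could then return `(String, GenerationStats)` instead of a bare `String`, keeping the trait simple while making the throughput numbers available to monitoring code.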