How to cache the KV of a very long shared prefix #102
Before I start, I just want to say that I stumbled across this repo today, and I love it! I have a long common prefix (around 5,000 tokens), followed by a question about that prefix at the end. I have about 15 such questions. How can I get the KVs for the 5,000-token prefix cached so they are reused across all the questions? Even a hacky temporary solution would be much appreciated!
Answered by merrymercy, Jan 30, 2024
Replies: 1 comment
Answer selected by merrymercy
@pj-ml Thanks for your interest. The SGLang runtime is designed for exactly this use case and can greatly accelerate your workloads.
Please see #106 (comment)
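For readers wondering why this helps: the SGLang runtime automatically reuses the KV cache of a prompt prefix shared across requests (its RadixAttention mechanism keeps cached prefixes in a radix tree). The sketch below is only a toy illustration of that idea, not SGLang's implementation or API. It uses an uncompressed trie over token ids, where each node stands in for one token's cached KV entry, and it simply counts how many tokens would need prefill with and without reuse. The 5,000-token prefix and 15 questions come from the question above; the 20 tokens per question and the token ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # token id -> child node; each node stands in for one token's cached KV entry
    children: dict = field(default_factory=dict)


class PrefixKVCache:
    """Toy prefix cache (hypothetical): a plain trie keyed by token ids."""

    def __init__(self):
        self.root = Node()
        self.computed = 0  # total tokens whose KV had to be computed (prefill work)

    def prefill(self, tokens):
        """Reuse cached entries for the longest matching prefix, 'compute' the rest.

        Returns the number of tokens served from the cache.
        """
        node = self.root
        hits = 0
        # Walk down the trie as long as the tokens are already cached.
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            hits += 1
        # Insert (i.e. "compute") the KV entries for the uncached suffix.
        for tok in tokens[hits:]:
            child = Node()
            node.children[tok] = child
            node = child
            self.computed += 1
        return hits


PREFIX_LEN = 5000     # shared context, as in the question
NUM_QUESTIONS = 15    # as in the question
QUESTION_LEN = 20     # hypothetical length of each question

prefix = list(range(PREFIX_LEN))  # stand-in token ids for the shared context
questions = [
    [100_000 + q * QUESTION_LEN + j for j in range(QUESTION_LEN)]
    for q in range(NUM_QUESTIONS)
]

cache = PrefixKVCache()
hits = [cache.prefill(prefix + q) for q in questions]
without_cache = NUM_QUESTIONS * (PREFIX_LEN + QUESTION_LEN)

print(hits[0], hits[1])                # first request misses, later ones reuse the prefix
print(cache.computed, without_cache)   # prefill tokens with reuse vs. without
```

Under these assumptions only the first request pays for the 5,000-token prefix, so the toy computes 5,300 tokens of KV instead of 75,300. A real runtime also has to bound memory (e.g. by evicting old entries) and compresses shared runs of tokens into single tree edges, which this sketch deliberately leaves out.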