New in v1.2.0 - ✨ Prompt Caching with Anthropic #126
olimorris announced in Announcements
Replies: 1 comment 2 replies
-
It has some problems when you try to use the @rag functionality. You then see errors like: "Error: A maximum of 4 blocks with cache_control may be provided. Found 6." Just open the chat buffer and ask a question with the rag marker |
-
As of v1.2.0, I've now added prompt caching for the Anthropic adapter, which they've outlined here. Thanks to everyone in #99 and #111 for raising this as a suggested enhancement.
The adapter automatically caches the large system prompt which is attached to every request, and also detects any messages which are over 300 tokens and caches them. If you use the `#buffer` or `#buffers` variables in the chat buffer, then this will have a huge impact. From my testing:
- Opened `codecompanion.utils.util.lua`
- Asked `#buffer what does this code do?`
- Followed by `Can it be refactored to be better?`
With prompt caching, this is now coming out at less than 1,500 tokens. If you set the plugin's logging level to `TRACE`, the log output will give you an idea of how much caching is taking place.
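To illustrate the strategy described above, here is a minimal Python sketch (not the plugin's actual Lua implementation): the system prompt is always marked for caching, and any message whose estimated size exceeds a threshold gets a `cache_control` breakpoint. The `{"type": "ephemeral"}` block format comes from Anthropic's prompt-caching docs; the 4-characters-per-token estimate and the helper names are assumptions for illustration.

```python
# Sketch of the caching strategy (hypothetical helper, not the plugin's code).
# Assumes Anthropic's documented cache_control block format; the token
# estimate (~4 chars per token) is a rough heuristic, not a real tokenizer.

CACHE_THRESHOLD_TOKENS = 300  # the threshold discussed in the announcement


def estimate_tokens(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token."""
    return len(text) // 4


def build_payload(system_prompt: str, messages: list[dict]) -> dict:
    """Build an Anthropic Messages API-shaped payload with cache breakpoints."""
    out_messages = []
    for msg in messages:
        block = {"type": "text", "text": msg["content"]}
        # Cache any message over the threshold
        if estimate_tokens(msg["content"]) > CACHE_THRESHOLD_TOKENS:
            block["cache_control"] = {"type": "ephemeral"}
        out_messages.append({"role": msg["role"], "content": [block]})
    return {
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # The large system prompt is attached to every request,
                # so it is always marked for caching
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": out_messages,
    }
```

For example, a short "hello" message would be sent uncached, while a pasted buffer of a few thousand characters would pick up a cache breakpoint.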
The 300 token threshold for caching may be too large or too small...only our testing will find out. Equally, I know Anthropic limits cache breakpoints to 4...that may yet cause an issue.
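That 4-breakpoint limit is exactly what produces the "Found 6" error reported in the reply above. One way a client could stay under the limit is to strip `cache_control` from all but the most recent marked blocks; this is a hedged sketch with a hypothetical helper name, not how the plugin actually handles it.

```python
# Hypothetical helper (not from the plugin): Anthropic allows at most
# 4 cache_control breakpoints per request, so strip the markers from
# the oldest blocks when more than 4 have been tagged.

MAX_CACHE_BREAKPOINTS = 4


def enforce_breakpoint_limit(blocks: list[dict],
                             limit: int = MAX_CACHE_BREAKPOINTS) -> list[dict]:
    """Keep cache_control only on the last `limit` marked blocks."""
    marked = [i for i, b in enumerate(blocks) if "cache_control" in b]
    # marked[:-limit] is empty when there are `limit` or fewer markers
    for i in marked[:-limit]:
        blocks[i] = {k: v for k, v in blocks[i].items() if k != "cache_control"}
    return blocks
```

With six marked blocks (the situation in the error message), this keeps the last four markers and drops the first two.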
So please try this out and report back.