Assistant: Anthropic prompt caching extension API #8336
Conversation
All contributors have signed the CLA ✍️ ✅
E2E Tests 🚀
Force-pushed from 01d62e6 to c26a892.
Cherry-pick upstream commit.
Force-pushed from c26a892 to f8b4160.
Please feel free to merge if all looks good; otherwise I'll make any fixes tomorrow morning (SAST).
@seeM Can you look into the test failures? This looks related:
I've updated the integration tests and the echo language model, since the PR swaps the order of the user context and query messages.
Performed a variety of manual tests as a sanity check and things look good: interpreters, data explorer, plots, app, help, variables, etc. The Anthropic API reported being overloaded when I asked it about a plot, but I was able to get information about a text file. Installed databot and performed some basic actions with it. No issues noted, other than hitting:

with successive questions about flights data.
This PR makes it possible for extensions to manually define cache breakpoints everywhere that Anthropic supports them, except tool definitions (although tools will often be cached via system prompt cache breakpoints). Addresses #8325.
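For context, a cache breakpoint at the Anthropic API level is a `cache_control` marker on a content block. A minimal sketch using the Anthropic TypeScript SDK (the model name and prompt text here are placeholders, not taken from this PR):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-3-5-sonnet-latest", // placeholder model name
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "...long, stable system prompt and tool instructions...",
      // Cache breakpoint: the prefix up to and including this block is cached,
      // which is why tools are often covered by a system prompt breakpoint.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What does this code do?" }],
});
```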
This PR also moves the user context message from before the user query to after for better prompt caching. @wch mentioned that he noticed no changes to model responses when experimenting with the context/query order, but we should double-check.
I cherry-picked an upstream commit to bring in updates to `LanguageModelDataPart` so that we can implement this in the same way as the Copilot extension. That gives us the added benefit that when the `LanguageModelDataPart` API proposal is accepted, extensions like `shiny-vscode` will be able to set cache breakpoints for Anthropic models contributed by both the Copilot extension and Positron Assistant.
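As a rough sketch of the extension-facing side, assuming the proposed `LanguageModelDataPart.json` helper and a `cache_control` mime-type convention (the exact strings and payload shape are assumptions based on the API proposal, not confirmed by this PR):

```typescript
import * as vscode from "vscode";

// Hypothetical sketch: mark a cache breakpoint in a chat request. The mime
// type and payload are assumptions based on the proposed
// LanguageModelDataPart API, not confirmed by this PR.
async function sendWithCacheBreakpoint(
  model: vscode.LanguageModelChat,
  stableContext: string,
  userQuery: string,
  token: vscode.CancellationToken,
) {
  const messages = [
    vscode.LanguageModelChatMessage.User([
      new vscode.LanguageModelTextPart(stableContext),
      // Breakpoint after the stable prefix; the provider (Copilot or
      // Positron Assistant) would translate this into Anthropic's
      // cache_control on the preceding content.
      vscode.LanguageModelDataPart.json({ type: "ephemeral" }, "cache_control"),
      new vscode.LanguageModelTextPart(userQuery),
    ]),
  ];
  return model.sendRequest(messages, {}, token);
}
```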
Release Notes
New Features
Bug Fixes
QA Notes
Since this PR also moves the user context message from before the user query to after for better prompt caching, we should also double-check that the quality of responses is roughly the same.
In the cases below, if caching is working, you should see logs indicating cache writes followed by cache reads, for example:
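As a rough illustration of where those numbers come from, assuming the logs surface the Anthropic TypeScript SDK's usage fields (`client` and `params` are placeholder names, as in the sketch above):

```typescript
// First request writes the cache for the prefix up to the breakpoint.
const first = await client.messages.create(params);
console.log(`cache write: ${first.usage.cache_creation_input_tokens} tokens`);

// A follow-up request with the same prefix reads from the cache.
const second = await client.messages.create(params);
console.log(`cache read: ${second.usage.cache_read_input_tokens} tokens`);
```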
Step-by-step instructions:

1. Enable the Anthropic SDK (by setting `positron.assistant.useAnthropicSdk` and restarting).
2. Check out the `feature/anthropic-cache-messages` branch, open the `src/extension.ts` file, and press F5 to start debugging.
3. Chat with the `@shiny` participant in the Positron Assistant chat pane, with Anthropic and Vercel models.
4. `@:assistant`