Assistant: Anthropic prompt caching extension API #8336


Merged
jmcphers merged 8 commits into prerelease/2025.07 on Jun 30, 2025

Conversation

@seeM (Contributor) commented Jun 27, 2025

This PR makes it possible for extensions to manually define cache breakpoints everywhere that Anthropic supports, except tool definitions (although tools will often be cached via system prompt cache breakpoints). Addresses #8325.

This PR also moves the user context message from before the user query to after for better prompt caching. @wch mentioned that he noticed no changes to model responses when experimenting with the context/query order, but we should double-check.

I cherry-picked an upstream commit to bring in updates to LanguageModelDataPart so that we can implement this in the same way as the Copilot extension. That gives us the added benefit that when the LanguageModelDataPart API proposal is accepted, extensions like shiny-vscode will be able to set cache breakpoints for Anthropic models contributed by both the Copilot extension and Positron Assistant.
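
For illustration, here's a minimal sketch of how an extension might attach a cache breakpoint to a message via LanguageModelDataPart. The exact MIME type, payload, and constructor shape are assumptions based on this PR and the Copilot extension's approach, not a confirmed API surface:

```typescript
import * as vscode from 'vscode';

// Hypothetical helper: build a user message whose text part is followed by a
// data part that the Anthropic provider can interpret as a cache breakpoint.
// The 'cache_control' MIME type and JSON payload are illustrative assumptions.
function userMessageWithCacheBreakpoint(text: string): vscode.LanguageModelChatMessage {
  const breakpoint = new vscode.LanguageModelDataPart(
    new TextEncoder().encode(JSON.stringify({ type: 'ephemeral' })),
    'cache_control'
  );
  // Passing a data part in the content array relies on the proposed
  // LanguageModelDataPart API; it is not part of the stable VS Code API yet.
  return vscode.LanguageModelChatMessage.User([
    new vscode.LanguageModelTextPart(text),
    breakpoint,
  ]);
}
```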

Release Notes

New Features

  • Extensions can manually define Anthropic prompt cache breakpoints (#8325)

Bug Fixes

  • N/A

QA Notes

Since this PR moves the user context message from before the user query to after (for better prompt caching), we should double-check that response quality is roughly the same.

In the cases below, if caching is working, you should see logs indicating a cache write followed by cache reads. In the example below, the first request's usage reports cache_creation_input_tokens: 45353 (a cache write) and the second reports cache_read_input_tokens: 45353 (a read of the same cached prefix):

2025-06-27 18:58:33.965 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:58:40.010 [debug] [anthropic] SEND messages.stream [req_011CQZ5f35yWxARapzaM565V]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:58:41.896 [debug] [anthropic] RECV messages.stream [req_011CQZ5f35yWxARapzaM565V]: usage: {"input_tokens":4,"cache_creation_input_tokens":45353,"cache_read_input_tokens":0,"output_tokens":74,"service_tier":"standard"}
2025-06-27 18:59:05.508 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:59:07.680 [debug] [anthropic] SEND messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:59:14.208 [debug] [anthropic] RECV messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: usage: {"input_tokens":4,"cache_creation_input_tokens":0,"cache_read_input_tokens":45353,"output_tokens":289,"service_tier":"standard"}

Step-by-step instructions:

  1. Verify that Positron Assistant participants cache write then cache read the last 2 user messages when using Anthropic models (a sketch of the resulting request shape follows this list)
  2. Positron Assistant participants should behave as before with Vercel models (e.g., after disabling positron.assistant.useAnthropicSdk and restarting)
  3. Testing the Shiny extension in Positron requires a bit more setup:
    1. Start a Positron dev instance on the feature/anthropic-cache-messages branch
    2. In Positron, open the Shiny extension repo at this branch: Anthropic prompt caching shiny-vscode#94. Open the src/extension.ts file and press F5 to start debugging
    3. Try the @shiny participant in the Positron Assistant chat pane, with both Anthropic and Vercel models
  4. Similarly, the Shiny extension can be tested in VS Code by following the same steps. There will be no caching, but nothing should break.
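
For reference, a cache breakpoint set through the steps above should surface in the raw request to Anthropic roughly as follows. This is a sketch of the documented Anthropic Messages API cache_control format, not the provider's exact code; the provider performs this translation internally:

```typescript
// Sketch of the Anthropic Messages API request shape produced by a cache
// breakpoint (format per Anthropic's prompt caching documentation).
// `longUserContext` is a placeholder for the large, repeated message content.
const longUserContext = '<large user context repeated across requests>';

const request = {
  model: 'claude-3-5-sonnet-latest',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: longUserContext,
          // Cache breakpoint: the prefix up to and including this block is
          // written to the prompt cache on the first request and read back on
          // subsequent requests (see cache_creation_input_tokens and
          // cache_read_input_tokens in the usage logs above).
          cache_control: { type: 'ephemeral' },
        },
      ],
    },
  ],
};
```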

@:assistant

@seeM seeM requested review from wch and jmcphers June 27, 2025 17:11

github-actions bot commented Jun 27, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


github-actions bot commented Jun 27, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical @:assistant


@seeM seeM force-pushed the feature/anthropic-cache-messages branch from 01d62e6 to c26a892 on June 27, 2025 20:03
@seeM seeM force-pushed the feature/anthropic-cache-messages branch from c26a892 to f8b4160 on June 27, 2025 20:07
@seeM seeM changed the base branch from main to prerelease/2025.07 June 27, 2025 20:07
@seeM (Contributor, Author) commented Jun 27, 2025

Please feel free to merge if all looks good; otherwise I'll make any fixes tomorrow morning (SAST).

@jmcphers (Collaborator) commented:

@seeM Can you look into the test failures? This looks related:

15 passing (2s)
  7 failing
  1) PositronAssistantParticipant
       should include positron session context:

      AssertionError [ERR_ASSERTION]: Unexpected text part value
+ actual - expected

@seeM (Contributor, Author) commented Jun 28, 2025

I've updated the integration tests and the echo language model, since the PR swaps the order of the user context and query messages.

@testlabauto (Contributor) commented Jun 30, 2025

Performed a variety of manual tests as a sanity check and things look good: interpreters, data explorer, plots, app, help, variables, etc.

The Anthropic API returned an overloaded error when I asked it about a plot, but I was able to get information about a text file.

Installed databot and performed some basic actions with it. No issues noted, other than hitting:

An error occurred while processing the response from the language model: 429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed the rate limit for your organization (59d07029-8711-4c94-9be7-4789975e3d9b) of 20,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}.

with successive questions about flights data.

@jmcphers jmcphers merged commit 03ae7a5 into prerelease/2025.07 Jun 30, 2025
9 checks passed
@jmcphers jmcphers deleted the feature/anthropic-cache-messages branch June 30, 2025 16:15
@github-actions github-actions bot locked and limited conversation to collaborators Jun 30, 2025