Assistant: Anthropic prompt caching extension API #8336


Merged
jmcphers merged 8 commits into prerelease/2025.07 on Jun 30, 2025

Conversation

@seeM (Contributor) commented Jun 27, 2025

This PR makes it possible for extensions to manually define cache breakpoints everywhere that Anthropic supports, except tool definitions (although tools will often be cached via system prompt cache breakpoints). Addresses #8325.

This PR also moves the user context message from before the user query to after for better prompt caching. @wch mentioned that he noticed no changes to model responses when experimenting with the context/query order, but we should double-check.

I cherry-picked an upstream commit to bring in updates to LanguageModelDataPart so that we can implement this in the same way as the Copilot extension. That gives us the added benefit that when the LanguageModelDataPart API proposal is accepted, extensions like shiny-vscode will be able to set cache breakpoints for Anthropic models contributed by both the Copilot extension and Positron Assistant.
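
For illustration, here's a minimal sketch of how an extension might attach a cache breakpoint to a message via LanguageModelDataPart. The exact MIME type, payload, and constructor shape are assumptions based on this PR and the Copilot extension's approach, not a confirmed API surface:

```typescript
import * as vscode from 'vscode';

// Hypothetical helper: build a user message whose text part is followed by a
// data part that the Anthropic provider can interpret as a cache breakpoint.
// The 'cache_control' MIME type and JSON payload are illustrative assumptions.
function userMessageWithCacheBreakpoint(text: string): vscode.LanguageModelChatMessage {
  const breakpoint = new vscode.LanguageModelDataPart(
    new TextEncoder().encode(JSON.stringify({ type: 'ephemeral' })),
    'cache_control'
  );
  // Passing a data part in the content array relies on the proposed
  // LanguageModelDataPart API; it is not part of the stable VS Code API yet.
  return vscode.LanguageModelChatMessage.User([
    new vscode.LanguageModelTextPart(text),
    breakpoint,
  ]);
}
```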

Release Notes

New Features

  • Extensions can manually define Anthropic prompt cache breakpoints (#8325)

Bug Fixes

  • N/A

QA Notes

Since this PR moves the user context message from before the user query to after (for better prompt caching), we should double-check that response quality is roughly the same.

In the cases below, if caching is working, you should see logs indicating a cache write followed by cache reads. In the example below, the first request's usage reports cache_creation_input_tokens: 45353 (a cache write) and the second reports cache_read_input_tokens: 45353 (a read of the same cached prefix):

2025-06-27 18:58:33.965 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:58:40.010 [debug] [anthropic] SEND messages.stream [req_011CQZ5f35yWxARapzaM565V]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:58:41.896 [debug] [anthropic] RECV messages.stream [req_011CQZ5f35yWxARapzaM565V]: usage: {"input_tokens":4,"cache_creation_input_tokens":45353,"cache_read_input_tokens":0,"output_tokens":74,"service_tier":"standard"}
2025-06-27 18:59:05.508 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:59:07.680 [debug] [anthropic] SEND messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:59:14.208 [debug] [anthropic] RECV messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: usage: {"input_tokens":4,"cache_creation_input_tokens":0,"cache_read_input_tokens":45353,"output_tokens":289,"service_tier":"standard"}

Step-by-step instructions:

  1. Verify that Positron Assistant participants cache write then cache read the last 2 user messages when using Anthropic models (a sketch of the resulting request shape follows this list)
  2. Positron Assistant participants should behave as before with Vercel models (e.g., after disabling positron.assistant.useAnthropicSdk and restarting)
  3. Testing the Shiny extension in Positron requires a bit more setup:
    1. Start a Positron dev instance on the feature/anthropic-cache-messages branch
    2. In Positron, open the Shiny extension repo at this branch: Anthropic prompt caching shiny-vscode#94. Open the src/extension.ts file and press F5 to start debugging
    3. Try the @shiny participant in the Positron Assistant chat pane, with both Anthropic and Vercel models
  4. Similarly, the Shiny extension can be tested in VS Code by following the same steps. There will be no caching, but nothing should break.
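
For reference, a cache breakpoint set through the steps above should surface in the raw request to Anthropic roughly as follows. This is a sketch of the documented Anthropic Messages API cache_control format, not the provider's exact code; the provider performs this translation internally:

```typescript
// Sketch of the Anthropic Messages API request shape produced by a cache
// breakpoint (format per Anthropic's prompt caching documentation).
// `longUserContext` is a placeholder for the large, repeated message content.
const longUserContext = '<large user context repeated across requests>';

const request = {
  model: 'claude-3-5-sonnet-latest',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: longUserContext,
          // Cache breakpoint: the prefix up to and including this block is
          // written to the prompt cache on the first request and read back on
          // subsequent requests (see cache_creation_input_tokens and
          // cache_read_input_tokens in the usage logs above).
          cache_control: { type: 'ephemeral' },
        },
      ],
    },
  ],
};
```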

@:assistant

@seeM seeM requested review from wch and jmcphers June 27, 2025 17:11

github-actions bot commented Jun 27, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


github-actions bot commented Jun 27, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical @:assistant


@seeM seeM force-pushed the feature/anthropic-cache-messages branch from 01d62e6 to c26a892 on June 27, 2025 20:03
@seeM seeM force-pushed the feature/anthropic-cache-messages branch from c26a892 to f8b4160 on June 27, 2025 20:07
@seeM seeM changed the base branch from main to prerelease/2025.07 June 27, 2025 20:07
@seeM (Contributor, Author) commented Jun 27, 2025

Please feel free to merge if all looks good; otherwise I'll make any fixes tomorrow morning (SAST).

@jmcphers (Collaborator) commented:

@seeM Can you look into the test failures? This looks related:

15 passing (2s)
  7 failing
  1) PositronAssistantParticipant
       should include positron session context:

      AssertionError [ERR_ASSERTION]: Unexpected text part value
+ actual - expected

@seeM (Contributor, Author) commented Jun 28, 2025

I've updated the integration tests and the echo language model, since the PR swaps the order of the user context and query messages.

@testlabauto (Contributor) commented Jun 30, 2025

Performed a variety of manual tests as a sanity check and things look good: interpreters, data explorer, plots, app, help, variables, etc.

The Anthropic API returned an overloaded error when I asked it about a plot, but I was able to get information about a text file.

Installed databot and performed some basic actions with it. No issues noted, other than hitting:

An error occurred while processing the response from the language model: 429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed the rate limit for your organization (59d07029-8711-4c94-9be7-4789975e3d9b) of 20,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}.

with successive questions about flights data.

@jmcphers jmcphers merged commit 03ae7a5 into prerelease/2025.07 Jun 30, 2025
9 checks passed
@jmcphers jmcphers deleted the feature/anthropic-cache-messages branch June 30, 2025 16:15
@github-actions github-actions bot locked and limited conversation to collaborators Jun 30, 2025