Title: `usage.cache_*` always 0 and system prompt not cached with `moderate` strategy (Docker v1.0.1)

Description
First of all—great project, exactly what we need for Anthropic caching.
I’ve run into a bug where cost and caching details (`usage.cache_*`) are always reported as 0 in responses, and the system prompt is not being cached, even with the default `moderate` strategy.
You can see this in the attached screenshot (Postman). The response’s `usage` block shows:
"usage": {
"input_tokens": 24851,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"output_tokens": 49
}
I would expect non-zero values on the first call (`cache_creation_input_tokens > 0`) and on subsequent calls (`cache_read_input_tokens > 0`), because the request contains a very large system prompt (thousands of tokens) and the proxy advertises caching for system prompts in `moderate` mode.

Additionally, the ROI/cache response metadata (headers) either doesn’t appear or shows zeroed-out values, suggesting no cache was actually applied.
Expected behavior
With `CACHE_STRATEGY=moderate` and a large system prompt:

- First request should produce non-zero `usage.cache_creation_input_tokens`.
- Subsequent identical requests should produce non-zero `usage.cache_read_input_tokens`.
- Autocache ROI/caching headers should show meaningful, non-zero values (e.g., cached token counts, ratio, breakpoints including `system`).
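For concreteness, this is roughly the shape I’d expect, going by Anthropic’s documented `usage` fields (token counts are illustrative). First request, with the system prompt written to the cache:

```json
"usage": {
  "input_tokens": 12,
  "cache_creation_input_tokens": 24839,
  "cache_read_input_tokens": 0,
  "output_tokens": 49
}
```

Subsequent identical request, with the system prompt read from the cache:

```json
"usage": {
  "input_tokens": 12,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 24839,
  "output_tokens": 49
}
```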
Actual behavior
- `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` are always `0`.
- System prompt appears not to be cached (no improvement on subsequent requests).
- ROI/caching headers are missing or effectively zeroed.

The attached screenshot shows the response with zeros for all cache-related fields.
Steps to reproduce
- Run Autocache via Docker (no API key in env; pass it per request):

  ```bash
  docker run -d -p 8090:8090 --name autocache ghcr.io/montevive/autocache:latest
  ```

  Health check:

  ```bash
  curl http://localhost:8090/health
  # {"status":"healthy","version":"1.0.1","strategy":"moderate"}
  ```
- Send a messages request with a large system prompt (thousands of tokens) and a tiny user message:

  ```bash
  curl http://localhost:8090/v1/messages \
    -H "Content-Type: application/json" \
    -H "anthropic-version: 2023-06-01" \
    -H "x-api-key: sk-ant-<redacted>" \
    -d '{
      "model": "claude-3-5-sonnet-20241022",
      "max_tokens": 64,
      "system": "<very long system prompt here — 5k+ tokens>",
      "messages": [
        {"role": "user", "content": "Ping"}
      ],
      "stream": false
    }'
  ```
- Repeat the exact same request 2–3 times.
- Observe that `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` remain `0`, and caching/ROI headers don’t reflect any caching.
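For the header check, a plain `curl` header dump is the easiest way to see whether Autocache adds anything cache/ROI-related (I’m not assuming specific header names here, just filtering the dump):

```bash
# -D - writes response headers to stdout; -o /dev/null discards the body
curl -sS -D - -o /dev/null http://localhost:8090/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "x-api-key: sk-ant-<redacted>" \
  -d '{"model": "claude-3-5-sonnet-20241022", "max_tokens": 64, "system": "<same long system prompt>", "messages": [{"role": "user", "content": "Ping"}], "stream": false}' \
  | grep -i cache
```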
Environment
- Autocache image: `ghcr.io/montevive/autocache:latest` (reports `version: "1.0.1"`, `strategy: "moderate"`)
- Host OS: Debian 12
- Client: Postman and `curl`
- Endpoint used: `POST /v1/messages`
- Headers sent: `Content-Type: application/json`, `anthropic-version: 2023-06-01`, `x-api-key: <Anthropic key>`
- Networking: direct (no additional proxies in between)
Notes / hypotheses
- If Autocache injects `cache_control` into system prompts, maybe the injection doesn’t trigger when the top-level `system` string is used (as opposed to a `messages[0].content[..]` text block)? The docs say system prompts are supported, but this path might be skipping injection (see the JSON sketch below).
- If injection succeeds, perhaps the proxy is not correctly surfacing Anthropic’s usage fields from the downstream response (mapping/parsing issue).
- Behavior is the same for first and subsequent identical requests (so it doesn’t look like a one-off miss).
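To illustrate the first hypothesis: per Anthropic’s prompt-caching docs, a cached system prompt has to be in block form with a `cache_control` marker. What I send is the string form:

```json
"system": "<very long system prompt>"
```

and what the upstream request would need to look like after injection is something like this (this is the shape from Anthropic’s docs; I haven’t confirmed what Autocache actually emits):

```json
"system": [
  {
    "type": "text",
    "text": "<very long system prompt>",
    "cache_control": {"type": "ephemeral"}
  }
]
```

Sending the block form through the proxy manually (with `cache_control` already set) should separate the two hypotheses: non-zero cache fields would point at injection, while still-zero fields would point at the usage passthrough.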
Thanks a lot—happy to test a fix or a debug build.