
usage.cache_* always 0 and system prompt not cached with moderate strategy #4

@OrionCodeDev

Description


First of all—great project, exactly what we need for Anthropic caching.
I’ve run into a bug where cost and caching details are always reported as 0 in responses, and the system prompt is never cached, even with the default moderate strategy.

You can see this in the attached screenshot (Postman). The response’s usage block shows:

"usage": {
  "input_tokens": 24851,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0,
  "output_tokens": 49
}

I would expect non-zero values on the first call (cache_creation_input_tokens > 0) and subsequent calls (cache_read_input_tokens > 0), because the request contains a very large system prompt (thousands of tokens) and the proxy advertises caching for system prompts in moderate mode.

Additionally, the “ROI / cache” response metadata (headers) either doesn’t appear or reports zeroed-out values, suggesting no cache was actually applied.


Expected behavior

  • With CACHE_STRATEGY=moderate and a large system prompt:

    • First request should produce non-zero usage.cache_creation_input_tokens.
    • Subsequent identical requests should produce non-zero usage.cache_read_input_tokens.
    • Autocache ROI/caching headers should show meaningful, non-zero values (e.g., cached token counts, ratio, breakpoints including system).
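For concreteness, the expected usage blocks would look roughly like the following (token counts are illustrative, based on the ~24.8k-token prompt above):

```json
// First request: the large system prompt is written to the cache.
{"usage": {"input_tokens": 120, "cache_creation_input_tokens": 24731, "cache_read_input_tokens": 0, "output_tokens": 49}}
// Subsequent identical request: the cached prefix is read back.
{"usage": {"input_tokens": 120, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 24731, "output_tokens": 49}}
```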

Actual behavior

  • usage.cache_creation_input_tokens and usage.cache_read_input_tokens are always 0.
  • System prompt appears not to be cached (no improvement on subsequent requests).
  • ROI/caching headers are missing or effectively zeroed.

Screenshot attached shows the response with zeros for all cache-related fields.


Steps to reproduce

  1. Run Autocache via Docker (no API key in env; pass per request):

    docker run -d -p 8090:8090 --name autocache ghcr.io/montevive/autocache:latest

    Health check:

    curl http://localhost:8090/health
    # {"status":"healthy","version":"1.0.1","strategy":"moderate"}
  2. Send a messages request with a large system prompt (thousands of tokens) and a tiny user message:

    curl http://localhost:8090/v1/messages \
      -H "Content-Type: application/json" \
      -H "anthropic-version: 2023-06-01" \
      -H "x-api-key: sk-ant-<redacted>" \
      -d '{
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 64,
        "system": "<very long system prompt here — 5k+ tokens>",
        "messages": [
          {"role": "user", "content": "Ping"}
        ],
        "stream": false
      }'
  3. Repeat the exact same request 2–3 times.

  4. Observe that the response usage.cache_creation_input_tokens and usage.cache_read_input_tokens remain 0, and caching/ROI headers don’t reflect any caching.
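To make step 4 less eyeball-driven, here is a small hypothetical helper (not part of Autocache) that takes a /v1/messages response body and reports whether the prompt cache was exercised; the field names match the usage block shown above:

```python
# Hypothetical diagnostic helper: given the JSON body of a /v1/messages
# response, report whether Anthropic's prompt cache was exercised.
import json

def summarize_cache_usage(body: str) -> dict:
    usage = json.loads(body).get("usage", {})
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    return {
        "cache_created": created,
        "cache_read": read,
        # True on a healthy first call (creation) or repeat call (read)
        "cache_active": created > 0 or read > 0,
    }

# The response from the screenshot: all cache counters are zero.
observed = ('{"usage": {"input_tokens": 24851,'
            ' "cache_creation_input_tokens": 0,'
            ' "cache_read_input_tokens": 0,'
            ' "output_tokens": 49}}')
print(summarize_cache_usage(observed))
```

Running this against any of the 2–3 repeated responses prints `"cache_active": False`, which is the bug in one line.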


Environment

  • Autocache image: ghcr.io/montevive/autocache:latest (reports version: "1.0.1", strategy: "moderate")
  • Host OS: Debian 12
  • Client: Postman and curl
  • Endpoint used: POST /v1/messages
  • Headers sent: Content-Type: application/json, anthropic-version: 2023-06-01, x-api-key: <Anthropic key>
  • Networking: direct (no additional proxies in between)

Notes / hypotheses

  • If Autocache injects cache_control into system prompts, maybe the injection doesn’t trigger when the top-level system string is used (as opposed to a messages[0].content[..] text block)? The docs say system prompts are supported, but this path might be skipping injection.
  • If injection succeeds, perhaps the proxy is not correctly surfacing Anthropic’s usage fields from the downstream response (mapping/parsing issue).
  • Behavior is the same for first and subsequent identical requests (so it doesn’t look like a one-off miss).
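If the first hypothesis is right, the fix would be to normalize a top-level system string into Anthropic’s block form before attaching cache_control (the API only allows cache_control on the block form). A minimal sketch of that normalization, assuming the injection point and function name (this is not Autocache’s actual code):

```python
# Hypothetical illustration of hypothesis 1: Anthropic's Messages API accepts
# `system` as either a plain string or a list of text blocks, but cache_control
# can only be attached to blocks. If the proxy skips the string path, the
# system prompt never gets a cache breakpoint.
def inject_system_cache_control(request: dict) -> dict:
    system = request.get("system")
    if isinstance(system, str):
        # Normalize the string path into a single text block first.
        system = [{"type": "text", "text": system}]
    if isinstance(system, list) and system:
        system = list(system)  # avoid mutating the caller's list
        # Mark the last system block as a cache breakpoint.
        system[-1] = {**system[-1], "cache_control": {"type": "ephemeral"}}
        request = {**request, "system": system}
    return request

req = {
    "model": "claude-3-5-sonnet-20241022",
    "system": "<very long system prompt here>",
    "messages": [{"role": "user", "content": "Ping"}],
}
print(inject_system_cache_control(req)["system"])
```

A quick way to confirm the hypothesis without a code change: resend the curl request from step 2 with `system` already in block form (a one-element list with cache_control attached) and see whether the cache counters become non-zero.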

Thanks a lot—happy to test a fix or a debug build.
