perf: Introduce lightweight caching with TTL for hot paths #154
Description
Feature: Lightweight Caching with TTL
Business Value
Significantly reduce API call volume and improve response times by implementing intelligent caching for frequently accessed data. This feature will decrease latency for end users, reduce load on the DeepSource API, and improve resilience during API rate limiting or temporary outages. For users with large project portfolios accessing the same data repeatedly, this can reduce API calls by up to 80% for hot paths.
User Story
As a DeepSource MCP server user
I want frequently accessed data to be cached with configurable TTL
So that I experience faster response times and avoid unnecessary API calls while maintaining data freshness
Gherkin Specification
Feature: Lightweight Caching with TTL
Implement in-memory LRU caching for hot paths with configurable TTL
and per-request memoization to optimize API usage and response times.
Background:
Given the DeepSource MCP server is configured
And caching is enabled via environment variables
And cache TTL is set to 300 seconds (5 minutes)
Scenario: Cache hit for project list
Given the project list was fetched 2 minutes ago
When I request the project list again
Then the cached data should be returned immediately
And no API call should be made to DeepSource
And the response time should be under 10ms
And cache statistics should show a cache hit
Scenario: Cache miss after TTL expiration
Given the project list was cached 6 minutes ago
When I request the project list
Then the cache entry should be considered stale
And a fresh API call should be made
And the new data should be cached with updated timestamp
And cache statistics should show a cache miss
Scenario: Per-request memoization for concurrent identical calls
Given 5 concurrent requests for the same project's issues
When all requests are initiated simultaneously
Then only 1 API call should be made to DeepSource
And all 5 requests should receive the same response
And the response should be cached for subsequent requests
And request deduplication should be logged
Scenario: LRU eviction when cache is full
Given the cache size limit is 100 entries
And the cache contains 100 entries
When a new unique request is made
Then the least recently used entry should be evicted
And the new entry should be added to the cache
And cache statistics should show the eviction
Scenario: Cache invalidation on write operations
Given project metrics are cached
When I update a metric threshold
Then the cached metrics for that project should be invalidated
And the next read should fetch fresh data
And related cache entries should also be invalidated
Scenario Outline: Configurable cache behavior per endpoint
Given endpoint <endpoint> with cache config <config>
When I request data from <endpoint>
Then caching behavior should be <behavior>
Examples:
| endpoint | config | behavior |
| projects | ttl=300s | cache for 5 minutes |
| project_issues | ttl=60s | cache for 1 minute |
| quality_metrics | ttl=120s | cache for 2 minutes |
| dependency_vulnerabilities | ttl=600s | cache for 10 minutes |
| update_metric_threshold | no-cache | never cache (write operation) |
Scenario: Cache disabled via environment variable
Given CACHE_ENABLED is set to "false"
When I make any request
Then no caching should occur
And all requests should go directly to the API
And cache statistics should show all operations bypassed
Scenario: Cache key generation with parameters
Given a request for issues with filter "analyzer=python"
And another request for issues with filter "analyzer=javascript"
When both requests are made
Then they should have different cache keys
And both responses should be cached separately
And cache lookups should respect the parameters
Scenario: Cache warmup on startup
Given CACHE_WARMUP_ENABLED is set to "true"
When the server starts
Then it should prefetch the project list
And prefetch recent runs for each project
And log the warmup completion time
And subsequent requests should hit the warm cache
Scenario: Cache statistics and monitoring
Given the cache is actively being used
When I request cache statistics
Then I should see:
| Metric | Description |
| hit_rate | Percentage of cache hits |
| miss_rate | Percentage of cache misses |
| eviction_count | Number of LRU evictions |
| avg_entry_size | Average size of cached entries |
| total_memory_usage | Total memory used by cache |
| entries_count | Current number of cached entries |
| requests_saved | Number of API calls avoided |
Scenario: Graceful degradation on cache failure
Given the cache encounters a memory error
When a request is made
Then the request should proceed without caching
And an error should be logged
And the response should still be returned
And monitoring should be alerted
Scenario: Cache serialization for complex objects
Given a complex GraphQL response with nested data
When the response is cached
Then all data types should be preserved correctly
And dates should maintain their format
And branded types should remain intact
And the deserialized data should match the original
Scenario: TTL refresh on access (sliding window)
Given CACHE_TTL_STRATEGY is set to "sliding"
And a cache entry was accessed 4 minutes ago (TTL=5min)
When the same entry is accessed again
Then the TTL should reset to 5 minutes from now
And the entry should remain in cache for another 5 minutes
Acceptance Criteria
- In-memory LRU cache implementation with configurable size
- TTL support with both fixed and sliding window strategies
- Per-request memoization to deduplicate concurrent identical calls
- Cache key generation based on:
- Endpoint/method name
- Request parameters (filters, pagination, etc.)
- User context (API key hash)
- Configurable cache behavior via environment variables (see the configuration-loading sketch after this list):
  - CACHE_ENABLED (true/false, default: true)
  - CACHE_DEFAULT_TTL_SECONDS (default: 300)
  - CACHE_MAX_ENTRIES (default: 1000)
  - CACHE_MAX_MEMORY_MB (default: 100)
  - CACHE_TTL_STRATEGY (fixed/sliding, default: fixed)
  - CACHE_WARMUP_ENABLED (true/false, default: false)
- Per-endpoint cache configuration:
- Projects: 5 minutes
- Issues: 1 minute
- Metrics: 2 minutes
- Vulnerabilities: 10 minutes
- Runs: 30 seconds
- Cache invalidation on write operations
- Cache statistics and monitoring
- Memory-efficient storage with size limits
- Thread-safe cache operations
- No external dependencies (pure TypeScript implementation)
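A minimal sketch of how these environment variables could be loaded, mirroring the CacheConfig shape shown later under Implementation Notes plus the enabled/warmup flags; the function and field names here are illustrative, not part of the existing codebase:

```typescript
interface CacheConfig {
  enabled: boolean;
  maxEntries: number;
  maxMemoryMB: number;
  defaultTTL: number; // seconds
  ttlStrategy: 'fixed' | 'sliding';
  warmupEnabled: boolean;
}

// Parse a numeric variable, falling back to the documented default when unset or invalid.
function envNumber(name: string, fallback: number): number {
  const parsed = Number(process.env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function loadCacheConfig(): CacheConfig {
  return {
    enabled: process.env.CACHE_ENABLED !== 'false',             // default: true
    maxEntries: envNumber('CACHE_MAX_ENTRIES', 1000),
    maxMemoryMB: envNumber('CACHE_MAX_MEMORY_MB', 100),
    defaultTTL: envNumber('CACHE_DEFAULT_TTL_SECONDS', 300),
    ttlStrategy: process.env.CACHE_TTL_STRATEGY === 'sliding' ? 'sliding' : 'fixed',
    warmupEnabled: process.env.CACHE_WARMUP_ENABLED === 'true', // default: false
  };
}
```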
Non-Goals
- Will NOT implement distributed caching (Redis, Memcached)
- Will NOT persist cache to disk
- Will NOT cache sensitive authentication data
- Will NOT implement cache preloading from external sources
- Will NOT provide cache synchronization across multiple server instances
- Out of scope: Response streaming with partial cache hits
- Will NOT implement cache compression
- Will NOT cache GraphQL mutations or write operations
Risks & Mitigations
- Risk: Stale data being served to users
  Mitigation: Conservative TTL defaults and clear documentation about cache behavior
- Risk: Memory exhaustion from unbounded cache growth
  Mitigation: LRU eviction policy and configurable memory limits
- Risk: Cache key collisions causing incorrect data retrieval
  Mitigation: Comprehensive key generation including all relevant parameters
- Risk: Complex cache invalidation logic causing bugs
  Mitigation: Simple invalidation patterns and extensive testing
- Risk: Performance degradation from cache overhead
  Mitigation: Lightweight implementation and the ability to disable caching
Technical Considerations
- Architecture impact:
  - Add caching layer between handlers and clients
  - Implement cache interceptor pattern
  - Create cache key generation utilities
  - Add cache statistics collector
- Performance considerations:
  - O(1) cache lookups via Map
  - O(1) LRU management via doubly linked list
  - Minimal serialization overhead
  - Memory monitoring to prevent leaks
- Implementation approach (see the sketch after this section):
  - Create CacheManager class with LRU logic
  - Implement CacheInterceptor for transparent caching
  - Add @Cacheable decorator for methods
  - Use WeakMap for per-request memoization
  - Leverage existing logger for cache events
- Memory management:
  - Use Map for O(1) lookups
  - Doubly linked list for LRU tracking
  - Lazy eviction when capacity is exceeded
  - Size estimation for complex objects
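A minimal sketch of what the proposed CacheManager core could look like. It uses a single Map both for O(1) lookups and, via delete-and-reinsert, for LRU ordering (the issue proposes an explicit doubly linked list; Map insertion order gives the same O(1) behavior with less code). The trimmed CacheEntry, the constructor parameters, and the omission of memory-based limits are all illustrative choices, not the final design:

```typescript
interface CacheEntry<T> {
  data: T;
  timestamp: number; // ms epoch when stored (or last accessed, under sliding TTL)
  ttl: number;       // seconds
}

class CacheManager<T = unknown> {
  private entries = new Map<string, CacheEntry<T>>();

  constructor(
    private maxEntries = 1000,
    private ttlStrategy: 'fixed' | 'sliding' = 'fixed',
  ) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.timestamp > entry.ttl * 1000) {
      this.entries.delete(key); // lazy expiration on read
      return undefined;
    }
    // Delete and re-insert to mark the entry as most recently used
    // (Map preserves insertion order).
    this.entries.delete(key);
    if (this.ttlStrategy === 'sliding') entry.timestamp = Date.now();
    this.entries.set(key, entry);
    return entry.data;
  }

  set(key: string, data: T, ttl: number): void {
    this.entries.delete(key); // re-inserting moves an existing key to the MRU position
    if (this.entries.size >= this.maxEntries) {
      // Evict the least recently used entry: the first key in insertion order.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { data, timestamp: Date.now(), ttl });
  }

  delete(key: string): void {
    this.entries.delete(key); // used by write-path invalidation
  }
}
```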
Testing Requirements
- Unit tests for cache operations (get, set, evict)
- Integration tests with real API calls
- Property-based tests for:
- LRU eviction correctness
- TTL expiration accuracy (see the test sketch after this list)
- Cache key uniqueness
- Performance tests:
- Cache hit/miss latency
- Memory usage under load
- Concurrent access patterns
- Stress tests for memory limits
- Test cache invalidation scenarios
- Verify cache statistics accuracy
- Test configuration variations
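A sketch of one TTL-expiration unit test, assuming a Vitest-style test runner and the CacheManager sketch above; the project's actual test framework and helpers may differ:

```typescript
import { afterEach, describe, expect, it, vi } from 'vitest';

describe('CacheManager TTL expiration', () => {
  afterEach(() => vi.restoreAllMocks());

  it('returns undefined once the fixed TTL has elapsed', () => {
    const now = vi.spyOn(Date, 'now').mockReturnValue(0);
    const cache = new CacheManager<string>(100, 'fixed');

    cache.set('projects', 'cached-project-list', 300); // TTL: 300 s (5 minutes)

    now.mockReturnValue(2 * 60 * 1000); // 2 minutes later: still fresh
    expect(cache.get('projects')).toBe('cached-project-list');

    now.mockReturnValue(6 * 60 * 1000); // 6 minutes later: past the TTL
    expect(cache.get('projects')).toBeUndefined();
  });
});
```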
Definition of Done
- LRU cache implementation complete
- TTL mechanisms working correctly
- Per-request memoization functional
- All endpoints properly cached
- Cache invalidation implemented
- Configuration via environment variables
- Cache statistics and monitoring available
- All tests passing with >90% coverage
- Documentation includes caching guide
- Performance benchmarks show >50% reduction in API calls
- Memory usage stays within configured limits
- No memory leaks detected
- Code reviewed and approved
Implementation Notes
- Cache Manager Interface:

  ```typescript
  interface CacheEntry<T> {
    data: T;
    timestamp: number;
    ttl: number;
    accessCount: number;
    size: number;
  }

  interface CacheConfig {
    maxEntries: number;
    maxMemoryMB: number;
    defaultTTL: number;
    ttlStrategy: 'fixed' | 'sliding';
  }
  ```
- Cache Key Generation:

  ```typescript
  function generateCacheKey(
    method: string,
    params: Record<string, unknown>,
    context: { apiKeyHash: string }
  ): string {
    return `${context.apiKeyHash}:${method}:${hash(params)}`;
  }
  ```
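  The `hash(params)` helper above is not defined in this issue; a minimal sketch of one possible implementation, using Node's built-in `crypto` module and key-sorted serialization so that equivalent parameter objects always map to the same cache key:

  ```typescript
  import { createHash } from 'crypto';

  // Serialize with sorted object keys so property order does not change the hash.
  function stableStringify(value: unknown): string {
    if (Array.isArray(value)) {
      return `[${value.map(stableStringify).join(',')}]`;
    }
    if (value !== null && typeof value === 'object') {
      const entries = Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
      return `{${entries.join(',')}}`;
    }
    return JSON.stringify(value ?? null);
  }

  // Short digest used as the parameter component of the cache key.
  function hash(params: Record<string, unknown>): string {
    return createHash('sha256').update(stableStringify(params)).digest('hex').slice(0, 16);
  }
  ```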
- Decorator Pattern:

  ```typescript
  @Cacheable({ ttl: 300 })
  async listProjects(): Promise<Project[]> {
    // Method implementation
  }
  ```
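  One possible shape for the proposed decorator, using legacy TypeScript method decorators ("experimentalDecorators": true) and the CacheManager sketch from Technical Considerations; the shared methodCache instance and the simplified key construction are assumptions, and a real implementation would go through generateCacheKey:

  ```typescript
  interface CacheableOptions {
    ttl: number; // seconds
  }

  // Shared cache for decorated methods (illustrative; created once at startup).
  const methodCache = new CacheManager();

  function Cacheable(options: CacheableOptions): MethodDecorator {
    return (_target, propertyKey, descriptor: PropertyDescriptor) => {
      const original = descriptor.value as (...args: unknown[]) => Promise<unknown>;
      descriptor.value = async function (this: unknown, ...args: unknown[]): Promise<unknown> {
        // Simplified key; a real key would include the API-key hash via generateCacheKey.
        const key = `${String(propertyKey)}:${JSON.stringify(args)}`;
        const hit = methodCache.get(key);
        if (hit !== undefined) return hit;         // cache hit: skip the API call
        const result = await original.apply(this, args);
        methodCache.set(key, result, options.ttl); // cache miss: store with the method's TTL
        return result;
      };
      return descriptor;
    };
  }
  ```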
- Cache Layers:
  - L1: Per-request memoization (request lifecycle)
  - L2: LRU cache with TTL (server lifecycle)
  - L3: API calls (fallback)
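  A possible sketch of the L1 layer: concurrent identical calls within one request share a single in-flight promise, so only one API call is made (matching the request-deduplication scenario above); the names here are illustrative:

  ```typescript
  type InFlight = Map<string, Promise<unknown>>;

  // Keyed by an opaque per-request context object, so memoized promises become
  // garbage-collectable as soon as the request ends.
  const inFlightByRequest = new WeakMap<object, InFlight>();

  async function memoizePerRequest<T>(
    requestContext: object,
    cacheKey: string,
    fetcher: () => Promise<T>,
  ): Promise<T> {
    const bucket = inFlightByRequest.get(requestContext) ?? new Map<string, Promise<unknown>>();
    inFlightByRequest.set(requestContext, bucket);

    const existing = bucket.get(cacheKey);
    if (existing) return existing as Promise<T>;   // deduplicate concurrent identical calls

    const promise = fetcher().finally(() => bucket.delete(cacheKey));
    bucket.set(cacheKey, promise);
    return promise;
  }
  ```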
- Invalidation Patterns:
  - Write operations invalidate related reads
  - Hierarchical invalidation (project → issues → runs)
  - Tag-based invalidation for grouped entries
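  A sketch of how tag-based invalidation could be wired on top of the CacheManager sketch; the tag index and tag naming such as `project:<key>` are assumptions:

  ```typescript
  // Tag index: tag -> cache keys registered under that tag.
  const keysByTag = new Map<string, Set<string>>();

  // Called when an entry is stored, e.g. tagEntry(`project:${projectKey}`, cacheKey).
  function tagEntry(tag: string, cacheKey: string): void {
    const keys = keysByTag.get(tag) ?? new Set<string>();
    keys.add(cacheKey);
    keysByTag.set(tag, keys);
  }

  // Called after a write operation: drop every cached read registered under the tag.
  function invalidateTag(cache: CacheManager, tag: string): void {
    for (const key of keysByTag.get(tag) ?? []) {
      cache.delete(key);
    }
    keysByTag.delete(tag);
  }
  ```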
- Monitoring Integration:
  - Export Prometheus metrics
  - Log cache events at DEBUG level
  - Alert on low hit rates (<30%)
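  A sketch of the statistics surface these hooks could expose, mirroring the metric names in the Gherkin table above; the collector class and its wiring are illustrative and do not prescribe a specific metrics client:

  ```typescript
  interface CacheStats {
    hit_rate: number;           // hits / (hits + misses)
    miss_rate: number;          // misses / (hits + misses)
    eviction_count: number;
    avg_entry_size: number;     // bytes
    total_memory_usage: number; // bytes
    entries_count: number;
    requests_saved: number;     // API calls avoided
  }

  class CacheStatsCollector {
    private hits = 0;
    private misses = 0;
    private evictions = 0;

    recordHit(): void { this.hits += 1; }
    recordMiss(): void { this.misses += 1; }
    recordEviction(): void { this.evictions += 1; }

    snapshot(entriesCount: number, totalBytes: number): CacheStats {
      const lookups = this.hits + this.misses;
      return {
        hit_rate: lookups === 0 ? 0 : this.hits / lookups,
        miss_rate: lookups === 0 ? 0 : this.misses / lookups,
        eviction_count: this.evictions,
        avg_entry_size: entriesCount === 0 ? 0 : totalBytes / entriesCount,
        total_memory_usage: totalBytes,
        entries_count: entriesCount,
        requests_saved: this.hits,
      };
    }
  }
  ```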
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com