Feature: Enhanced Logging with Security and Traceability
Business Value
Improve production readiness and operational excellence by implementing secure, concise, and traceable logging. This feature will prevent sensitive data leaks, reduce log storage costs, improve debugging efficiency through correlation IDs, and enable end-to-end request tracing. For production environments, this ensures compliance with security standards while maintaining the observability needed for rapid incident response.
User Story
As an operations engineer or security auditor
I want secure, concise, and traceable logs
So that I can debug issues effectively without exposing sensitive data or overwhelming log storage
Gherkin Specification
Feature: Enhanced Logging with Security and Traceability
Implement secure logging defaults with INFO level in production, sensitive data scrubbing,
payload truncation, and correlation ID propagation for end-to-end traceability.
Background:
Given the DeepSource MCP server with logging capabilities
And various environments (development, staging, production)
And sensitive data like API keys and tokens
Scenario: Production logging defaults to INFO level
Given the server is running in production (NODE_ENV=production)
And no LOG_LEVEL is explicitly set
When the server starts
Then the log level should default to INFO
And DEBUG messages should not be logged
And INFO, WARN, and ERROR messages should be logged
Scenario: Development logging defaults to DEBUG level
Given the server is running in development (NODE_ENV=development)
And no LOG_LEVEL is explicitly set
When the server starts
Then the log level should default to DEBUG
And all log levels should be logged
Scenario: Sensitive data scrubbing
Given a log message containing sensitive data
When the data is logged
Then sensitive fields should be scrubbed:
| Field Type | Original | Logged As |
| API Key | sk_live_abcd1234efgh5678 | sk_live_****5678 |
| Bearer Token | Bearer eyJhbGc... | Bearer **** |
| Password | mySecretPassword123 | **** |
| Email | user@example.com | u***@example.com |
| Authorization | Basic dXNlcjpwYXNz | Basic **** |
| Credit Card | 4111111111111111 | ****1111 |
| SSN | 123-45-6789 | ***-**-6789 |
Scenario Outline: Large payload truncation
Given a log message with a <payload_type> payload
When the payload exceeds <size_limit>
Then the payload should be truncated
And the log entry should include the <truncation_indicator> metadata
Examples:
| payload_type | size_limit | truncation_indicator |
| JSON object | 1KB | {...truncated: "1.5KB total"} |
| Array | 100 items | [...100 items, truncated] |
| String | 500 chars | "...truncated at 500 chars" |
| GraphQL response | 2KB | {data: "...truncated"} |
Scenario: Correlation ID generation and propagation
Given a new request arrives at the server
When the request is processed
Then a unique correlation ID should be generated
And the ID format should be "ds-{timestamp}-{random}"
And the ID should appear in all related log entries
And the ID should propagate through all layers:
| Layer | Log Field Name | Example |
| HTTP Request | correlationId | ds-1704963600000-a1b2c3 |
| Handler | correlationId | ds-1704963600000-a1b2c3 |
| Client | correlationId | ds-1704963600000-a1b2c3 |
| GraphQL | correlationId | ds-1704963600000-a1b2c3 |
| Error | correlationId | ds-1704963600000-a1b2c3 |
Scenario: Structured logging format
Given any log message
When it is written
Then it should use structured JSON format:
| Field | Type | Required | Description |
| timestamp | ISO8601 | Yes | 2024-01-15T10:30:00.123Z |
| level | string | Yes | DEBUG/INFO/WARN/ERROR |
| context | string | Yes | Component name |
| message | string | Yes | Log message |
| correlationId | string | No | Request correlation ID |
| traceId | string | No | Distributed trace ID |
| spanId | string | No | Span ID within trace |
| userId | string | No | Scrubbed user identifier |
| duration | number | No | Operation duration in ms |
| data | object | No | Additional structured data |
Scenario: Performance logging optimization
Given a high-frequency operation (>100/sec)
When logging is configured
Then sampling should be applied:
| Operation Type | Sample Rate | Condition |
| Health checks | 1% | Log 1 in 100 requests |
| Cache hits | 0.1% | Log 1 in 1000 hits |
| Successful API | 10% | Log 1 in 10 successes |
| Failed API | 100% | Log all failures |
| Slow queries | 100% | Log all >1000ms queries |
Scenario: Context preservation across async operations
Given an async operation with correlation ID "ds-123-abc"
When the operation spawns child tasks
Then all child tasks should inherit the correlation ID
And async context should be maintained using AsyncLocalStorage
And the correlation ID should appear in all related logs
Scenario: Log rotation and retention
Given log files are being written
When the log file reaches size or age limits
Then rotation should occur:
| Trigger | Value | Action |
| File size | 100MB | Rotate to new file |
| Age | 24 hours | Rotate daily at midnight |
| Max files | 7 | Delete oldest when exceeded |
| Compression | After | Gzip completed log files |
Scenario: Sensitive data patterns detection
Given various data patterns in logs
When the scrubber processes them
Then it should detect and scrub:
| Pattern | Regex/Method | Replacement |
| API Keys | /sk_[a-zA-Z0-9]{20,}/ | sk_****[last4] |
| JWT Tokens | /eyJ[a-zA-Z0-9._-]+/ | **** |
| URLs with secrets | /\?.*(?:key|token|secret)=[^&]+/ | ?**** |
| Base64 credentials | /Basic [a-zA-Z0-9+/]+=*/ | Basic **** |
| GraphQL mutations | /mutation.*password/is | [REDACTED] |
Scenario: Error logging enhancement
Given an error occurs in the system
When it is logged
Then the log should include:
| Field | Description |
| error.name | Error class name |
| error.message | Scrubbed error message |
| error.code | Error code if available |
| error.stack | Stack trace (dev only) |
| error.cause | Root cause if chained |
| context | Request/operation context |
| correlationId | Request correlation ID |
Scenario: Compliance mode
Given compliance requirements (GDPR, HIPAA, etc.)
When COMPLIANCE_MODE is enabled
Then additional scrubbing should apply:
| Data Type | Action |
| Personal names | Replace with hash |
| IP addresses | Mask last octet |
| User IDs | Use salted hash |
| Timestamps | Round to hour |
| Geolocation | Round to city level |
Acceptance Criteria
- Environment-based log level defaults:
- Production: INFO level by default
- Development: DEBUG level by default
- Configurable via LOG_LEVEL environment variable
- Sensitive data scrubbing:
- API keys, tokens, passwords automatically redacted
- Configurable scrubbing patterns
- Zero sensitive data in production logs
- Payload truncation:
- Large objects/arrays truncated with size metadata
- Configurable size limits
- Preservation of structure indicators
- Correlation ID system:
- Automatic generation for each request
- Propagation through all layers
- AsyncLocalStorage for async context
- Format: ds-{timestamp}-{random}
- Structured JSON logging:
- Consistent field names
- ISO8601 timestamps
- Parseable by log aggregators
- Performance optimizations:
- Sampling for high-frequency operations
- Lazy evaluation of expensive log data
- Conditional logging based on level
- Log rotation and retention:
- Size and time-based rotation
- Compression of old logs
- Configurable retention policies
- Error enhancement:
- Complete error context
- Stack traces in development only
- Error categorization
Non-Goals
- Will NOT implement external log shipping (use file watchers)
- Will NOT add custom log aggregation services
- Will NOT implement distributed tracing protocols (OpenTelemetry)
- Will NOT encrypt log files at rest
- Out of scope: Log visualization dashboards
- Will NOT implement audit logging (separate concern)
- Will NOT add log replay functionality
Risks & Mitigations
- Risk: Over-scrubbing making debugging difficult
  Mitigation: Configurable scrubbing rules and debug mode override
- Risk: Performance impact from scrubbing and truncation
  Mitigation: Lazy evaluation and regex caching
- Risk: Lost correlation during async operations
  Mitigation: AsyncLocalStorage for context preservation
- Risk: Disk space exhaustion from logs
  Mitigation: Rotation, compression, and retention policies
Technical Considerations
- Architecture impact:
- Add correlation ID middleware
- Enhance logger with scrubbing pipeline
- Implement AsyncLocalStorage context
- Add log rotation mechanism
- Implementation approach:

```typescript
// Enhanced logger with security
class SecureLogger {
  private scrubbers: Scrubber[] = [
    new ApiKeyScrubber(),
    new TokenScrubber(),
    new PasswordScrubber(),
  ];

  private truncate(data: unknown): unknown {
    if (typeof data === 'string' && data.length > 500) {
      return `${data.slice(0, 500)}...truncated`;
    }
    if (Array.isArray(data) && data.length > 100) {
      return [...data.slice(0, 100), '...truncated'];
    }
    // More truncation logic
    return data;
  }

  log(level: LogLevel, message: string, data?: unknown) {
    const scrubbed = this.scrub(data);
    const truncated = this.truncate(scrubbed);
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      correlationId: getCorrelationId(),
      message,
      data: truncated,
    };
    this.write(entry);
  }
}
```
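The class above delegates to a `scrub()` method and a `Scrubber` list that the snippet does not define. A minimal sketch of what that pipeline could look like, assuming a hypothetical `Scrubber` interface and a recursive traversal; the names and regex are illustrative, not a committed API:

```typescript
// Hypothetical Scrubber contract assumed by SecureLogger above.
interface Scrubber {
  scrub(value: string): string;
}

// Illustrative scrubber: redact secret-style keys, keeping the last four
// characters so entries can still be correlated (per the scrubbing table).
class ApiKeyScrubber implements Scrubber {
  scrub(value: string): string {
    return value.replace(/sk_[a-zA-Z0-9_]{12,}/g, (m) => `sk_****${m.slice(-4)}`);
  }
}

// Walk the payload and apply every scrubber to string leaves.
function applyScrubbers(data: unknown, scrubbers: Scrubber[]): unknown {
  if (typeof data === 'string') {
    return scrubbers.reduce((acc, s) => s.scrub(acc), data);
  }
  if (Array.isArray(data)) {
    return data.map((item) => applyScrubbers(item, scrubbers));
  }
  if (data && typeof data === 'object') {
    return Object.fromEntries(
      Object.entries(data as Record<string, unknown>).map(([key, value]) => [
        key,
        applyScrubbers(value, scrubbers),
      ])
    );
  }
  return data;
}
```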
- Correlation ID propagation:

```typescript
import { AsyncLocalStorage } from 'async_hooks';

const asyncContext = new AsyncLocalStorage<Context>();

interface Context {
  correlationId: string;
  traceId?: string;
  spanId?: string;
}

function withCorrelation<T>(correlationId: string, fn: () => T): T {
  return asyncContext.run({ correlationId }, fn);
}
```
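For completeness, a sketch of how an ID in the ds-{timestamp}-{random} format could be generated and read back inside the async context; `generateCorrelationId` and `getCorrelationId` are assumed helpers, not existing code:

```typescript
// Assumed helper: build an ID in the "ds-{timestamp}-{random}" format from the spec.
function generateCorrelationId(): string {
  const random = Math.random().toString(36).slice(2, 8);
  return `ds-${Date.now()}-${random}`;
}

// Assumed helper: read the current ID back out of AsyncLocalStorage.
function getCorrelationId(): string | undefined {
  return asyncContext.getStore()?.correlationId;
}

// Usage sketch: every log written while handling this request, including awaited
// child tasks, sees the same correlation ID.
withCorrelation(generateCorrelationId(), async () => {
  console.log(getCorrelationId()); // e.g. "ds-1704963600000-a1b2c3"
});
```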
- Environment detection:

```typescript
function getDefaultLogLevel(): LogLevel {
  const env = process.env.NODE_ENV || 'development';
  return env === 'production' ? LogLevel.INFO : LogLevel.DEBUG;
}
```
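The acceptance criteria also allow an explicit LOG_LEVEL override; one way that could layer on top of the environment default (the level-name mapping here is an assumption, assuming LogLevel is an enum with DEBUG/INFO/WARN/ERROR members):

```typescript
function resolveLogLevel(): LogLevel {
  // Assumed mapping from LOG_LEVEL env var names to the LogLevel enum.
  const byName: Record<string, LogLevel | undefined> = {
    DEBUG: LogLevel.DEBUG,
    INFO: LogLevel.INFO,
    WARN: LogLevel.WARN,
    ERROR: LogLevel.ERROR,
  };
  const explicit = byName[process.env.LOG_LEVEL?.toUpperCase() ?? ''];
  return explicit ?? getDefaultLogLevel();
}
```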
Testing Requirements
- Unit tests for scrubbing patterns
- Integration tests for correlation ID propagation
- Performance tests for truncation impact
- Security tests for sensitive data leakage
- Async context preservation tests
- Log rotation tests
- Compliance mode tests
- Environment-based default tests
- Structured format validation tests
Definition of Done
- Production defaults to INFO level
- All sensitive data patterns scrubbed
- Large payloads truncated appropriately
- Correlation IDs propagate through all layers
- Structured JSON format implemented
- Log rotation working
- AsyncLocalStorage context preservation
- Performance impact <1% overhead
- Zero sensitive data in test logs
- All tests passing with >95% coverage
- Documentation updated
- Migration guide for existing logs
- Compliance mode validated
- Code reviewed and approved
Implementation Notes
- Phased Rollout:
- Phase 1: Implement scrubbing pipeline
- Phase 2: Add correlation IDs
- Phase 3: Structured logging format
- Phase 4: Log rotation
- Phase 5: Performance optimizations
- Scrubber Registry:

```typescript
const SCRUBBERS = {
  apiKey: /(?:api[_-]?key|apikey)[\s:="]*([a-zA-Z0-9_-]{20,})/gi,
  bearer: /Bearer\s+[a-zA-Z0-9._-]+/gi,
  email: /([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/g,
  // ... more patterns
};
```
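A minimal sketch of applying the registry above to a serialized message; the replacement strings are illustrative and mirror the scrubbing table (last four characters kept for API keys, first character kept for emails):

```typescript
function scrubText(text: string): string {
  return text
    // Keep the "api_key=" prefix but mask the captured value, preserving the last 4 chars.
    .replace(SCRUBBERS.apiKey, (match, key: string) =>
      match.replace(key, `****${key.slice(-4)}`)
    )
    // Bearer tokens are fully redacted.
    .replace(SCRUBBERS.bearer, 'Bearer ****')
    // Emails keep the first character of the local part: user@example.com -> u***@example.com
    .replace(SCRUBBERS.email, (_match, local: string, domain: string) =>
      `${local[0]}***@${domain}`
    );
}
```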
- Configuration:

```typescript
interface LogConfig {
  level: LogLevel;
  scrubbing: {
    enabled: boolean;
    patterns: RegExp[];
    compliance: boolean;
  };
  truncation: {
    maxStringLength: number;
    maxArrayLength: number;
    maxObjectDepth: number;
  };
  rotation: {
    maxSize: string;
    maxAge: string;
    maxFiles: number;
  };
}
```
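As an example, a production default derived from the values in the scenarios above; every value is taken from the spec or marked as an assumption, not a committed default:

```typescript
const defaultConfig: LogConfig = {
  level: LogLevel.INFO, // production default
  scrubbing: {
    enabled: true,
    patterns: Object.values(SCRUBBERS),
    compliance: process.env.COMPLIANCE_MODE === 'true',
  },
  truncation: {
    maxStringLength: 500, // "...truncated at 500 chars"
    maxArrayLength: 100,  // "[...100 items, truncated]"
    maxObjectDepth: 5,    // assumption; not specified in the scenarios
  },
  rotation: {
    maxSize: '100MB', // rotate to a new file at 100MB
    maxAge: '24h',    // rotate daily
    maxFiles: 7,      // delete oldest beyond 7 files
  },
};
```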
- Performance Monitoring:

```typescript
let logCounter = 0;
let sampleRate = 100;

function shouldSample(): boolean {
  return ++logCounter % sampleRate === 0;
}
```
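The global counter above could be extended per operation type to match the sampling table in the spec; the operation names and map layout here are illustrative:

```typescript
// Sample-every-N values taken from the "Performance logging optimization" table.
const SAMPLE_EVERY: Record<string, number> = {
  healthCheck: 100, // 1%
  cacheHit: 1000,   // 0.1%
  apiSuccess: 10,   // 10%
  apiFailure: 1,    // 100%: always log failures
  slowQuery: 1,     // 100%: always log queries >1000ms
};

const counters: Record<string, number> = {};

function shouldSampleOperation(op: string): boolean {
  counters[op] = (counters[op] ?? 0) + 1;
  return counters[op] % (SAMPLE_EVERY[op] ?? 1) === 0;
}
```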
- Testing Strategy:
- Test with real sensitive data patterns
- Verify no leakage in any log level
- Benchmark scrubbing performance
- Test correlation across async boundaries
- Validate structured format parsing
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>