fix: Tighten logging defaults for security and traceability #158

@sapientpants

Description

Feature: Enhanced Logging with Security and Traceability

Business Value

Improve production readiness and operational excellence by implementing secure, concise, and traceable logging. This feature will prevent sensitive data leaks, reduce log storage costs, improve debugging efficiency through correlation IDs, and enable end-to-end request tracing. For production environments, this ensures compliance with security standards while maintaining the observability needed for rapid incident response.

User Story

As an operations engineer or security auditor
I want secure, concise, and traceable logs
So that I can debug issues effectively without exposing sensitive data or overwhelming log storage

Gherkin Specification

Feature: Enhanced Logging with Security and Traceability
  Implement secure logging defaults with INFO level in production, sensitive data scrubbing,
  payload truncation, and correlation ID propagation for end-to-end traceability.

  Background:
    Given the DeepSource MCP server with logging capabilities
    And various environments (development, staging, production)
    And sensitive data like API keys and tokens

  Scenario: Production logging defaults to INFO level
    Given the server is running in production (NODE_ENV=production)
    And no LOG_LEVEL is explicitly set
    When the server starts
    Then the log level should default to INFO
    And DEBUG messages should not be logged
    And INFO, WARN, and ERROR messages should be logged

  Scenario: Development logging defaults to DEBUG level
    Given the server is running in development (NODE_ENV=development)
    And no LOG_LEVEL is explicitly set
    When the server starts
    Then the log level should default to DEBUG
    And all log levels should be logged

  Scenario: Sensitive data scrubbing
    Given a log message containing sensitive data
    When the data is logged
    Then sensitive fields should be scrubbed:
      | Field Type        | Original                      | Logged As                    |
      | API Key           | sk_live_abcd1234efgh5678      | sk_live_****5678             |
      | Bearer Token      | Bearer eyJhbGc...             | Bearer ****                  |
      | Password          | mySecretPassword123           | ****                         |
      | Email             | user@example.com              | u***@example.com             |
      | Authorization     | Basic dXNlcjpwYXNz            | Basic ****                   |
      | Credit Card       | 4111111111111111              | ****1111                     |
      | SSN               | 123-45-6789                   | ***-**-6789                  |

  Scenario Outline: Large payload truncation
    Given a log message with a <payload_type> payload
    When the payload exceeds <size_limit>
    Then the payload should be truncated
    And include truncation metadata such as <truncation_indicator>

    Examples:
      | payload_type      | size_limit | truncation_indicator         |
      | JSON object       | 1KB        | {...truncated: "1.5KB total"}|
      | Array             | 100 items  | [...100 items, truncated]    |
      | String            | 500 chars  | "...truncated at 500 chars"  |
      | GraphQL response  | 2KB        | {data: "...truncated"}       |

  Scenario: Correlation ID generation and propagation
    Given a new request arrives at the server
    When the request is processed
    Then a unique correlation ID should be generated
    And the ID format should be "ds-{timestamp}-{random}"
    And the ID should appear in all related log entries
    And the ID should propagate through all layers:
      | Layer             | Log Field Name    | Example                        |
      | HTTP Request      | correlationId     | ds-1704963600000-a1b2c3       |
      | Handler           | correlationId     | ds-1704963600000-a1b2c3       |
      | Client            | correlationId     | ds-1704963600000-a1b2c3       |
      | GraphQL           | correlationId     | ds-1704963600000-a1b2c3       |
      | Error             | correlationId     | ds-1704963600000-a1b2c3       |

  Scenario: Structured logging format
    Given any log message
    When it is written
    Then it should use structured JSON format:
      | Field         | Type      | Required | Description / Example            |
      | timestamp     | ISO8601   | Yes      | 2024-01-15T10:30:00.123Z       |
      | level         | string    | Yes      | DEBUG/INFO/WARN/ERROR           |
      | context       | string    | Yes      | Component name                   |
      | message       | string    | Yes      | Log message                      |
      | correlationId | string    | No       | Request correlation ID           |
      | traceId       | string    | No       | Distributed trace ID             |
      | spanId        | string    | No       | Span ID within trace             |
      | userId        | string    | No       | Scrubbed user identifier         |
      | duration      | number    | No       | Operation duration in ms         |
      | data          | object    | No       | Additional structured data       |

  Scenario: Performance logging optimization
    Given a high-frequency operation (>100/sec)
    When logging is configured
    Then sampling should be applied:
      | Operation Type    | Sample Rate | Condition                      |
      | Health checks     | 1%          | Log 1 in 100 requests          |
      | Cache hits        | 0.1%        | Log 1 in 1000 hits             |
      | Successful API    | 10%         | Log 1 in 10 successes          |
      | Failed API        | 100%        | Log all failures               |
      | Slow queries      | 100%        | Log all >1000ms queries        |

  Scenario: Context preservation across async operations
    Given an async operation with correlation ID "ds-123-abc"
    When the operation spawns child tasks
    Then all child tasks should inherit the correlation ID
    And async context should be maintained using AsyncLocalStorage
    And the correlation ID should appear in all related logs

  Scenario: Log rotation and retention
    Given log files are being written
    When the log file reaches size or age limits
    Then rotation should occur:
      | Trigger       | Value    | Action                           |
      | File size     | 100MB    | Rotate to new file               |
      | Age           | 24 hours | Rotate daily at midnight         |
      | Max files     | 7        | Delete oldest when exceeded      |
      | Compression   | After rotation | Gzip completed log files    |

  Scenario: Sensitive data patterns detection
    Given various data patterns in logs
    When the scrubber processes them
    Then it should detect and scrub:
      | Pattern                    | Regex/Method                        | Replacement    |
      | API Keys                   | /sk_[a-zA-Z0-9]{20,}/             | sk_****[last4] |
      | JWT Tokens                 | /eyJ[a-zA-Z0-9._-]+/               | ****           |
      | URLs with secrets          | /\?.*(?:key|token|secret)=[^&]+/  | ?****          |
      | Base64 credentials         | /Basic [a-zA-Z0-9+\/]+=*/          | Basic ****     |
      | GraphQL mutations          | /mutation.*password/is             | [REDACTED]     |

  Scenario: Error logging enhancement
    Given an error occurs in the system
    When it is logged
    Then the log should include:
      | Field              | Description                         |
      | error.name         | Error class name                    |
      | error.message      | Scrubbed error message             |
      | error.code         | Error code if available            |
      | error.stack        | Stack trace (dev only)             |
      | error.cause        | Root cause if chained              |
      | context            | Request/operation context          |
      | correlationId      | Request correlation ID             |

  Scenario: Compliance mode
    Given compliance requirements (GDPR, HIPAA, etc.)
    When COMPLIANCE_MODE is enabled
    Then additional scrubbing should apply:
      | Data Type          | Action                              |
      | Personal names     | Replace with hash                  |
      | IP addresses       | Mask last octet                     |
      | User IDs           | Use salted hash                     |
      | Timestamps         | Round to hour                       |
      | Geolocation        | Round to city level                 |
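
A minimal sketch of the masking rules specified above (helper names and
edge-case handling are illustrative assumptions, not existing code):

    // Masking helpers matching the scrubbing and compliance tables.
    function maskApiKey(key: string): string {
      // sk_live_abcd1234efgh5678 -> sk_live_****5678
      const prefix = key.slice(0, key.lastIndexOf('_') + 1);
      return `${prefix}****${key.slice(-4)}`;
    }

    function maskEmail(email: string): string {
      // user@example.com -> u***@example.com
      const [local, domain] = email.split('@');
      return `${local[0]}***@${domain}`;
    }

    function maskIpForCompliance(ip: string): string {
      // Compliance mode: 192.168.1.42 -> 192.168.1.***
      return ip.replace(/\.\d{1,3}$/, '.***');
    }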

Acceptance Criteria

  • Environment-based log level defaults:
    • Production: INFO level by default
    • Development: DEBUG level by default
    • Configurable via LOG_LEVEL environment variable
  • Sensitive data scrubbing:
    • API keys, tokens, passwords automatically redacted
    • Configurable scrubbing patterns
    • Zero sensitive data in production logs
  • Payload truncation:
    • Large objects/arrays truncated with size metadata
    • Configurable size limits
    • Preservation of structure indicators
  • Correlation ID system:
    • Automatic generation for each request
    • Propagation through all layers
    • AsyncLocalStorage for async context
    • Format: ds-{timestamp}-{random} (see the generation sketch after this list)
  • Structured JSON logging:
    • Consistent field names
    • ISO8601 timestamps
    • Parseable by log aggregators
  • Performance optimizations:
    • Sampling for high-frequency operations
    • Lazy evaluation of expensive log data
    • Conditional logging based on level
  • Log rotation and retention:
    • Size and time-based rotation
    • Compression of old logs
    • Configurable retention policies
  • Error enhancement:
    • Complete error context
    • Stack traces in development only
    • Error categorization
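
To make the correlation-ID criteria concrete, a minimal generation sketch
(the 3-byte random suffix is an illustrative choice; withCorrelation is the
helper sketched under Technical Considerations):

    import { randomBytes } from 'crypto';

    // ds-{timestamp}-{random}, e.g. ds-1704963600000-a1b2c3
    function generateCorrelationId(): string {
      return `ds-${Date.now()}-${randomBytes(3).toString('hex')}`;
    }

    // Wrap each incoming request so all downstream logs share the ID.
    function handleWithCorrelation<T>(fn: () => Promise<T>): Promise<T> {
      return withCorrelation(generateCorrelationId(), fn);
    }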

Non-Goals

  • Will NOT implement external log shipping (external agents can tail the log files instead)
  • Will NOT add custom log aggregation services
  • Will NOT implement distributed tracing protocols (OpenTelemetry)
  • Will NOT encrypt log files at rest
  • Will NOT provide log visualization dashboards
  • Will NOT implement audit logging (separate concern)
  • Will NOT add log replay functionality

Risks & Mitigations

  • Risk: Over-scrubbing making debugging difficult
    Mitigation: Configurable scrubbing rules and debug mode override

  • Risk: Performance impact from scrubbing and truncation
    Mitigation: Lazy evaluation and regex caching

  • Risk: Lost correlation during async operations
    Mitigation: AsyncLocalStorage for context preservation

  • Risk: Disk space exhaustion from logs
    Mitigation: Rotation, compression, and retention policies
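
One way to realize the lazy-evaluation mitigation is to accept thunks for
expensive log data, so the work only happens when the level is enabled
(a sketch; isLevelEnabled and the debug signature are assumptions):

    // The thunk is only materialized when DEBUG is actually enabled.
    function debug(message: string, lazyData?: () => unknown): void {
      if (!isLevelEnabled(LogLevel.DEBUG)) return; // skip the work entirely
      logger.log(LogLevel.DEBUG, message, lazyData?.());
    }

    // Usage: JSON.stringify never runs at INFO level in production.
    debug('GraphQL response', () => JSON.stringify(response));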

Technical Considerations

  • Architecture impact:

    • Add correlation ID middleware
    • Enhance logger with scrubbing pipeline
    • Implement AsyncLocalStorage context
    • Add log rotation mechanism (sketch at the end of this section)
  • Implementation approach:

    // Enhanced logger with security; ApiKeyScrubber etc. are the concrete
    // Scrubber implementations from the scrubbing pipeline.
    interface Scrubber {
      scrub(data: unknown): unknown;
    }

    class SecureLogger {
      private scrubbers: Scrubber[] = [
        new ApiKeyScrubber(),
        new TokenScrubber(),
        new PasswordScrubber(),
      ];

      // Apply every registered scrubber to the payload in sequence.
      private scrub(data: unknown): unknown {
        return this.scrubbers.reduce((d, s) => s.scrub(d), data);
      }

      private truncate(data: unknown): unknown {
        if (typeof data === 'string' && data.length > 500) {
          return `${data.slice(0, 500)}...truncated at 500 chars`;
        }
        if (Array.isArray(data) && data.length > 100) {
          return [...data.slice(0, 100), '...truncated'];
        }
        // More truncation logic (objects, nesting depth, byte budgets)
        return data;
      }

      log(level: LogLevel, message: string, data?: unknown) {
        const scrubbed = this.scrub(data);       // never write raw payloads
        const truncated = this.truncate(scrubbed);
        const entry = {
          timestamp: new Date().toISOString(),
          level,
          correlationId: getCorrelationId(),     // from AsyncLocalStorage (below)
          message,
          data: truncated,
        };
        this.write(entry);
      }

      // Sink kept private; stderr JSON lines leave stdout free for MCP traffic.
      private write(entry: object) {
        process.stderr.write(JSON.stringify(entry) + '\n');
      }
    }
  • Correlation ID propagation:

    import { AsyncLocalStorage } from 'async_hooks';

    interface Context {
      correlationId: string;
      traceId?: string;
      spanId?: string;
    }

    const asyncContext = new AsyncLocalStorage<Context>();

    // Everything fn runs, including awaited continuations and spawned
    // child tasks, sees the same context.
    function withCorrelation<T>(
      correlationId: string,
      fn: () => T
    ): T {
      return asyncContext.run({ correlationId }, fn);
    }

    // Used by SecureLogger above; undefined outside a withCorrelation scope.
    function getCorrelationId(): string | undefined {
      return asyncContext.getStore()?.correlationId;
    }
  • Environment detection:

    function getDefaultLogLevel(): LogLevel {
      // An explicit LOG_LEVEL env var overrides the environment default.
      const explicit = process.env.LOG_LEVEL?.toUpperCase();
      if (explicit && explicit in LogLevel) return LogLevel[explicit as keyof typeof LogLevel];
      const env = process.env.NODE_ENV || 'development';
      return env === 'production' ? LogLevel.INFO : LogLevel.DEBUG;
    }
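  • Log rotation sketch (referenced from the architecture list above; plain
    Node fs APIs, with size and retention limits hard-coded for illustration):

    import { statSync, renameSync, readdirSync, unlinkSync } from 'fs';

    function rotateIfNeeded(file: string, maxBytes = 100 * 1024 * 1024) {
      if (statSync(file).size < maxBytes) return;
      // Move the active file aside; compression of rotated files happens
      // separately (gzip after rotation, per the retention scenario).
      renameSync(file, `${file}.${Date.now()}`);
      // Retention: keep the newest 7 rotated files, delete the rest.
      const rotated = readdirSync('.')
        .filter((f) => f.startsWith(`${file}.`))
        .sort()
        .reverse();
      rotated.slice(7).forEach((f) => unlinkSync(f));
    }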

Testing Requirements

  • Unit tests for scrubbing patterns
  • Integration tests for correlation ID propagation
  • Performance tests for truncation impact
  • Security tests for sensitive data leakage
  • Async context preservation tests
  • Log rotation tests
  • Compliance mode tests
  • Environment-based default tests
  • Structured format validation tests

Definition of Done

  • Production defaults to INFO level
  • All sensitive data patterns scrubbed
  • Large payloads truncated appropriately
  • Correlation IDs propagate through all layers
  • Structured JSON format implemented
  • Log rotation working
  • AsyncLocalStorage context preservation
  • Performance impact <1% overhead
  • Zero sensitive data in test logs
  • All tests passing with >95% coverage
  • Documentation updated
  • Migration guide for existing logs
  • Compliance mode validated
  • Code reviewed and approved

Implementation Notes

  1. Phased Rollout:

    • Phase 1: Implement scrubbing pipeline
    • Phase 2: Add correlation IDs
    • Phase 3: Structured logging format
    • Phase 4: Log rotation
    • Phase 5: Performance optimizations
  2. Scrubber Registry:

    const SCRUBBERS = {
      apiKey: /(?:api[_-]?key|apikey)[\s:="]*([a-zA-Z0-9_-]{20,})/gi,
      bearer: /Bearer\s+[a-zA-Z0-9._-]+/gi,
      email: /([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/g,
      // ... more patterns
    };
  3. Configuration:

    interface LogConfig {
      level: LogLevel;
      scrubbing: {
        enabled: boolean;
        patterns: RegExp[];
        compliance: boolean;
      };
      truncation: {
        maxStringLength: number;
        maxArrayLength: number;
        maxObjectDepth: number;
      };
      rotation: {
        maxSize: string;
        maxAge: string;
        maxFiles: number;
      };
    }
  4. Performance Monitoring:

    // Deterministic 1-in-N sampling; N comes from the per-operation table
    // (100 for health checks, 10 for successful API calls, 1 for failures).
    let logCounter = 0;
    const sampleEvery = 100;

    function shouldSample(): boolean {
      return ++logCounter % sampleEvery === 0;
    }
  5. Testing Strategy:

    • Test with real sensitive data patterns
    • Verify no leakage in any log level
    • Benchmark scrubbing performance
    • Test correlation across async boundaries
    • Validate structured format parsing
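
    A unit test for the scrubbing patterns might look like this
    (vitest/jest-style globals assumed; maskApiKey and maskEmail are the
    illustrative helpers sketched after the Gherkin specification):

    it('masks API keys down to the last four characters', () => {
      expect(maskApiKey('sk_live_abcd1234efgh5678')).toBe('sk_live_****5678');
      expect(maskApiKey('sk_live_abcd1234efgh5678')).not.toContain('abcd1234');
    });

    it('masks emails to the first character of the local part', () => {
      expect(maskEmail('user@example.com')).toBe('u***@example.com');
    });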

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Metadata

Labels: enhancement (New feature or request), security (Security improvements and fixes)