# Debug Logging for Connection and Reconnection Logic

## Purpose & User Problem
**Goal**: Enable engineers to identify the root cause of connection failures from debug logs alone, without needing to reproduce the issue locally.

**Complex Network Environments**:
- **Coder instance**: Virtual machine (AWS/cloud hosted) or local (Coder Desktop)
- **VSCode client**: Desktop binary, VSCode Web, or development container
- **Connection paths**: Direct, VPN, VPC, SSH tunnels, or other proxies

**Problem**: Without detailed logging, distinguishing between user configuration errors, network issues, transient failures, and software bugs requires extensive back-and-forth debugging.

**Edge cases NOT in scope for this pass**: IPv6-specific issues, multi-hop proxy chains, air-gapped environments.

## Success Criteria
- **Record** SSH configuration from:
  - The VSCode-Coder extension's generated config (primary)
  - The user's local SSH config, if accessible (read `~/.ssh/config` with `fs.readFile`, handling a missing or unreadable file gracefully)
- **Record** JavaScript runtime errors via:
  - `process.on('uncaughtException', ...)` for unhandled errors
  - `process.on('unhandledRejection', ...)` for unhandled promise rejections
  - Memory pressure warnings when heap usage exceeds 90% (checked via `process.memoryUsage()`)
  - Process signals `SIGTERM`, `SIGINT`, and `SIGHUP` via `process.on('SIGTERM', ...)` and its equivalents
- **Record** network events (as available from VSCode APIs and the HTTP client):
  - Connection timeouts, with duration in milliseconds
  - HTTP/WebSocket error codes and messages
  - Retry attempts, with backoff delays in milliseconds
- **Record** the full connection lifecycle, including these fields as plain text in each log message (see Structured logging under Out of Scope):
  - Initial connection: `uri` (string), `timestamp` (ISO 8601), `attemptNumber` (number)
  - Disconnection: `timestamp` (ISO 8601), `reason` (string), `duration` (milliseconds)
  - Reconnection: `timestamp` (ISO 8601), `attemptNumber` (number), `backoffDelay` (milliseconds)
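
The runtime-error criteria above could be wired up roughly as follows. This is a minimal sketch, not the extension's code: `DebugLogger`, `heapUsageRatio`, and `registerRuntimeDiagnostics` are hypothetical names, the 30-second polling interval is an assumption, and "heap usage" is assumed to mean `heapUsed / heapTotal` from `process.memoryUsage()`. Process-wide events are not tied to a connection, so the sketch uses a hypothetical `runtime` component with no `#connectionId`.

```typescript
import { clearInterval, setInterval } from "timers";

// Hypothetical minimal logger; assumes debug() output is already gated on coder.verbose.
interface DebugLogger {
  debug(message: string): void;
}

// Share of the currently allocated V8 heap that is in use (assumed meaning of "heap usage").
function heapUsageRatio(mem: NodeJS.MemoryUsage = process.memoryUsage()): number {
  return mem.heapTotal > 0 ? mem.heapUsed / mem.heapTotal : 0;
}

// Registers the runtime hooks and returns a function that removes them again.
function registerRuntimeDiagnostics(logger: DebugLogger, memoryCheckIntervalMs = 30_000): () => void {
  const onException = (err: Error): void => {
    logger.debug(`[runtime] error: uncaughtException: ${err.stack ?? err.message}`);
  };
  const onRejection = (reason: unknown): void => {
    const detail = reason instanceof Error ? reason.stack ?? reason.message : String(reason);
    logger.debug(`[runtime] error: unhandledRejection: ${detail}`);
  };
  const onSignal = (signal: NodeJS.Signals): void => {
    logger.debug(`[runtime] error: received ${signal}`);
  };
  const signals: NodeJS.Signals[] = ["SIGTERM", "SIGINT", "SIGHUP"];

  process.on("uncaughtException", onException);
  process.on("unhandledRejection", onRejection);
  for (const signal of signals) process.on(signal, onSignal);

  // Poll heap usage; unref() so this timer never keeps the process alive by itself.
  const timer = setInterval(() => {
    const ratio = heapUsageRatio();
    if (ratio > 0.9) {
      logger.debug(`[runtime] error: heap usage ${(ratio * 100).toFixed(1)}% exceeds 90%`);
    }
  }, memoryCheckIntervalMs);
  timer.unref();

  return () => {
    clearInterval(timer);
    process.off("uncaughtException", onException);
    process.off("unhandledRejection", onRejection);
    for (const signal of signals) process.off(signal, onSignal);
  };
}
```

Two caveats from Node's documented behavior: attaching an `uncaughtException`, `unhandledRejection`, or signal listener replaces Node's default crash or exit handling, and the VSCode extension host is shared with other extensions, so these handlers will also see their errors. `process.on('uncaughtExceptionMonitor', ...)` observes exceptions without changing the default. Also, `heapTotal` is the heap V8 has currently allocated rather than its hard limit, so the ratio can exceed 90% without real memory pressure.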
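
The lifecycle and retry criteria might translate into plain-string log lines like the ones below. This is a sketch under assumptions: `connectWithRetry` and `logDisconnect` are hypothetical helpers, the exponential backoff policy (`baseDelayMs * 2^(attempt - 1)`) and the `reconnect` component name are illustrative choices, and fields are written as `key=value` text rather than JSON to respect the plain-string rule.

```typescript
// Hypothetical minimal logger; assumes debug() output is already gated on coder.verbose.
interface DebugLogger {
  debug(message: string): void;
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `connect`, logging each attempt, failure, and backoff delay as plain text.
async function connectWithRetry<T>(
  logger: DebugLogger,
  connectionId: string,
  uri: string,
  connect: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  const tag = `[reconnect#${connectionId}]`;
  for (let attempt = 1; ; attempt++) {
    const startedAt = Date.now();
    logger.debug(`${tag} connect: uri=${uri} timestamp=${new Date(startedAt).toISOString()} attemptNumber=${attempt}`);
    try {
      return await connect();
    } catch (err) {
      const detail = err instanceof Error ? err.stack ?? err.message : String(err);
      logger.debug(`${tag} error: attempt ${attempt} failed after ${Date.now() - startedAt}ms: ${detail}`);
      if (attempt >= maxAttempts) throw err;
      // Exponential backoff (an assumed policy): base, 2x base, 4x base, ...
      const backoffDelay = baseDelayMs * 2 ** (attempt - 1);
      logger.debug(
        `${tag} retry: timestamp=${new Date().toISOString()} attemptNumber=${attempt + 1} backoffDelay=${backoffDelay}ms`,
      );
      await sleep(backoffDelay);
    }
  }
}

// Logs a disconnection together with how long the connection had been up.
function logDisconnect(
  logger: DebugLogger,
  connectionId: string,
  reason: string,
  connectedAtMs: number,
  nowMs: number = Date.now(),
): void {
  logger.debug(
    `[reconnect#${connectionId}] disconnect: timestamp=${new Date(nowMs).toISOString()} reason=${reason} duration=${nowMs - connectedAtMs}ms`,
  );
}
```

Passing `nowMs` explicitly is only there to make the duration deterministic in tests; callers would normally rely on the default.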

## Scope & Constraints
- **ALL** new debug logs MUST use `logger.debug()` and be gated behind the `coder.verbose` flag
- **Masking requirements** - redact these patterns before logging:
  - SSH private keys: replace the content between the `-----BEGIN` and `-----END` lines with `[REDACTED KEY]`
  - Passwords in URLs: replace `://user:pass@` with `://user:[REDACTED]@`
  - AWS access key IDs: replace strings matching `AKIA[0-9A-Z]{16}` with `[REDACTED AWS KEY]`
  - Bearer tokens: replace `Bearer <token>` with `Bearer [REDACTED]`
- **Priority areas for this pass** (in order):
  1. SSH config generation and validation
  2. Connection establishment and disconnection events
  3. Retry logic and backoff timing
- **Future enhancements** (DO NOT IMPLEMENT; add as `// TODO:` comments only):
  - WebSocket connection logging
  - HTTP API call logging
  - Certificate validation logging
  - Token refresh logging
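
One way to apply the masking requirements is a single function run over every string before it reaches `logger.debug()`. This is a sketch: `maskSecrets` and the regular expressions are assumptions that approximate the listed patterns (for example, the URL rule treats everything between `user:` and `@` as the password), not an exhaustive secret scanner.

```typescript
// Each [pattern, replacement] pair approximates one masking rule from Scope & Constraints.
const REDACTIONS: ReadonlyArray<readonly [RegExp, string]> = [
  // PEM-style blocks: keep the BEGIN/END lines, replace everything between them.
  [/(-----BEGIN[^\n]*-----)[\s\S]*?(-----END[^\n]*-----)/g, "$1\n[REDACTED KEY]\n$2"],
  // Credentials in URLs: ://user:pass@ -> ://user:[REDACTED]@
  [/(:\/\/[^\s:@/]+):[^\s@/]+@/g, "$1:[REDACTED]@"],
  // AWS access key IDs.
  [/AKIA[0-9A-Z]{16}/g, "[REDACTED AWS KEY]"],
  // Bearer tokens in headers or error messages.
  [/Bearer\s+[^\s"']+/g, "Bearer [REDACTED]"],
];

// Applies every redaction rule in order; call this on any text before logging it.
function maskSecrets(text: string): string {
  return REDACTIONS.reduce((masked, [pattern, replacement]) => masked.replace(pattern, replacement), text);
}
```

Masking should happen before any truncation, so that a cut cannot separate a key's `-----BEGIN` line from its `-----END` line and leave the key body unmasked.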

## Technical Considerations
- **SSH Extension Detection** - Use this priority order from `extension.ts`:
  ```typescript
  // The first installed extension wins; the expression is undefined if none are installed
  vscode.extensions.getExtension("jeanp413.open-remote-ssh") ||
    vscode.extensions.getExtension("codeium.windsurf-remote-openssh") ||
    vscode.extensions.getExtension("anysphere.remote-ssh") ||
    vscode.extensions.getExtension("ms-vscode-remote.remote-ssh")
  ```
- **SSH Config Logging** - Log the full config with secrets masked per the patterns in the Scope & Constraints section
- **Error Handling** - Add `try`/`catch` wrappers ONLY around:
  - Network API calls (axios, fetch)
  - SSH config file operations
  - Process spawning
  - In each wrapper, always log the full stack trace: `` logger.debug(`Error in ${operation}: ${err.stack}`) ``
- **Log Format** - Use a consistent template: `[${component}#${connectionId}] ${phase}: ${message}`
  - component: `ssh`, `api`, `reconnect`, etc.
  - connectionId: unique per connection attempt
  - phase: `init`, `connect`, `disconnect`, `retry`, `error`
  - message: specific details
- **Text Format** - UTF-8 encoding with `\n` line endings (the VSCode output channel handles platform differences)
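
The log format and error-handling rules above could share two small helpers. This is a sketch under assumptions: `formatLog`, `newConnectionId`, and `withErrorLogging` are hypothetical names, and the 8-character ID (a prefix of `crypto.randomUUID()`) is an illustrative choice that keeps lines short; collisions are unlikely within one session but not impossible.

```typescript
import { randomUUID } from "crypto";

// Hypothetical minimal logger; assumes debug() output is already gated on coder.verbose.
interface DebugLogger {
  debug(message: string): void;
}

type Phase = "init" | "connect" | "disconnect" | "retry" | "error";

// Short random ID for one connection attempt (8 hex characters).
function newConnectionId(): string {
  return randomUUID().slice(0, 8);
}

// Builds a line in the `[${component}#${connectionId}] ${phase}: ${message}` format.
function formatLog(component: string, connectionId: string, phase: Phase, message: string): string {
  return `[${component}#${connectionId}] ${phase}: ${message}`;
}

// Wraps one of the allowed operations (network call, SSH config I/O, process spawn):
// logs the full stack trace on failure, then rethrows the original error.
async function withErrorLogging<T>(
  logger: DebugLogger,
  component: string,
  connectionId: string,
  operation: string,
  fn: () => Promise<T>,
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
    logger.debug(formatLog(component, connectionId, "error", `Error in ${operation}: ${detail}`));
    throw err;
  }
}
```

Rethrowing after logging keeps the wrapper transparent: callers still see the original error, and the debug log gains the stack trace.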
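
Reading the user's `~/.ssh/config` (listed under Success Criteria) is one of the SSH config file operations that gets an error-handling wrapper. A minimal sketch follows; `readUserSshConfig` is a hypothetical helper, and treating a missing file (`ENOENT`) as a normal, non-error outcome is an assumption. It logs only the byte count and leaves masking the content to the caller.

```typescript
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical minimal logger; assumes debug() output is already gated on coder.verbose.
interface DebugLogger {
  debug(message: string): void;
}

// Returns the file's text, or undefined if it is missing or unreadable. Never throws.
async function readUserSshConfig(
  logger: DebugLogger,
  connectionId: string,
  configPath: string = path.join(os.homedir(), ".ssh", "config"),
): Promise<string | undefined> {
  const tag = `[ssh#${connectionId}]`;
  try {
    const content = await fs.readFile(configPath, "utf8");
    // Log only the size here; the caller masks the content before logging it.
    logger.debug(`${tag} init: read ${Buffer.byteLength(content, "utf8")} bytes from ${configPath}`);
    return content;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") {
      logger.debug(`${tag} init: no user SSH config at ${configPath}`);
    } else {
      const detail = err instanceof Error ? err.stack ?? err.message : String(err);
      logger.debug(`${tag} error: Error in readUserSshConfig: ${detail}`);
    }
    return undefined;
  }
}
```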

## Out of Scope
- **Existing logger.info calls** - DO NOT modify. Verify with: `grep -rn "logger\.info(" src --include="*.ts"`
- **Third-party code** - No changes to node_modules or external extension APIs
- **Performance safeguards** - No log rotation or size limits in this pass (VSCode output channels handle this)
  - Exception: if an SSH config exceeds 10KB, truncate it and append `[TRUNCATED after 10KB]`
- **Structured logging** - NO JSON objects or structured fields. Use only plain strings to ease future migration
- **User notification** - No UI alerts, notifications, or status bar updates about connection issues
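
The 10KB truncation exception could be handled by a helper like the one below. It is a sketch: `truncateForLog` is a hypothetical name, and 10KB is assumed to mean 10 × 1024 bytes of UTF-8. It backs up to a character boundary so the cut never splits a multi-byte character, and it should run after masking so that a cut cannot leave a key body without its `-----END` line.

```typescript
// Assumed interpretation of "10KB": 10 * 1024 bytes of UTF-8.
const MAX_LOGGED_CONFIG_BYTES = 10 * 1024;

// Truncates text longer than the limit and appends the marker required by the spec.
function truncateForLog(text: string): string {
  const bytes = Buffer.from(text, "utf8");
  if (bytes.length <= MAX_LOGGED_CONFIG_BYTES) {
    return text;
  }
  // If the first dropped byte is a UTF-8 continuation byte (10xxxxxx), the cut
  // falls inside a character; move back to that character's first byte.
  let end = MAX_LOGGED_CONFIG_BYTES;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return `${bytes.subarray(0, end).toString("utf8")}\n[TRUNCATED after 10KB]`;
}
```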