Content freshness detection for external links #35

@Mearman

Problem

External documentation links can go stale even while they still resolve. This is a particular problem for fast-moving technologies, where the linked content is superseded or the API it describes changes.

Real-world Example

During validation of a TypeScript project, we found links to:

  • Firebase documentation that had moved to new URLs
  • Twilio API docs that were updated with breaking changes
  • GitHub Actions syntax that had deprecated features

All of these links returned 200 OK, yet the pages contained outdated information that could mislead developers.

Proposed Solution

Add content freshness detection that can:

  1. Detect potentially stale content

    • Check last-modified headers when available
    • Flag content older than configurable thresholds (e.g., 2+ years)
    • Detect deprecation notices in common documentation formats
  2. Detect content changes

    • Store content hashes and detect significant changes
    • Flag when external content has substantially changed since last validation
    • Detect when a page has shifted to a different topic or focus
  3. Technology-specific rules

    • Different freshness thresholds for different domains
    • Special handling for API documentation vs general guides
    • Recognition of version-specific documentation patterns
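
To make the first two items concrete, here is a minimal TypeScript sketch of the three checks: age from a Last-Modified header, notice phrases in the page body, and a stored content hash. Every name in it (`isStale`, `findNotices`, `contentHash`, `assessFreshness`, `FreshnessOptions`, `FreshnessSignal`) is hypothetical and not part of markmv today. It also makes two simplifying assumptions: a missing or unparseable header counts as "unknown" rather than stale, and an exact SHA-256 comparison flags any change at all, so judging whether a change is "significant" would need a similarity measure on top.

```typescript
import { createHash } from "node:crypto";

// All names below are hypothetical; none of them are part of markmv's existing API.

interface FreshnessOptions {
  maxAgeMs: number;      // freshness threshold in milliseconds
  patterns: string[];    // notice phrases to search for, e.g. "deprecated"
  previousHash?: string; // hash stored by an earlier validation run, if any
}

interface FreshnessSignal {
  stale: boolean;    // Last-Modified is older than the threshold
  notices: string[]; // configured phrases that appear in the body
  hash: string;      // SHA-256 of the whitespace-normalized body
  changed: boolean;  // true only when a previous hash exists and differs
}

// A missing or unparseable header is treated as "unknown", not as stale.
function isStale(lastModified: string | null, maxAgeMs: number, now: Date): boolean {
  if (lastModified === null) return false;
  const modified = Date.parse(lastModified);
  if (Number.isNaN(modified)) return false;
  return now.getTime() - modified > maxAgeMs;
}

// Case-insensitive substring match against each configured phrase.
function findNotices(body: string, patterns: string[]): string[] {
  const haystack = body.toLowerCase();
  return patterns.filter((pattern) => haystack.includes(pattern.toLowerCase()));
}

// Collapse whitespace first so that pure reformatting does not change the hash.
function contentHash(body: string): string {
  const normalized = body.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

function assessFreshness(
  lastModified: string | null,
  body: string,
  options: FreshnessOptions,
  now: Date = new Date(),
): FreshnessSignal {
  const hash = contentHash(body);
  return {
    stale: isStale(lastModified, options.maxAgeMs, now),
    notices: findNotices(body, options.patterns),
    hash,
    changed: options.previousHash !== undefined && options.previousHash !== hash,
  };
}
```

With the standard `fetch` API, `lastModified` would come from `response.headers.get("last-modified")` and `body` from `await response.text()`. Keeping the checks as pure functions means they can be tested without network access.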

Expected Output

📊 Validation Summary
Files processed: 18
Total links found: 163
Valid links: 163
Potentially stale links: 3

⚠️ Content Freshness Warnings:

📄 docs/firebase-setup.md (2 warnings):
  ⚠️ [external] https://firebase.google.com/docs/functions/beta (line 45)
     Status: valid but stale (last modified: 2+ years ago)
     Suggestion: Check for newer version
     
  ⚠️ [external] https://console.cloud.google.com/apis/library/cloudfunctions (line 67)
     Status: valid but may have moved
     Detected: "This page has moved" notice

🔄 Stale links: 3
✅ Fresh links: 160
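
The counts in this summary could be derived from per-link results roughly as follows. This is only a sketch: the `LinkResult` shape and the `summarize` function are hypothetical, and it assumes that a stale link is still counted as valid, which is what the numbers above imply (163 valid = 3 stale + 160 fresh).

```typescript
// Hypothetical result shape; markmv's real data model may differ.
interface LinkResult {
  file: string;
  url: string;
  line: number;
  valid: boolean;       // the link resolved successfully
  staleReason?: string; // set only when a freshness check raised a warning
}

interface FreshnessSummary {
  total: number;
  valid: number;
  stale: number;
  fresh: number;
}

// Stale links are a subset of valid links, so fresh = valid - stale.
function summarize(results: LinkResult[]): FreshnessSummary {
  const valid = results.filter((result) => result.valid);
  const stale = valid.filter((result) => result.staleReason !== undefined);
  return {
    total: results.length,
    valid: valid.length,
    stale: stale.length,
    fresh: valid.length - stale.length,
  };
}
```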

Configuration Options

# .markmv.yml
freshness:
  enabled: true
  thresholds:
    general: "2 years"
    api-docs: "1 year"
    tutorials: "6 months"
  domains:
    "firebase.google.com": "1 year"
    "docs.github.com": "6 months"
  detect_patterns:
    - "deprecated"
    - "this page has moved"
    - "no longer supported"
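
One possible way to turn these settings into a per-link threshold is sketched below in TypeScript. The names `parseDuration`, `thresholdFor`, and `FreshnessConfig` are hypothetical, and the sketch makes several assumptions the proposal does not spell out: durations are written as `<number> <unit>`, a month is approximated as 30 days and a year as 365, and a domain entry overrides the `general` threshold. How a URL would be classified as `api-docs` or `tutorials` is left open, so those keys are not used here.

```typescript
// Hypothetical types and helpers; the config shape mirrors the YAML above.
interface FreshnessConfig {
  enabled: boolean;
  thresholds: { general: string; [category: string]: string };
  domains: Record<string, string>;
  detect_patterns: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Approximate unit lengths (assumption): month = 30 days, year = 365 days.
const UNIT_MS: Record<string, number> = {
  day: DAY_MS,
  week: 7 * DAY_MS,
  month: 30 * DAY_MS,
  year: 365 * DAY_MS,
};

// Parses strings such as "2 years" or "6 months" into milliseconds.
function parseDuration(text: string): number {
  const match = /^\s*(\d+)\s*(day|week|month|year)s?\s*$/i.exec(text);
  if (!match) {
    throw new Error(`Unrecognized duration: "${text}"`);
  }
  return Number(match[1]) * UNIT_MS[match[2].toLowerCase()];
}

// An exact hostname match in `domains` wins; otherwise use the general threshold.
function thresholdFor(url: string, config: FreshnessConfig): number {
  const host = new URL(url).hostname;
  const duration = config.domains[host] ?? config.thresholds.general;
  return parseDuration(duration);
}
```

Failing loudly on an unrecognized duration, rather than falling back silently, would surface config typos at startup instead of as missing warnings later.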

Benefits

  • Proactive maintenance: Identify outdated references before they cause issues
  • Quality assurance: Ensure documentation references remain relevant
  • Developer experience: Prevent frustration from following outdated guides
  • CI integration: Include freshness checks in documentation quality gates

This would transform markmv from a simple link checker into a comprehensive documentation quality tool.
