Files
gh-epieczko-betty/skills/docs.lint.links/SKILL.md
2025-11-29 18:26:08 +08:00

8.1 KiB

docs.lint.links

Overview

docs.lint.links validates Markdown links to detect broken internal or external links, with optional autofix mode to correct common issues.

Purpose

This skill helps maintain documentation quality by:

  • Scanning all .md files in a repository
  • Detecting broken external links (404s and other HTTP errors)
  • Detecting broken internal links (relative paths that don't resolve)
  • Providing suggested fixes for common issues
  • Automatically fixing case mismatches and .md extension issues

Usage

Basic Usage

python skills/docs.lint.links/docs_link_lint.py [root_dir] [options]

Parameters

Parameter Type Required Default Description
root_dir string No . Root directory to search for Markdown files
--no-external boolean No false Skip checking external links (faster)
--autofix boolean No false Automatically fix common issues (case mismatches, .md extension issues)
--timeout integer No 10 Timeout for external link checks in seconds
--exclude string No - Comma-separated list of patterns to exclude (e.g., 'node_modules,.git')
--output string No json Output format (json or text)

Outputs

Output Type Description
lint_results object JSON object containing link validation results with issues and statistics
issues array Array of link issues found, each with file, line, link, issue type, and suggested fix
summary object Summary statistics including files checked, issues found, and fixes applied

Usage Examples

Check all markdown files in the current directory for broken links:

python skills/docs.lint.links/docs_link_lint.py

Check only internal links (much faster):

python skills/docs.lint.links/docs_link_lint.py --no-external

Example 3: Auto-fix Common Issues

Automatically fix case mismatches and .md extension issues:

python skills/docs.lint.links/docs_link_lint.py --autofix

Example 4: Check Specific Directory

Check markdown files in the docs directory:

python skills/docs.lint.links/docs_link_lint.py docs/

Example 5: Exclude Patterns

Exclude certain directories from checking:

python skills/docs.lint.links/docs_link_lint.py --exclude "node_modules,vendor,.venv"

Example 6: Text Output

Get human-readable text output instead of JSON:

python skills/docs.lint.links/docs_link_lint.py --output text

Example 7: Custom Timeout

Use a longer timeout for external link checks:

python skills/docs.lint.links/docs_link_lint.py --timeout 30

Output Format

JSON Output (Default)

{
  "status": "success",
  "summary": {
    "files_checked": 42,
    "files_with_issues": 3,
    "total_issues": 5,
    "autofix_enabled": false,
    "total_fixes_applied": 0
  },
  "issues": [
    {
      "file": "docs/api.md",
      "line": 15,
      "link": "../README.MD",
      "issue_type": "internal_broken",
      "message": "File not found: ../README.MD (found case mismatch: README.md)",
      "suggested_fix": "../README.md"
    },
    {
      "file": "docs/guide.md",
      "line": 23,
      "link": "https://example.com/missing",
      "issue_type": "external_broken",
      "message": "External link is broken: HTTP 404"
    }
  ]
}

Text Output

Markdown Link Lint Results
==================================================
Files checked: 42
Files with issues: 3
Total issues: 5

Issues found:
--------------------------------------------------

docs/api.md:15
  Link: ../README.MD
  Issue: File not found: ../README.MD (found case mismatch: README.md)
  Suggested fix: ../README.md

docs/guide.md:23
  Link: https://example.com/missing
  Issue: External link is broken: HTTP 404

Issue Types

These are relative file paths that don't resolve:

  • Case mismatches: README.MD when file is README.md
  • Missing .md extension: guide when file is guide.md
  • Extra .md extension: file.md when file is file
  • File not found: Path doesn't exist in the repository

These are HTTP/HTTPS URLs that return errors:

  • 404 Not Found: Page doesn't exist
  • 403 Forbidden: Access denied
  • 500+ Server Errors: Server-side issues
  • Timeout: Server didn't respond in time
  • Network errors: DNS failures, connection refused, etc.

Autofix Behavior

When --autofix is enabled, the skill will automatically correct:

  1. Case mismatches: If a link uses wrong case but a case-insensitive match exists
  2. Missing .md extension: If a link is missing .md but the file exists with it
  3. Extra .md extension: If a link has .md but the file exists without it

The autofix preserves:

  • Anchor fragments (e.g., #section)
  • Query parameters (e.g., ?version=1.0)

Note: Autofix modifies files in place. It's recommended to use version control or create backups before using this option.

The skill detects the following link formats:

  1. Standard markdown links: [text](url)
  2. Angle bracket URLs: <https://example.com>
  3. Reference-style links: [text][ref] with [ref]: url definitions
  4. Implicit reference links: [text][] using text as reference

Excluded Patterns

By default, the following patterns are excluded from scanning:

  • .git/
  • node_modules/
  • .venv/ and venv/
  • __pycache__/

Additional patterns can be excluded using the --exclude parameter.

Integration Examples

Use in CI/CD

Add to your CI pipeline to catch broken links:

# .github/workflows/docs-lint.yml
name: Documentation Link Check

on: [push, pull_request]

jobs:
  lint-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Check documentation links
        run: |
          python skills/docs.lint.links/docs_link_lint.py --no-external

Use with Pre-commit Hook

Add to .git/hooks/pre-commit:

#!/bin/bash
python skills/docs.lint.links/docs_link_lint.py --no-external --output text
if [ $? -ne 0 ]; then
  echo "Documentation has broken links. Please fix before committing."
  exit 1
fi

Use in Documentation Workflow

# workflows/documentation.yaml
steps:
  - skill: docs.lint.links
    args:
      - "docs/"
      - "--autofix"
  - skill: docs.lint.links
    args:
      - "docs/"
      - "--output=text"

Performance Considerations

Checking external links can be slow because:

  • Each link requires an HTTP request
  • Some servers may rate-limit or block automated requests
  • Network latency and timeouts add up

Recommendations:

  • Use --no-external for fast local checks
  • Use --timeout to adjust timeout for slow networks
  • Run external checks less frequently (e.g., nightly builds)

Large Repositories

For repositories with many markdown files:

  • Use --exclude to skip irrelevant directories
  • Consider checking specific subdirectories instead of the entire repo
  • The skill automatically skips common directories like node_modules

Error Handling

The skill returns:

  • Exit code 0 if no broken links are found
  • Exit code 1 if broken links are found or an error occurs

This makes it suitable for use in CI/CD pipelines and pre-commit hooks.

Dependencies

No external dependencies

All functionality uses Python standard library modules:

  • re - Regular expression matching for link extraction
  • urllib - HTTP requests for external link checking
  • pathlib - File system operations
  • json - JSON output formatting

Tags

documentation, linting, validation, links, markdown

See Also

Version

0.1.0 - Initial implementation with link validation and autofix support