Initial commit
313
skills/docs.lint.links/SKILL.md
Normal file
@@ -0,0 +1,313 @@
# docs.lint.links

## Overview

**docs.lint.links** validates Markdown links to detect broken internal or external links, with optional autofix mode to correct common issues.

## Purpose

This skill helps maintain documentation quality by:
- Scanning all `.md` files in a repository
- Detecting broken external links (404s and other HTTP errors)
- Detecting broken internal links (relative paths that don't resolve)
- Providing suggested fixes for common issues
- Automatically fixing case mismatches and `.md` extension issues

## Usage

### Basic Usage

```bash
python skills/docs.lint.links/docs_link_lint.py [root_dir] [options]
```

### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `root_dir` | string | No | `.` | Root directory to search for Markdown files |
| `--no-external` | boolean | No | `false` | Skip checking external links (faster) |
| `--autofix` | boolean | No | `false` | Automatically fix common issues (case mismatches, `.md` extension issues) |
| `--timeout` | integer | No | `10` | Timeout for external link checks, in seconds |
| `--exclude` | string | No | - | Comma-separated list of directory or file names to exclude (e.g., 'node_modules,.git'); when given, it replaces the default exclusions |
| `--output` | string | No | `json` | Output format (`json` or `text`) |

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `lint_results` | object | JSON object containing link validation results with issues and statistics |
| `issues` | array | Array of link issues found, each with file, line, link, issue type, and suggested fix |
| `summary` | object | Summary statistics including files checked, issues found, and fixes applied |

## Usage Examples

### Example 1: Basic Link Validation

Check all markdown files in the current directory for broken links:

```bash
python skills/docs.lint.links/docs_link_lint.py
```

### Example 2: Skip External Link Checks

Check only internal links (much faster):

```bash
python skills/docs.lint.links/docs_link_lint.py --no-external
```

### Example 3: Auto-fix Common Issues

Automatically fix case mismatches and `.md` extension issues:

```bash
python skills/docs.lint.links/docs_link_lint.py --autofix
```

### Example 4: Check Specific Directory

Check markdown files in the `docs` directory:

```bash
python skills/docs.lint.links/docs_link_lint.py docs/
```

### Example 5: Exclude Patterns

Exclude certain directories from checking:

```bash
python skills/docs.lint.links/docs_link_lint.py --exclude "node_modules,vendor,.venv"
```

### Example 6: Text Output

Get human-readable text output instead of JSON:

```bash
python skills/docs.lint.links/docs_link_lint.py --output text
```

### Example 7: Custom Timeout

Use a longer timeout for external link checks:

```bash
python skills/docs.lint.links/docs_link_lint.py --timeout 30
```

## Output Format

### JSON Output (Default)

```json
{
  "status": "success",
  "summary": {
    "files_checked": 42,
    "files_with_issues": 3,
    "total_issues": 5,
    "autofix_enabled": false,
    "total_fixes_applied": 0
  },
  "issues": [
    {
      "file": "docs/api.md",
      "line": 15,
      "link": "../README.MD",
      "issue_type": "internal_broken",
      "message": "File not found: ../README.MD (found case mismatch: README.md)",
      "suggested_fix": "../README.md"
    },
    {
      "file": "docs/guide.md",
      "line": 23,
      "link": "https://example.com/missing",
      "issue_type": "external_broken",
      "message": "External link is broken: HTTP 404"
    }
  ]
}
```
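
A report in this shape is easy to post-process. The following is a minimal sketch, not part of the skill itself, that groups reported issues by file. The helper name `group_issues_by_file` and the inline sample are hypothetical; only the `issues`, `file`, `line`, and `message` keys are taken from the example above.

```python
import json
from collections import defaultdict


def group_issues_by_file(report_json: str) -> dict:
    """Group (line, message) pairs by the file they were reported in."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for issue in report.get("issues", []):
        grouped[issue["file"]].append((issue["line"], issue["message"]))
    return dict(grouped)


# Hypothetical sample, trimmed from the report shape shown above.
SAMPLE_REPORT = json.dumps({
    "status": "success",
    "issues": [
        {"file": "docs/api.md", "line": 15, "message": "File not found: ../README.MD"},
        {"file": "docs/guide.md", "line": 23, "message": "External link is broken: HTTP 404"},
    ],
})

for path, entries in group_issues_by_file(SAMPLE_REPORT).items():
    for line, message in entries:
        print(f"{path}:{line}: {message}")
```

Using `report.get("issues", [])` keeps the helper from failing on an error report, which carries `status` and `error` but no `issues` key.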

### Text Output

```
Markdown Link Lint Results
==================================================
Files checked: 42
Files with issues: 3
Total issues: 5

Issues found:
--------------------------------------------------

docs/api.md:15
  Link: ../README.MD
  Issue: File not found: ../README.MD (found case mismatch: README.md)
  Suggested fix: ../README.md

docs/guide.md:23
  Link: https://example.com/missing
  Issue: External link is broken: HTTP 404
```

## Issue Types

### Internal Broken Links

These are relative file paths that don't resolve:

- **Case mismatches**: `README.MD` when file is `README.md`
- **Missing `.md` extension**: `guide` when file is `guide.md`
- **Extra `.md` extension**: `file.md` when file is `file`
- **File not found**: Path doesn't exist in the repository
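
The near-match checks above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the skill's exact code: the helper name `suggest_fix` is hypothetical, and query strings and anchors are ignored for brevity.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def suggest_fix(md_file: Path, link: str) -> Optional[str]:
    """Return a corrected relative link, or None if no near match exists."""
    base = md_file.parent
    target = base / link
    if target.exists():
        return None  # the link already resolves, nothing to suggest

    # 1. Case mismatch: compare names case-insensitively within the directory
    if target.parent.is_dir():
        for candidate in target.parent.iterdir():
            if candidate.name.lower() == target.name.lower():
                return os.path.relpath(candidate, base)

    # 2. Extra or missing .md extension
    alternative = link[:-3] if link.endswith(".md") else link + ".md"
    if (base / alternative).exists():
        return alternative
    return None


# Demonstration in a throwaway directory: the link "guide" should be "guide.md".
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "index.md").write_text("[Guide](guide)\n", encoding="utf-8")
    (docs / "guide.md").write_text("# Guide\n", encoding="utf-8")
    suggestion = suggest_fix(docs / "index.md", "guide")
    print(suggestion)  # guide.md
```

On a case-insensitive file system a wrongly cased link already resolves, so the case-mismatch branch only fires on case-sensitive file systems such as those typical of Linux CI runners.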

### External Broken Links

These are HTTP/HTTPS URLs that return errors:

- **404 Not Found**: Page doesn't exist
- **403 Forbidden**: Access denied
- **500+ Server Errors**: Server-side issues
- **Timeout**: Server didn't respond in time
- **Network errors**: DNS failures, connection refused, etc.

## Autofix Behavior

When `--autofix` is enabled, the skill will automatically correct:

1. **Case mismatches**: If a link uses wrong case but a case-insensitive match exists
2. **Missing `.md` extension**: If a link is missing `.md` but the file exists with it
3. **Extra `.md` extension**: If a link has `.md` but the file exists without it

The autofix preserves:
- Anchor fragments (e.g., `#section`)
- Query parameters (e.g., `?version=1.0`)

Only inline links of the form `[text](url)` are rewritten. Broken targets in reference definitions (`[ref]: url`) are still reported but are not modified.

**Note**: Autofix modifies files in place. Commit your work to version control or create backups before using this option.
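
One way to keep a query string or anchor intact while swapping in a corrected path is to split the link at the first `?` or `#`. The sketch below illustrates that idea only; `split_link_suffix` and `apply_fix` are hypothetical names, not functions from the skill.

```python
from typing import Tuple


def split_link_suffix(url: str) -> Tuple[str, str]:
    """Split a link into (path, suffix); suffix is any ?query and/or #anchor."""
    cut_points = [i for i in (url.find("?"), url.find("#")) if i >= 0]
    cut = min(cut_points) if cut_points else len(url)
    return url[:cut], url[cut:]


def apply_fix(url: str, fixed_path: str) -> str:
    """Swap in the corrected path while keeping the original suffix."""
    _, suffix = split_link_suffix(url)
    return fixed_path + suffix


print(apply_fix("../README.MD#install", "../README.md"))   # ../README.md#install
print(apply_fix("guide?version=1.0#intro", "guide.md"))    # guide.md?version=1.0#intro
```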

## Link Detection

The skill detects the following link formats:

1. **Standard markdown links**: `[text](url)`
2. **Angle bracket URLs**: `<https://example.com>`
3. **Reference-style links**: `[text][ref]` with `[ref]: url` definitions
4. **Implicit reference links**: `[text][]` using text as reference
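
The four formats can be matched with a handful of regular expressions. The sketch below reuses the patterns defined in `docs_link_lint.py` but is deliberately simplified: the function name `find_links` is hypothetical, and the code-block and inline-code filtering that the skill performs is left out.

```python
import re

INLINE = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')                 # [text](url)
ANGLE = re.compile(r'<(https?://[^>]+)>')                       # <https://...>
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]*)\]')               # [text][ref] or [text][]
REF_DEF = re.compile(r'^\[([^\]]+)\]:\s+(.+)$', re.MULTILINE)   # [ref]: url


def find_links(markdown: str) -> list:
    """Return (line_number, url) pairs for every link found, in line order."""
    # Reference names are compared case-insensitively.
    refs = {m.group(1).lower(): m.group(2).strip() for m in REF_DEF.finditer(markdown)}
    found = []
    for number, line in enumerate(markdown.split("\n"), start=1):
        for m in INLINE.finditer(line):
            found.append((number, m.group(2)))
        for m in ANGLE.finditer(line):
            found.append((number, m.group(1)))
        for m in REF_USE.finditer(line):
            # An empty second bracket means the link text is the reference name.
            key = (m.group(2) or m.group(1)).lower()
            if key in refs:
                found.append((number, refs[key]))
    return found


SAMPLE = "\n".join([
    "See [the guide](guide.md) or <https://example.com>.",
    "Also [API docs][api] and [changelog][].",
    "",
    "[api]: api.md",
    "[changelog]: CHANGELOG.md",
])
links = find_links(SAMPLE)
print(links)
```

Reference-style links are reported at the line where they are used, not where the definition appears, which matches how the issue locations are described above.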

## Excluded Patterns

By default, the following patterns are excluded from scanning:

- `.git/`
- `node_modules/`
- `.venv/` and `venv/`
- `__pycache__/`

A different set can be supplied with the `--exclude` parameter. Note that `--exclude` replaces this default list rather than extending it, so repeat any defaults you still want skipped. Each pattern is compared against whole path components (directory or file names), not treated as a glob.
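
The matching rule can be illustrated with a small sketch modeled on `find_markdown_files` in this commit, which tests `excluded in path.parts` and falls back to the defaults only when no list is passed. The helper name `is_excluded` is hypothetical, and `PurePosixPath` is used so the example does not touch the file system.

```python
from pathlib import PurePosixPath

DEFAULT_EXCLUDES = [".git", "node_modules", ".venv", "venv", "__pycache__"]


def is_excluded(path: str, patterns=None) -> bool:
    """True if any path component equals one of the patterns exactly."""
    active = patterns or DEFAULT_EXCLUDES  # a custom list replaces the defaults
    return any(part in active for part in PurePosixPath(path).parts)


print(is_excluded("node_modules/pkg/README.md"))              # True
print(is_excluded("docs/node_modules_notes.md"))              # False
print(is_excluded("node_modules/pkg/README.md", ["vendor"]))  # False
```

The last call shows the consequence of replacing the defaults: with only `vendor` in the list, `node_modules` is scanned again.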

## Integration Examples

### Use in CI/CD

Add to your CI pipeline to catch broken links:

```yaml
# .github/workflows/docs-lint.yml
name: Documentation Link Check

on: [push, pull_request]

jobs:
  lint-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check documentation links
        run: |
          python skills/docs.lint.links/docs_link_lint.py --no-external
```

### Use with Pre-commit Hook

Add to `.git/hooks/pre-commit`:

```bash
#!/bin/bash
python skills/docs.lint.links/docs_link_lint.py --no-external --output text
if [ $? -ne 0 ]; then
  echo "Documentation has broken links. Please fix before committing."
  exit 1
fi
```

### Use in Documentation Workflow

```yaml
# workflows/documentation.yaml
steps:
  - skill: docs.lint.links
    args:
      - "docs/"
      - "--autofix"
  - skill: docs.lint.links
    args:
      - "docs/"
      - "--output=text"
```

## Performance Considerations

### External Link Checking

Checking external links can be slow because:
- Each link requires an HTTP request
- Some servers may rate-limit or block automated requests
- Network latency and timeouts add up

**Recommendations**:
- Use `--no-external` for fast local checks
- Use `--timeout` to adjust timeout for slow networks
- Run external checks less frequently (e.g., nightly builds)

### Large Repositories

For repositories with many markdown files:
- Use `--exclude` to skip irrelevant directories
- Consider checking specific subdirectories instead of the entire repo
- When `--exclude` is not given, the skill automatically skips common directories like `node_modules`

## Error Handling

The skill returns:
- Exit code `0` if no broken links are found
- Exit code `1` if broken links are found or an error occurs

This makes it suitable for use in CI/CD pipelines and pre-commit hooks.
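
A wrapper script can combine the exit code with the JSON report. The following is a hedged sketch: `run_link_lint` is a hypothetical helper, and it assumes that stdout carries only the JSON report (log output from the Betty logger, if it goes to stdout, would break parsing, hence the fallback to `None`). A stand-in command is used so the example runs without the skill installed.

```python
import json
import subprocess
import sys


def run_link_lint(command: list) -> tuple:
    """Run a linter command; return (passed, parsed_report_or_None)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    try:
        report = json.loads(completed.stdout)
    except json.JSONDecodeError:
        report = None  # e.g. text output, or log lines mixed into stdout
    return completed.returncode == 0, report


# Stand-in command so the sketch runs anywhere. A real check would instead pass
# [sys.executable, "skills/docs.lint.links/docs_link_lint.py", "--no-external"].
fake_command = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'status': 'success', 'issues': []})); sys.exit(0)",
]
passed, report = run_link_lint(fake_command)
print(passed, report)
```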

## Dependencies

_No third-party dependencies_

Apart from the Betty project's own `betty.errors` and `betty.logging_utils` modules, which the script imports, all functionality uses Python standard library modules:
- `re` - Regular expression matching for link extraction
- `urllib` - HTTP requests for external link checking
- `pathlib` - File system operations
- `json` - JSON output formatting

## Tags

`documentation`, `linting`, `validation`, `links`, `markdown`

## See Also

- [Betty Architecture](../../docs/betty-architecture.md) - Five-layer model
- [Skills Framework](../../docs/skills-framework.md) - Betty skills framework
- [generate.docs](../generate.docs/SKILL.md) - Generate documentation from manifests

## Version

**0.1.0** - Initial implementation with link validation and autofix support
1
skills/docs.lint.links/__init__.py
Normal file
@@ -0,0 +1 @@
# Auto-generated package initializer for skills.
609
skills/docs.lint.links/docs_link_lint.py
Executable file
@@ -0,0 +1,609 @@
#!/usr/bin/env python3
"""
docs_link_lint.py - Implementation of the docs.lint.links Skill.

Validates Markdown links to detect broken internal or external links.
"""

import json
import os
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Ensure project root on path for betty imports when executed directly

from betty.errors import BettyError  # noqa: E402
from betty.logging_utils import setup_logger  # noqa: E402

logger = setup_logger(__name__)

# Regex patterns for finding links in markdown
# Matches [text](url) format
MARKDOWN_LINK_PATTERN = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')
# Matches <url> format
ANGLE_LINK_PATTERN = re.compile(r'<(https?://[^>]+)>')
# Matches reference-style links [text][ref]
REFERENCE_LINK_PATTERN = re.compile(r'\[([^\]]+)\]\[([^\]]*)\]')
# Matches reference definitions [ref]: url
REFERENCE_DEF_PATTERN = re.compile(r'^\[([^\]]+)\]:\s+(.+)$', re.MULTILINE)


class LinkIssue:
    """Represents a broken or problematic link."""

    def __init__(
        self,
        file: str,
        line: int,
        link: str,
        issue_type: str,
        message: str,
        suggested_fix: Optional[str] = None
    ):
        self.file = file
        self.line = line
        self.link = link
        self.issue_type = issue_type
        self.message = message
        self.suggested_fix = suggested_fix

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for JSON output."""
        result = {
            "file": self.file,
            "line": self.line,
            "link": self.link,
            "issue_type": self.issue_type,
            "message": self.message
        }
        if self.suggested_fix:
            result["suggested_fix"] = self.suggested_fix
        return result


def find_markdown_files(root_dir: str, exclude_patterns: Optional[List[str]] = None) -> List[Path]:
    """
    Find all .md files in the directory tree.

    Args:
        root_dir: Root directory to search
        exclude_patterns: List of path component names to exclude
            (e.g., 'node_modules', '.git'). If given, this list replaces
            the defaults rather than extending them.

    Returns:
        List of Path objects for markdown files
    """
    exclude_patterns = exclude_patterns or ['.git', 'node_modules', '.venv', 'venv', '__pycache__']
    md_files = []

    root_path = Path(root_dir).resolve()

    for path in root_path.rglob('*.md'):
        # Skip files with any path component that exactly matches an excluded name
        if any(excluded in path.parts for excluded in exclude_patterns):
            continue
        md_files.append(path)

    logger.info(f"Found {len(md_files)} markdown files")
    return md_files


def is_in_code_block(line: str) -> bool:
    """
    Placeholder check for lines whose inline code might contain false positive links.

    Currently always returns False: inline code spans are stripped in
    extract_links_from_markdown instead, so no line is skipped here.

    Args:
        line: Line to check

    Returns:
        True if we should skip this line for link extraction
    """
    # Count backticks - if odd number, we're likely inside inline code
    # This is a simple heuristic (the count is not used yet)
    backtick_count = line.count('`')

    # If we have backticks, we need to be more careful
    # For simplicity, we'll extract the content outside of backticks
    return False  # We'll handle this differently


def extract_links_from_markdown(content: str) -> List[Tuple[int, str, str]]:
    """
    Extract all links from markdown content.

    Args:
        content: Markdown file content

    Returns:
        List of tuples: (line_number, link_text, link_url)
    """
    lines = content.split('\n')
    links = []

    # First, extract reference definitions
    references = {}
    for match in REFERENCE_DEF_PATTERN.finditer(content):
        ref_name = match.group(1).lower()
        ref_url = match.group(2).strip()
        references[ref_name] = ref_url

    # Track if we're in a code block
    in_code_block = False

    # Process each line
    for line_num, line in enumerate(lines, start=1):
        # Check for code block delimiters
        if line.strip().startswith('```'):
            in_code_block = not in_code_block
            continue

        # Skip lines inside code blocks
        if in_code_block:
            continue

        # Remove inline code spans from the line before processing
        # This prevents false positives from code examples
        processed_line = re.sub(r'`[^`]+`', '', line)

        # Find standard markdown links [text](url)
        # (matches come from the filtered line, so inline code is already excluded)
        for match in MARKDOWN_LINK_PATTERN.finditer(processed_line):
            text = match.group(1)
            url = match.group(2)
            links.append((line_num, text, url))

        # Find angle bracket links <url>
        for match in ANGLE_LINK_PATTERN.finditer(processed_line):
            url = match.group(1)
            links.append((line_num, url, url))

        # Find reference-style links [text][ref] or [text][]
        for match in REFERENCE_LINK_PATTERN.finditer(processed_line):
            text = match.group(1)
            ref = match.group(2) if match.group(2) else text
            ref_lower = ref.lower()
            if ref_lower in references:
                url = references[ref_lower]
                links.append((line_num, text, url))

    return links


def is_external_link(url: str) -> bool:
    """Check if a URL is external (http/https)."""
    return url.startswith('http://') or url.startswith('https://')


def check_external_link(url: str, timeout: int = 10) -> Optional[str]:
    """
    Check if an external URL is accessible.

    Args:
        url: URL to check
        timeout: Timeout in seconds

    Returns:
        Error message if link is broken, None if OK
    """
    try:
        # Create request with a user agent to avoid 403s from some sites
        req = Request(
            url,
            headers={
                'User-Agent': 'Betty/1.0 (Link Checker)',
                'Accept': '*/*'
            }
        )

        with urlopen(req, timeout=timeout) as response:
            if response.status >= 400:
                return f"HTTP {response.status}"
            return None

    except HTTPError as e:
        return f"HTTP {e.code}"
    except URLError as e:
        return f"URL Error: {e.reason}"
    except Exception as e:
        return f"Error: {str(e)}"


def resolve_relative_path(md_file_path: Path, relative_url: str) -> Path:
    """
    Resolve a relative URL from a markdown file.

    Args:
        md_file_path: Path to the markdown file containing the link
        relative_url: Relative URL/path from the link

    Returns:
        Resolved absolute path
    """
    # Remove anchor/hash fragment
    url_without_anchor = relative_url.split('#')[0]

    if not url_without_anchor:
        # Just an anchor to current file
        return md_file_path

    # Resolve relative to the markdown file's directory
    base_dir = md_file_path.parent
    resolved = (base_dir / url_without_anchor).resolve()

    return resolved


def check_internal_link(
    md_file_path: Path,
    relative_url: str,
    root_dir: Path
) -> Tuple[Optional[str], Optional[str]]:
    """
    Check if an internal link is valid.

    Args:
        md_file_path: Path to the markdown file containing the link
        relative_url: Relative URL from the link
        root_dir: Repository root directory

    Returns:
        Tuple of (error_message, suggested_fix)
    """
    # Remove query string and anchor
    clean_url = relative_url.split('?')[0].split('#')[0]

    if not clean_url:
        # Just an anchor or query, assume valid
        return None, None

    resolved = resolve_relative_path(md_file_path, clean_url)

    # Check if file exists
    if resolved.exists():
        return None, None

    # File doesn't exist - try to suggest fixes
    error_msg = f"File not found: {relative_url}"
    suggested_fix = None

    # Try case-insensitive match
    if resolved.parent.exists():
        for file in resolved.parent.iterdir():
            if file.name.lower() == resolved.name.lower():
                relative_to_md = os.path.relpath(file, md_file_path.parent)
                suggested_fix = relative_to_md
                error_msg += f" (found case mismatch: {file.name})"
                break

    # Try without .md extension if it has one
    if not suggested_fix and clean_url.endswith('.md'):
        url_without_ext = clean_url[:-3]
        resolved_without_ext = resolve_relative_path(md_file_path, url_without_ext)
        if resolved_without_ext.exists():
            relative_to_md = os.path.relpath(resolved_without_ext, md_file_path.parent)
            suggested_fix = relative_to_md
            error_msg += " (file exists without .md extension)"

    # Try adding .md extension if it doesn't have one
    if not suggested_fix and not clean_url.endswith('.md'):
        url_with_ext = clean_url + '.md'
        resolved_with_ext = resolve_relative_path(md_file_path, url_with_ext)
        if resolved_with_ext.exists():
            relative_to_md = os.path.relpath(resolved_with_ext, md_file_path.parent)
            suggested_fix = relative_to_md
            error_msg += " (file exists with .md extension)"

    return error_msg, suggested_fix


def lint_markdown_file(
    md_file: Path,
    root_dir: Path,
    check_external: bool = True,
    external_timeout: int = 10
) -> List[LinkIssue]:
    """
    Lint a single markdown file for broken links.

    Args:
        md_file: Path to markdown file
        root_dir: Repository root directory
        check_external: Whether to check external links
        external_timeout: Timeout for external link checks

    Returns:
        List of LinkIssue objects
    """
    issues = []

    try:
        content = md_file.read_text(encoding='utf-8')
    except Exception as e:
        logger.warning(f"Could not read {md_file}: {e}")
        return issues

    links = extract_links_from_markdown(content)

    for line_num, link_text, url in links:
        # Skip empty URLs
        if not url or url.strip() == '':
            continue

        # Skip mailto and other special schemes
        if url.startswith('mailto:') or url.startswith('tel:'):
            continue

        relative_path = os.path.relpath(md_file, root_dir)

        if is_external_link(url):
            if check_external:
                logger.debug(f"Checking external link: {url}")
                error = check_external_link(url, timeout=external_timeout)
                if error:
                    issues.append(LinkIssue(
                        file=relative_path,
                        line=line_num,
                        link=url,
                        issue_type="external_broken",
                        message=f"External link is broken: {error}"
                    ))
        else:
            # Internal link
            logger.debug(f"Checking internal link: {url}")
            error, suggested_fix = check_internal_link(md_file, url, root_dir)
            if error:
                issues.append(LinkIssue(
                    file=relative_path,
                    line=line_num,
                    link=url,
                    issue_type="internal_broken",
                    message=error,
                    suggested_fix=suggested_fix
                ))

    return issues


def autofix_markdown_file(
    md_file: Path,
    root_dir: Path
) -> Tuple[int, List[str]]:
    """
    Automatically fix common link issues in a markdown file.

    Args:
        md_file: Path to markdown file
        root_dir: Repository root directory

    Returns:
        Tuple of (number_of_fixes, list_of_fix_descriptions)
    """
    try:
        content = md_file.read_text(encoding='utf-8')
    except Exception as e:
        logger.warning(f"Could not read {md_file}: {e}")
        return 0, []

    original_content = content
    links = extract_links_from_markdown(content)
    fixes = []
    fix_count = 0

    for line_num, link_text, url in links:
        if is_external_link(url):
            continue

        # Check if internal link is broken
        error, suggested_fix = check_internal_link(md_file, url, root_dir)

        if error and suggested_fix:
            # Preserve any query string and/or anchor that follows the path
            path_end = min(
                (i for i in (url.find('?'), url.find('#')) if i >= 0),
                default=len(url)
            )
            new_url = suggested_fix + url[path_end:]

            # Replace in content; only inline links of the form ](url) are
            # rewritten, and a fix is counted only if something was replaced
            old_target = f']({url})'
            if old_target not in content:
                continue
            content = content.replace(old_target, f']({new_url})')
            fix_count += 1
            fixes.append(f"Line {line_num}: {url} -> {new_url}")

    # Write back if changes were made
    if fix_count > 0:
        try:
            md_file.write_text(content, encoding='utf-8')
            logger.info(f"Applied {fix_count} fixes to {md_file}")
        except Exception as e:
            logger.error(f"Could not write fixes to {md_file}: {e}")
            return 0, []

    return fix_count, fixes


def lint_all_markdown(
    root_dir: str,
    check_external: bool = True,
    autofix: bool = False,
    external_timeout: int = 10,
    exclude_patterns: Optional[List[str]] = None
) -> Dict[str, Any]:
    """
    Lint all markdown files in a directory.

    Args:
        root_dir: Root directory to search
        check_external: Whether to check external links (can be slow)
        autofix: Whether to automatically fix common issues
        external_timeout: Timeout for external link checks
        exclude_patterns: Patterns to exclude from search

    Returns:
        Result dictionary with issues and statistics
    """
    root_path = Path(root_dir).resolve()
    md_files = find_markdown_files(root_dir, exclude_patterns)

    all_issues = []
    all_fixes = []
    files_checked = 0
    files_with_issues = 0
    total_fixes = 0

    for md_file in md_files:
        files_checked += 1

        if autofix:
            fix_count, fixes = autofix_markdown_file(md_file, root_path)
            total_fixes += fix_count
            if fixes:
                relative_path = os.path.relpath(md_file, root_path)
                all_fixes.append({
                    "file": relative_path,
                    "fixes": fixes
                })

        # Check for issues (after autofix if enabled)
        issues = lint_markdown_file(
            md_file,
            root_path,
            check_external=check_external,
            external_timeout=external_timeout
        )

        if issues:
            files_with_issues += 1
            all_issues.extend(issues)

    result = {
        "status": "success",
        "summary": {
            "files_checked": files_checked,
            "files_with_issues": files_with_issues,
            "total_issues": len(all_issues),
            "autofix_enabled": autofix,
            "total_fixes_applied": total_fixes
        },
        "issues": [issue.to_dict() for issue in all_issues]
    }

    if autofix and all_fixes:
        result["fixes"] = all_fixes

    return result


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point for CLI execution."""
    import argparse

    parser = argparse.ArgumentParser(
        description="Lint Markdown files to detect broken internal or external links"
    )
    parser.add_argument(
        "root_dir",
        nargs='?',
        default='.',
        help="Root directory to search for Markdown files (default: current directory)"
    )
    parser.add_argument(
        "--no-external",
        action="store_true",
        help="Skip checking external links (faster)"
    )
    parser.add_argument(
        "--autofix",
        action="store_true",
        help="Automatically fix common issues (case, .md extension)"
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=10,
        help="Timeout for external link checks in seconds (default: 10)"
    )
    parser.add_argument(
        "--exclude",
        type=str,
        help="Comma-separated list of patterns to exclude (e.g., 'node_modules,.git')"
    )
    parser.add_argument(
        "--output",
        type=str,
        choices=['json', 'text'],
        default='json',
        help="Output format (default: json)"
    )

    args = parser.parse_args(argv)

    exclude_patterns = None
    if args.exclude:
        exclude_patterns = [p.strip() for p in args.exclude.split(',')]

    try:
        result = lint_all_markdown(
            root_dir=args.root_dir,
            check_external=not args.no_external,
            autofix=args.autofix,
            external_timeout=args.timeout,
            exclude_patterns=exclude_patterns
        )

        if args.output == 'json':
            print(json.dumps(result, indent=2))
        else:
            # Text output
            summary = result['summary']
            print("Markdown Link Lint Results")
            print("=" * 50)
            print(f"Files checked: {summary['files_checked']}")
            print(f"Files with issues: {summary['files_with_issues']}")
            print(f"Total issues: {summary['total_issues']}")

            if summary['autofix_enabled']:
                print(f"Fixes applied: {summary['total_fixes_applied']}")

            if result['issues']:
                print("\nIssues found:")
                print("-" * 50)
                for issue in result['issues']:
                    print(f"\n{issue['file']}:{issue['line']}")
                    print(f"  Link: {issue['link']}")
                    print(f"  Issue: {issue['message']}")
                    if issue.get('suggested_fix'):
                        print(f"  Suggested fix: {issue['suggested_fix']}")
            else:
                print("\n✓ No issues found!")

        # Return non-zero if issues found
        return 1 if result['issues'] else 0

    except BettyError as e:
        logger.error(f"Linting failed: {e}")
        result = {
            "status": "error",
            "error": str(e)
        }
        print(json.dumps(result, indent=2))
        return 1
    except Exception as e:
        logger.exception("Unexpected error during linting")
        result = {
            "status": "error",
            "error": str(e)
        }
        print(json.dumps(result, indent=2))
        return 1


if __name__ == "__main__":
    sys.exit(main())
97
skills/docs.lint.links/skill.yaml
Normal file
@@ -0,0 +1,97 @@
name: docs.lint.links
version: 0.1.0
description: >
  Validates Markdown links to detect broken internal or external links,
  with optional autofix mode to correct common issues.

inputs:
  - name: root_dir
    type: string
    required: false
    default: "."
    description: "Root directory to search for Markdown files (default: current directory)"

  - name: no_external
    type: boolean
    required: false
    default: false
    description: "Skip checking external links (faster)"

  - name: autofix
    type: boolean
    required: false
    default: false
    description: "Automatically fix common issues (case mismatches, .md extension issues)"

  - name: timeout
    type: integer
    required: false
    default: 10
    description: "Timeout for external link checks in seconds"

  - name: exclude
    type: string
    required: false
    description: "Comma-separated list of patterns to exclude (e.g., 'node_modules,.git')"

  - name: output
    type: string
    required: false
    default: "json"
    description: "Output format (json or text)"

outputs:
  - name: lint_results
    type: object
    description: "JSON object containing link validation results with issues and statistics"

  - name: issues
    type: array
    description: "Array of link issues found, each with file, line, link, issue type, and suggested fix"

  - name: summary
    type: object
    description: "Summary statistics including files checked, issues found, and fixes applied"

dependencies: []

status: active

entrypoints:
  - command: /docs/lint/links
    handler: docs_link_lint.py
    runtime: python
    description: >
      Scan all Markdown files and detect broken internal or external links.
    parameters:
      - name: root_dir
        type: string
        required: false
        description: "Root directory to search (default: current directory)"
      - name: no_external
        type: boolean
        required: false
        description: "Skip checking external links"
      - name: autofix
        type: boolean
        required: false
        description: "Automatically fix common issues"
      - name: timeout
        type: integer
        required: false
        description: "Timeout for external link checks in seconds"
      - name: exclude
        type: string
        required: false
        description: "Comma-separated exclusion patterns"
      - name: output
        type: string
        required: false
        description: "Output format (json or text)"

    permissions:
      - filesystem:read
      - filesystem:write
      - network

tags: [documentation, linting, validation, links, markdown]