Initial commit

Zhongwei Li
2025-11-29 18:01:30 +08:00
commit 9c0b92f025
39 changed files with 9512 additions and 0 deletions

project-migrate/SKILL.md Normal file

@@ -0,0 +1,105 @@
---
name: project-migrate
description: Use this skill to migrate existing projects to the SynthesisFlow structure. It uses an AI-powered analysis to intelligently discover, categorize, and migrate documentation, generate rich frontmatter, and preserve git history.
---
# Project Migrate Skill
## Purpose
To intelligently migrate existing projects (brownfield) to the SynthesisFlow directory structure using a powerful, AI-assisted workflow. This skill goes beyond simple file moving by leveraging the **Gemini CLI** to analyze document content, ensuring accurate categorization and the generation of rich, meaningful metadata. It provides a safe, guided migration with discovery, analysis, backup, and validation phases designed to minimize the risk of data loss and produce high-quality results.
## When to Use
Use this skill in the following situations:
- Adding SynthesisFlow to an existing project with established documentation.
- Migrating docs from an ad-hoc structure to SynthesisFlow conventions.
- When you want to automatically and intelligently categorize and add metadata to existing documents.
- To ensure a safe migration with backups and rollback capabilities.
## Prerequisites
- Project with existing documentation (`docs/`, `documentation/`, `wiki/`, or markdown files).
- Git repository initialized.
- Write permissions to the project directory.
- `gemini` CLI tool installed and authenticated.
- `doc-indexer` skill available for final compliance checking.
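To verify these prerequisites up front, a minimal preflight sketch (the directory names are assumptions; adjust to your project):
```bash
# Hedged preflight check: git repo, gemini CLI, write access, docs location
git rev-parse --is-inside-work-tree >/dev/null || { echo "error: not a git repository"; exit 1; }
command -v gemini >/dev/null || { echo "error: gemini CLI not found on PATH"; exit 1; }
[ -w . ] || { echo "error: project directory is not writable"; exit 1; }
ls -d docs documentation wiki 2>/dev/null || echo "note: no conventional docs directory found"
```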
## Workflow
The skill guides you through a series of phases with interactive approval.
### Step 1: Run the Migration Script
Execute the script in one of three modes:
**Interactive (default)** - Review and approve each phase:
```bash
bash scripts/project-migrate.sh
```
**Dry-run** - Preview the plan without making any changes:
```bash
bash scripts/project-migrate.sh --dry-run
```
**Auto-approve** - Skip prompts for automation (useful for CI/CD):
```bash
bash scripts/project-migrate.sh --auto-approve
```
### Step 2: Review Each Phase
**Phase 1 & 2 - AI-Powered Discovery and Analysis**:
The script scans for all markdown files. For each file, it calls the **Gemini CLI** to analyze the document's *content*, not just its filename. This results in a much more accurate categorization of files into types like `spec`, `proposal`, `adr`, etc. The output is a detailed plan mapping each file to its new, correct location in the SynthesisFlow structure.
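The exact prompt is internal to the script, but each per-file analysis call is conceptually similar to this sketch (`docs/old-notes.md` is a hypothetical input; the model flag mirrors the one used by the bundled link-correction helper):
```bash
# Conceptual sketch of a per-file analysis call, not the script's literal prompt
{
  echo "Classify this document as one of: spec, proposal, adr. Suggest a target path."
  cat docs/old-notes.md
} | gemini --model gemini-2.5-flash
```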
**Phase 3 - Planning**:
Shows you the complete, AI-driven migration plan for your approval. You can review source and target mappings before any files are moved.
**Phase 4 - Backup**:
Creates a timestamped backup directory of your entire `docs/` folder and includes a `rollback.sh` script before any changes are made.
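Conceptually, the backup phase amounts to something like the following (a sketch; the script's actual naming scheme may differ):
```bash
# Timestamped copy of docs/ plus a self-contained rollback script
backup_dir="docs-backup-$(date +%Y%m%d-%H%M%S)"
cp -r docs "$backup_dir"
cat > "$backup_dir/rollback.sh" <<EOF
#!/usr/bin/env bash
# Restore docs/ from this backup
rm -rf docs
cp -r "$backup_dir" docs
rm -f docs/rollback.sh
EOF
chmod +x "$backup_dir/rollback.sh"
```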
**Phase 5 - Migration**:
Executes the plan, moving files using `git mv` to preserve history and creating the necessary directory structure.
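Each plan entry is executed roughly like this (paths are illustrative):
```bash
# Create the target directory, then move the file with its git history intact
mkdir -p docs/specs
git mv docs/api-design.md docs/specs/api-design.md
```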
**Phase 6 - LLM-Based Link Updates**:
Uses the Gemini CLI to intelligently identify and correct broken or outdated relative links within migrated files. This LLM-based approach is more robust than simple path recalculation, as it understands document context and can handle edge cases that pattern matching might miss.
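A sketch of how this phase can be driven per file with the bundled helper (the `scripts/correct_links_llm.py` path is an assumption about where the helper lives in your checkout):
```bash
# Pipe each migrated file through the LLM link corrector and write it back in place
find docs -name '*.md' | while read -r f; do
  tmp=$(mktemp)
  python3 scripts/correct_links_llm.py --file "$f" < "$f" > "$tmp" && mv "$tmp" "$f"
done
```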
**Phase 7 - Validation**:
Verifies that all files were migrated correctly, checks link integrity, and validates the new directory structure.
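The script performs this validation itself; for a rough manual spot check you can list the non-external link targets and eyeball them:
```bash
# List markdown link targets, excluding external URLs, for manual review
grep -rnoE '\]\([^)]+\)' --include='*.md' docs | grep -v 'http'
```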
**Phase 8 - AI-Powered Frontmatter Generation (Optional)**:
For files that lack YAML frontmatter, the script uses the **Gemini CLI** to read the file content and generate rich, `doc-indexer` compliant frontmatter. This includes a suggested `title`, the `type` determined during the analysis phase, and a concise `description` summarizing the document's purpose.
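The generated frontmatter looks along these lines (an illustrative example; the field values are hypothetical):
```bash
cat <<'EOF'
---
title: API Design Overview
type: spec
description: Summarizes the public API surface and the conventions shared across endpoints.
---
EOF
```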
### Step 3: Post-Migration
After successful completion:
- Review the validation report for any warnings.
- Run the `doc-indexer` skill to verify full documentation compliance.
- Commit the migration changes to git (a minimal example follows).
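For the final step, a minimal commit might look like this (the message is just an example):
```bash
git add -A
git commit -m "docs: migrate documentation to SynthesisFlow structure"
```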
## Error Handling
### Gemini CLI Issues
**Symptom**: The script fails during the "Analysis" or "Frontmatter Generation" phase with an error related to the `gemini` command.
**Solution**:
- Ensure the `gemini` CLI is installed and in your system's PATH.
- Verify you are authenticated by running `gemini auth`.
- Check for Gemini API outages or network connectivity issues.
- The script has basic fallbacks, but for best results, ensure the Gemini CLI is functional; a quick check is sketched below.
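A quick functional check of the CLI (the model name mirrors the one used by the bundled scripts):
```bash
# Confirm the binary is on PATH and that a trivial prompt round-trips
command -v gemini || echo "gemini CLI not on PATH"
echo "Reply with the single word: pong" | gemini --model gemini-2.5-flash
```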
### Other Issues
For issues related to permissions, conflicts, or broken links, the script provides detailed error messages and resolution suggestions during its interactive execution. The backup and rollback script is always available for a safe exit.
## Notes
- **AI-Enhanced**: Uses Gemini for intelligent content analysis, not just simple pattern matching.
- **Safe by default**: Creates a full backup with a rollback script before making any changes.
- **Git-aware**: Preserves file history using `git mv`.
- **Interactive**: You review and approve the AI-generated plan before execution.
- **Rich Metadata**: Generates high-quality frontmatter, including titles and descriptions.
- **LLM-Powered Link Correction**: Uses Gemini to intelligently update relative links with context awareness.

project-migrate/scripts/correct_links_llm.py Normal file

@@ -0,0 +1,309 @@
#!/usr/bin/env python3
"""
LLM-based link correction for the project-migrate skill.

This script uses an LLM to intelligently identify and correct broken or outdated
links within markdown content during file migration.
"""

import sys
import re
import argparse
import subprocess
from pathlib import Path
from typing import List, Dict, Tuple

def extract_markdown_links(content: str) -> List[Dict]:
    """
    Extract all markdown links from content and return structured information.

    Args:
        content: Markdown content to analyze

    Returns:
        List of dictionaries with link information
    """
    links = []
    # Pattern to match markdown links [text](path) and images ![alt](path)
    pattern = r'!\[([^\]]*)\]\(([^)]+)\)|\[([^\]]*)\]\(([^)]+)\)'
    for match in re.finditer(pattern, content):
        alt_text, img_src, link_text, link_href = match.groups()
        if img_src:
            # Image link
            links.append({
                'type': 'image',
                'alt': alt_text,
                'path': img_src,
                'full_match': match.group(0)
            })
        elif link_href:
            # Regular link
            links.append({
                'type': 'link',
                'text': link_text,
                'path': link_href,
                'full_match': match.group(0)
            })
    return links

def should_skip_link(link_path: str) -> bool:
    """
    Determine if a link should be skipped (external URLs, anchors, etc.).

    Args:
        link_path: The path part of the link

    Returns:
        True if the link should be skipped
    """
    # Skip absolute URLs
    if link_path.startswith(('http://', 'https://', 'mailto:', 'ftp://', 'tel:')):
        return True
    # Skip in-page anchor links
    if link_path.startswith('#'):
        return True
    # Skip bare email addresses written without a mailto: prefix
    if '@' in link_path and not link_path.startswith(('http://', 'https://')):
        return True
    return False

def get_file_context(file_path: str) -> Dict:
    """
    Get context about the file being processed.

    Args:
        file_path: Path to the file

    Returns:
        Dictionary with file context information
    """
    path = Path(file_path)
    try:
        relative_to_root = str(path.relative_to(Path.cwd()))
    except ValueError:
        # The file is not under the current working directory
        relative_to_root = str(path)
    return {
        'file_path': str(path.absolute()),
        'filename': path.name,
        'directory': str(path.parent.absolute()),
        'relative_to_root': relative_to_root,
    }

def call_llm_for_link_correction(content: str, context: Dict) -> str:
    """
    Call an LLM to perform intelligent link correction.

    Args:
        content: Original markdown content
        context: File context information

    Returns:
        Corrected markdown content (or the original content on failure)
    """
    try:
        # Prepare the prompt for the LLM
        prompt = f"""You are a markdown link correction assistant. Your task is to identify and correct broken or outdated relative links in the following markdown content.

Context:
- File: {context['relative_to_root']}
- Directory: {context['directory']}

Instructions:
1. Analyze all relative links in the content
2. For each link, determine if it points to an existing file
3. If a link appears broken or outdated, suggest a corrected path
4. Common migrations to consider:
   - Files moved from the repository root to the docs/ directory
   - Files moved from docs/ to docs/specs/ or docs/changes/
   - Changes in file extensions or naming conventions
5. Preserve all external URLs, anchors, and email links unchanged
6. Only modify links that clearly need correction

Return ONLY the corrected markdown content without any additional explanation.

Content to analyze:
{content}"""
        # Call the Gemini CLI if available; otherwise fall through to the no-op fallback
        try:
            result = subprocess.run(
                ['gemini', '--model', 'gemini-2.5-flash'],
                input=prompt,
                capture_output=True,
                text=True,
                timeout=30
            )
            if result.returncode == 0 and result.stdout.strip():
                return result.stdout.strip()
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # Gemini not available or timed out - fall back to basic processing
            pass
    except Exception as e:
        print(f"Warning: LLM call failed: {e}", file=sys.stderr)
    # Fallback: return the original content unchanged
    return content

def validate_corrected_links(original: str, corrected: str) -> Dict[str, int]:
    """
    Compare original and corrected content to count changes.

    Args:
        original: Original markdown content
        corrected: Corrected markdown content

    Returns:
        Dictionary with change statistics
    """
    original_links = extract_markdown_links(original)
    corrected_links = extract_markdown_links(corrected)
    original_paths = {link['path'] for link in original_links if not should_skip_link(link['path'])}
    corrected_paths = {link['path'] for link in corrected_links if not should_skip_link(link['path'])}
    return {
        'total_links': len(original_links),
        'skipped_links': len([link for link in original_links if should_skip_link(link['path'])]),
        'corrected_links': len(original_paths - corrected_paths),
        'new_links': len(corrected_paths - original_paths)
    }

def correct_links_in_content(content: str, file_path: str) -> Tuple[str, Dict]:
    """
    Correct links in markdown content using an LLM.

    Args:
        content: Markdown content to process
        file_path: Path to the file being processed

    Returns:
        Tuple of (corrected_content, statistics)
    """
    # Extract links for analysis
    links = extract_markdown_links(content)
    # Filter for links that need processing
    processable_links = [link for link in links if not should_skip_link(link['path'])]
    if not processable_links:
        # No links to process
        return content, {
            'total_links': len(links),
            'processable_links': 0,
            'corrected_links': 0,
            'llm_called': False
        }
    # Get file context
    context = get_file_context(file_path)
    # Call the LLM for correction
    corrected_content = call_llm_for_link_correction(content, context)
    # Validate changes
    changes = validate_corrected_links(content, corrected_content)
    changes.update({
        'processable_links': len(processable_links),
        'llm_called': True
    })
    return corrected_content, changes

def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description='LLM-based markdown link correction',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Correct links in a file
  cat README.md | correct_links_llm.py --file README.md

  # Process multiple files
  find docs -name "*.md" -exec correct_links_llm.py --file {} \\;

  # Show statistics only
  cat file.md | correct_links_llm.py --file file.md --stats-only
"""
    )
    parser.add_argument(
        '--file',
        required=True,
        help='Path to the file being processed (required for context)'
    )
    parser.add_argument(
        '--stats-only',
        action='store_true',
        help='Only show statistics, don\'t output corrected content'
    )
    parser.add_argument(
        '--dry-run',
        action='store_true',
        help='Analyze and report statistics without emitting corrected content'
    )
    args = parser.parse_args()
    try:
        # Read content from stdin
        content = sys.stdin.read()
        if not content.strip():
            print("Error: No content provided on stdin", file=sys.stderr)
            sys.exit(1)
        # Correct links
        corrected_content, stats = correct_links_in_content(content, args.file)
        # Output statistics
        if stats['llm_called']:
            print(f"Link correction statistics for {args.file}:", file=sys.stderr)
            print(f"  Total links: {stats['total_links']}", file=sys.stderr)
            print(f"  Processable links: {stats['processable_links']}", file=sys.stderr)
            print(f"  Corrected links: {stats['corrected_links']}", file=sys.stderr)
            print(f"  Skipped links: {stats['skipped_links']}", file=sys.stderr)
        else:
            print(f"No links to process in {args.file}", file=sys.stderr)
        # Output corrected content (suppressed by --stats-only and --dry-run)
        if not args.stats_only and not args.dry_run:
            print(corrected_content)
    except KeyboardInterrupt:
        print("\nInterrupted by user", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
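For a cautious first run against a single file, you can review the LLM's proposed changes as a diff before applying them (paths are illustrative):
```bash
python3 correct_links_llm.py --file docs/guide.md < docs/guide.md > /tmp/guide.corrected.md
diff -u docs/guide.md /tmp/guide.corrected.md
```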

File diff suppressed because it is too large