Initial commit

Zhongwei Li
2025-11-29 18:01:30 +08:00
commit 9c0b92f025
39 changed files with 9512 additions and 0 deletions

project-migrate/SKILL.md Normal file

@@ -0,0 +1,105 @@
---
name: project-migrate
description: Use this skill to migrate existing projects to the SynthesisFlow structure. It uses an AI-powered analysis to intelligently discover, categorize, and migrate documentation, generate rich frontmatter, and preserve git history.
---
# Project Migrate Skill
## Purpose
To intelligently migrate existing projects (brownfield) to the SynthesisFlow directory structure using a powerful, AI-assisted workflow. This skill goes beyond simple file moving by leveraging the **Gemini CLI** to analyze document content, ensuring accurate categorization and the generation of rich, meaningful metadata. It provides a safe, guided migration with discovery, analysis, backup, and validation phases designed to minimize the risk of data loss and produce high-quality results.
## When to Use
Use this skill in the following situations:
- Adding SynthesisFlow to an existing project with established documentation.
- Migrating docs from an ad-hoc structure to SynthesisFlow conventions.
- When you want to automatically and intelligently categorize and add metadata to existing documents.
- To ensure a safe migration with backups and rollback capabilities.
## Prerequisites
- Project with existing documentation (`docs/`, `documentation/`, `wiki/`, or markdown files).
- Git repository initialized.
- Write permissions to the project directory.
- `gemini` CLI tool installed and authenticated.
- `doc-indexer` skill available for final compliance checking.
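To verify these prerequisites up front, a minimal preflight sketch (the directory names are assumptions; adjust to your project):
```bash
# Hedged preflight check: git repo, gemini CLI, write access, docs location
git rev-parse --is-inside-work-tree >/dev/null || { echo "error: not a git repository"; exit 1; }
command -v gemini >/dev/null || { echo "error: gemini CLI not found on PATH"; exit 1; }
[ -w . ] || { echo "error: project directory is not writable"; exit 1; }
ls -d docs documentation wiki 2>/dev/null || echo "note: no conventional docs directory found"
```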
## Workflow
The skill guides you through a series of phases with interactive approval.
### Step 1: Run the Migration Script
Execute the script in one of three modes:
**Interactive (default)** - Review and approve each phase:
```bash
bash scripts/project-migrate.sh
```
**Dry-run** - Preview the plan without making any changes:
```bash
bash scripts/project-migrate.sh --dry-run
```
**Auto-approve** - Skip prompts for automation (useful for CI/CD):
```bash
bash scripts/project-migrate.sh --auto-approve
```
### Step 2: Review Each Phase
**Phase 1 & 2 - AI-Powered Discovery and Analysis**:
The script scans for all markdown files. For each file, it calls the **Gemini CLI** to analyze the document's *content*, not just its filename. This results in a much more accurate categorization of files into types like `spec`, `proposal`, `adr`, etc. The output is a detailed plan mapping each file to its new, correct location in the SynthesisFlow structure.
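The exact prompt is internal to the script, but each per-file analysis call is conceptually similar to this sketch (`docs/old-notes.md` is a hypothetical input; the model flag mirrors the one used by the bundled link-correction helper):
```bash
# Conceptual sketch of a per-file analysis call, not the script's literal prompt
{
  echo "Classify this document as one of: spec, proposal, adr. Suggest a target path."
  cat docs/old-notes.md
} | gemini --model gemini-2.5-flash
```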
**Phase 3 - Planning**:
Shows you the complete, AI-driven migration plan for your approval. You can review source and target mappings before any files are moved.
**Phase 4 - Backup**:
Creates a timestamped backup directory of your entire `docs/` folder and includes a `rollback.sh` script before any changes are made.
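Conceptually, the backup phase amounts to something like the following (a sketch; the script's actual naming scheme may differ):
```bash
# Timestamped copy of docs/ plus a self-contained rollback script
backup_dir="docs-backup-$(date +%Y%m%d-%H%M%S)"
cp -r docs "$backup_dir"
cat > "$backup_dir/rollback.sh" <<EOF
#!/usr/bin/env bash
# Restore docs/ from this backup
rm -rf docs
cp -r "$backup_dir" docs
rm -f docs/rollback.sh
EOF
chmod +x "$backup_dir/rollback.sh"
```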
**Phase 5 - Migration**:
Executes the plan, moving files using `git mv` to preserve history and creating the necessary directory structure.
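Each plan entry is executed roughly like this (paths are illustrative):
```bash
# Create the target directory, then move the file with its git history intact
mkdir -p docs/specs
git mv docs/api-design.md docs/specs/api-design.md
```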
**Phase 6 - LLM-Based Link Updates**:
Uses the Gemini CLI to intelligently identify and correct broken or outdated relative links within migrated files. This LLM-based approach is more robust than simple path recalculation, as it understands document context and can handle edge cases that pattern matching might miss.
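A sketch of how this phase can be driven per file with the bundled helper (the `scripts/correct_links_llm.py` path is an assumption about where the helper lives in your checkout):
```bash
# Pipe each migrated file through the LLM link corrector and write it back in place
find docs -name '*.md' | while read -r f; do
  tmp=$(mktemp)
  python3 scripts/correct_links_llm.py --file "$f" < "$f" > "$tmp" && mv "$tmp" "$f"
done
```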
**Phase 7 - Validation**:
Verifies that all files were migrated correctly, checks link integrity, and validates the new directory structure.
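The script performs this validation itself; for a rough manual spot check you can list the non-external link targets and eyeball them:
```bash
# List markdown link targets, excluding external URLs, for manual review
grep -rnoE '\]\([^)]+\)' --include='*.md' docs | grep -v 'http'
```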
**Phase 8 - AI-Powered Frontmatter Generation (Optional)**:
For files that lack YAML frontmatter, the script uses the **Gemini CLI** to read the file content and generate rich, `doc-indexer` compliant frontmatter. This includes a suggested `title`, the `type` determined during the analysis phase, and a concise `description` summarizing the document's purpose.
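The generated frontmatter looks along these lines (an illustrative example; the field values are hypothetical):
```bash
cat <<'EOF'
---
title: API Design Overview
type: spec
description: Summarizes the public API surface and the conventions shared across endpoints.
---
EOF
```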
### Step 3: Post-Migration
After successful completion:
- Review the validation report for any warnings.
- Run the `doc-indexer` skill to verify full documentation compliance.
- Commit the migration changes to git (a minimal example follows).
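For the final step, a minimal commit might look like this (the message is just an example):
```bash
git add -A
git commit -m "docs: migrate documentation to SynthesisFlow structure"
```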
## Error Handling
### Gemini CLI Issues
**Symptom**: The script fails during the "Analysis" or "Frontmatter Generation" phase with an error related to the `gemini` command.
**Solution**:
- Ensure the `gemini` CLI is installed and in your system's PATH.
- Verify you are authenticated by running `gemini auth`.
- Check for Gemini API outages or network connectivity issues.
- The script has basic fallbacks, but for best results, ensure the Gemini CLI is functional; a quick check is sketched below.
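A quick functional check of the CLI (the model name mirrors the one used by the bundled scripts):
```bash
# Confirm the binary is on PATH and that a trivial prompt round-trips
command -v gemini || echo "gemini CLI not on PATH"
echo "Reply with the single word: pong" | gemini --model gemini-2.5-flash
```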
### Other Issues
For issues related to permissions, conflicts, or broken links, the script provides detailed error messages and resolution suggestions during its interactive execution. The backup and rollback script is always available for a safe exit.
## Notes
- **AI-Enhanced**: Uses Gemini for intelligent content analysis, not just simple pattern matching.
- **Safe by default**: Creates a full backup with a rollback script before making any changes.
- **Git-aware**: Preserves file history using `git mv`.
- **Interactive**: You review and approve the AI-generated plan before execution.
- **Rich Metadata**: Generates high-quality frontmatter, including titles and descriptions.
- **LLM-Powered Link Correction**: Uses Gemini to intelligently update relative links with context awareness.

project-migrate/scripts/correct_links_llm.py Normal file

@@ -0,0 +1,309 @@
#!/usr/bin/env python3
"""
LLM-based link correction for the project-migrate skill.

This script uses an LLM to intelligently identify and correct broken or outdated
links within markdown content during file migration.
"""

import sys
import re
import argparse
import subprocess
from pathlib import Path
from typing import List, Dict, Tuple

def extract_markdown_links(content: str) -> List[Dict]:
    """
    Extract all markdown links from content and return structured information.

    Args:
        content: Markdown content to analyze

    Returns:
        List of dictionaries with link information
    """
    links = []
    # Pattern to match markdown links [text](path) and images ![alt](path)
    pattern = r'!\[([^\]]*)\]\(([^)]+)\)|\[([^\]]*)\]\(([^)]+)\)'
    for match in re.finditer(pattern, content):
        alt_text, img_src, link_text, link_href = match.groups()
        if img_src:
            # Image link
            links.append({
                'type': 'image',
                'alt': alt_text,
                'path': img_src,
                'full_match': match.group(0)
            })
        elif link_href:
            # Regular link
            links.append({
                'type': 'link',
                'text': link_text,
                'path': link_href,
                'full_match': match.group(0)
            })
    return links

def should_skip_link(link_path: str) -> bool:
    """
    Determine if a link should be skipped (external URLs, anchors, etc.).

    Args:
        link_path: The path part of the link

    Returns:
        True if the link should be skipped
    """
    # Skip absolute URLs
    if link_path.startswith(('http://', 'https://', 'mailto:', 'ftp://', 'tel:')):
        return True
    # Skip in-page anchor links
    if link_path.startswith('#'):
        return True
    # Skip bare email addresses written without a mailto: prefix
    if '@' in link_path and not link_path.startswith(('http://', 'https://')):
        return True
    return False

def get_file_context(file_path: str) -> Dict:
    """
    Get context about the file being processed.

    Args:
        file_path: Path to the file

    Returns:
        Dictionary with file context information
    """
    path = Path(file_path)
    try:
        relative_to_root = str(path.relative_to(Path.cwd()))
    except ValueError:
        # The file is not under the current working directory
        relative_to_root = str(path)
    return {
        'file_path': str(path.absolute()),
        'filename': path.name,
        'directory': str(path.parent.absolute()),
        'relative_to_root': relative_to_root,
    }

def call_llm_for_link_correction(content: str, context: Dict) -> str:
    """
    Call an LLM to perform intelligent link correction.

    Args:
        content: Original markdown content
        context: File context information

    Returns:
        Corrected markdown content (or the original content on failure)
    """
    try:
        # Prepare the prompt for the LLM
        prompt = f"""You are a markdown link correction assistant. Your task is to identify and correct broken or outdated relative links in the following markdown content.

Context:
- File: {context['relative_to_root']}
- Directory: {context['directory']}

Instructions:
1. Analyze all relative links in the content
2. For each link, determine if it points to an existing file
3. If a link appears broken or outdated, suggest a corrected path
4. Common migrations to consider:
   - Files moved from the repository root to the docs/ directory
   - Files moved from docs/ to docs/specs/ or docs/changes/
   - Changes in file extensions or naming conventions
5. Preserve all external URLs, anchors, and email links unchanged
6. Only modify links that clearly need correction

Return ONLY the corrected markdown content without any additional explanation.

Content to analyze:
{content}"""
        # Call the Gemini CLI if available; otherwise fall through to the no-op fallback
        try:
            result = subprocess.run(
                ['gemini', '--model', 'gemini-2.5-flash'],
                input=prompt,
                capture_output=True,
                text=True,
                timeout=30
            )
            if result.returncode == 0 and result.stdout.strip():
                return result.stdout.strip()
        except (subprocess.TimeoutExpired, FileNotFoundError):
            # Gemini not available or timed out - fall back to basic processing
            pass
    except Exception as e:
        print(f"Warning: LLM call failed: {e}", file=sys.stderr)
    # Fallback: return the original content unchanged
    return content

def validate_corrected_links(original: str, corrected: str) -> Dict[str, int]:
    """
    Compare original and corrected content to count changes.

    Args:
        original: Original markdown content
        corrected: Corrected markdown content

    Returns:
        Dictionary with change statistics
    """
    original_links = extract_markdown_links(original)
    corrected_links = extract_markdown_links(corrected)
    original_paths = {link['path'] for link in original_links if not should_skip_link(link['path'])}
    corrected_paths = {link['path'] for link in corrected_links if not should_skip_link(link['path'])}
    return {
        'total_links': len(original_links),
        'skipped_links': len([link for link in original_links if should_skip_link(link['path'])]),
        'corrected_links': len(original_paths - corrected_paths),
        'new_links': len(corrected_paths - original_paths)
    }

def correct_links_in_content(content: str, file_path: str) -> Tuple[str, Dict]:
    """
    Correct links in markdown content using an LLM.

    Args:
        content: Markdown content to process
        file_path: Path to the file being processed

    Returns:
        Tuple of (corrected_content, statistics)
    """
    # Extract links for analysis
    links = extract_markdown_links(content)
    # Filter for links that need processing
    processable_links = [link for link in links if not should_skip_link(link['path'])]
    if not processable_links:
        # No links to process
        return content, {
            'total_links': len(links),
            'processable_links': 0,
            'corrected_links': 0,
            'llm_called': False
        }
    # Get file context
    context = get_file_context(file_path)
    # Call the LLM for correction
    corrected_content = call_llm_for_link_correction(content, context)
    # Validate changes
    changes = validate_corrected_links(content, corrected_content)
    changes.update({
        'processable_links': len(processable_links),
        'llm_called': True
    })
    return corrected_content, changes

def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description='LLM-based markdown link correction',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Correct links in a file
  cat README.md | correct_links_llm.py --file README.md

  # Process multiple files
  find docs -name "*.md" -exec correct_links_llm.py --file {} \\;

  # Show statistics only
  cat file.md | correct_links_llm.py --file file.md --stats-only
"""
    )
    parser.add_argument(
        '--file',
        required=True,
        help='Path to the file being processed (required for context)'
    )
    parser.add_argument(
        '--stats-only',
        action='store_true',
        help='Only show statistics, don\'t output corrected content'
    )
    parser.add_argument(
        '--dry-run',
        action='store_true',
        help='Analyze and report statistics without emitting corrected content'
    )
    args = parser.parse_args()
    try:
        # Read content from stdin
        content = sys.stdin.read()
        if not content.strip():
            print("Error: No content provided on stdin", file=sys.stderr)
            sys.exit(1)
        # Correct links
        corrected_content, stats = correct_links_in_content(content, args.file)
        # Output statistics
        if stats['llm_called']:
            print(f"Link correction statistics for {args.file}:", file=sys.stderr)
            print(f"  Total links: {stats['total_links']}", file=sys.stderr)
            print(f"  Processable links: {stats['processable_links']}", file=sys.stderr)
            print(f"  Corrected links: {stats['corrected_links']}", file=sys.stderr)
            print(f"  Skipped links: {stats['skipped_links']}", file=sys.stderr)
        else:
            print(f"No links to process in {args.file}", file=sys.stderr)
        # Output corrected content (suppressed by --stats-only and --dry-run)
        if not args.stats_only and not args.dry_run:
            print(corrected_content)
    except KeyboardInterrupt:
        print("\nInterrupted by user", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
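For a cautious first run against a single file, you can review the LLM's proposed changes as a diff before applying them (paths are illustrative):
```bash
python3 correct_links_llm.py --file docs/guide.md < docs/guide.md > /tmp/guide.corrected.md
diff -u docs/guide.md /tmp/guide.corrected.md
```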

File diff suppressed because it is too large