Skill_Seekers Integration Guide

How skill-factory integrates with Skill_Seekers for automated skill creation.

What is Skill_Seekers?

Skill_Seekers is a Python tool (3,562★) that automatically converts:

  • Documentation websites → Claude skills
  • GitHub repositories → Claude skills
  • PDF files → Claude skills

Key features:

  • AST parsing for code analysis
  • OCR for scanned PDFs
  • Conflict detection (docs vs actual code)
  • MCP integration
  • 299 passing tests

Installation

One-Command Install

~/Projects/claude-skills/skill-factory/skill/scripts/install-skill-seekers.sh

Manual Install

# Clone
git clone https://github.com/yusufkaraaslan/Skill_Seekers ~/Skill_Seekers

# Install dependencies
cd ~/Skill_Seekers
pip install -r requirements.txt

# Optional: MCP integration
./setup_mcp.sh

Verify Installation

cd ~/Skill_Seekers
python3 -c "import cli.doc_scraper" && echo "✅ Installed correctly"

Usage from skill-factory

skill-factory automatically uses Skill_Seekers when appropriate.

Automatic detection:

User: "Create React skill from react.dev"
      ↓
skill-factory detects documentation source
      ↓
Automatically runs Skill_Seekers
      ↓
Post-processes output
      ↓
Quality checks
      ↓
Delivers result
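
The exact detection logic is internal to skill-factory; a minimal sketch of what this dispatch could look like, using the scrape_* entry points shown later in this guide (route_request itself is an illustrative name, not part of either tool's API):

import re

def route_request(source: str, skill_name: str):
    # Illustrative dispatch only; skill-factory's real detection logic
    # is internal. URLs, PDF paths, and GitHub "org/repo" slugs each
    # map to the matching Skill_Seekers entry point shown below.
    if source.startswith(('http://', 'https://')):
        return scrape_documentation(source, skill_name)
    if source.lower().endswith('.pdf'):
        return scrape_pdf(source, skill_name)
    if re.fullmatch(r'[\w.-]+/[\w.-]+', source):
        return scrape_github_repo(source, skill_name)
    raise ValueError(f"Unrecognized source: {source}")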

Integration Points

1. Automatic Installation Check

Before using Skill_Seekers:

import os
import subprocess

def check_skill_seekers():
    seekers_path = os.environ.get(
        'SKILL_SEEKERS_PATH',
        os.path.expanduser('~/Skill_Seekers')
    )

    if not os.path.exists(seekers_path):
        print("Skill_Seekers not found. Install? (y/n)")
        if input().lower() == 'y':
            install_skill_seekers()  # skill-factory install helper
        else:
            return False

    # Verify dependencies by importing the scraper module in place
    try:
        subprocess.run(
            ['python3', '-c', 'import cli.doc_scraper'],
            cwd=seekers_path,
            check=True,
            capture_output=True
        )
        return True
    except subprocess.CalledProcessError:
        print("Dependencies missing. Installing...")
        install_dependencies(seekers_path)  # skill-factory install helper
        return True

2. Scraping with Optimal Settings

def scrape_documentation(url: str, skill_name: str):
    seekers_path = get_seekers_path()

    # Optimal settings for Claude skills
    cmd = [
        'python3', 'cli/doc_scraper.py',
        '--url', url,
        '--name', skill_name,
        '--async',  # 2-3x faster
        '--output', f'{seekers_path}/output/{skill_name}'
    ]

    # Run with progress monitoring, streaming output line by line
    process = subprocess.Popen(
        cmd,
        cwd=seekers_path,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True
    )

    for line in process.stdout:
        # Show progress to user
        print(f"   {line.strip()}")

    process.wait()
    return f'{seekers_path}/output/{skill_name}'

3. Post-Processing Output

Skill_Seekers output needs enhancement for Claude compatibility:

def post_process_skill_seekers_output(output_dir):
    skill_path = f'{output_dir}/SKILL.md'

    # Load skill
    skill = load_skill(skill_path)

    # Enhancements
    enhancements = []

    # 1. Check frontmatter
    if not has_proper_frontmatter(skill):
        skill = add_frontmatter(skill)
        enhancements.append("Added proper YAML frontmatter")

    # 2. Check description specificity
    if is_description_generic(skill):
        skill = improve_description(skill)
        enhancements.append("Improved description specificity")

    # 3. Check examples
    example_count = count_code_blocks(skill)
    if example_count < 5:
        # Extract more from scraped data
        skill = extract_more_examples(skill, output_dir)
        enhancements.append(f"Added {count_code_blocks(skill) - example_count} more examples")

    # 4. Apply progressive disclosure if needed
    if count_lines(skill) > 500:
        skill = apply_progressive_disclosure(skill)
        enhancements.append("Applied progressive disclosure")

    # Save enhanced skill
    save_skill(skill_path, skill)

    return skill_path, enhancements
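
Helpers such as count_code_blocks, count_lines, and has_proper_frontmatter are referenced above but not defined in this guide. Minimal sketches, assuming the skill is held as a plain markdown string:

import re

def count_code_blocks(skill: str) -> int:
    # Fenced blocks come in pairs of ``` delimiters
    return len(re.findall(r'^```', skill, flags=re.MULTILINE)) // 2

def count_lines(skill: str) -> int:
    return len(skill.splitlines())

def has_proper_frontmatter(skill: str) -> bool:
    # Claude skills open with a ----delimited YAML block
    return skill.startswith('---\n')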

4. Quality Scoring

def quality_check_seekers_output(skill_path):
    # Score against Anthropic best practices
    score, issues = score_skill(skill_path)

    print(f"📊 Initial quality: {score}/10")

    if score < 8.0:
        print(f"   ⚠️  Issues: {len(issues)}")
        for issue in issues:
            print(f"       - {issue}")

    return score, issues
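
score_skill itself is part of skill-factory's tooling and is not shown here. A rough sketch of the kind of heuristics it might apply, reusing the helper sketches above (the thresholds and the 2-point penalty are invented for illustration):

def score_skill(skill_path: str):
    # Illustrative heuristics only; the real scorer checks far more
    # of Anthropic's best practices than these three.
    with open(skill_path) as f:
        skill = f.read()

    issues = []
    if not has_proper_frontmatter(skill):
        issues.append("Missing YAML frontmatter")
    if count_code_blocks(skill) < 5:
        issues.append("Fewer than 5 code examples")
    if count_lines(skill) > 500:
        issues.append("Over 500 lines; needs progressive disclosure")

    score = max(0.0, 10.0 - 2.0 * len(issues))
    return score, issues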

Supported Documentation Sources

Documentation Websites

Presets are built in for common frameworks (see Custom Presets below).

Usage:

scrape_documentation('https://react.dev', 'react-development')

GitHub Repositories

Example:

scrape_github_repo('facebook/react', 'react-internals')

Features:

  • AST parsing for actual API
  • Conflict detection vs docs
  • README extraction
  • Issues/PR analysis
  • CHANGELOG parsing
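
For a sense of what AST parsing buys over plain text scraping, here is a minimal sketch (not Skill_Seekers' actual implementation) that extracts a module's public functions with Python's ast module; comparing such a list against documented names is the basis of conflict detection:

import ast

def public_functions(source_path: str) -> list[str]:
    # Parse the file and collect top-level function names,
    # skipping underscore-prefixed (private) ones
    with open(source_path) as f:
        tree = ast.parse(f.read())
    return [
        node.name
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and not node.name.startswith('_')
    ]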

PDF Files

Example:

scrape_pdf('/path/to/manual.pdf', 'api-manual')

Features:

  • Text extraction
  • OCR for scanned pages
  • Table extraction
  • Code block detection
  • Image extraction

Configuration

Environment Variables

# Skill_Seekers location
export SKILL_SEEKERS_PATH="$HOME/Skill_Seekers"

# Cache behavior
export SKILL_SEEKERS_NO_CACHE="true"  # For "latest" requests

# Output location
export SKILL_SEEKERS_OUTPUT="$HOME/.claude/skills"
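
The get_seekers_path() helper used in the scraping code above is not defined in this guide; a minimal sketch that honors these variables (get_output_dir is a hypothetical companion):

import os

def get_seekers_path() -> str:
    # SKILL_SEEKERS_PATH wins; fall back to the default install location
    return os.environ.get(
        'SKILL_SEEKERS_PATH',
        os.path.expanduser('~/Skill_Seekers')
    )

def get_output_dir() -> str:
    # Hypothetical companion helper for SKILL_SEEKERS_OUTPUT
    return os.environ.get(
        'SKILL_SEEKERS_OUTPUT',
        os.path.expanduser('~/.claude/skills')
    )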

Custom Presets

Skill_Seekers has presets for common frameworks:

presets = {
    'react': {
        'url': 'https://react.dev',
        'selectors': {'main_content': 'article'},
        'categories': ['components', 'hooks', 'api']
    },
    'rust': {
        'url_pattern': 'https://docs.rs/{crate}',
        'type': 'rust_docs'
    }
    # ... more presets
}
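
Assuming presets are plain dicts like the above, applying one could be as simple as the following (scrape_with_preset is illustrative, not a documented Skill_Seekers function):

def scrape_with_preset(preset_name: str, skill_name: str):
    # Look up the preset and feed its URL to the standard scraper;
    # selector and category handling are omitted for brevity
    preset = presets[preset_name]
    return scrape_documentation(preset['url'], skill_name)

scrape_with_preset('react', 'react-development')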

Performance

Typical scraping times:

| Documentation Size    | Sync Mode   | Async Mode |
|-----------------------|-------------|------------|
| Small (100-500 pages) | 15-30 min   | 5-10 min   |
| Medium (500-2K pages) | 30-60 min   | 10-20 min  |
| Large (10K+ pages)    | 60-120 min  | 20-40 min  |

Always use the --async flag (2-3x faster).

Troubleshooting

Skill_Seekers Not Found

# Check installation
ls ~/Skill_Seekers

# If missing, install
scripts/install-skill-seekers.sh

Dependencies Missing

cd ~/Skill_Seekers
pip install -r requirements.txt

Python Version Error

Skill_Seekers requires Python 3.10+:

python3 --version  # Should be 3.10 or higher

Scraping Fails

Check selectors in configuration:

# If the default selectors don't work, pass a custom one
python3 cli/doc_scraper.py \
    --url https://example.com \
    --name example \
    --selector "main" \
    --async

Advanced Features

Conflict Detection

When combining docs + GitHub:

scrape_multi_source({
    'docs': 'https://react.dev',
    'github': 'facebook/react'
}, 'react-complete')

# Outputs:
# - Documented APIs
# - Actual code APIs
# - ⚠️  Conflicts highlighted
# - Side-by-side comparison
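
Conceptually, conflict detection reduces to a set comparison between documented and actual APIs. A simplified sketch, not the tool's real logic:

def find_conflicts(documented_api: set[str], actual_api: set[str]):
    # APIs in the docs but absent from the code, and vice versa
    return {
        'documented_only': sorted(documented_api - actual_api),
        'code_only': sorted(actual_api - documented_api),
    }

conflicts = find_conflicts({'useState', 'useLegacy'}, {'useState', 'useNew'})
# {'documented_only': ['useLegacy'], 'code_only': ['useNew']}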

MCP Integration

If Skill_Seekers MCP is installed:

User (in Claude Code): "Generate React skill from react.dev"

Claude automatically uses the Skill_Seekers MCP server.

Quality Enhancement Loop

After Skill_Seekers scraping:

1. Scrape with Skill_Seekers → Initial skill
2. Quality check → Score: 7.4/10
3. Apply enhancements → Fix issues
4. Re-check → Score: 8.2/10 ✅
5. Test with scenarios
6. Deliver
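
Tying together the functions from the Integration Points section, the loop might be orchestrated like this (the 8.0 threshold mirrors the quality check above; the real flow may iterate differently):

def build_skill_from_docs(url: str, skill_name: str):
    # End-to-end sketch: scrape, post-process, score, re-check
    output_dir = scrape_documentation(url, skill_name)
    skill_path, enhancements = post_process_skill_seekers_output(output_dir)
    for change in enhancements:
        print(f"   ✏️  {change}")
    score, issues = quality_check_seekers_output(skill_path)
    if score < 8.0:
        # One more enhancement pass before scenario testing and delivery
        skill_path, _ = post_process_skill_seekers_output(output_dir)
        score, issues = quality_check_seekers_output(skill_path)
    return skill_path, score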

When NOT to Use Skill_Seekers

Don't use for:

  • Custom workflows (no docs to scrape)
  • Company-specific processes
  • Novel methodologies
  • Skills requiring original thinking

Use manual TDD approach instead (Path B).

Source

Integration built on Skill_Seekers v2.0.0

  • MIT License
  • 3,562 stars
  • Active maintenance
  • 299 passing tests

Quick Reference

# Check installation
scripts/check-skill-seekers.sh

# Install
scripts/install-skill-seekers.sh

# Scrape documentation
scripts/run-automated.sh <url> <skill-name>

# Scrape GitHub
scripts/run-github-scrape.sh <org/repo> <skill-name>

# Scrape PDF
scripts/run-pdf-scrape.sh <pdf-path> <skill-name>