# Skill_Seekers Integration Guide
How skill-factory integrates with Skill_Seekers for automated skill creation.
## What is Skill_Seekers?
[Skill_Seekers](https://github.com/yusufkaraaslan/Skill_Seekers) is a Python tool (3,562★) that automatically converts:
- Documentation websites → Claude skills
- GitHub repositories → Claude skills
- PDF files → Claude skills
**Key features:**
- AST parsing for code analysis
- OCR for scanned PDFs
- Conflict detection (docs vs actual code)
- MCP integration
- 299 passing tests
## Installation
### One-Command Install
```bash
~/Projects/claude-skills/skill-factory/skill/scripts/install-skill-seekers.sh
```
### Manual Install
```bash
# Clone
git clone https://github.com/yusufkaraaslan/Skill_Seekers ~/Skill_Seekers
# Install dependencies
cd ~/Skill_Seekers
pip install -r requirements.txt
# Optional: MCP integration
./setup_mcp.sh
```
### Verify Installation
```bash
cd ~/Skill_Seekers
python3 -c "import cli.doc_scraper" && echo "✅ Installed correctly"
```
## Usage from skill-factory
skill-factory automatically uses Skill_Seekers when appropriate.
**Automatic detection:**
```
User: "Create React skill from react.dev"
  → skill-factory detects documentation source
  → Automatically runs Skill_Seekers
  → Post-processes output
  → Quality checks
  → Delivers result
```
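Under the hood, detection can be as simple as pattern-matching the request. A minimal sketch follows; the helper name and heuristics are hypothetical, not skill-factory's actual detection logic:
```python
import re

# Hypothetical helper; name and heuristics are illustrative only,
# not skill-factory's actual implementation.
def looks_like_docs_source(request: str) -> bool:
    """Guess whether a request references a scrapeable source."""
    if re.search(r'https?://\S+', request):          # documentation URL
        return True
    if 'github' in request.lower() and re.search(r'\b[\w.-]+/[\w.-]+\b', request):
        return True                                   # org/repo reference
    return request.lower().endswith('.pdf')          # local PDF path
```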
## Integration Points
### 1. Automatic Installation Check
Before using Skill_Seekers:
```python
import os
import subprocess

HOME = os.path.expanduser('~')

def check_skill_seekers():
    seekers_path = os.environ.get('SKILL_SEEKERS_PATH', f'{HOME}/Skill_Seekers')

    if not os.path.exists(seekers_path):
        print("Skill_Seekers not found. Install? (y/n)")
        if input().lower() == 'y':
            install_skill_seekers()
        else:
            return False

    # Verify dependencies
    try:
        subprocess.run(
            ['python3', '-c', 'import cli.doc_scraper'],
            cwd=seekers_path,
            check=True,
            capture_output=True
        )
        return True
    except subprocess.CalledProcessError:
        print("Dependencies missing. Installing...")
        install_dependencies(seekers_path)
        return True
```
### 2. Scraping with Optimal Settings
```python
import subprocess

def scrape_documentation(url: str, skill_name: str):
    seekers_path = get_seekers_path()

    # Optimal settings for Claude skills
    cmd = [
        'python3', 'cli/doc_scraper.py',
        '--url', url,
        '--name', skill_name,
        '--async',  # 2-3x faster
        '--output', f'{seekers_path}/output/{skill_name}'
    ]

    # Run with progress monitoring
    process = subprocess.Popen(
        cmd,
        cwd=seekers_path,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT
    )
    for line in process.stdout:
        # Show progress to user
        print(f"  {line.decode().strip()}")

    return f'{seekers_path}/output/{skill_name}'
```
### 3. Post-Processing Output
Skill_Seekers output needs enhancement for Claude compatibility:
```python
def post_process_skill_seekers_output(output_dir):
    skill_path = f'{output_dir}/SKILL.md'

    # Load skill
    skill = load_skill(skill_path)

    # Enhancements
    enhancements = []

    # 1. Check frontmatter
    if not has_proper_frontmatter(skill):
        skill = add_frontmatter(skill)
        enhancements.append("Added proper YAML frontmatter")

    # 2. Check description specificity
    if is_description_generic(skill):
        skill = improve_description(skill)
        enhancements.append("Improved description specificity")

    # 3. Check examples
    example_count = count_code_blocks(skill)
    if example_count < 5:
        # Extract more from scraped data
        skill = extract_more_examples(skill, output_dir)
        enhancements.append(f"Added {count_code_blocks(skill) - example_count} more examples")

    # 4. Apply progressive disclosure if needed
    if count_lines(skill) > 500:
        skill = apply_progressive_disclosure(skill)
        enhancements.append("Applied progressive disclosure")

    # Save enhanced skill
    save_skill(skill_path, skill)
    return skill_path, enhancements
```
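For example, once a scrape finishes, post-processing can be invoked directly (the output path here is illustrative and depends on your scrape settings):
```python
# Illustrative usage; path depends on your scrape settings.
skill_path, enhancements = post_process_skill_seekers_output(
    f'{get_seekers_path()}/output/react-development'
)
for change in enhancements:
    print(f"  ✓ {change}")
```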
### 4. Quality Scoring
```python
def quality_check_seekers_output(skill_path):
    # Score against Anthropic best practices
    score, issues = score_skill(skill_path)

    print(f"📊 Initial quality: {score}/10")
    if score < 8.0:
        print(f"  ⚠️ Issues: {len(issues)}")
        for issue in issues:
            print(f"    - {issue}")

    return score, issues
```
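`score_skill` itself is a skill-factory helper. A minimal sketch of the kind of heuristics it might apply follows; every check and weight here is an illustrative assumption, not the actual scoring rubric:
```python
# Hypothetical scoring sketch; the real skill-factory rubric may differ.
def score_skill(skill_path):
    text = open(skill_path, encoding='utf-8').read()
    issues = []
    if not text.startswith('---'):
        issues.append("Missing YAML frontmatter")
    if text.count('```') < 10:  # fewer than 5 fenced code blocks
        issues.append("Too few code examples")
    if len(text.splitlines()) > 500:
        issues.append("Consider progressive disclosure (>500 lines)")
    # Each issue costs one point on a 10-point scale (illustrative weighting)
    return max(0.0, 10.0 - len(issues)), issues
```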
## Supported Documentation Sources
### Documentation Websites
**Common frameworks:**
- React: https://react.dev
- Vue: https://vuejs.org
- Django: https://docs.djangoproject.com
- FastAPI: https://fastapi.tiangolo.com
- Rust docs: https://docs.rs/[crate]
**Usage:**
```python
scrape_documentation('https://react.dev', 'react-development')
```
### GitHub Repositories
**Example:**
```python
scrape_github_repo('facebook/react', 'react-internals')
```
Features:
- AST parsing for actual API
- Conflict detection vs docs
- README extraction
- Issues/PR analysis
- CHANGELOG parsing
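A `scrape_github_repo` wrapper can follow the same pattern as `scrape_documentation`. The CLI entry point and flags below are assumptions for illustration; check the Skill_Seekers docs for the exact command:
```python
import subprocess

# Hypothetical wrapper; 'cli/github_scraper.py' and its flags are
# assumed here, not confirmed against the Skill_Seekers CLI.
def scrape_github_repo(repo: str, skill_name: str):
    seekers_path = get_seekers_path()
    subprocess.run(
        ['python3', 'cli/github_scraper.py',
         '--repo', repo,
         '--name', skill_name,
         '--output', f'{seekers_path}/output/{skill_name}'],
        cwd=seekers_path,
        check=True
    )
    return f'{seekers_path}/output/{skill_name}'
```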
### PDF Files
**Example:**
```python
scrape_pdf('/path/to/manual.pdf', 'api-manual')
```
Features:
- Text extraction
- OCR for scanned pages
- Table extraction
- Code block detection
- Image extraction
## Configuration
### Environment Variables
```bash
# Skill_Seekers location
export SKILL_SEEKERS_PATH="$HOME/Skill_Seekers"
# Cache behavior
export SKILL_SEEKERS_NO_CACHE="true" # For "latest" requests
# Output location
export SKILL_SEEKERS_OUTPUT="$HOME/.claude/skills"
```
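On the Python side, these variables can be read with sensible defaults (a minimal sketch; the variable names match the ones above):
```python
import os

HOME = os.path.expanduser('~')

# Resolve configuration from the environment, falling back to defaults.
seekers_path = os.environ.get('SKILL_SEEKERS_PATH', f'{HOME}/Skill_Seekers')
no_cache = os.environ.get('SKILL_SEEKERS_NO_CACHE', '').lower() == 'true'
output_dir = os.environ.get('SKILL_SEEKERS_OUTPUT', f'{HOME}/.claude/skills')
```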
### Custom Presets
Skill_Seekers has presets for common frameworks:
```python
presets = {
    'react': {
        'url': 'https://react.dev',
        'selectors': {'main_content': 'article'},
        'categories': ['components', 'hooks', 'api']
    },
    'rust': {
        'url_pattern': 'https://docs.rs/{crate}',
        'type': 'rust_docs'
    }
    # ... more presets
}
```
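A preset can then feed the scraper invocation. A sketch assuming the `presets` dict above (the mapping from preset fields to CLI flags is illustrative):
```python
# Build scraper arguments from a preset (flag mapping is illustrative).
preset = presets['react']
cmd = [
    'python3', 'cli/doc_scraper.py',
    '--url', preset['url'],
    '--name', 'react-development',
    '--selector', preset['selectors']['main_content'],
    '--async',
]
```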
## Performance
Typical scraping times:
| Documentation Size | Sync Mode | Async Mode |
|-------------------|-----------|------------|
| Small (100-500 pages) | 15-30 min | 5-10 min |
| Medium (500-2K pages) | 30-60 min | 10-20 min |
| Large (10K+ pages) | 60-120 min | 20-40 min |
**Always use the `--async` flag** (2-3x faster).
## Troubleshooting
### Skill_Seekers Not Found
```bash
# Check installation
ls ~/Skill_Seekers
# If missing, install
scripts/install-skill-seekers.sh
```
### Dependencies Missing
```bash
cd ~/Skill_Seekers
pip install -r requirements.txt
```
### Python Version Error
Skill_Seekers requires Python 3.10+:
```bash
python3 --version # Should be 3.10 or higher
```
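To fail fast in automation, the same check can be done in Python:
```python
import sys

# Abort early if the interpreter is too old for Skill_Seekers.
if sys.version_info < (3, 10):
    sys.exit(f"Python 3.10+ required, found {sys.version.split()[0]}")
```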
### Scraping Fails
Check selectors in configuration:
```bash
# If default selectors don't work, pass a custom selector
python3 cli/doc_scraper.py \
  --url https://example.com \
  --name example \
  --selector "main" \
  --async
```
## Advanced Features
### Conflict Detection
When combining docs + GitHub:
```python
scrape_multi_source({
    'docs': 'https://react.dev',
    'github': 'facebook/react'
}, 'react-complete')

# Outputs:
# - Documented APIs
# - Actual code APIs
# - ⚠️ Conflicts highlighted
# - Side-by-side comparison
```
### MCP Integration
If Skill_Seekers MCP is installed:
```
User (in Claude Code): "Generate React skill from react.dev"
Claude automatically uses Skill_Seekers MCP server
```
## Quality Enhancement Loop
After Skill_Seekers scraping:
```
1. Scrape with Skill_Seekers → Initial skill
2. Quality check             → Score: 7.4/10
3. Apply enhancements        → Fix issues
4. Re-check                  → Score: 8.2/10
5. Test with scenarios
6. Deliver
```
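Expressed as code, the loop might look like this (a sketch built from the helpers defined earlier in this guide; the 8.0 threshold mirrors the quality check above):
```python
# Illustrative driver for the scrape → check → enhance → re-check loop.
def build_skill_from_docs(url: str, skill_name: str):
    output_dir = scrape_documentation(url, skill_name)                 # 1. scrape
    skill_path = f'{output_dir}/SKILL.md'
    score, issues = quality_check_seekers_output(skill_path)           # 2. initial score
    if score < 8.0:
        skill_path, _ = post_process_skill_seekers_output(output_dir)  # 3. enhance
        score, issues = quality_check_seekers_output(skill_path)       # 4. re-check
    return skill_path, score
```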
## When NOT to Use Skill_Seekers
Don't use for:
- Custom workflows (no docs to scrape)
- Company-specific processes
- Novel methodologies
- Skills requiring original thinking
Use the manual TDD approach instead (Path B).
## Source
Integration built on [Skill_Seekers v2.0.0](https://github.com/yusufkaraaslan/Skill_Seekers)
- MIT License
- 3,562 stars
- Active maintenance
- 299 passing tests
## Quick Reference
```bash
# Check installation
scripts/check-skill-seekers.sh
# Install
scripts/install-skill-seekers.sh
# Scrape documentation
scripts/run-automated.sh <url> <skill-name>
# Scrape GitHub
scripts/run-github-scrape.sh <org/repo> <skill-name>
# Scrape PDF
scripts/run-pdf-scrape.sh <pdf-path> <skill-name>
```