# Skill_Seekers Integration Guide
How skill-factory integrates with Skill_Seekers for automated skill creation.

## What is Skill_Seekers?

[Skill_Seekers](https://github.com/yusufkaraaslan/Skill_Seekers) is a Python tool (3,562★) that automatically converts:

- Documentation websites → Claude skills
- GitHub repositories → Claude skills
- PDF files → Claude skills

**Key features:**

- AST parsing for code analysis
- OCR for scanned PDFs
- Conflict detection (docs vs actual code)
- MCP integration
- 299 passing tests

## Installation
### One-Command Install

```bash
~/Projects/claude-skills/skill-factory/skill/scripts/install-skill-seekers.sh
```

### Manual Install

```bash
# Clone
git clone https://github.com/yusufkaraaslan/Skill_Seekers ~/Skill_Seekers

# Install dependencies
cd ~/Skill_Seekers
pip install -r requirements.txt

# Optional: MCP integration
./setup_mcp.sh
```

### Verify Installation

```bash
cd ~/Skill_Seekers
python3 -c "import cli.doc_scraper" && echo "✅ Installed correctly"
```

## Usage from skill-factory
skill-factory automatically uses Skill_Seekers when appropriate.

**Automatic detection:**

```
User: "Create React skill from react.dev"
  ↓
skill-factory detects documentation source
  ↓
Automatically runs Skill_Seekers
  ↓
Post-processes output
  ↓
Quality checks
  ↓
Delivers result
```

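The detection step boils down to classifying the source string. A minimal sketch of that step, where `detect_source_type` is a hypothetical name; `scrape_documentation`, `scrape_github_repo`, and `scrape_pdf` are the wrappers shown later in this guide:

```python
import re

def detect_source_type(source: str) -> str:
    """Classify a user-supplied source string (sketch; the real logic may differ)."""
    if source.lower().endswith('.pdf'):
        return 'pdf'        # → scrape_pdf()
    if source.startswith(('http://', 'https://')):
        return 'docs'       # documentation website → scrape_documentation()
    if re.fullmatch(r'[\w.\-]+/[\w.\-]+', source):
        return 'github'     # "org/repo" shorthand → scrape_github_repo()
    return 'unknown'        # fall back to the manual TDD path (Path B)
```
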
## Integration Points
### 1. Automatic Installation Check

Before using Skill_Seekers:

```python
import os
import subprocess

def check_skill_seekers() -> bool:
    seekers_path = os.environ.get(
        'SKILL_SEEKERS_PATH',
        os.path.expanduser('~/Skill_Seekers')
    )

    if not os.path.exists(seekers_path):
        print("Skill_Seekers not found. Install? (y/n)")
        if input().lower() == 'y':
            install_skill_seekers()
        else:
            return False

    # Verify dependencies
    try:
        subprocess.run(
            ['python3', '-c', 'import cli.doc_scraper'],
            cwd=seekers_path,
            check=True,
            capture_output=True
        )
        return True
    except subprocess.CalledProcessError:
        print("Dependencies missing. Installing...")
        install_dependencies(seekers_path)
        return True
```

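`install_skill_seekers()` and `install_dependencies()` are referenced above but not defined in this guide. A minimal sketch of what they could do, simply mirroring the manual install commands (names and signatures are assumptions):

```python
import os
import subprocess

def install_skill_seekers(dest: str | None = None) -> str:
    """Clone Skill_Seekers and install its requirements (sketch)."""
    dest = dest or os.path.expanduser('~/Skill_Seekers')
    subprocess.run(
        ['git', 'clone', 'https://github.com/yusufkaraaslan/Skill_Seekers', dest],
        check=True
    )
    install_dependencies(dest)
    return dest

def install_dependencies(seekers_path: str) -> None:
    """Install Python requirements inside the Skill_Seekers checkout (sketch)."""
    subprocess.run(['pip', 'install', '-r', 'requirements.txt'],
                   cwd=seekers_path, check=True)
```
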
### 2. Scraping with Optimal Settings
```python
import subprocess

def scrape_documentation(url: str, skill_name: str) -> str:
    seekers_path = get_seekers_path()

    # Optimal settings for Claude skills
    cmd = [
        'python3', 'cli/doc_scraper.py',
        '--url', url,
        '--name', skill_name,
        '--async',  # 2-3x faster
        '--output', f'{seekers_path}/output/{skill_name}'
    ]

    # Run with progress monitoring
    process = subprocess.Popen(
        cmd,
        cwd=seekers_path,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT
    )

    for line in process.stdout:
        # Show progress to user
        print(f"  {line.decode().strip()}")

    process.wait()
    return f'{seekers_path}/output/{skill_name}'
```

### 3. Post-Processing Output
Skill_Seekers output needs enhancement for Claude compatibility:

```python
def post_process_skill_seekers_output(output_dir):
    skill_path = f'{output_dir}/SKILL.md'

    # Load skill
    skill = load_skill(skill_path)

    # Enhancements
    enhancements = []

    # 1. Check frontmatter
    if not has_proper_frontmatter(skill):
        skill = add_frontmatter(skill)
        enhancements.append("Added proper YAML frontmatter")

    # 2. Check description specificity
    if is_description_generic(skill):
        skill = improve_description(skill)
        enhancements.append("Improved description specificity")

    # 3. Check examples
    example_count = count_code_blocks(skill)
    if example_count < 5:
        # Extract more from scraped data
        skill = extract_more_examples(skill, output_dir)
        enhancements.append(f"Added {count_code_blocks(skill) - example_count} more examples")

    # 4. Apply progressive disclosure if needed
    if count_lines(skill) > 500:
        skill = apply_progressive_disclosure(skill)
        enhancements.append("Applied progressive disclosure")

    # Save enhanced skill
    save_skill(skill_path, skill)

    return skill_path, enhancements
```

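Helpers such as `load_skill` and `has_proper_frontmatter` belong to skill-factory and are not shown here. As an illustration only, a rough sketch of the frontmatter check, assuming `skill` is the raw SKILL.md text:

```python
import re

def has_proper_frontmatter(skill: str) -> bool:
    """Check that SKILL.md opens with YAML frontmatter containing at least
    a name and a description (sketch, not the actual implementation)."""
    match = re.match(r'^---\n(.*?)\n---\n', skill, re.DOTALL)
    if not match:
        return False
    frontmatter = match.group(1)
    return 'name:' in frontmatter and 'description:' in frontmatter
```
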
### 4. Quality Scoring
```python
def quality_check_seekers_output(skill_path):
    # Score against Anthropic best practices
    score, issues = score_skill(skill_path)

    print(f"📊 Initial quality: {score}/10")

    if score < 8.0:
        print(f"   ⚠️ Issues: {len(issues)}")
        for issue in issues:
            print(f"   - {issue}")

    return score, issues
```

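`score_skill` is skill-factory's own scorer and its exact rules are not documented here. A toy sketch of the kind of checks it could run, reusing thresholds from the post-processing step (proper frontmatter, 5+ examples, under 500 lines); the deduction weights are made up for illustration:

```python
def score_skill(skill_path):
    """Toy scorer (sketch): start at 10 and deduct per detected issue."""
    with open(skill_path, encoding='utf-8') as f:
        skill = f.read()

    issues = []
    if not has_proper_frontmatter(skill):
        issues.append("Missing or incomplete YAML frontmatter")
    fence = '`' * 3  # a Markdown code fence
    if skill.count(fence) // 2 < 5:
        issues.append("Fewer than 5 code examples")
    if len(skill.splitlines()) > 500:
        issues.append("Over 500 lines; consider progressive disclosure")

    score = max(0.0, 10.0 - 1.5 * len(issues))
    return score, issues
```
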
## Supported Documentation Sources
### Documentation Websites

**Common frameworks:**

- React: https://react.dev
- Vue: https://vuejs.org
- Django: https://docs.djangoproject.com
- FastAPI: https://fastapi.tiangolo.com
- Rust docs: https://docs.rs/[crate]

**Usage:**

```python
scrape_documentation('https://react.dev', 'react-development')
```

### GitHub Repositories
**Example:**

```python
scrape_github_repo('facebook/react', 'react-internals')
```

Features:

- AST parsing for the actual API
- Conflict detection against the docs
- README extraction
- Issues/PR analysis
- CHANGELOG parsing

### PDF Files
**Example:**

```python
scrape_pdf('/path/to/manual.pdf', 'api-manual')
```

Features:

- Text extraction
- OCR for scanned pages
- Table extraction
- Code block detection
- Image extraction

## Configuration
### Environment Variables

```bash
# Skill_Seekers location
export SKILL_SEEKERS_PATH="$HOME/Skill_Seekers"

# Cache behavior
export SKILL_SEEKERS_NO_CACHE="true"  # For "latest" requests

# Output location
export SKILL_SEEKERS_OUTPUT="$HOME/.claude/skills"
```

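On the skill-factory side these are plain environment lookups. A minimal sketch with the defaults shown above; `get_seekers_path()`, used by the scraping wrapper earlier, is assumed to resolve the same path:

```python
import os

SEEKERS_PATH = os.environ.get(
    'SKILL_SEEKERS_PATH', os.path.expanduser('~/Skill_Seekers'))
NO_CACHE = os.environ.get('SKILL_SEEKERS_NO_CACHE', '').lower() == 'true'
OUTPUT_DIR = os.environ.get(
    'SKILL_SEEKERS_OUTPUT', os.path.expanduser('~/.claude/skills'))

def get_seekers_path() -> str:
    """Resolve the Skill_Seekers checkout used by the wrappers in this guide."""
    return SEEKERS_PATH
```
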
### Custom Presets
Skill_Seekers has presets for common frameworks:

```python
presets = {
    'react': {
        'url': 'https://react.dev',
        'selectors': {'main_content': 'article'},
        'categories': ['components', 'hooks', 'api']
    },
    'rust': {
        'url_pattern': 'https://docs.rs/{crate}',
        'type': 'rust_docs'
    }
    # ... more presets
}
```

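A preset can be expanded before building the scrape command. A small sketch against the `presets` dict above; `resolve_preset` is a hypothetical helper, and Skill_Seekers' own preset handling may differ:

```python
def resolve_preset(name: str, **fmt) -> dict:
    """Look up a preset and expand any URL pattern (sketch)."""
    preset = dict(presets.get(name, {}))
    if 'url_pattern' in preset:
        # e.g. resolve_preset('rust', crate='serde') → url https://docs.rs/serde
        preset['url'] = preset.pop('url_pattern').format(**fmt)
    return preset
```
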
## Performance
Typical scraping times:

| Documentation Size | Sync Mode | Async Mode |
|--------------------|-----------|------------|
| Small (100-500 pages) | 15-30 min | 5-10 min |
| Medium (500-2K pages) | 30-60 min | 10-20 min |
| Large (10K+ pages) | 60-120 min | 20-40 min |

**Always use the `--async` flag** (2-3x faster).

## Troubleshooting
### Skill_Seekers Not Found

```bash
# Check installation
ls ~/Skill_Seekers

# If missing, install
scripts/install-skill-seekers.sh
```

### Dependencies Missing

```bash
cd ~/Skill_Seekers
pip install -r requirements.txt
```

### Python Version Error

Skill_Seekers requires Python 3.10+:

```bash
python3 --version  # Should be 3.10 or higher
```

### Scraping Fails
Check the selectors in your configuration:

```bash
# If the default selectors don't work, pass a custom one
python3 cli/doc_scraper.py \
  --url https://example.com \
  --name example \
  --selector "main" \
  --async
```

## Advanced Features
### Conflict Detection

When combining docs + GitHub:

```python
scrape_multi_source({
    'docs': 'https://react.dev',
    'github': 'facebook/react'
}, 'react-complete')

# Outputs:
# - Documented APIs
# - Actual code APIs
# - ⚠️ Conflicts highlighted
# - Side-by-side comparison
```

### MCP Integration

If the Skill_Seekers MCP server is installed:

```
User (in Claude Code): "Generate React skill from react.dev"

Claude automatically uses the Skill_Seekers MCP server
```

## Quality Enhancement Loop
After Skill_Seekers scraping:

```
1. Scrape with Skill_Seekers → Initial skill
2. Quality check → Score: 7.4/10
3. Apply enhancements → Fix issues
4. Re-check → Score: 8.2/10 ✅
5. Test with scenarios
6. Deliver
```

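In code, this loop ties together the wrappers defined earlier in this guide. A minimal sketch reusing `scrape_documentation`, `post_process_skill_seekers_output`, and `quality_check_seekers_output`; the 8.0 threshold follows the example scores above:

```python
def build_skill_from_docs(url: str, skill_name: str) -> str:
    # 1. Scrape with Skill_Seekers
    output_dir = scrape_documentation(url, skill_name)
    skill_path = f'{output_dir}/SKILL.md'

    # 2. Initial quality check
    score, issues = quality_check_seekers_output(skill_path)

    # 3.-4. Apply enhancements and re-check until the skill clears 8.0/10
    while score < 8.0:
        skill_path, enhancements = post_process_skill_seekers_output(output_dir)
        score, issues = quality_check_seekers_output(skill_path)
        if not enhancements:
            break  # nothing left to fix automatically; hand off to manual review

    # 5.-6. Scenario testing and delivery happen outside this sketch
    return skill_path
```
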
## When NOT to Use Skill_Seekers
Don't use it for:

- Custom workflows (no docs to scrape)
- Company-specific processes
- Novel methodologies
- Skills requiring original thinking

Use the manual TDD approach instead (Path B).

## Source
Integration built on [Skill_Seekers v2.0.0](https://github.com/yusufkaraaslan/Skill_Seekers):

- MIT License
- 3,562 stars
- Active maintenance
- 299 passing tests

## Quick Reference
```bash
# Check installation
scripts/check-skill-seekers.sh

# Install
scripts/install-skill-seekers.sh

# Scrape documentation
scripts/run-automated.sh <url> <skill-name>

# Scrape GitHub
scripts/run-github-scrape.sh <org/repo> <skill-name>

# Scrape PDF
scripts/run-pdf-scrape.sh <pdf-path> <skill-name>
```