Initial commit
skills/skill/references/quality-loops.md
# Quality Assurance Loops

This document describes how skill-factory ensures that every skill meets minimum quality standards.

## Quality Scoring (Anthropic Best Practices)

Scoring is based on official Anthropic guidelines. Total possible: 10.0 points.

### Scoring Criteria

| Criterion | Weight | What to Check |
|-----------|--------|---------------|
| Description Quality | 2.0 | Specific, includes when_to_use, third-person |
| Name Convention | 0.5 | Lowercase, hyphens, descriptive |
| Conciseness | 1.5 | <500 lines OR progressive disclosure |
| Progressive Disclosure | 1.0 | Reference files for details |
| Examples & Workflows | 1.0 | Concrete code samples |
| Degree of Freedom | 0.5 | Appropriate for task type |
| Dependencies | 0.5 | Documented and verified |
| Structure | 1.0 | Well-organized sections |
| Error Handling | 0.5 | Scripts handle errors |
| Anti-Patterns | 1.0 | No time-sensitive info, consistent terminology |
| Testing | 0.5 | Evidence of testing |
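
As a rough illustration of how the weights in the table might combine into a total, here is a minimal sketch. It assumes each criterion is rated as a fraction between 0.0 and 1.0 and that the total is the weighted sum of those ratings. The `WEIGHTS` dict and `total_score` function are hypothetical names used for illustration, not part of skill-factory.

```python
# Weights copied from the Scoring Criteria table; they sum to 10.0.
WEIGHTS = {
    "description_quality": 2.0,
    "name_convention": 0.5,
    "conciseness": 1.5,
    "progressive_disclosure": 1.0,
    "examples_and_workflows": 1.0,
    "degree_of_freedom": 0.5,
    "dependencies": 0.5,
    "structure": 1.0,
    "error_handling": 0.5,
    "anti_patterns": 1.0,
    "testing": 0.5,
}


def total_score(ratings: dict) -> float:
    """Combine per-criterion ratings (0.0 to 1.0) into a score out of 10.

    Criteria missing from `ratings` count as 0 (an assumption of this sketch).
    """
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        # Clamp each rating into the range [0.0, 1.0] before weighting it
        rating = min(max(ratings.get(criterion, 0.0), 0.0), 1.0)
        total += weight * rating
    return round(total, 2)
```

A skill rated 1.0 on every criterion would reach the full 10.0 under this scheme; whether partial credit is really linear is a design choice the table does not specify.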

## Enhancement Loop Algorithm

```python
def quality_assurance_loop(skill_path: str, min_score: float = 8.0) -> Skill:
    """
    Iteratively improve skill until it meets quality threshold.
    Max iterations: 5 (prevents infinite loops)
    """
    # Skill, score_skill, apply_fixes, and load_skill are defined elsewhere.
    # apply_fixes is assumed to write its changes back to skill_path.
    max_iterations = 5

    for _ in range(max_iterations):
        # Score skill
        score, issues = score_skill(skill_path)

        print(f"📊 Quality check: {score}/10")

        if score >= min_score:
            print(f"✅ Quality threshold met ({score} >= {min_score})")
            return load_skill(skill_path)

        # Report issues
        print("   ⚠️  Issues found:")
        for issue in issues:
            print(f"      - {issue.description}")

        # Apply fixes
        print("🔧 Enhancing skill...")
        apply_fixes(skill_path, issues)

    # Max iterations reached: re-score so the report reflects the last fixes
    score, _ = score_skill(skill_path)
    if score < min_score:
        print(f"⚠️  Quality score {score} below threshold after {max_iterations} iterations")
        print("   Manual review recommended")

    return load_skill(skill_path)
```
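
To make the termination behaviour of the loop easier to see, here is a small self-contained sketch of the same bounded score-and-fix pattern. It works on plain strings instead of skill files, and the scorer and fixer are injected as callables. `improve_until`, `toy_score`, and `toy_fix` are hypothetical names invented for this example; the toy scoring rule is arbitrary.

```python
from typing import Callable, List, Tuple


def improve_until(
    score_fn: Callable[[str], Tuple[float, List[str]]],
    fix_fn: Callable[[str, List[str]], str],
    content: str,
    min_score: float = 8.0,
    max_iterations: int = 5,
) -> Tuple[str, float, int]:
    """Return (content, final score, number of fix rounds applied)."""
    score, issues = score_fn(content)
    rounds = 0
    while score < min_score and rounds < max_iterations:
        content = fix_fn(content, issues)
        score, issues = score_fn(content)  # re-score after every fix
        rounds += 1
    return content, score, rounds


# Toy scorer and fixer, purely for demonstration
def toy_score(text: str) -> Tuple[float, List[str]]:
    score = min(10.0, float(text.count("example") * 3))
    issues = [] if score >= 8.0 else ["needs more examples"]
    return score, issues


def toy_fix(text: str, issues: List[str]) -> str:
    return text + " example"


content, score, rounds = improve_until(toy_score, toy_fix, "intro")
```

Because the score is recomputed after each fix, the value returned always describes the content returned, even when the iteration cap is hit first.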

## Fix Strategies

### Issue: Description Too Generic

**Detection:**
```python
def check_description(skill):
    desc = skill.frontmatter.description
    if len(desc) < 50:
        return Issue("Description too short (< 50 chars)")
    if not contains_specifics(desc):
        return Issue("Description lacks specifics")
    if "help" in desc.lower() or "tool" in desc.lower():
        return Issue("Description too vague")
    return None
```

**Fix:**
```python
def fix_description(skill):
    # Extract key topics from skill content
    topics = extract_topics(skill.content)
    if not topics:
        # Nothing to build a specific description from; leave it unchanged
        return skill

    # Generate specific description
    desc = f"Comprehensive guide for {skill.name} covering "
    desc += ", ".join(topics[:3])
    desc += f". Use when working with {topics[0]}"
    if len(topics) > 1:
        desc += f" and needing {', '.join(topics[1:3])}"

    skill.frontmatter.description = desc
    return skill
```

### Issue: Missing Examples

**Detection:**
```python
def check_examples(skill):
    code_blocks = count_code_blocks(skill.content)
    if code_blocks < 3:
        return Issue(f"Only {code_blocks} code examples (minimum 3, recommend 5+)")
    return None
```
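
The `count_code_blocks` helper is not shown in this document. One possible way to write it, as a sketch only, is to count pairs of fence lines. This version assumes fences are written with three backticks (it ignores `~~~` fences and longer fences used for nesting) and does not count an unclosed block.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def count_code_blocks(markdown: str) -> int:
    """Count fenced code blocks, i.e. matched pairs of fence lines."""
    blocks = 0
    in_block = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if in_block:
                blocks += 1  # a closing fence completes one block
            in_block = not in_block
    return blocks
```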

**Fix:**
```python
def add_examples(skill, source_docs=None):
    if source_docs:
        # Extract from documentation
        examples = extract_code_examples(source_docs)
    else:
        # Generate from skill content
        examples = generate_examples_from_topics(skill)

    # Add examples section
    if "## Examples" not in skill.content:
        skill.content += "\n\n## Examples\n\n"

    for ex in examples[:5]:  # Add top 5 examples
        skill.content += f"### {ex.title}\n\n"
        skill.content += f"```{ex.language}\n{ex.code}\n```\n\n"
        if ex.explanation:
            skill.content += f"{ex.explanation}\n\n"

    return skill
```

### Issue: Too Long (> 500 lines)

**Detection:**
```python
def check_length(skill):
    line_count = count_lines(skill.content)
    if line_count > 500:
        return Issue(f"SKILL.md is {line_count} lines (recommend <500)")
    return None
```

**Fix:**
```python
def apply_progressive_disclosure(skill):
    # Identify sections that can be moved to references
    movable_sections = find_detail_sections(skill.content)

    # Keep any reference files the skill already has
    if not getattr(skill, "references", None):
        skill.references = {}

    for section in movable_sections:
        # Create reference file
        ref_name = slugify(section.title)
        ref_path = f"references/{ref_name}.md"

        # Move content
        skill.references[ref_name] = section.content

        # Replace with reference
        skill.content = skill.content.replace(
            section.full_text,
            f"See {ref_path} for detailed {section.title.lower()}."
        )

    return skill
```
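
The `slugify` helper used above is not defined in this document. A minimal sketch of what it might look like, using only the standard library, follows. The accent stripping and the `"section"` fallback for titles with no usable characters are assumptions of this example.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a section title into a lowercase, hyphen-separated file stem."""
    # Strip accents and drop any remaining non-ASCII characters
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "section"  # fallback name is an assumption
```

With this version, a section titled "Advanced Configuration" would be stored under `references/advanced-configuration.md`. Two sections whose titles slugify to the same name would overwrite each other, so a real implementation may need a uniqueness check.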

### Issue: Poor Structure

**Detection:**
```python
def check_structure(skill):
    issues = []

    # Check for required sections
    required = ["## Overview", "## Usage", "## Examples"]
    for section in required:
        if section not in skill.content:
            issues.append(f"Missing {section}")

    # Check heading hierarchy
    if has_heading_skips(skill.content):
        issues.append("Heading hierarchy skips levels")

    # Check for TOC if long
    if count_lines(skill.content) > 200 and "## Table of Contents" not in skill.content:
        issues.append("Long skill missing table of contents")

    return issues if issues else None
```
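
The heading-hierarchy check relies on a `has_heading_skips` helper that is not shown. One way it could work, sketched here under the assumption that only ATX headings (`#` to `######`) matter and that the first heading may start at any level, is to flag any heading more than one level deeper than the heading before it. Lines inside fenced code blocks are skipped so that `#` comments are not mistaken for headings.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # Markdown code-fence marker


def has_heading_skips(markdown: str) -> bool:
    """Return True if a heading jumps more than one level deeper than the previous one."""
    previous_level = 0
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # toggle on every fence line
            continue
        if in_code:
            continue  # ignore '#' comments inside code blocks
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if previous_level and level > previous_level + 1:
            return True
        previous_level = level
    return False
```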

**Fix:**
```python
def fix_structure(skill, issues):
    # Add the Examples section first so Usage can be inserted before it
    if "Missing ## Examples" in issues:
        skill = add_examples(skill)

    # Add missing sections
    if "Missing ## Overview" in issues:
        overview = generate_overview(skill)
        skill.content = insert_after_frontmatter(skill.content, overview)

    if "Missing ## Usage" in issues:
        usage = generate_usage_section(skill)
        skill.content = insert_before_examples(skill.content, usage)

    # Fix heading hierarchy
    if "Heading hierarchy" in str(issues):
        skill.content = normalize_headings(skill.content)

    # Add TOC if needed
    if "missing table of contents" in str(issues):
        toc = generate_toc(skill.content)
        skill.content = insert_toc(skill.content, toc)

    return skill
```

### Issue: Vague/Generic Content

**Detection:**
```python
def check_specificity(skill):
    vague_phrases = [
        "you can", "might want to", "it's possible",
        "there are various", "several options",
        "many ways to", "different approaches"
    ]

    content_lower = skill.content.lower()
    # Count every occurrence; counting only distinct phrases could never
    # exceed len(vague_phrases) == 7, so the threshold below would never fire
    vague_count = sum(content_lower.count(phrase) for phrase in vague_phrases)

    if vague_count > 10:
        return Issue(f"Too many vague phrases ({vague_count})")

    return None
```
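
A plain substring search also matches inside longer phrases, for example "you can" inside "you cannot". If that matters, a variant with word boundaries and per-phrase counts could look like the sketch below. `count_vague_phrases` and `VAGUE_PHRASES` are hypothetical names for this example; the phrase list is the one from the detection code above.

```python
import re
from collections import Counter

VAGUE_PHRASES = [
    "you can", "might want to", "it's possible",
    "there are various", "several options",
    "many ways to", "different approaches",
]


def count_vague_phrases(text: str) -> Counter:
    """Count case-insensitive, whole-word occurrences of each vague phrase."""
    counts = Counter()
    for phrase in VAGUE_PHRASES:
        pattern = rf"\b{re.escape(phrase)}\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[phrase] = hits
    return counts
```

The per-phrase breakdown makes it possible to report which phrases dominate, and `sum(counts.values())` gives the total to compare against the threshold.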

**Fix:**
```python
def improve_specificity(skill):
    # Replace vague with specific
    # Note: str.replace is case-sensitive, so capitalized occurrences
    # (e.g. "You can") are left untouched by this pass
    replacements = {
        "you can": "Use",
        "might want to": "Should",
        "there are various": "Three main approaches:",
        "several options": "Options:",
        "many ways to": "Primary methods:",
    }

    for vague, specific in replacements.items():
        skill.content = skill.content.replace(vague, specific)

    return skill
```

## Testing Integration

After each enhancement, run tests:

```python
def enhance_and_test(skill, min_score=8.0, max_iterations=5):
    # Start from the current score and cap the number of rounds
    score = calculate_score(skill)
    iteration = 0

    while score < min_score and iteration < max_iterations:
        # Enhance
        skill = apply_enhancements(skill)

        # Score
        score = calculate_score(skill)

        # Test
        test_results = run_tests(skill)

        if not test_results.all_passed():
            # Tests revealed new issues
            issues = test_results.get_failures()
            skill = fix_test_failures(skill, issues)

        iteration += 1

    return skill
```

## Progress Reporting

User sees:

```
📊 Quality check: 7.4/10
   ⚠️  Issues found:
      - Description too generic
      - Missing examples in 4 sections
      - Some outdated patterns detected

🔧 Enhancing skill...
   ✏️  Improving description... ✅
   📝 Adding code examples... ✅
   🔄 Updating patterns... ✅

📊 Quality check: 8.9/10 ✅
```

Internal execution:

```python
issues = [
    Issue("description_generic", fix=fix_description),
    Issue("missing_examples", fix=add_examples, count=4),
    Issue("outdated_patterns", fix=update_patterns)
]

for issue in issues:
    print(f"   {issue.icon} {issue.action}... ", end="")
    skill = issue.fix(skill)
    print("✅")
```
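
The `Issue` type is used throughout this document but never defined. Judging only from how it is called here (`fix=`, `count=`, `.icon`, `.action`), one possible shape is sketched below. The field names beyond those, the defaults, and the `apply` method are assumptions made for illustration, and the real class may differ (elsewhere in this document `Issue` is also constructed from a plain description string).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Issue:
    """One detected quality problem plus the function that repairs it."""
    code: str                                   # e.g. "missing_examples"
    fix: Optional[Callable[[Any], Any]] = None  # takes a skill, returns the fixed skill
    count: int = 1                              # how many places are affected
    icon: str = "🔧"                            # shown in progress output
    action: str = ""                            # e.g. "Adding code examples"

    def apply(self, skill: Any) -> Any:
        """Run the fix if there is one; otherwise return the skill unchanged."""
        return self.fix(skill) if self.fix else skill


# Hypothetical stand-in fix operating on a plain dict
def add_marker(skill: dict) -> dict:
    return {**skill, "examples": True}


issue = Issue("missing_examples", fix=add_marker, count=4,
              icon="📝", action="Adding code examples")
skill = issue.apply({"name": "demo"})
```

Keeping the fix as a field on the issue lets the reporting loop stay generic: it prints `icon` and `action`, then calls the fix, without knowing which check produced the issue.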

## Quality Metrics Dashboard

After completion:

```
📊 Final Quality Report

Anthropic Best Practices Score: 9.4/10

Breakdown:
  ✅ Description Quality:    2.0/2.0 (Excellent)
  ✅ Name Convention:        0.5/0.5 (Correct)
  ✅ Conciseness:            1.4/1.5 (Good - 420 lines)
  ✅ Progressive Disclosure: 1.0/1.0 (Excellent - 3 reference files)
  ✅ Examples & Workflows:   1.0/1.0 (12 code examples)
  ✅ Degree of Freedom:      0.5/0.5 (Appropriate)
  ✅ Dependencies:           0.5/0.5 (Documented)
  ✅ Structure:              1.0/1.0 (Well-organized)
  ✅ Error Handling:         0.5/0.5 (N/A for doc skill)
  ⚠️ Anti-Patterns:          0.5/1.0 (Minor: 2 time refs)
  ✅ Testing:                0.5/0.5 (15/15 tests passing)

Recommendations:
  ⚠️ Remove 2 time-sensitive references for 1.0/1.0 on anti-patterns
```

## Failure Modes

### Can't Reach Threshold

If the score is still below 8.0 after 5 iterations:

```
⚠️ Quality score 7.8 after 5 iterations

Blocking issues:
- Source documentation lacks code examples
- Framework has limited reference material

Recommendations:
1. Manual examples needed (auto-generation limited)
2. Consider hybrid approach with custom content
3. Lower quality threshold to 7.5 for this specific case

Continue with current skill? (y/n)
```

### Conflicting Requirements

```
⚠️ Conflicting requirements detected

Issue: Comprehensive coverage (800 lines) vs Conciseness (<500 lines)

Resolution: Applying progressive disclosure
- Main SKILL.md: 380 lines (overview + quick ref)
- Reference files: 5 files with detailed content
```

## Summary

Quality loops ensure:
1. Every skill scores >= threshold (default 8.0), or is flagged for manual review after 5 iterations
2. Anthropic best practices are followed
3. Fixes are applied automatically
4. Tests pass
5. User sees progress, not complexity