Initial commit

2025-11-30 09:01:20 +08:00
commit 650cfdbc05
15 changed files with 4252 additions and 0 deletions
--- a/skills/skill/references/quality-loops.md
+++ b/skills/skill/references/quality-loops.md
@@ -0,0 +1,379 @@
+# Quality Assurance Loops
+
+How skill-factory ensures every skill meets minimum quality standards.
+
+## Quality Scoring (Anthropic Best Practices)
+
+Based on official Anthropic guidelines, total possible: 10.0 points
+
+### Scoring Criteria
+
+| Criterion | Weight | What to Check |
+|-----------|--------|---------------|
+| Description Quality | 2.0 | Specific, includes when_to_use, third-person |
+| Name Convention | 0.5 | Lowercase, hyphens, descriptive |
+| Conciseness | 1.5 | <500 lines OR progressive disclosure |
+| Progressive Disclosure | 1.0 | Reference files for details |
+| Examples & Workflows | 1.0 | Concrete code samples |
+| Degree of Freedom | 0.5 | Appropriate for task type |
+| Dependencies | 0.5 | Documented and verified |
+| Structure | 1.0 | Well-organized sections |
+| Error Handling | 0.5 | Scripts handle errors |
+| Anti-Patterns | 1.0 | No time-sensitive info, consistent terminology |
+| Testing | 0.5 | Evidence of testing |
+
+## Enhancement Loop Algorithm
+
+```python
+def quality_assurance_loop(skill_path: str, min_score: float = 8.0) -> Skill:
+    """
+    Iteratively improve skill until it meets quality threshold.
+    Max iterations: 5 (prevents infinite loops)
+    """
+    max_iterations = 5
+    iteration = 0
+
+    while iteration < max_iterations:
+        # Score skill
+        score, issues = score_skill(skill_path)
+
+        print(f"📊 Quality check: {score}/10")
+
+        if score >= min_score:
+            print(f"✅ Quality threshold met ({score} >= {min_score})")
+            return load_skill(skill_path)
+
+        # Report issues
+        print(f"   ⚠️  Issues found:")
+        for issue in issues:
+            print(f"       - {issue.description}")
+
+        # Apply fixes
+        print(f"🔧 Enhancing skill...")
+        skill = apply_fixes(skill_path, issues)
+
+        iteration += 1
+
+    # If we hit max iterations without reaching threshold
+    if score < min_score:
+        print(f"⚠️  Quality score {score} below threshold after {max_iterations} iterations")
+        print(f"   Manual review recommended")
+        return load_skill(skill_path)
+
+    return load_skill(skill_path)
+```
+
+## Fix Strategies
+
+### Issue: Description Too Generic
+
+**Detection:**
+```python
+def check_description(skill):
+    desc = skill.frontmatter.description
+    if len(desc) < 50:
+        return Issue("Description too short (< 50 chars)")
+    if not contains_specifics(desc):
+        return Issue("Description lacks specifics")
+    if "help" in desc.lower() or "tool" in desc.lower():
+        return Issue("Description too vague")
+    return None
+```
+
+**Fix:**
+```python
+def fix_description(skill):
+    # Extract key topics from skill content
+    topics = extract_topics(skill.content)
+
+    # Generate specific description
+    desc = f"Comprehensive guide for {skill.name} covering "
+    desc += ", ".join(topics[:3])
+    desc += f". Use when working with {topics[0]} "
+    desc += f"and need {', '.join(topics[1:3])}"
+
+    skill.frontmatter.description = desc
+    return skill
+```
+
+### Issue: Missing Examples
+
+**Detection:**
+```python
+def check_examples(skill):
+    code_blocks = count_code_blocks(skill.content)
+    if code_blocks < 3:
+        return Issue(f"Only {code_blocks} code examples (recommend 5+)")
+    return None
+```
+
+**Fix:**
+```python
+def add_examples(skill, source_docs=None):
+    if source_docs:
+        # Extract from documentation
+        examples = extract_code_examples(source_docs)
+    else:
+        # Generate from skill content
+        examples = generate_examples_from_topics(skill)
+
+    # Add examples section
+    if "## Examples" not in skill.content:
+        skill.content += "\n\n## Examples\n\n"
+
+    for ex in examples[:5]:  # Add top 5 examples
+        skill.content += f"### {ex.title}\n\n"
+        skill.content += f"```{ex.language}\n{ex.code}\n```\n\n"
+        if ex.explanation:
+            skill.content += f"{ex.explanation}\n\n"
+
+    return skill
+```
+
+### Issue: Too Long (> 500 lines)
+
+**Detection:**
+```python
+def check_length(skill):
+    line_count = count_lines(skill.content)
+    if line_count > 500:
+        return Issue(f"SKILL.md is {line_count} lines (recommend <500)")
+    return None
+```
+
+**Fix:**
+```python
+def apply_progressive_disclosure(skill):
+    # Identify sections that can be moved to references
+    movable_sections = find_detail_sections(skill.content)
+
+    skill.references = {}
+
+    for section in movable_sections:
+        # Create reference file
+        ref_name = slugify(section.title)
+        ref_path = f"references/{ref_name}.md"
+
+        # Move content
+        skill.references[ref_name] = section.content
+
+        # Replace with reference
+        skill.content = skill.content.replace(
+            section.full_text,
+            f"See {ref_path} for detailed {section.title.lower()}."
+        )
+
+    return skill
+```
+
+### Issue: Poor Structure
+
+**Detection:**
+```python
+def check_structure(skill):
+    issues = []
+
+    # Check for required sections
+    required = ["## Overview", "## Usage", "## Examples"]
+    for section in required:
+        if section not in skill.content:
+            issues.append(f"Missing {section}")
+
+    # Check heading hierarchy
+    if has_heading_skips(skill.content):
+        issues.append("Heading hierarchy skips levels")
+
+    # Check for TOC if long
+    if count_lines(skill.content) > 200 and "## Table of Contents" not in skill.content:
+        issues.append("Long skill missing table of contents")
+
+    return issues if issues else None
+```
+
+**Fix:**
+```python
+def fix_structure(skill, issues):
+    # Add missing sections
+    if "Missing ## Overview" in issues:
+        overview = generate_overview(skill)
+        skill.content = insert_after_frontmatter(skill.content, overview)
+
+    if "Missing ## Usage" in issues:
+        usage = generate_usage_section(skill)
+        skill.content = insert_before_examples(skill.content, usage)
+
+    # Fix heading hierarchy
+    if "Heading hierarchy" in str(issues):
+        skill.content = normalize_headings(skill.content)
+
+    # Add TOC if needed
+    if "missing table of contents" in str(issues):
+        toc = generate_toc(skill.content)
+        skill.content = insert_toc(skill.content, toc)
+
+    return skill
+```
+
+### Issue: Vague/Generic Content
+
+**Detection:**
+```python
+def check_specificity(skill):
+    vague_phrases = [
+        "you can", "might want to", "it's possible",
+        "there are various", "several options",
+        "many ways to", "different approaches"
+    ]
+
+    content_lower = skill.content.lower()
+    vague_count = sum(1 for phrase in vague_phrases if phrase in content_lower)
+
+    if vague_count > 10:
+        return Issue(f"Too many vague phrases ({vague_count})")
+
+    return None
+```
+
+**Fix:**
+```python
+def improve_specificity(skill):
+    # Replace vague with specific
+    replacements = {
+        "you can": "Use",
+        "might want to": "Should",
+        "there are various": "Three main approaches:",
+        "several options": "Options:",
+        "many ways to": "Primary methods:",
+    }
+
+    for vague, specific in replacements.items():
+        skill.content = skill.content.replace(vague, specific)
+
+    return skill
+```
+
+## Testing Integration
+
+After each enhancement, run tests:
+
+```python
+def enhance_and_test(skill):
+    while score < min_score:
+        # Enhance
+        skill = apply_enhancements(skill)
+
+        # Score
+        score = calculate_score(skill)
+
+        # Test
+        test_results = run_tests(skill)
+
+        if not test_results.all_passed():
+            # Tests revealed new issues
+            issues = test_results.get_failures()
+            skill = fix_test_failures(skill, issues)
+
+    return skill
+```
+
+## Progress Reporting
+
+User sees:
+
+```
+📊 Quality check: 7.4/10
+   ⚠️  Issues found:
+       - Description too generic
+       - Missing examples in 4 sections
+       - Some outdated patterns detected
+
+🔧 Enhancing skill...
+   ✏️  Improving description... ✅
+   📝 Adding code examples... ✅
+   🔄 Updating patterns... ✅
+
+📊 Quality check: 8.9/10 ✅
+```
+
+Internal execution:
+
+```python
+issues = [
+    Issue("description_generic", fix=fix_description),
+    Issue("missing_examples", fix=add_examples, count=4),
+    Issue("outdated_patterns", fix=update_patterns)
+]
+
+for issue in issues:
+    print(f"   {issue.icon}  {issue.action}... ", end="")
+    skill = issue.fix(skill)
+    print("✅")
+```
+
+## Quality Metrics Dashboard
+
+After completion:
+
+```
+📊 Final Quality Report
+
+Anthropic Best Practices Score: 8.9/10
+
+Breakdown:
+✅ Description Quality:      2.0/2.0  (Excellent)
+✅ Name Convention:          0.5/0.5  (Correct)
+✅ Conciseness:              1.4/1.5  (Good - 420 lines)
+✅ Progressive Disclosure:   1.0/1.0  (Excellent - 3 reference files)
+✅ Examples & Workflows:     1.0/1.0  (12 code examples)
+✅ Degree of Freedom:        0.5/0.5  (Appropriate)
+✅ Dependencies:             0.5/0.5  (Documented)
+✅ Structure:                1.0/1.0  (Well-organized)
+✅ Error Handling:           0.5/0.5  (N/A for doc skill)
+✅ Anti-Patterns:            0.5/1.0  (Minor: 2 time refs)
+✅ Testing:                  0.5/0.5  (15/15 tests passing)
+
+Recommendations:
+⚠️  Remove 2 time-sensitive references for 1.0/1.0 on anti-patterns
+```
+
+## Failure Modes
+
+### Can't Reach Threshold
+
+If after 5 iterations score is still < 8.0:
+
+```
+⚠️  Quality score 7.8 after 5 iterations
+
+Blocking issues:
+- Source documentation lacks code examples
+- Framework has limited reference material
+
+Recommendations:
+1. Manual examples needed (auto-generation limited)
+2. Consider hybrid approach with custom content
+3. Lower quality threshold to 7.5 for this specific case
+
+Continue with current skill? (y/n)
+```
+
+### Conflicting Requirements
+
+```
+⚠️  Conflicting requirements detected
+
+Issue: Comprehensive coverage (800 lines) vs Conciseness (<500 lines)
+
+Resolution: Applying progressive disclosure
+- Main SKILL.md: 380 lines (overview + quick ref)
+- Reference files: 5 files with detailed content
+```
+
+## Summary
+
+Quality loops ensure:
+1. Every skill scores >= threshold (default 8.0)
+2. Anthropic best practices followed
+3. Automatic fixes applied
+4. Tests pass
+5. User sees progress, not complexity