Initial commit
skills/using-skillpack-maintenance/SKILL.md (new file, 249 lines)

@@ -0,0 +1,249 @@

---
name: using-skillpack-maintenance
description: Use when maintaining or enhancing existing skill packs in the skillpacks repository - systematic pack refresh through domain analysis, structure review, RED-GREEN-REFACTOR gauntlet testing, and automated quality improvements
---

# Skillpack Maintenance

## Overview

Systematic maintenance and enhancement of existing skill packs using investigative domain analysis, RED-GREEN-REFACTOR testing, and automated improvements.

**Core principle:** Maintenance uses behavioral testing (gauntlet with subagents), not syntactic validation. Skills are process documentation - test if they guide agents correctly, not if they parse correctly.

## When to Use

Use when:
- Enhancing an existing skill pack (e.g., "refresh yzmir-deep-rl")
- Improving existing SKILL.md files
- Identifying gaps in pack coverage
- Validating skill quality through testing

**Do NOT use for:**
- Creating new skills from scratch (use superpowers:writing-skills)
- Creating new packs from scratch (design first, then use creation workflow)

## The Iron Law

**NO SKILL CHANGES WITHOUT BEHAVIORAL TESTING**

Syntactic validation (does it parse?) ≠ Behavioral testing (does it work?)

## Common Rationalizations (from baseline testing)

| Excuse | Reality |
|--------|---------|
| "Syntactic validation is sufficient" | Parsing ≠ effectiveness. Test with subagents. |
| "Quality benchmarking = effectiveness" | Comparing structure ≠ testing behavior. Run gauntlet. |
| "Comprehensive coverage = working skill" | Coverage ≠ guidance quality. Test if agents follow it. |
| "Following patterns = success" | Pattern-matching ≠ validation. Behavioral testing required. |
| "I'll test if issues emerge" | Issues = broken skills in production. Test BEFORE deploying. |

**All of these mean: Run behavioral tests with subagents. No exceptions.**

## Workflow Overview

**Review → Discuss → [Create New Skills if Needed] → Execute**

1. **Investigation & Scorecard** → Load `analyzing-pack-domain.md`
2. **Structure Review (Pass 1)** → Load `reviewing-pack-structure.md`
3. **Content Testing (Pass 2)** → Load `testing-skill-quality.md`
4. **Coherence Check (Pass 3)** → Validate cross-skill consistency
5. **Discussion** → Present findings, get approval
6. **[CONDITIONAL] Create New Skills** → If gaps identified, use `superpowers:writing-skills` for EACH gap (RED-GREEN-REFACTOR)
7. **Execution** → Load `implementing-fixes.md`, enhance existing skills only
8. **Commit** → Single commit with version bump

## Stage 1: Investigation & Scorecard

**Load briefing:** `analyzing-pack-domain.md`

**Purpose:** Establish "what this pack should cover" from first principles.

**Adaptive investigation (D→B→C→A):**
1. **User-guided scope (D)** - Ask user about pack intent and boundaries
2. **LLM knowledge analysis (B)** - Map domain comprehensively, flag if research needed
3. **Existing pack audit (C)** - Compare current state vs. coverage map
4. **Research if needed (A)** - Conditional: only if domain is rapidly evolving

**Output:** Domain coverage map, gap analysis, research currency flag
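
For illustration only, the Stage 1 output for a hypothetical "yzmir-deep-rl" refresh might look like the sketch below (topic names and gaps are invented, not prescribed):

```
Domain coverage map: value-based methods, policy gradients, model-based RL,
  offline RL, exploration, reward design, evaluation/debugging
Existing pack covers: value-based methods, policy gradients, evaluation/debugging
Gaps: offline RL, exploration strategies
Research currency flag: YES - offline RL is evolving rapidly; run conditional research (A)
```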

**Then: Load `reviewing-pack-structure.md` for scorecard**

**Scorecard levels:**
- **Critical** - Pack unusable, recommend rebuild vs. enhance
- **Major** - Significant gaps or duplicates
- **Minor** - Organizational improvements
- **Pass** - Structurally sound

**Decision gate:** Present scorecard → User decides: Proceed / Rebuild / Cancel
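
A hypothetical scorecard summary, continuing the same illustrative example (levels and findings are invented):

```
Pack: yzmir-deep-rl (12 skills)
Structure: MAJOR - 2 gaps (offline RL, exploration), 1 duplicate pair
Content:   MINOR - 3 skills need sharper examples
Coherence: MINOR - router missing 1 specialist, plugin.json skill count stale
Recommendation: Proceed with enhancement (rebuild not required)
```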

## Stage 2: Comprehensive Review

### Pass 1: Structure (from reviewing-pack-structure.md)

**Analyze:**
- Gaps (missing skills based on coverage map)
- Duplicates (overlapping coverage - merge/specialize/remove)
- Organization (router accuracy, faction alignment, metadata sync)

**Output:** Structural issues with priorities (critical/major/minor)

### Pass 2: Content Quality (from testing-skill-quality.md)

**CRITICAL:** This is behavioral testing with subagents, not syntactic validation.

**Gauntlet design (A→C→B priority):**

**A. Pressure scenarios** - Catch rationalizations:
- Time pressure: "This is urgent, just do it quickly"
- Simplicity temptation: "Too simple to need the skill"
- Overkill perception: "Skill is for complex cases, this is straightforward"

**C. Adversarial edge cases** - Test robustness:
- Corner cases where skill principles conflict
- Situations where naive application fails

**B. Real-world complexity** - Validate utility:
- Messy requirements, unclear constraints
- Multiple valid approaches

**Testing process per skill:**
1. Design challenging scenario from gauntlet categories
2. **Run subagent WITH current skill** (behavioral test)
3. Observe: Does the subagent follow the skill? Where does it rationalize or fail?
4. Document failure modes
5. Result: Pass OR Fix needed (with specific issues listed)

**Philosophy:** Use the gauntlet to surface issues, then make targeted fixes. If a skill passes the gauntlet, no changes are needed.

**Output:** Per-skill test results (Pass / Fix needed + priorities)
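
One per-skill record might look like the following sketch (skill name, scenario, and failure mode are hypothetical):

```
Skill: designing-reward-functions
Scenario (A - time pressure): "We demo in an hour - just hand me a reward function
  for this robot arm task, skip the analysis."
Observation: Subagent skipped the reward-hacking checklist and proposed a dense
  reward without discussing failure modes.
Result: FIX NEEDED (major) - add explicit pressure-resistance guidance and a
  minimum-viable-analysis fallback.
```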

### Pass 3: Coherence

**After structure/content analysis, validate pack-level coherence:**

1. **Cross-skill consistency** - Terminology, examples, cross-references
2. **Router accuracy** - Does using-X router reflect current specialists?
3. **Faction alignment** - Check FACTIONS.md, flag drift, suggest rehoming if needed
4. **Metadata sync** - plugin.json description, skill count
5. **Navigation** - Can users find skills easily?

**CRITICAL:** Update skills to reference new/enhanced skills (post-update hygiene)

**Output:** Coherence issues, faction drift flags

## Stage 3: Interactive Discussion

**Present findings conversationally:**

**Structural category:**
- **Gaps requiring superpowers:writing-skills** (new skills needed - each requires RED-GREEN-REFACTOR)
- Duplicates to remove/merge
- Organization issues

**Content category:**
- Skills needing enhancement (from gauntlet failures)
- Severity levels (critical/major/minor)
- Specific failure modes identified

**Coherence category:**
- Cross-reference updates needed
- Faction alignment issues
- Metadata corrections
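
A condensed example of such a presentation, continuing the illustrative pack (all findings invented):

```
Structural:
- Gap: no offline RL skill -> requires superpowers:writing-skills (RED-GREEN-REFACTOR)
- Duplicate: two skills overlap on replay buffers -> propose merge

Content:
- designing-reward-functions: failed time-pressure scenario (major)
- tuning-hyperparameters: passed gauntlet, no changes needed

Coherence:
- Router missing entry for newest specialist
- plugin.json skill count stale (11 vs. 12)
```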

**Get user approval for scope of work**

**CRITICAL DECISION POINT:** If gaps (new skills) were identified:
- User approves → **IMMEDIATELY use superpowers:writing-skills for EACH gap**
- Do NOT proceed to Stage 4 until ALL new skills are created and tested
- Each gap = separate RED-GREEN-REFACTOR cycle
- Return to Stage 4 only after ALL gaps are filled

## Stage 4: Autonomous Execution

**Load briefing:** `implementing-fixes.md`

**PREREQUISITE CHECK:**
- ✓ Zero gaps identified, OR
- ✓ All gaps already filled using superpowers:writing-skills (each skill individually tested)

**If gaps exist and you haven't used writing-skills:** STOP. Return to Stage 3.

**Execute approved changes:**

1. **Structural fixes** - Remove/merge duplicate skills, update router
2. **Content enhancements** - Fix gauntlet failures, add missing guidance to existing skills
3. **Coherence improvements** - Cross-references, terminology alignment, faction voice
4. **Version management** - Apply impact-based bump (patch/minor/major)
5. **Git commit** - Single commit with all changes

**Version bump rules (impact-based):**
- **Patch (x.y.Z)** - Low-impact: typos, formatting, minor clarifications
- **Minor (x.Y.0)** - Medium-impact: enhanced guidance, new skills, better examples (DEFAULT)
- **Major (X.0.0)** - High-impact: skills removed, structural changes, philosophy shifts (RARE)

**Commit format:**
```
feat(meta): enhance [pack-name] - [summary]

[Detailed list of changes by category]
- Structure: [changes]
- Content: [changes]
- Coherence: [changes]

Version bump: [reason for patch/minor/major]
```
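
As a sketch, the commit step might look like this (pack name, paths, and message contents are placeholders for the actual approved changes):

```
git add skills/ plugin.json
git commit -m "feat(meta): enhance yzmir-deep-rl - fill offline RL gap, fix router

- Structure: merged duplicate replay-buffer skills
- Content: strengthened designing-reward-functions against time pressure
- Coherence: updated router entry and plugin.json skill count

Version bump: minor - new skill added, enhanced guidance"
```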

**Output:** Enhanced pack, commit created, summary report

## Briefing Files Reference

All briefing files are in this skill directory:

- `analyzing-pack-domain.md` - Investigative domain analysis (D→B→C→A)
- `reviewing-pack-structure.md` - Structure review, scorecard, gap/duplicate analysis
- `testing-skill-quality.md` - Gauntlet testing methodology with subagents
- `implementing-fixes.md` - Autonomous execution, version management, git commit

**Load appropriate briefing at each stage.**

## Critical Distinctions

**Behavioral vs. Syntactic Testing:**
- ❌ **Syntactic:** "Does Python code parse?" → ast.parse()
- ✅ **Behavioral:** "Does the skill guide agents correctly?" → Subagent gauntlet

**This workflow requires BEHAVIORAL testing.**

**Maintenance vs. Creation:**
- **Maintenance** (this skill): Enhancing existing SKILL.md files
- **Creation** (superpowers:writing-skills): Writing new skills from scratch

**Use the right tool for the task.**

## Red Flags - STOP and Switch Tools

If you catch yourself thinking ANY of these:
- "I'll write the new skills during execution" → NO. Use superpowers:writing-skills for EACH gap
- "implementing-fixes.md says to create skills" → NO. That section was REMOVED. Exit and use writing-skills
- "Token efficiency - I can just write good skills" → NO. Untested skills = broken skills
- "I see the pattern, I can replicate it" → NO. Pattern-matching ≠ behavioral testing
- "User wants this done quickly" → NO. Fast + untested = waste of time fixing later
- "I'm competent, testing is overkill" → NO. Competence = following the process
- "Gaps were approved, so I should fill them" → YES, but using writing-skills, not here
- Validating syntax instead of behavior → Load testing-skill-quality.md
- Skipping gauntlet testing → You're violating the Iron Law
- Making changes without user approval → Follow Review→Discuss→Execute

**All of these mean: STOP. For new skills, exit this workflow and use superpowers:writing-skills; for skipped steps, go back and run them.**

## The Bottom Line

**Maintaining skills requires behavioral testing, not syntactic validation.**

Same principle as code: you test behavior, not syntax.

Load briefings at each stage. Test with subagents. Get approval. Execute.

No shortcuts. No rationalizations.