---
name: using-skillpack-maintenance
description: Use when maintaining or enhancing existing skill packs in the skillpacks repository - systematic pack refresh through domain analysis, structure review, RED-GREEN-REFACTOR gauntlet testing, and automated quality improvements
---
# Skillpack Maintenance

## Overview
Systematic maintenance and enhancement of existing skill packs using investigative domain analysis, RED-GREEN-REFACTOR testing, and automated improvements.
Core principle: Maintenance uses behavioral testing (gauntlet with subagents), not syntactic validation. Skills are process documentation: test whether they guide agents correctly, not whether they parse correctly.
## When to Use
Use when:
- Enhancing an existing skill pack (e.g., "refresh yzmir-deep-rl")
- Improving existing SKILL.md files
- Identifying gaps in pack coverage
- Validating skill quality through testing
Do NOT use for:
- Creating new skills from scratch (use superpowers:writing-skills)
- Creating new packs from scratch (design first, then use creation workflow)
## The Iron Law
NO SKILL CHANGES WITHOUT BEHAVIORAL TESTING
Syntactic validation (does it parse?) ≠ Behavioral testing (does it work?)
### Common Rationalizations (from baseline testing)
| Excuse | Reality |
|---|---|
| "Syntactic validation is sufficient" | Parsing ≠ effectiveness. Test with subagents. |
| "Quality benchmarking = effectiveness" | Comparing structure ≠ testing behavior. Run gauntlet. |
| "Comprehensive coverage = working skill" | Coverage ≠ guidance quality. Test if agents follow it. |
| "Following patterns = success" | Pattern-matching ≠ validation. Behavioral testing required. |
| "I'll test if issues emerge" | Issues = broken skills in production. Test BEFORE deploying. |
All of these mean: Run behavioral tests with subagents. No exceptions.
## Workflow Overview

Review → Discuss → [Create New Skills if Needed] → Execute

- Investigation & Scorecard → Load `analyzing-pack-domain.md`
- Structure Review (Pass 1) → Load `reviewing-pack-structure.md`
- Content Testing (Pass 2) → Load `testing-skill-quality.md`
- Coherence Check (Pass 3) → Validate cross-skill consistency
- Discussion → Present findings, get approval
- [CONDITIONAL] Create New Skills → If gaps identified, use `superpowers:writing-skills` for EACH gap (RED-GREEN-REFACTOR)
- Execution → Load `implementing-fixes.md`, enhance existing skills only
- Commit → Single commit with version bump
## Stage 1: Investigation & Scorecard

Load briefing: `analyzing-pack-domain.md`
Purpose: Establish "what this pack should cover" from first principles.
Adaptive investigation (D→B→C→A):
- User-guided scope (D) - Ask user about pack intent and boundaries
- LLM knowledge analysis (B) - Map domain comprehensively, flag if research needed
- Existing pack audit (C) - Compare current state vs. coverage map
- Research if needed (A) - Conditional: only if the domain is rapidly evolving
Output: Domain coverage map, gap analysis, research currency flag
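As a sketch of what this stage hands forward (the field names here are illustrative, not a schema from any briefing):

```python
from dataclasses import dataclass, field

@dataclass
class InvestigationOutput:
    """Illustrative shape of Stage 1 output -- not a schema from the briefings."""
    coverage_map: dict[str, list[str]]              # domain area -> skills it needs
    gaps: list[str] = field(default_factory=list)   # areas no current skill covers
    research_needed: bool = False                   # currency flag for fast-moving domains
```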
Then: Load `reviewing-pack-structure.md` for the scorecard
Scorecard levels:
- Critical - Pack unusable, recommend rebuild vs. enhance
- Major - Significant gaps or duplicates
- Minor - Organizational improvements
- Pass - Structurally sound
Decision gate: Present scorecard → User decides: Proceed / Rebuild / Cancel
## Stage 2: Comprehensive Review

### Pass 1: Structure (from `reviewing-pack-structure.md`)
Analyze:
- Gaps (missing skills based on coverage map)
- Duplicates (overlapping coverage - merge/specialize/remove)
- Organization (router accuracy, faction alignment, metadata sync)
Output: Structural issues with priorities (critical/major/minor)
### Pass 2: Content Quality (from `testing-skill-quality.md`)
CRITICAL: This is behavioral testing with subagents, not syntactic validation.
Gauntlet design (A→C→B priority):
A. Pressure scenarios - Catch rationalizations:
- Time pressure: "This is urgent, just do it quickly"
- Simplicity temptation: "Too simple to need the skill"
- Overkill perception: "Skill is for complex cases, this is straightforward"
C. Adversarial edge cases - Test robustness:
- Corner cases where skill principles conflict
- Situations where naive application fails
B. Real-world complexity - Validate utility:
- Messy requirements, unclear constraints
- Multiple valid approaches
Testing process per skill:
- Design challenging scenario from gauntlet categories
- Run subagent WITH current skill (behavioral test)
- Observe: Does it follow? Where does it rationalize/fail?
- Document failure modes
- Result: Pass OR Fix needed (with specific issues listed)
Philosophy: D as gauntlet to identify issues, B for targeted fixes. If skill passes gauntlet, no changes needed.
Output: Per-skill test results (Pass / Fix needed + priorities)
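A minimal sketch of that per-skill loop, assuming a hypothetical runner that dispatches a subagent with the skill loaded and reports back after transcript review (all names here are illustrative, not from the briefings):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class GauntletResult:
    skill: str
    scenario: str
    passed: bool
    failure_modes: list[str] = field(default_factory=list)

# A runner dispatches a subagent with the skill loaded and returns
# (passed, failure_modes) once the transcript has been reviewed.
Runner = Callable[[str, str], tuple[bool, list[str]]]

def run_gauntlet(skill: str, scenarios: list[str], run: Runner) -> list[GauntletResult]:
    """One behavioral test per gauntlet scenario."""
    return [GauntletResult(skill, s, *run(skill, s)) for s in scenarios]

# Stand-in runner for illustration only -- real runs dispatch actual subagents.
def fake_run(skill: str, scenario: str) -> tuple[bool, list[str]]:
    return False, ["rationalized: 'too simple to need the skill'"]

for r in run_gauntlet("example-skill", ["urgent hotfix, skip the process?"], fake_run):
    print(r.skill, "PASS" if r.passed else "FIX NEEDED", r.failure_modes)
```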
### Pass 3: Coherence
After structure/content analysis, validate pack-level coherence:
- Cross-skill consistency - Terminology, examples, cross-references
- Router accuracy - Does using-X router reflect current specialists?
- Faction alignment - Check FACTIONS.md, flag drift, suggest rehoming if needed
- Metadata sync - plugin.json description, skill count
- Navigation - Can users find skills easily?
CRITICAL: Update skills to reference new/enhanced skills (post-update hygiene)
Output: Coherence issues, faction drift flags
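Metadata sync is the one coherence check that lends itself to a quick script. A minimal sketch, with the directory layout and the `skillCount` field name as assumptions to adjust to the repository's real conventions:

```python
import json
from pathlib import Path

def check_metadata_sync(pack_dir: str) -> list[str]:
    """Flag drift between plugin.json and the skills actually on disk.
    Assumes skills live at <pack>/<skill-name>/SKILL.md beside plugin.json."""
    pack = Path(pack_dir)
    meta = json.loads((pack / "plugin.json").read_text())
    on_disk = sorted(p.parent.name for p in pack.glob("*/SKILL.md"))
    issues = []
    declared = meta.get("skillCount")  # assumed field name
    if declared is not None and declared != len(on_disk):
        issues.append(f"plugin.json declares {declared} skills, found {len(on_disk)}")
    if not meta.get("description", "").strip():
        issues.append("plugin.json description is empty")
    return issues
```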
## Stage 3: Interactive Discussion
Present findings conversationally:
Structural category:
- Gaps requiring superpowers:writing-skills (new skills needed - each requires RED-GREEN-REFACTOR)
- Duplicates to remove/merge
- Organization issues
Content category:
- Skills needing enhancement (from gauntlet failures)
- Severity levels (critical/major/minor)
- Specific failure modes identified
Coherence category:
- Cross-reference updates needed
- Faction alignment issues
- Metadata corrections
Get user approval for the scope of work
CRITICAL DECISION POINT: If gaps (new skills) were identified:
- User approves → IMMEDIATELY use superpowers:writing-skills for EACH gap
- Do NOT proceed to Stage 4 until ALL new skills are created and tested
- Each gap = separate RED-GREEN-REFACTOR cycle
- Proceed to Stage 4 only after ALL gaps are filled
## Stage 4: Autonomous Execution

Load briefing: `implementing-fixes.md`
PREREQUISITE CHECK:
- ✓ Zero gaps identified, OR
- ✓ All gaps already filled using superpowers:writing-skills (each skill individually tested)
If gaps exist and you haven't used writing-skills: STOP. Return to Stage 3.
Execute approved changes:
- Structural fixes - Remove/merge duplicate skills, update router
- Content enhancements - Fix gauntlet failures, add missing guidance to existing skills
- Coherence improvements - Cross-references, terminology alignment, faction voice
- Version management - Apply impact-based bump (patch/minor/major)
- Git commit - Single commit with all changes
Version bump rules (impact-based):
- Patch (x.y.Z) - Low-impact: typos, formatting, minor clarifications
- Minor (x.Y.0) - Medium-impact: enhanced guidance, new skills, better examples (DEFAULT)
- Major (X.0.0) - High-impact: skills removed, structural changes, philosophy shifts (RARE)
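These rules map directly onto semver. A minimal sketch (the helper and its signature are illustrative, not part of any briefing):

```python
def bump_version(version: str, impact: str) -> str:
    """Apply the impact-based bump rules; impact is 'patch', 'minor', or 'major'."""
    major, minor, patch = (int(p) for p in version.split("."))
    if impact == "major":   # skills removed, structural changes (rare)
        return f"{major + 1}.0.0"
    if impact == "minor":   # enhanced guidance, new skills (default)
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # typos, formatting, clarifications

assert bump_version("1.4.2", "patch") == "1.4.3"
assert bump_version("1.4.2", "minor") == "1.5.0"
assert bump_version("1.4.2", "major") == "2.0.0"
```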
Commit format:

```
feat(meta): enhance [pack-name] - [summary]

[Detailed list of changes by category]
- Structure: [changes]
- Content: [changes]
- Coherence: [changes]

Version bump: [reason for patch/minor/major]
```
Output: Enhanced pack, commit created, summary report
## Briefing Files Reference

All briefing files are in this skill directory:

- `analyzing-pack-domain.md` - Investigative domain analysis (D→B→C→A)
- `reviewing-pack-structure.md` - Structure review, scorecard, gap/duplicate analysis
- `testing-skill-quality.md` - Gauntlet testing methodology with subagents
- `implementing-fixes.md` - Autonomous execution, version management, git commit
Load appropriate briefing at each stage.
## Critical Distinctions
Behavioral vs. Syntactic Testing:
- ❌ Syntactic: "Does Python code parse?" → `ast.parse()`
- ✅ Behavioral: "Does skill guide agents correctly?" → Subagent gauntlet
This workflow requires BEHAVIORAL testing.
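To make the contrast concrete, a minimal sketch: the syntactic check is real (`ast.parse` is the standard-library call the comparison above refers to), while the behavioral check is a crude illustrative stand-in for reading a subagent's gauntlet transcript yourself:

```python
import ast

def syntactic_check(code: str) -> bool:
    """Syntactic validation: does the code parse? Says nothing about behavior."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def behavioral_check(transcript: str, required_steps: list[str]) -> bool:
    """Behavioral validation: did the subagent's transcript actually follow
    the skill? Keyword scanning is only a stand-in for manual review."""
    return all(step in transcript for step in required_steps)

# A skill can pass syntactically while failing behaviorally:
assert syntactic_check("x = 1")  # parses fine
assert not behavioral_check(
    "Skipped the gauntlet; the change looked trivial.",  # agent rationalized
    required_steps=["ran gauntlet"],
)
```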
Maintenance vs. Creation:
- Maintenance (this skill): Enhancing existing SKILL.md files
- Creation (superpowers:writing-skills): Writing new skills from scratch
Use the right tool for the task.
## Red Flags - STOP and Switch Tools
If you catch yourself thinking ANY of these:
- "I'll write the new skills during execution" → NO. Use superpowers:writing-skills for EACH gap
- "implementing-fixes.md says to create skills" → NO. That section was REMOVED. Exit and use writing-skills
- "Token efficiency - I can just write good skills" → NO. Untested skills = broken skills
- "I see the pattern, I can replicate it" → NO. Pattern-matching ≠ behavioral testing
- "User wants this done quickly" → NO. Fast + untested = waste of time fixing later
- "I'm competent, testing is overkill" → NO. Competence = following the process
- "Gaps were approved, so I should fill them" → YES, but using writing-skills, not here
- Validating syntax instead of behavior → Load testing-skill-quality.md
- Skipping gauntlet testing → You're violating the Iron Law
- Making changes without user approval → Follow Review→Discuss→Execute
All of these mean: STOP. Exit workflow. Use superpowers:writing-skills.
## The Bottom Line
Maintaining skills requires behavioral testing, not syntactic validation.
Same principle as code: you test behavior, not syntax.
Load briefings at each stage. Test with subagents. Get approval. Execute.
No shortcuts. No rationalizations.