---
name: using-skillpack-maintenance
description: Use when maintaining or enhancing existing skill packs in the skillpacks repository - systematic pack refresh through domain analysis, structure review, RED-GREEN-REFACTOR gauntlet testing, and automated quality improvements
---

Skillpack Maintenance

Overview

Systematic maintenance and enhancement of existing skill packs using investigative domain analysis, RED-GREEN-REFACTOR testing, and automated improvements.

Core principle: Maintenance uses behavioral testing (gauntlet with subagents), not syntactic validation. Skills are process documentation - test if they guide agents correctly, not if they parse correctly.

When to Use

Use when:

  • Enhancing an existing skill pack (e.g., "refresh yzmir-deep-rl")
  • Improving existing SKILL.md files
  • Identifying gaps in pack coverage
  • Validating skill quality through testing

Do NOT use for:

  • Creating new skills from scratch (use superpowers:writing-skills)
  • Creating new packs from scratch (design first, then use creation workflow)

The Iron Law

NO SKILL CHANGES WITHOUT BEHAVIORAL TESTING

Syntactic validation (does it parse?) ≠ Behavioral testing (does it work?)

Common Rationalizations (from baseline testing)

  • "Syntactic validation is sufficient" → Parsing ≠ effectiveness. Test with subagents.
  • "Quality benchmarking = effectiveness" → Comparing structure ≠ testing behavior. Run gauntlet.
  • "Comprehensive coverage = working skill" → Coverage ≠ guidance quality. Test if agents follow it.
  • "Following patterns = success" → Pattern-matching ≠ validation. Behavioral testing required.
  • "I'll test if issues emerge" → Issues = broken skills in production. Test BEFORE deploying.

All of these mean: Run behavioral tests with subagents. No exceptions.

Workflow Overview

Review → Discuss → [Create New Skills if Needed] → Execute

  1. Investigation & Scorecard → Load analyzing-pack-domain.md
  2. Structure Review (Pass 1) → Load reviewing-pack-structure.md
  3. Content Testing (Pass 2) → Load testing-skill-quality.md
  4. Coherence Check (Pass 3) → Validate cross-skill consistency
  5. Discussion → Present findings, get approval
  6. [CONDITIONAL] Create New Skills → If gaps identified, use superpowers:writing-skills for EACH gap (RED-GREEN-REFACTOR)
  7. Execution → Load implementing-fixes.md, enhance existing skills only
  8. Commit → Single commit with version bump
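
A minimal, runnable sketch of this control flow, with the gap gate made explicit. The stage bodies here are print-statement stubs and the Plan fields are hypothetical; in practice each stage loads the briefing file named above.

```python
# Hypothetical sketch of the maintenance pipeline's control flow.
from dataclasses import dataclass, field

@dataclass
class Plan:
    gaps: list = field(default_factory=list)    # new skills needed
    fixes: list = field(default_factory=list)   # enhancements to existing skills

def maintain_pack(pack: str, plan: Plan) -> None:
    print(f"Stages 1-2: investigate and review {pack}")
    print("Stage 3: present findings, get approval")
    # Gate: every identified gap gets its own RED-GREEN-REFACTOR cycle
    # via superpowers:writing-skills BEFORE execution begins.
    for gap in plan.gaps:
        print(f"  create + gauntlet-test new skill for gap: {gap}")
    print(f"Stage 4: apply {len(plan.fixes)} approved fixes to existing skills, then commit")

maintain_pack("yzmir-deep-rl", Plan(gaps=["reward-shaping"], fixes=["clarify PPO skill"]))
```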

Stage 1: Investigation & Scorecard

Load briefing: analyzing-pack-domain.md

Purpose: Establish "what this pack should cover" from first principles.

Adaptive investigation (D→B→C→A):

  1. User-guided scope (D) - Ask user about pack intent and boundaries
  2. LLM knowledge analysis (B) - Map domain comprehensively, flag if research needed
  3. Existing pack audit (C) - Compare current state vs. coverage map
  4. Research if needed (A) - Conditional: only if domain is rapidly evolving

Output: Domain coverage map, gap analysis, research currency flag
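
The ordering matters because step A is conditional on what step B finds. A small illustrative sketch, with placeholder strings standing in for the actual investigation work:

```python
# Illustrative only: the D→B→C→A ordering, with research (A) gated on
# whether step B flags the domain as rapidly evolving.
def investigate(pack: str, rapidly_evolving: bool) -> dict:
    return {
        "scope": f"ask user about intent and boundaries of {pack}",      # D
        "coverage_map": "map the domain from LLM knowledge",             # B
        "gaps": "compare current skills against the coverage map",       # C
        "research": "do fresh research" if rapidly_evolving else None,   # A (conditional)
    }

print(investigate("yzmir-deep-rl", rapidly_evolving=True))
```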

Then: Load reviewing-pack-structure.md for scorecard

Scorecard levels:

  • Critical - Pack unusable; recommend rebuilding rather than enhancing
  • Major - Significant gaps or duplicates
  • Minor - Organizational improvements
  • Pass - Structurally sound

Decision gate: Present scorecard → User decides: Proceed / Rebuild / Cancel
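
The levels and gate can be pictured as a simple enum plus a recommendation function. The level names come from the scorecard above; the gate logic shown is an assumption (the user always makes the final call, CRITICAL only changes the recommendation):

```python
from enum import Enum

class Scorecard(Enum):
    CRITICAL = "pack unusable - recommend rebuild over enhancement"
    MAJOR = "significant gaps or duplicates"
    MINOR = "organizational improvements"
    PASS = "structurally sound"

def decision_gate(level: Scorecard) -> str:
    recommendation = "Rebuild" if level is Scorecard.CRITICAL else "Proceed"
    return (f"Present scorecard ({level.name}); recommend {recommendation}; "
            "user decides: Proceed / Rebuild / Cancel")

print(decision_gate(Scorecard.MAJOR))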

Stage 2: Comprehensive Review

Pass 1: Structure (from reviewing-pack-structure.md)

Analyze:

  • Gaps (missing skills based on coverage map)
  • Duplicates (overlapping coverage - merge/specialize/remove)
  • Organization (router accuracy, faction alignment, metadata sync)

Output: Structural issues with priorities (critical/major/minor)

Pass 2: Content Quality (from testing-skill-quality.md)

CRITICAL: This is behavioral testing with subagents, not syntactic validation.

Gauntlet design (A→C→B priority):

A. Pressure scenarios - Catch rationalizations:

  • Time pressure: "This is urgent, just do it quickly"
  • Simplicity temptation: "Too simple to need the skill"
  • Overkill perception: "Skill is for complex cases, this is straightforward"

C. Adversarial edge cases - Test robustness:

  • Corner cases where skill principles conflict
  • Situations where naive application fails

B. Real-world complexity - Validate utility:

  • Messy requirements, unclear constraints
  • Multiple valid approaches

Testing process per skill:

  1. Design challenging scenario from gauntlet categories
  2. Run subagent WITH current skill (behavioral test)
  3. Observe: Does it follow? Where does it rationalize/fail?
  4. Document failure modes
  5. Result: Pass OR Fix needed (with specific issues listed)

Philosophy: the gauntlet exists to surface issues; fixes target only the failure modes it reveals. If a skill passes the gauntlet, no changes are needed.

Output: Per-skill test results (Pass / Fix needed + priorities)
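
A hedged sketch of one gauntlet pass. `run_subagent` and `find_failure_modes` stand in for whatever mechanism dispatches a subagent with the skill loaded and reviews its transcript; neither is a real API.

```python
from dataclasses import dataclass

@dataclass
class GauntletResult:
    skill: str
    scenario: str
    passed: bool
    failure_modes: list

def run_subagent(skill: str, scenario: str) -> str:
    return "transcript..."  # placeholder for the subagent's actual output

def find_failure_modes(transcript: str) -> list:
    # In practice: read the transcript and note where the agent
    # rationalized, skipped steps, or misapplied the skill.
    return []

def gauntlet_test(skill: str, scenario: str) -> GauntletResult:
    transcript = run_subagent(skill, scenario)   # behavioral, not syntactic
    failures = find_failure_modes(transcript)
    return GauntletResult(skill, scenario, passed=not failures, failure_modes=failures)

result = gauntlet_test("yzmir-deep-rl/ppo-tuning",
                       "Time pressure: 'this is urgent, just do it quickly'")
print("PASS" if result.passed else f"FIX NEEDED: {result.failure_modes}")
```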

Pass 3: Coherence

After structure/content analysis, validate pack-level coherence:

  1. Cross-skill consistency - Terminology, examples, cross-references
  2. Router accuracy - Does using-X router reflect current specialists?
  3. Faction alignment - Check FACTIONS.md, flag drift, suggest rehoming if needed
  4. Metadata sync - plugin.json description, skill count
  5. Navigation - Can users find skills easily?

CRITICAL: Update skills to reference new/enhanced skills (post-update hygiene)

Output: Coherence issues, faction drift flags
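
The metadata-sync check in particular is mechanical enough to script. A small assumption-laden sketch: it compares SKILL.md files on disk against a count recorded in plugin.json, where the `skill_count` key is hypothetical and should be matched to the real schema.

```python
import json
from pathlib import Path

def check_metadata_sync(pack_dir: str) -> None:
    pack = Path(pack_dir)
    on_disk = len(list(pack.glob("skills/*/SKILL.md")))
    meta = json.loads((pack / "plugin.json").read_text())
    recorded = meta.get("skill_count")  # hypothetical field name
    if recorded != on_disk:
        print(f"DRIFT: plugin.json says {recorded}, found {on_disk} SKILL.md files")
    else:
        print(f"OK: {on_disk} skills, metadata in sync")
```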

Stage 3: Interactive Discussion

Present findings conversationally:

Structural category:

  • Gaps requiring superpowers:writing-skills (new skills needed - each requires RED-GREEN-REFACTOR)
  • Duplicates to remove/merge
  • Organization issues

Content category:

  • Skills needing enhancement (from gauntlet failures)
  • Severity levels (critical/major/minor)
  • Specific failure modes identified

Coherence category:

  • Cross-reference updates needed
  • Faction alignment issues
  • Metadata corrections

Get user approval for scope of work

CRITICAL DECISION POINT: If gaps (new skills) were identified:

  • User approves → IMMEDIATELY use superpowers:writing-skills for EACH gap
  • Do NOT proceed to Stage 4 until ALL new skills are created and tested
  • Each gap = separate RED-GREEN-REFACTOR cycle
  • Proceed to Stage 4 only after ALL gaps are filled

Stage 4: Autonomous Execution

Load briefing: implementing-fixes.md

PREREQUISITE CHECK:

  • ✓ Zero gaps identified, OR
  • ✓ All gaps already filled using superpowers:writing-skills (each skill individually tested)

If gaps exist and you haven't used writing-skills: STOP. Return to Stage 3.
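
This gate can be written down as an executable check. A sketch under stated assumptions: `gaps` is whatever list Stage 3 produced, and a `filled` flag marks gaps already handled by superpowers:writing-skills.

```python
def check_execution_prerequisites(gaps: list[dict]) -> None:
    unfilled = [g for g in gaps if not g.get("filled")]
    if unfilled:
        raise RuntimeError(
            f"STOP: {len(unfilled)} gap(s) not yet created via writing-skills. "
            "Return to Stage 3."
        )

check_execution_prerequisites([{"name": "reward-shaping", "filled": True}])
```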

Execute approved changes:

  1. Structural fixes - Remove/merge duplicate skills, update router
  2. Content enhancements - Fix gauntlet failures, add missing guidance to existing skills
  3. Coherence improvements - Cross-references, terminology alignment, faction voice
  4. Version management - Apply impact-based bump (patch/minor/major)
  5. Git commit - Single commit with all changes

Version bump rules (impact-based):

  • Patch (x.y.Z) - Low-impact: typos, formatting, minor clarifications
  • Minor (x.Y.0) - Medium-impact: enhanced guidance, new skills, better examples (DEFAULT)
  • Major (X.0.0) - High-impact: skills removed, structural changes, philosophy shifts (RARE)
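
The bump rules reduce to a small function. The impact labels mirror the rules above; the string-splitting implementation is just one way to do it:

```python
def bump_version(version: str, impact: str) -> str:
    major, minor, patch = (int(p) for p in version.split("."))
    if impact == "high":   # skills removed, structural changes, philosophy shifts (rare)
        return f"{major + 1}.0.0"
    if impact == "low":    # typos, formatting, minor clarifications
        return f"{major}.{minor}.{patch + 1}"
    # medium is the default: enhanced guidance, new skills, better examples
    return f"{major}.{minor + 1}.0"

assert bump_version("1.4.2", "medium") == "1.5.0"
assert bump_version("1.4.2", "low") == "1.4.3"
assert bump_version("1.4.2", "high") == "2.0.0"
```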

Commit format:

feat(meta): enhance [pack-name] - [summary]

[Detailed list of changes by category]
- Structure: [changes]
- Content: [changes]
- Coherence: [changes]

Version bump: [reason for patch/minor/major]
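
One possible way to assemble the message from this template and create the single commit; the change categories and `-am` staging behavior are illustrative, not prescribed by the skill:

```python
import subprocess

def commit_enhancement(pack: str, summary: str, changes: dict, bump_reason: str) -> None:
    body = "\n".join(f"- {cat.title()}: {desc}" for cat, desc in changes.items())
    message = (
        f"feat(meta): enhance {pack} - {summary}\n\n"
        f"{body}\n\n"
        f"Version bump: {bump_reason}"
    )
    subprocess.run(["git", "commit", "-am", message], check=True)
```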

Output: Enhanced pack, commit created, summary report

Briefing Files Reference

All briefing files are in this skill directory:

  • analyzing-pack-domain.md - Investigative domain analysis (D→B→C→A)
  • reviewing-pack-structure.md - Structure review, scorecard, gap/duplicate analysis
  • testing-skill-quality.md - Gauntlet testing methodology with subagents
  • implementing-fixes.md - Autonomous execution, version management, git commit

Load appropriate briefing at each stage.

Critical Distinctions

Behavioral vs. Syntactic Testing:

  • Syntactic: "Does Python code parse?" → ast.parse()
  • Behavioral: "Does skill guide agents correctly?" → Subagent gauntlet

This workflow requires BEHAVIORAL testing.
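
The contrast made concrete: the syntactic check really is a one-line parse call, while the behavioral check cannot be reduced to code at all. The `behavioral_check` body below is a deliberate placeholder for the subagent gauntlet:

```python
import ast

def syntactic_check(source: str) -> bool:
    try:
        ast.parse(source)   # "does it parse?"
        return True
    except SyntaxError:
        return False

def behavioral_check(skill_text: str, scenario: str) -> bool:
    # No shortcut exists here: dispatch a subagent with the skill loaded,
    # run the scenario, and judge whether the agent followed the guidance.
    raise NotImplementedError("requires a subagent gauntlet run")

print(syntactic_check("def f(): return 1"))  # True - but says nothing about behavior
```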

Maintenance vs. Creation:

  • Maintenance (this skill): Enhancing existing SKILL.md files
  • Creation (superpowers:writing-skills): Writing new skills from scratch

Use the right tool for the task.

Red Flags - STOP and Switch Tools

If you catch yourself thinking ANY of these:

  • "I'll write the new skills during execution" → NO. Use superpowers:writing-skills for EACH gap
  • "implementing-fixes.md says to create skills" → NO. That section was REMOVED. Exit and use writing-skills
  • "Token efficiency - I can just write good skills" → NO. Untested skills = broken skills
  • "I see the pattern, I can replicate it" → NO. Pattern-matching ≠ behavioral testing
  • "User wants this done quickly" → NO. Fast + untested = waste of time fixing later
  • "I'm competent, testing is overkill" → NO. Competence = following the process
  • "Gaps were approved, so I should fill them" → YES, but using writing-skills, not here
  • Validating syntax instead of behavior → Load testing-skill-quality.md
  • Skipping gauntlet testing → You're violating the Iron Law
  • Making changes without user approval → Follow Review→Discuss→Execute

Skill-creation flags mean: STOP. Exit this workflow. Use superpowers:writing-skills. Process flags mean: go back and run the stage you skipped.

The Bottom Line

Maintaining skills requires behavioral testing, not syntactic validation.

Same principle as code: you test behavior, not syntax.

Load briefings at each stage. Test with subagents. Get approval. Execute.

No shortcuts. No rationalizations.