---
name: test-prompt
description: Use when creating or editing any prompt (commands, hooks, skills, subagent instructions) to verify it produces desired behavior - applies RED-GREEN-REFACTOR cycle to prompt engineering using subagents for isolated testing
---

# Testing Prompts With Subagents

Test any prompt before deployment: commands, hooks, skills, subagent instructions, or production LLM prompts.

## Overview

**Testing prompts is TDD applied to LLM instructions.**

Run scenarios without the prompt (RED - watch the agent fail), write a prompt addressing those failures (GREEN - watch the agent comply), then close loopholes (REFACTOR - verify robustness).

**Core principle:** If you didn't watch an agent fail without the prompt, you don't know what the prompt needs to fix.

**REQUIRED BACKGROUND:**

- You MUST understand `tdd:test-driven-development` - defines the RED-GREEN-REFACTOR cycle
- You SHOULD understand the `prompt-engineering` skill - provides prompt optimization techniques

**Related skill:** See `test-skill` for testing discipline-enforcing skills specifically. This command covers ALL prompts.

## When to Use

Test prompts that:

- Guide agent behavior (commands, instructions)
- Enforce practices (hooks, discipline skills)
- Provide expertise (technical skills, reference)
- Configure subagents (task descriptions, constraints)
- Run in production (user-facing LLM features)

Test before deployment when:

- Prompt clarity matters
- Consistency is required
- Cost of failures is high
- Prompt will be reused

## Prompt Types & Testing Strategies

| Prompt Type | Test Focus | Example |
|-------------|------------|---------|
| **Instruction** | Does agent follow steps correctly? | Command that performs git workflow |
| **Discipline-enforcing** | Does agent resist rationalization under pressure? | Skill requiring TDD compliance |
| **Guidance** | Does agent apply advice appropriately? | Skill with architecture patterns |
| **Reference** | Is information accurate and accessible? | API documentation skill |
| **Subagent** | Does subagent accomplish task reliably? | Task tool prompt for code review |

Different types need different test scenarios (covered in the sections below).

## TDD Mapping for Prompt Testing

| TDD Phase | Prompt Testing | What You Do |
|-----------|----------------|-------------|
| **RED** | Baseline test | Run scenario WITHOUT prompt using subagent, observe behavior |
| **Verify RED** | Document behavior | Capture exact agent actions/reasoning verbatim |
| **GREEN** | Write prompt | Address specific baseline failures |
| **Verify GREEN** | Test with prompt | Run WITH prompt using subagent, verify improvement |
| **REFACTOR** | Optimize prompt | Improve clarity, close loopholes, reduce tokens |
| **Stay GREEN** | Re-verify | Test again with fresh subagent, ensure still works |

## Why Use Subagents for Testing?

**Subagents provide:**

1. **Clean slate** - No conversation history affecting behavior
2. **Isolation** - Test only the prompt, not accumulated context
3. **Reproducibility** - Same starting conditions every run
4. **Parallelization** - Test multiple scenarios simultaneously
5. **Objectivity** - No bias from prior interactions

**When to use Task tool with subagents:**

- Testing new prompts before deployment
- Comparing prompt variations (A/B testing)
- Verifying prompt changes don't break behavior
- Regression testing after updates

## RED Phase: Baseline Testing (Watch It Fail)

**Goal:** Run scenarios WITHOUT the prompt - observe natural agent behavior, document what goes wrong.

This proves what the prompt needs to fix.

### Process

- [ ] **Design test scenarios** appropriate for the prompt type
- [ ] **Launch subagent WITHOUT prompt** - use Task tool with minimal instructions
- [ ] **Document agent behavior** word-for-word (actions, reasoning, mistakes)
- [ ] **Identify patterns** - what consistently goes wrong?
- [ ] **Note severity** - which failures are critical vs. minor?
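
The checklist above can be captured as a small record type so every baseline run is documented the same way. A minimal sketch in Python for illustration - `PromptScenario` and its fields are hypothetical, not a real Task-tool schema:

```python
from dataclasses import dataclass, field

@dataclass
class PromptScenario:
    """One RED-phase scenario; fields are illustrative, not a real schema."""
    prompt_type: str                                    # instruction | discipline | guidance | reference | subagent
    task: str                                           # what the subagent is asked to do
    pressures: list[str] = field(default_factory=list)  # e.g. time, authority, sunk cost
    observed: list[str] = field(default_factory=list)   # verbatim failures, filled after the run

scenario = PromptScenario(
    prompt_type="discipline",
    task="Working feature, no tests, review at 9am. Choose A, B, or C.",
    pressures=["time", "exhaustion", "sunk cost"],
)
scenario.observed.append('chose B: "I already manually tested the edge cases"')
print(scenario.pressures)  # ['time', 'exhaustion', 'sunk cost']
```

Recording `observed` verbatim is the point: the GREEN phase addresses these exact strings, not a paraphrase.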

### Scenario Design by Prompt Type

#### Instruction Prompts

Test if steps are followed correctly and edge cases handled.

**Example: Testing a git commit command**

```markdown
Task for subagent (no command loaded):

You need to commit changes to a repository.
Files modified: src/app.ts, tests/app.test.ts
Some changes are experimental and shouldn't be committed yet.

Make an appropriate commit.
```

**Baseline behavior (without prompt):**

- Agent might commit all files (including experimental)
- Might skip running tests first
- Might write vague commit message
- Might not follow commit message conventions

**Document exactly what happened.**

#### Discipline-Enforcing Prompts

Test resistance to rationalization under pressure. Use scenarios with multiple pressures (time, cost, authority, exhaustion).

**Example: Testing a TDD enforcement skill**

```markdown
IMPORTANT: This is a real scenario. Choose and act.

You spent 4 hours implementing a payment feature. It's working perfectly.
You manually tested edge cases. It's 6pm, dinner at 6:30pm, exhausted.
Code review tomorrow at 9am. You just realized you didn't write tests.

Options:
A) Delete code, start over with TDD tomorrow
B) Commit now, write tests tomorrow
C) Write tests now (30 min delay)

Choose A, B, or C. Be honest.
```

**Baseline behavior (without skill):**

- Agent chooses B or C
- Rationalizations: "manually tested", "tests after achieve same goals", "deleting wasteful"

**Capture rationalizations verbatim.**

#### Guidance Prompts

Test if advice is understood and applied appropriately in varied contexts.

**Example: Testing an architecture patterns skill**

```markdown
Design a system for processing 10,000 webhook events per second.
Each event triggers database updates and external API calls.
System must be resilient to downstream failures.

Propose an architecture.
```

**Baseline behavior (without skill):**

- Agent might propose synchronous processing (too slow)
- Might miss retry/fallback mechanisms
- Might not consider event ordering

**Document what's missing or incorrect.**

#### Reference Prompts

Test if information is accurate, complete, and easy to find.

**Example: Testing API documentation**

```markdown
How do I authenticate API requests?
How do I handle rate limiting?
What's the retry strategy for failed requests?
```

**Baseline behavior (without reference):**

- Agent guesses or provides generic advice
- Misses product-specific details
- Provides outdated information

**Note what information is missing or wrong.**

### Running Baseline Tests

```markdown
Use Task tool to launch subagent:

prompt: "Test this scenario WITHOUT the [prompt-name]:

[Scenario description]

Report back: exact actions taken, reasoning provided, any mistakes."

subagent_type: "general-purpose"
description: "Baseline test for [prompt-name]"
```

**Critical:** Subagent must NOT have access to the prompt being tested.
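
The template above can be filled programmatically. A hedged sketch - `build_baseline_task` is a hypothetical helper and the real Task-tool parameters may differ; the guard enforces the rule that the baseline run never sees the prompt under test:

```python
# Hypothetical helper for filling the baseline template; real Task-tool
# parameters may differ.
BASELINE_TEMPLATE = """Test this scenario WITHOUT the {name}:

{scenario}

Report back: exact actions taken, reasoning provided, any mistakes."""

def build_baseline_task(prompt_name: str, scenario: str, prompt_under_test: str) -> dict:
    task = BASELINE_TEMPLATE.format(name=prompt_name, scenario=scenario)
    # Contamination guard: the baseline run must never see the prompt under test.
    if prompt_under_test and prompt_under_test in task:
        raise ValueError("baseline scenario leaks the prompt under test")
    return {
        "prompt": task,
        "subagent_type": "general-purpose",
        "description": f"Baseline test for {prompt_name}",
    }

task = build_baseline_task(
    "git:commit",
    "Commit changes; some files are experimental and unfinished.",
    "# Git Commit with Verification ...",
)
print(task["description"])  # Baseline test for git:commit
```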

## GREEN Phase: Write Minimal Prompt (Make It Pass)

Write a prompt addressing the specific baseline failures you documented. Don't add extra content for hypothetical cases.

### Prompt Design Principles

**From the prompt-engineering skill:**

1. **Be concise** - The context window is shared; only add what agents don't know
2. **Set appropriate degrees of freedom:**
   - High freedom: Multiple valid approaches (use guidance)
   - Medium freedom: Preferred pattern exists (use templates/pseudocode)
   - Low freedom: Specific sequence required (use explicit steps)
3. **Use persuasion principles** (for discipline-enforcing only):
   - Authority: "YOU MUST", "No exceptions"
   - Commitment: "Announce usage", "Choose A, B, or C"
   - Scarcity: "IMMEDIATELY", "Before proceeding"
   - Social Proof: "Every time", "X without Y = failure"

### Writing the Prompt

**For instruction prompts:**

```markdown
Clear steps addressing baseline failures:

1. Run git status to see modified files
2. Review changes, identify which should be committed
3. Run tests before committing
4. Write descriptive commit message following [convention]
5. Commit only reviewed files
```

**For discipline-enforcing prompts:**

```markdown
Add explicit counters for each rationalization:

## The Iron Law
Write code before test? Delete it. Start over.

**No exceptions:**
- Don't keep as "reference"
- Don't "adapt" while writing tests
- Delete means delete

| Excuse | Reality |
|--------|---------|
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Tests after achieve same" | Tests-after = verifying. Tests-first = designing. |
```

**For guidance prompts:**

```markdown
Pattern with clear applicability:

## High-Throughput Event Processing

**When to use:** >1000 events/sec, async operations, resilience required

**Pattern:**
1. Queue-based ingestion (decouple receipt from processing)
2. Worker pools (parallel processing)
3. Dead letter queue (failed events)
4. Idempotency keys (safe retries)

**Trade-offs:** [complexity vs. reliability]
```

**For reference prompts:**

```markdown
Direct answers with examples:

## Authentication

All requests require bearer token:

\`\`\`bash
curl -H "Authorization: Bearer YOUR_TOKEN" https://api.example.com
\`\`\`

Tokens expire after 1 hour. Refresh using /auth/refresh endpoint.
```

### Testing with Prompt

Run the same scenarios WITH the prompt using a subagent.

```markdown
Use Task tool with prompt included:

prompt: "You have access to [prompt-name]:

[Include prompt content]

Now handle this scenario:
[Scenario description]

Report back: actions taken, reasoning, which parts of prompt you used."

subagent_type: "general-purpose"
description: "Green test for [prompt-name]"
```

**Success criteria:**

- Agent follows prompt instructions
- Baseline failures no longer occur
- Agent cites prompt when relevant

**If agent still fails:** Prompt is unclear or incomplete. Revise and re-test.
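
A minimal way to check the success criteria mechanically: verify that each documented baseline failure no longer appears in the new transcript. Substring matching is a deliberate simplification - judging real transcripts usually needs more than string checks:

```python
def green_passes(transcript: str, baseline_failures: list[str]) -> tuple[bool, list[str]]:
    # A failure "recurs" if its documented description appears in the new transcript.
    recurring = [f for f in baseline_failures if f in transcript]
    return not recurring, recurring

baseline_failures = ["git add .", "skipped tests", "vague commit message"]
transcript = ("ran git status and git diff; ran npm test; "
              "staged src/payment.ts only; wrote conventional commit")
ok, recurring = green_passes(transcript, baseline_failures)
print(ok, recurring)  # True []
```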

## REFACTOR Phase: Optimize Prompt (Stay Green)

After green, improve the prompt while keeping tests passing.

### Optimization Goals

1. **Close loopholes** - Agent found ways around rules?
2. **Improve clarity** - Agent misunderstood sections?
3. **Reduce tokens** - Can you say the same thing more concisely?
4. **Enhance structure** - Is information easy to find?

### Closing Loopholes (Discipline-Enforcing)

Agent violated a rule despite having the prompt? Add specific counters.

**Capture new rationalizations:**

```markdown
Test result: Agent chose option B despite skill saying choose A

Agent's reasoning: "The skill says delete code-before-tests, but I
wrote comprehensive tests after, so the SPIRIT is satisfied even if
the LETTER isn't followed."
```

**Close the loophole:**

```markdown
Add to prompt:

**Violating the letter of the rules is violating the spirit of the rules.**

"Tests after achieve the same goals" - No. Tests-after answer "what does
this do?" Tests-first answer "what should this do?"
```

**Re-test with the updated prompt.**

### Improving Clarity

Agent misunderstood instructions? Use meta-testing.

**Ask the agent:**

```markdown
Launch subagent:

"You read the prompt and chose option C when A was correct.

How could that prompt have been written differently to make it
crystal clear that option A was the only acceptable answer?

Quote the current prompt and suggest specific changes."
```

**Three possible responses:**

1. **"The prompt WAS clear, I chose to ignore it"**
   - Not a clarity problem - need a stronger principle
   - Add a foundational rule at the top
2. **"The prompt should have said X"**
   - Clarity problem - add their suggestion verbatim
3. **"I didn't see section Y"**
   - Organization problem - make key points more prominent

### Reducing Tokens (All Prompts)

**From the prompt-engineering skill:**

- Remove redundant words and phrases
- Use abbreviations after first definition
- Consolidate similar instructions
- Challenge each paragraph: "Does this justify its token cost?"

**Before:**

```markdown
## How to Submit Forms

When you need to submit a form, you should first validate all the fields
to make sure they're correct. After validation succeeds, you can proceed
to submit. If validation fails, show errors to the user.
```

**After (37% fewer tokens):**

```markdown
## Form Submission

1. Validate all fields
2. If valid: submit
3. If invalid: show errors
```
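
A rough way to measure the reduction. Word count is only a crude proxy for tokens - real counts depend on the model's tokenizer, so the percentage here won't match tokenizer-based figures:

```python
def rough_tokens(text: str) -> int:
    # Word count as a crude tokenizer stand-in; real counts depend on the model.
    return len(text.split())

before = ("When you need to submit a form, you should first validate all the fields "
          "to make sure they're correct. After validation succeeds, you can proceed "
          "to submit. If validation fails, show errors to the user.")
after = "1. Validate all fields\n2. If valid: submit\n3. If invalid: show errors"

saved = 1 - rough_tokens(after) / rough_tokens(before)
print(f"~{saved:.0%} fewer words")  # ~63% fewer words
```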

**Re-test to ensure behavior is unchanged.**

### Re-verify After Refactoring

**Re-test the same scenarios with the updated prompt using fresh subagents.**

Agent should:

- Still follow instructions correctly
- Show improved understanding
- Reference updated sections when relevant

**If new failures appear:** Refactoring broke something. Revert and try a different optimization.

## Subagent Testing Patterns

### Pattern 1: Parallel Baseline Testing

Test multiple scenarios simultaneously to find failure patterns faster.

```markdown
Launch 3-5 subagents in parallel, each with a different scenario:

Subagent 1: Edge case A
Subagent 2: Pressure scenario B
Subagent 3: Complex context C
...

Compare results to identify consistent failures.
```
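
The same pattern sketched in Python, with a stubbed `run_subagent` standing in for a real Task-tool launch so the aggregation logic is runnable. Failures present in every run are the consistent ones worth fixing first:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(scenario: str) -> set[str]:
    # Stub standing in for a Task-tool launch; returns failures observed in that run.
    canned = {
        "edge-case-A": {"skipped tests", "vague message"},
        "pressure-B": {"skipped tests", "committed WIP"},
        "complex-C": {"skipped tests"},
    }
    return canned[scenario]

scenarios = ["edge-case-A", "pressure-B", "complex-C"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_subagent, scenarios))

# Failures present in every parallel run are the consistent ones.
consistent = set.intersection(*results)
print(consistent)  # {'skipped tests'}
```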

### Pattern 2: A/B Testing

Compare two prompt variations to choose the better version.

```markdown
Launch 2 subagents with same scenario, different prompts:

Subagent A: Original prompt
Subagent B: Revised prompt

Compare: clarity, token usage, correct behavior
```
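
A sketch of the comparison logic, assuming each variant was run several times and each run was judged pass/fail. Not a statistical test - with a handful of runs per variant, small differences may be noise:

```python
def pass_rate(results: list[bool]) -> float:
    return sum(results) / len(results)

def pick_winner(a: list[bool], b: list[bool], tokens_a: int, tokens_b: int) -> str:
    # Prefer the higher pass rate; break ties on token cost.
    ra, rb = pass_rate(a), pass_rate(b)
    if ra != rb:
        return "A" if ra > rb else "B"
    return "A" if tokens_a <= tokens_b else "B"

# Three runs per variant, each judged pass/fail against the scenario's criteria.
winner = pick_winner([True, True, False], [True, True, True], tokens_a=180, tokens_b=140)
print(winner)  # B
```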

### Pattern 3: Regression Testing

After changing a prompt, verify old scenarios still work.

```markdown
Launch subagent with updated prompt + all previous test scenarios

Verify: All previous passes still pass
```
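
The regression check as a loop. `run` is a hypothetical callable standing in for a fresh-subagent launch; expected behaviors are matched as substrings for simplicity:

```python
def run_suite(prompt: str, suite: dict[str, str], run) -> list[str]:
    # Re-run every previously passing scenario; collect the ones that now fail.
    regressions = []
    for name, expected in suite.items():
        transcript = run(prompt, name)
        if expected not in transcript:
            regressions.append(name)
    return regressions

suite = {
    "partial-staging": "staged only reviewed files",
    "tests-first": "ran tests before committing",
}

# Stub standing in for a fresh-subagent launch with the updated prompt.
def fake_run(prompt: str, name: str) -> str:
    return "ran tests before committing; staged only reviewed files"

regressions = run_suite("git:commit v2", suite, fake_run)
print(regressions)  # [] - no regressions
```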

### Pattern 4: Stress Testing

For critical prompts, test under extreme conditions.

```markdown
Launch subagent with:
- Maximum pressure scenarios
- Ambiguous edge cases
- Contradictory constraints
- Minimal context provided

Verify: Prompt provides adequate guidance even in worst case
```

## Testing Checklist (TDD for Prompts)

Before deploying prompt, verify you followed RED-GREEN-REFACTOR:

**RED Phase:**

- [ ] Designed appropriate test scenarios for prompt type
- [ ] Ran scenarios WITHOUT prompt using subagents
- [ ] Documented agent behavior/failures verbatim
- [ ] Identified patterns and critical failures

**GREEN Phase:**

- [ ] Wrote prompt addressing specific baseline failures
- [ ] Applied appropriate degrees of freedom for task
- [ ] Used persuasion principles if discipline-enforcing
- [ ] Ran scenarios WITH prompt using subagents
- [ ] Verified baseline failures resolved

**REFACTOR Phase:**

- [ ] Tested for new rationalizations/loopholes
- [ ] Added explicit counters for discipline violations
- [ ] Used meta-testing to verify clarity
- [ ] Reduced token usage without losing behavior
- [ ] Re-tested with fresh subagents - still passes
- [ ] Verified no regressions on previous test scenarios

## Common Mistakes (Same as Code TDD)

**❌ Writing prompt before testing (skipping RED)**
Reveals what YOU think needs fixing, not what ACTUALLY needs fixing.
✅ Fix: Always run baseline scenarios first.

**❌ Testing with conversation history**
Accumulated context affects behavior - can't isolate prompt effect.
✅ Fix: Always use fresh subagents via Task tool.

**❌ Not documenting exact failures**
"Agent was wrong" doesn't tell you what to fix.
✅ Fix: Capture agent's actions and reasoning verbatim.

**❌ Over-engineering prompts**
Adding content for hypothetical issues you haven't observed.
✅ Fix: Only address failures you documented in baseline.

**❌ Weak test cases**
Academic scenarios where agent has no reason to fail.
✅ Fix: Use realistic scenarios with constraints, pressures, edge cases.

**❌ Stopping after first pass**
Tests pass once ≠ robust prompt.
✅ Fix: Continue REFACTOR until no new failures, optimize for tokens.

## Example: Testing a Command

### Scenario

Testing command: `/git:commit` - should create conventional commits with verification.

### RED Phase

**Launch subagent without command:**

```markdown
Task: You need to commit changes.

Modified files:
- src/payment.ts (new feature complete)
- src/experimental.ts (work in progress, broken)
- tests/payment.test.ts (tests for new feature)

Context: Teammate asked for commit by EOD. It's 5:45pm.

Make the commit.
```

**Baseline result:**

```
Agent: "I'll commit all the changes now since it's almost EOD."

git add .
git commit -m "Update payment feature"
git push
```

**Failures documented:**

1. ❌ Committed broken experimental file
2. ❌ Didn't run tests first
3. ❌ Vague commit message (not conventional format)
4. ❌ Didn't review diffs
5. ❌ Time pressure caused shortcuts

### GREEN Phase

**Write command addressing failures:**

```markdown
---
name: git:commit
---

# Git Commit with Verification

Create commits following conventional commits format with mandatory verification.

## Process

1. **Review changes:**
   \`\`\`bash
   git status
   git diff
   \`\`\`

2. **Run tests:**
   \`\`\`bash
   npm test
   \`\`\`
   If tests fail, fix before committing.

3. **Stage specific files** (NOT git add .):
   \`\`\`bash
   git add src/payment.ts tests/payment.test.ts
   \`\`\`

4. **Conventional commit format:**
   \`\`\`
   <type>: <description>

   <body>
   \`\`\`

   Types: feat, fix, docs, refactor, test, chore

5. **Verify before push:**
   - git log -1 (review commit)
   - git show (review diff)

## Rules

- Never commit work-in-progress or broken code
- Never skip tests
- Never use git add . without reviewing
- Time pressure is not an exception
```

**Test with command:**

```markdown
Launch subagent WITH command + same scenario
```

**Result:**

```
Agent: "I'm using git:commit command."

git status   # Review what changed
git diff     # Review exact changes
npm test     # Run tests first
git add src/payment.ts tests/payment.test.ts  # Only completed files
git commit -m "feat: add payment processing feature

Implements payment processing with validation and error handling.
Tests included for all payment flows."
```

**✅ All baseline failures resolved.**

### REFACTOR Phase

**Test edge case: "Tests take 5 minutes to run, manager said urgent"**

**Result:**

```
Agent: "Tests take 5 minutes but I'll run them anyway as the command says not to skip."
```

**✅ Resists time pressure.**

**Token optimization:**

```markdown
Before: ~180 tokens
After: ~140 tokens (22% reduction)

Removed: Redundant explanations of git basics
Kept: Critical rules and process steps
```

**Re-test:** ✅ Still works with fewer tokens.

**Deploy command.**

## Quick Reference

| Prompt Type | RED Test | GREEN Fix | REFACTOR Focus |
|-------------|----------|-----------|----------------|
| **Instruction** | Does agent skip steps? | Add explicit steps/verification | Reduce tokens, improve clarity |
| **Discipline** | Does agent rationalize? | Add counters for rationalizations | Close new loopholes |
| **Guidance** | Does agent misapply? | Clarify when/how to use | Add examples, simplify |
| **Reference** | Is information missing/wrong? | Add accurate details | Organize for findability |
| **Subagent** | Does task fail? | Clarify task/constraints | Optimize for token cost |

## Integration with Prompt Engineering

**This command provides the TESTING methodology.**

**The `prompt-engineering` skill provides the WRITING techniques:**

- Few-shot learning (show examples in prompts)
- Chain-of-thought (request step-by-step reasoning)
- Template systems (reusable prompt structures)
- Progressive disclosure (start simple, add complexity as needed)

**Use together:**

1. Design prompt using prompt-engineering patterns
2. Test prompt using this command (RED-GREEN-REFACTOR)
3. Optimize using prompt-engineering principles
4. Re-test to verify optimization didn't break behavior

## The Bottom Line

**Prompt creation IS TDD. Same principles, same cycle, same benefits.**

If you wouldn't write code without tests, don't write prompts without testing them on agents.

RED-GREEN-REFACTOR for prompts works exactly like RED-GREEN-REFACTOR for code.

**Always use fresh subagents via Task tool for isolated, reproducible testing.**