Files

Zhongwei Li 20b36ca9b1 Initial commit

2025-11-30 08:37:11 +08:00

14 KiB

Raw Permalink Blame History

ORCHESTRATOR Hardening Stress Test

This document contains stress test scenarios to verify the hardened ORCHESTRATOR enforcement catches all violation patterns.

Test Methodology

For each scenario:

Scenario: Simulated user request
Old Behavior: What I would have done before hardening (violation)
Enforcement Gates: Which hardening mechanisms catch this violation
Required Behavior: What I must do now
Verification: Checklist items that must be completed

Scenario 1: "Quick Grep" Rationalization

User Request:

"Where is the authentication middleware used in the codebase?"

Old Behavior (VIOLATION):

My thought: "I'll quickly grep for 'authMiddleware' to see where it's imported"
Action: Grep tool with pattern "authMiddleware"
Result: 30k tokens of context bloat

Enforcement Gates That Catch This:

✅ MANDATORY FIRST RESPONSE PROTOCOL (line 34-38)
- Step 2 requires: "Create TodoWrite: 'Orchestration decision: [agent-name] with Opus'"
- Violation: No todo created before using Grep
✅ MANDATORY PRE-TOOL-USE PROTOCOL (line 132-146)
- Checklist required before Grep
- Must document: "Investigation task → MUST use agent"
✅ Common Violation Patterns (line 296-300)
- Pattern 2: "I'll quickly check Y" explicitly listed
- Shows correct alternative: Task tool with Explore agent

Required Behavior:

My thought: "This is an investigation task - orchestration decision required"

TodoWrite:
1. "Orchestration decision: Explore agent with Opus for auth middleware usage"

Pre-Tool-Use Check:
Tool I'm about to use: None (using agent)
Purpose: Find all uses of authentication middleware
Orchestration Decision: Investigation → Explore agent
Agent: Explore
Model: Opus

Action: Task tool with Explore agent, model=opus

Verification Checklist:

☐ TodoWrite created with "Orchestration decision: Explore agent with Opus"
☐ Pre-tool-use protocol documented (even if mental)
☐ No direct Grep call
☐ Opus model specified in Task call

Result: ✅ PASS - Multiple enforcement gates prevent violation

Scenario 2: "User Mentioned File" Rationalization

User Request:

"The bug seems to be in the config.ts file, can you help?"

Old Behavior (VIOLATION):

My thought: "User mentioned config.ts, let me read it"
Action: Read tool on src/config.ts
Result: Assumption wrong - bug actually in config loader, not config file

Enforcement Gates That Catch This:

✅ MANDATORY FIRST RESPONSE PROTOCOL (line 34-38)
- Must create orchestration todo BEFORE reading
✅ Your Role: ORCHESTRATOR (line 196)
- Exception requires: "User explicitly provides file path AND explicitly requests you read it"
- "The bug seems to be in config.ts" ≠ "read config.ts"
✅ Common Violation Patterns (line 302-307)
- Pattern 3 explicitly covers: "User mentioned Z"
- Shows this STILL requires agent dispatch

Required Behavior:

My thought: "User mentioned config.ts but didn't explicitly ask me to read it - this is investigation"

TodoWrite:
1. "Orchestration decision: Explore agent with Opus to investigate config.ts and related configuration"

Action: Task tool with Explore agent, model=opus
Prompt: "Investigate bug related to config.ts - examine config.ts and configuration loading system"

Verification Checklist:

☐ TodoWrite created
☐ Recognized "mentioned" ≠ "read this specific file"
☐ Dispatched Explore agent instead of direct Read
☐ Opus model specified

Result: ✅ PASS - "User mentioned Z" pattern caught by Common Violation Patterns

Scenario 3: "Need Context First" Rationalization

User Request:

"Add a new API endpoint for user profile updates"

Old Behavior (VIOLATION):

My thought: "I need to understand the existing API structure first"
Action: Read routes/api.ts, Read controllers/user.ts, Read middleware/auth.ts
Result: 50k tokens of context before even starting task

Enforcement Gates That Catch This:

✅ MANDATORY FIRST RESPONSE PROTOCOL (line 34-38)
- Must create orchestration todo FIRST
✅ Common Violation Patterns (line 309-314)
- Pattern 4: "I need context to give the agent good instructions"
- Explicitly states: "Agent BUILDS context for you"
✅ Your Role: ORCHESTRATOR (line 198-200)
- ❌ "I need to understand the codebase structure first" → Explore agent

Required Behavior:

My thought: "This is new feature implementation - need to understand existing patterns first"

TodoWrite:
1. "Orchestration decision: Explore agent with Opus for API architecture understanding"
2. "Check for relevant skills" (might match brainstorming or pre-dev)

Action: Task tool with Explore agent, model=opus
Prompt: "Understand existing API endpoint patterns, user controller structure, and authentication middleware to inform new profile update endpoint"

Verification Checklist:

☐ TodoWrite created BEFORE any file reading
☐ Recognized "need context" as orchestration task
☐ Agent will build context, not me
☐ Opus model specified

Result: ✅ PASS - Pattern 4 explicitly addresses this rationalization

Scenario 4: "Already Started" Sunk Cost

User Request:

"How does error handling work in this codebase?"

Old Behavior (VIOLATION):

My thought: "Let me check the error utilities"
Action: Read utils/errors.ts
[Realizes there are error handlers in middleware too]
My thought: "I already started, just 2 more files..."
Action: Read middleware/errorHandler.ts, Read services/logger.ts
Result: 40k tokens, incomplete picture anyway

Enforcement Gates That Catch This:

✅ MANDATORY FIRST RESPONSE PROTOCOL (line 34-38)
- Should have created orchestration todo BEFORE first read
✅ Common Violation Patterns (line 316-320)
- Pattern 5: "I already started, might as well finish"
- Shows correct action: STOP and dispatch agent with context learned
✅ Anti-Patterns (line 275-278)
- "I already started reading files" → WRONG. Stop, dispatch agent instead
- "I'm 90% done, just one more file" → WRONG. Chain reaction trap

Required Behavior:

My thought: "I read one file already - STOP. This is sunk cost fallacy"

Action: STOP immediately

TodoWrite (correcting course):
1. "Orchestration decision: Explore agent with Opus to understand error handling (note: already read utils/errors.ts)"

Task tool with Explore agent, model=opus:
"Understand error handling architecture in codebase. I've already read utils/errors.ts which defines error classes. Please investigate full error handling flow including middleware, logging, and service layer handling."

Verification Checklist:

☐ Recognized sunk cost fallacy mid-task
☐ Stopped before reading more files
☐ Created corrective orchestration todo
☐ Provided context to agent about what was already read
☐ Opus model specified

Result: ✅ PASS - Pattern 5 + Anti-Patterns catch this with explicit "STOP" instruction

Scenario 5: "Simple Lookup" Rationalization

User Request:

"What's the return type of the validateUser function?"

Old Behavior (VIOLATION):

My thought: "This is a simple lookup, just grep for the function definition"
Action: Grep for "validateUser"
Result: Multiple matches, need to read files to disambiguate, context spiral

Enforcement Gates That Catch This:

✅ MANDATORY PRE-TOOL-USE PROTOCOL (line 132-146)
- Before Grep: Must complete checklist
- "Investigation/exploration/search → MUST use agent"
- Lookup IS search
✅ Anti-Patterns (line 262)
- "This is a targeted lookup, not exploration" → WRONG. All lookups are exploration
✅ Your Role: ORCHESTRATOR (line 198)
- ❌ "I'll grep for the function definition" → Explore agent

Required Behavior:

My thought: "Looking up function signature = search operation"

TodoWrite:
1. "Orchestration decision: Explore agent with Opus for validateUser function signature"

Pre-Tool-Use Check:
Tool I'm about to use: None (using agent)
Purpose: Find validateUser function return type
Orchestration Decision: Search operation → Explore agent
Agent: Explore
Model: Opus

Verification Checklist:

☐ Recognized "lookup" = "search" = orchestration task
☐ Did not rationalize as "too simple for agent"
☐ Completed pre-tool-use protocol
☐ Opus model specified

Result: ✅ PASS - Pre-tool-use protocol catches this before Grep executes

Scenario 6: "User Explicitly Requested" - Valid Exception

User Request:

"Read the file src/config/database.ts and tell me what the connection timeout is set to"

Old Behavior (CORRECT):

My thought: "User explicitly requested I read a specific file"
Action: Read src/config/database.ts
Result: Found timeout setting, reported back

Enforcement Gates:

✅ Your Role: ORCHESTRATOR (line 196)
- Exception: "User explicitly provides file path AND explicitly requests you read it"
- ✅ Has file path: src/config/database.ts
- ✅ Explicit request: "Read the file"

Required Behavior:

TodoWrite:
1. "Orchestration decision: Direct read of src/config/database.ts (user explicitly requested)"

Pre-Tool-Use Check:
Tool I'm about to use: Read
Purpose: Find connection timeout setting
Orchestration Decision: Explicit user request for specific file → Direct tool OK
Exception justification: User said "Read the file src/config/database.ts"

Action: Read src/config/database.ts

Verification Checklist:

☐ TodoWrite created with explicit exception justification
☐ Pre-tool-use protocol documented exception
☐ Exception criteria met: explicit file path + explicit read request
☐ Direct Read is acceptable

Result: ✅ PASS - Valid exception properly documented

Scenario 7: Model Selection Violation

User Request:

"Explore the authentication system architecture"

Old Behavior (VIOLATION):

My thought: "I'll use Explore agent"
Action: Task tool with Explore agent (defaults to Haiku)
Result: Works but violates Opus default requirement

Enforcement Gates That Catch This:

✅ MANDATORY FIRST RESPONSE PROTOCOL (line 35-36)
- TodoWrite must include: "[agent-name] with Opus"
- Default model: Opus (unless user specifies)
✅ Your Role: ORCHESTRATOR (line 194)
- "Model Selection: Always use Opus...unless user explicitly specifies otherwise"
✅ TodoWrite Examples (line 373-382)
- ✅ Correct: "Explore agent with Opus"
- ❌ Wrong: "Orchestration decision: Explore agent" (missing model)

Required Behavior:

TodoWrite:
1. "Orchestration decision: Explore agent with Opus for authentication architecture"

Action: Task tool with Explore agent, model="opus"

Verification Checklist:

☐ TodoWrite includes "with Opus"
☐ Task tool call specifies model="opus"
☐ Did not default to Haiku
☐ No user specification of different model

Result: ✅ PASS - Model requirement enforced in protocol, examples, and TodoWrite format

Scenario 8: User Specifies Different Model - Valid Override

User Request:

"Use Haiku to quickly find where the logger is configured"

Old Behavior (N/A - new requirement):

Required Behavior:

My thought: "User explicitly specified Haiku - override Opus default"

TodoWrite:
1. "Orchestration decision: Explore agent with Haiku (user specified) for logger configuration"

Action: Task tool with Explore agent, model="haiku"

Verification Checklist:

☐ Recognized explicit user model specification
☐ TodoWrite documents "user specified"
☐ Used Haiku instead of Opus (valid override)

Result: ✅ PASS - User override respected

Enforcement Coverage Matrix

Violation Pattern	MANDATORY FIRST RESPONSE	PRE-TOOL-USE PROTOCOL	ORCHESTRATOR (No Exceptions)	Common Violation Patterns	TodoWrite Requirement	Anti-Patterns
Quick grep	✅	✅	✅	✅	✅	✅
User mentioned file	✅	✅	✅	✅	✅	-
Need context first	✅	✅	✅	✅	✅	✅
Already started	✅	✅	-	✅	✅	✅
Simple lookup	✅	✅	✅	-	✅	✅
Missing Opus model	✅	✅	✅	-	✅	-

Average Enforcement Gates Per Violation: 4.8

Every violation pattern is caught by at least 4 different enforcement mechanisms, creating redundant protection against ORCHESTRATOR breakage.

Critical Success Factors

✅ What Makes This Hardening Effective:

Front-Loaded Decision - Orchestration happens in step 2 of MANDATORY FIRST RESPONSE PROTOCOL (before skill check, before tool use)
Triple Enforcement - Every violation is caught by:
- MANDATORY protocol (TodoWrite requirement)
- PRE-TOOL-USE protocol (checklist before tools)
- Pattern recognition (Common Violation Patterns)
Audit Trail - TodoWrite makes orchestration decision visible to user, creating accountability
Single Exception - Eliminated 4-condition exception, leaving only: "user explicitly says read [file]"
Real Pattern Examples - Common Violation Patterns shows my actual thoughts vs correct actions
Opus Default - Model specification enforced at protocol level, examples, and TodoWrite format

❌ What Would Make It Fail:

If I don't read the MANDATORY FIRST RESPONSE PROTOCOL
If I skip TodoWrite (but this violates explicit "automatic failure" clause)
If I rationalize that exception applies when it doesn't (but examples show this explicitly)
If I forget Opus model (but TodoWrite examples show required format)

Hardening Assessment: ROBUST - Multiple redundant enforcement gates make violation nearly impossible without explicit conscious choice to disobey.

Stress Test Result: ✅ PASS

All 8 scenarios demonstrate that the hardened skill would catch violations through multiple enforcement mechanisms.

Key Improvements from Hardening:

Orchestration decision moved to step 2 of first response (before everything else)
Pre-tool-use protocol creates hard stop before Read/Grep/Glob/Bash
Common Violation Patterns provides real-time pattern recognition
TodoWrite requirement creates audit trail and user visibility
Opus model requirement ensures consistent high-quality agent dispatch
Exception clause reduced to single clear rule (no rationalization path)

Recommendation: Deploy hardening to production. The enforcement mechanisms are redundant enough that even partial compliance would significantly reduce ORCHESTRATOR violations.

14 KiB Raw Permalink Blame History

ORCHESTRATOR Hardening Stress Test

Test Methodology

Scenario 1: "Quick Grep" Rationalization

Scenario 2: "User Mentioned File" Rationalization

Scenario 3: "Need Context First" Rationalization

Scenario 4: "Already Started" Sunk Cost

Scenario 5: "Simple Lookup" Rationalization

Scenario 6: "User Explicitly Requested" - Valid Exception

Scenario 7: Model Selection Violation

Scenario 8: User Specifies Different Model - Valid Override

Enforcement Coverage Matrix

Critical Success Factors

✅ What Makes This Hardening Effective:

❌ What Would Make It Fail:

Stress Test Result: ✅ PASS

14 KiB

Raw Permalink Blame History