Initial commit

Zhongwei Li
2025-11-30 08:38:26 +08:00
commit 41d9f6b189
304 changed files with 98322 additions and 0 deletions


@@ -0,0 +1,253 @@
---
name: deliberation-debate-red-teaming
description: Use when testing plans or decisions for blind spots, need adversarial review before launch, validating strategy against worst-case scenarios, building consensus through structured debate, identifying attack vectors or vulnerabilities, user mentions "play devil's advocate", "what could go wrong", "challenge our assumptions", "stress test this", "red team", or when groupthink or confirmation bias may be hiding risks.
---
# Deliberation, Debate & Red Teaming
## What Is It?
Deliberation-debate-red-teaming is a structured adversarial process where you intentionally challenge plans, designs, or decisions from multiple critical perspectives to surface blind spots, hidden assumptions, and vulnerabilities before they cause real damage.
**Quick example:**
**Proposal:** "Launch new feature to all users next week"
**Red team critiques:**
- **Security:** "No penetration testing done, could expose user data"
- **Operations:** "No runbook for rollback, deployment on Friday risks weekend outage"
- **Customer:** "Feature breaks existing workflow for power users (20% of revenue)"
- **Legal:** "GDPR consent flow unclear, could trigger regulatory investigation"
**Result:** Delay launch 2 weeks, address security/legal/ops gaps, add feature flag for gradual rollout
## Workflow
Copy this checklist and track your progress:
```
Deliberation & Red Teaming Progress:
- [ ] Step 1: Define the proposal and stakes
- [ ] Step 2: Assign adversarial roles
- [ ] Step 3: Generate critiques and challenges
- [ ] Step 4: Synthesize findings and prioritize risks
- [ ] Step 5: Recommend mitigations and revisions
```
**Step 1: Define the proposal and stakes**
Ask the user for the plan/decision to evaluate (a specific proposal, not a vague idea), the stakes (what happens if this fails), current confidence level (how certain they are), and deadline (when the decision must be made). Understanding the stakes helps calibrate critique intensity. See [Scoping Questions](#scoping-questions).
**Step 2: Assign adversarial roles**
Identify critical perspectives that could expose blind spots. Choose 3-5 roles based on proposal type (security, legal, operations, customer, competitor, etc.). Each role has different incentives and concerns. See [Adversarial Role Types](#adversarial-role-types) and [resources/template.md](resources/template.md) for role assignment guidance.
**Step 3: Generate critiques and challenges**
For each role, generate specific critiques: What could go wrong? What assumptions are questionable? What edge cases break this? Be adversarial but realistic (steelman, not strawman arguments). For advanced critique techniques, see [resources/methodology.md](resources/methodology.md) for red team attack patterns.
**Step 4: Synthesize findings and prioritize risks**
Collect all critiques, identify themes (security gaps, operational risks, customer impact, etc.), assess severity and likelihood for each risk. Distinguish between showstoppers (must fix) and acceptable risks (monitor/mitigate). See [Risk Prioritization](#risk-prioritization).
**Step 5: Recommend mitigations and revisions**
For each critical risk, propose concrete mitigation (change the plan, add safeguards, gather more data, or accept risk with monitoring). Present revised proposal incorporating fixes. See [Mitigation Patterns](#mitigation-patterns) for common approaches.
## Scoping Questions
**To define the proposal:**
- What exactly are we evaluating? (Be specific: "launch feature X to cohort Y on date Z")
- What's the goal? (Why do this?)
- Who made this proposal? (Understanding bias helps)
**To understand stakes:**
- What happens if this succeeds? (Upside)
- What happens if this fails? (Downside, worst case)
- Is this reversible? (Can we roll back if wrong?)
- What's the cost of delay? (Opportunity cost of waiting)
**To calibrate critique:**
- How confident is the team? (0-100%)
- What analysis has been done already?
- What concerns have been raised internally?
- When do we need to decide? (Time pressure affects rigor)
## Adversarial Role Types
Choose 3-5 roles that are most likely to expose blind spots for this specific proposal:
### External Adversary Roles
**Competitor:**
- "How would our competitor exploit this decision?"
- "What gives them an opening in the market?"
- Useful for: Strategy, product launches, pricing decisions
**Malicious Actor (Security):**
- "How would an attacker compromise this?"
- "What's the weakest link in the chain?"
- Useful for: Security architecture, data handling, access controls
**Regulator/Auditor:**
- "Does this violate any laws, regulations, or compliance requirements?"
- "What documentation is missing for audit trail?"
- Useful for: Privacy, financial, healthcare, legal matters
**Investigative Journalist:**
- "What looks bad if this becomes public?"
- "What are we hiding or not disclosing?"
- Useful for: PR-sensitive decisions, ethics, transparency
### Internal Stakeholder Roles
**Operations/SRE:**
- "Will this break production? Can we maintain it?"
- "What's the runbook for when this fails at 2am?"
- Useful for: Technical changes, deployments, infrastructure
**Customer/User:**
- "Does this actually solve my problem or create new friction?"
- "Am I being asked to change behavior? Why should I?"
- Useful for: Product features, UX changes, pricing
**Finance/Budget:**
- "What are the hidden costs? TCO over 3 years?"
- "Is ROI realistic or based on optimistic assumptions?"
- Useful for: Investments, vendor selection, resource allocation
**Legal/Compliance:**
- "What liability does this create?"
- "Are contracts/terms clear? What disputes could arise?"
- Useful for: Partnerships, licensing, data usage
**Engineering/Technical:**
- "Is this technically feasible? What's the technical debt?"
- "What are we underestimating in complexity?"
- Useful for: Architecture decisions, technology choices, timelines
### Devil's Advocate Roles
**Pessimist:**
- "What's the worst-case scenario?"
- "Murphy's Law: What can go wrong will go wrong"
- Useful for: Risk assessment, contingency planning
**Contrarian:**
- "What if the opposite is true?"
- "Challenge every assumption: What if market research is wrong?"
- Useful for: Validating assumptions, testing consensus
**Long-term Thinker:**
- "What are second-order effects in 1-3 years?"
- "Are we solving today's problem and creating tomorrow's crisis?"
- Useful for: Strategic decisions, architectural choices
## Risk Prioritization
After generating critiques, prioritize by severity and likelihood:
### Severity Scale
**Critical (5):** Catastrophic failure (data breach, regulatory fine, business shutdown)
**High (4):** Major damage (significant revenue loss, customer exodus, reputation hit)
**Medium (3):** Moderate impact (delays, budget overrun, customer complaints)
**Low (2):** Minor inconvenience (edge case bugs, small inefficiency)
**Trivial (1):** Negligible (cosmetic issues, minor UX friction)
### Likelihood Scale
**Very Likely (5):** >80% chance if we proceed
**Likely (4):** 50-80% chance
**Possible (3):** 20-50% chance
**Unlikely (2):** 5-20% chance
**Rare (1):** <5% chance
### Risk Score = Severity × Likelihood
**Showstoppers (score ≥ 15):** Must address before proceeding
**High Priority (score 10-14):** Should address, or have strong mitigation plan
**Monitor (score 5-9):** Accept risk but have contingency
**Accept (score < 5):** Acknowledge and move on
### Risk Matrix
| Severity ↓ / Likelihood → | Rare (1) | Unlikely (2) | Possible (3) | Likely (4) | Very Likely (5) |
|---------------------------|----------|--------------|--------------|------------|-----------------|
| **Critical (5)** | 5 (Monitor) | 10 (High Priority) | 15 (SHOWSTOPPER) | 20 (SHOWSTOPPER) | 25 (SHOWSTOPPER) |
| **High (4)** | 4 (Accept) | 8 (Monitor) | 12 (High Priority) | 16 (SHOWSTOPPER) | 20 (SHOWSTOPPER) |
| **Medium (3)** | 3 (Accept) | 6 (Monitor) | 9 (Monitor) | 12 (High Priority) | 15 (SHOWSTOPPER) |
| **Low (2)** | 2 (Accept) | 4 (Accept) | 6 (Monitor) | 8 (Monitor) | 10 (High Priority) |
| **Trivial (1)** | 1 (Accept) | 2 (Accept) | 3 (Accept) | 4 (Accept) | 5 (Monitor) |
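The thresholds above are mechanical enough to automate once a risk register grows beyond a handful of entries. A minimal Python sketch of the scoring and categorization logic (the function names are illustrative, not part of the skill's resources):
```python
def risk_score(severity: int, likelihood: int) -> int:
    """Risk score = Severity (1-5) x Likelihood (1-5)."""
    if not (1 <= severity <= 5 and 1 <= likelihood <= 5):
        raise ValueError("severity and likelihood must be 1-5")
    return severity * likelihood

def risk_category(score: int) -> str:
    """Map a risk score to the action categories used in the matrix above."""
    if score >= 15:
        return "Showstopper"    # must address before proceeding
    if score >= 10:
        return "High Priority"  # address, or have a strong mitigation plan
    if score >= 5:
        return "Monitor"        # accept with contingency
    return "Accept"             # acknowledge and move on

# Example: Critical severity (5) x Possible likelihood (3) = 15 -> Showstopper
print(risk_category(risk_score(5, 3)))
```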
## Mitigation Patterns
For each identified risk, choose mitigation approach:
**1. Revise the Proposal (Change Plan)**
- Fix the flaw in design/approach
- Example: Security risk → Add authentication layer before launch
**2. Add Safeguards (Reduce Likelihood)**
- Implement controls to prevent risk
- Example: Operations risk → Add automated rollback, feature flags
**3. Reduce Blast Radius (Reduce Severity)**
- Limit scope or impact if failure occurs
- Example: Customer risk → Gradual rollout to 5% of users first
**4. Contingency Planning (Prepare for Failure)**
- Have plan B ready
- Example: Vendor risk → Identify backup supplier in advance
**5. Gather More Data (Reduce Uncertainty)**
- Research, prototype, or test before committing
- Example: Assumption risk → Run A/B test to validate hypothesis
**6. Accept and Monitor (Informed Risk)**
- Acknowledge risk, set up alerts/metrics to detect if it manifests
- Example: Low-probability edge case → Monitor error rates, have fix ready
**7. Delay/Cancel (Avoid Risk Entirely)**
- If risk is too high and can't be mitigated, don't proceed
- Example: Showstopper legal risk → Delay until legal review complete
## When NOT to Use This Skill
**Skip red teaming if:**
- Decision is trivial/low-stakes (not worth the overhead)
- Time-critical emergency (no time for deliberation, must act now)
- Already thoroughly vetted (extensive prior review, red team would be redundant)
- No reasonable alternatives (one viable path, red team can't change outcome)
- Pure research/exploration (not committing to anything, failure is cheap)
**Use instead:**
- Trivial decision → Just decide, move on
- Emergency → Act immediately, retrospective later
- Already vetted → Proceed with monitoring
- No alternatives → Focus on execution planning
## Quick Reference
**Process:**
1. Define proposal and stakes → Set scope
2. Assign adversarial roles → Choose 3-5 critical perspectives
3. Generate critiques → What could go wrong from each role?
4. Prioritize risks → Severity × Likelihood matrix
5. Recommend mitigations → Revise, safeguard, contingency, accept, or cancel
**Common adversarial roles:**
- Competitor, Malicious Actor, Regulator, Operations, Customer, Finance, Legal, Engineer, Pessimist, Contrarian, Long-term Thinker
**Risk prioritization:**
- Showstoppers (≥15): Must fix
- High Priority (10-14): Should address
- Monitor (5-9): Accept with contingency
- Accept (<5): Acknowledge
**Resources:**
- [resources/template.md](resources/template.md) - Structured red team process with role templates
- [resources/methodology.md](resources/methodology.md) - Advanced techniques (attack trees, pre-mortem, wargaming)
- [resources/evaluators/rubric_deliberation_debate_red_teaming.json](resources/evaluators/rubric_deliberation_debate_red_teaming.json) - Quality checklist
**Deliverable:** `deliberation-debate-red-teaming.md` with critiques, risk assessment, and mitigation recommendations


@@ -0,0 +1,248 @@
{
"criteria": [
{
"name": "Proposal Definition & Context",
"description": "Is the proposal clearly defined with stakes, constraints, and decision timeline?",
"scoring": {
"1": "Vague proposal definition. Stakes unclear or not documented. No timeline, no decision-maker identified. Insufficient context for meaningful red team.",
"3": "Proposal defined but lacks specificity. Stakes mentioned but not quantified. Timeline present but may be unrealistic. Some context gaps.",
"5": "Exemplary proposal definition. Specific, actionable proposal with clear scope. Stakes quantified (upside/downside/reversibility). Timeline realistic with decision-maker identified. Constraints documented. Sufficient context for adversarial analysis."
}
},
{
"name": "Adversarial Role Selection",
"description": "Are adversarial roles appropriately chosen to expose blind spots for this specific proposal?",
"scoring": {
"1": "Generic roles not tailored to proposal (e.g., same 3 roles for every analysis). Missing critical perspectives. Roles have overlapping concerns, no diversity of viewpoint.",
"3": "Roles are reasonable but may not be optimal for proposal type. 3-5 roles selected. Some critical perspectives may be missing. Roles have some overlap but provide different angles.",
"5": "Optimal role selection. 3-5 roles specifically chosen to expose blind spots for this proposal. All critical perspectives covered (security, ops, customer, legal, etc. as appropriate). Roles have distinct, non-overlapping concerns. Rationale provided for role choices."
}
},
{
"name": "Critique Quality & Depth",
"description": "Are critiques specific, realistic (steelman not strawman), and adversarial?",
"scoring": {
"1": "Critiques are vague, strawman arguments, or superficial. No specific failure modes identified. Critiques confirm bias rather than challenge assumptions. Groupthink evident.",
"3": "Critiques are specific to components but may lack depth. Some realistic challenges raised. Mix of steelman and strawman arguments. Adversarial tone present but may not fully expose vulnerabilities.",
"5": "Exemplary critique quality. Specific, realistic failure modes identified. Steelman arguments (strongest version of critique). Genuinely adversarial (surfaces blind spots, challenges assumptions). For each role: what could go wrong, questionable assumptions, missing elements, stress scenarios addressed."
}
},
{
"name": "Risk Assessment Rigor",
"description": "Are risks assessed with severity, likelihood, and scoring methodology? Is prioritization data-driven?",
"scoring": {
"1": "No risk scoring. All risks treated as equal priority. Severity/likelihood not assessed or purely subjective with no rationale.",
"3": "Risks scored with severity and likelihood but ratings may be subjective or inconsistent. Risk score calculated (S×L). Some prioritization present but may not be rigorous. Showstoppers identified.",
"5": "Rigorous risk assessment. All risks scored with severity (1-5) and likelihood (1-5) with clear rationale. Risk score (S×L) calculated correctly. Prioritization into categories (Showstopper ≥15, High Priority 10-14, Monitor 5-9, Accept <5). Showstoppers clearly flagged and justified."
}
},
{
"name": "Mitigation Appropriateness",
"description": "Are mitigations specific, actionable, and matched to risk severity?",
"scoring": {
"1": "Vague mitigations ('be careful', 'think about it'). No concrete actions. Mitigations don't address root cause. Showstoppers not mitigated.",
"3": "Mitigations are specific but may lack implementation detail. Mitigation strategy stated (revise, safeguard, accept, delay). Showstoppers addressed but plan may be incomplete. Some mitigations may not match risk severity.",
"5": "Exemplary mitigations. Specific, actionable recommendations with clear strategy (revise proposal, add safeguards, reduce scope, contingency plan, gather data, accept with monitoring, delay/cancel). All showstoppers have concrete mitigation with owner and deadline. Mitigations proportional to risk severity. Expected impact quantified where possible."
}
},
{
"name": "Argumentation Validity (Steelman vs Strawman)",
"description": "Are critiques evaluated for validity using structured argumentation? Are strawman arguments identified and filtered?",
"scoring": {
"1": "No evaluation of critique validity. Strawman arguments accepted as valid. Weak critiques (no data, illogical warrants) treated same as strong critiques.",
"3": "Some attempt to evaluate critique validity. Toulmin model components mentioned (claim, data, warrant) but not systematically applied. Some strawman arguments filtered but others may remain.",
"5": "Rigorous argumentation analysis. Toulmin model applied (claim, data, warrant, backing, qualifier, rebuttal). Strong critiques clearly distinguished from strawman arguments. Rebuttals structured (accept, refine, reject with counter-evidence). Only valid critiques inform final recommendations."
}
},
{
"name": "Facilitation Quality (if group exercise)",
"description": "If red team involved facilitated session, was defensiveness managed, intensity calibrated, and airtime balanced?",
"scoring": {
"1": "Facilitation absent or poor. Defensive responses shut down critique. HiPPO effect (highest-paid person dominates). Hostile tone or groupthink prevails. No synthesis or structure.",
"3": "Facilitation present but uneven. Some defensive responses managed. Intensity mostly appropriate but may be too soft or too aggressive. Airtime somewhat balanced. Some synthesis of findings.",
"5": "Exemplary facilitation. Defensive responses skillfully redirected (Socratic questioning, 'Yes and...'). Adversarial intensity calibrated to team culture (escalating approach). Airtime balanced (quiet voices heard). Strategic use of silence, parking lot, synthesis. Session stayed focused and constructive. If async (Delphi), anonymization and iteration done correctly."
}
},
{
"name": "Consensus Building",
"description": "If multi-stakeholder, was alignment achieved on showstoppers and mitigations? Were disagreements documented?",
"scoring": {
"1": "No consensus process. Stakeholder concerns ignored. Disagreements not documented. Decision appears one-sided.",
"3": "Some consensus process. Stakeholder concerns acknowledged. Agreement on some showstoppers but others may be unresolved. Dissent mentioned but may not be fully documented.",
"5": "Robust consensus building. All stakeholder perspectives acknowledged. Shared goals identified. Showstoppers negotiated to consensus (or decision-maker adjudicated with rationale). Remaining disagreements explicitly documented with stakeholder positions. Transparent process."
}
},
{
"name": "Assumption & Rebuttal Documentation",
"description": "Are assumptions surfaced and tested? Are rebuttals to critiques documented?",
"scoring": {
"1": "Assumptions implicit, not documented. Rebuttals absent (all critiques accepted without question) or dismissive ('you're wrong').",
"3": "Some assumptions surfaced. Rebuttals present for some critiques but may lack counter-evidence. Assumption testing may be superficial.",
"5": "Comprehensive assumption surfacing. For each key claim: 'What must be true for this to work?' Assumptions tested (validated or flagged as unvalidated). Rebuttals structured with counter-data and counter-warrant for rejected critiques. Accepted critiques acknowledged and added to mitigation plan."
}
},
{
"name": "Communication & Deliverable Quality",
"description": "Is red team analysis clear, structured, and actionable? Does deliverable enable decision-making?",
"scoring": {
"1": "Deliverable is confusing, unstructured, or missing. Findings are unclear. No clear recommendation (proceed/delay/cancel). Decision-maker cannot act on analysis.",
"3": "Deliverable is present and somewhat structured. Findings are documented but may require effort to understand. Recommendation present but may lack justification. Sufficient for decision-making with clarification.",
"5": "Exemplary deliverable. Clear structure (proposal definition, roles, critiques, risk assessment, mitigations, revised proposal, recommendation). Findings are evidence-based and understandable. Recommendation explicit (Proceed / Proceed with caution / Delay / Cancel) with clear rationale. Decision-maker can confidently act on analysis. Appropriate level of detail for audience."
}
}
],
"minimum_score": 3.5,
"guidance_by_proposal_type": {
"Product/Feature Launch": {
"target_score": 4.0,
"focus_criteria": [
"Adversarial Role Selection",
"Critique Quality & Depth",
"Risk Assessment Rigor"
],
"recommended_roles": [
"Customer/User (adoption, friction)",
"Operations (reliability, maintenance)",
"Competitor (market response)",
"Legal/Privacy (compliance)"
],
"common_pitfalls": [
"Missing customer adoption risk (assuming if you build it, they will come)",
"Underestimating operational burden post-launch",
"Not considering competitive response or timing"
]
},
"Technical/Architecture Decision": {
"target_score": 4.2,
"focus_criteria": [
"Adversarial Role Selection",
"Critique Quality & Depth",
"Mitigation Appropriateness"
],
"recommended_roles": [
"Security (attack vectors)",
"Operations (operability, debugging)",
"Engineer (technical debt, complexity)",
"Long-term Thinker (future costs, scalability)"
],
"common_pitfalls": [
"Missing security implications of architectural choice",
"Not considering operational complexity (monitoring, debugging, incident response)",
"Underestimating technical debt accumulation"
]
},
"Strategy/Business Decision": {
"target_score": 4.0,
"focus_criteria": [
"Critique Quality & Depth",
"Risk Assessment Rigor",
"Assumption & Rebuttal Documentation"
],
"recommended_roles": [
"Competitor (market response)",
"Finance (ROI assumptions, hidden costs)",
"Contrarian (challenge core assumptions)",
"Long-term Thinker (second-order effects)"
],
"common_pitfalls": [
"Optimistic ROI projections not stress-tested",
"Missing second-order effects (what happens after the obvious first-order outcome)",
"Not questioning core assumptions about market or customer behavior"
]
},
"Policy/Process Change": {
"target_score": 3.8,
"focus_criteria": [
"Adversarial Role Selection",
"Consensus Building",
"Communication & Deliverable Quality"
],
"recommended_roles": [
"Affected User (workflow disruption)",
"Operations (enforcement burden)",
"Legal/Compliance (regulatory risk)",
"Investigative Journalist (PR risk)"
],
"common_pitfalls": [
"Not considering impact on affected users' workflows",
"Assuming policy will be followed without enforcement mechanism",
"Missing PR or reputational risk if policy becomes public"
]
},
"Security/Safety Initiative": {
"target_score": 4.5,
"focus_criteria": [
"Critique Quality & Depth",
"Risk Assessment Rigor",
"Argumentation Validity (Steelman vs Strawman)"
],
"recommended_roles": [
"Malicious Actor (attack vectors)",
"Operations (false positives, operational burden)",
"Affected User (friction, usability)",
"Regulator (compliance)"
],
"common_pitfalls": [
"Strawman attacks (unrealistic threat models)",
"Missing usability/friction trade-offs (security so burdensome it's bypassed)",
"Not considering adversarial adaptation (attacker evolves to bypass control)"
]
}
},
"guidance_by_complexity": {
"Simple (Low stakes, reversible, single stakeholder, clear success criteria)": {
"target_score": 3.5,
"sufficient_depth": "3 adversarial roles. Basic critique framework (what could go wrong, questionable assumptions). Risk assessment with severity/likelihood. Mitigation for showstoppers only. Lightweight process (template.md sufficient)."
},
"Moderate (Medium stakes, partially reversible, multiple stakeholders, some uncertainty)": {
"target_score": 3.8,
"sufficient_depth": "4-5 adversarial roles covering key perspectives. Comprehensive critique (failure modes, assumptions, gaps, stress scenarios). Risk assessment with prioritization matrix. Mitigations for showstoppers and high-priority risks. Structured argumentation to filter strawman. Stakeholder alignment on key risks."
},
"Complex (High stakes, irreversible or costly to reverse, many stakeholders, high uncertainty, strategic importance)": {
"target_score": 4.2,
"sufficient_depth": "5+ adversarial roles, may use advanced techniques (pre-mortem, wargaming, tabletop). Rigorous critique with multiple rounds. Full risk assessment with sensitivity analysis. Attack trees or FMEA if security/safety critical. Toulmin model for all critiques. Facilitated session with calibrated intensity. Consensus building process with documented dissent. Comprehensive deliverable with revised proposal."
}
},
"common_failure_modes": {
"1. Red Team as Rubber Stamp": {
"symptom": "Critiques are superficial, confirm existing bias, no real challenges raised. Groupthink persists.",
"detection": "All critiques are minor. No showstoppers identified. Proposal unchanged after red team.",
"fix": "Choose truly adversarial roles. Use pre-mortem or attack trees to force critical thinking. Bring in external red team if internal team too aligned. Calibrate intensity upward."
},
"2. Strawman Arguments Dominate": {
"symptom": "Critiques are unrealistic, extreme scenarios, or not applicable to proposal.",
"detection": "Proposer easily dismisses all critiques as 'not realistic'. No valid concerns surfaced.",
"fix": "Apply Toulmin model. Require data and backing for each critique. Filter out critiques with no evidence or illogical warrants. Focus on steelman (strongest version of argument)."
},
"3. Defensive Shutdown": {
"symptom": "Team rejects all critiques, hostility emerges, session becomes unproductive.",
"detection": "Eye-rolling, dismissive language ('we already thought of that'), personal attacks, participants disengage.",
"fix": "Recalibrate adversarial intensity downward. Use 'Yes, and...' framing. Reaffirm purpose (improve proposal, not attack people). Facilitator redirects defensive responses with Socratic questioning."
},
"4. Analysis Paralysis": {
"symptom": "Red team drags on for weeks, endless critique rounds, no decision.",
"detection": "Stakeholders losing patience. 'When will we decide?' Critique list grows but no prioritization.",
"fix": "Time-box red team (half-day to 1-day max). Focus on showstoppers only. Use risk matrix to prioritize. Decision-maker sets cutoff: 'We decide by [date] with info we have.'"
},
"5. Missing Critical Perspectives": {
"symptom": "Red team identifies some risks but misses key blind spot that later causes failure.",
"detection": "Post-failure retrospective reveals 'we should have had [role X] in red team.'",
"fix": "Systematically identify who has most to lose from proposal. Include those roles. Use role selection guide (template.md) to avoid generic role choices."
},
"6. No Follow-Through on Mitigations": {
"symptom": "Great red team analysis, mitigations documented, but none implemented.",
"detection": "Months later, proposal proceeds unchanged. Mitigations never actioned.",
"fix": "Assign owner and deadline to each showstopper mitigation. Track in project plan. Decision-maker blocks proceed decision until showstoppers addressed. Document acceptance if risk knowingly taken."
},
"7. Vague Risk Assessment": {
"symptom": "Risks identified but not scored or prioritized. 'Everything is high risk.'",
"detection": "Cannot answer 'which risks are most critical?' All risks treated equally.",
"fix": "Apply severity × likelihood matrix rigorously. Score all risks 1-5 on both dimensions. Calculate risk score. Categorize (Showstopper/High/Monitor/Accept). Focus mitigation effort on highest scores."
},
"8. HiPPO Effect (Highest-Paid Person's Opinion)": {
"symptom": "Senior leader's opinion dominates, other perspectives suppressed.",
"detection": "All critiques align with leader's known position. Dissenting views not voiced.",
"fix": "Use anonymous brainstorming (pre-mortem, written critiques before discussion). Delphi method for async consensus. Facilitator explicitly solicits quiet voices. Leader speaks last, not first."
}
}
}


@@ -0,0 +1,330 @@
# Deliberation, Debate & Red Teaming: Advanced Methodology
## Workflow
Copy this checklist for advanced red team scenarios:
```
Advanced Red Teaming Progress:
- [ ] Step 1: Select appropriate red team technique
- [ ] Step 2: Design adversarial simulation or exercise
- [ ] Step 3: Facilitate session and capture critiques
- [ ] Step 4: Synthesize findings with structured argumentation
- [ ] Step 5: Build consensus on mitigations
```
**Step 1: Select appropriate red team technique** - Match technique to proposal complexity and stakes. See [Technique Selection](#technique-selection).
**Step 2: Design adversarial simulation** - Structure attack trees, pre-mortem, wargaming, or tabletop exercise. See techniques below.
**Step 3: Facilitate session** - Manage group dynamics, overcome defensiveness, calibrate intensity. See [Facilitation Techniques](#facilitation-techniques).
**Step 4: Synthesize findings** - Use structured argumentation to evaluate critique validity. See [Argumentation Framework](#argumentation-framework).
**Step 5: Build consensus** - Align stakeholders on risk prioritization and mitigations. See [Consensus Building](#consensus-building).
---
## Technique Selection
**Match technique to proposal characteristics:**
| Proposal Type | Complexity | Stakes | Group Size | Best Technique |
|--------------|------------|--------|------------|----------------|
| Security/Architecture | High | High | 3-5 | Attack Trees |
| Strategy/Product | Medium | High | 5-10 | Pre-mortem |
| Policy/Process | Medium | Medium | 8-15 | Tabletop Exercise |
| Crisis Response | High | Critical | 4-8 | Wargaming |
| Feature/Design | Low | Medium | 3-5 | Structured Critique (template.md) |
**Time availability:**
- 1-2 hours: Structured critique (template.md), Pre-mortem
- Half-day: Tabletop exercise, Attack trees
- Full-day: Wargaming, Multi-round simulation
---
## 1. Attack Trees
### What Are Attack Trees?
Systematic enumeration of attack vectors against a system. Start with attacker goal (root), decompose into sub-goals using AND/OR logic.
**Use case:** Security architecture, product launches with abuse potential
### Building Attack Trees
**Process:**
1. Define attacker goal (root node): "Compromise user data"
2. Decompose with AND/OR gates:
- **OR gate:** Attacker succeeds if ANY child succeeds
- **AND gate:** Attacker succeeds only if ALL children succeed
3. Assign properties to each path: Feasibility (1-5), Cost (L/M/H), Detection (1-5)
4. Identify critical paths: High feasibility + low detection + low cost
5. Design mitigations: Prevent (remove vulnerability), Detect (monitoring), Respond (incident plan)
**Example tree:**
```
[Compromise user data]
OR
├─ [Exploit API] → SQL injection / Auth bypass / Rate limit bypass
├─ [Social engineer] → Phish credentials AND Access admin panel
└─ [Physical access] → Breach datacenter AND Extract disk
```
**Template:**
```markdown
## Attack Tree: [Goal]
**Attacker profile:** [Script kiddie / Insider / Nation-state]
**Attack paths:**
1. **[Attack vector]** - Feasibility: [1-5] | Cost: [L/M/H] | Detection: [1-5] | Critical: [Y/N] | Mitigation: [Defense]
2. **[Attack vector]** [Same structure]
**Critical paths:** [Feasibility ≥4, detection ≤2]
**Recommended defenses:** [Prioritized mitigations]
```
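For larger trees, the AND/OR decomposition maps naturally onto a small recursive data structure, which makes it easier to spot the easiest attack path. A hedged Python sketch, assuming the common heuristic that an OR node is as feasible as its easiest child and an AND node only as feasible as its hardest step (node names and scores are illustrative):
```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AttackNode:
    name: str
    gate: str = "LEAF"                 # "AND", "OR", or "LEAF"
    feasibility: Optional[int] = None  # 1-5, set on leaves only
    children: List["AttackNode"] = field(default_factory=list)

def path_feasibility(node: AttackNode) -> int:
    """Aggregate leaf feasibility up the tree.
    OR: attacker picks the easiest child (max); AND: limited by hardest step (min)."""
    if node.gate == "LEAF":
        return node.feasibility
    scores = [path_feasibility(c) for c in node.children]
    return max(scores) if node.gate == "OR" else min(scores)

# Mirrors the example tree above; feasibility values are made up for illustration
root = AttackNode("Compromise user data", "OR", children=[
    AttackNode("Exploit API", "OR", children=[
        AttackNode("SQL injection", feasibility=2),
        AttackNode("Auth bypass", feasibility=3),
    ]),
    AttackNode("Social engineer", "AND", children=[
        AttackNode("Phish credentials", feasibility=4),
        AttackNode("Access admin panel", feasibility=2),
    ]),
])
print(path_feasibility(root))  # -> 3 (Auth bypass is the easiest viable path)
```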
---
## 2. Pre-mortem
### What Is Pre-mortem?
Assume the proposal has failed at some future date, then work backwards to identify the causes. This exploits prospective hindsight: it is easier to imagine the causes of a known failure than to predict unknown ones.
**Use case:** Product launches, strategic decisions, high-stakes initiatives
### Pre-mortem Process (90 min total)
1. **Set the stage (5 min):** "It's [date]. Our proposal failed spectacularly. [Describe worst outcome]"
2. **Individual brainstorming (10 min):** Each person writes 5-10 failure reasons independently
3. **Round-robin sharing (20 min):** Go around room, each shares one reason until all surfaced
4. **Cluster and prioritize (15 min):** Group similar, vote (3 votes/person), identify top 5-7
5. **Risk assessment (20 min):** For each: Severity (1-5), Likelihood (1-5), Early warning signs
6. **Design mitigations (30 min):** Preventative actions for highest-risk modes
**Template:**
```markdown
## Pre-mortem: [Proposal]
**Scenario:** It's [date]. Failed. [Vivid worst outcome]
**Failure modes (by votes):**
1. **[Mode]** (Votes: [X]) - Why: [Root cause] | S: [1-5] L: [1-5] Score: [S×L] | Warnings: [Indicators] | Mitigation: [Action]
2. [Same structure]
**Showstoppers (≥15):** [Must-address]
**Revised plan:** [Changes based on pre-mortem]
```
**Facilitator tips:** Make failure vivid, encourage wild ideas, avoid blame, time-box ruthlessly
---
## 3. Wargaming
### What Is Wargaming?
Multi-party simulation where teams play adversarial roles over multiple rounds. Reveals dynamic effects (competitor responses, escalation, unintended consequences).
**Use case:** Competitive strategy, crisis response, market entry
### Wargaming Structure
**Roles:** Proposer team, Adversary team(s) (competitors, regulators), Control team (adjudicates outcomes)
**Turn sequence per round (35 min):**
1. Planning (15 min): Teams plan moves in secret
2. Execution (5 min): Reveal simultaneously
3. Adjudication (10 min): Control determines outcomes, updates game state
4. Debrief (5 min): Reflect on consequences
**Process:**
1. **Define scenario (30 min):** Scenario, victory conditions per team, constraints
2. **Brief teams (15 min):** Role sheets with incentives, capabilities, constraints
3. **Run 3-5 rounds (45 min each):** Control introduces events to stress-test
4. **Post-game debrief (45 min):** Strategies emerged, vulnerabilities exposed, contingencies needed
**Template:**
```markdown
## Wargame: [Proposal]
**Scenario:** [Environment] | **Teams:** Proposer: [Us] | Adversary 1: [Competitor] | Adversary 2: [Regulator] | Control: [Facilitator]
**Victory conditions:** Proposer: [Goal] | Adversary 1: [Goal] | Adversary 2: [Goal]
**Round 1:** Proposer: [Move] | Adv1: [Response] | Adv2: [Response] | Outcome: [New state]
**Round 2-5:** [Same structure]
**Key insights:** [Unexpected dynamics, blind spots, countermoves]
**Recommendations:** [Mitigations, contingencies]
```
---
## 4. Tabletop Exercises
### What Are Tabletop Exercises?
Structured walkthrough where participants discuss how they'd respond to a scenario. Focuses on coordination, process gaps, and decision-making under stress.
**Use case:** Incident response, crisis management, operational readiness
### Tabletop Process
1. **Design scenario (1 hr prep):** Realistic incident with injects (new info at intervals), decision points
2. **Brief participants (10 min):** Set scene, define roles, clarify it's simulation
3. **Run scenario (90 min):** Present 5-7 injects, discuss responses (10-15 min each)
4. **Debrief (30 min):** What went well? Gaps exposed? Changes needed?
**Example inject sequence:**
- T+0: "Alert fires: unusual DB access" → Who's notified? First action?
- T+15: "10K records accessed" → Who notify (legal, PR)? Communication?
- T+30: "CEO wants briefing, reporter called" → CEO message? PR statement?
**Template:**
```markdown
## Tabletop: [Scenario]
**Objective:** Test [plan/procedure] | **Participants:** [Roles] | **Scenario:** [Incident description]
**Injects:**
**T+0 - [Event]** | Q: [Who responsible? What action?] | Decisions: [Responses] | Gaps: [Unclear/missing]
**T+15 - [Escalation]** [Same structure]
**Debrief:** Strengths: [Worked well] | Gaps: [Process/tool/authority] | Recommendations: [Changes]
```
---
## Facilitation Techniques
### Managing Defensive Responses
| Pattern | Response | Goal |
|---------|----------|------|
| "We already thought of that" | "Great. Walk me through the analysis and mitigation?" | Verify claim, check adequacy |
| "That's not realistic" | "What makes this unlikely?" (Socratic) | Challenge without confrontation |
| "You don't understand context" | "You're right, help me. Can you explain [X]? How does that address [critique]?" | Acknowledge expertise, stay focused |
| Dismissive tone/eye-rolling | "Sensing resistance. The goal is to improve, not attack. What would help?" | Reset tone, reaffirm purpose |
### Calibrating Adversarial Intensity
**Too aggressive:** Team shuts down, hostile | **Too soft:** Superficial critiques, groupthink
**Escalation approach:**
- Round 1: Curious questions ("What if X?")
- Round 2: Direct challenges ("Assumes Y, but what if false?")
- Round 3: Aggressive probing ("How does this survive Z?")
**Adjust to culture:**
- High-trust teams: Aggressive critique immediately
- Defensive teams: Start curious, frame as "helping improve"
**"Yes, and..." technique:** "Yes, solves X, AND creates Y for users Z" (acknowledges value + raises concern)
### Facilitator Tactics
- **Parking lot:** "Important but out-of-scope. Capture for later."
- **Redirect attacks:** "Critique proposal, not people. Rephrase?"
- **Balance airtime:** "Let's hear from [quiet person]."
- **Synthesize:** "Here's what I heard: [3-5 themes]. Accurate?"
- **Strategic silence:** Wait 10+ sec after tough question. Forces deeper thinking.
---
## Argumentation Framework
### Toulmin Model for Evaluating Critiques
**Use case:** Determine if critique is valid or strawman
**Components:** Claim (assertion) + Data (evidence) + Warrant (logical link) + Backing (support for warrant) + Qualifier (certainty) + Rebuttal (conditions where claim fails)
**Example:**
- **Claim:** "Feature will fail, users won't adopt"
- **Data:** "5% beta adoption"
- **Warrant:** "Beta users = target audience, beta predicts production"
- **Backing:** "Past 3 features: beta adoption r=0.89 correlation"
- **Qualifier:** "Likely"
- **Rebuttal:** "Unless we improve onboarding (not in beta)"
### Evaluating Critique Validity
**Strong:** Specific data, logical warrant, backing exists, acknowledges rebuttals
**Weak (strawman):** Vague hypotheticals, illogical warrant, no backing, ignores rebuttals
**Example evaluation:**
"API slow because complex DB queries" | Data: "5+ table joins" ✓ | Warrant: "Multi-joins slow" ✓ | Backing: "Prior 5+ joins = 2s" ✓ | Rebuttal acknowledged? No (caching, indexes) | **Verdict:** Moderate strength, address rebuttal
### Structured Rebuttal
**Proposer response:**
1. **Accept:** Valid, will address → Add to mitigation
2. **Refine:** Partially valid → Clarify conditions
3. **Reject:** Invalid → Provide counter-data + counter-warrant (substantive, not dismissive)
---
## Consensus Building
### Multi-Stakeholder Alignment (65 min)
**Challenge:** Different stakeholders prioritize different risks
**Process:**
1. **Acknowledge perspectives (15 min):** Each states top concern, facilitator captures
2. **Identify shared goals (10 min):** What do all agree on?
3. **Negotiate showstoppers (30 min):** For risks ≥15, discuss: Is this truly showstopper? Minimum mitigation? Vote if needed (stakeholder-weighted scoring)
4. **Accept disagreements (10 min):** Decision-maker breaks tie on non-showstoppers. Document dissent.
### Delphi Method (Asynchronous)
**Use case:** Distributed team, avoid group pressure
**Process:** Round 1 (independent assessments) → Round 2 (share anonymized, experts revise) → Round 3 (share aggregate, final assessments) → Convergence or decision-maker adjudicates
**Advantage:** Eliminates groupthink, HiPPO effect | **Disadvantage:** Slower (days/weeks)
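Convergence between Delphi rounds can be checked numerically rather than by feel. A minimal sketch, assuming experts submit 1-5 scores each round and that a small spread counts as consensus (the 1.0 threshold is an illustrative choice):
```python
from statistics import mean, stdev

def round_summary(scores: list[int], spread_threshold: float = 1.0) -> dict:
    """Anonymized aggregate shared back to experts between Delphi rounds."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {
        "mean": round(mean(scores), 1),
        "spread": round(spread, 2),
        "converged": spread <= spread_threshold,
    }

# Round 1: wide disagreement on a risk's severity; Round 3: converging
print(round_summary([2, 5, 3, 4]))  # {'mean': 3.5, 'spread': 1.29, 'converged': False}
print(round_summary([3, 4, 3, 4]))  # {'mean': 3.5, 'spread': 0.58, 'converged': True}
```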
---
## Advanced Critique Patterns
### Second-Order Effects
**Identify ripple effects:** "If we change this, what happens next? Then what?" (3-5 iterations)
**Example:** Launch referral program → Users invite friends → Invited users show lower engagement (didn't choose the product organically) → Churn ↑, LTV ↓ → Unit economics worsen → Budget cuts
### Inversion
**Ask "How do we guarantee failure?" then check if proposal avoids those modes**
**Example:** New market entry
- Inversion: Wrong product-market fit, underestimate competition, violate regulations, misunderstand culture
- Check: Market research? Regulatory review? Localization?
### Assumption Surfacing
**For each claim: "What must be true for this to work?"**
**Example:** "Feature increases engagement 20%"
- Assumptions: Users want it (validated?), will discover it (discoverability?), works reliably (load tested?), 20% credible (source?)
- Test each. If questionable, critique valid.
---
## Common Pitfalls & Mitigations
| Pitfall | Detection | Mitigation |
|---------|-----------|------------|
| **Analysis paralysis** | Red team drags on for weeks, no decision | Time-box exercise (half-day max). Focus on showstoppers only. |
| **Strawman arguments** | Critiques are unrealistic or extreme | Use Toulmin model to evaluate. Require data and backing. |
| **Groupthink persists** | All critiques are minor, no real challenges | Use adversarial roles explicitly. Pre-mortem or attack trees force critical thinking. |
| **Defensive shutdown** | Team rejects all critiques, hostility | Recalibrate tone. Use "Yes, and..." framing. Reaffirm red team purpose. |
| **HiPPO effect** | Highest-paid person's opinion dominates | Anonymous brainstorming (pre-mortem). Delphi method. |
| **No follow-through** | Great critiques, no mitigations implemented | Assign owners and deadlines to each mitigation. Track in project plan. |
| **Red team as rubber stamp** | Critique is superficial, confirms bias | Choose truly adversarial roles. Bring in external red team if internal team too aligned. |
| **Over-optimization of low-risk items** | Spending time on low-impact risks | Use risk matrix. Only address showstoppers and high-priority. |


@@ -0,0 +1,380 @@
# Deliberation, Debate & Red Teaming Template
## Workflow
Copy this checklist and track your progress:
```
Red Teaming Progress:
- [ ] Step 1: Proposal definition and context
- [ ] Step 2: Adversarial role assignment
- [ ] Step 3: Critique generation
- [ ] Step 4: Risk assessment and prioritization
- [ ] Step 5: Mitigation recommendations
```
**Step 1: Proposal definition** - Define what we're evaluating, stakes, constraints. See [Proposal Definition](#proposal-definition).
**Step 2: Adversarial role assignment** - Choose 3-5 critical perspectives. See [Role Assignment](#role-assignment).
**Step 3: Critique generation** - For each role, generate specific challenges. See [Critique Generation](#critique-generation).
**Step 4: Risk assessment** - Score risks by severity × likelihood. See [Risk Assessment](#risk-assessment).
**Step 5: Mitigation recommendations** - Propose fixes for critical risks. See [Mitigation Recommendations](#mitigation-recommendations).
---
## Proposal Definition
### Input Template
```markdown
## Proposal Under Review
**Title:** [Concise name of proposal]
**Description:** [1-3 sentences explaining what we're proposing to do]
**Goal:** [Why are we doing this? What problem does it solve?]
**Stakeholders:**
- **Proposer:** [Who is advocating for this]
- **Decision maker:** [Who has final authority]
- **Affected parties:** [Who will be impacted]
**Stakes:**
- **If successful:** [Upside, benefits]
- **If fails:** [Downside, worst case]
- **Reversibility:** [Can we undo this? At what cost?]
**Timeline:**
- **Decision deadline:** [When must we decide]
- **Implementation timeline:** [When would this happen]
- **Cost of delay:** [What do we lose by waiting]
**Current confidence:** [Team's current belief this is the right decision, 0-100%]
**Prior analysis:** [What vetting has been done already]
```
---
## Role Assignment
### Role Selection Guide
**Choose 3-5 adversarial roles most likely to expose blind spots for this specific proposal.**
**For product/feature launches:**
- Customer (user friction)
- Operations (reliability, maintenance)
- Competitor (market positioning)
- Legal/Privacy (compliance)
**For technical/architecture decisions:**
- Security (attack vectors)
- Operations (operability, debugging)
- Engineer (technical debt, complexity)
- Long-term Thinker (future costs)
**For strategy/business decisions:**
- Competitor (market response)
- Finance (hidden costs, ROI assumptions)
- Contrarian (challenge core assumptions)
- Long-term Thinker (second-order effects)
**For policy/process changes:**
- Affected User (workflow disruption)
- Operations (enforcement burden)
- Legal/Compliance (regulatory risk)
- Investigative Journalist (PR risk)
### Role Assignment Template
```markdown
## Adversarial Roles
**Role 1: [Role Name]** (e.g., Security / Malicious Actor)
- **Perspective:** [What are this role's incentives and concerns?]
- **Key question:** [What would this role be most worried about?]
**Role 2: [Role Name]** (e.g., Customer)
- **Perspective:** [What does this role care about?]
- **Key question:** [What friction or cost does this create for them?]
**Role 3: [Role Name]** (e.g., Operations)
- **Perspective:** [What operational burden does this create?]
- **Key question:** [What breaks at 2am?]
**Role 4: [Role Name]** (optional)
- **Perspective:**
- **Key question:**
**Role 5: [Role Name]** (optional)
- **Perspective:**
- **Key question:**
```
---
## Critique Generation
### Critique Framework
For each assigned role, answer these questions:
**1. What could go wrong?**
- Failure modes from this perspective
- Edge cases that break the proposal
- Unintended consequences
**2. What assumptions are questionable?**
- Optimistic estimates (timeline, cost, adoption)
- Unvalidated beliefs (market demand, technical feasibility)
- Hidden dependencies
**3. What are we missing?**
- Gaps in analysis
- Stakeholders not considered
- Alternative approaches not evaluated
**4. What happens under stress?**
- How does this fail under load, pressure, or adversarial conditions?
- What cascading failures could occur?
### Critique Template (Per Role)
```markdown
### Critique from [Role Name]
**What could go wrong:**
1. [Specific failure mode]
2. [Edge case that breaks this]
3. [Unintended consequence]
**Questionable assumptions:**
1. [Optimistic estimate: e.g., "assumes 80% adoption but no user testing"]
2. [Unvalidated belief: e.g., "assumes competitors won't respond"]
3. [Hidden dependency: e.g., "requires Team X to deliver by date Y"]
**What we're missing:**
1. [Gap in analysis: e.g., "no security review"]
2. [Unconsidered stakeholder: e.g., "impact on support team not assessed"]
3. [Alternative not evaluated: e.g., "could achieve same goal with lower-risk approach"]
**Stress test scenarios:**
1. [High load: e.g., "What if usage is 10x expected?"]
2. [Adversarial: e.g., "What if competitor launches similar feature first?"]
3. [Cascading failure: e.g., "What if dependency X goes down?"]
**Severity assessment:** [Critical / High / Medium / Low / Trivial]
**Likelihood assessment:** [Very Likely / Likely / Possible / Unlikely / Rare]
```
### Multi-Role Synthesis Template
```markdown
## Critique Summary (All Roles)
**Themes across roles:**
- **Security/Privacy:** [Cross-role security concerns]
- **Operations/Reliability:** [Operational risks raised by multiple roles]
- **Customer Impact:** [User friction points]
- **Financial:** [Cost or ROI concerns]
- **Legal/Compliance:** [Regulatory or liability issues]
- **Technical Feasibility:** [Implementation challenges]
**Showstopper risks** (mentioned by multiple roles or rated Critical):
1. [Risk that appeared in multiple critiques or scored ≥15]
2. [Another critical cross-cutting concern]
```
---
## Risk Assessment
### Risk Scoring Process
**For each risk identified in critiques:**
1. **Assess Severity (1-5):**
- 5 = Critical (catastrophic failure)
- 4 = High (major damage)
- 3 = Medium (moderate impact)
- 2 = Low (minor inconvenience)
- 1 = Trivial (negligible)
2. **Assess Likelihood (1-5):**
- 5 = Very Likely (>80%)
- 4 = Likely (50-80%)
- 3 = Possible (20-50%)
- 2 = Unlikely (5-20%)
- 1 = Rare (<5%)
3. **Calculate Risk Score:** Severity × Likelihood
4. **Categorize:**
- ≥15: Showstopper (must fix)
- 10-14: High Priority (should address)
- 5-9: Monitor (accept with contingency)
- <5: Accept (acknowledge)
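When the register is long, the same scoring and categorization can be applied programmatically before filling in the table below. A hypothetical Python sketch reusing the thresholds above (field names are illustrative):
```python
def build_register(risks: list[dict]) -> list[dict]:
    """Score and sort risks; each dict needs 'description', 'severity', 'likelihood'."""
    for r in risks:
        r["score"] = r["severity"] * r["likelihood"]
        r["category"] = (
            "Showstopper" if r["score"] >= 15
            else "High Priority" if r["score"] >= 10
            else "Monitor" if r["score"] >= 5
            else "Accept"
        )
    return sorted(risks, key=lambda r: r["score"], reverse=True)

register = build_register([
    {"description": "No rollback runbook", "severity": 4, "likelihood": 4},
    {"description": "GDPR consent flow unclear", "severity": 5, "likelihood": 3},
    {"description": "Minor UI glitch on old browsers", "severity": 1, "likelihood": 3},
])
for r in register:
    print(f'{r["score"]:>2} {r["category"]:<13} {r["description"]}')
```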
### Risk Assessment Template
```markdown
## Risk Register
| # | Risk Description | Source Role | Severity | Likelihood | Score | Category | Priority |
|---|-----------------|-------------|----------|------------|-------|----------|----------|
| 1 | [Specific risk] | [Role] | [1-5] | [1-5] | [Score] | [Showstopper/High/Monitor/Accept] | [1/2/3] |
| 2 | [Specific risk] | [Role] | [1-5] | [1-5] | [Score] | [Category] | [Priority] |
| 3 | [Specific risk] | [Role] | [1-5] | [1-5] | [Score] | [Category] | [Priority] |
**Showstoppers (Score ≥ 15):**
- [List all must-fix risks]
**High Priority (Score 10-14):**
- [List should-address risks]
**Monitored (Score 5-9):**
- [List accept-with-contingency risks]
**Accepted (Score < 5):**
- [List acknowledged low-risk items]
```
---
## Mitigation Recommendations
### Mitigation Strategy Selection
For each risk, choose appropriate mitigation:
**Showstoppers:** Must be addressed before proceeding
- **Revise:** Change the proposal to eliminate risk
- **Safeguard:** Add controls to reduce likelihood
- **Reduce scope:** Limit blast radius (gradual rollout, pilot)
- **Delay:** Gather more data or wait for conditions to improve
**High Priority:** Should address or have strong plan
- **Safeguard:** Add monitoring, rollback capability
- **Contingency:** Have plan B ready
- **Reduce scope:** Phase implementation
- **Monitor:** Track closely with trigger for action
**Monitor:** Accept with contingency
- **Monitor:** Set up alerts/metrics
- **Contingency:** Have fix ready but don't implement preemptively
**Accept:** Acknowledge and move on
- **Document:** Note risk for awareness
- **No action:** Proceed as planned
### Mitigation Template
```markdown
## Mitigation Plan
### Showstopper Risks (Must Address)
**Risk 1: [Description]** (Score: [XX], S: [X], L: [X])
- **Strategy:** [Revise / Safeguard / Reduce Scope / Delay]
- **Actions:** [1. Concrete action, 2. Another action, 3. Measurement]
- **Owner:** [Who] | **Deadline:** [When] | **Success:** [How we know it's mitigated]
**Risk 2: [Description]** (Score: [XX])
[Same structure]
### High Priority Risks (Should Address)
**Risk 3: [Description]** (Score: [XX])
- **Strategy:** [Safeguard / Contingency / Reduce Scope / Monitor]
- **Actions:** [1. Action, 2. Monitoring setup]
- **Owner:** [Who] | **Deadline:** [When]
### Monitored Risks
**Risk 4: [Description]** (Score: [XX])
- **Metrics:** [Track] | **Alert:** [Threshold] | **Contingency:** [Action if manifests]
### Accepted Risks
**Risk 5: [Description]** (Score: [XX]) - [Rationale for acceptance]
```
### Revised Proposal Template
```markdown
## Revised Proposal
**Original Proposal:**
[Brief summary of what was originally proposed]
**Key Changes Based on Red Team:**
1. [Change based on showstopper risk X]
2. [Safeguard added for high-priority risk Y]
3. [Scope reduction for risk Z]
**New Implementation Plan:**
- **Phase 1:** [Revised timeline, reduced scope, or pilot approach]
- **Phase 2:** [Gradual expansion if Phase 1 succeeds]
- **Rollback Plan:** [How we undo if something goes wrong]
**Updated Risk Profile:**
- **Showstoppers remaining:** [None / X issues pending resolution]
- **High-priority risks with mitigation:** [List with brief mitigation]
- **Monitoring plan:** [Key metrics and thresholds]
**Recommendation:**
- [ ] **Proceed** - All showstoppers addressed, high-priority risks mitigated
- [ ] **Proceed with caution** - Some high-priority risks remain, monitoring required
- [ ] **Delay** - Showstoppers unresolved, gather more data
- [ ] **Cancel** - Risks too high even with mitigations, pursue alternative
```
---
## Quality Checklist
Before delivering red team analysis, verify:
**Critique quality:**
- [ ] Each role provides specific, realistic critiques (not strawman arguments)
- [ ] Critiques identify failure modes, questionable assumptions, gaps, and stress scenarios
- [ ] At least 3 roles provide independent perspectives
- [ ] Critiques are adversarial but constructive (steelman, not strawman)
**Risk assessment:**
- [ ] All identified risks have severity and likelihood ratings
- [ ] Risk scores calculated correctly (Severity × Likelihood)
- [ ] Showstoppers clearly flagged (score ≥ 15)
- [ ] Risk categories assigned (Showstopper/High/Monitor/Accept)
**Mitigation quality:**
- [ ] Every showstopper has specific mitigation plan
- [ ] High-priority risks either mitigated or explicitly accepted with rationale
- [ ] Mitigations are concrete (not vague like "be careful")
- [ ] Responsibility and deadlines assigned for showstopper mitigations
**Revised proposal:**
- [ ] Changes clearly linked to risks identified in red team
- [ ] Implementation plan updated (phasing, rollback, monitoring)
- [ ] Recommendation made (Proceed / Proceed with caution / Delay / Cancel)
- [ ] Rationale provided for recommendation
---
## Common Pitfalls
| Pitfall | Fix |
|---------|-----|
| **Strawman critiques** (weak arguments) | Make critiques realistic. If real attacker wouldn't make this argument, don't use it. |
| **Missing critical perspectives** | Identify who has most to lose. Include those roles. |
| **No prioritization** (all risks equal) | Use severity × likelihood matrix. Not everything is a showstopper. |
| **Vague mitigations** ("be careful") | Make concrete, measurable, with owners and deadlines. |
| **Red team as rubber stamp** | Genuinely seek to break proposal. If nothing found, critique wasn't adversarial enough. |
| **Defensive response** | Red team's job is to find problems. Fix or accept risk, don't dismiss. |
| **Analysis paralysis** | Time-box red team (1-2 sessions). Focus on showstoppers. |
| **Ignoring culture** | Calibrate tone. Some teams prefer "curious questions" over "aggressive challenges." |