Initial commit
skills/ethics-safety-impact/SKILL.md (274 lines, new file)
@@ -0,0 +1,274 @@

---
name: ethics-safety-impact
description: Use when decisions could affect groups differently and need to anticipate harms/benefits, assess fairness and safety concerns, identify vulnerable populations, propose risk mitigations, define monitoring metrics, or when user mentions ethical review, impact assessment, differential harm, safety analysis, vulnerable groups, bias audit, or responsible AI/tech.
---

# Ethics, Safety & Impact Assessment

## Table of Contents

- [Purpose](#purpose)
- [When to Use](#when-to-use)
- [What Is It?](#what-is-it)
- [Workflow](#workflow)
- [Common Patterns](#common-patterns)
- [Guardrails](#guardrails)
- [Quick Reference](#quick-reference)

## Purpose

Ethics, Safety & Impact Assessment provides a structured framework for identifying potential harms, benefits, and differential impacts before launching features, implementing policies, or making decisions that affect people. This skill guides you through stakeholder identification, harm/benefit analysis, fairness evaluation, risk mitigation design, and ongoing monitoring to ensure responsible and equitable outcomes.

## When to Use

Use this skill when:

- **Product launches**: New features, algorithm changes, UI redesigns that affect user experience or outcomes
- **Policy decisions**: Terms of service updates, content moderation rules, data usage policies, pricing changes
- **Data & AI systems**: Training models, deploying algorithms, using sensitive data, automated decision-making
- **Platform changes**: Recommendation systems, search ranking, feed algorithms, matching/routing logic
- **Access & inclusion**: Features affecting accessibility, vulnerable populations, underrepresented groups, global markets
- **Safety-critical systems**: Health, finance, transportation, security applications where errors have serious consequences
- **High-stakes decisions**: Hiring, lending, admissions, criminal justice, insurance where outcomes significantly affect lives
- **Content & communication**: Moderation policies, fact-checking systems, content ranking, amplification rules

Trigger phrases: "ethical review", "impact assessment", "who might be harmed", "differential impact", "vulnerable populations", "bias audit", "fairness check", "safety analysis", "responsible AI", "unintended consequences"

## What Is It?

Ethics, Safety & Impact Assessment is a proactive evaluation framework that systematically examines:

- **Who** is affected (stakeholder mapping, vulnerable groups)
- **What** could go wrong (harm scenarios, failure modes)
- **Why** it matters (severity, likelihood, distribution of impacts)
- **How** to mitigate (design changes, safeguards, monitoring)
- **When** to escalate (triggers, thresholds, review processes)

**Core ethical principles:**

- **Fairness**: Equal treatment, non-discrimination, equitable outcomes across groups
- **Autonomy**: User choice, informed consent, control over data and experience
- **Beneficence**: Maximize benefits, design for positive impact
- **Non-maleficence**: Minimize harms, "do no harm" as baseline
- **Transparency**: Explain decisions, disclose limitations, build trust
- **Accountability**: Clear ownership, redress mechanisms, audit trails
- **Privacy**: Data protection, confidentiality, purpose limitation
- **Justice**: Equitable distribution of benefits and burdens, address historical inequities

**Quick example:**

**Scenario**: Launching a credit scoring algorithm for loan approvals

**Ethical impact assessment**:

1. **Stakeholders affected**: Loan applicants (diverse demographics), lenders, society (economic mobility)

2. **Potential harms**:
   - **Disparate impact**: Algorithm trained on historical data may perpetuate bias against protected groups (race, gender, age)
   - **Opacity**: Applicants denied loans without explanation, cannot contest the decision
   - **Feedback loops**: Denying loans to disadvantaged groups → lack of credit history → continued denials
   - **Economic harm**: Incorrect denials prevent wealth building, perpetuate poverty

3. **Vulnerable groups**: Racial minorities historically discriminated against in lending, immigrants with thin credit files, young adults, people in poverty

4. **Mitigations**:
   - **Fairness audit**: Test for disparate impact across protected classes, equalized odds
   - **Explainability**: Provide reason codes (top 3 factors), allow appeals
   - **Alternative data**: Include rent and utility payments to expand access
   - **Human review**: Flag edge cases for manual review, override capability
   - **Regular monitoring**: Track approval rates by demographic, quarterly bias audits

5. **Monitoring & escalation**:
   - **Metrics**: Approval rate parity (within 10% across groups), false positive/negative rates, appeal overturn rate
   - **Triggers**: If disparate impact >20%, escalate to ethics committee
   - **Review**: Quarterly fairness audits, annual independent assessment

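To make step 5 concrete, here is a minimal sketch of the parity check; the decision-log format, group labels, and both thresholds are illustrative assumptions, not a prescribed implementation:

```python
# Minimal sketch of the monitoring metrics above (illustrative; field names
# and thresholds are assumptions, not a prescribed implementation).
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> approval rate per group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += bool(ok)
    return {g: approved[g] / totals[g] for g in totals}

def check_parity(rates, parity_band=0.10, escalation_gap=0.20):
    """Flag gaps beyond the 10% parity target; escalate beyond 20%."""
    gap = max(rates.values()) - min(rates.values())
    if gap > escalation_gap:
        return "escalate to ethics committee", gap
    if gap > parity_band:
        return "investigate", gap
    return "within target", gap

decisions = [("group_a", 1), ("group_a", 0), ("group_b", 1), ("group_b", 1)]
rates = approval_rates(decisions)
print(rates, check_parity(rates))
```
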
## Workflow

Copy this checklist and track your progress:

```
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
```

**Step 1: Map stakeholders and identify vulnerable groups**

Identify all affected parties (direct users, indirect users, society). Prioritize the vulnerable populations most at risk. See [resources/template.md](resources/template.md#stakeholder-mapping-template) for the stakeholder analysis framework.

**Step 2: Analyze potential harms and benefits**

Brainstorm what could go wrong (harms) and what value is created (benefits) for each stakeholder group. See [resources/template.md](resources/template.md#harm-benefit-analysis-template) for structured analysis.

**Step 3: Assess fairness and differential impacts**

Evaluate whether outcomes, treatment, or access differ across groups. Check for disparate impact. See [resources/methodology.md](resources/methodology.md#fairness-metrics) for fairness criteria and measurement.

**Step 4: Evaluate severity and likelihood**

Score each harm on severity (1-5) and likelihood (1-5), then prioritize high-risk combinations, as in the sketch below. See [resources/template.md](resources/template.md#risk-matrix-template) for the prioritization framework.

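A minimal sketch of that scoring, using a hypothetical harm list from the credit scoring example; the high/medium/low banding is one common 5×5 convention, not part of the template:

```python
# Sketch of Step 4: score harms on severity and likelihood (1-5) and rank by
# risk = severity x likelihood. The harm list is a hypothetical example.
harms = [
    {"harm": "disparate impact on protected groups", "severity": 5, "likelihood": 3},
    {"harm": "opaque denials with no appeal",        "severity": 4, "likelihood": 4},
    {"harm": "feedback loop entrenching thin files", "severity": 4, "likelihood": 2},
]

for h in harms:
    h["risk"] = h["severity"] * h["likelihood"]
    # Illustrative 5x5 banding: >=15 high (red/orange), 8-14 medium, <8 low
    h["priority"] = "high" if h["risk"] >= 15 else "medium" if h["risk"] >= 8 else "low"

for h in sorted(harms, key=lambda h: h["risk"], reverse=True):
    print(f'{h["risk"]:>2} {h["priority"]:<6} {h["harm"]}')
```
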
**Step 5: Design mitigations and safeguards**

For high-priority harms, propose design changes, policy safeguards, and oversight mechanisms. See [resources/methodology.md](resources/methodology.md#mitigation-strategies) for intervention types.

**Step 6: Define monitoring and escalation protocols**

Set metrics, thresholds, review cadence, and escalation triggers. Validate using [resources/evaluators/rubric_ethics_safety_impact.json](resources/evaluators/rubric_ethics_safety_impact.json). **Minimum standard**: Average score ≥ 3.5.

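The ≥ 3.5 gate can be checked mechanically. A sketch, assuming per-criterion scores are recorded against the rubric's criteria (the scores below are stand-ins):

```python
# Sketch of the Step 6 gate: average self-assessed rubric scores and enforce
# the >= 3.5 minimum. Scores here are hypothetical stand-ins; criterion names
# come from resources/evaluators/rubric_ethics_safety_impact.json.
import json

with open("resources/evaluators/rubric_ethics_safety_impact.json") as f:
    rubric = json.load(f)

scores = {c["name"]: 4 for c in rubric["criteria"]}  # stand-in assessment
average = sum(scores.values()) / len(scores)
print(f"average={average:.2f}", "PASS" if average >= 3.5 else "FAIL: revise assessment")
```
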
## Common Patterns

**Pattern 1: Algorithm Fairness Audit**
- **Stakeholders**: Users receiving algorithmic decisions (hiring, lending, content ranking), protected groups
- **Harms**: Disparate impact (bias against protected classes), feedback loops amplifying inequality, opacity preventing accountability
- **Assessment**: Test for demographic parity, equalized odds, calibration across groups; analyze training data for historical bias
- **Mitigations**: Debiasing techniques, fairness constraints, explainability, human review for edge cases, regular audits
- **Monitoring**: Disparate impact ratio, false positive/negative rates by group, user appeals and overturn rates

**Pattern 2: Data Privacy & Consent**
- **Stakeholders**: Data subjects (users whose data is collected), vulnerable groups (children, marginalized communities)
- **Harms**: Privacy violations, surveillance, data breaches, lack of informed consent, secondary use without permission, re-identification risk
- **Assessment**: Map data flows (collection → storage → use → sharing), identify sensitive attributes (PII, health, location), consent adequacy
- **Mitigations**: Data minimization (collect only necessary), anonymization/differential privacy, granular consent, user data controls (export, delete), encryption
- **Monitoring**: Breach incidents, data access logs, consent withdrawal rates, user data requests (GDPR, CCPA)

**Pattern 3: Content Moderation & Free Expression**
- **Stakeholders**: Content creators, viewers, vulnerable groups (targets of harassment), society (information integrity)
- **Harms**: Over-moderation (silencing legitimate speech, especially marginalized voices), under-moderation (allowing harm, harassment, misinformation), inconsistent enforcement
- **Assessment**: Analyze moderation error rates (false positives/negatives), differential enforcement across groups, cultural context sensitivity
- **Mitigations**: Clear policies with examples, appeals process, human review, diverse moderators, cultural context training, transparency reports
- **Monitoring**: Moderation volume and error rates by category, appeal overturn rates, disparate enforcement across languages/regions

**Pattern 4: Accessibility & Inclusive Design**
- **Stakeholders**: Users with disabilities (visual, auditory, motor, cognitive), elderly, low-literacy, low-bandwidth users
- **Harms**: Exclusion (cannot use product), degraded experience, safety risks (cannot access critical features), digital divide
- **Assessment**: WCAG compliance audit, assistive technology testing, user research with diverse abilities, cross-cultural usability
- **Mitigations**: Accessible design (WCAG AA/AAA), alt text, keyboard navigation, screen reader support, low-bandwidth mode, multi-language, plain language
- **Monitoring**: Accessibility test coverage, user feedback from disability communities, task completion rates across abilities

**Pattern 5: Safety-Critical Systems**
- **Stakeholders**: End users (patients, drivers, operators), vulnerable groups (children, elderly, compromised health), public safety
- **Harms**: Physical harm (injury, death), psychological harm (trauma), property damage, cascade failures affecting many
- **Assessment**: Failure mode analysis (FMEA), fault tree analysis, worst-case scenarios, edge cases that break assumptions
- **Mitigations**: Redundancy, fail-safes, human oversight, rigorous testing (stress, chaos, adversarial), incident response plans, staged rollouts
- **Monitoring**: Error rates, near-miss incidents, safety metrics (accidents, adverse events), user-reported issues, compliance audits

## Guardrails

**Critical requirements:**

1. **Identify vulnerable groups explicitly**: Not all stakeholders are equally at risk. Prioritize: children, elderly, people with disabilities, marginalized/discriminated groups, low-income, low-literacy, geographically isolated, politically targeted. If none are identified, you're probably missing them.

2. **Consider second-order and long-term effects**: First-order obvious harms are just the start. Look for: feedback loops (harm → disadvantage → more harm), normalization (practice becomes standard), precedent (enables worse future behavior), accumulation (small harms compound over time). Ask "what happens next?"

3. **Assess differential impact, not just the average**: A feature may help the average user but harm specific groups. Metrics: disparate impact (outcome differences across groups >20% = red flag), intersectionality (combinations of identities may face unique harms), distributive justice (who gets the benefits vs. the burdens?).

4. **Design mitigations before launch, not after harm**: Reactive fixes are too late for those already harmed. Proactive: build safeguards into the design, test with diverse users, stage rollouts with monitoring, add kill switches, pre-commit to audits. "Move fast and break things" is unethical for systems affecting people's lives.

5. **Provide transparency and recourse**: People affected have a right to know and contest. Minimum: explain decisions (what factors, why this outcome), an appeal mechanism (human review, overturn if wrong), redress (compensate harm), audit trails (investigate complaints). Opacity is often a sign of hidden bias or risk.

6. **Monitor outcomes, not just intentions**: Good intentions don't prevent harm. Measure actual impacts: outcome disparities by group, user-reported harms, error rates and their distribution, unintended consequences. Set thresholds that trigger review or shutdown.

7. **Establish clear accountability and escalation**: Assign ownership. Define: Who reviews ethics risks before launch? Who monitors post-launch? What triggers escalation? Who can halt harmful features? Document decisions and rationale for later review.

8. **Respect autonomy and consent**: Users deserve: informed choice (understanding what they're agreeing to, in plain language), meaningful alternatives (consent that is not coerced), control (opt out, delete data, configure settings), purpose limitation (data used only for the stated purpose). Children and vulnerable groups need extra protections.

**Common pitfalls:**

- ❌ **Assuming "we treat everyone the same" = fairness**: Equal treatment of unequal groups perpetuates inequality. Fairness often requires differential treatment.
- ❌ **Optimization without constraints**: Maximizing engagement/revenue unconstrained leads to amplifying outrage, addiction, and polarization. Set ethical boundaries.
- ❌ **Moving fast and apologizing later**: For safety/ethics, prevention > apology. Harms to vulnerable groups are not acceptable experiments.
- ❌ **Privacy theater**: Requiring consent without explaining risks, or making consent mandatory for service, is not meaningful consent.
- ❌ **Sampling bias in testing**: Testing only on employees (young, educated, English-speaking) misses how diverse users experience harm.
- ❌ **Ethics washing**: Performative statements without material changes. Impact assessments must change decisions, not just document them.

## Quick Reference

**Key resources:**

- **[resources/template.md](resources/template.md)**: Stakeholder mapping, harm/benefit analysis, risk matrix, mitigation planning, monitoring framework
- **[resources/methodology.md](resources/methodology.md)**: Fairness metrics, privacy analysis, safety assessment, bias detection, participatory design
- **[resources/evaluators/rubric_ethics_safety_impact.json](resources/evaluators/rubric_ethics_safety_impact.json)**: Quality criteria for stakeholder analysis, harm identification, mitigation design, monitoring

**Stakeholder Priorities:**

High-risk groups to always consider:
- Children (<18, especially <13)
- People with disabilities (visual, auditory, motor, cognitive)
- Racial/ethnic minorities, especially historically discriminated-against groups
- Low-income, unhoused, financially precarious people
- LGBTQ+ people, especially in hostile jurisdictions
- Elderly people (>65), especially those with limited digital skills
- Non-English speakers, low-literacy users
- Political dissidents, activists, journalists in repressive contexts
- Refugees, immigrants, undocumented people
- People with mental illness or cognitive impairments

**Harm Categories:**

- **Physical**: Injury, death, health deterioration
- **Psychological**: Trauma, stress, anxiety, depression, addiction
- **Economic**: Lost income, debt, poverty, exclusion from opportunity
- **Social**: Discrimination, harassment, ostracism, loss of relationships
- **Autonomy**: Coercion, manipulation, loss of control, dignity violation
- **Privacy**: Surveillance, exposure, data breach, re-identification
- **Reputational**: Stigma, defamation, loss of standing
- **Epistemic**: Misinformation, loss of knowledge access, filter bubbles
- **Political**: Disenfranchisement, censorship, targeted repression

**Fairness Definitions** (choose the appropriate one for the context):

- **Demographic parity**: Outcome rates equal across groups (e.g., 40% approval rate for all)
- **Equalized odds**: False positive and false negative rates equal across groups
- **Equal opportunity**: True positive rate equal across groups (equal access to benefit)
- **Calibration**: Predicted probabilities match observed frequencies for all groups
- **Individual fairness**: Similar individuals treated similarly (Lipschitz condition)
- **Counterfactual fairness**: Outcome the same if the sensitive attribute (race, gender) were different

**Mitigation Strategies:**

- **Prevent**: Design change eliminates harm (e.g., don't collect sensitive data)
- **Reduce**: Decrease likelihood or severity (e.g., rate limiting, friction for risky actions)
- **Detect**: Monitor and alert when harm occurs (e.g., bias dashboard, anomaly detection)
- **Respond**: Process to address harm when found (e.g., appeals, human review, compensation)
- **Safeguard**: Redundancy, fail-safes, circuit breakers for critical failures
- **Transparency**: Explain, educate, build understanding and trust
- **Empower**: Give users control, choice, and the ability to opt out or customize

**Monitoring Metrics:**

- **Outcome disparities**: Measure by protected class (approval rates, error rates, treatment quality)
- **Error distribution**: False positives/negatives; who bears the burden?
- **User complaints**: Volume, categories, resolution rates, disparities
- **Engagement/retention**: Differences across groups (are some excluded?)
- **Safety incidents**: Volume, severity, affected populations
- **Consent/opt-outs**: How many decline? Demographics of decliners?

**Escalation Triggers:**

- Disparate impact >20% without justification
- Safety incidents causing serious harm (injury, death)
- Vulnerable group disproportionately affected (>2× harm rate)
- User complaints spike (>2× baseline)
- Press/regulator attention
- Internal ethics concerns raised

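A sketch that ties these triggers together as a single check; the metric names and how they are recorded are assumptions about a team's logging, not part of the skill:

```python
# Sketch combining the escalation triggers above; metric names and the
# baseline source are assumptions about how a team might record them.
def escalation_needed(m):
    triggers = [
        m["disparate_impact_gap"] > 0.20,                         # >20% unjustified disparity
        m["serious_safety_incidents"] > 0,                        # injury/death
        m["vulnerable_harm_rate"] > 2 * m["overall_harm_rate"],   # >2x harm rate
        m["complaints"] > 2 * m["complaints_baseline"],           # >2x baseline spike
    ]
    return any(triggers)

metrics = {"disparate_impact_gap": 0.08, "serious_safety_incidents": 0,
           "vulnerable_harm_rate": 0.03, "overall_harm_rate": 0.02,
           "complaints": 120, "complaints_baseline": 100}
print("escalate" if escalation_needed(metrics) else "continue monitoring")
```
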
**When to escalate beyond this skill:**

- Legal compliance required (GDPR, ADA, Civil Rights Act, industry regulations)
- Life-or-death safety-critical system (medical, transportation)
- Children or vulnerable populations are the primary users
- High controversy or political salience
- Novel ethical terrain (new technology, no precedent)

→ Consult: legal counsel, ethics board, domain experts, affected communities, regulators

**Inputs required:**

- **Feature or decision** (what is being proposed? what changes?)
- **Affected groups** (who is impacted, directly and indirectly?)
- **Context** (what problem does this solve? why now?)

**Outputs produced:**

- `ethics-safety-impact.md`: Stakeholder analysis, harm/benefit assessment, fairness evaluation, risk prioritization, mitigation plan, monitoring framework, escalation protocol

skills/ethics-safety-impact/resources/evaluators/rubric_ethics_safety_impact.json (255 lines, new file)
@@ -0,0 +1,255 @@

{
  "criteria": [
    {
      "name": "Stakeholder Identification",
      "1": "Only obvious stakeholders identified, vulnerable groups missing, no power/voice analysis",
      "3": "Primary stakeholders identified, some vulnerable groups noted, basic power analysis",
      "5": "Comprehensive stakeholder map (primary, secondary, societal), vulnerable groups prioritized with specific risk factors, power/voice dynamics analyzed, intersectionality considered"
    },
    {
      "name": "Harm Analysis Depth",
      "1": "Surface-level harms only, no mechanism analysis, severity/likelihood guessed",
      "3": "Multiple harms identified with mechanisms, severity/likelihood scored, some second-order effects",
      "5": "Comprehensive harm catalog across types (physical, psychological, economic, social, autonomy, privacy), mechanisms explained, severity/likelihood justified, second-order effects (feedback loops, accumulation, normalization, precedent) analyzed"
    },
    {
      "name": "Benefit Analysis Balance",
      "1": "Only harms or only benefits listed, no distribution analysis, rose-colored or overly negative",
      "3": "Both harms and benefits identified, some distribution analysis (who gets what)",
      "5": "Balanced harm/benefit analysis, distribution clearly specified (universal, subset, vulnerable groups), magnitude and timeline assessed, tradeoffs acknowledged"
    },
    {
      "name": "Fairness Assessment",
      "1": "No fairness analysis, assumes equal treatment = fairness, no metrics",
      "3": "Outcome disparities measured for some groups, fairness concern noted, basic mitigation proposed",
      "5": "Rigorous fairness analysis (outcome, treatment, access fairness), quantitative metrics (disparate impact ratio, error rates by group), intersectional analysis, appropriate fairness definition chosen for context"
    },
    {
      "name": "Risk Prioritization",
      "1": "No prioritization or arbitrary, all harms treated equally, no severity/likelihood scoring",
      "3": "Risk matrix used, severity and likelihood scored, high-risk harms identified",
      "5": "Rigorous risk prioritization (5x5 matrix), severity/likelihood justified with evidence/precedent, color-coded priorities, focus on red/orange (high-risk) harms, considers vulnerable group concentration"
    },
    {
      "name": "Mitigation Design",
      "1": "No mitigations or vague promises, reactive only, no ownership or timeline",
      "3": "Mitigations proposed for key harms, some specificity, owners/timelines mentioned",
      "5": "Specific mitigations for all high-priority harms, type specified (prevent/reduce/detect/respond/safeguard), effectiveness assessed, cost/tradeoffs acknowledged, owners assigned, timelines set, residual risk calculated"
    },
    {
      "name": "Monitoring & Metrics",
      "1": "No monitoring plan, intentions stated without measurement, no metrics defined",
      "3": "Some metrics defined, monitoring frequency mentioned, thresholds set",
      "5": "Comprehensive monitoring framework (outcome metrics disaggregated by group, leading indicators, qualitative feedback), specific thresholds for concern, escalation protocol (yellow/orange/red alerts), review cadence set, accountability clear"
    },
    {
      "name": "Transparency & Recourse",
      "1": "No mechanisms for affected parties to contest or understand decisions, opacity accepted",
      "3": "Some explainability mentioned, appeals process exists, basic transparency",
      "5": "Clear transparency (decisions explained in plain language, limitations disclosed), robust recourse (appeals with human review, overturn process, redress for harm), audit trails for investigation, accessible to affected groups"
    },
    {
      "name": "Stakeholder Participation",
      "1": "No involvement of affected groups, internal team only, no external input",
      "3": "Some user research or feedback collection, affected groups consulted",
      "5": "Meaningful participation of vulnerable/affected groups (advisory boards, co-design, participatory audits), diverse team conducting assessment, external review (ethics board, independent audit), ongoing consultation not one-time"
    },
    {
      "name": "Proportionality & Precaution",
      "1": "Assumes go-ahead, burden on critics to prove harm, move fast and apologize later",
      "3": "Some precaution for high-risk features, staged rollout considered, mitigation before launch",
      "5": "Precautionary principle applied (mitigate before launch for irreversible harms), proportional response (higher stakes = more safeguards), staged rollout with kill switches, burden on proponents to demonstrate safety, continuous monitoring post-launch"
    }
  ],
"guidance_by_type": {
|
||||
"Algorithm Fairness Audit": {
|
||||
"target_score": 4.2,
|
||||
"key_requirements": [
|
||||
"Fairness Assessment (score ≥5): Quantitative metrics (disparate impact, equalized odds, calibration), disaggregated by protected groups",
|
||||
"Harm Analysis: Disparate impact, feedback loops, opacity, inability to contest",
|
||||
"Mitigation Design: Debiasing techniques, fairness constraints, explainability, human review for edge cases",
|
||||
"Monitoring: Bias dashboard with real-time metrics by group, drift detection, periodic audits"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Assuming colorblindness = fairness (need to collect/analyze demographic data)",
|
||||
"Only checking one fairness metric (tradeoffs exist, choose appropriate for context)",
|
||||
"Not testing for intersectionality (race × gender unique harms)"
|
||||
]
|
||||
},
|
||||
"Data Privacy & Consent": {
|
||||
"target_score": 4.0,
|
||||
"key_requirements": [
|
||||
"Stakeholder Identification: Data subjects, vulnerable groups (children, marginalized)",
|
||||
"Harm Analysis: Privacy violations, surveillance, breaches, secondary use, re-identification",
|
||||
"Mitigation Design: Data minimization, anonymization/differential privacy, granular consent, encryption, user controls",
|
||||
"Monitoring: Breach incidents, access logs, consent withdrawals, data requests (GDPR)"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Privacy theater (consent mandatory for service = not meaningful choice)",
|
||||
"De-identification without considering linkage attacks",
|
||||
"Not providing genuine user controls (export, delete)"
|
||||
]
|
||||
},
|
||||
"Content Moderation & Free Expression": {
|
||||
"target_score": 3.9,
|
||||
"key_requirements": [
|
||||
"Stakeholder Identification: Creators, viewers, vulnerable groups (harassment targets), society (information integrity)",
|
||||
"Harm Analysis: Over-moderation (silencing marginalized voices), under-moderation (harassment, misinfo), inconsistent enforcement",
|
||||
"Fairness Assessment: Error rates by group, differential enforcement across languages/regions, cultural context",
|
||||
"Mitigation: Clear policies, appeals with human review, diverse moderators, transparency reports"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Optimizing for engagement without ethical constraints (amplifies outrage)",
|
||||
"Not accounting for cultural context (policies designed for US applied globally)",
|
||||
"Transparency without accountability (reports without action)"
|
||||
]
|
||||
},
|
||||
"Accessibility & Inclusive Design": {
|
||||
"target_score": 4.1,
|
||||
"key_requirements": [
|
||||
"Stakeholder Identification: People with disabilities (visual, auditory, motor, cognitive), elderly, low-literacy, low-bandwidth",
|
||||
"Harm Analysis: Exclusion, degraded experience, safety risks (cannot access critical features)",
|
||||
"Mitigation: WCAG AA/AAA compliance, assistive technology testing, keyboard navigation, alt text, plain language, multi-language",
|
||||
"Monitoring: Accessibility test coverage, feedback from disability communities, task completion rates across abilities"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Accessibility as afterthought (retrofit harder than design-in)",
|
||||
"Testing only with non-disabled users or automated tools (miss real user experience)",
|
||||
"Meeting minimum standards without usability (technically compliant but unusable)"
|
||||
]
|
||||
},
|
||||
"Safety-Critical Systems": {
|
||||
"target_score": 4.3,
|
||||
"key_requirements": [
|
||||
"Harm Analysis: Physical harm (injury, death), psychological trauma, property damage, cascade failures",
|
||||
"Risk Prioritization: FMEA or Fault Tree Analysis, worst-case scenario planning, single points of failure identified",
|
||||
"Mitigation: Redundancy, fail-safes, human oversight, rigorous testing (stress, chaos, adversarial), incident response",
|
||||
"Monitoring: Error rates, near-miss incidents, safety metrics (adverse events), compliance audits, real-time alerts"
|
||||
],
|
||||
"common_pitfalls": [
|
||||
"Underestimating tail risks (low probability high impact events dismissed)",
|
||||
"Assuming technical safety alone (ignoring human factors, socio-technical risks)",
|
||||
"No graceful degradation (system fails completely rather than degraded mode)"
|
||||
]
|
||||
}
|
||||
},
|
||||
"guidance_by_complexity": {
|
||||
"Simple/Low-Risk": {
|
||||
"target_score": 3.5,
|
||||
"description": "Limited scope, low stakes, reversible, small user base, no vulnerable groups primary users",
|
||||
"key_requirements": [
|
||||
"Stakeholder Identification (≥3): Primary stakeholders clear, consider if any vulnerable groups affected",
|
||||
"Harm Analysis (≥3): Key harms identified with mechanisms, severity/likelihood scored",
|
||||
"Mitigation (≥3): Mitigations for high-risk harms, owners assigned",
|
||||
"Monitoring (≥3): Basic metrics, thresholds, review schedule"
|
||||
],
|
||||
"time_estimate": "4-8 hours",
|
||||
"examples": [
|
||||
"UI redesign for internal tool (low external impact)",
|
||||
"Feature flag for optional enhancement (user opt-in)",
|
||||
"Non-sensitive data analytics (no PII)"
|
||||
]
|
||||
},
|
||||
"Moderate/Medium-Risk": {
|
||||
"target_score": 4.0,
|
||||
"description": "Broader scope, moderate stakes, affects diverse users, some vulnerable groups, decisions partially reversible",
|
||||
"key_requirements": [
|
||||
"Comprehensive stakeholder map with vulnerable group prioritization",
|
||||
"Harm/benefit analysis across types, second-order effects considered",
|
||||
"Fairness assessment if algorithmic or differential impact likely",
|
||||
"Risk prioritization with justification, focus on red/orange harms",
|
||||
"Specific mitigations with effectiveness/tradeoffs, residual risk assessed",
|
||||
"Monitoring with disaggregated metrics, escalation protocol, staged rollout"
|
||||
],
|
||||
"time_estimate": "12-20 hours, stakeholder consultation",
|
||||
"examples": [
|
||||
"New user-facing feature with personalization",
|
||||
"Policy change affecting large user base",
|
||||
"Data collection expansion with privacy implications"
|
||||
]
|
||||
},
|
||||
"Complex/High-Risk": {
|
||||
"target_score": 4.3,
|
||||
"description": "System-level impact, high stakes, irreversible harm possible, vulnerable groups primary, algorithmic/high-sensitivity decisions",
|
||||
"key_requirements": [
|
||||
"Deep stakeholder analysis with intersectionality, power dynamics, meaningful participation",
|
||||
"Comprehensive harm analysis (all types), second-order and long-term effects, feedback loops",
|
||||
"Rigorous fairness assessment with quantitative metrics, appropriate fairness definitions",
|
||||
"FMEA or Fault Tree Analysis for safety-critical, worst-case scenarios",
|
||||
"Prevent/reduce mitigations (not just detect/respond), redundancy, fail-safes, kill switches",
|
||||
"Real-time monitoring, bias dashboards, participatory audits, external review",
|
||||
"Precautionary principle (prove safety before launch), staged rollout, continuous oversight"
|
||||
],
|
||||
"time_estimate": "40-80 hours, ethics board review, external audit",
|
||||
"examples": [
|
||||
"Algorithmic hiring/lending/admissions decisions",
|
||||
"Medical AI diagnosis or treatment recommendations",
|
||||
"Content moderation at scale affecting speech",
|
||||
"Surveillance or sensitive data processing",
|
||||
"Features targeting children or vulnerable populations"
|
||||
]
|
||||
}
|
||||
},
|
||||
"common_failure_modes": [
|
||||
{
|
||||
"failure": "Missing vulnerable groups",
|
||||
"symptom": "Assessment claims 'no vulnerable groups affected' or only lists obvious majority stakeholders",
|
||||
"detection": "Checklist vulnerable categories (children, elderly, disabled, racial minorities, low-income, LGBTQ+, etc.) - if none apply, likely oversight",
|
||||
"fix": "Explicitly consider each vulnerable category, intersectionality, indirect effects. If truly none affected, document reasoning."
|
||||
},
|
||||
{
|
||||
"failure": "Assuming equal treatment = fairness",
|
||||
"symptom": "'We treat everyone the same' stated as fairness defense, no disparate impact analysis, colorblind approach",
|
||||
"detection": "No quantitative fairness metrics, no disaggregation by protected group, claims of neutrality without evidence",
|
||||
"fix": "Collect demographic data (with consent), measure outcomes by group, assess disparate impact. Equal treatment of unequal groups can perpetuate inequality."
|
||||
},
|
||||
{
|
||||
"failure": "Reactive mitigation only",
|
||||
"symptom": "Mitigations are appeals/redress after harm, no prevention, 'we'll fix it if problems arise', move fast and break things",
|
||||
"detection": "No design changes to prevent harm, only detection/response mechanisms, no staged rollout or testing with affected groups",
|
||||
"fix": "Prioritize prevent/reduce mitigations, build safeguards into design, test with diverse users before launch, staged rollout with monitoring, kill switches."
|
||||
},
|
||||
{
|
||||
"failure": "No monitoring or vague metrics",
|
||||
"symptom": "Monitoring section says 'we will track metrics' without specifying which, or 'user feedback' without thresholds",
|
||||
"detection": "No specific metrics named, no thresholds for concern, no disaggregation by group, no escalation triggers",
|
||||
"fix": "Define precise metrics (what, how measured, from what data), baseline and target values, thresholds that trigger action, disaggregate by protected groups, assign monitoring owner."
|
||||
},
|
||||
{
|
||||
"failure": "Ignoring second-order effects",
|
||||
"symptom": "Only immediate/obvious harms listed, no consideration of feedback loops, normalization, precedent, accumulation",
|
||||
"detection": "Ask 'What happens next? If this harms Group X, does that create conditions for more harm? Does this normalize a practice? Enable future worse behavior?'",
|
||||
"fix": "Explicitly analyze: Feedback loops (harm → disadvantage → more harm), Accumulation (small harms compound), Normalization (practice becomes standard), Precedent (what does this enable?)"
|
||||
},
|
||||
{
|
||||
"failure": "No transparency or recourse",
|
||||
"symptom": "Decisions not explained to affected parties, no appeals process, opacity justified as 'proprietary' or 'too complex'",
|
||||
"detection": "Assessment doesn't mention explainability, appeals, audit trails, or dismisses as infeasible",
|
||||
"fix": "Build in transparency (explain decisions in plain language, disclose limitations), appeals with human review, audit trails for investigation. Opacity often masks bias or risk."
|
||||
},
|
||||
{
|
||||
"failure": "Sampling bias in testing",
|
||||
"symptom": "Testing only with employees, privileged users, English speakers; diverse users not represented",
|
||||
"detection": "Test group demographics described as 'internal team', 'beta users' without diversity analysis",
|
||||
"fix": "Recruit testers from affected populations, especially vulnerable groups most at risk. Compensate for their time. Test across devices, languages, abilities, contexts."
|
||||
},
|
||||
{
|
||||
"failure": "False precision in risk scores",
|
||||
"symptom": "Severity and likelihood scored without justification, numbers seem arbitrary, no evidence or precedent cited",
|
||||
"detection": "Risk scores provided but no explanation why 'Severity=4' vs 'Severity=3', no reference to similar incidents",
|
||||
"fix": "Ground severity/likelihood in evidence: Historical incidents, expert judgment, user research, industry benchmarks. If uncertain, use ranges. Document reasoning."
|
||||
},
|
||||
{
|
||||
"failure": "Privacy-fairness tradeoff ignored",
|
||||
"symptom": "Claims 'we don't collect race/gender to protect privacy' but also no fairness audit, or collects data but no strong protections",
|
||||
"detection": "Either no demographic data AND no fairness analysis, OR demographic data collected without access controls/purpose limitation",
|
||||
"fix": "Balance: Collect minimal demographic data necessary for fairness auditing (with consent, strong access controls, aggregate-only reporting, differential privacy). Can't audit bias without data."
|
||||
},
|
||||
{
|
||||
"failure": "One-time assessment, no updates",
|
||||
"symptom": "Assessment completed at launch, no plan for ongoing monitoring, assumes static system",
|
||||
"detection": "No review schedule, no drift detection, no process for updating assessment as system evolves",
|
||||
"fix": "Continuous monitoring (daily/weekly/monthly/quarterly depending on risk), scenario validation (are harms emerging as predicted?), update assessment when system changes, feedback loop to strategy."
|
||||
}
|
||||
]
|
||||
}
|
||||
skills/ethics-safety-impact/resources/methodology.md (416 lines, new file)
@@ -0,0 +1,416 @@

# Ethics, Safety & Impact Assessment Methodology

Advanced techniques for fairness metrics, privacy analysis, safety assessment, bias detection, and participatory design.

## Workflow

```
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
```

**Step 1: Map stakeholders and identify vulnerable groups**

Apply stakeholder analysis and vulnerability assessment frameworks.

**Step 2: Analyze potential harms and benefits**

Use harm taxonomies and benefit frameworks to systematically catalog impacts.

**Step 3: Assess fairness and differential impacts**

Apply [1. Fairness Metrics](#1-fairness-metrics) to measure group disparities.

**Step 4: Evaluate severity and likelihood**

Use [2. Safety Assessment Methods](#2-safety-assessment-methods) for safety-critical systems.

**Step 5: Design mitigations and safeguards**

Apply [3. Mitigation Strategies](#3-mitigation-strategies) and [4. Privacy-Preserving Techniques](#4-privacy-preserving-techniques).

**Step 6: Define monitoring and escalation protocols**

Implement [5. Bias Detection in Deployment](#5-bias-detection-in-deployment) and participatory oversight.

---

## 1. Fairness Metrics

Mathematical definitions of fairness for algorithmic systems.

### Group Fairness Metrics

**Demographic Parity** (Statistical Parity, Independence)
- **Definition**: Positive outcome rate equal across groups
- **Formula**: P(Ŷ=1 | A=a) = P(Ŷ=1 | A=b) for all groups a, b
- **When to use**: When equal representation in positive outcomes is the goal (admissions, hiring pipelines)
- **Limitations**: Ignores base rates, may require different treatment to achieve equal outcomes
- **Example**: 40% approval rate for all racial groups

**Equalized Odds** (Error Rate Balance)
- **Definition**: False positive and false negative rates equal across groups
- **Formula**: P(Ŷ=1 | Y=y, A=a) = P(Ŷ=1 | Y=y, A=b) for all y, a, b
- **When to use**: When fairness means equal error rates (lending, criminal justice, medical diagnosis)
- **Strengths**: Accounts for true outcomes, balances the burden of errors
- **Example**: 5% false positive rate and 10% false negative rate for all groups

**Equal Opportunity** (True Positive Rate Parity)
- **Definition**: True positive rate equal across groups (among qualified individuals)
- **Formula**: P(Ŷ=1 | Y=1, A=a) = P(Ŷ=1 | Y=1, A=b)
- **When to use**: When access to a benefit/opportunity is the key concern (scholarships, job offers)
- **Strengths**: Ensures qualified members of all groups have an equal chance
- **Example**: 80% of qualified applicants from each group receive offers

**Calibration** (Test Fairness)
- **Definition**: Predicted probabilities match observed frequencies for all groups
- **Formula**: P(Y=1 | Ŷ=p, A=a) = P(Y=1 | Ŷ=p, A=b) = p
- **When to use**: When probability scores are used for decision-making (risk scores, credit scores)
- **Strengths**: Predictions are well-calibrated across groups
- **Example**: Among all applicants scored 70%, 70% actually repay the loan in each group

**Disparate Impact** (80% Rule, Four-Fifths Rule)
- **Definition**: Selection rate for the protected group ≥ 80% of the selection rate for the reference group
- **Formula**: P(Ŷ=1 | A=protected) / P(Ŷ=1 | A=reference) ≥ 0.8
- **When to use**: Legal compliance (EEOC guidelines for hiring, lending)
- **Regulatory threshold**: <0.8 triggers investigation, <0.5 strong evidence of discrimination
- **Example**: If 50% of white applicants are hired, ≥40% of Black applicants should be hired

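The group metrics above can be computed directly from labeled predictions. A minimal sketch on synthetic data (the helper names are illustrative):

```python
# Sketch computing the group metrics above from labeled predictions.
# Data is synthetic; y=1 means "would repay", yhat=1 means "approved".
def rate(values):
    return sum(values) / len(values) if values else float("nan")

def group_metrics(records):
    """records: list of (group, y_true, y_pred) -> per-group parity/error rates."""
    out = {}
    for g in {r[0] for r in records}:
        rows = [(y, p) for grp, y, p in records if grp == g]
        out[g] = {
            "positive_rate": rate([p for _, p in rows]),  # demographic parity
            "tpr": rate([p for y, p in rows if y == 1]),  # equal opportunity
            "fpr": rate([p for y, p in rows if y == 0]),  # equalized odds (with tpr)
        }
    return out

def disparate_impact(metrics, protected, reference):
    """Four-fifths rule: the selection-rate ratio should be >= 0.8."""
    return metrics[protected]["positive_rate"] / metrics[reference]["positive_rate"]

records = [("a", 1, 1), ("a", 0, 0), ("a", 1, 1), ("a", 0, 1),
           ("b", 1, 1), ("b", 0, 0), ("b", 1, 0), ("b", 0, 0)]
m = group_metrics(records)
print(m)
print("DI ratio:", disparate_impact(m, protected="b", reference="a"))
```
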
### Individual Fairness Metrics

**Similar Individuals Treated Similarly** (Lipschitz Fairness)
- **Definition**: Similar inputs receive similar outputs, regardless of protected attributes
- **Formula**: d(f(x), f(x')) ≤ L · d(x, x'), where d is a distance metric and L is the Lipschitz constant
- **When to use**: When individual treatment should depend on relevant factors only
- **Challenge**: Defining "similarity" in a fair way (which features are relevant?)
- **Example**: Two loan applicants with the same income and credit history get similar rates regardless of race

**Counterfactual Fairness**
- **Definition**: Outcome would be the same if the protected attribute were different (causal fairness)
- **Formula**: P(Ŷ_{A←a} = y | X=x, A=a) = P(Ŷ_{A←a′} = y | X=x, A=a) for all y, a, a′
- **When to use**: When causal reasoning is appropriate and interventions can be modeled
- **Strengths**: Captures the intuition "would the outcome change if only race/gender differed?"
- **Example**: An applicant's loan decision is the same whether they're coded male or female

### Fairness Tradeoffs

**Impossibility results**: Cannot satisfy all fairness definitions simultaneously (except in trivial cases)

Key tradeoffs:
- **Demographic parity vs. calibration**: If base rates differ, cannot have both (Chouldechova 2017)
- **Equalized odds vs. calibration**: Generally incompatible unless accuracy is perfect (Kleinberg et al. 2017)
- **Individual vs. group fairness**: Treating similar individuals similarly may still produce group disparities

**Choosing metrics**: Context-dependent
- **High-stakes binary decisions** (hire/fire, admit/reject): Equalized odds or equal opportunity
- **Scored rankings** (credit scores, risk assessments): Calibration
- **Access to benefits** (scholarships, programs): Demographic parity or equal opportunity
- **Legal compliance**: Disparate impact (80% rule)

### Fairness Auditing Process

1. **Identify protected groups**: Race, gender, age, disability, religion, national origin, etc.
2. **Collect disaggregated data**: Outcome metrics by group (requires demographic data collection with consent)
3. **Compute fairness metrics**: Calculate demographic parity, equalized odds, disparate impact across groups
4. **Test statistical significance**: Are differences statistically significant or due to chance? (Chi-square, t-tests; see the sketch after this list)
5. **Investigate causes**: If unfair, why? Biased training data? Proxy features? Measurement error?
6. **Iterate on mitigation**: Debiasing techniques, fairness constraints, data augmentation, feature engineering

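For step 4, a sketch of a significance test on an approval-rate gap, assuming SciPy is available; the counts are synthetic:

```python
# Sketch of audit step 4: test whether an approval-rate gap between two
# groups is statistically significant. Counts are synthetic.
from scipy.stats import chi2_contingency

#                approved  denied
table = [[480, 520],   # group A
         [395, 605]]   # group B
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p_value:.4f}")
if p_value < 0.05:
    print("Disparity unlikely to be chance; investigate causes (step 5).")
```
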
---

## 2. Safety Assessment Methods

Systematic techniques for identifying and mitigating safety risks.

### Failure Mode and Effects Analysis (FMEA)

**Purpose**: Identify the ways a system can fail and prioritize mitigation efforts

**Process**:
1. **Decompose system**: Break into components and functions
2. **Identify failure modes**: For each component, how can it fail? (hardware failure, software bug, human error, environmental condition)
3. **Analyze effects**: What happens if this fails? Local effect? System effect? End effect?
4. **Score severity** (1-10): 1 = negligible, 10 = catastrophic (injury, death)
5. **Score likelihood** (1-10): 1 = rare, 10 = very likely
6. **Score detectability** (1-10): 1 = easily detected, 10 = undetectable before harm
7. **Compute Risk Priority Number (RPN)**: RPN = Severity × Likelihood × Detectability
8. **Prioritize**: High RPN = high priority for mitigation
9. **Design mitigations**: Eliminate the failure mode, reduce likelihood, improve detection, add safeguards
10. **Re-compute RPN**: Has mitigation adequately reduced the risk?

**Example - Medical AI diagnosis**:
- **Failure mode**: AI misclassifies cancer as benign
- **Effect**: Patient not treated, cancer progresses, death
- **Severity**: 10 (death)
- **Likelihood**: 3 (5% false negative rate)
- **Detectability**: 8 (hard to catch without a second opinion)
- **RPN**: 10 × 3 × 8 = 240 (high, requires mitigation)
- **Mitigation**: Human review of all negative diagnoses, require a 2nd AI model for confirmation, patient follow-up at 3 months
- **New RPN**: 10 × 1 × 3 = 30 (acceptable)

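A sketch of the RPN arithmetic, using the numbers from the worked example; the acceptance threshold is illustrative:

```python
# Sketch of the FMEA scoring above: compute and re-compute RPN for the
# medical-AI example (numbers taken from the worked example).
def rpn(severity, likelihood, detectability):
    return severity * likelihood * detectability

before = rpn(10, 3, 8)  # false-negative diagnosis, unmitigated
after = rpn(10, 1, 3)   # after human review + second model + follow-up
print(f"RPN before={before}, after={after}")  # 240 -> 30
# 40 is an illustrative acceptance threshold, not a standard value.
assert after < 40, "mitigation insufficient; iterate (step 10)"
```
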
### Fault Tree Analysis (FTA)

**Purpose**: Identify root causes that lead to a hazard (top-down causal reasoning)

**Process**:
1. **Define top event** (hazard): e.g., "Patient receives wrong medication"
2. **Work backward**: What immediate causes could lead to the top event?
   - Use logic gates: AND (all required), OR (any sufficient)
3. **Decompose recursively**: For each cause, what are its causes?
4. **Reach basic events**: Hardware failure, software bug, human error, environmental conditions
5. **Compute probability**: If basic event probabilities are known, compute the probability of the top event
6. **Find minimal cut sets**: Smallest combinations of basic events that cause the top event
7. **Prioritize mitigations**: Address minimal cut sets (small changes with big safety impact)

**Example - Wrong medication**:
- Top event: Patient receives wrong medication (OR gate)
  - Path 1: Prescription error (AND gate)
    - Doctor prescribes wrong drug (human error)
    - Pharmacist doesn't catch it (human error)
  - Path 2: Dispensing error (AND gate)
    - Correct prescription but wrong drug selected (human error)
    - Barcode scanner fails (equipment failure)
  - Path 3: Administration error
    - Nurse administers wrong drug (human error)
- Minimal cut sets: {Doctor error AND Pharmacist error}, {Dispensing error AND Scanner failure}, {Nurse error}
- Mitigation: Double-check systems (reduce AND probability), barcode scanning (detect errors), nurse training (reduce error rate)

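Step 5 can be sketched from the minimal cut sets above; the basic-event probabilities are invented for illustration, and independence is assumed:

```python
# Sketch of FTA step 5: estimate top-event probability from basic-event
# probabilities, using the wrong-medication cut sets above. Probabilities
# are invented for illustration; events are assumed independent.
import math

p = {"doctor_err": 1e-3, "pharmacist_miss": 1e-2,
     "dispense_err": 1e-3, "scanner_fail": 1e-2,
     "nurse_err": 1e-4}

cut_sets = [  # AND within a set, OR across sets
    ["doctor_err", "pharmacist_miss"],
    ["dispense_err", "scanner_fail"],
    ["nurse_err"],
]

def and_prob(events):
    return math.prod(p[e] for e in events)

# OR across cut sets: P(top) = 1 - prod(1 - P(cut set))
p_top = 1 - math.prod(1 - and_prob(cs) for cs in cut_sets)
print(f"P(wrong medication) ~ {p_top:.2e}")
```
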
### Hazard and Operability Study (HAZOP)

**Purpose**: Systematic brainstorming to find deviations from intended operation

**Process**:
1. **Divide system into nodes**: Functional components
2. **For each node, apply guide words**: NO, MORE, LESS, AS WELL AS, PART OF, REVERSE, OTHER THAN
3. **Identify deviations**: Guide word + parameter (e.g., "MORE pressure", "NO flow")
4. **Analyze causes**: What could cause this deviation?
5. **Analyze consequences**: What harm results?
6. **Propose safeguards**: Detection, prevention, mitigation

**Example - Content moderation system**:
- Node: Content moderation AI
- Deviation: "MORE false positives" (over-moderation)
  - Causes: Model too aggressive, training data skewed, threshold too low
  - Consequences: Silencing legitimate speech, especially marginalized voices
  - Safeguards: Appeals process, human review sample, error rate dashboard by demographic
- Deviation: "NO moderation" (under-moderation)
  - Causes: Model failure, overwhelming volume, adversarial evasion
  - Consequences: Harmful content remains, harassment, misinformation spreads
  - Safeguards: Redundant systems, rate limiting, user reporting, human backup

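Steps 2-3 are mechanical enough to sketch: enumerate guide word × parameter deviations for review. The parameters are illustrative for the content moderation node above:

```python
# Sketch of HAZOP steps 2-3: enumerate candidate deviations (guide word x
# parameter) for a node, then review each for causes and consequences.
# Parameters are illustrative for the content moderation node above.
from itertools import product

GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]
parameters = ["moderation actions", "false positives", "review latency"]

for word, param in product(GUIDE_WORDS, parameters):
    print(f"Deviation: {word} {param}")  # e.g. "MORE false positives"
```
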
### Worst-Case Scenario Analysis

**Purpose**: Stress-test the system against extreme but plausible threats

**Process**:
1. **Brainstorm worst cases**: Adversarial attacks, cascading failures, edge cases, Murphy's Law
2. **Assess plausibility**: Could this actually happen? Are there historical precedents?
3. **Estimate impact**: If it happened, how bad would it be?
4. **Identify single points of failure**: What one thing, if it fails, causes catastrophe?
5. **Design resilience**: Redundancy, fail-safes, graceful degradation, circuit breakers
6. **Test**: Chaos engineering, red teaming, adversarial testing

**Examples**:
- **AI model**: Adversary crafts inputs that fool the model (adversarial examples) → Test robustness, ensemble models
- **Data breach**: All user data leaked → Encrypt data, minimize collection, differential privacy
- **Bias amplification**: Feedback loop causes AI to become more biased over time → Monitor drift, periodic retraining, fairness constraints
- **Denial of service**: System overwhelmed by load → Rate limiting, auto-scaling, graceful degradation

---

## 3. Mitigation Strategies

Taxonomy of interventions to reduce harm.

### Prevention (Eliminate Harm)

**Design out the risk**:
- Don't collect sensitive data you don't need (data minimization)
- Don't build risky features (dark patterns, addictive mechanics, manipulation)
- Use less risky alternatives (aggregate statistics vs. individual data, contextual recommendations vs. behavioral targeting)

**Examples**:
- Instead of collecting browsing history, use contextual ads (keywords on the current page)
- Instead of infinite scroll (addiction), paginate with clear endpoints
- Instead of storing plaintext passwords, use salted hashes (can't be leaked)

### Reduction (Decrease Likelihood or Severity)

**Technical mitigations**:
- Rate limiting (prevent abuse)
- Friction (slow down impulsive harmful actions: time delays, confirmations, warnings)
- Debiasing algorithms (pre-processing data, in-processing fairness constraints, post-processing calibration)
- Differential privacy (add noise to protect individuals while preserving aggregate statistics)

**Process mitigations**:
- Staged rollouts (limited exposure to catch problems early)
- A/B testing (measure impact before full deployment)
- Diverse teams (more perspectives catch more problems)
- External audits (independent review)

**Examples**:
- Limit posts per hour to prevent spam
- Require confirmation before deleting an account or posting sensitive content
- Apply fairness constraints during model training to reduce disparate impact
- Release to 1% of users, monitor for issues, then scale

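As a sketch of the rate-limiting mitigation, a token bucket: each caller gets `rate` actions per second with bursts up to `capacity`. The numbers are illustrative:

```python
# Sketch of a rate-limiting mitigation (token bucket): allow roughly `rate`
# actions per second with bursts up to `capacity`. Numbers are illustrative.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)   # ~5 posts/second, bursts of 10
print([bucket.allow() for _ in range(12)])  # the last attempts are throttled
```
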
### Detection (Monitor and Alert)

**Dashboards**: Real-time metrics on harm indicators (error rates by group, complaints, safety incidents)

**Anomaly detection**: Alert when metrics deviate from baseline (spike in false positives, drop in engagement from a specific group)

**User reporting**: Easy channels for reporting harms, responsive investigation

**Audit logs**: Track decisions for later investigation (who accessed what data, which users were affected by the algorithm)

**Examples**:
- Bias dashboard showing approval rates by race, gender, age, updated daily
- Alert if moderation false positive rate >2× baseline for any language
- "Report this" button on all content with category options
- Log all loan denials with reason codes for audit

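A sketch of the moderation alert in the examples above; the rates are synthetic and the 2× rule comes from the example:

```python
# Sketch of the alerting example above: flag any language whose moderation
# false-positive rate exceeds 2x the overall baseline. Rates are synthetic.
baseline_fpr = 0.04
fpr_by_language = {"en": 0.05, "es": 0.06, "tl": 0.11}

alerts = {lang: fpr for lang, fpr in fpr_by_language.items()
          if fpr > 2 * baseline_fpr}
for lang, fpr in alerts.items():
    print(f"ALERT: {lang} FPR {fpr:.0%} > 2x baseline {baseline_fpr:.0%}")
```
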
### Response (Address Harm When Found)

**Appeals**: Process for contesting decisions (human review, overturn if wrong)

**Redress**: Compensate those harmed (refunds, apologies, corrective action)

**Incident response**: Playbook for handling harms (who to notify, how to investigate, when to escalate, communication plan)

**Iterative improvement**: Learn from incidents to prevent recurrence

**Examples**:
- Allow users to appeal content moderation decisions, with review within 48 hours
- Offer compensation to users affected by an outage or data breach
- If bias is detected, pause the system, investigate, retrain the model, re-audit before re-launch
- Publish transparency reports on harms, mitigations, outcomes

### Safeguards (Redundancy and Fail-Safes)

**Human oversight**: Human in the loop (review all decisions) or human on the loop (review samples, alert on anomalies)

**Redundancy**: Multiple independent systems, consensus required

**Fail-safes**: If the system fails, default to a safe state (e.g., a medical device fails → alarm, not silent failure)

**Circuit breakers**: Kill switches to shut down harmful features quickly

**Examples**:
- High-stakes decisions (loan denial, medical diagnosis, criminal sentencing) require human review
- Two independent AI models must agree before autonomous action
- If fraud detection fails, default to human review rather than approving all transactions
- CEO can halt a product launch if ethics concerns are raised, even at the last minute

---

## 4. Privacy-Preserving Techniques

Methods to protect individual privacy while enabling data use.

### Data Minimization

- **Collect only necessary data**: Purpose limitation (collect only for the stated purpose), don't collect "just in case"
- **Aggregate where possible**: Avoid individual-level data when population-level data is sufficient
- **Short retention**: Delete data when it is no longer needed, enforce retention limits

### De-identification
|
||||
|
||||
**Anonymization**: Remove direct identifiers (name, SSN, email)
|
||||
- **Limitation**: Re-identification attacks possible (linkage to other datasets, inference from quasi-identifiers)
|
||||
- **Example**: Netflix dataset de-identified, but researchers re-identified users by linking to IMDB reviews
|
||||
|
||||
**K-anonymity**: Each record is indistinguishable from at least k−1 others (generalize quasi-identifiers like zip code, age, gender)
- **Limitation**: Attribute disclosure (if all k records share the same sensitive attribute), composition attacks
- **Example**: {Age=32, Zip=12345} → {Age=30-35, Zip=123**}
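
A minimal sketch of k-anonymity via generalization, assuming 5-year age bands and a 3-digit zip prefix as the generalization scheme (real schemes are chosen per dataset):

```python
# Sketch of quasi-identifier generalization for k-anonymity. The binning
# choices (5-year age bands, 3-digit zip prefix) are illustrative assumptions.
from collections import Counter

def generalize(record: dict) -> tuple:
    age_lo = (record["age"] // 5) * 5
    return (f"{age_lo}-{age_lo + 5}", record["zip"][:3] + "**", record["gender"])

def is_k_anonymous(records: list, k: int) -> bool:
    """True if every generalized quasi-identifier combination appears at least k times."""
    counts = Counter(generalize(r) for r in records)
    return min(counts.values()) >= k

records = [
    {"age": 32, "zip": "12345", "gender": "F"},
    {"age": 33, "zip": "12399", "gender": "F"},
    {"age": 31, "zip": "12301", "gender": "F"},
]
print(is_k_anonymous(records, k=3))  # True: all three fall in (30-35, 123**, F)
```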
**Differential Privacy**: Add calibrated noise so that an individual's presence or absence doesn't significantly change query results
- **Definition**: P(M(D) = O) / P(M(D′) = O) ≤ e^ε, where D and D′ differ by one person and ε is the privacy budget
- **Strengths**: Provable privacy guarantee, composes well, resistant to post-processing
- **Limitations**: Accuracy-privacy tradeoff (more privacy → more noise → less accuracy); the privacy budget is exhausted over many queries
- **Example**: The US Census releases aggregate statistics with differential privacy; Apple and Google use local differential privacy for on-device telemetry
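
A minimal sketch of the Laplace mechanism for a differentially private count. A counting query changes by at most 1 when one person is added or removed (sensitivity = 1), so noise is drawn with scale sensitivity/ε; this illustrates the accuracy-privacy tradeoff, not a production implementation:

```python
# Laplace mechanism sketch for a differentially private count (sensitivity = 1).
import random

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # The difference of two iid Exponential(epsilon/sensitivity) draws is
    # Laplace-distributed with scale sensitivity/epsilon.
    lam = epsilon / sensitivity
    noise = random.expovariate(lam) - random.expovariate(lam)
    return true_count + noise

print(dp_count(1000, epsilon=0.1))  # heavy noise: strong privacy, lower accuracy
print(dp_count(1000, epsilon=2.0))  # light noise: weaker privacy, higher accuracy
```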
### Access Controls

- **Least privilege**: Users access only the data needed for their role
- **Audit logs**: Track who accessed what data and when; detect anomalies
- **Encryption**: At rest (storage), in transit (network), in use (processing)
- **Multi-party computation**: Multiple parties compute a joint result without revealing their private inputs to each other
### Consent and Control

- **Granular consent**: Opt-in for each purpose, not blanket consent
- **Transparency**: Explain what data is collected, how it is used, and who it is shared with (in plain language)
- **User controls**: Export data (GDPR right to portability), delete data (right to erasure), opt out of processing
- **Meaningful choice**: Consent must not be coerced (the service remains available without consent to non-essential features)
---

## 5. Bias Detection in Deployment

Ongoing monitoring to detect and respond to bias post-launch.

### Bias Dashboards

**Disaggregated metrics**: Track outcomes by protected groups (race, gender, age, disability)
- Approval/rejection rates
- False positive/negative rates
- Recommendation quality (precision, recall, ranking)
- User engagement (click-through, conversion, retention)

**Visualizations**:
- Bar charts showing each metric by group, flagging >20% disparities
- Time series to detect drift (is bias increasing over time?)
- Heatmaps for intersectional analysis (race × gender)

**Alerting**: Automated alerts when a disparity crosses a threshold; a minimal sketch follows
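
A minimal sketch of the dashboard's disparity check, flagging any group more than 20% below the best-performing group. The group names, rates, reference-group choice, and 20% gap are assumptions:

```python
# Disaggregated-metrics check behind a bias dashboard; groups, rates, and
# the 20% disparity flag are illustrative assumptions.

def disparity_flags(rates: dict, max_gap: float = 0.20) -> list:
    """Flag groups whose metric falls more than max_gap below the best-off group."""
    reference = max(rates.values())
    return [
        f"FLAG: {group} rate {rate:.2f} is {(1 - rate / reference):.0%} below reference {reference:.2f}"
        for group, rate in rates.items()
        if rate / reference < 1 - max_gap
    ]

approval_rates = {"Group A": 0.72, "Group B": 0.70, "Group C": 0.52}
for flag in disparity_flags(approval_rates):
    print(flag)  # Group C: 0.52 / 0.72 = 0.72 ratio -> flagged
```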
### Drift Detection

**Distribution shift**: Has the data distribution changed since training? (Covariate shift, concept drift)
- Monitor input distributions and flag anomalies (see the sketch at the end of this subsection)
- Retrain periodically on recent data

**Performance degradation**: Is model accuracy declining? For which groups?
- Continuously A/B test the new model against the old
- Track metrics by group; ensure improvements don't harm any group

**Feedback loops**: Is the model changing its environment in ways that amplify bias?
- Example: Predictive policing → more arrests in flagged areas → more training data from those areas → more policing (a vicious cycle)
- Monitor for amplification: are disparities increasing over time?
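
One common distribution-shift check is the Population Stability Index (PSI). A minimal sketch follows, using the conventional (but context-dependent) 0.25 alert threshold and made-up histograms:

```python
# Distribution-shift monitoring via the Population Stability Index (PSI).
# Bin proportions and the 0.25 alert threshold are conventional assumptions.
import math

def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """PSI over matching histogram bins; > 0.25 is commonly treated as significant drift."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

train_dist = [0.25, 0.40, 0.25, 0.10]  # input feature histogram at training time
live_dist = [0.10, 0.30, 0.35, 0.25]   # same feature in production
score = psi(train_dist, live_dist)
print(f"PSI = {score:.3f}", "-> investigate drift" if score > 0.25 else "-> stable")
```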
### Participatory Auditing

**Stakeholder involvement**: Include affected groups in oversight, not just internal teams
- Community advisory boards
- Public comment periods
- Transparency reports reviewed by civil society

**Contests**: Bug bounties for finding bias (reward researchers and users who identify fairness issues)

**External audits**: Independent third-party assessment (not self-regulation)

---
## 6. Common Pitfalls

**Fairness theater**: Performative statements without material changes. Impact assessments must change decisions, not just document them.

**Sampling bias in testing**: Testing only on employees (young, educated, English-speaking) misses how diverse users experience harm. Test with the actual affected populations.

**Assuming "colorblind" = fair**: Not collecting race data doesn't eliminate bias; it makes bias invisible and impossible to audit. Collect demographic data (with consent and safeguards) to measure fairness.

**Optimization without constraints**: Maximizing engagement or revenue without constraints leads to amplifying outrage, addiction, and polarization. Set ethical boundaries as constraints, not just aspirations.

**Privacy vs. fairness tradeoff**: You can't audit bias without demographic data. Balance the two: collect the minimal data necessary for fairness auditing, with strong access controls and differential privacy.

**One-time assessment**: Ethics is not a launch checkbox. Continuous monitoring is required as systems evolve, data drifts, and harms emerge over time.

**Technochauvinism**: Believing technical fixes alone solve social problems. Bias-mitigation algorithms can help, but they can't replace addressing root causes (historical discrimination, structural inequality).

**Moving fast and apologizing later**: For safety and ethics, prevention > apology. Harms to vulnerable groups are not acceptable experiments. Staged rollouts, kill switches, and continuous monitoring are required.
# Ethics, Safety & Impact Assessment Templates

Quick-start templates for stakeholder mapping, harm/benefit analysis, fairness evaluation, risk prioritization, mitigation planning, and monitoring.

## Workflow

```
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
```
**Step 1: Map stakeholders and identify vulnerable groups**

Use the [Stakeholder Mapping Template](#stakeholder-mapping-template) to identify all affected parties and prioritize vulnerable populations.

**Step 2: Analyze potential harms and benefits**

Brainstorm harms and benefits for each stakeholder group using the [Harm/Benefit Analysis Template](#harmbenefit-analysis-template).

**Step 3: Assess fairness and differential impacts**

Evaluate outcome, treatment, and access disparities using the [Fairness Assessment Template](#fairness-assessment-template).

**Step 4: Evaluate severity and likelihood**

Prioritize risks using the [Risk Matrix Template](#risk-matrix-template), scoring severity and likelihood.

**Step 5: Design mitigations and safeguards**

Plan interventions using the [Mitigation Planning Template](#mitigation-planning-template) for high-priority harms.

**Step 6: Define monitoring and escalation protocols**

Set up ongoing oversight using the [Monitoring Framework Template](#monitoring-framework-template).

---
## Stakeholder Mapping Template

### Primary Stakeholders (directly affected)

**Group 1**: [Name of stakeholder group]
- **Size/reach**: [How many people?]
- **Relationship**: [How do they interact with the feature/decision?]
- **Power/voice**: [Can they advocate for themselves? High/Medium/Low]
- **Vulnerability factors**: [Age, disability, marginalization, economic precarity, etc.]
- **Priority**: [High/Medium/Low risk]

**Group 2**: [Name of stakeholder group]
- **Size/reach**:
- **Relationship**:
- **Power/voice**:
- **Vulnerability factors**:
- **Priority**:

[Add more groups as needed]

### Secondary Stakeholders (indirectly affected)

**Group**: [Name]
- **How affected**: [Indirect impact mechanism]
- **Priority**: [High/Medium/Low risk]

### Societal/Systemic Impacts

- **Norms affected**: [What behaviors/expectations might shift?]
- **Precedents set**: [What does this enable or legitimize in the future?]
- **Long-term effects**: [Cumulative effects, feedback loops, structural changes]

### Vulnerable Groups Prioritization

Check all that apply and note specific considerations:

- [ ] **Children** (<18): Special protections needed (consent, safety, developmental impact)
- [ ] **Elderly** (>65): Accessibility, digital literacy, vulnerability to fraud
- [ ] **People with disabilities**: Accessibility compliance, exclusion risk, safety
- [ ] **Racial/ethnic minorities**: Historical discrimination, disparate impact, cultural sensitivity
- [ ] **Low-income**: Economic harm, access barriers, inability to absorb costs
- [ ] **LGBTQ+**: Safety in hostile contexts, privacy, outing risk
- [ ] **Non-English speakers**: Language barriers, exclusion, misunderstanding
- [ ] **Politically targeted**: Dissidents, journalists, activists (surveillance, safety)
- [ ] **Other**: [Specify]

**Highest priority groups** (most vulnerable + highest risk):
1.
2.
3.

---
## Harm/Benefit Analysis Template

For each stakeholder group, identify potential harms and benefits.

### Stakeholder Group: [Name]

#### Potential Benefits

**Benefit 1**: [Description]
- **Type**: Economic, Social, Health, Autonomy, Access, Safety, etc.
- **Magnitude**: [High/Medium/Low]
- **Distribution**: [Who gets this benefit? Everyone or a subset?]
- **Timeline**: [Immediate, Short-term <1yr, Long-term >1yr]

**Benefit 2**: [Description]
- **Type**:
- **Magnitude**:
- **Distribution**:
- **Timeline**:

#### Potential Harms

**Harm 1**: [Description]
- **Type**: Physical, Psychological, Economic, Social, Autonomy, Privacy, Reputational, Epistemic, Political
- **Mechanism**: [How does the harm occur?]
- **Affected subgroup**: [Everyone, or a specific subset within the stakeholder group?]
- **Severity**: [1-5, where 5 = catastrophic]
- **Likelihood**: [1-5, where 5 = very likely]
- **Risk Score**: [Severity × Likelihood]

**Harm 2**: [Description]
- **Type**:
- **Mechanism**:
- **Affected subgroup**:
- **Severity**:
- **Likelihood**:
- **Risk Score**:

**Harm 3**: [Description]
- **Type**:
- **Mechanism**:
- **Affected subgroup**:
- **Severity**:
- **Likelihood**:
- **Risk Score**:

#### Second-Order Effects

- **Feedback loops**: [Does the harm create conditions for more harm?]
- **Accumulation**: [Do small harms compound over time?]
- **Normalization**: [Does this normalize harmful practices?]
- **Precedent**: [What does this enable others to do?]

---
## Fairness Assessment Template

### Outcome Fairness (results)

**Metric being measured**: [e.g., approval rate, error rate, recommendation quality]

**By group**:

| Group | Metric Value | Difference from Average | Disparate Impact Ratio |
|-------|--------------|-------------------------|------------------------|
| Group A | | | |
| Group B | | | |
| Group C | | | |
| Overall | | - | - |

**Disparate Impact Ratio** = (outcome rate for protected group) / (outcome rate for reference group)
- **≥ 0.8**: Generally acceptable (80% rule)
- **< 0.8**: Potential disparate impact; investigate (see the sketch below)

**Questions**:
- [ ] Are outcome rates similar across groups (within 20%)?
- [ ] If not, is there a legitimate justification?
- [ ] Do error rates (false positives/negatives) differ across groups?
- [ ] Who bears the burden of errors?
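
A minimal sketch of computing the Disparate Impact Ratio column and applying the 80% rule; the groups, rates, and reference-group choice are placeholders:

```python
# Disparate impact ratio against a reference group, with the 0.8 (80% rule)
# check. Rates and group names are placeholder assumptions.

def disparate_impact(rates: dict, reference_group: str) -> dict:
    ref = rates[reference_group]
    return {group: rate / ref for group, rate in rates.items()}

approval = {"Group A": 0.60, "Group B": 0.55, "Group C": 0.44}
for group, ratio in disparate_impact(approval, "Group A").items():
    status = "OK" if ratio >= 0.8 else "INVESTIGATE (< 0.8)"
    print(f"{group}: DIR = {ratio:.2f} {status}")
# Group C: DIR = 0.73 -> flagged for investigation
```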
### Treatment Fairness (process)

**How decisions are made**: [Algorithm, human judgment, hybrid]

**By group**:

| Group | Treatment Description | Dignity/Respect | Transparency | Recourse |
|-------|----------------------|-----------------|--------------|----------|
| Group A | | High/Med/Low | High/Med/Low | High/Med/Low |
| Group B | | High/Med/Low | High/Med/Low | High/Med/Low |

**Questions**:
- [ ] Do all groups receive the same quality of service/interaction?
- [ ] Are decisions explained equally well to all groups?
- [ ] Do all groups have equal access to appeals/recourse?
- [ ] Are there cultural or language barriers affecting treatment?
### Access Fairness (opportunity)

**Barriers to access**:

| Barrier Type | Description | Affected Groups | Severity |
|--------------|-------------|-----------------|----------|
| Economic | [e.g., cost, credit required] | | High/Med/Low |
| Technical | [e.g., device, internet, literacy] | | High/Med/Low |
| Geographic | [e.g., location restrictions] | | High/Med/Low |
| Physical | [e.g., accessibility, disability] | | High/Med/Low |
| Social | [e.g., stigma, discrimination] | | High/Med/Low |
| Legal | [e.g., documentation required] | | High/Med/Low |

**Questions**:
- [ ] Can all groups access the service/benefit equally?
- [ ] Are there unnecessary barriers that could be removed?
- [ ] Do barriers disproportionately affect vulnerable groups?
### Intersectionality Check

**Combinations of identities that may face unique harms**:
- Example: Black women (face both racial and gender bias)
- Example: Elderly immigrants (language + digital literacy + age)

Groups to check:
- [ ] Intersection of race and gender
- [ ] Intersection of disability and age
- [ ] Intersection of income and language
- [ ] Other combinations: [Specify]

---
## Risk Matrix Template

Score each harm on Severity (1-5) and Likelihood (1-5). Prioritize high-risk (red/orange) harms for mitigation.

### Severity Scale

- **5 - Catastrophic**: Death, serious injury, irreversible harm, widespread impact
- **4 - Major**: Significant harm, lasting impact, affects many people
- **3 - Moderate**: Noticeable harm, temporary impact, affects some people
- **2 - Minor**: Small harm, easily reversed, affects few people
- **1 - Negligible**: Minimal harm, no lasting impact

### Likelihood Scale

- **5 - Very Likely**: >75% chance, expected to occur
- **4 - Likely**: 50-75% chance, probable
- **3 - Possible**: 25-50% chance, could happen
- **2 - Unlikely**: 5-25% chance, improbable
- **1 - Rare**: <5% chance, very unlikely

### Risk Matrix

| Harm | Stakeholder Group | Severity | Likelihood | Risk Score | Priority |
|------|------------------|----------|------------|------------|----------|
| [Harm 1 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |
| [Harm 2 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |
| [Harm 3 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |

**Priority Color Coding** (a scoring sketch follows this list):
- **Red** (Risk ≥15): Critical, must address before launch
- **Orange** (Risk 9-14): High priority, address soon
- **Yellow** (Risk 5-8): Monitor, mitigate if feasible
- **Green** (Risk ≤4): Low priority, document and monitor
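
A minimal sketch of scoring matrix rows and applying the color coding above; the harms, groups, and scores are placeholders:

```python
# Risk Score = Severity x Likelihood, banded per the color coding above.
# Harms, groups, and scores are placeholder assumptions.

def priority(severity: int, likelihood: int) -> tuple:
    score = severity * likelihood
    if score >= 15:
        return score, "Red"
    if score >= 9:
        return score, "Orange"
    if score >= 5:
        return score, "Yellow"
    return score, "Green"

harms = [
    ("Wrongful loan denial", "Low-income applicants", 4, 4),
    ("Exposure of sensitive data", "LGBTQ+ users", 5, 2),
    ("Minor UI confusion", "All users", 1, 3),
]

# Sort highest risk first, as the prioritized-harms list below expects.
for harm, group, sev, lik in sorted(harms, key=lambda h: h[2] * h[3], reverse=True):
    score, band = priority(sev, lik)
    print(f"{band:6} {score:>2}  {harm} ({group})")
```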
**Prioritized Harms** (Red + Orange):
1. [Highest-risk harm]
2. [Second highest]
3. [Third highest]

---
## Mitigation Planning Template

For each high-priority harm, design interventions.

### Harm: [Description of harm being mitigated]

**Affected Group**: [Who experiences this harm]

**Risk Score**: [Severity × Likelihood = X]

#### Mitigation Strategies

**Option 1: [Mitigation name]**
- **Type**: Prevent, Reduce, Detect, Respond, Safeguard, Transparency, Empower
- **Description**: [What is the intervention?]
- **Effectiveness**: [How much does this reduce risk? High/Medium/Low]
- **Cost/effort**: [Resources required? High/Medium/Low]
- **Tradeoffs**: [What are the downsides or tensions?]
- **Owner**: [Who is responsible for implementation?]
- **Timeline**: [By when?]

**Option 2: [Mitigation name]**
- **Type**:
- **Description**:
- **Effectiveness**:
- **Cost/effort**:
- **Tradeoffs**:
- **Owner**:
- **Timeline**:

**Recommended Approach**: [Which option(s) to pursue and why]

#### Residual Risk

After mitigation:
- **New severity**: [1-5]
- **New likelihood**: [1-5]
- **New risk score**: [S×L]

**Acceptable?**
- [ ] Yes, residual risk is acceptable given the tradeoffs
- [ ] No, additional mitigations needed
- [ ] Escalate to [ethics committee/leadership/etc.]

#### Implementation Checklist

- [ ] Design changes specified
- [ ] Testing plan includes affected groups
- [ ] Documentation updated (policies, help docs, disclosures)
- [ ] Training provided (if human review/moderation is involved)
- [ ] Monitoring metrics defined (see next template)
- [ ] Review date scheduled (when to reassess)

---
## Monitoring Framework Template

### Outcome Metrics

Track actual impacts post-launch to detect harms early.

**Metric 1**: [Metric name, e.g., "Approval rate parity"]
- **Definition**: [Precisely what is measured]
- **Measurement method**: [How it is calculated, and from what data]
- **Baseline**: [Current or expected value]
- **Target**: [Goal value]
- **Threshold for concern**: [Value that triggers action]
- **Disaggregation**: [Break down by race, gender, age, disability, etc.]
- **Frequency**: [Daily, weekly, monthly, quarterly]
- **Owner**: [Who tracks and reports this]

**Metric 2**: [Metric name]
- Definition, method, baseline, target, threshold, disaggregation, frequency, owner

**Metric 3**: [Metric name]
- Definition, method, baseline, target, threshold, disaggregation, frequency, owner
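
Teams sometimes encode each metric definition as a structured config so dashboards and alerting read from one source of truth. A minimal sketch, with placeholder values mirroring the fields above:

```python
# Metric definition as a structured config; all field values are placeholders
# mirroring the template fields above, not a prescribed schema.
from dataclasses import dataclass, field

@dataclass
class MetricSpec:
    name: str
    definition: str
    baseline: float
    target: float
    concern_threshold: float          # value that triggers the escalation protocol
    disaggregate_by: list = field(default_factory=list)
    frequency: str = "weekly"
    owner: str = "unassigned"

approval_parity = MetricSpec(
    name="Approval rate parity",
    definition="Min group approval rate / max group approval rate",
    baseline=0.85,
    target=0.95,
    concern_threshold=0.80,
    disaggregate_by=["race", "gender", "age", "disability"],
    frequency="monthly",
    owner="fairness-oncall",
)
print(approval_parity)
```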
### Leading Indicators & Qualitative Monitoring

- **Indicator 1**: [e.g., "Spike in user reports"] - Threshold: [level]
- **Indicator 2**: [e.g., "Declining engagement from Group Y"] - Threshold: [level]
- **User feedback**: Channels for reporting concerns
- **Community listening**: Forums, social media, support tickets
- **Affected group outreach**: Check-ins with vulnerable communities
### Escalation Protocol

**Yellow Alert** (early warning):
- **Trigger**: [e.g., metric exceeds threshold by 10-20%]
- **Response**: Investigate, analyze patterns, prepare report

**Orange Alert** (concerning):
- **Trigger**: [e.g., metric exceeds threshold by >20%, or multiple yellow alerts]
- **Response**: Escalate to product/ethics team, begin mitigation planning

**Red Alert** (critical):
- **Trigger**: [e.g., serious harm reported, disparate impact >20%, safety incident]
- **Response**: Escalate to leadership, pause rollout or roll back, remediate immediately

**Escalation Path** (a sketch wiring a metric to these tiers follows this list):
1. First escalation: [Role/person]
2. If unresolved or critical: [Role/person]
3. Final escalation: [Ethics committee, CEO, board]
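
A minimal sketch mapping a monitored metric to these alert tiers, treating the bracketed example triggers above as assumed bands to be tuned per metric (Red is also triggered directly by safety incidents):

```python
# Maps metric deviation to the alert tiers above. The 10%/20% bands come from
# the bracketed example triggers and are assumptions to tune per metric.

def alert_level(value: float, threshold: float, safety_incident: bool = False) -> str:
    if safety_incident:
        return "RED: escalate to leadership, pause or roll back, remediate immediately"
    excess = (value - threshold) / threshold
    if excess > 0.20:
        return "ORANGE: escalate to product/ethics team, begin mitigation planning"
    if excess >= 0.10:
        return "YELLOW: investigate, analyze patterns, prepare report"
    return "OK: continue routine monitoring"

print(alert_level(value=0.23, threshold=0.20))                        # 15% over -> YELLOW
print(alert_level(value=0.26, threshold=0.20))                        # 30% over -> ORANGE
print(alert_level(value=0.20, threshold=0.20, safety_incident=True))  # -> RED
```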
### Review Cadence

- **Daily**: Critical safety metrics (safety-critical systems only)
- **Weekly**: User complaints, support tickets
- **Monthly**: Outcome metrics, disparate impact dashboard
- **Quarterly**: Comprehensive fairness audit
- **Annually**: External audit, stakeholder consultation

### Audit & Accountability

- **Audits**: Internal (who, how often) and external (independent, when)
- **Transparency**: What is disclosed, and where it is published
- **Affected group consultation**: How vulnerable groups are involved in oversight

---
## Complete Assessment Template

The full documentation structure combines all of the above templates:

1. **Context**: Feature/decision description, problem, alternatives
2. **Stakeholder Analysis**: Use the Stakeholder Mapping Template
3. **Harm & Benefit Analysis**: Use the Harm/Benefit Analysis Template for each group
4. **Fairness Assessment**: Use the Fairness Assessment Template (outcome/treatment/access)
5. **Risk Prioritization**: Use the Risk Matrix Template; identify critical harms
6. **Mitigation Plan**: Use the Mitigation Planning Template for each critical harm
7. **Monitoring & Escalation**: Use the Monitoring Framework Template
8. **Decision**: Proceed / staged rollout / delay / reject, with rationale and sign-off
9. **Post-Launch Review**: 30-day and 90-day checks, ongoing monitoring, updates