Initial commit
@@ -0,0 +1,255 @@
{
  "criteria": [
    {
      "name": "Stakeholder Identification",
      "1": "Only obvious stakeholders identified, vulnerable groups missing, no power/voice analysis",
      "3": "Primary stakeholders identified, some vulnerable groups noted, basic power analysis",
      "5": "Comprehensive stakeholder map (primary, secondary, societal), vulnerable groups prioritized with specific risk factors, power/voice dynamics analyzed, intersectionality considered"
    },
    {
      "name": "Harm Analysis Depth",
      "1": "Surface-level harms only, no mechanism analysis, severity/likelihood guessed",
      "3": "Multiple harms identified with mechanisms, severity/likelihood scored, some second-order effects",
      "5": "Comprehensive harm catalog across types (physical, psychological, economic, social, autonomy, privacy), mechanisms explained, severity/likelihood justified, second-order effects (feedback loops, accumulation, normalization, precedent) analyzed"
    },
    {
      "name": "Benefit Analysis Balance",
      "1": "Only harms or only benefits listed, no distribution analysis, rose-colored or overly negative",
      "3": "Both harms and benefits identified, some distribution analysis (who gets what)",
      "5": "Balanced harm/benefit analysis, distribution clearly specified (universal, subset, vulnerable groups), magnitude and timeline assessed, tradeoffs acknowledged"
    },
    {
      "name": "Fairness Assessment",
      "1": "No fairness analysis, assumes equal treatment = fairness, no metrics",
      "3": "Outcome disparities measured for some groups, fairness concern noted, basic mitigation proposed",
      "5": "Rigorous fairness analysis (outcome, treatment, access fairness), quantitative metrics (disparate impact ratio, error rates by group), intersectional analysis, appropriate fairness definition chosen for context"
    },
    {
      "name": "Risk Prioritization",
      "1": "No prioritization or arbitrary, all harms treated equally, no severity/likelihood scoring",
      "3": "Risk matrix used, severity and likelihood scored, high-risk harms identified",
      "5": "Rigorous risk prioritization (5x5 matrix), severity/likelihood justified with evidence/precedent, color-coded priorities, focus on red/orange (high-risk) harms, considers vulnerable group concentration"
    },
    {
      "name": "Mitigation Design",
      "1": "No mitigations or vague promises, reactive only, no ownership or timeline",
      "3": "Mitigations proposed for key harms, some specificity, owners/timelines mentioned",
      "5": "Specific mitigations for all high-priority harms, type specified (prevent/reduce/detect/respond/safeguard), effectiveness assessed, cost/tradeoffs acknowledged, owners assigned, timelines set, residual risk calculated"
    },
    {
      "name": "Monitoring & Metrics",
      "1": "No monitoring plan, intentions stated without measurement, no metrics defined",
      "3": "Some metrics defined, monitoring frequency mentioned, thresholds set",
      "5": "Comprehensive monitoring framework (outcome metrics disaggregated by group, leading indicators, qualitative feedback), specific thresholds for concern, escalation protocol (yellow/orange/red alerts), review cadence set, accountability clear"
    },
    {
      "name": "Transparency & Recourse",
      "1": "No mechanisms for affected parties to contest or understand decisions, opacity accepted",
      "3": "Some explainability mentioned, appeals process exists, basic transparency",
      "5": "Clear transparency (decisions explained in plain language, limitations disclosed), robust recourse (appeals with human review, overturn process, redress for harm), audit trails for investigation, accessible to affected groups"
    },
    {
      "name": "Stakeholder Participation",
      "1": "No involvement of affected groups, internal team only, no external input",
      "3": "Some user research or feedback collection, affected groups consulted",
      "5": "Meaningful participation of vulnerable/affected groups (advisory boards, co-design, participatory audits), diverse team conducting assessment, external review (ethics board, independent audit), ongoing consultation not one-time"
    },
    {
      "name": "Proportionality & Precaution",
      "1": "Assumes go-ahead, burden on critics to prove harm, move fast and apologize later",
      "3": "Some precaution for high-risk features, staged rollout considered, mitigation before launch",
      "5": "Precautionary principle applied (mitigate before launch for irreversible harms), proportional response (higher stakes = more safeguards), staged rollout with kill switches, burden on proponents to demonstrate safety, continuous monitoring post-launch"
    }
  ],
  "guidance_by_type": {
    "Algorithm Fairness Audit": {
      "target_score": 4.2,
      "key_requirements": [
        "Fairness Assessment (score ≥5): Quantitative metrics (disparate impact, equalized odds, calibration), disaggregated by protected groups",
        "Harm Analysis: Disparate impact, feedback loops, opacity, inability to contest",
        "Mitigation Design: Debiasing techniques, fairness constraints, explainability, human review for edge cases",
        "Monitoring: Bias dashboard with real-time metrics by group, drift detection, periodic audits"
      ],
      "common_pitfalls": [
        "Assuming colorblindness = fairness (need to collect/analyze demographic data)",
        "Only checking one fairness metric (tradeoffs exist, choose appropriate for context)",
        "Not testing for intersectionality (race × gender unique harms)"
      ]
    },
    "Data Privacy & Consent": {
      "target_score": 4.0,
      "key_requirements": [
        "Stakeholder Identification: Data subjects, vulnerable groups (children, marginalized)",
        "Harm Analysis: Privacy violations, surveillance, breaches, secondary use, re-identification",
        "Mitigation Design: Data minimization, anonymization/differential privacy, granular consent, encryption, user controls",
        "Monitoring: Breach incidents, access logs, consent withdrawals, data requests (GDPR)"
      ],
      "common_pitfalls": [
        "Privacy theater (consent mandatory for service = not meaningful choice)",
        "De-identification without considering linkage attacks",
        "Not providing genuine user controls (export, delete)"
      ]
    },
    "Content Moderation & Free Expression": {
      "target_score": 3.9,
      "key_requirements": [
        "Stakeholder Identification: Creators, viewers, vulnerable groups (harassment targets), society (information integrity)",
        "Harm Analysis: Over-moderation (silencing marginalized voices), under-moderation (harassment, misinfo), inconsistent enforcement",
        "Fairness Assessment: Error rates by group, differential enforcement across languages/regions, cultural context",
        "Mitigation: Clear policies, appeals with human review, diverse moderators, transparency reports"
      ],
      "common_pitfalls": [
        "Optimizing for engagement without ethical constraints (amplifies outrage)",
        "Not accounting for cultural context (policies designed for US applied globally)",
        "Transparency without accountability (reports without action)"
      ]
    },
    "Accessibility & Inclusive Design": {
      "target_score": 4.1,
      "key_requirements": [
        "Stakeholder Identification: People with disabilities (visual, auditory, motor, cognitive), elderly, low-literacy, low-bandwidth",
        "Harm Analysis: Exclusion, degraded experience, safety risks (cannot access critical features)",
        "Mitigation: WCAG AA/AAA compliance, assistive technology testing, keyboard navigation, alt text, plain language, multi-language",
        "Monitoring: Accessibility test coverage, feedback from disability communities, task completion rates across abilities"
      ],
      "common_pitfalls": [
        "Accessibility as afterthought (retrofit harder than design-in)",
        "Testing only with non-disabled users or automated tools (miss real user experience)",
        "Meeting minimum standards without usability (technically compliant but unusable)"
      ]
    },
    "Safety-Critical Systems": {
      "target_score": 4.3,
      "key_requirements": [
        "Harm Analysis: Physical harm (injury, death), psychological trauma, property damage, cascade failures",
        "Risk Prioritization: FMEA or Fault Tree Analysis, worst-case scenario planning, single points of failure identified",
        "Mitigation: Redundancy, fail-safes, human oversight, rigorous testing (stress, chaos, adversarial), incident response",
        "Monitoring: Error rates, near-miss incidents, safety metrics (adverse events), compliance audits, real-time alerts"
      ],
      "common_pitfalls": [
        "Underestimating tail risks (low probability high impact events dismissed)",
        "Assuming technical safety alone (ignoring human factors, socio-technical risks)",
        "No graceful degradation (system fails completely rather than degraded mode)"
      ]
    }
  },
  "guidance_by_complexity": {
    "Simple/Low-Risk": {
      "target_score": 3.5,
      "description": "Limited scope, low stakes, reversible, small user base, no vulnerable groups as primary users",
      "key_requirements": [
        "Stakeholder Identification (≥3): Primary stakeholders clear, consider if any vulnerable groups affected",
        "Harm Analysis (≥3): Key harms identified with mechanisms, severity/likelihood scored",
        "Mitigation (≥3): Mitigations for high-risk harms, owners assigned",
        "Monitoring (≥3): Basic metrics, thresholds, review schedule"
      ],
      "time_estimate": "4-8 hours",
      "examples": [
        "UI redesign for internal tool (low external impact)",
        "Feature flag for optional enhancement (user opt-in)",
        "Non-sensitive data analytics (no PII)"
      ]
    },
    "Moderate/Medium-Risk": {
      "target_score": 4.0,
      "description": "Broader scope, moderate stakes, affects diverse users, some vulnerable groups, decisions partially reversible",
      "key_requirements": [
        "Comprehensive stakeholder map with vulnerable group prioritization",
        "Harm/benefit analysis across types, second-order effects considered",
        "Fairness assessment if algorithmic or differential impact likely",
        "Risk prioritization with justification, focus on red/orange harms",
        "Specific mitigations with effectiveness/tradeoffs, residual risk assessed",
        "Monitoring with disaggregated metrics, escalation protocol, staged rollout"
      ],
      "time_estimate": "12-20 hours, stakeholder consultation",
      "examples": [
        "New user-facing feature with personalization",
        "Policy change affecting large user base",
        "Data collection expansion with privacy implications"
      ]
    },
    "Complex/High-Risk": {
      "target_score": 4.3,
      "description": "System-level impact, high stakes, irreversible harm possible, vulnerable groups primary, algorithmic/high-sensitivity decisions",
      "key_requirements": [
        "Deep stakeholder analysis with intersectionality, power dynamics, meaningful participation",
        "Comprehensive harm analysis (all types), second-order and long-term effects, feedback loops",
        "Rigorous fairness assessment with quantitative metrics, appropriate fairness definitions",
        "FMEA or Fault Tree Analysis for safety-critical, worst-case scenarios",
        "Prevent/reduce mitigations (not just detect/respond), redundancy, fail-safes, kill switches",
        "Real-time monitoring, bias dashboards, participatory audits, external review",
        "Precautionary principle (prove safety before launch), staged rollout, continuous oversight"
      ],
      "time_estimate": "40-80 hours, ethics board review, external audit",
      "examples": [
        "Algorithmic hiring/lending/admissions decisions",
        "Medical AI diagnosis or treatment recommendations",
        "Content moderation at scale affecting speech",
        "Surveillance or sensitive data processing",
        "Features targeting children or vulnerable populations"
      ]
    }
  },
  "common_failure_modes": [
    {
      "failure": "Missing vulnerable groups",
      "symptom": "Assessment claims 'no vulnerable groups affected' or only lists obvious majority stakeholders",
      "detection": "Checklist vulnerable categories (children, elderly, disabled, racial minorities, low-income, LGBTQ+, etc.) - if none apply, likely oversight",
      "fix": "Explicitly consider each vulnerable category, intersectionality, indirect effects. If truly none affected, document reasoning."
    },
    {
      "failure": "Assuming equal treatment = fairness",
      "symptom": "'We treat everyone the same' stated as fairness defense, no disparate impact analysis, colorblind approach",
      "detection": "No quantitative fairness metrics, no disaggregation by protected group, claims of neutrality without evidence",
      "fix": "Collect demographic data (with consent), measure outcomes by group, assess disparate impact. Equal treatment of unequal groups can perpetuate inequality."
    },
    {
      "failure": "Reactive mitigation only",
      "symptom": "Mitigations are appeals/redress after harm, no prevention, 'we'll fix it if problems arise', move fast and break things",
      "detection": "No design changes to prevent harm, only detection/response mechanisms, no staged rollout or testing with affected groups",
      "fix": "Prioritize prevent/reduce mitigations, build safeguards into design, test with diverse users before launch, staged rollout with monitoring, kill switches."
    },
    {
      "failure": "No monitoring or vague metrics",
      "symptom": "Monitoring section says 'we will track metrics' without specifying which, or 'user feedback' without thresholds",
      "detection": "No specific metrics named, no thresholds for concern, no disaggregation by group, no escalation triggers",
      "fix": "Define precise metrics (what, how measured, from what data), baseline and target values, thresholds that trigger action, disaggregate by protected groups, assign monitoring owner."
    },
    {
      "failure": "Ignoring second-order effects",
      "symptom": "Only immediate/obvious harms listed, no consideration of feedback loops, normalization, precedent, accumulation",
      "detection": "Ask 'What happens next? If this harms Group X, does that create conditions for more harm? Does this normalize a practice? Enable future worse behavior?'",
      "fix": "Explicitly analyze: Feedback loops (harm → disadvantage → more harm), Accumulation (small harms compound), Normalization (practice becomes standard), Precedent (what does this enable?)"
    },
    {
      "failure": "No transparency or recourse",
      "symptom": "Decisions not explained to affected parties, no appeals process, opacity justified as 'proprietary' or 'too complex'",
      "detection": "Assessment doesn't mention explainability, appeals, audit trails, or dismisses them as infeasible",
      "fix": "Build in transparency (explain decisions in plain language, disclose limitations), appeals with human review, audit trails for investigation. Opacity often masks bias or risk."
    },
    {
      "failure": "Sampling bias in testing",
      "symptom": "Testing only with employees, privileged users, English speakers; diverse users not represented",
      "detection": "Test group demographics described as 'internal team', 'beta users' without diversity analysis",
      "fix": "Recruit testers from affected populations, especially vulnerable groups most at risk. Compensate for their time. Test across devices, languages, abilities, contexts."
    },
    {
      "failure": "False precision in risk scores",
      "symptom": "Severity and likelihood scored without justification, numbers seem arbitrary, no evidence or precedent cited",
      "detection": "Risk scores provided but no explanation why 'Severity=4' vs 'Severity=3', no reference to similar incidents",
      "fix": "Ground severity/likelihood in evidence: historical incidents, expert judgment, user research, industry benchmarks. If uncertain, use ranges. Document reasoning."
    },
    {
      "failure": "Privacy-fairness tradeoff ignored",
      "symptom": "Claims 'we don't collect race/gender to protect privacy' but also no fairness audit, or collects data but no strong protections",
      "detection": "Either no demographic data AND no fairness analysis, OR demographic data collected without access controls/purpose limitation",
      "fix": "Balance: collect the minimal demographic data necessary for fairness auditing (with consent, strong access controls, aggregate-only reporting, differential privacy). Can't audit bias without data."
    },
    {
      "failure": "One-time assessment, no updates",
      "symptom": "Assessment completed at launch, no plan for ongoing monitoring, assumes static system",
      "detection": "No review schedule, no drift detection, no process for updating assessment as system evolves",
      "fix": "Continuous monitoring (daily/weekly/monthly/quarterly depending on risk), scenario validation (are harms emerging as predicted?), update assessment when system changes, feedback loop to strategy."
    }
  ]
}

416
skills/ethics-safety-impact/resources/methodology.md
Normal file
@@ -0,0 +1,416 @@

# Ethics, Safety & Impact Assessment Methodology

Advanced techniques for fairness metrics, privacy analysis, safety assessment, bias detection, and participatory design.

## Workflow

```
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
```

**Step 1: Map stakeholders and identify vulnerable groups**

Apply stakeholder analysis and vulnerability assessment frameworks.

**Step 2: Analyze potential harms and benefits**

Use harm taxonomies and benefit frameworks to systematically catalog impacts.

**Step 3: Assess fairness and differential impacts**

Apply [1. Fairness Metrics](#1-fairness-metrics) to measure group disparities.

**Step 4: Evaluate severity and likelihood**

Use [2. Safety Assessment Methods](#2-safety-assessment-methods) for safety-critical systems.

**Step 5: Design mitigations and safeguards**

Apply [3. Mitigation Strategies](#3-mitigation-strategies) and [4. Privacy-Preserving Techniques](#4-privacy-preserving-techniques).

**Step 6: Define monitoring and escalation protocols**

Implement [5. Bias Detection in Deployment](#5-bias-detection-in-deployment) and participatory oversight.

---

## 1. Fairness Metrics

Mathematical definitions of fairness for algorithmic systems.

### Group Fairness Metrics

**Demographic Parity** (Statistical Parity, Independence)
- **Definition**: Positive outcome rate equal across groups
- **Formula**: P(Ŷ=1 | A=a) = P(Ŷ=1 | A=b) for all groups a, b
- **When to use**: When equal representation in positive outcomes is the goal (admissions, hiring pipelines)
- **Limitations**: Ignores base rates, may require different treatment to achieve equal outcomes
- **Example**: 40% approval rate for all racial groups

**Equalized Odds** (Error Rate Balance)
- **Definition**: False positive and false negative rates equal across groups
- **Formula**: P(Ŷ=1 | Y=y, A=a) = P(Ŷ=1 | Y=y, A=b) for all y, a, b
- **When to use**: When fairness means equal error rates (lending, criminal justice, medical diagnosis)
- **Strengths**: Accounts for true outcomes, balances the burden of errors
- **Example**: 5% false positive rate and 10% false negative rate for all groups

**Equal Opportunity** (True Positive Rate Parity)
- **Definition**: True positive rate equal across groups (among qualified individuals)
- **Formula**: P(Ŷ=1 | Y=1, A=a) = P(Ŷ=1 | Y=1, A=b)
- **When to use**: When access to a benefit or opportunity is the key concern (scholarships, job offers)
- **Strengths**: Ensures qualified members of all groups have an equal chance
- **Example**: 80% of qualified applicants from each group receive offers

**Calibration** (Test Fairness)
- **Definition**: Predicted probabilities match observed frequencies for all groups
- **Formula**: P(Y=1 | Ŷ=p, A=a) = P(Y=1 | Ŷ=p, A=b) = p
- **When to use**: When probability scores are used for decision-making (risk scores, credit scores)
- **Strengths**: Predictions are well-calibrated across groups
- **Example**: Among all applicants scored 70%, 70% actually repay the loan in each group

**Disparate Impact** (80% Rule, Four-Fifths Rule)
- **Definition**: Selection rate for the protected group ≥ 80% of the selection rate for the reference group
- **Formula**: P(Ŷ=1 | A=protected) / P(Ŷ=1 | A=reference) ≥ 0.8
- **When to use**: Legal compliance (EEOC guidelines for hiring, lending)
- **Regulatory threshold**: <0.8 triggers investigation, <0.5 is strong evidence of discrimination
- **Example**: If 50% of white applicants are hired, ≥40% of Black applicants should be hired
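
To make these definitions concrete, here is a minimal Python sketch (function names and data layout are illustrative, not a prescribed API) that computes per-group selection rate, false positive rate, and true positive rate, plus the disparate impact ratio:

```
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    # Accumulate per-group confusion counts from parallel lists of
    # true labels, binary predictions, and group memberships.
    stats = defaultdict(lambda: {"n": 0, "sel": 0, "fp": 0, "neg": 0, "tp": 0, "pos": 0})
    for y, yhat, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["sel"] += yhat
        if y == 1:
            s["pos"] += 1
            s["tp"] += yhat
        else:
            s["neg"] += 1
            s["fp"] += yhat
    return {
        g: {
            "selection_rate": s["sel"] / s["n"],              # demographic parity
            "fpr": s["fp"] / s["neg"] if s["neg"] else None,  # equalized odds (FPR side)
            "tpr": s["tp"] / s["pos"] if s["pos"] else None,  # equal opportunity (TPR)
        }
        for g, s in stats.items()
    }

def disparate_impact(rates, protected, reference):
    # Four-fifths rule: a ratio below 0.8 flags potential disparate impact.
    return rates[protected]["selection_rate"] / rates[reference]["selection_rate"]

rates = group_rates([1, 0, 1, 0], [1, 0, 0, 0], ["a", "a", "b", "b"])
print(rates, disparate_impact(rates, protected="b", reference="a"))
```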

### Individual Fairness Metrics

**Similar Individuals Treated Similarly** (Lipschitz Fairness)
- **Definition**: Similar inputs receive similar outputs, regardless of protected attributes
- **Formula**: d(f(x), f(x')) ≤ L · d(x, x') where d is a distance metric and L is the Lipschitz constant
- **When to use**: When individual treatment should depend on relevant factors only
- **Challenge**: Defining "similarity" in a fair way (which features are relevant?)
- **Example**: Two loan applicants with the same income and credit history get similar rates regardless of race

**Counterfactual Fairness**
- **Definition**: The outcome would be the same if the protected attribute were different (causal fairness)
- **Formula**: P(Ŷ_{A←a} = y | X=x, A=a) = P(Ŷ_{A←a'} = y | X=x, A=a) for all a, a', where Ŷ_{A←a} is the prediction under an intervention setting A to a
- **When to use**: When causal reasoning is appropriate and interventions can be modeled
- **Strengths**: Captures the intuition "would the outcome change if only race/gender differed?"
- **Example**: An applicant's loan decision is the same whether they're coded male or female

### Fairness Tradeoffs

**Impossibility results**: Cannot satisfy all fairness definitions simultaneously (except in trivial cases)

Key tradeoffs:
- **Demographic parity vs. calibration**: If base rates differ, cannot have both (Chouldechova 2017)
- **Equalized odds vs. calibration**: Generally incompatible unless accuracy is perfect (Kleinberg et al. 2017)
- **Individual vs. group fairness**: Treating similar individuals similarly may still produce group disparities

**Choosing metrics**: Context-dependent
- **High-stakes binary decisions** (hire/fire, admit/reject): Equalized odds or equal opportunity
- **Scored rankings** (credit scores, risk assessments): Calibration
- **Access to benefits** (scholarships, programs): Demographic parity or equal opportunity
- **Legal compliance**: Disparate impact (80% rule)

### Fairness Auditing Process

1. **Identify protected groups**: Race, gender, age, disability, religion, national origin, etc.
2. **Collect disaggregated data**: Outcome metrics by group (requires demographic data collection with consent)
3. **Compute fairness metrics**: Calculate demographic parity, equalized odds, disparate impact across groups
4. **Test statistical significance**: Are differences statistically significant or due to chance? (Chi-square, t-tests; see the sketch after this list)
5. **Investigate causes**: If unfair, why? Biased training data? Proxy features? Measurement error?
6. **Iterate on mitigation**: Debiasing techniques, fairness constraints, data augmentation, feature engineering
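
For step 4, a sketch of a significance check using a chi-square test on a 2×2 contingency table (counts are illustrative; assumes scipy is available):

```
from scipy.stats import chi2_contingency

# Rows = groups, columns = [selected, not selected].
table = [
    [120, 180],  # reference group: 40% selection rate
    [60, 140],   # protected group: 30% selection rate
]
chi2, p_value, dof, _expected = chi2_contingency(table)
if p_value < 0.05:
    print(f"Disparity is statistically significant (chi2={chi2:.2f}, p={p_value:.4f})")
else:
    print(f"Disparity may be due to chance (p={p_value:.4f})")
```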

---

## 2. Safety Assessment Methods

Systematic techniques for identifying and mitigating safety risks.

### Failure Mode and Effects Analysis (FMEA)

**Purpose**: Identify ways the system can fail and prioritize mitigation efforts

**Process**:
1. **Decompose system**: Break into components and functions
2. **Identify failure modes**: For each component, how can it fail? (hardware failure, software bug, human error, environmental condition)
3. **Analyze effects**: What happens if this fails? Local effect? System effect? End effect?
4. **Score severity** (1-10): 1 = negligible, 10 = catastrophic (injury, death)
5. **Score likelihood** (1-10): 1 = rare, 10 = very likely
6. **Score detectability** (1-10): 1 = easily detected, 10 = undetectable before harm
7. **Compute Risk Priority Number (RPN)**: RPN = Severity × Likelihood × Detectability (a sketch follows the example below)
8. **Prioritize**: High RPN = high priority for mitigation
9. **Design mitigations**: Eliminate the failure mode, reduce likelihood, improve detection, add safeguards
10. **Re-compute RPN**: Has the mitigation adequately reduced the risk?

**Example - Medical AI diagnosis**:
- **Failure mode**: AI misclassifies cancer as benign
- **Effect**: Patient not treated, cancer progresses, death
- **Severity**: 10 (death)
- **Likelihood**: 3 (5% false negative rate)
- **Detectability**: 8 (hard to catch without a second opinion)
- **RPN**: 10×3×8 = 240 (high, requires mitigation)
- **Mitigation**: Human review of all negative diagnoses, require a 2nd AI model for confirmation, patient follow-up at 3 months
- **New RPN**: 10×1×3 = 30 (acceptable)
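
The RPN arithmetic from this example as a small sketch (the class and scores are illustrative):

```
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int       # 1-10
    likelihood: int     # 1-10
    detectability: int  # 1-10 (10 = undetectable before harm)

    @property
    def rpn(self) -> int:
        # Risk Priority Number = Severity x Likelihood x Detectability
        return self.severity * self.likelihood * self.detectability

modes = [
    FailureMode("AI misclassifies cancer as benign", 10, 3, 8),        # RPN 240
    FailureMode("Same mode after human-review mitigation", 10, 1, 3),  # RPN 30
]
for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"RPN={m.rpn:4d}  {m.name}")
```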

### Fault Tree Analysis (FTA)

**Purpose**: Identify root causes that lead to a hazard (top-down causal reasoning)

**Process**:
1. **Define top event** (hazard): e.g., "Patient receives wrong medication"
2. **Work backward**: What immediate causes could lead to the top event?
   - Use logic gates: AND (all required), OR (any sufficient)
3. **Decompose recursively**: For each cause, what are its causes?
4. **Reach basic events**: Hardware failure, software bug, human error, environmental conditions
5. **Compute probability**: If basic event probabilities are known, compute the probability of the top event (a sketch follows the example below)
6. **Find minimal cut sets**: Smallest combinations of basic events that cause the top event
7. **Prioritize mitigations**: Address minimal cut sets (small changes with big safety impact)

**Example - Wrong medication**:
- Top event: Patient receives wrong medication (OR gate)
  - Path 1: Prescription error (AND gate)
    - Doctor prescribes wrong drug (human error)
    - Pharmacist doesn't catch it (human error)
  - Path 2: Dispensing error (AND gate)
    - Correct prescription but wrong drug selected (human error)
    - Barcode scanner fails (equipment failure)
  - Path 3: Administration error
    - Nurse administers wrong drug (human error)
- Minimal cut sets: {Doctor error AND Pharmacist error}, {Dispensing error AND Scanner failure}, {Nurse error}
- Mitigation: Double-check systems (reduce AND probability), barcode scanning (detect errors), nurse training (reduce error rate)
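
A sketch of the probability computation for this example. The basic-event probabilities below are assumed for illustration; with independence, AND gates multiply probabilities within a cut set, and independent cut sets combine as 1 minus the product of their complements:

```
from math import prod

# Minimal cut sets from the wrong-medication example, with assumed
# (illustrative) basic-event probabilities.
cut_sets = {
    "doctor error AND pharmacist miss": [0.01, 0.10],
    "dispensing error AND scanner failure": [0.02, 0.05],
    "nurse error": [0.005],
}

p_cuts = {name: prod(ps) for name, ps in cut_sets.items()}   # AND gate
p_top = 1 - prod(1 - p for p in p_cuts.values())             # OR of cut sets
for name, p in sorted(p_cuts.items(), key=lambda kv: -kv[1]):
    print(f"{p:.4%}  {name}")
print(f"Top event probability ~ {p_top:.4%}")
```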

### Hazard and Operability Study (HAZOP)

**Purpose**: Systematic brainstorming to find deviations from intended operation

**Process**:
1. **Divide system into nodes**: Functional components
2. **For each node, apply guide words**: NO, MORE, LESS, AS WELL AS, PART OF, REVERSE, OTHER THAN (a deviation-generator sketch follows the example below)
3. **Identify deviations**: Guide word + parameter (e.g., "MORE pressure", "NO flow")
4. **Analyze causes**: What could cause this deviation?
5. **Analyze consequences**: What harm results?
6. **Propose safeguards**: Detection, prevention, mitigation

**Example - Content moderation system**:
- Node: Content moderation AI
- Deviation: "MORE false positives" (over-moderation)
  - Causes: Model too aggressive, training data skewed, threshold too low
  - Consequences: Silencing legitimate speech, especially marginalized voices
  - Safeguards: Appeals process, human review sample, error rate dashboard by demographic
- Deviation: "NO moderation" (under-moderation)
  - Causes: Model failure, overwhelming volume, adversarial evasion
  - Consequences: Harmful content remains, harassment, misinformation spreads
  - Safeguards: Redundant systems, rate limiting, user reporting, human backup
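
A tiny sketch of steps 2-3, crossing guide words with node parameters to enumerate candidate deviations (the parameters are illustrative for this node):

```
from itertools import product

GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]

# Parameters are node-specific; these are illustrative for a
# content moderation node.
parameters = ["moderation actions", "false positives", "review latency"]

for guide, param in product(GUIDE_WORDS, parameters):
    # Each pair is a candidate deviation to analyze for causes,
    # consequences, and safeguards.
    print(f"{guide} {param}")
```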

### Worst-Case Scenario Analysis

**Purpose**: Stress-test the system against extreme but plausible threats

**Process**:
1. **Brainstorm worst cases**: Adversarial attacks, cascading failures, edge cases, Murphy's Law
2. **Assess plausibility**: Could this actually happen? Historical precedents?
3. **Estimate impact**: If it happened, how bad would it be?
4. **Identify single points of failure**: What one thing, if it fails, causes catastrophe?
5. **Design resilience**: Redundancy, fail-safes, graceful degradation, circuit breakers
6. **Test**: Chaos engineering, red teaming, adversarial testing

**Examples**:
- **AI model**: Adversary crafts inputs that fool the model (adversarial examples) → Test robustness, ensemble models
- **Data breach**: All user data leaked → Encrypt data, minimize collection, differential privacy
- **Bias amplification**: Feedback loop causes the AI to become more biased over time → Monitor drift, periodic retraining, fairness constraints
- **Denial of service**: System overwhelmed by load → Rate limiting, auto-scaling, graceful degradation

---

## 3. Mitigation Strategies

Taxonomy of interventions to reduce harm.

### Prevention (Eliminate Harm)

**Design out the risk**:
- Don't collect sensitive data you don't need (data minimization)
- Don't build risky features (dark patterns, addictive mechanics, manipulation)
- Use less risky alternatives (aggregate statistics vs. individual data, contextual recommendations vs. behavioral targeting)

**Examples**:
- Instead of collecting browsing history, use contextual ads (keywords on the current page)
- Instead of infinite scroll (addiction), paginate with clear endpoints
- Instead of storing plaintext passwords, use salted hashes (a leak doesn't expose passwords)

### Reduction (Decrease Likelihood or Severity)

**Technical mitigations**:
- Rate limiting (prevent abuse; see the token-bucket sketch after the examples below)
- Friction (slow down impulsive harmful actions - time delays, confirmations, warnings)
- Debiasing algorithms (pre-processing data, in-processing fairness constraints, post-processing calibration)
- Differential privacy (add noise to protect individuals while preserving aggregate statistics)

**Process mitigations**:
- Staged rollouts (limited exposure to catch problems early)
- A/B testing (measure impact before full deployment)
- Diverse teams (more perspectives catch more problems)
- External audits (independent review)

**Examples**:
- Limit posts per hour to prevent spam
- Require confirmation before deleting an account or posting sensitive content
- Apply fairness constraints during model training to reduce disparate impact
- Release to 1% of users, monitor for issues, then scale
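
A minimal sketch of the rate limiting mitigation as a token bucket (capacity and refill rate are illustrative):

```
import time

class TokenBucket:
    """Simple token-bucket rate limiter, e.g. to cap posts per hour and curb spam."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(capacity=10, refill_per_sec=10 / 3600)  # roughly 10 posts per hour
print(limiter.allow())
```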

### Detection (Monitor and Alert)

**Dashboards**: Real-time metrics on harm indicators (error rates by group, complaints, safety incidents)

**Anomaly detection**: Alert when metrics deviate from baseline (spike in false positives, drop in engagement from a specific group)

**User reporting**: Easy channels for reporting harms, responsive investigation

**Audit logs**: Track decisions for later investigation (who accessed what data, which users were affected by the algorithm)

**Examples**:
- Bias dashboard showing approval rates by race, gender, age, updated daily
- Alert if the moderation false positive rate is >2× baseline for any language (see the sketch below)
- "Report this" button on all content with category options
- Log all loan denials with reason codes for audit
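
A sketch of the second example alert (rates, languages, and the function name are illustrative):

```
def check_fpr_alert(fpr_by_language: dict, baseline: float, factor: float = 2.0) -> dict:
    """Flag any language whose moderation false positive rate exceeds factor x baseline."""
    return {lang: fpr for lang, fpr in fpr_by_language.items() if fpr > factor * baseline}

alerts = check_fpr_alert({"en": 0.03, "sw": 0.09, "vi": 0.04}, baseline=0.03)
for lang, fpr in alerts.items():
    print(f"ALERT: false positive rate {fpr:.1%} for '{lang}' exceeds 2x baseline")
```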

### Response (Address Harm When Found)

**Appeals**: Process for contesting decisions (human review, overturn if wrong)

**Redress**: Compensate those harmed (refunds, apologies, corrective action)

**Incident response**: Playbook for handling harms (who to notify, how to investigate, when to escalate, communication plan)

**Iterative improvement**: Learn from incidents to prevent recurrence

**Examples**:
- Allow users to appeal content moderation decisions, with review within 48 hours
- Offer compensation to users affected by an outage or data breach
- If bias is detected, pause the system, investigate, retrain the model, re-audit before re-launch
- Publish a transparency report on harms, mitigations, outcomes

### Safeguards (Redundancy and Fail-Safes)

**Human oversight**: Human in the loop (review all decisions) or human on the loop (review samples, alert on anomalies)

**Redundancy**: Multiple independent systems, consensus required

**Fail-safes**: If the system fails, default to a safe state (e.g., a medical device fails → alarm, not silent failure)

**Circuit breakers**: Kill switches to shut down harmful features quickly

**Examples**:
- High-stakes decisions (loan denial, medical diagnosis, criminal sentencing) require human review
- Two independent AI models must agree before autonomous action
- If fraud detection fails, default to human review rather than approving all transactions
- CEO can halt a product launch if ethics concerns are raised, even at the last minute

---

## 4. Privacy-Preserving Techniques

Methods to protect individual privacy while enabling data use.

### Data Minimization

- **Collect only necessary data**: Purpose limitation (collect only for the stated purpose), don't collect "just in case"
- **Aggregate where possible**: Avoid individual-level data when population-level data is sufficient
- **Short retention**: Delete data when no longer needed, enforce retention limits

### De-identification

**Anonymization**: Remove direct identifiers (name, SSN, email)
- **Limitation**: Re-identification attacks are possible (linkage to other datasets, inference from quasi-identifiers)
- **Example**: The Netflix dataset was de-identified, but researchers re-identified users by linking to IMDB reviews

**K-anonymity**: Each record is indistinguishable from at least k-1 others (generalize quasi-identifiers like zip, age, gender); a checker sketch follows
- **Limitation**: Attribute disclosure (if all k records share the same sensitive attribute), composition attacks
- **Example**: {Age=32, Zip=12345} → {Age=30-35, Zip=123**}
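
A minimal k-anonymity checker over generalized quasi-identifiers (records and column names are illustrative):

```
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

records = [
    {"age_band": "30-35", "zip_prefix": "123", "diagnosis": "flu"},
    {"age_band": "30-35", "zip_prefix": "123", "diagnosis": "asthma"},
    {"age_band": "30-35", "zip_prefix": "124", "diagnosis": "flu"},
]
# The third record is unique on (age_band, zip_prefix), so the dataset
# is only 1-anonymous: generalize further before release.
print(k_anonymity(records, ["age_band", "zip_prefix"]))
```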

**Differential Privacy**: Add calibrated noise such that an individual's presence or absence doesn't significantly change query results
- **Definition**: P(M(D) = O) / P(M(D') = O) ≤ e^ε where D, D' differ by one person and ε is the privacy budget
- **Strengths**: Provable privacy guarantee, composes well, resistant to post-processing
- **Limitations**: Accuracy-privacy tradeoff (more privacy → more noise → less accuracy), privacy budget exhausted over many queries
- **Example**: The US Census Bureau releases aggregate statistics with differential privacy; Apple and Google use local differential privacy for telemetry
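
A sketch of the Laplace mechanism for a counting query, which has sensitivity 1, so the noise scale is 1/ε (the epsilon value is illustrative):

```
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism for a counting query (sensitivity 1): scale = 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

# Smaller epsilon -> more noise -> more privacy, less accuracy.
print(dp_count(1234, epsilon=0.5))
```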

### Access Controls

- **Least privilege**: Users access only the data needed for their role
- **Audit logs**: Track who accessed what data when, detect anomalies
- **Encryption**: At rest (storage), in transit (network), in use (processing)
- **Multi-party computation**: Compute on encrypted data without decrypting

### Consent and Control

- **Granular consent**: Opt-in for each purpose, not blanket consent
- **Transparency**: Explain what data is collected, how it's used, and who it's shared with (in plain language)
- **User controls**: Export data (GDPR right to portability), delete data (right to erasure), opt out of processing
- **Meaningful choice**: Consent not coerced (service available without consent for non-essential features)

---

## 5. Bias Detection in Deployment

Ongoing monitoring to detect and respond to bias post-launch.

### Bias Dashboards

**Disaggregated metrics**: Track outcomes by protected groups (race, gender, age, disability)
- Approval/rejection rates
- False positive/negative rates
- Recommendation quality (precision, recall, ranking)
- User engagement (click-through, conversion, retention)

**Visualizations**:
- Bar charts showing each metric by group, flagging >20% disparities
- Time series to detect drift (is bias increasing over time?)
- Heatmaps for intersectional analysis (race × gender)

**Alerting**: Automated alerts when a disparity crosses a threshold (see the sketch below)
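
A sketch of the >20% disparity flag described above (metric values and group names are illustrative):

```
def disparity_flags(metric_by_group: dict, threshold: float = 0.20) -> dict:
    """Flag groups whose metric deviates from the cross-group mean by more than threshold."""
    overall = sum(metric_by_group.values()) / len(metric_by_group)
    return {
        g: v for g, v in metric_by_group.items()
        if abs(v - overall) / overall > threshold
    }

approval = {"Group A": 0.42, "Group B": 0.40, "Group C": 0.28}
for g, v in disparity_flags(approval).items():
    print(f"FLAG: {g} approval rate {v:.0%} deviates >20% from the mean")
```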

### Drift Detection

**Distribution shift**: Has the data distribution changed since training? (Covariate shift, concept drift)
- Monitor input distributions, flag anomalies
- Retrain periodically on recent data

**Performance degradation**: Is model accuracy declining? For which groups?
- A/B test the new model vs. the old continuously
- Track metrics by group, ensure improvements don't harm any group

**Feedback loops**: Is the model changing the environment in ways that amplify bias?
- Example: Predictive policing → more arrests in flagged areas → more training data from those areas → more policing (vicious cycle)
- Monitor for amplification: Are disparities increasing over time?

### Participatory Auditing

**Stakeholder involvement**: Include affected groups in oversight, not just internal teams
- Community advisory boards
- Public comment periods
- Transparency reports reviewed by civil society

**Contests**: Bug bounties for finding bias (reward researchers and users who identify fairness issues)

**External audits**: Independent third-party assessment (not self-regulation)

---

## 6. Common Pitfalls

**Fairness theater**: Performative statements without material changes. Impact assessments must change decisions, not just document them.

**Sampling bias in testing**: Testing only on employees (young, educated, English-speaking) misses how diverse users experience harm. Test with actual affected populations.

**Assuming "colorblind" = fair**: Not collecting race data doesn't eliminate bias; it makes bias invisible and impossible to audit. Collect demographic data (with consent and safeguards) to measure fairness.

**Optimization without constraints**: Maximizing engagement or revenue unconstrained leads to amplifying outrage, addiction, and polarization. Set ethical boundaries as constraints, not just aspirations.

**Privacy vs. fairness tradeoff**: You can't audit bias without demographic data. Balance: collect the minimal data necessary for fairness auditing, with strong access controls and differential privacy.

**One-time assessment**: Ethics is not a launch checkbox. Continuous monitoring is required as systems evolve, data drifts, and harms emerge over time.

**Technochauvinism**: Believing technical fixes alone solve social problems. Bias mitigation algorithms can help but can't replace addressing root causes (historical discrimination, structural inequality).

**Moving fast and apologizing later**: For safety and ethics, prevention > apology. Harms to vulnerable groups are not acceptable experiments. Staged rollouts, kill switches, and continuous monitoring are required.

398
skills/ethics-safety-impact/resources/template.md
Normal file
@@ -0,0 +1,398 @@

# Ethics, Safety & Impact Assessment Templates

Quick-start templates for stakeholder mapping, harm/benefit analysis, fairness evaluation, risk prioritization, mitigation planning, and monitoring.

## Workflow

```
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
```

**Step 1: Map stakeholders and identify vulnerable groups**

Use the [Stakeholder Mapping Template](#stakeholder-mapping-template) to identify all affected parties and prioritize vulnerable populations.

**Step 2: Analyze potential harms and benefits**

Brainstorm harms and benefits for each stakeholder group using the [Harm/Benefit Analysis Template](#harmbenefit-analysis-template).

**Step 3: Assess fairness and differential impacts**

Evaluate outcome, treatment, and access disparities using the [Fairness Assessment Template](#fairness-assessment-template).

**Step 4: Evaluate severity and likelihood**

Prioritize risks using the [Risk Matrix Template](#risk-matrix-template), scoring severity and likelihood.

**Step 5: Design mitigations and safeguards**

Plan interventions using the [Mitigation Planning Template](#mitigation-planning-template) for high-priority harms.

**Step 6: Define monitoring and escalation protocols**

Set up ongoing oversight using the [Monitoring Framework Template](#monitoring-framework-template).

---

## Stakeholder Mapping Template

### Primary Stakeholders (directly affected)

**Group 1**: [Name of stakeholder group]
- **Size/reach**: [How many people?]
- **Relationship**: [How do they interact with the feature/decision?]
- **Power/voice**: [Can they advocate for themselves? High/Medium/Low]
- **Vulnerability factors**: [Age, disability, marginalization, economic precarity, etc.]
- **Priority**: [High/Medium/Low risk]

**Group 2**: [Name of stakeholder group]
- **Size/reach**:
- **Relationship**:
- **Power/voice**:
- **Vulnerability factors**:
- **Priority**:

[Add more groups as needed]

### Secondary Stakeholders (indirectly affected)

**Group**: [Name]
- **How affected**: [Indirect impact mechanism]
- **Priority**: [High/Medium/Low risk]

### Societal/Systemic Impacts

- **Norms affected**: [What behaviors/expectations might shift?]
- **Precedents set**: [What does this enable or legitimize for the future?]
- **Long-term effects**: [Cumulative, feedback loops, structural changes]

### Vulnerable Groups Prioritization

Check all that apply and note specific considerations:

- [ ] **Children** (<18): Special protections needed (consent, safety, development impact)
- [ ] **Elderly** (>65): Accessibility, digital literacy, vulnerability to fraud
- [ ] **People with disabilities**: Accessibility compliance, exclusion risk, safety
- [ ] **Racial/ethnic minorities**: Historical discrimination, disparate impact, cultural sensitivity
- [ ] **Low-income**: Economic harm, access barriers, inability to absorb costs
- [ ] **LGBTQ+**: Safety in hostile contexts, privacy, outing risk
- [ ] **Non-English speakers**: Language barriers, exclusion, misunderstanding
- [ ] **Politically targeted**: Dissidents, journalists, activists (surveillance, safety)
- [ ] **Other**: [Specify]

**Highest priority groups** (most vulnerable + highest risk):
1.
2.
3.

---

## Harm/Benefit Analysis Template

For each stakeholder group, identify potential harms and benefits.

### Stakeholder Group: [Name]

#### Potential Benefits

**Benefit 1**: [Description]
- **Type**: Economic, Social, Health, Autonomy, Access, Safety, etc.
- **Magnitude**: [High/Medium/Low]
- **Distribution**: [Who gets this benefit? Everyone or a subset?]
- **Timeline**: [Immediate, Short-term <1yr, Long-term >1yr]

**Benefit 2**: [Description]
- **Type**:
- **Magnitude**:
- **Distribution**:
- **Timeline**:

#### Potential Harms

**Harm 1**: [Description]
- **Type**: Physical, Psychological, Economic, Social, Autonomy, Privacy, Reputational, Epistemic, Political
- **Mechanism**: [How does the harm occur?]
- **Affected subgroup**: [Everyone or a specific subset within the stakeholder group?]
- **Severity**: [1-5, where 5 = catastrophic]
- **Likelihood**: [1-5, where 5 = very likely]
- **Risk Score**: [Severity × Likelihood]

**Harm 2**: [Description]
- **Type**:
- **Mechanism**:
- **Affected subgroup**:
- **Severity**:
- **Likelihood**:
- **Risk Score**:

**Harm 3**: [Description]
- **Type**:
- **Mechanism**:
- **Affected subgroup**:
- **Severity**:
- **Likelihood**:
- **Risk Score**:

#### Second-Order Effects

- **Feedback loops**: [Does the harm create conditions for more harm?]
- **Accumulation**: [Do small harms compound over time?]
- **Normalization**: [Does this normalize harmful practices?]
- **Precedent**: [What does this enable others to do?]

---

## Fairness Assessment Template

### Outcome Fairness (results)

**Metric being measured**: [e.g., approval rate, error rate, recommendation quality]

**By group**:

| Group | Metric Value | Difference from Average | Disparate Impact Ratio |
|-------|--------------|-------------------------|------------------------|
| Group A | | | |
| Group B | | | |
| Group C | | | |
| Overall | | - | - |

**Disparate Impact Ratio** = (Outcome rate for protected group) / (Outcome rate for reference group)
- **> 0.8**: Generally acceptable (80% rule)
- **< 0.8**: Potential disparate impact, investigate

**Questions**:
- [ ] Are outcome rates similar across groups (within 20%)?
- [ ] If not, is there a legitimate justification?
- [ ] Do error rates (false positives/negatives) differ across groups?
- [ ] Who bears the burden of errors?

### Treatment Fairness (process)

**How decisions are made**: [Algorithm, human judgment, hybrid]

**By group**:

| Group | Treatment Description | Dignity/Respect | Transparency | Recourse |
|-------|----------------------|-----------------|--------------|----------|
| Group A | | High/Med/Low | High/Med/Low | High/Med/Low |
| Group B | | High/Med/Low | High/Med/Low | High/Med/Low |

**Questions**:
- [ ] Do all groups receive the same quality of service/interaction?
- [ ] Are decisions explained equally well to all groups?
- [ ] Do all groups have equal access to appeals/recourse?
- [ ] Are there cultural or language barriers affecting treatment?

### Access Fairness (opportunity)

**Barriers to access**:

| Barrier Type | Description | Affected Groups | Severity |
|--------------|-------------|-----------------|----------|
| Economic | [e.g., cost, credit required] | | High/Med/Low |
| Technical | [e.g., device, internet, literacy] | | High/Med/Low |
| Geographic | [e.g., location restrictions] | | High/Med/Low |
| Physical | [e.g., accessibility, disability] | | High/Med/Low |
| Social | [e.g., stigma, discrimination] | | High/Med/Low |
| Legal | [e.g., documentation required] | | High/Med/Low |

**Questions**:
- [ ] Can all groups access the service/benefit equally?
- [ ] Are there unnecessary barriers that could be removed?
- [ ] Do barriers disproportionately affect vulnerable groups?

### Intersectionality Check

**Combinations of identities that may face unique harms**:
- Example: Black women (face both racial and gender bias)
- Example: Elderly immigrants (language + digital literacy + age)

Groups to check:
- [ ] Intersection of race and gender
- [ ] Intersection of disability and age
- [ ] Intersection of income and language
- [ ] Other combinations: [Specify]

---

## Risk Matrix Template

Score each harm on Severity (1-5) and Likelihood (1-5). Prioritize high-risk (red/orange) harms for mitigation.

### Severity Scale

- **5 - Catastrophic**: Death, serious injury, irreversible harm, widespread impact
- **4 - Major**: Significant harm, lasting impact, affects many people
- **3 - Moderate**: Noticeable harm, temporary impact, affects some people
- **2 - Minor**: Small harm, easily reversed, affects few people
- **1 - Negligible**: Minimal harm, no lasting impact

### Likelihood Scale

- **5 - Very Likely**: >75% chance, expected to occur
- **4 - Likely**: 50-75% chance, probable
- **3 - Possible**: 25-50% chance, could happen
- **2 - Unlikely**: 5-25% chance, improbable
- **1 - Rare**: <5% chance, very unlikely

### Risk Matrix

| Harm | Stakeholder Group | Severity | Likelihood | Risk Score | Priority |
|------|------------------|----------|------------|------------|----------|
| [Harm 1 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |
| [Harm 2 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |
| [Harm 3 description] | [Group] | [1-5] | [1-5] | [S×L] | [Color] |

**Priority Color Coding**:
- **Red** (Risk ≥15): Critical, must address before launch
- **Orange** (Risk 9-14): High priority, address soon
- **Yellow** (Risk 5-8): Monitor, mitigate if feasible
- **Green** (Risk ≤4): Low priority, document and monitor
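
If you want to compute priorities programmatically, here is a small sketch of the scoring and color mapping (bands as defined above; the function name is illustrative):

```
def risk_priority(severity: int, likelihood: int):
    """Map a 5x5 risk score to the color bands defined above."""
    score = severity * likelihood
    if score >= 15:
        return score, "Red"     # critical, address before launch
    if score >= 9:
        return score, "Orange"  # high priority, address soon
    if score >= 5:
        return score, "Yellow"  # monitor, mitigate if feasible
    return score, "Green"       # low priority, document and monitor

print(risk_priority(5, 4))  # (20, 'Red')
```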

**Prioritized Harms** (Red + Orange):
1. [Highest risk harm]
2. [Second highest]
3. [Third highest]

---

## Mitigation Planning Template

For each high-priority harm, design interventions.

### Harm: [Description of harm being mitigated]

**Affected Group**: [Who experiences this harm]

**Risk Score**: [Severity × Likelihood = X]

#### Mitigation Strategies

**Option 1: [Mitigation name]**
- **Type**: Prevent, Reduce, Detect, Respond, Safeguard, Transparency, Empower
- **Description**: [What is the intervention?]
- **Effectiveness**: [How much does this reduce risk? High/Medium/Low]
- **Cost/effort**: [Resources required? High/Medium/Low]
- **Tradeoffs**: [What are the downsides or tensions?]
- **Owner**: [Who is responsible for implementation?]
- **Timeline**: [By when?]

**Option 2: [Mitigation name]**
- **Type**:
- **Description**:
- **Effectiveness**:
- **Cost/effort**:
- **Tradeoffs**:
- **Owner**:
- **Timeline**:

**Recommended Approach**: [Which option(s) to pursue and why]

#### Residual Risk

After mitigation:
- **New severity**: [1-5]
- **New likelihood**: [1-5]
- **New risk score**: [S×L]

**Acceptable?**
- [ ] Yes, residual risk is acceptable given tradeoffs
- [ ] No, need additional mitigations
- [ ] Escalate to [ethics committee/leadership/etc.]

#### Implementation Checklist

- [ ] Design changes specified
- [ ] Testing plan includes affected groups
- [ ] Documentation updated (policies, help docs, disclosures)
- [ ] Training provided (if human review/moderation involved)
- [ ] Monitoring metrics defined (see next template)
- [ ] Review date scheduled (when to reassess)

---

## Monitoring Framework Template

### Outcome Metrics

Track actual impacts post-launch to detect harms early.

**Metric 1**: [Metric name, e.g., "Approval rate parity"]
- **Definition**: [Precisely what is measured]
- **Measurement method**: [How calculated, from what data]
- **Baseline**: [Current or expected value]
- **Target**: [Goal value]
- **Threshold for concern**: [Value that triggers action]
- **Disaggregation**: [Break down by race, gender, age, disability, etc.]
- **Frequency**: [Daily, weekly, monthly, quarterly]
- **Owner**: [Who tracks and reports this]

**Metric 2**: [Metric name]
- Definition, method, baseline, target, threshold, disaggregation, frequency, owner

**Metric 3**: [Metric name]
- Definition, method, baseline, target, threshold, disaggregation, frequency, owner

### Leading Indicators & Qualitative Monitoring

- **Indicator 1**: [e.g., "User reports spike"] - Threshold: [level]
- **Indicator 2**: [e.g., "Declining engagement from Group Y"] - Threshold: [level]
- **User feedback**: Channels for reporting concerns
- **Community listening**: Forums, social media, support tickets
- **Affected group outreach**: Check-ins with vulnerable communities

### Escalation Protocol

**Yellow Alert** (early warning):
- **Trigger**: [e.g., Metric exceeds threshold by 10-20%]
- **Response**: Investigate, analyze patterns, prepare report

**Orange Alert** (concerning):
- **Trigger**: [e.g., Metric exceeds threshold by >20%, or multiple yellow alerts]
- **Response**: Escalate to product/ethics team, begin mitigation planning

**Red Alert** (critical):
- **Trigger**: [e.g., Serious harm reported, disparate impact >20%, safety incident]
- **Response**: Escalate to leadership, pause rollout or roll back, immediate remediation

**Escalation Path**:
1. First escalation: [Role/person]
2. If unresolved or critical: [Role/person]
3. Final escalation: [Ethics committee, CEO, board]

### Review Cadence

- **Daily**: Critical safety metrics (safety-critical systems only)
- **Weekly**: User complaints, support tickets
- **Monthly**: Outcome metrics, disparate impact dashboard
- **Quarterly**: Comprehensive fairness audit
- **Annually**: External audit, stakeholder consultation

### Audit & Accountability

- **Audits**: Internal (who, frequency), external (independent, when)
- **Transparency**: What is disclosed, where it is published
- **Affected group consultation**: How vulnerable groups are involved in oversight

---

## Complete Assessment Template

The full documentation structure combines all of the templates above:

1. **Context**: Feature/decision description, problem, alternatives
2. **Stakeholder Analysis**: Use the Stakeholder Mapping Template
3. **Harm & Benefit Analysis**: Use the Harm/Benefit Analysis Template for each group
4. **Fairness Assessment**: Use the Fairness Assessment Template (outcome/treatment/access)
5. **Risk Prioritization**: Use the Risk Matrix Template, identify critical harms
6. **Mitigation Plan**: Use the Mitigation Planning Template for each critical harm
7. **Monitoring & Escalation**: Use the Monitoring Framework Template
8. **Decision**: Proceed/staged rollout/delay/reject, with rationale and sign-off
9. **Post-Launch Review**: 30-day and 90-day checks, ongoing monitoring, updates