Initial commit

skills/research-claim-map/SKILL.md (new file, 213 lines)
@@ -0,0 +1,213 @@
---
name: research-claim-map
description: Use when verifying claims before decisions, fact-checking statements against sources, conducting due diligence on vendor/competitor assertions, evaluating conflicting evidence, triangulating source credibility, assessing research validity for literature reviews, investigating misinformation, rating evidence strength (primary vs secondary), identifying knowledge gaps, or when the user mentions "fact-check", "verify this", "is this true", "evaluate sources", "conflicting evidence", or "due diligence".
---

# Research Claim Map

## Table of Contents

1. [Purpose](#purpose)
2. [When to Use](#when-to-use)
3. [What Is It](#what-is-it)
4. [Workflow](#workflow)
5. [Evidence Quality Framework](#evidence-quality-framework)
6. [Source Credibility Assessment](#source-credibility-assessment)
7. [Common Patterns](#common-patterns)
8. [Guardrails](#guardrails)
9. [Quick Reference](#quick-reference)
## Purpose

Research Claim Map helps you systematically evaluate claims by triangulating sources, assessing evidence quality, identifying limitations, and reaching evidence-based conclusions. It guards against confirmation bias, overconfidence, and reliance on unreliable sources.

## When to Use

**Invoke this skill when you need to:**

- Verify factual claims before making decisions or recommendations
- Evaluate conflicting evidence from multiple sources
- Assess vendor claims, product benchmarks, or competitive intelligence
- Conduct due diligence on business assertions (revenue, customers, capabilities)
- Fact-check news stories, social media claims, or viral statements
- Review academic literature for research validity
- Investigate potential misinformation or misleading statistics
- Rate evidence strength for policy decisions or strategic planning
- Triangulate eyewitness accounts or historical records
- Identify knowledge gaps and areas requiring further investigation

**User phrases that trigger this skill:**

- "Is this claim true?"
- "Can you verify this?"
- "Fact-check this statement"
- "I found conflicting information about..."
- "How reliable is this source?"
- "What's the evidence for..."
- "Due diligence on..."
- "Evaluate these competing claims"

## What Is It

A Research Claim Map is a structured analysis that breaks down a claim into:

1. **Claim statement** (specific, testable assertion)
2. **Evidence for** (sources supporting the claim, rated by quality)
3. **Evidence against** (sources contradicting the claim, rated by quality)
4. **Source credibility** (expertise, bias, track record for each source)
5. **Limitations** (gaps, uncertainties, assumptions)
6. **Conclusion** (confidence level, decision recommendation)

**Quick example:**

- **Claim**: "Competitor X has 10,000 paying customers"
- **Evidence for**: Press release (secondary), case study count (tertiary)
- **Evidence against**: Industry analyst estimate of 3,000 (secondary)
- **Credibility**: Press release (biased source), analyst (independent but uncertain methodology)
- **Limitations**: No primary-source verification, customer definition unclear
- **Conclusion**: Low confidence (40%) - likely inflated, primary verification needed
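
The six-part breakdown above maps naturally onto a small data structure if you track claim maps programmatically. A minimal Python sketch; the class and field names are illustrative assumptions, not part of this skill's resources:

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration; names are not prescribed by this skill.
@dataclass
class Evidence:
    source: str        # name or link
    tier: str          # "primary" | "secondary" | "tertiary"
    credibility: str   # "high" | "medium" | "low"
    summary: str

@dataclass
class ClaimMap:
    claim: str                        # specific, testable assertion
    evidence_for: list[Evidence] = field(default_factory=list)
    evidence_against: list[Evidence] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    confidence: int = 0               # 0-100%
    recommendation: str = ""

# The quick example above, encoded:
claim_map = ClaimMap(
    claim="Competitor X has 10,000 paying customers",
    evidence_for=[Evidence("Press release", "secondary", "low", "claims 10,000 customers")],
    evidence_against=[Evidence("Analyst report", "secondary", "medium", "estimates ~3,000")],
    limitations=["No primary-source verification", "Customer definition unclear"],
    confidence=40,
    recommendation="Treat as inflated; seek primary verification",
)
```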
## Workflow

Copy this checklist and track your progress:

```
Research Claim Map Progress:
- [ ] Step 1: Define the claim precisely
- [ ] Step 2: Gather and categorize evidence
- [ ] Step 3: Rate evidence quality and source credibility
- [ ] Step 4: Identify limitations and gaps
- [ ] Step 5: Draw evidence-based conclusion
```

**Step 1: Define the claim precisely**

Restate the claim as a specific, testable assertion. Avoid vague language - use numbers, dates, and clear terms. See [Common Patterns](#common-patterns) for claim reformulation examples.

**Step 2: Gather and categorize evidence**

Collect sources supporting and contradicting the claim. Organize them into "Evidence For" and "Evidence Against". For straightforward verification → use [resources/template.md](resources/template.md). For complex multi-source investigations → study [resources/methodology.md](resources/methodology.md).

**Step 3: Rate evidence quality and source credibility**

Apply the [Evidence Quality Framework](#evidence-quality-framework) to rate each source (primary/secondary/tertiary). Apply the [Source Credibility Assessment](#source-credibility-assessment) to evaluate expertise, bias, and track record.

**Step 4: Identify limitations and gaps**

Document what's unknown, what assumptions were made, and where evidence is weak or missing. See [resources/methodology.md](resources/methodology.md) for gap analysis techniques.

**Step 5: Draw evidence-based conclusion**

Synthesize findings into a confidence level (0-100%) and an actionable recommendation (believe/skeptical/reject claim). Self-check using `resources/evaluators/rubric_research_claim_map.json` before delivering. Minimum standard: average score ≥ 3.5.
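
Step 5's "average score" can be read as a weight-adjusted mean over the rubric's criteria. A minimal sketch under that assumption; the per-criterion scores here are an illustrative self-assessment, and the file path is the one referenced above:

```python
import json

# Self-check sketch: weight-adjusted average of 1-5 scores per criterion.
with open("resources/evaluators/rubric_research_claim_map.json") as f:
    rubric = json.load(f)

scores = {  # illustrative self-assessment, one 1-5 score per criterion
    "Claim Precision and Testability": 4,
    "Evidence Triangulation Quality": 3,
    "Source Credibility Assessment": 4,
    "Evidence Quality Rating": 4,
    "Limitations and Gaps Documentation": 3,
    "Confidence Calibration and Reasoning": 4,
    "Bias Detection and Mitigation": 3,
    "Actionable Recommendation": 4,
}

total_weight = sum(c["weight"] for c in rubric["criteria"])
weighted = sum(c["weight"] * scores[c["name"]] for c in rubric["criteria"])
print(f"Average score: {weighted / total_weight:.2f} (minimum standard: 3.5)")
```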
## Evidence Quality Framework

**Rating scale:**

**Primary Evidence (Strongest):**
- Direct observation or measurement
- Original data or records
- First-hand accounts from participants
- Raw datasets, transaction logs
- Example: Sales database showing 10,000 customer IDs

**Secondary Evidence (Medium):**
- Analysis or interpretation of primary sources
- Expert synthesis of multiple primary sources
- Peer-reviewed research papers
- Verified news reporting with primary-source citations
- Example: Industry analyst report analyzing public filings

**Tertiary Evidence (Weakest):**
- Summaries of secondary sources
- Textbooks, encyclopedias, Wikipedia
- Press releases, marketing materials
- Anecdotal reports without verification
- Example: Company blog post claiming a customer count

**Non-Evidence (Unreliable):**
- Unverified social media posts
- Anonymous claims
- "Experts say" without attribution
- Circular references (A cites B, B cites A)
- Example: Viral tweet with no source
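
If you encode the hierarchy, an ordered scale keeps comparisons honest. A minimal sketch; the numeric values are illustrative assumptions:

```python
from enum import IntEnum

# Ordered so comparisons follow the hierarchy:
# primary > secondary > tertiary > non-evidence.
class EvidenceTier(IntEnum):
    NON_EVIDENCE = 0   # unverified posts, anonymous claims
    TERTIARY = 1       # summaries, press releases, anecdotes
    SECONDARY = 2      # analysis of primary sources, peer review
    PRIMARY = 3        # direct observation, original data

assert EvidenceTier.PRIMARY > EvidenceTier.SECONDARY > EvidenceTier.TERTIARY
```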
## Source Credibility Assessment

**Evaluate each source on:**

**Expertise (Does source have relevant knowledge?):**
- High: Domain expert with credentials, track record
- Medium: Knowledgeable but not specialist
- Low: No demonstrated expertise

**Independence (Is source biased or conflicted?):**
- High: Independent, no financial/personal stake
- Medium: Some potential bias, disclosed
- Low: Direct financial interest, undisclosed conflicts

**Track Record (Has source been accurate before?):**
- High: Consistent accuracy, corrections when wrong
- Medium: Mixed record or unknown history
- Low: History of errors, retractions, unreliability

**Methodology (How did source obtain information?):**
- High: Transparent, replicable, rigorous
- Medium: Some methodology disclosed
- Low: Opaque, unverifiable, cherry-picked
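
To roll the four factors into the overall High/Medium/Low call used elsewhere in this skill, one option is an equal-weight average. A minimal sketch; the numeric mapping and thresholds are illustrative assumptions, not prescribed here:

```python
# Fold the four credibility factors into one overall H/M/L rating.
# Numeric mapping (High=3, Medium=2, Low=1) and cutoffs are illustrative.
LEVELS = {"high": 3, "medium": 2, "low": 1}

def overall_credibility(expertise: str, independence: str,
                        track_record: str, methodology: str) -> str:
    mean = sum(LEVELS[f] for f in
               (expertise, independence, track_record, methodology)) / 4
    if mean >= 2.5:
        return "high"
    return "medium" if mean >= 1.75 else "low"

# Example: the press release from the quick example.
print(overall_credibility("low", "low", "medium", "low"))  # -> "low"
```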
## Common Patterns

**Pattern 1: Vendor Claim Verification**
- **Claim type**: Product performance, customer count, ROI
- **Approach**: Seek independent verification (analysts, customers), test claims yourself
- **Red flags**: Only vendor sources, vague metrics, "up to X%" ranges

**Pattern 2: Academic Literature Review**
- **Claim type**: Research findings, causal claims
- **Approach**: Check for replication studies, meta-analyses, competing explanations
- **Red flags**: Single study, small sample, conflicts of interest, p-hacking

**Pattern 3: News Fact-Checking**
- **Claim type**: Events, statistics, quotes
- **Approach**: Trace to the primary source, check multiple outlets, verify context
- **Red flags**: Anonymous sources, circular reporting, sensational framing

**Pattern 4: Statistical Claims**
- **Claim type**: Percentages, trends, correlations
- **Approach**: Check methodology, sample size, base rates, confidence intervals
- **Red flags**: Cherry-picked timeframes, unclear denominators, correlation presented as causation

## Guardrails

**Avoid common biases:**
- **Confirmation bias**: Actively seek evidence against your hypothesis
- **Authority bias**: Don't accept claims just because the source is prestigious
- **Recency bias**: Older evidence can be more reliable than the latest claims
- **Availability bias**: Vivid anecdotes ≠ representative data

**Quality standards:**
- Rate confidence numerically (0-100%), not with vague terms ("probably", "likely")
- Document all assumptions explicitly
- Distinguish "no evidence found" from "evidence of absence"
- Update conclusions as new evidence emerges
- Flag when evidence quality is insufficient for a confident conclusion

**Ethical considerations:**
- Respect source privacy and attribution
- Avoid cherry-picking evidence to support a desired conclusion
- Acknowledge limitations and uncertainties
- Correct errors promptly when found

## Quick Reference

**Resources:**
- **Quick verification**: [resources/template.md](resources/template.md)
- **Complex investigations**: [resources/methodology.md](resources/methodology.md)
- **Quality rubric**: `resources/evaluators/rubric_research_claim_map.json`

**Evidence hierarchy**: Primary > Secondary > Tertiary

**Credibility factors**: Expertise + Independence + Track Record + Methodology

**Confidence calibration**:
- 90-100%: Near certain, multiple primary sources, high credibility
- 70-89%: Confident, strong secondary sources, some limitations
- 50-69%: Uncertain, conflicting evidence or weak sources
- 30-49%: Skeptical, more evidence against than for
- 0-29%: Likely false, strong evidence against
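
The calibration bands translate directly into a lookup. A minimal sketch; the function name is illustrative:

```python
# Map a numeric confidence (0-100%) to the calibration band above.
def confidence_band(confidence: int) -> str:
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be 0-100")
    if confidence >= 90:
        return "near certain"
    if confidence >= 70:
        return "confident"
    if confidence >= 50:
        return "uncertain"
    if confidence >= 30:
        return "skeptical"
    return "likely false"

print(confidence_band(40))  # -> "skeptical"
```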
skills/research-claim-map/resources/evaluators/rubric_research_claim_map.json (new file, 159 lines)
@@ -0,0 +1,159 @@
{
  "name": "Research Claim Map Evaluator",
  "description": "Evaluates claim verification for precise claim definition, evidence triangulation, source credibility assessment, and confidence calibration",
  "criteria": [
    {
      "name": "Claim Precision and Testability",
      "weight": 1.4,
      "scale": {
        "1": "Vague claim with undefined terms, no specific metrics, untestable",
        "2": "Somewhat specific but missing key details (timeframe, scope, or metrics unclear)",
        "3": "Specific claim with most terms defined, testable with some clarification needed",
        "4": "Precise claim with numbers/dates/scope, clear terms, fully testable",
        "5": "Exemplary: Reformulated from vague to specific, key terms defined explicitly, potential ambiguities addressed, claim decomposed into testable sub-claims if complex"
      }
    },
    {
      "name": "Evidence Triangulation Quality",
      "weight": 1.5,
      "scale": {
        "1": "Single source or only sources agreeing with claim (no contradicting evidence sought)",
        "2": "Multiple sources but all similar type/origin (not truly independent)",
        "3": "2-3 independent sources for and against, basic triangulation",
        "4": "3+ independent sources with different methodologies, both supporting and contradicting evidence gathered",
        "5": "Exemplary: Systematic triangulation across source types (primary + secondary), methodologies (quantitative + qualitative), perspectives (proponent + skeptic), active search for disconfirming evidence, circular citations identified and excluded"
      }
    },
    {
      "name": "Source Credibility Assessment",
      "weight": 1.4,
      "scale": {
        "1": "No credibility evaluation or accepted sources at face value",
        "2": "Basic credibility check (author identified) but no depth (expertise, bias, track record ignored)",
        "3": "Credibility assessed on 2-3 factors (e.g., expertise and independence) with brief reasoning",
        "4": "All four factors assessed (expertise, independence, track record, methodology) for each major source with clear reasoning",
        "5": "Exemplary: Systematic CRAAP test or equivalent, conflicts of interest identified, track record researched (retractions, corrections), methodology transparency evaluated, source bias explicitly noted and accounted for in weighting"
      }
    },
    {
      "name": "Evidence Quality Rating",
      "weight": 1.3,
      "scale": {
        "1": "No distinction between evidence types (treats social media post = peer-reviewed study)",
        "2": "Vague quality labels ('good source', 'reliable') without systematic rating",
        "3": "Evidence categorized (primary/secondary/tertiary) but inconsistently or without justification",
        "4": "Systematic rating using evidence hierarchy, each source classified with brief rationale",
        "5": "Exemplary: Evidence rated on multiple dimensions (type, methodology, sample size, recency), quality justification detailed, limitations of even high-quality evidence acknowledged"
      }
    },
    {
      "name": "Limitations and Gaps Documentation",
      "weight": 1.3,
      "scale": {
        "1": "No limitations noted or only minor caveats mentioned",
        "2": "Generic limitations ('more research needed') without specifics",
        "3": "Some limitations noted (missing data or assumptions stated) but incomplete",
        "4": "Comprehensive limitations: gaps identified, assumptions stated, quality concerns noted, unknowns distinguished from known contradictions",
        "5": "Exemplary: Systematic gap analysis (what evidence expected but not found), assumptions explicitly tested for sensitivity, distinction made between 'no evidence found' vs 'evidence of absence', suggestions for what would increase confidence"
      }
    },
    {
      "name": "Confidence Calibration and Reasoning",
      "weight": 1.4,
      "scale": {
        "1": "No confidence level stated or vague ('probably', 'likely')",
        "2": "Confidence stated but not justified or clearly miscalibrated (100% on weak evidence or 10% on strong)",
        "3": "Numeric confidence (0-100%) stated with basic reasoning",
        "4": "Well-calibrated confidence with clear reasoning linking to evidence quality, source credibility, and limitations",
        "5": "Exemplary: Confidence range provided (not point estimate), reasoning traces from base rates through evidence to posterior, sensitivity analysis shown (confidence under different assumptions), explicit acknowledgment of uncertainty"
      }
    },
    {
      "name": "Bias Detection and Mitigation",
      "weight": 1.2,
      "scale": {
        "1": "Clear bias (cherry-picked evidence, ignored contradictions, one-sided presentation)",
        "2": "Unintentional bias (confirmation bias evident, didn't actively seek contradicting evidence)",
        "3": "Some bias mitigation (acknowledged contradicting evidence, noted potential biases)",
        "4": "Active bias mitigation (sought disconfirming evidence, considered alternative explanations, acknowledged own potential biases)",
        "5": "Exemplary: Systematic bias checks (CRAAP test, conflict of interest disclosure), actively argued against own hypothesis, considered base rates to avoid availability/anchoring bias, source framing bias identified and corrected"
      }
    },
    {
      "name": "Actionable Recommendation",
      "weight": 1.2,
      "scale": {
        "1": "No recommendation or action unclear ('interesting finding', 'needs more study')",
        "2": "Vague action ('be cautious', 'consider this') without decision guidance",
        "3": "Basic recommendation (believe/reject/uncertain) but not tied to decision context",
        "4": "Clear actionable recommendation linked to confidence level and decision context (what to do given uncertainty)",
        "5": "Exemplary: Recommendation with decision thresholds ('if you need 80%+ confidence to act, don't proceed; if 60% sufficient, proceed with mitigation X'), contingency plans for uncertainty, clear next steps to increase confidence if needed"
      }
    }
  ],
  "guidance": {
    "by_claim_type": {
      "vendor_claims": {
        "recommended_approach": "Seek independent verification (analysts, customer references, trials), avoid relying solely on vendor sources",
        "evidence_priority": "Primary (customer data, trials) > Secondary (analyst reports) > Tertiary (vendor press releases)",
        "red_flags": ["Only vendor sources", "Vague metrics ('up to X%')", "Cherry-picked case studies", "No independent verification"]
      },
      "news_claims": {
        "recommended_approach": "Trace to primary source, check multiple outlets, verify context and framing",
        "evidence_priority": "Primary sources (official statements, documents) > Secondary (news reports citing primary) > Tertiary (opinion pieces)",
        "red_flags": ["Single source", "Anonymous claims without corroboration", "Circular reporting (outlets citing each other)", "Out-of-context quotes"]
      },
      "research_claims": {
        "recommended_approach": "Check replication studies, meta-analyses, assess methodology rigor, look for conflicts of interest",
        "evidence_priority": "RCTs/meta-analyses > Observational studies > Expert opinion",
        "red_flags": ["Single study", "Small sample", "Conflicts of interest undisclosed", "P-hacking indicators", "Correlation claimed as causation"]
      },
      "statistical_claims": {
        "recommended_approach": "Verify methodology, sample size, confidence intervals, base rates, check for denominators",
        "evidence_priority": "Transparent methodology with raw data > Summary statistics > Infographics without sources",
        "red_flags": ["Denominator unclear", "Cherry-picked timeframes", "Correlation ≠ causation", "No confidence intervals", "Misleading visualizations"]
      }
    }
  },
  "common_failure_modes": {
    "confirmation_bias": {
      "symptom": "Only evidence supporting claim found, contradictions ignored or dismissed",
      "root_cause": "Want claim to be true (motivated reasoning), didn't actively search for disconfirmation",
      "fix": "Actively search 'why this might be wrong', assign someone to argue against, seek base rates for skepticism"
    },
    "authority_bias": {
      "symptom": "Accepted claim because prestigious source (Nobel Prize, Harvard, Fortune 500) without evaluating evidence",
      "root_cause": "Heuristic: prestigious source = truth (often valid but not always)",
      "fix": "Evaluate evidence quality independently, check if expert in this specific domain, verify claim not opinion"
    },
    "single_source_overconfidence": {
      "symptom": "High confidence based on one source, even if high quality",
      "root_cause": "Didn't triangulate, assumed quality source = truth",
      "fix": "Require 2-3 independent sources for high confidence, check for replication/corroboration"
    },
    "vague_confidence": {
      "symptom": "Used words ('probably', 'likely', 'seems') instead of numbers (60%, 75%)",
      "root_cause": "Uncomfortable quantifying uncertainty, didn't calibrate",
      "fix": "Force numeric confidence (0-100%), use calibration guides, test against base rates"
    },
    "missing_limitations": {
      "symptom": "Conclusion presented without caveats, gaps, or unknowns acknowledged",
      "root_cause": "Focused on what's known, didn't systematically check for what's unknown",
      "fix": "Template section for limitations forces documentation, ask 'what would make me more confident?'"
    }
  },
  "excellence_indicators": [
    "Claim reformulated from vague to specific, testable assertion",
    "Evidence triangulated: 3+ independent sources, multiple methodologies",
    "Source credibility systematically assessed: expertise, independence, track record, methodology",
    "Evidence rated using clear hierarchy: primary > secondary > tertiary",
    "Contradicting evidence actively sought and fairly presented",
    "Limitations and gaps comprehensively documented",
    "Assumptions stated explicitly and tested for sensitivity",
    "Confidence calibrated numerically (0-100%) with reasoning",
    "Recommendation actionable and tied to decision context",
    "Bias mitigation demonstrated (sought disconfirming evidence, checked conflicts of interest)",
    "Distinction made between 'no evidence found' and 'evidence of absence'",
    "Sources properly cited with links/references for verification"
  ]
}
skills/research-claim-map/resources/methodology.md (new file, 328 lines)
@@ -0,0 +1,328 @@
# Research Claim Map: Advanced Methodologies

## Table of Contents

1. [Triangulation Techniques](#1-triangulation-techniques)
2. [Source Verification Methods](#2-source-verification-methods)
3. [Evidence Synthesis Frameworks](#3-evidence-synthesis-frameworks)
4. [Bias Detection and Mitigation](#4-bias-detection-and-mitigation)
5. [Confidence Calibration Techniques](#5-confidence-calibration-techniques)
6. [Advanced Investigation Patterns](#6-advanced-investigation-patterns)

## 1. Triangulation Techniques

### Multi-Source Verification

**Independent corroboration**:
- **Minimum 3 independent sources** for high-confidence claims
- Sources are independent if they have different authors, organizations, funding, and data collection methods
- Example: Government report + academic study + industry analysis (all using different data)

**Detecting circular citations** (see the sketch after this list):
- Trace back to the original source - if A cites B, B cites C, and C cites A → circular, invalid
- Check publication dates - later sources should cite earlier ones, not the reverse
- Use citation indexes (Google Scholar, Web of Science) to map citation networks
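
A citation network is a directed graph, so a circular citation is just a path that leads back to the source you started from. A minimal sketch, assuming you have already extracted a "who cites whom" mapping by hand or from a citation index (the example data mirrors the A→B→C→A case above):

```python
# Detect circular citations: a cycle in the "X cites Y" graph that
# returns to the source under investigation.
def has_circular_citation(cites: dict[str, list[str]], start: str) -> bool:
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for target in cites.get(node, []):
            if target == start:
                return True  # cycle back to the original source
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return False

cites = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(has_circular_citation(cites, "A"))  # -> True
```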
**Convergent evidence**:
- Different methodologies reaching the same conclusion (surveys + experiments + observational)
- Different populations/contexts showing the same pattern
- Example: Lab studies + field studies + meta-analyses all finding the same effect

### Cross-Checking Strategies

**Fact-checking databases**:
- Snopes, FactCheck.org, PolitiFact for public claims
- Retraction Watch for scientific papers
- OpenSecrets for political funding claims
- SEC EDGAR for financial claims

**Domain-specific verification**:
- Medical: PubMed, Cochrane Reviews, FDA databases
- Technology: CVE databases, vendor security advisories, benchmark repositories
- Business: Crunchbase, SEC filings, earnings transcripts
- Historical: Primary source archives, digitized records

**Temporal consistency**:
- Check whether the claim was true at the time stated (not just currently)
- Verify that dates in citations match the narrative
- Look for anachronisms (technology/events cited before they existed)

## 2. Source Verification Methods

### CRAAP Test (Currency, Relevance, Authority, Accuracy, Purpose)

**Currency**: When was it published/updated?
- High: Within the last year for fast-changing topics, within 5 years for stable domains
- Medium: Dated but still applicable
- Low: Outdated, context has changed significantly

**Relevance**: Does it address your specific claim?
- High: Directly addresses the claim with the same scope/context
- Medium: Related but different scope (e.g., different population, timeframe)
- Low: Tangentially related, requires extrapolation

**Authority**: Who is the author/publisher?
- High: Recognized expert, peer-reviewed publication, established institution
- Medium: Knowledgeable but not top-tier, some editorial oversight
- Low: Unknown author, self-published, no credentials

**Accuracy**: Can it be verified?
- High: Data/methods shared, replicable, other sources corroborate
- Medium: Some verification possible, mostly consistent with known facts
- Low: Unverifiable claims, contradicts established knowledge

**Purpose**: Why was it created?
- High: To inform/educate, transparent about limitations
- Medium: To persuade, but with evidence and some bias acknowledged
- Low: To sell/propagandize, misleading framing, undisclosed conflicts

### Domain Authority Assessment

**Academic sources**:
- Journal impact factor (higher = more rigorous peer review)
- H-index of authors (citation impact)
- Institutional affiliation (R1 research university > teaching-focused college)
- Funding source disclosure (NIH grant > pharmaceutical company funding for a drug study)

**News sources**:
- Editorial standards (corrections policy, fact-checking team)
- Awards/recognition (Pulitzer, Peabody, investigative journalism awards)
- Ownership transparency (independent > owned by an entity with a vested interest)
- Track record (history of accurate reporting vs retractions)

**Technical sources**:
- Benchmark methodology disclosure (reproducible specs, public data)
- Vendor independence (third-party testing > vendor self-reporting)
- Community verification (open-source code, peer reproduction)
- Standards compliance (IEEE, NIST, OWASP standards)

## 3. Evidence Synthesis Frameworks

### GRADE System (Grading of Recommendations Assessment, Development and Evaluation)

**Start with the evidence type**:
- Randomized controlled trials (RCTs): Start HIGH quality
- Observational studies: Start LOW quality
- Expert opinion: Start VERY LOW quality

**Downgrade for**:
- Risk of bias (methodology flaws, conflicts of interest)
- Inconsistency (conflicting results across studies)
- Indirectness (different population/intervention than the claim)
- Imprecision (small sample, wide confidence intervals)
- Publication bias (only positive results published)

**Upgrade for**:
- Large effect size (strong signal)
- Dose-response gradient (more X → more Y)
- All plausible confounders would reduce the effect (conservative estimate)

**Final quality rating**:
- **High**: Very confident the true effect is close to the estimate
- **Moderate**: Moderately confident, true effect likely close
- **Low**: Limited confidence, true effect may differ substantially
- **Very Low**: Very little confidence, true effect likely very different
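
The start-then-adjust logic above is mechanical enough to sketch. A minimal illustration of that bookkeeping; the four-level ladder and start levels follow GRADE, while the function shape and inputs are assumptions:

```python
# GRADE bookkeeping: start from the study design, then move down the
# quality ladder for each downgrade and up for each upgrade.
LADDER = ["very low", "low", "moderate", "high"]
START = {"rct": "high", "observational": "low", "expert_opinion": "very low"}

def grade_quality(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    level = LADDER.index(START[design])
    level = max(0, min(len(LADDER) - 1, level - downgrades + upgrades))
    return LADDER[level]

# An RCT downgraded twice (e.g., risk of bias + imprecision):
print(grade_quality("rct", downgrades=2))  # -> "low"
```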
### Meta-Analysis Interpretation

**Effect size + confidence intervals**:
- Large effect + narrow CI = high confidence
- Small effect + narrow CI = real but modest effect
- Any effect + wide CI = uncertain
- Example: "10% improvement (95% CI: 5-15%)" vs "10% improvement (95% CI: -5 to +25%)"

**Heterogeneity (I² statistic)**:
- I² < 25%: Low heterogeneity, studies agree
- I² 25-75%: Moderate heterogeneity, some variation
- I² > 75%: High heterogeneity, studies conflict (be skeptical of the pooled estimate)

**Publication bias detection**:
- Funnel plot asymmetry (missing small negative studies)
- File drawer problem (unpublished null results)
- Check trial registries (ClinicalTrials.gov) for unreported studies

## 4. Bias Detection and Mitigation

### Common Cognitive Biases in Claim Evaluation

**Confirmation bias**:
- **Symptom**: Finding only supporting evidence, ignoring contradictions
- **Mitigation**: Actively search for "why this might be wrong", assign someone to argue against
- **Example**: Believing a vendor claim because you want the product to work

**Availability bias**:
- **Symptom**: Overweighting vivid anecdotes vs dry statistics
- **Mitigation**: Prioritize data over stories, ask "how representative is this?"
- **Example**: Fearing plane crashes (vivid news) over car crashes (statistically riskier)

**Authority bias**:
- **Symptom**: Accepting claims because the source is prestigious (Nobel Prize, Harvard, etc.)
- **Mitigation**: Evaluate evidence quality independently, check whether the expert works in this specific domain
- **Example**: Believing a physicist's medical claims (outside their domain expertise)

**Anchoring bias**:
- **Symptom**: The first number heard becomes the reference point
- **Mitigation**: Seek base rates, compare to industry benchmarks, gather a range of estimates
- **Example**: Vendor says "saves 50%" → you anchor on 50% and grow skeptical of the analyst saying 10%

**Recency bias**:
- **Symptom**: Overweighting the latest information, dismissing older evidence
- **Mitigation**: Consider the full timeline, check whether the latest data point is an outlier or a trend
- **Example**: One bad quarter → ignoring 5 years of growth

### Source Bias Indicators

**Financial conflicts of interest**:
- Study funded by the company whose product is being evaluated
- Author owns stock, serves on the board, receives consulting fees
- Disclosure: Look for a "Conflicts of Interest" section in papers, FDA disclosures

**Ideological bias**:
- Think tank with a known political lean
- Advocacy organization with a mission-driven agenda
- Framing: Watch for loaded language, cherry-picked comparisons

**Selection bias in studies**:
- Participants not representative of the target population
- Dropout rate differs between groups
- Outcomes measured selectively (endpoints with null results dropped)

**Reporting bias**:
- Positive results published, negative results buried
- Outcomes changed after seeing the data (HARKing: Hypothesizing After the Results are Known)
- Subsetting data until significance is found (p-hacking)

## 5. Confidence Calibration Techniques

### Bayesian Updating

**Start with a prior probability** (before seeing evidence):
- Base rate: How often is this type of claim true?
- Example: "New product will disrupt the market" - base rate ~5% (most fail)

**Update with evidence** (likelihood ratio):
- How much more likely is this evidence if the claim is true vs false?
- Strong evidence: Likelihood ratio >10 (evidence 10× more likely if the claim is true)
- Weak evidence: Likelihood ratio <3

**Calculate the posterior probability** (after evidence):
- Use Bayes' theorem or intuitive updating
- Example: Prior 5%, strong evidence (LR=10) → Posterior ~35%
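
The odds form of Bayes' theorem makes the update a one-liner: convert the prior to odds, multiply by the likelihood ratio, convert back. A minimal sketch reproducing the example above:

```python
# Bayesian updating in odds form: posterior odds = prior odds × LR.
def update(prior: float, likelihood_ratio: float) -> float:
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Prior 5%, strong evidence (LR = 10):
print(f"{update(0.05, 10):.0%}")  # -> 34%, i.e. the ~35% quoted above
```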
### Fermi Estimation for Sanity Checks

**Decompose the claim into estimable parts**:
- Claim: "Company has 10,000 paying customers"
- Decompose: Employees × customers per employee, or revenue ÷ price per customer
- Cross-check: Do the numbers add up?

**Example**:
- Claim: Startup has 1M users
- Check: Founded 2 years ago → ~1,370 new users/day → 57/hour (24/7) or 171/hour (8-hour workday)
- Reality check: Plausible for a viral product? Need a marketing spend estimate.
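
The same arithmetic, spelled out so each step of the decomposition is visible:

```python
# Fermi sanity check for "startup has 1M users after 2 years".
users, years = 1_000_000, 2
per_day = users / (years * 365)
print(f"{per_day:,.0f} users/day")                     # -> 1,370 users/day
print(f"{per_day / 24:.0f}/hour around the clock")     # -> 57/hour
print(f"{per_day / 8:.0f}/hour in an 8-hour workday")  # -> 171/hour
```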
### Confidence Intervals and Ranges

**Avoid point estimates** ("70% confident"):
- Use ranges: "60-80% confident" acknowledges uncertainty
- Ask: What would make me 90% confident? What's missing?

**Sensitivity analysis**:
- Best-case scenario (all assumptions optimistic) → upper bound on confidence
- Worst-case scenario (all assumptions pessimistic) → lower bound on confidence
- Most likely scenario → central estimate
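
One way to run that sensitivity analysis is to repeat the Bayesian update under optimistic and pessimistic assumptions and report the spread. A minimal sketch reusing `update()` from the Bayesian Updating section above; the specific prior and likelihood-ratio ranges are illustrative assumptions:

```python
# Sensitivity analysis: bound the posterior by sweeping assumptions.
# Reuses update() from the Bayesian Updating sketch above.
scenarios = {
    "worst case": update(prior=0.03, likelihood_ratio=3),    # pessimistic
    "most likely": update(prior=0.05, likelihood_ratio=10),
    "best case": update(prior=0.10, likelihood_ratio=20),    # optimistic
}
for name, p in scenarios.items():
    print(f"{name}: {p:.0%}")
# Report the spread ("8-69% confident"), not a single point estimate.
```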
## 6. Advanced Investigation Patterns

### Investigative Journalism Techniques

**Paper trail following**:
- Follow the money: Who benefits financially from this claim being believed?
- Follow the timeline: Who said what, and when? Has the story changed over time?
- Follow the power: Who has the authority or incentive to suppress contradicting evidence?

**Source cultivation**:
- Insider sources (whistleblowers, former employees) for claims companies hide
- Expert sources (academics, consultants) for technical evaluation
- Documentary sources (contracts, emails, internal memos) for ground truth

**Red flags in interviews**:
- Vague answers to specific questions
- Defensiveness or hostility when questioned
- Inconsistencies between different tellings
- Refusal to provide documentation

### Legal Evidence Standards

**Burden of proof levels**:
- **Beyond reasonable doubt** (criminal): 95%+ confidence
- **Clear and convincing** (civil, high stakes): 75%+ confidence
- **Preponderance of evidence** (civil standard): 51%+ confidence (more likely than not)

**Hearsay rules**:
- Firsthand testimony > secondhand ("I saw X" > "Someone told me X")
- Exception: Business records, public records (trustworthy hearsay)
- Watch for: Anonymous sources, "people are saying", "experts claim"

**Chain of custody**:
- Document handling: Who collected, stored, and analyzed the evidence?
- Tampering risk: Could the evidence have been altered?
- Authentication: How do we know this document/photo is genuine?

### Competitive Intelligence Validation

**HUMINT (Human Intelligence)**:
- Customer interviews: "Do you use the competitor's product? How does it work?"
- Former employees: Glassdoor reviews, LinkedIn networking
- Conference presentations: Technical details revealed publicly

**OSINT (Open Source Intelligence)**:
- Public filings: SEC 10-K, patents, trademarks
- Job postings: What skills are they hiring for? (reveals technology stack, strategic priorities)
- Social media: Employee posts, company announcements
- Web archives: Wayback Machine to see claim history and website changes

**TECHINT (Technical Intelligence)**:
- Reverse engineering: Analyze the product directly
- Benchmarking: Test performance claims yourself
- Network analysis: DNS records, API endpoints, infrastructure footprint

### Scientific Reproducibility Assessment

**Replication indicators**:
- Has anyone reproduced the finding? (Strong evidence)
- Did replication attempts fail? (Evidence against)
- Has no one tried to replicate it? (Unknown, be cautious)

**Pre-registration check**:
- Was the study pre-registered (ClinicalTrials.gov, OSF)? Pre-registration reduces p-hacking risk
- Do the results match the pre-registered outcomes? If they differ, why?

**Data/code availability**:
- Can you access the raw data to re-analyze it?
- Is code available to reproduce the analysis?
- Are materials specified well enough to replicate the experiment?

**Robustness checks**:
- Do findings hold with different analysis methods?
- Are results sensitive to outliers or specific assumptions?
- Do subsample analyses show consistent effects?

---

## Workflow Integration

**When to use advanced techniques**:

- **Triangulation** → Every claim (minimum requirement)
- **CRAAP Test** → When assessing unfamiliar sources
- **GRADE System** → Medical/health claims, policy decisions
- **Bayesian Updating** → When you have prior knowledge/base rates
- **Fermi Estimation** → Quantitative claims that seem implausible
- **Investigative Techniques** → High-stakes business decisions, fraud detection
- **Legal Standards** → Determining action thresholds (e.g., firing an employee, filing a lawsuit)
- **Reproducibility Assessment** → Scientific/technical claims

**Start simple, add complexity as needed**:
1. Quick verification: CRAAP test + Google fact-check
2. Moderate investigation: Triangulate 3 sources + basic bias check
3. Deep investigation: Full methodology above + expert consultation
skills/research-claim-map/resources/template.md (new file, 338 lines)
@@ -0,0 +1,338 @@
# Research Claim Map Template

## Workflow

Copy this checklist and track your progress:

```
Research Claim Map Progress:
- [ ] Step 1: Define claim precisely
- [ ] Step 2: Gather evidence for and against
- [ ] Step 3: Rate evidence quality
- [ ] Step 4: Assess source credibility
- [ ] Step 5: Identify limitations
- [ ] Step 6: Synthesize conclusion
```

**Step 1: Define claim precisely**

Restate as a specific, testable assertion with numbers, dates, and clear terms. See [Claim Reformulation](#claim-reformulation-examples).

**Step 2: Gather evidence for and against**

Find sources supporting and contradicting the claim. See [Evidence Categories](#evidence-categories).

**Step 3: Rate evidence quality**

Apply the evidence hierarchy (primary > secondary > tertiary). See [Evidence Quality Rating](#evidence-quality-rating).

**Step 4: Assess source credibility**

Evaluate expertise, independence, track record, and methodology. See [Credibility Assessment](#source-credibility-scoring).

**Step 5: Identify limitations**

Document gaps, assumptions, and uncertainties. See [Limitations Documentation](#limitations-and-gaps).

**Step 6: Synthesize conclusion**

Determine a confidence level (0-100%) and a recommendation. See [Confidence Calibration](#confidence-level-calibration).

---

## Research Claim Map Template

### 1. Claim Statement

**Original claim**: [Quote exact claim as stated]

**Reformulated claim** (specific, testable): [Restate with precise terms, numbers, dates, scope]

**Why this claim matters**: [Decision impact, stakes, consequences if true/false]

**Key terms defined**:
- [Term 1]: [Definition to avoid ambiguity]
- [Term 2]: [Definition]

---

### 2. Evidence For

| Source | Evidence Type | Quality | Credibility | Summary |
|--------|--------------|---------|-------------|---------|
| [Source name/link] | [Primary/Secondary/Tertiary] | [H/M/L] | [H/M/L] | [What it says] |
| | | | | |
| | | | | |

**Strongest evidence for**:
1. [Most compelling evidence, with an explanation of why it's strong]
2. [Second strongest]

---

### 3. Evidence Against

| Source | Evidence Type | Quality | Credibility | Summary |
|--------|--------------|---------|-------------|---------|
| [Source name/link] | [Primary/Secondary/Tertiary] | [H/M/L] | [H/M/L] | [What it says] |
| | | | | |
| | | | | |

**Strongest evidence against**:
1. [Most compelling counter-evidence, with explanation]
2. [Second strongest]

---

### 4. Source Credibility Analysis

**For each major source, evaluate:**

**Source: [Name/Link]**
- **Expertise**: [H/M/L] - [Why: credentials, domain knowledge]
- **Independence**: [H/M/L] - [Conflicts of interest, bias, incentives]
- **Track Record**: [H/M/L] - [Prior accuracy, corrections, reputation]
- **Methodology**: [H/M/L] - [How they obtained information, transparency]
- **Overall credibility**: [H/M/L]

**Source: [Name/Link]**
- **Expertise**: [H/M/L] - [Why]
- **Independence**: [H/M/L] - [Why]
- **Track Record**: [H/M/L] - [Why]
- **Methodology**: [H/M/L] - [Why]
- **Overall credibility**: [H/M/L]

---

### 5. Limitations and Gaps

**What's unknown or uncertain**:
- [Gap 1: What evidence is missing]
- [Gap 2: What couldn't be verified]
- [Gap 3: What's ambiguous or unclear]

**Assumptions made**:
- [Assumption 1: What we're assuming to be true]
- [Assumption 2]

**Quality concerns**:
- [Concern 1: Weaknesses in evidence or methodology]
- [Concern 2]

**Further investigation needed**:
- [What additional evidence would increase confidence]
- [What questions remain unanswered]

---

### 6. Conclusion

**Confidence level**: [0-100%]

**Confidence reasoning**:
- [Why this confidence level, based on evidence quality, source credibility, and limitations]

**Assessment**: [Choose one]
- ✓ **Claim validated** (70-100% confidence) - Evidence strongly supports the claim
- ≈ **Claim partially true** (40-69% confidence) - Mixed or weak evidence, requires nuance
- ✗ **Claim rejected** (0-39% confidence) - Evidence contradicts the claim or support is insufficient

**Recommendation**:
[Action to take based on this assessment - what should be believed, decided, or done]

**Key caveats**:
- [Important qualification 1]
- [Important qualification 2]

---
## Guidance for Each Section

### Claim Reformulation Examples

**Vague → Specific:**
- ❌ "Product X is better" → ✓ "Product X loads pages 50% faster than Product Y on benchmark Z"
- ❌ "Most customers are satisfied" → ✓ "NPS score ≥50 based on a survey of ≥1000 customers in Q3 2024"
- ❌ "Studies show it works" → ✓ "≥3 peer-reviewed RCTs show ≥20% improvement vs placebo, p<0.05"

**Avoid:**
- Subjective terms ("better", "significant", "many")
- Undefined metrics ("performance", "quality", "efficiency")
- Vague time ranges ("recently", "long-term")
- Unclear comparisons ("faster", "cheaper" - than what?)

### Evidence Categories

**Primary (Strongest):**
- Original research data, raw datasets
- Direct measurements, transaction logs
- First-hand testimony from participants
- Legal documents, contracts, financial filings
- Photographs and videos of events (verified authentic)

**Secondary (Medium):**
- Analysis/synthesis of primary sources
- Peer-reviewed research papers
- News reporting citing primary sources
- Expert analysis with transparent methodology
- Government/institutional reports

**Tertiary (Weakest):**
- Summaries of secondary sources
- Textbooks, encyclopedias, Wikipedia
- Press releases, marketing content
- Opinion pieces, editorials
- Anecdotal reports

**Non-Evidence (Unreliable):**
- Social media claims without verification
- Anonymous sources with no corroboration
- Circular citations (A→B→A)
- "Experts say" without named experts
- Cherry-picked quotes out of context

### Evidence Quality Rating

**High (H):**
- Multiple independent primary sources agree
- Methodology transparent and replicable
- Large sample size, rigorous controls
- Peer-reviewed or independently verified
- Recent and relevant to the current context

**Medium (M):**
- Single primary source or multiple secondary sources
- Some methodology disclosed
- Moderate sample size, some controls
- Some independent verification
- Somewhat dated but still applicable

**Low (L):**
- Tertiary sources only
- Methodology opaque or questionable
- Small sample, no controls, anecdotal
- No independent verification
- Outdated, or context has changed

### Source Credibility Scoring

**Expertise:**
- High: Domain expert, relevant credentials, published research
- Medium: General knowledge, some relevant experience
- Low: No demonstrated expertise, out of domain

**Independence:**
- High: No financial/personal stake, third-party verification
- Medium: Some potential bias, but disclosed
- Low: Direct conflict of interest, undisclosed bias

**Track Record:**
- High: Consistent accuracy, transparent about corrections
- Medium: Unknown history or mixed record
- Low: History of errors, retractions, misinformation

**Methodology:**
- High: Transparent process, data/methods shared, replicable
- Medium: Some details provided, partially verifiable
- Low: Black box, unverifiable, cherry-picked data

### Limitations and Gaps

**Common gaps:**
- Missing primary sources (only secondary summaries available)
- Conflicting evidence without clear resolution
- Outdated information (claim may have changed)
- Incomplete data (partial picture only)
- Methodology unclear (can't assess quality)
- Context missing (claim true but framing misleading)

**Document:**
- What evidence you expected to find but didn't
- What questions you couldn't answer
- What assumptions you had to make to proceed
- What contradictions remain unresolved

### Confidence Level Calibration

**90-100% (Near Certain):**
- Multiple independent primary sources
- High-credibility sources with strong methodology
- No significant contradicting evidence
- Minimal assumptions or gaps
- Example: "Earth orbits the Sun"

**70-89% (Confident):**
- Strong secondary sources or a single primary source
- Credible sources, some methodology disclosed
- Minor contradictions, explainable
- Some assumptions, but reasonable ones
- Example: "Vendor has >5,000 customers based on analyst report"

**50-69% (Uncertain):**
- Mixed evidence quality or conflicting sources
- Moderate credibility, unclear methodology
- Significant gaps or assumptions
- Requires more investigation to be confident
- Example: "Feature will improve retention 10-20%"

**30-49% (Skeptical):**
- More/stronger evidence against than for
- Low-credibility sources or weak evidence
- Major gaps, questionable assumptions
- Claim likely exaggerated or misleading
- Example: "Supplement cures disease, based on testimonials"

**0-29% (Likely False):**
- Strong evidence contradicting the claim
- Unreliable sources, no credible support
- Claim contradicts established facts
- Clear misinformation or fabrication
- Example: "Vaccine contains tracking microchips"
---

## Common Patterns

### Pattern 1: Vendor Due Diligence

**Claim**: Vendor claims product capabilities, performance, customer metrics
**Approach**: Seek independent verification, customer references, trials
**Red flags**: Only vendor sources, vague metrics, "up to X" ranges, cherry-picked case studies

### Pattern 2: News Fact-Check

**Claim**: Event occurred, statistic cited, quote attributed
**Approach**: Trace to the primary source, check multiple outlets, verify context
**Red flags**: Single source, anonymous claims, sensational framing, out-of-context quotes

### Pattern 3: Research Validity

**Claim**: Study shows X causes Y, treatment is effective
**Approach**: Check replication, sample size, methodology, competing explanations
**Red flags**: Single study, conflicts of interest, p-hacking, correlation claimed as causation

### Pattern 4: Competitive Intelligence

**Claim**: Competitor has capability, market share, strategic direction
**Approach**: Triangulate public filings, analyst reports, customer feedback
**Red flags**: Rumor/speculation, outdated info, no primary verification

---

## Quality Checklist

- [ ] Claim restated as a specific, testable assertion
- [ ] Evidence gathered both supporting and contradicting
- [ ] Each source rated for evidence quality (Primary/Secondary/Tertiary)
- [ ] Each source assessed for credibility (Expertise, Independence, Track Record, Methodology)
- [ ] Strongest evidence for and against identified
- [ ] Limitations and gaps documented explicitly
- [ ] Assumptions stated clearly
- [ ] Confidence level quantified (0-100%)
- [ ] Recommendation is actionable and evidence-based
- [ ] Caveats and qualifications noted
- [ ] No cherry-picking (actively sought contradicting evidence)
- [ ] Distinction made between "no evidence found" and "evidence against"
- [ ] Sources properly attributed with links/citations
- [ ] Common biases avoided (confirmation, authority, recency, availability)
- [ ] Quality sufficient for the decision (if not, flag the need for more investigation)