name: wolf-verification
description: Three-layer verification architecture (CoVe, HSP, RAG) for self-verification, fact-checking, and hallucination prevention
version: 1.1.0
category: quality-assurance
triggers: verification, fact checking, hallucination detection, self verification, CoVe, HSP, RAG grounding
dependencies: wolf-principles, wolf-governance
size: large

Wolf Verification Framework

Three-layer verification architecture for self-verification, systematic fact-checking, and hallucination prevention. Based on ADR-043 (Verification Architecture).

Overview

Wolf's verification system combines three complementary verification layers and one cross-cutting pattern:

  1. CoVe (Chain of Verification) - Systematic fact-checking by breaking claims into verifiable steps
  2. HSP (Hierarchical Safety Prompts) - Multi-level safety validation with security-first design
  3. RAG Grounding - Contextual evidence retrieval for claims validation
  4. Verification-First - Generate verification checklist BEFORE creating response

Key Principle: Agents MUST verify their own outputs before delivery. We have built sophisticated self-verification tools - use them!


🔍 Chain of Verification (CoVe)

Purpose

Systematic fact-checking framework that breaks complex claims into independently verifiable atomic steps, creating transparent audit trails.

Core Concepts

Step Types (5 types)

from enum import Enum

class StepType(Enum):
    FACTUAL = "factual"           # Verifiable against authoritative sources
    LOGICAL = "logical"            # Reasoning steps that follow from prior steps
    COMPUTATIONAL = "computational" # Mathematical/algorithmic calculations
    OBSERVATIONAL = "observational" # Requires empirical validation
    DEFINITIONAL = "definitional"   # Meanings, categories, classifications

Step Artifacts

Each verification step produces structured output:

from src.verification.cove import StepArtifact, StepType

StepArtifact(
    type=StepType.FACTUAL,
    claim="Einstein was born in Germany",
    confidence=0.95,
    reasoning="Verified against biographical records",
    evidence_sources=["biography.com", "britannica.com"],
    dependencies=[]  # List of prerequisite step IDs
)

Basic Usage

from src.verification.cove import plan, verify, render, verify_claim

# Simple end-to-end verification
claim = "Albert Einstein won the Nobel Prize in Physics in 1921"
report = await verify_claim(claim)
print(f"Confidence: {report.overall_confidence:.1%}")
print(report.summary)

# Step-by-step approach
verification_plan = plan(claim)  # Break into atomic steps
report = await verify(verification_plan)  # Execute verification
markdown_audit = render(report, "markdown")  # Generate audit trail

Advanced Usage with Custom Evidence Provider

from src.verification.cove import CoVeVerifier, CoVeConfig, EvidenceProvider, StepType

class WikipediaEvidenceProvider(EvidenceProvider):
    async def get_evidence(self, claim: str, step_type: StepType):
        # Your Wikipedia API integration
        return [{
            "source": "wikipedia",
            "content": f"Wikipedia evidence for: {claim}",
            "confidence": 0.85,
            "url": f"https://wikipedia.org/search?q={claim}"
        }]

    def get_reliability_score(self):
        return 0.9  # Provider reputation (0.0-1.0)

# Use custom provider
config = CoVeConfig()
verifier = CoVeVerifier(config, WikipediaEvidenceProvider())
report = await verifier.verify(verification_plan)

Verification Workflow

1. PLAN: Break claim into atomic verifiable steps
   └─> Identify step types (factual, logical, etc.)
   └─> Establish step dependencies
   └─> Assign verification methods

2. VERIFY: Execute verification for each step
   └─> Gather evidence from providers
   └─> Validate against sources
   └─> Compute confidence scores
   └─> Track dependencies

3. AGGREGATE: Combine step results
   └─> Calculate overall confidence
   └─> Identify weak points
   └─> Generate audit trail

4. RENDER: Produce verification report
   └─> Markdown format for humans
   └─> JSON format for automation
   └─> Summary statistics
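
The same workflow in code, scanning the aggregated report for weak points before rendering (a sketch; the steps list and per-step confidence attribute names are assumptions about the report shape):

report = await verify(verification_plan)

# Flag weak points before rendering the audit trail (field names assumed)
weak_steps = [s for s in report.steps if s.confidence < 0.7]
for step in weak_steps:
    print(f"Weak point: {step.claim} ({step.confidence:.1%})")

audit = render(report, "markdown")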

Performance Targets

  • Speed: <200ms per verification (5 steps average)
  • Accuracy: ≥90% citation coverage target
  • Thoroughness: All factual claims verified

When to Use CoVe

  • ✅ Factual claims about events, dates, names
  • ✅ Technical specifications or API documentation
  • ✅ Historical facts or timelines
  • ✅ Statistical claims or metrics
  • ✅ Citations or references
  • ❌ Subjective opinions or preferences
  • ❌ Future predictions (use "hypothetical" step type)
  • ❌ Creative content (fiction, poetry)

Implementation Location: /src/verification/cove.py
Documentation: /docs/public/verification/chain-of-verification.md


🛡️ Hierarchical Safety Prompts (HSP)

Purpose

Multi-level safety validation system with sentence-level claim extraction, entity disambiguation, and security-first design.

Key Features

  • Sentence-level Processing: Extract and validate claims from individual sentences
  • Entity Disambiguation: Handle entity linking with disambiguation support
  • Security-First Design: Built-in DoS protection, injection detection, input sanitization
  • Claim Graph Generation: Structured relationships between claims
  • Evidence Integration: Multiple evidence providers with reputation scoring
  • Performance: ~500 claims/second, <10ms validation per claim

Security Layers

Level 1: Input Sanitization

# Automatic protections
- Unicode normalization (NFKC)
- Control character removal
- Pattern detection (script injection, path traversal)
- Length limits (prevent resource exhaustion)
- Timeout protection (configurable limits)
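
A minimal sketch of these protections, assuming no particular Wolf API (the real sanitizer lives in hsp_check; the patterns and length limit are illustrative):

import re
import unicodedata

MAX_LEN = 50_000  # illustrative cap to prevent resource exhaustion

def sanitize(text: str) -> str:
    if len(text) > MAX_LEN:
        raise ValueError("input exceeds length limit")
    text = unicodedata.normalize("NFKC", text)  # Unicode normalization
    # Remove control characters, keeping common whitespace
    text = "".join(ch for ch in text if ch in "\n\t " or not unicodedata.category(ch).startswith("C"))
    # Naive pattern detection: script injection and path traversal
    if re.search(r"<script\b|\.\./", text, re.IGNORECASE):
        raise ValueError("suspicious pattern detected")
    return text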

Level 2: Sentence Extraction

from src.verification.hsp_check import extract_sentences

text = """
John works at Google. The company was founded in 1998.
Google's headquarters is in Mountain View, California.
"""

sentences = extract_sentences(text)
# Result: [
#   "John works at Google",
#   "The company was founded in 1998",
#   "Google's headquarters is in Mountain View, California"
# ]

Level 3: Claim Building

from src.verification.hsp_check import build_claims

claims = build_claims(sentences)
# Result: List[Claim] with extracted entities and relationships
# Each claim has:
# - claim_text: str
# - entities: List[Entity]  # Extracted named entities
# - claim_type: str  # factual, relational, temporal, etc.
# - confidence: float  # Initial confidence score

Level 4: Claim Validation

from src.verification.hsp_check import validate_claims

report = await validate_claims(claims)
# Result: HSPReport with:
# - validated_claims: List[ValidatedClaim]
# - overall_confidence: float
# - validation_failures: List[str]
# - evidence_summary: Dict[str, Any]

Custom Evidence Provider

from src.verification.hsp_check import HSPChecker, EvidenceProvider, Evidence
import time

class CustomEvidenceProvider(EvidenceProvider):
    async def get_evidence(self, claim):
        evidence = Evidence(
            source="your_knowledge_base",
            content="Supporting evidence text",
            confidence=0.85,
            timestamp=str(time.time()),
            provenance_hash="your_hash_here"
        )
        return [evidence]

    def get_reputation_score(self):
        return 0.9  # Provider reputation (0.0-1.0)

# Use custom provider
checker = HSPChecker(CustomEvidenceProvider())
report = await checker.validate_claims(claims)

Hierarchical Validation Levels

Level 1: Content Filtering (REQUIRED)
└─> Harmful content detection
└─> PII identification and redaction
└─> Inappropriate content filtering

Level 2: Context-Aware Safety (RECOMMENDED)
└─> Domain-specific validation
└─> Relationship verification
└─> Temporal consistency checks

Level 3: Domain-Specific Safety (OPTIONAL)
└─> Industry-specific rules
└─> Compliance requirements
└─> Custom validation logic

Level 4: Human Escalation (EDGE CASES)
└─> Ambiguous claims
└─> Low-confidence assertions
└─> Contradictory evidence
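
A runnable sketch of how the four levels could compose (the stub checks are illustrative, not the hsp_check API):

import datetime
import re

def hierarchical_validate(text: str) -> tuple[str, list[str]]:
    issues: list[str] = []
    # Level 1 (required): content filtering, e.g. a crude PII pattern
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        issues.append("possible SSN detected")
    # Level 2 (recommended): context-aware checks, e.g. temporal consistency
    for year in re.findall(r"founded in (\d{4})", text):
        if int(year) > datetime.date.today().year:
            issues.append(f"temporal inconsistency: founded in {year}")
    # Level 3 (optional): domain-specific rules would plug in here
    # Level 4 (edge cases): anything flagged escalates to a human
    return ("escalate" if issues else "pass"), issues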

Performance Targets

  • Throughput: ~500 claims/second
  • Latency: <10ms per claim validation
  • Security: Built-in DoS protection, injection detection
  • Accuracy: ≥95% entity extraction accuracy

When to Use HSP

  • ✅ Multi-sentence generated content
  • ✅ Entity-heavy claims (names, organizations, places)
  • ✅ Safety-critical applications
  • ✅ PII detection and redaction
  • ✅ High-volume claim validation
  • ❌ Single atomic claims (use CoVe instead)
  • ❌ Non-factual content (opinions, creative writing)

Implementation Location: /src/verification/hsp_check.py
Documentation: /docs/public/verification/hsp-checking.md


📚 RAG Grounding

Purpose

Retrieve relevant evidence passages from Wolf's corpus to ground claims in contextual evidence.

Core Functions

Retrieve Evidence

from lib.rag import retrieve, format_context

# Retrieve top-k relevant passages
passages = retrieve(
    query="security best practices",
    n=5,
    min_score=0.1
)

# Format with citations
grounded_context = format_context(
    passages,
    max_chars=1200,
    citation_format='[Source: {id}]'
)

Integration with wolf-core-ip MCP

// RAG Retrieve - Get relevant evidence
const retrieval = await rag_retrieve(
  'security best practices',
  { k: 5, min_score: 0.1, corpus_path: './corpus' },
  sessionContext
);

// RAG Format - Format with citations
const formatted = rag_format_context(
  retrieval.passages,
  { max_chars: 1200, citation_format: '[Source: {id}]' },
  sessionContext
);

// Check Confidence - Multi-signal calibration
const confidence = check_confidence(
  {
    model_confidence: 0.75,
    evidence_count: retrieval.passages.length,
    complexity: 0.6,
    high_stakes: false
  },
  sessionContext
);

// Decision based on confidence
if (confidence.recommendation === 'proceed') {
  // Use response with RAG grounding
} else if (confidence.recommendation === 'abstain') {
  // Need more evidence, retrieve again with relaxed threshold
} else {
  // Low confidence, escalate to human review
}

Performance Targets

  • rag_retrieve (k=5): <200ms (actual: ~3-5ms)
  • rag_format_context: <50ms (actual: ~0.2-0.5ms)
  • check_confidence: <50ms (actual: ~0.05-0.1ms)
  • Full Pipeline: <300ms (actual: ~10-20ms)

Quality Metrics

  • Citation Coverage: ≥90% of claims have supporting passages
  • Retrieval Precision: ≥80% of retrieved passages are relevant
  • Calibration Error: <0.1 target (see the sketch below)
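
Calibration error here is the gap between stated confidence and observed accuracy. A toy estimate (the data is illustrative, not Wolf's metric implementation):

predictions = [(0.9, True), (0.8, True), (0.7, False), (0.95, True)]  # (confidence, was_correct)
avg_confidence = sum(c for c, _ in predictions) / len(predictions)    # 0.8375
accuracy = sum(ok for _, ok in predictions) / len(predictions)        # 0.75
calibration_error = abs(avg_confidence - accuracy)                    # 0.0875, under the <0.1 target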

When to Use RAG

  • ✅ Claims requiring Wolf-specific context
  • ✅ Technical documentation references
  • ✅ Architecture decision lookups
  • ✅ Best practice queries
  • ✅ Code pattern searches
  • ❌ General knowledge facts (use CoVe with external sources)
  • ❌ Real-time events (corpus may be stale)

Implementation Location: servers/wolf-core-ip/tools/rag/
MCP Access: mcp__wolf-core-ip__rag_retrieve, mcp__wolf-core-ip__rag_format_context


Verification-First Pattern

Purpose

Generate verification checklist BEFORE creating response to reduce hallucination and improve fact-checking.

Why Verification-First?

Traditional prompting:

User Query → Generate Response → Verify Response (maybe)

Verification-first prompting:

User Query → Generate Verification Checklist → Use Checklist to Guide Response → Validate Against Checklist

Checklist Sections (5 required)

{
  "assumptions": [
    # Implicit beliefs or prerequisites
    "User has basic knowledge of quantum computing",
    "Latest = developments in past 12 months"
  ],
  "sources": [
    # Required information sources
    "Recent quantum computing research papers",
    "Industry announcements from major tech companies",
    "Academic conference proceedings"
  ],
  "claims": [
    # Factual assertions needing validation
    "IBM achieved 127-qubit quantum processor in 2021",
    "Google demonstrated quantum supremacy in 2019",
    "Quantum computers can break RSA encryption"
  ],
  "tests": [
    # Specific verification procedures
    "Verify IBM qubit count against official announcements",
    "Cross-check Google's quantum supremacy paper",
    "Validate RSA encryption vulnerability claims"
  ],
  "open_risks": [
    # Acknowledged limitations
    "Quantum computing field evolves rapidly, information may be outdated",
    "Technical accuracy depends on source reliability",
    "Simplified explanations may omit nuances"
  ]
}

Basic Usage

from src.verification.checklist_scaffold import create_verification_scaffold

# Initialize verification-first mode
scaffold = create_verification_scaffold(verification_first=True)

# Generate checklist BEFORE responding
user_prompt = "Explain the latest developments in quantum computing"
checklist_result = scaffold.generate_checklist(user_prompt)

# Use checklist to guide response generation
print(checklist_result["checklist"])

Integration with Verification Pipeline

# Step 1: Generate checklist
checklist = scaffold.generate_checklist(user_prompt)["checklist"]

# Step 2: Use CoVe to verify claims from checklist
for claim in checklist["claims"]:
    cove_report = await verify_claim(claim)
    if cove_report.overall_confidence < 0.7:
        print(f"Low confidence claim: {claim}")

# Step 3: Use HSP for safety validation
sentences = extract_sentences(generated_response)
claims = build_claims(sentences)
hsp_report = await validate_claims(claims)

# Step 4: Ground in evidence with RAG
passages = retrieve(user_prompt, n=5)
grounded_context = format_context(passages)

# Step 5: Check overall confidence
confidence = check_confidence({
    "model_confidence": 0.75,
    "evidence_count": len(passages),
    "complexity": assess_complexity(user_prompt)
})

# Step 6: Decide based on confidence
if confidence.recommendation == 'proceed':
    deliver(generated_response)            # Deliver response
elif confidence.recommendation == 'abstain':
    mark_for_review(generated_response)    # Needs more research; gather more evidence
else:
    escalate(generated_response)           # Low confidence: escalate to human review

When to Use Verification-First

  • REQUIRED: Any factual claims about historical events, dates, names
  • REQUIRED: Technical specifications, API documentation, code behavior
  • REQUIRED: Security recommendations or threat assessments
  • REQUIRED: Performance claims or benchmarks
  • REQUIRED: Compliance or governance statements
  • RECOMMENDED: Complex multi-step reasoning
  • RECOMMENDED: Novel or unusual claims
  • RECOMMENDED: High-stakes decisions
  • OPTIONAL: Simple code comments
  • OPTIONAL: Internal notes or drafts
  • OPTIONAL: Exploratory research (clearly marked as such)

Implementation Location: /src/verification/checklist_scaffold.py
Documentation: /docs/public/verification/verification-first.md


🔄 Integrated Verification Workflow

Complete Self-Verification Pipeline

# Step 1: Generate verification checklist FIRST (verification-first)
scaffold = create_verification_scaffold(verification_first=True)
checklist = scaffold.generate_checklist(task_description)

# Step 2: Create response/code/documentation
response = generate_response(task_description, checklist)

# Step 3: Verify factual claims with CoVe
factual_verification = await verify_claim("Your factual claim from response")

# Step 4: Safety check with HSP
sentences = extract_sentences(response)
claims = build_claims(sentences)
safety_check = await validate_claims(claims)

# Step 5: Ground in evidence with RAG
passages = retrieve(topic, n=5)
context = format_context(passages)

# Step 6: Check confidence
confidence = check_confidence({
    "model_confidence": 0.75,
    "evidence_count": len(passages),
    "complexity": assess_complexity(task_description),
    "high_stakes": is_high_stakes(task_description)
})

# Step 7: Decision
if confidence.recommendation == 'proceed':
    # Deliver output with verification report
    deliver(response, verification_report={
        "cove": factual_verification,
        "hsp": safety_check,
        "rag": passages,
        "confidence": confidence
    })
elif confidence.recommendation == 'abstain':
    # Mark as "needs more research"
    mark_for_review(response, reason="Medium confidence, need more evidence")
else:
    # Escalate to human review
    escalate(response, reason="Low confidence, human review required")

Confidence Calibration

# Multi-signal confidence assessment
confidence_signals = {
    "model_confidence": 0.75,    # LLM's self-reported confidence
    "evidence_count": 5,          # Number of supporting passages
    "complexity": 0.6,            # Task complexity (0-1)
    "high_stakes": False          # Is this safety-critical?
}

result = check_confidence(confidence_signals)

# Result recommendations:
# - 'proceed': High confidence (>0.8), use response
# - 'abstain': Medium confidence (0.5-0.8), need more evidence
# - 'escalate': Low confidence (<0.5), human review required
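
The mapping from aggregate score to recommendation, sketched as plain Python (thresholds follow the comments above; the high-stakes adjustment is an assumption, not the check_confidence implementation):

def recommend(score: float, high_stakes: bool = False) -> str:
    # Hypothetical: demand a stricter bar for safety-critical work
    proceed_threshold = 0.9 if high_stakes else 0.8
    if score > proceed_threshold:
        return "proceed"    # high confidence: use the response
    if score >= 0.5:
        return "abstain"    # medium confidence: gather more evidence
    return "escalate"       # low confidence: human review required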

Hallucination Detection & Elimination

Detection

import { detect } from './hallucination/detect';

const detection = await detect(generated_text, sessionContext);

// Returns:
// {
//   hallucinations_found: number,
//   hallucination_types: string[],  // e.g., ["factual_error", "unsupported_claim"]
//   confidence: number,
//   details: Array<{ text: string, type: string, severity: string }>
// }

Elimination

import { dehallucinate } from './hallucination/dehallucinate';

if (detection.hallucinations_found > 0) {
    const cleaned = await dehallucinate(
        generated_text,
        detection,
        sessionContext
    );
    // Use cleaned output instead
}

Implementation Location: servers/wolf-core-ip/tools/hallucination/


Best Practices

CoVe Best Practices

  • Break complex claims into atomic steps
  • Establish step dependencies clearly (see the sketch after this list)
  • Use appropriate step types (factual, logical, etc.)
  • Provide evidence sources for factual steps
  • Don't create circular dependencies
  • Don't combine multiple claims in one step
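
Declaring a dependency between atomic steps can look like this (a sketch reusing the StepArtifact shape from above; the step IDs are illustrative):

step_fact = StepArtifact(
    type=StepType.FACTUAL,
    claim="The 1921 Nobel Prize in Physics was awarded to Albert Einstein",
    confidence=0.95,
    reasoning="Checked against nobelprize.org records",
    evidence_sources=["nobelprize.org"],
    dependencies=[],
)
step_inference = StepArtifact(
    type=StepType.LOGICAL,
    claim="Einstein was a Nobel laureate",
    confidence=0.95,
    reasoning="Follows directly from the verified factual step",
    evidence_sources=[],
    dependencies=["step_fact"],  # depends on the factual step above
)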

HSP Best Practices

  • Process text sentence-by-sentence
  • Leverage built-in security features
  • Configure timeouts for production
  • Monitor validation failure rates
  • Don't bypass input sanitization
  • Don't ignore security warnings

RAG Best Practices

  • Set appropriate minimum score threshold
  • Retrieve enough passages (k=5-10)
  • Format with clear citations
  • Check confidence before using
  • Don't retrieve too few passages (k<3)
  • Don't ignore low relevance scores

Verification-First Best Practices

  • Generate checklist BEFORE responding
  • Use checklist to guide response structure
  • Validate response against checklist
  • Include open risks and limitations
  • Don't skip checklist generation for "simple" tasks
  • Don't ignore checklist during response creation

Performance Summary

Component            Target   Actual        Status
CoVe (5 steps)       <200ms   ~150ms        ✅
HSP (per claim)      <10ms    ~2ms          ✅
RAG retrieve (k=5)   <200ms   ~3-5ms        ✅
RAG format           <50ms    ~0.2-0.5ms    ✅
Confidence check     <50ms    ~0.05-0.1ms   ✅
Full pipeline        <300ms   ~10-20ms      ✅

Red Flags - STOP

If you catch yourself thinking:

  • "This is low-stakes, no need for verification" - STOP. Unverified claims compound. All factual claims need verification.
  • "I'll verify after I finish the response" - NO. Use Verification-First pattern. Generate checklist BEFORE responding.
  • "The model is confident, that's good enough" - Wrong. Model confidence ≠ factual accuracy. Always verify with external evidence.
  • "Verification is too slow for this deadline" - False. Full pipeline averages <20ms. Verification saves time by preventing rework.
  • "I'll skip CoVe and just use RAG" - NO. Each layer serves different purposes. CoVe = atomic facts, RAG = Wolf context, HSP = safety.
  • "This is just internal documentation, no need to verify" - Wrong. Incorrect internal docs are worse than no docs. Verify anyway.
  • "Verification is optional for exploration" - If generating factual claims, verification is MANDATORY. Mark speculation explicitly.

STOP. Use verification tools BEFORE claiming anything is factually accurate.

After Using This Skill

VERIFICATION IS CONTINUOUS - This skill is called DURING work, not after

When Verification Happens

Called by wolf-governance:

  • During Definition of Done validation
  • As part of quality gate assessment
  • Before merge approval

Called by wolf-roles:

  • During implementation checkpoints
  • Before PR creation
  • As continuous validation loop

Called by wolf-archetypes:

  • When security lens applied (HSP required)
  • When research-prototyper needs evidence
  • When reliability-fixer validates root cause

Integration Points

1. With wolf-governance (Primary Caller)

  • When: Before declaring work complete
  • Why: Verification is part of Definition of Done
  • How:
    // Governance checks if verification passed
    mcp__wolf-core-ip__check_confidence({
      model_confidence: 0.75,
      evidence_count: passages.length,
      complexity: 0.6,
      high_stakes: false
    })
    
  • Gate: Cannot claim DoD complete without verification evidence

2. With wolf-roles (Continuous Validation)

  • When: During implementation at checkpoints
  • Why: Prevents late-stage verification failures
  • How: Use verification-first pattern for each claim
  • Example: coder-agent verifies API docs are accurate before committing

3. With wolf-archetypes (Lens-Driven)

  • When: Security or research archetypes selected
  • Why: Specialized verification requirements
  • How:
    • Security-hardener → HSP for safety validation
    • Research-prototyper → CoVe for fact-checking
    • Reliability-fixer → Verification of root cause analysis

Verification Checklist

Before claiming verification is complete:

  • Generated verification checklist FIRST (verification-first pattern)
  • Used appropriate verification layer:
    • CoVe for factual claims
    • HSP for safety validation
    • RAG for Wolf-specific context
  • Checked confidence scores:
    • Overall confidence ≥0.8 for proceed
    • 0.5-0.8 = needs more evidence (abstain)
    • <0.5 = escalate to human review
  • Documented verification results in journal
  • Provided evidence sources for claims
  • Identified and documented open risks

Can't check all boxes? Verification incomplete. Return to this skill.

Verification Examples

Example 1: Feature Implementation

Scenario: Coder-agent implementing user authentication

Verification-First:
  Step 1: Generate checklist BEFORE coding
    - Assumptions: User has email, password requirements known
    - Sources: OAuth 2.0 spec, bcrypt documentation
    - Claims: bcrypt is secure for password hashing
    - Tests: Verify bcrypt parameters against OWASP recommendations
    - Open Risks: Password requirements may need to evolve

  Step 2: Implement with checklist guidance

  Step 3: Verify claims with CoVe
    - Claim: "bcrypt is recommended by OWASP for password hashing"
    - CoVe verification: ✅ Confidence 0.95
    - Evidence: [OWASP Cheat Sheet, bcrypt documentation]

  Step 4: Check overall confidence
    - Model confidence: 0.85
    - Evidence count: 3 passages
    - Complexity: 0.4 (low)
    - Result: 'proceed' ✅

Assessment: Verified implementation, safe to proceed

Example 2: Security Review (Bad)

Scenario: Security-agent reviewing authentication without verification

❌ What went wrong:
  - Skipped verification-first checklist generation
  - Assumed encryption was correct without verifying
  - No HSP safety validation performed
  - No evidence retrieved for claims

❌ Result:
  - Claimed "authentication is secure" without evidence
  - Missed hardcoded secrets (HSP would have caught)
  - Missed deprecated crypto usage (CoVe would have caught)
  - High confidence but no verification = hallucination

Correct Approach:
  1. Generate verification checklist for security claims
  2. Use HSP to scan for secrets, PII, unsafe patterns
  3. Use CoVe to verify crypto library recommendations
  4. Use RAG to ground in Wolf security best practices
  5. Check confidence before approving
  6. Document verification evidence in journal

Performance vs Quality Trade-offs

Verification is NOT slow:

  • Full pipeline: <20ms average
  • CoVe (5 steps): ~150ms
  • HSP (per claim): ~2ms
  • RAG (k=5): ~3-5ms

Cost of skipping verification:

  • Merge rejected due to factual errors: Hours of rework
  • Security vulnerability shipped: Days to patch + incident response
  • Documentation errors: Weeks of support burden + reputation damage

Verification is an investment, not overhead.

Related Skills

  • wolf-principles: Evidence-based decision making principle (#5)
  • wolf-governance: Quality gate requirements (verification is DoD item)
  • wolf-roles: Roles call verification at checkpoints
  • wolf-archetypes: Lenses determine verification requirements
  • wolf-adr: ADR-043 (Verification Architecture)
  • wolf-scripts-core: Evidence validation patterns

Integration with Other Skills

Primary Chain Position: Called DURING work (not after)

wolf-principles → wolf-archetypes → wolf-governance → wolf-roles
                                            ↓
                                    wolf-verification (YOU ARE HERE)
                                            ↓
                                    Continuous validation throughout implementation

You are a supporting skill that enables quality:

  • Governance depends on you for evidence
  • Roles depend on you for confidence
  • Archetypes depend on you for lens validation

DO NOT wait until the end to verify. Verify continuously.


Total Components: 4 (CoVe, HSP, RAG, Verification-First)
Test Coverage: ≥90% required (achieved: 98%+)
Production Status: Active in Phase 50+

Last Updated: 2025-11-14
Phase: Superpowers Skill-Chaining Enhancement v2.0.0
Version: 1.1.0