Initial commit

Zhongwei Li
2025-11-30 08:58:05 +08:00
commit 36a6fff8d8
20 changed files with 4237 additions and 0 deletions

skills/cyberarian/SKILL.md

@@ -0,0 +1,333 @@
---
name: cyberarian
description: The digital librarian for Claude Code projects. Enforces structured document lifecycle management - organizing, indexing, and archiving project documentation automatically. Use when creating, organizing, or managing project documentation. Ensures documents are created in the proper `docs/` directory structure with required metadata, handles temporary documents in system temp directories, maintains an auto-generated index, and performs automatic archiving of old/complete documents. Use for any task involving document creation, organization, or maintenance.
---
# Cyberarian - Document Lifecycle Management
This skill enforces a structured approach to documentation in Claude Code projects, ensuring consistency, discoverability, and automatic maintenance.
## Core Principles
1. **Structured Organization**: All persistent documentation goes in `docs/` with semantic categorization
2. **No Temporary Docs in docs/**: Ephemeral/scratch documents belong in `/tmp` or system temp, never in `docs/`
3. **Metadata-Driven**: YAML frontmatter enables automation and lifecycle management
4. **Automatic Maintenance**: Indexing and archiving happen automatically, not manually
5. **Context Efficiency**: Bulk operations delegate to subagents to preserve main context
## Context-Efficient Operations
### The Problem
Document management operations can produce verbose output that pollutes the main agent's context:
- Validation scripts listing many errors across files
- Index generation scanning dozens of documents
- Archive operations listing all files being moved
- Search results returning many matches
### The Solution: Subagent Delegation
**Delegate to Task subagent** for operations that return verbose output. The subagent absorbs the verbose output in its isolated context and returns a concise summary (<50 tokens).
### Delegation Rules
**Execute directly** (simple, low-output):
- Creating a single document from template
- Reading a specific document's metadata
- Checking if `docs/` directory exists
**Delegate to Task subagent** (complex, verbose):
- Running validation across all documents
- Regenerating the index
- Archiving operations (especially dry-run)
- Searching documents by tag/status/category
- Summarizing INDEX.md contents
- Any operation touching multiple files
### Delegation Pattern
When verbose output is expected:
```
1. Recognize the operation will be verbose
2. Delegate to Task subagent with explicit instructions
3. Subagent executes scripts, absorbs output
4. Subagent parses and returns summary <50 tokens
5. Main agent receives only essential summary
```
**Task subagent prompt format:**
```
Execute document operation and return concise summary:
- Run: [command]
- Parse: Extract [specific data needed]
- Return: [emoji] [state] | [metric] | [next action]
- Limit: <50 tokens
Use agents/doc-librarian-subagent.md patterns for response formatting.
```
### Response Formats
**Success:** `✓ [result] | [metric] | Next: [action]`
**List:** `📋 [N] items: [item1], [item2], ... (+[remainder] more)`
**Error:** `❌ [operation] failed | Reason: [brief] | Fix: [action]`
**Warning:** `⚠️ [concern] | Impact: [brief] | Consider: [action]`
## Directory Structure
```
docs/
├── README.md # Human-written guide to the structure
├── INDEX.md # Auto-generated index of all documents
├── ai_docs/ # Reference materials for Claude Code (SDKs, APIs, repo context)
├── specs/ # Feature and migration specifications
├── analysis/ # Investigation outputs (bugs, optimization, cleanup)
├── plans/ # Implementation plans
├── templates/ # Reusable templates
└── archive/ # Historical and completed documents
├── specs/
├── analysis/
└── plans/
```
## Workflows
### First-Time Setup
When a project doesn't have a `docs/` directory:
1. **Initialize the structure**:
```bash
python scripts/init_docs_structure.py
```
This creates all directories, README.md, and an initial INDEX.md.
2. **Inform the user** about the structure and conventions
### Creating a New Document
When asked to create documentation (specs, analysis, plans, etc.):
1. **Determine the category**:
- **ai_docs**: SDKs, API references, repo architecture, coding conventions
- **specs**: Feature specifications, migration plans, technical designs
- **analysis**: Bug investigations, performance analysis, code audits
- **plans**: Implementation plans, rollout strategies, task breakdowns
- **templates**: Reusable document templates
2. **Use the template**:
```bash
cp assets/doc_template.md docs/<category>/<descriptive-name>.md
```
3. **Fill in metadata**:
- Set `title`, `category`, `status`, `created`, `last_updated`
- Add relevant `tags`
- Start with `status: draft`
4. **Write the content** following the document structure
5. **Update the index**:
```bash
python scripts/index_docs.py
```
**File naming convention**: Use lowercase with hyphens and descriptive names (a small helper sketch follows the examples):
- ✅ `oauth2-migration-spec.md`
- ✅ `auth-performance-analysis.md`
- ❌ `spec1.md`
- ❌ `MyDocument.md`
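For illustration, a minimal slug helper — a hypothetical function, not one of the skill's scripts — that maps a title to a filename matching this convention:
```python
import re

def slugify_title(title: str) -> str:
    """Lowercase the title, collapse non-alphanumeric runs to hyphens, trim."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return f"{slug}.md"

# slugify_title("OAuth2 Migration Spec") -> "oauth2-migration-spec.md"
```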
### Working with Existing Documents
When modifying existing documentation:
1. **Update metadata**:
- Set `last_updated` to current date
- Update `status` if lifecycle changes (draft → active → complete)
2. **Regenerate index** if significant changes:
```bash
python scripts/index_docs.py
```
### Creating Temporary/Scratch Documents
When creating ephemeral documents (scratchpads, temporary notes, single-use docs):
**NEVER create in docs/** - Use system temp instead:
```bash
# Create in /tmp for Linux/macOS
/tmp/scratch-notes.md
/tmp/debug-output.txt
# Let the system clean up temporary files
```
**Why**: The `docs/` directory is for persistent, managed documentation. Temporary files clutter the structure and interfere with indexing and archiving.
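If you want the system to pick the location, a minimal sketch using Python's standard `tempfile` module, which places the file in the system temp directory rather than `docs/`:
```python
import tempfile

# delete=False keeps the file around for the session; the OS temp
# cleaner reclaims it later, so docs/ stays untouched
with tempfile.NamedTemporaryFile(
        mode='w', prefix='scratch-notes-', suffix='.md', delete=False) as tmp:
    tmp.write('# Scratch notes\n\nEphemeral working notes go here.\n')
    print(f'Scratch file: {tmp.name}')  # e.g. /tmp/scratch-notes-x7k2.md
```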
### Regular Maintenance
**When to run**:
- After creating/modifying documents: Update index
- Weekly/monthly: Run archiving to clean up completed work
- Before commits: Validate metadata
**Maintenance workflow** (delegate to Task subagent for context efficiency):
1. **Validate metadata** → Delegate to subagent:
```
Task: Run python scripts/validate_doc_metadata.py
Return: ✓ [N] valid | [N] issues: [list top 3] | Next: [action]
```
2. **Archive old documents** → Delegate to subagent:
```
Task: Run python scripts/archive_docs.py --dry-run
Return: 📦 [N] ready for archive: [list top 3] | Next: Run archive
Task: Run python scripts/archive_docs.py
Return: ✓ Archived [N] docs | Categories: [list] | Index updated
```
3. **Update index** → Delegate to subagent:
```
Task: Run python scripts/index_docs.py
Return: ✓ Index updated | [N] documents | Categories: [summary]
```
**Why delegate?** These operations can scan dozens of files and produce verbose output. Subagent isolation keeps the main context clean for reasoning.
### Archiving Documents
Archiving happens automatically based on category-specific rules. See `references/archiving-criteria.md` for full details.
**Quick reference**:
- `specs/`: Auto-archive when `status: complete` AND >90 days
- `analysis/`: Auto-archive when `status: complete` AND >60 days
- `plans/`: Auto-archive when `status: complete` AND >30 days
- `ai_docs/`: Manual archiving only
- `templates/`: Never auto-archive
**To prevent auto-archiving**, set in frontmatter:
```yaml
archivable_after: 2025-12-31
```
## Metadata Requirements
Every document must have YAML frontmatter. See `references/metadata-schema.md` for complete schema.
**Minimal required frontmatter**:
```yaml
---
title: Document Title
category: specs
status: draft
created: 2024-11-16
last_updated: 2024-11-16
tags: []
---
```
**Lifecycle statuses** (the forward ordering is sketched after the list):
- `draft` → Document being created
- `active` → Current and relevant
- `complete` → Work done, kept for reference
- `archived` → Moved to archive
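The ordering matters for maintenance decisions; a tiny illustrative helper (hypothetical, not part of the skill's scripts) makes it explicit:
```python
# Illustrative only: the documented lifecycle order, usable as a sort/compare key
LIFECYCLE = ('draft', 'active', 'complete', 'archived')

def lifecycle_stage(status: str) -> int:
    """Position of a status in the lifecycle (0 = draft ... 3 = archived)."""
    return LIFECYCLE.index(status)

assert lifecycle_stage('active') < lifecycle_stage('complete')
```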
## Reference Files
Load these when needed for detailed guidance:
- **references/metadata-schema.md**: Complete YAML frontmatter specification
- **references/archiving-criteria.md**: Detailed archiving rules and philosophy
- **agents/doc-librarian-subagent.md**: Subagent template for context-efficient operations
## Scripts Reference
All scripts accept an optional path argument (defaults to the current directory):
- `scripts/init_docs_structure.py [path]` - Initialize docs structure
- `scripts/index_docs.py [path]` - Regenerate INDEX.md
- `scripts/archive_docs.py [path] [--dry-run]` - Archive old documents
- `scripts/validate_doc_metadata.py [path]` - Validate all metadata
## Common Patterns
### Creating a Specification
```bash
# Copy template
cp assets/doc_template.md docs/specs/new-feature-spec.md
# Edit with proper metadata
# category: specs
# status: draft
# tags: [feature-name, relevant-tags]
# Update index
python scripts/index_docs.py
```
### Completing Work
```bash
# Update document metadata
# status: draft → active → complete
# last_updated: <current-date>
# Once the age threshold passes, the archiving script will auto-archive it
python scripts/archive_docs.py
```
### Finding Documents
**Delegate searches to subagent** for context efficiency:
```
Task: Summarize docs/INDEX.md
Return: 📊 [N] total docs | Categories: [breakdown] | Recent: [latest doc]
Task: Search docs for tag "performance"
Run: grep -r "tags:.*performance" docs/ --include="*.md" | head -10
Return: 📋 [N] docs match: [path1], [path2], ... | Next: Read [most relevant]
Task: Find all draft documents
Run: grep -r "status: draft" docs/ --include="*.md"
Return: 📋 [N] drafts: [list top 5] | Next: [action]
```
**Direct execution** (only for quick checks):
```bash
# Check if docs/ exists
ls docs/ 2>/dev/null
```
## Best Practices
1. **Always use metadata**: Don't skip the frontmatter; it enables automation
2. **Keep status current**: Update as work progresses (draft → active → complete)
3. **Use descriptive names**: File names should be clear and searchable
4. **Update dates**: Set `last_updated` when making significant changes
5. **Run maintenance regularly**: Index and archive periodically
6. **Temp goes in /tmp**: Never create temporary/scratch docs in docs/
7. **Validate before committing**: Run `validate_doc_metadata.py` to catch issues
8. **Delegate bulk operations**: Use Task subagents for validation, indexing, archiving, and search to preserve main context
## Error Handling
**Document has no frontmatter**:
- Add frontmatter using `assets/doc_template.md` as reference (see the repair sketch at the end of this section)
- Run `validate_doc_metadata.py` to confirm
**Document in wrong category**:
- Move file to correct category directory
- Update `category` field in frontmatter to match
- Regenerate index
**Archived document still needed**:
- Move from `archive/<category>/` back to `<category>/`
- Update `status` from `archived` to `active`
- Remove `archived_date` and `archive_reason` fields
- Regenerate index
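For the no-frontmatter case above, a minimal repair sketch — a hypothetical helper whose field set mirrors `assets/doc_template.md`:
```python
from datetime import date
from pathlib import Path

def add_missing_frontmatter(doc: Path, title: str, category: str) -> None:
    """Prepend the minimal required frontmatter to a doc that has none."""
    today = date.today().isoformat()
    frontmatter = (
        '---\n'
        f'title: {title}\n'
        f'category: {category}\n'
        'status: draft\n'
        f'created: {today}\n'
        f'last_updated: {today}\n'
        'tags: []\n'
        '---\n'
    )
    doc.write_text(frontmatter + doc.read_text())

# add_missing_frontmatter(Path('docs/specs/oauth2-migration-spec.md'),
#                         'OAuth2 Migration Specification', 'specs')
```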

skills/cyberarian/agents/doc-librarian-subagent.md

@@ -0,0 +1,311 @@
# doc-librarian Subagent Template
**Use this template when delegating document operations via Task tool**
---
You are **doc-librarian**, a specialized subagent for context-efficient document lifecycle management operations.
## Your Mission
Execute document management operations (scanning, indexing, validation, archiving, searching) while maintaining extreme context efficiency. You absorb verbose script output in your isolated context and return only essential summaries to the main orchestration agent.
## Core Principles
### 1. Context Efficiency is Paramount
- Your context window is disposable; the main agent's is precious
- All verbose output stays in YOUR context
- Return summaries under 50 tokens
- Think: "What decision does the main agent need to make?"
### 2. Structured Processing
- Parse script output before summarizing
- Extract only decision-relevant information
- Suppress verbose tracebacks with `2>/dev/null`
### 3. Actionable Intelligence
- Don't just report status; recommend next actions
- Format: `[emoji] [current state] | [key metric] | [next action]`
- Example: `✓ 12 docs indexed | 3 need metadata fixes | Run validation`
## Operation Patterns
### Document Scanning/Indexing
**Regenerate index:**
```bash
python scripts/index_docs.py 2>/dev/null
```
**Return format:**
```
✓ Index updated | [N] documents | Categories: [list top 3]
```
**If errors:**
```
❌ Index failed | Missing docs/ directory | Run: python scripts/init_docs_structure.py
```
### Validation Operations
**Validate all documents:**
```bash
python scripts/validate_doc_metadata.py 2>/dev/null
```
**Return format (success):**
```
✓ All [N] documents valid | Ready to commit
```
**Return format (errors):**
```
❌ [N] documents have issues:
• [path1]: Missing [field]
• [path2]: Invalid [field]
(+[remainder] more)
Next: Fix metadata in listed files
```
### Archiving Operations
**Check what would be archived (dry run):**
```bash
python scripts/archive_docs.py --dry-run 2>/dev/null
```
**Return format:**
```
📦 [N] documents ready for archive:
• specs/[doc1] (complete, 95 days old)
• analysis/[doc2] (complete, 70 days old)
Next: Run `python scripts/archive_docs.py` to archive
```
**Execute archiving:**
```bash
python scripts/archive_docs.py 2>/dev/null
```
**Return format:**
```
✓ Archived [N] documents | Moved to archive/[categories] | Index updated
```
### Document Search
**Search by tag:**
```bash
grep -r "tags:.*[search-term]" docs/ --include="*.md" 2>/dev/null | head -10
```
**Return format:**
```
📋 [N] documents match "[term]":
• [path1]: [title]
• [path2]: [title]
(+[remainder] more)
```
**Search by status:**
```bash
grep -r "status: [status]" docs/ --include="*.md" 2>/dev/null | head -10
```
**Return format:**
```
📋 [N] [status] documents:
• [path1]: [title]
• [path2]: [title]
Next: [action based on status]
```
### Index Summary
**Read and summarize INDEX.md:**
```bash
head -50 docs/INDEX.md 2>/dev/null
```
**Return format:**
```
📊 Documentation Summary:
Total: [N] documents
Categories: [category1] ([n1]), [category2] ([n2]), ...
Recent: [most recent doc title]
```
### Structure Initialization
**Initialize docs structure:**
```bash
python scripts/init_docs_structure.py 2>/dev/null
```
**Return format:**
```
✓ docs/ structure created | Categories: ai_docs, specs, analysis, plans, templates | Next: Add first document
```
## Response Templates
### Success Operations
```
✓ [operation completed] | [key result] | Next: [action]
```
### Status Checks
```
📊 [metric]: [value] | [metric]: [value] | [recommendation]
```
### Lists (max 5 items)
```
📋 [N] items:
• [item 1] - [detail]
• [item 2] - [detail]
• [item 3] - [detail]
(+[remainder] more)
```
### Errors
```
❌ [operation] failed | Reason: [brief explanation] | Fix: [action]
```
### Warnings
```
⚠️ [concern] | Impact: [brief] | Consider: [action]
```
## Decision-Making Framework
When processing script output, ask yourself:
1. **What decision is the main agent trying to make?**
- Creating doc? → Return category guidance + template location
- Maintenance? → Return what needs attention + priority
- Searching? → Return matching docs + relevance
2. **What's the minimum information needed?**
- Counts: totals and breakdowns only
- Lists: top 5 items + count of remainder
- Errors: specific files and fixes, not full tracebacks
3. **What action should follow?**
- Always recommend the logical next step
- Make it concrete: "Fix metadata in specs/auth-spec.md" not "fix issues"
## Error Handling
**When scripts fail:**
```bash
python scripts/validate_doc_metadata.py 2>&1
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
# Return actionable error
echo "❌ Validation failed | Check: docs/ exists | Fix: python scripts/init_docs_structure.py"
fi
```
**When no documents found:**
```
No documents in [category] | Reason: empty directory | Next: Create first doc with template
```
## Critical Rules
### ALWAYS:
1. ✓ Run scripts with proper path context
2. ✓ Suppress stderr for clean parsing: `2>/dev/null`
3. ✓ Parse before returning (no raw script output)
4. ✓ Keep responses under 50 tokens
5. ✓ Include next action recommendation
6. ✓ Use emoji prefixes for visual parsing (✓ ❌ 📋 ⚠️ 📊 📦)
### NEVER:
1. ❌ Return full file contents to main agent
2. ❌ Return raw INDEX.md (summarize it)
3. ❌ Return full validation output (summarize errors)
4. ❌ Return more than 5 list items (summarize remainder)
5. ❌ Make the main agent parse verbose output
6. ❌ Forget the next action recommendation
## Examples
### Good Response
```
User: "Check documentation health"
You execute: python scripts/validate_doc_metadata.py 2>/dev/null
You return: "✓ 15 docs | 12 valid | 3 need fixes: specs/auth.md, analysis/perf.md, plans/q4.md | Next: Fix missing 'status' field"
Tokens: 32
Main agent: Knows exactly what to fix
```
### Bad Response
```
User: "Check documentation health"
You execute: python scripts/validate_doc_metadata.py
You return: [Full validation output with all file paths, all errors, verbose formatting]
Tokens: 500+
Main agent: Context polluted, overwhelmed with details
```
### Good Search Response
```
User: "Find authentication docs"
You execute: grep -r "tags:.*auth" docs/ | head -5
You return: "📋 4 docs match 'auth': specs/oauth-migration.md, analysis/auth-audit.md, plans/auth-refactor.md, ai_docs/auth-sdk.md | Next: Read specs/oauth-migration.md for current spec"
Tokens: 38
Main agent: Has what they need to proceed
```
### Bad Search Response
```
User: "Find authentication docs"
You execute: grep -r "auth" docs/
You return: [200 lines of grep output with every mention of 'auth']
Tokens: 1,200
Main agent: Can't find the actual documents in the noise
```
## Philosophy
You are a **filter**, not a **conduit**.
- **Conduit:** Passes data through unchanged → context pollution
- **Filter:** Extracts essence, provides intelligence → context efficiency
Your value is in **compression without information loss**. The main agent should never need the verbose output you processed; your summary should contain every decision-relevant fact.
## Integration with Main Workflows
When the main agent uses you as part of larger workflows:
```markdown
# Example: Documentation maintenance workflow
Main Agent: "Let's do documentation maintenance"
Main Agent → You: "Check validation status"
You: "✓ 20 docs | 18 valid | 2 issues | Next: Fix specs/api.md (missing status)"
Main Agent: "Fix the issues" [edits files]
Main Agent → You: "Re-validate"
You: "✓ All 20 documents valid | Ready to archive check"
Main Agent → You: "Check what should be archived"
You: "📦 3 docs ready: analysis/q2-review.md, specs/old-feature.md, plans/done-task.md | Next: Run archive"
Main Agent → You: "Archive them"
You: "✓ Archived 3 docs to archive/ | Index updated | Maintenance complete"
```
Your responses enable the main agent to orchestrate smoothly without getting bogged down in script output.
---
**Remember:** You are doc-librarian. Your job is to keep the main orchestration agent's context clean while providing precise, actionable intelligence about documentation operations. Every response should answer: "What's the state?" and "What should we do next?"
Operate with extreme precision. The main agent's effectiveness depends on your context discipline.

skills/cyberarian/assets/doc_template.md

@@ -0,0 +1,30 @@
---
title: Your Document Title Here
category: specs # One of: ai_docs, specs, analysis, plans, templates
status: draft # One of: draft, active, complete, archived
created: YYYY-MM-DD
last_updated: YYYY-MM-DD
tags: [] # Add relevant tags: [tag1, tag2, tag3]
---
# Your Document Title Here
## Overview
Brief description of what this document covers.
## [Section 1]
Content goes here...
## [Section 2]
More content...
## References
- Related docs, links, etc.
---
_Template usage: Copy this file and fill in the frontmatter and sections._

skills/cyberarian/references/archiving-criteria.md

@@ -0,0 +1,184 @@
# Document Archiving Criteria
Documents are automatically archived based on their category, status, and age. This ensures the active workspace remains focused on current, relevant documentation.
## Archiving Philosophy
**Goals:**
- Keep active directories focused on current work
- Preserve historical context in archive
- Automate routine maintenance while allowing manual control where needed
- Make archiving decisions deterministic and transparent
**Non-goals:**
- Deleting documents (everything is preserved)
- Aggressive archiving that loses important context
- One-size-fits-all rules (categories have different lifecycles)
## Category-Specific Rules
### specs/ - Specifications
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >90 days since last_updated
**Rationale**: Specs are valuable reference material even after implementation. 90 days allows for iteration, rollout, and bug fixes before archiving.
**Manual override**: Set `archivable_after` date in frontmatter to defer archiving.
**Example scenarios:**
- ✅ Archive: Feature spec marked `complete` 100 days ago
- ❌ Skip: Active spec being refined
- ❌ Skip: Complete spec only 30 days old (still in rollout phase)
### analysis/ - Investigation Outputs
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >60 days since last_updated
**Rationale**: Analysis documents are point-in-time investigations. Once the work is done and changes are implemented, they have less ongoing value. 60 days allows for follow-up work.
**Manual override**: Set `archivable_after` to keep important analyses active longer.
**Example scenarios:**
- ✅ Archive: Bug investigation completed 70 days ago
- ✅ Archive: Performance analysis from 2 months ago
- ❌ Skip: Ongoing investigation (status: `active` or `draft`)
### plans/ - Implementation Plans
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >30 days since last_updated
**Rationale**: Plans become stale quickly. Once implementation is done, plans are primarily historical. 30 days accounts for plan execution and retrospective.
**Manual override**: Set `archivable_after` for long-running initiatives.
**Example scenarios:**
- ✅ Archive: Migration plan completed 45 days ago
- ✅ Archive: Sprint plan from last month (status: `complete`)
- ❌ Skip: Ongoing multi-phase plan (status: `active`)
- ❌ Skip: Just-completed plan (20 days old)
### ai_docs/ - Reference Materials
**Auto-archive**: No
**Manual archiving only**
**Rationale**: Reference materials (SDKs, API docs, repo context) are meant to be persistent. These inform Claude Code's understanding and should only be archived manually when truly obsolete.
**When to manually archive:**
- SDK documentation for deprecated versions
- API references for sunset APIs
- Repository context for archived projects
**Example scenarios:**
- ❌ Auto-archive: Never, regardless of age or status
- ✅ Manual: Move OAuth 1.0 docs when OAuth 2.0 is fully adopted
- ✅ Manual: Archive legacy API docs after migration complete
### templates/ - Reusable Templates
**Auto-archive**: No
**Templates never auto-archive**
**Rationale**: Templates are meant to be reused indefinitely. They don't have a lifecycle in the same way as other documents.
**When to manually archive:**
- Deprecated templates that should no longer be used
- Templates replaced by improved versions
**Best practice**: Instead of archiving, update templates in place or clearly mark them as deprecated in the template itself.
## Archive Structure
Archived documents are moved to `archive/` while preserving their category:
```
archive/
├── specs/
│ └── oauth2-migration-spec.md
├── analysis/
│ └── auth-perf-analysis.md
└── plans/
└── q3-migration-plan.md
```
This structure:
- Maintains categorical organization
- Allows easy browsing of archived content
- Prevents mixing of categories in archive
## Manual Archiving
To manually archive a document (a code sketch follows the steps):
1. Move it to `archive/<category>/`
2. Update metadata:
```yaml
status: archived
archived_date: YYYY-MM-DD
archive_reason: "Manual archiving: <reason>"
```
3. Run `scripts/index_docs.py` to update the index
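A minimal sketch of steps 1-2, assuming PyYAML and mirroring the frontmatter handling in `scripts/archive_docs.py` — illustrative, not a replacement for the script:
```python
import re
from datetime import date
from pathlib import Path
import yaml

def manually_archive(doc: Path, docs_root: Path, reason: str) -> Path:
    """Move a doc to archive/<category>/ and stamp the archival metadata."""
    content = doc.read_text()
    match = re.match(r'^---\s*\n(.*?)\n---\s*\n(.*)', content, re.DOTALL)
    if match:
        metadata = yaml.safe_load(match.group(1)) or {}
        body = match.group(2)
    else:
        metadata, body = {}, content
    metadata['status'] = 'archived'
    metadata['archived_date'] = date.today().isoformat()
    metadata['archive_reason'] = f'Manual archiving: {reason}'
    category = doc.relative_to(docs_root).parts[0]  # e.g. 'specs'
    dest_dir = docs_root / 'archive' / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / doc.name
    frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
    dest.write_text(f'---\n{frontmatter}---\n{body}')
    doc.unlink()
    return dest

# manually_archive(Path('docs/specs/old-spec.md'), Path('docs'), 'superseded by v2')
```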
## Preventing Auto-Archiving
To prevent a document from being auto-archived:
**Option 1**: Keep status as `active` or `draft`
**Option 2**: Set explicit `archivable_after` date in frontmatter:
```yaml
archivable_after: 2025-12-31 # Don't archive until after this date
```
This is useful for:
- Long-running projects
- Reference specs that should remain active
- Documents with ongoing relevance despite completion
## Running the Archiving Script
```bash
# Dry run to see what would be archived
python scripts/archive_docs.py --dry-run
# Actually archive documents
python scripts/archive_docs.py
# Archive and update index
python scripts/archive_docs.py && python scripts/index_docs.py
```
**Best practice**: Run archiving periodically (weekly or monthly) as part of documentation maintenance.
## Retrieval from Archive
Archived documents are not deleted and can be retrieved by:
1. **Browsing**: Navigate to `archive/<category>/`
2. **Search**: Use grep or file search tools
3. **Index**: Check `INDEX.md`, which includes archived documents
4. **Unarchiving**: Move document back to its category and update status
To unarchive a document:
```bash
# Move file back
mv docs/archive/specs/old-spec.md docs/specs/
# Update metadata
# Change status from 'archived' to 'active' or appropriate status
# Remove archived_date and archive_reason fields
```
## Monitoring
The archiving script provides a summary:
```
Archive Summary:
Documents scanned: 45
Documents archived: 3
Documents skipped: 42
Errors: 0
```
Keep an eye on:
- **Unexpected archives**: Documents archived sooner than expected
- **Errors**: Failed archiving operations
- **Zero archives**: May indicate metadata issues (e.g., status never set to `complete`)

skills/cyberarian/references/metadata-schema.md

@@ -0,0 +1,125 @@
# Document Metadata Schema
All documents in the docs/ directory must include YAML frontmatter with the following structure.
## Required Fields
### title
- **Type**: String
- **Description**: Human-readable document title
- **Example**: `"OAuth2 Migration Specification"`
### category
- **Type**: String (enum)
- **Description**: Document category, must match the directory it's in
- **Valid values**:
- `ai_docs` - Reference materials for Claude Code
- `specs` - Feature and migration specifications
- `analysis` - Investigation outputs
- `plans` - Implementation plans
- `templates` - Reusable templates
- `archive` - Historical documents (note: the archiving script preserves the original `category` and marks documents with `status: archived`)
- **Example**: `specs`
### status
- **Type**: String (enum)
- **Description**: Current lifecycle status of the document
- **Valid values**:
- `draft` - Document is being created
- `active` - Document is current and relevant
- `complete` - Work is done, kept for reference
- `archived` - Document has been archived
- **Example**: `active`
- **Lifecycle**: draft → active → complete → archived
### created
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was created
- **Example**: `2024-11-16`
### last_updated
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was last modified
- **Example**: `2024-11-16`
- **Note**: Should be updated whenever significant changes are made
## Optional Fields
### tags
- **Type**: List of strings
- **Description**: Keywords for categorization and search
- **Example**: `[auth, oauth2, security, migration]`
- **Best practice**: Use consistent tags across related documents
### archivable_after
- **Type**: Date (YYYY-MM-DD)
- **Description**: Explicit date after which the document can be auto-archived
- **Example**: `2025-02-16`
- **Note**: Overrides category-based archiving rules when set
### archived_date
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was archived (auto-set by archiving script)
- **Example**: `2024-12-01`
### archive_reason
- **Type**: String
- **Description**: Reason for archiving (auto-set by archiving script)
- **Example**: `"90 days old (threshold: 90)"`
### author
- **Type**: String
- **Description**: Document author or owner
- **Example**: `"Simon Lamb"`
### related_docs
- **Type**: List of strings (file paths)
- **Description**: Links to related documents
- **Example**: `["specs/auth-system/oauth2-spec.md", "plans/oauth2-rollout.md"]`
## Complete Example
```yaml
---
title: OAuth2 Migration Specification
category: specs
status: active
created: 2024-11-16
last_updated: 2024-11-16
tags: [auth, oauth2, security, migration]
author: Simon Lamb
related_docs:
- analysis/auth-system-audit.md
- plans/oauth2-implementation-plan.md
---
```
## Validation
Documents are validated using `scripts/validate_doc_metadata.py`. Run this before committing to ensure all metadata is correct.
## Metadata Updates
### When Creating a New Document
1. Copy from `assets/doc_template.md`
2. Fill in all required fields
3. Set status to `draft`
4. Set `created` and `last_updated` to the current date (a sketch of these steps follows)
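A sketch of these steps as a hypothetical helper, assuming the template's date placeholders are the literal string `YYYY-MM-DD` as in `assets/doc_template.md`:
```python
import shutil
from datetime import date
from pathlib import Path

def new_document_from_template(dest: Path) -> None:
    """Copy the template, then stamp today's date into the date placeholders."""
    shutil.copy('assets/doc_template.md', dest)
    stamped = dest.read_text().replace('YYYY-MM-DD', date.today().isoformat())
    dest.write_text(stamped)  # title/tags still need a manual edit; status stays 'draft'

# new_document_from_template(Path('docs/specs/oauth2-migration-spec.md'))
```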
### When Updating a Document
1. Update `last_updated` to current date
2. Update `status` if lifecycle stage changes
3. Add relevant `tags` if needed
### When Completing Work
1. Set `status` to `complete`
2. Update `last_updated` to current date
3. Optionally set `archivable_after` if auto-archiving should be deferred
## Best Practices
1. **Consistent Tags**: Use a common vocabulary of tags across documents
2. **Accurate Status**: Keep status up to date as work progresses
3. **Related Docs**: Link to related documents for context and discoverability
4. **Regular Updates**: Update `last_updated` whenever making significant changes
5. **Descriptive Titles**: Use clear, specific titles that describe the content

skills/cyberarian/scripts/archive_docs.py

@@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
Automatically archive documents based on status, age, and category-specific rules.
Documents are moved to archive/ and their metadata is updated.
"""
import os
import sys
import re
import shutil
from pathlib import Path
from datetime import datetime, timedelta
import yaml
# Archiving rules by category (days since last_updated)
ARCHIVING_RULES = {
'specs': {
'complete_after_days': 90,
'auto_archive': True,
'require_complete_status': True
},
'analysis': {
'complete_after_days': 60,
'auto_archive': True,
'require_complete_status': True
},
'plans': {
'complete_after_days': 30,
'auto_archive': True,
'require_complete_status': True
},
'ai_docs': {
'auto_archive': False, # Manual archiving only for reference docs
},
'templates': {
'auto_archive': False, # Never auto-archive templates
}
}
def extract_frontmatter(file_path: Path) -> tuple[dict, str]:
"""Extract YAML frontmatter and remaining content from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n(.*)', content, re.DOTALL)
if not match:
return {}, content
frontmatter_text = match.group(1)
body = match.group(2)
metadata = yaml.safe_load(frontmatter_text)
return (metadata if isinstance(metadata, dict) else {}), body
except Exception as e:
print(f"⚠️ Warning: Could not parse {file_path}: {e}")
return {}, ""
def update_frontmatter(file_path: Path, metadata: dict) -> None:
"""Update the YAML frontmatter in a markdown file."""
_, body = extract_frontmatter(file_path)
frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
new_content = f"---\n{frontmatter}---\n{body}"
file_path.write_text(new_content)
def should_archive(metadata: dict, category: str, file_modified: datetime) -> tuple[bool, str]:
"""
Determine if a document should be archived based on rules.
Returns (should_archive, reason).
"""
# Skip if already archived
if metadata.get('status') == 'archived':
return False, "already archived"
    # Honor an explicit archivable_after override: it defers auto-archiving
    # until the given date has passed (see references/archiving-criteria.md)
    archivable_after = metadata.get('archivable_after')
    if archivable_after:
        try:
            if isinstance(archivable_after, str):
                defer_until = datetime.strptime(archivable_after, '%Y-%m-%d').date()
            else:
                defer_until = archivable_after  # YAML parses dates natively
            if datetime.now().date() <= defer_until:
                return False, f"archivable_after defers archiving until {defer_until}"
        except (ValueError, TypeError):
            return False, "invalid archivable_after date format"
    # Get category rules
    rules = ARCHIVING_RULES.get(category, {})
    # Skip if auto-archiving is disabled for this category
    if not rules.get('auto_archive', False):
        return False, f"{category} does not auto-archive"
# Check if status is 'complete' (required for most categories)
if rules.get('require_complete_status', False):
if metadata.get('status') != 'complete':
return False, "status is not 'complete'"
# Check age-based archiving
complete_after_days = rules.get('complete_after_days')
if complete_after_days:
last_updated = metadata.get('last_updated')
if not last_updated:
return False, "no last_updated date in metadata"
try:
if isinstance(last_updated, str):
updated_date = datetime.strptime(last_updated, '%Y-%m-%d').date()
else:
# YAML parser returns date objects, convert to date for comparison
updated_date = last_updated if hasattr(last_updated, 'year') else datetime.strptime(str(last_updated), '%Y-%m-%d').date()
days_old = (datetime.now().date() - updated_date).days
if days_old >= complete_after_days:
return True, f"{days_old} days old (threshold: {complete_after_days})"
except ValueError:
return False, "invalid last_updated date format"
return False, "no archiving criteria met"
def archive_document(file_path: Path, docs_path: Path, reason: str, dry_run: bool = False) -> bool:
"""
Archive a document by moving it to archive/ and updating its metadata.
Returns True if successful.
"""
try:
# Read metadata
metadata, body = extract_frontmatter(file_path)
# Determine archive path (preserve subdirectory structure)
relative_path = file_path.relative_to(docs_path)
category = relative_path.parts[0]
# Create archive subdirectory for the category
archive_path = docs_path / 'archive' / category
archive_path.mkdir(parents=True, exist_ok=True)
# Build destination path
archive_file = archive_path / file_path.name
# Handle name conflicts
if archive_file.exists():
base = archive_file.stem
suffix = archive_file.suffix
counter = 1
while archive_file.exists():
archive_file = archive_path / f"{base}_{counter}{suffix}"
counter += 1
if dry_run:
print(f" [DRY RUN] Would archive: {relative_path} → archive/{category}/{archive_file.name}")
print(f" Reason: {reason}")
return True
# Update metadata
metadata['status'] = 'archived'
metadata['archived_date'] = datetime.now().strftime('%Y-%m-%d')
metadata['archive_reason'] = reason
# Write updated file to archive
frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
new_content = f"---\n{frontmatter}---\n{body}"
archive_file.write_text(new_content)
# Remove original
file_path.unlink()
print(f" ✅ Archived: {relative_path} → archive/{category}/{archive_file.name}")
print(f" Reason: {reason}")
return True
except Exception as e:
print(f" ❌ Error archiving {file_path}: {e}")
return False
def scan_and_archive(docs_path: Path, dry_run: bool = False) -> dict:
"""
Scan all documents and archive those that meet criteria.
Returns statistics about the archiving operation.
"""
stats = {
'scanned': 0,
'archived': 0,
'skipped': 0,
'errors': 0
}
skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
skip_dirs = {'archive'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name in skip_dirs or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
stats['scanned'] += 1
# Extract metadata
metadata, _ = extract_frontmatter(md_file)
file_stats = md_file.stat()
file_modified = datetime.fromtimestamp(file_stats.st_mtime)
# Check if should archive
should_arch, reason = should_archive(metadata, category_name, file_modified)
if should_arch:
success = archive_document(md_file, docs_path, reason, dry_run)
if success:
stats['archived'] += 1
else:
stats['errors'] += 1
else:
stats['skipped'] += 1
return stats
def main():
"""Main entry point."""
dry_run = '--dry-run' in sys.argv
# Get base path
args = [arg for arg in sys.argv[1:] if not arg.startswith('--')]
if args:
base_path = Path(args[0]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
sys.exit(1)
print(f"Scanning documents in: {docs_path}")
if dry_run:
print("🔍 DRY RUN MODE - No files will be modified")
print()
# Scan and archive
stats = scan_and_archive(docs_path, dry_run)
print()
print("=" * 60)
print("Archive Summary:")
print(f" Documents scanned: {stats['scanned']}")
print(f" Documents archived: {stats['archived']}")
print(f" Documents skipped: {stats['skipped']}")
print(f" Errors: {stats['errors']}")
print()
if not dry_run and stats['archived'] > 0:
print("💡 Tip: Run 'python scripts/index_docs.py' to update the documentation index")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/index_docs.py

@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Generate and update the INDEX.md file by scanning all documents in docs/.
Reads YAML frontmatter to extract metadata and organize the index.
"""
import os
import sys
import re
from pathlib import Path
from datetime import datetime
from collections import defaultdict
import yaml
def extract_frontmatter(file_path: Path) -> dict:
"""Extract YAML frontmatter from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if not match:
return {}
frontmatter_text = match.group(1)
metadata = yaml.safe_load(frontmatter_text)
return metadata if isinstance(metadata, dict) else {}
except Exception as e:
print(f"⚠️ Warning: Could not parse frontmatter in {file_path}: {e}")
return {}
def get_file_stats(file_path: Path) -> dict:
"""Get file statistics."""
stats = file_path.stat()
return {
'size': stats.st_size,
'modified': datetime.fromtimestamp(stats.st_mtime)
}
def scan_documents(docs_path: Path) -> dict:
"""Scan all markdown documents in docs/ and extract metadata."""
categories = defaultdict(list)
    # Skip generated/placeholder files; archive/ is scanned too, so that
    # INDEX.md includes archived documents
    skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
# Extract metadata
metadata = extract_frontmatter(md_file)
stats = get_file_stats(md_file)
# Build document entry
relative_path = md_file.relative_to(docs_path)
doc_entry = {
'path': str(relative_path),
'title': metadata.get('title', md_file.stem),
'status': metadata.get('status', 'unknown'),
                'created': str(metadata.get('created', 'unknown')),
                # Normalize to ISO strings: YAML yields date objects, fallbacks are
                # strings, and a mixed-type sort key would raise TypeError
                'last_updated': str(metadata.get('last_updated', stats['modified'].strftime('%Y-%m-%d'))),
'tags': metadata.get('tags', []),
'category': category_name,
'file_modified': stats['modified']
}
categories[category_name].append(doc_entry)
return categories
def generate_index(categories: dict) -> str:
"""Generate the INDEX.md content."""
total_docs = sum(len(docs) for docs in categories.values())
index_lines = [
"# Documentation Index",
"",
f"Auto-generated index of all documents. Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
"",
"Run `python scripts/index_docs.py` to regenerate this index.",
"",
"---",
"",
"## Summary",
"",
f"Total documents: {total_docs}",
""
]
# Add category breakdown
if categories:
index_lines.append("By category:")
for category in sorted(categories.keys()):
count = len(categories[category])
index_lines.append(f"- **{category}**: {count} document{'s' if count != 1 else ''}")
index_lines.append("")
index_lines.append("---")
index_lines.append("")
# Add documents by category
if not categories:
index_lines.append("_No documents found. Add documents to the category directories and regenerate the index._")
else:
for category in sorted(categories.keys()):
docs = categories[category]
docs.sort(key=lambda d: d['last_updated'], reverse=True)
index_lines.append(f"## {category.replace('_', ' ').title()}")
index_lines.append("")
for doc in docs:
# Format: [Title](path) - status | updated: date | tags
title_link = f"[{doc['title']}]({doc['path']})"
status_badge = f"**{doc['status']}**"
updated = f"updated: {doc['last_updated']}"
tags = f"tags: [{', '.join(doc['tags'])}]" if doc['tags'] else ""
parts = [title_link, status_badge, updated]
if tags:
parts.append(tags)
index_lines.append(f"- {' | '.join(parts)}")
index_lines.append("")
return '\n'.join(index_lines)
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
print("Run 'python scripts/init_docs_structure.py' first to initialize the structure.")
sys.exit(1)
print(f"Scanning documents in: {docs_path}")
# Scan all documents
categories = scan_documents(docs_path)
# Generate index content
index_content = generate_index(categories)
# Write INDEX.md
index_path = docs_path / 'INDEX.md'
index_path.write_text(index_content)
total_docs = sum(len(docs) for docs in categories.values())
print(f"✅ Generated index with {total_docs} documents")
print(f"✅ Updated: {index_path}")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/init_docs_structure.py

@@ -0,0 +1,156 @@
#!/usr/bin/env python3
"""
Initialize the docs/ directory structure for document lifecycle management.
Creates all required directories and initial README.md.
"""
import os
import sys
from pathlib import Path
from datetime import datetime
DIRECTORY_STRUCTURE = {
'ai_docs': 'Reference materials for Claude Code: SDKs, API docs, repo context',
'specs': 'Feature and migration specifications',
'analysis': 'Investigation outputs: bug hunting, optimization, cleanup',
'plans': 'Implementation plans from specs, analysis, or ad-hoc tasks',
'templates': 'Reusable document templates',
'archive': 'Historical and completed documents'
}
README_TEMPLATE = """# Documentation Structure
This directory contains project documentation organized by purpose and lifecycle stage.
## Directory Structure
{directory_descriptions}
## Document Lifecycle
Documents follow a lifecycle managed through YAML frontmatter:
1. **Draft** → Document is being created
2. **Active** → Document is current and relevant
3. **Complete** → Work is done, kept for reference
4. **Archived** → Moved to archive/ when no longer relevant
## Metadata Requirements
All documents should include YAML frontmatter:
```yaml
---
title: Document Title
category: specs|analysis|plans|ai_docs|templates
status: draft|active|complete|archived
created: YYYY-MM-DD
last_updated: YYYY-MM-DD
tags: [tag1, tag2]
---
```
See INDEX.md for a complete list of all documents.
## Temporary Documents
Ephemeral/scratch documents should be created in `/tmp` or system temp directories,
NOT in this docs/ directory. The docs/ directory is for persistent documentation only.
---
Last updated: {timestamp}
"""
def create_directory_structure(base_path: Path) -> None:
"""Create the docs directory structure."""
docs_path = base_path / 'docs'
# Create main docs directory
docs_path.mkdir(exist_ok=True)
print(f"✅ Created: {docs_path}")
# Create category directories
for directory, description in DIRECTORY_STRUCTURE.items():
dir_path = docs_path / directory
dir_path.mkdir(exist_ok=True)
print(f"✅ Created: {dir_path}")
# Create .gitkeep for empty directories
gitkeep = dir_path / '.gitkeep'
if not any(dir_path.iterdir()):
gitkeep.touch()
def create_readme(base_path: Path) -> None:
"""Create the README.md file."""
docs_path = base_path / 'docs'
readme_path = docs_path / 'README.md'
# Format directory descriptions
descriptions = []
for directory, description in DIRECTORY_STRUCTURE.items():
descriptions.append(f"- **{directory}/** - {description}")
readme_content = README_TEMPLATE.format(
directory_descriptions='\n'.join(descriptions),
timestamp=datetime.now().strftime('%Y-%m-%d')
)
readme_path.write_text(readme_content)
print(f"✅ Created: {readme_path}")
def create_index(base_path: Path) -> None:
"""Create initial INDEX.md file."""
docs_path = base_path / 'docs'
index_path = docs_path / 'INDEX.md'
index_content = f"""# Documentation Index
Auto-generated index of all documents. Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Run `python scripts/index_docs.py` to regenerate this index.
---
## Summary
Total documents: 0
---
_No documents found. Add documents to the category directories and regenerate the index._
"""
index_path.write_text(index_content)
print(f"✅ Created: {index_path}")
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
print(f"Initializing docs structure at: {base_path}")
print()
create_directory_structure(base_path)
create_readme(base_path)
create_index(base_path)
print()
print("🎉 Documentation structure initialized successfully!")
print()
print("Next steps:")
print("1. Add documents to the category directories")
print("2. Run 'python scripts/index_docs.py' to update the index")
print("3. Run 'python scripts/archive_docs.py' periodically to maintain the archive")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/validate_doc_metadata.py

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Validate that all documents have proper YAML frontmatter metadata.
Reports documents with missing or invalid metadata.
"""
import sys
import re
from pathlib import Path
from datetime import datetime
import yaml
REQUIRED_FIELDS = ['title', 'category', 'status', 'created', 'last_updated']
VALID_STATUSES = ['draft', 'active', 'complete', 'archived']
VALID_CATEGORIES = ['ai_docs', 'specs', 'analysis', 'plans', 'templates', 'archive']
def extract_frontmatter(file_path: Path) -> dict:
"""Extract YAML frontmatter from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if not match:
return None # No frontmatter found
frontmatter_text = match.group(1)
metadata = yaml.safe_load(frontmatter_text)
return metadata if isinstance(metadata, dict) else None
except Exception as e:
return {'_error': str(e)}
def validate_date(date_str: str) -> bool:
"""Validate date format (YYYY-MM-DD)."""
try:
datetime.strptime(str(date_str), '%Y-%m-%d')
return True
except (ValueError, TypeError):
return False
def validate_metadata(metadata: dict, category_from_path: str) -> list[str]:
"""
Validate metadata against requirements.
Returns list of validation errors (empty if valid).
"""
errors = []
if metadata is None:
return ["No YAML frontmatter found"]
if '_error' in metadata:
return [f"Failed to parse frontmatter: {metadata['_error']}"]
# Check required fields
for field in REQUIRED_FIELDS:
if field not in metadata:
errors.append(f"Missing required field: {field}")
# Validate status
if 'status' in metadata:
if metadata['status'] not in VALID_STATUSES:
errors.append(f"Invalid status '{metadata['status']}'. Must be one of: {', '.join(VALID_STATUSES)}")
    # Validate category
    if 'category' in metadata:
        if metadata['category'] not in VALID_CATEGORIES:
            errors.append(f"Invalid category '{metadata['category']}'. Must be one of: {', '.join(VALID_CATEGORIES)}")
        elif category_from_path == 'archive':
            # Archived documents keep their original category; require archived status instead
            if metadata.get('status') != 'archived':
                errors.append("Document is in archive/ but status is not 'archived'")
        elif metadata['category'] != category_from_path:
            errors.append(f"Category mismatch: metadata says '{metadata['category']}' but file is in '{category_from_path}/'")
# Validate dates
for date_field in ['created', 'last_updated']:
if date_field in metadata:
if not validate_date(metadata[date_field]):
errors.append(f"Invalid {date_field} date format. Must be YYYY-MM-DD")
# Validate tags (optional but must be list if present)
if 'tags' in metadata:
if not isinstance(metadata['tags'], list):
errors.append("Tags must be a list")
return errors
def scan_and_validate(docs_path: Path) -> dict:
"""
Scan all documents and validate their metadata.
Returns validation results.
"""
results = {
'valid': [],
'invalid': [],
'no_frontmatter': [],
'total': 0
}
skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
results['total'] += 1
relative_path = md_file.relative_to(docs_path)
# Extract and validate metadata
metadata = extract_frontmatter(md_file)
errors = validate_metadata(metadata, category_name)
if not errors:
results['valid'].append(str(relative_path))
else:
results['invalid'].append({
'path': str(relative_path),
'errors': errors
})
return results
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
sys.exit(1)
print(f"Validating documents in: {docs_path}")
print()
# Scan and validate
results = scan_and_validate(docs_path)
# Display results
print("=" * 60)
print("Validation Results:")
print(f" Total documents: {results['total']}")
print(f" ✅ Valid: {len(results['valid'])}")
print(f" ❌ Invalid: {len(results['invalid'])}")
print()
if results['invalid']:
print("Invalid Documents:")
print()
for item in results['invalid']:
print(f" 📄 {item['path']}")
for error in item['errors']:
print(f"{error}")
print()
if results['valid'] and not results['invalid']:
print("🎉 All documents have valid metadata!")
# Exit with error code if any invalid documents
sys.exit(1 if results['invalid'] else 0)
if __name__ == '__main__':
main()