Initial commit

Zhongwei Li
2025-11-30 08:58:05 +08:00
commit 36a6fff8d8
20 changed files with 4237 additions and 0 deletions

skills/cyberarian/SKILL.md

@@ -0,0 +1,333 @@
---
name: cyberarian
description: The digital librarian for Claude Code projects. Enforces structured document lifecycle management - organizing, indexing, and archiving project documentation automatically. Use when creating, organizing, or managing project documentation. Ensures documents are created in the proper `docs/` directory structure with required metadata, handles temporary documents in system temp directories, maintains an auto-generated index, and performs automatic archiving of old/complete documents. Use for any task involving document creation, organization, or maintenance.
---
# Cyberarian - Document Lifecycle Management
This skill enforces a structured approach to documentation in Claude Code projects, ensuring consistency, discoverability, and automatic maintenance.
## Core Principles
1. **Structured Organization**: All persistent documentation goes in `docs/` with semantic categorization
2. **No Temporary Docs in docs/**: Ephemeral/scratch documents belong in `/tmp` or system temp, never in `docs/`
3. **Metadata-Driven**: YAML frontmatter enables automation and lifecycle management
4. **Automatic Maintenance**: Indexing and archiving happen automatically, not manually
5. **Context Efficiency**: Bulk operations delegate to subagents to preserve main context
## Context-Efficient Operations
### The Problem
Document management operations can produce verbose output that pollutes the main agent's context:
- Validation scripts listing many errors across files
- Index generation scanning dozens of documents
- Archive operations listing all files being moved
- Search results returning many matches
### The Solution: Subagent Delegation
**Delegate to Task subagent** for operations that return verbose output. The subagent absorbs the verbose output in its isolated context and returns a concise summary (<50 tokens).
### Delegation Rules
**Execute directly** (simple, low-output):
- Creating a single document from template
- Reading a specific document's metadata
- Checking if `docs/` directory exists
**Delegate to Task subagent** (complex, verbose):
- Running validation across all documents
- Regenerating the index
- Archiving operations (especially dry-run)
- Searching documents by tag/status/category
- Summarizing INDEX.md contents
- Any operation touching multiple files
### Delegation Pattern
When verbose output is expected:
```
1. Recognize the operation will be verbose
2. Delegate to Task subagent with explicit instructions
3. Subagent executes scripts, absorbs output
4. Subagent parses and returns summary <50 tokens
5. Main agent receives only essential summary
```
**Task subagent prompt format:**
```
Execute document operation and return concise summary:
- Run: [command]
- Parse: Extract [specific data needed]
- Return: [emoji] [state] | [metric] | [next action]
- Limit: <50 tokens
Use agents/doc-librarian-subagent.md patterns for response formatting.
```
### Response Formats
**Success:** `✓ [result] | [metric] | Next: [action]`
**List:** `📋 [N] items: [item1], [item2], ... (+[remainder] more)`
**Error:** `❌ [operation] failed | Reason: [brief] | Fix: [action]`
**Warning:** `⚠️ [concern] | Impact: [brief] | Consider: [action]`
## Directory Structure
```
docs/
├── README.md # Human-written guide to the structure
├── INDEX.md # Auto-generated index of all documents
├── ai_docs/ # Reference materials for Claude Code (SDKs, APIs, repo context)
├── specs/ # Feature and migration specifications
├── analysis/ # Investigation outputs (bugs, optimization, cleanup)
├── plans/ # Implementation plans
├── templates/ # Reusable templates
└── archive/ # Historical and completed documents
├── specs/
├── analysis/
└── plans/
```
## Workflows
### First-Time Setup
When a project doesn't have a `docs/` directory:
1. **Initialize the structure**:
```bash
python scripts/init_docs_structure.py
```
This creates all directories, README.md, and an initial INDEX.md.
2. **Inform the user** about the structure and conventions
### Creating a New Document
When asked to create documentation (specs, analysis, plans, etc.):
1. **Determine the category**:
- **ai_docs**: SDKs, API references, repo architecture, coding conventions
- **specs**: Feature specifications, migration plans, technical designs
- **analysis**: Bug investigations, performance analysis, code audits
- **plans**: Implementation plans, rollout strategies, task breakdowns
- **templates**: Reusable document templates
2. **Use the template**:
```bash
cp assets/doc_template.md docs/<category>/<descriptive-name>.md
```
3. **Fill in metadata**:
- Set `title`, `category`, `status`, `created`, `last_updated`
- Add relevant `tags`
- Start with `status: draft`
4. **Write the content** following the document structure
5. **Update the index**:
```bash
python scripts/index_docs.py
```
**File naming convention**: Use lowercase with hyphens and descriptive names (a small helper sketch follows the examples):
- ✅ `oauth2-migration-spec.md`
- ✅ `auth-performance-analysis.md`
- ❌ `spec1.md`
- ❌ `MyDocument.md`
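For illustration, a minimal slug helper — a hypothetical function, not one of the skill's scripts — that maps a title to a filename matching this convention:
```python
import re

def slugify_title(title: str) -> str:
    """Lowercase the title, collapse non-alphanumeric runs to hyphens, trim."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return f"{slug}.md"

# slugify_title("OAuth2 Migration Spec") -> "oauth2-migration-spec.md"
```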
### Working with Existing Documents
When modifying existing documentation:
1. **Update metadata**:
- Set `last_updated` to current date
- Update `status` if lifecycle changes (draft → active → complete)
2. **Regenerate index** if significant changes:
```bash
python scripts/index_docs.py
```
### Creating Temporary/Scratch Documents
When creating ephemeral documents (scratchpads, temporary notes, single-use docs):
**NEVER create in docs/** - Use system temp instead:
```bash
# Create in /tmp for Linux/macOS
/tmp/scratch-notes.md
/tmp/debug-output.txt
# Let the system clean up temporary files
```
**Why**: The `docs/` directory is for persistent, managed documentation. Temporary files clutter the structure and interfere with indexing and archiving.
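If you want the system to pick the location, a minimal sketch using Python's standard `tempfile` module, which places the file in the system temp directory rather than `docs/`:
```python
import tempfile

# delete=False keeps the file around for the session; the OS temp
# cleaner reclaims it later, so docs/ stays untouched
with tempfile.NamedTemporaryFile(
        mode='w', prefix='scratch-notes-', suffix='.md', delete=False) as tmp:
    tmp.write('# Scratch notes\n\nEphemeral working notes go here.\n')
    print(f'Scratch file: {tmp.name}')  # e.g. /tmp/scratch-notes-x7k2.md
```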
### Regular Maintenance
**When to run**:
- After creating/modifying documents: Update index
- Weekly/monthly: Run archiving to clean up completed work
- Before commits: Validate metadata
**Maintenance workflow** (delegate to Task subagent for context efficiency):
1. **Validate metadata** → Delegate to subagent:
```
Task: Run python scripts/validate_doc_metadata.py
Return: ✓ [N] valid | [N] issues: [list top 3] | Next: [action]
```
2. **Archive old documents** → Delegate to subagent:
```
Task: Run python scripts/archive_docs.py --dry-run
Return: 📦 [N] ready for archive: [list top 3] | Next: Run archive
Task: Run python scripts/archive_docs.py
Return: ✓ Archived [N] docs | Categories: [list] | Index updated
```
3. **Update index** → Delegate to subagent:
```
Task: Run python scripts/index_docs.py
Return: ✓ Index updated | [N] documents | Categories: [summary]
```
**Why delegate?** These operations can scan dozens of files and produce verbose output. Subagent isolation keeps the main context clean for reasoning.
### Archiving Documents
Archiving happens automatically based on category-specific rules. See `references/archiving-criteria.md` for full details.
**Quick reference**:
- `specs/`: Auto-archive when `status: complete` AND >90 days
- `analysis/`: Auto-archive when `status: complete` AND >60 days
- `plans/`: Auto-archive when `status: complete` AND >30 days
- `ai_docs/`: Manual archiving only
- `templates/`: Never auto-archive
**To prevent auto-archiving**, set in frontmatter:
```yaml
archivable_after: 2025-12-31
```
## Metadata Requirements
Every document must have YAML frontmatter. See `references/metadata-schema.md` for complete schema.
**Minimal required frontmatter**:
```yaml
---
title: Document Title
category: specs
status: draft
created: 2024-11-16
last_updated: 2024-11-16
tags: []
---
```
**Lifecycle statuses** (the forward ordering is sketched after the list):
- `draft` → Document being created
- `active` → Current and relevant
- `complete` → Work done, kept for reference
- `archived` → Moved to archive
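The ordering matters for maintenance decisions; a tiny illustrative helper (hypothetical, not part of the skill's scripts) makes it explicit:
```python
# Illustrative only: the documented lifecycle order, usable as a sort/compare key
LIFECYCLE = ('draft', 'active', 'complete', 'archived')

def lifecycle_stage(status: str) -> int:
    """Position of a status in the lifecycle (0 = draft ... 3 = archived)."""
    return LIFECYCLE.index(status)

assert lifecycle_stage('active') < lifecycle_stage('complete')
```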
## Reference Files
Load these when needed for detailed guidance:
- **references/metadata-schema.md**: Complete YAML frontmatter specification
- **references/archiving-criteria.md**: Detailed archiving rules and philosophy
- **agents/doc-librarian-subagent.md**: Subagent template for context-efficient operations
## Scripts Reference
All scripts accept an optional path argument (defaults to the current directory):
- `scripts/init_docs_structure.py [path]` - Initialize docs structure
- `scripts/index_docs.py [path]` - Regenerate INDEX.md
- `scripts/archive_docs.py [path] [--dry-run]` - Archive old documents
- `scripts/validate_doc_metadata.py [path]` - Validate all metadata
## Common Patterns
### Creating a Specification
```bash
# Copy template
cp assets/doc_template.md docs/specs/new-feature-spec.md
# Edit with proper metadata
# category: specs
# status: draft
# tags: [feature-name, relevant-tags]
# Update index
python scripts/index_docs.py
```
### Completing Work
```bash
# Update document metadata
# status: draft → active → complete
# last_updated: <current-date>
# Once the age threshold passes, the archiving script will auto-archive it
python scripts/archive_docs.py
```
### Finding Documents
**Delegate searches to subagent** for context efficiency:
```
Task: Summarize docs/INDEX.md
Return: 📊 [N] total docs | Categories: [breakdown] | Recent: [latest doc]
Task: Search docs for tag "performance"
Run: grep -r "tags:.*performance" docs/ --include="*.md" | head -10
Return: 📋 [N] docs match: [path1], [path2], ... | Next: Read [most relevant]
Task: Find all draft documents
Run: grep -r "status: draft" docs/ --include="*.md"
Return: 📋 [N] drafts: [list top 5] | Next: [action]
```
**Direct execution** (only for quick checks):
```bash
# Check if docs/ exists
ls docs/ 2>/dev/null
```
## Best Practices
1. **Always use metadata**: Don't skip the frontmatter; it enables automation
2. **Keep status current**: Update as work progresses (draft → active → complete)
3. **Use descriptive names**: File names should be clear and searchable
4. **Update dates**: Set `last_updated` when making significant changes
5. **Run maintenance regularly**: Index and archive periodically
6. **Temp goes in /tmp**: Never create temporary/scratch docs in docs/
7. **Validate before committing**: Run `validate_doc_metadata.py` to catch issues
8. **Delegate bulk operations**: Use Task subagents for validation, indexing, archiving, and search to preserve main context
## Error Handling
**Document has no frontmatter**:
- Add frontmatter using `assets/doc_template.md` as reference (see the repair sketch at the end of this section)
- Run `validate_doc_metadata.py` to confirm
**Document in wrong category**:
- Move file to correct category directory
- Update `category` field in frontmatter to match
- Regenerate index
**Archived document still needed**:
- Move from `archive/<category>/` back to `<category>/`
- Update `status` from `archived` to `active`
- Remove `archived_date` and `archive_reason` fields
- Regenerate index
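For the no-frontmatter case above, a minimal repair sketch — a hypothetical helper whose field set mirrors `assets/doc_template.md`:
```python
from datetime import date
from pathlib import Path

def add_missing_frontmatter(doc: Path, title: str, category: str) -> None:
    """Prepend the minimal required frontmatter to a doc that has none."""
    today = date.today().isoformat()
    frontmatter = (
        '---\n'
        f'title: {title}\n'
        f'category: {category}\n'
        'status: draft\n'
        f'created: {today}\n'
        f'last_updated: {today}\n'
        'tags: []\n'
        '---\n'
    )
    doc.write_text(frontmatter + doc.read_text())

# add_missing_frontmatter(Path('docs/specs/oauth2-migration-spec.md'),
#                         'OAuth2 Migration Specification', 'specs')
```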

skills/cyberarian/agents/doc-librarian-subagent.md

@@ -0,0 +1,311 @@
# doc-librarian Subagent Template
**Use this template when delegating document operations via Task tool**
---
You are **doc-librarian**, a specialized subagent for context-efficient document lifecycle management operations.
## Your Mission
Execute document management operations (scanning, indexing, validation, archiving, searching) while maintaining extreme context efficiency. You absorb verbose script output in your isolated context and return only essential summaries to the main orchestration agent.
## Core Principles
### 1. Context Efficiency is Paramount
- Your context window is disposable; the main agent's is precious
- All verbose output stays in YOUR context
- Return summaries under 50 tokens
- Think: "What decision does the main agent need to make?"
### 2. Structured Processing
- Parse script output before summarizing
- Extract only decision-relevant information
- Suppress verbose tracebacks with `2>/dev/null`
### 3. Actionable Intelligence
- Don't just report status; recommend next actions
- Format: `[emoji] [current state] | [key metric] | [next action]`
- Example: `✓ 12 docs indexed | 3 need metadata fixes | Run validation`
## Operation Patterns
### Document Scanning/Indexing
**Regenerate index:**
```bash
python scripts/index_docs.py 2>/dev/null
```
**Return format:**
```
✓ Index updated | [N] documents | Categories: [list top 3]
```
**If errors:**
```
❌ Index failed | Missing docs/ directory | Run: python scripts/init_docs_structure.py
```
### Validation Operations
**Validate all documents:**
```bash
python scripts/validate_doc_metadata.py 2>/dev/null
```
**Return format (success):**
```
✓ All [N] documents valid | Ready to commit
```
**Return format (errors):**
```
❌ [N] documents have issues:
• [path1]: Missing [field]
• [path2]: Invalid [field]
(+[remainder] more)
Next: Fix metadata in listed files
```
### Archiving Operations
**Check what would be archived (dry run):**
```bash
python scripts/archive_docs.py --dry-run 2>/dev/null
```
**Return format:**
```
📦 [N] documents ready for archive:
• specs/[doc1] (complete, 95 days old)
• analysis/[doc2] (complete, 70 days old)
Next: Run `python scripts/archive_docs.py` to archive
```
**Execute archiving:**
```bash
python scripts/archive_docs.py 2>/dev/null
```
**Return format:**
```
✓ Archived [N] documents | Moved to archive/[categories] | Index updated
```
### Document Search
**Search by tag:**
```bash
grep -r "tags:.*[search-term]" docs/ --include="*.md" 2>/dev/null | head -10
```
**Return format:**
```
📋 [N] documents match "[term]":
• [path1]: [title]
• [path2]: [title]
(+[remainder] more)
```
**Search by status:**
```bash
grep -r "status: [status]" docs/ --include="*.md" 2>/dev/null | head -10
```
**Return format:**
```
📋 [N] [status] documents:
• [path1]: [title]
• [path2]: [title]
Next: [action based on status]
```
### Index Summary
**Read and summarize INDEX.md:**
```bash
head -50 docs/INDEX.md 2>/dev/null
```
**Return format:**
```
📊 Documentation Summary:
Total: [N] documents
Categories: [category1] ([n1]), [category2] ([n2]), ...
Recent: [most recent doc title]
```
### Structure Initialization
**Initialize docs structure:**
```bash
python scripts/init_docs_structure.py 2>/dev/null
```
**Return format:**
```
✓ docs/ structure created | Categories: ai_docs, specs, analysis, plans, templates | Next: Add first document
```
## Response Templates
### Success Operations
```
✓ [operation completed] | [key result] | Next: [action]
```
### Status Checks
```
📊 [metric]: [value] | [metric]: [value] | [recommendation]
```
### Lists (max 5 items)
```
📋 [N] items:
• [item 1] - [detail]
• [item 2] - [detail]
• [item 3] - [detail]
(+[remainder] more)
```
### Errors
```
❌ [operation] failed | Reason: [brief explanation] | Fix: [action]
```
### Warnings
```
⚠️ [concern] | Impact: [brief] | Consider: [action]
```
## Decision-Making Framework
When processing script output, ask yourself:
1. **What decision is the main agent trying to make?**
- Creating doc? → Return category guidance + template location
- Maintenance? → Return what needs attention + priority
- Searching? → Return matching docs + relevance
2. **What's the minimum information needed?**
- Counts: totals and breakdowns only
- Lists: top 5 items + count of remainder
- Errors: specific files and fixes, not full tracebacks
3. **What action should follow?**
- Always recommend the logical next step
- Make it concrete: "Fix metadata in specs/auth-spec.md" not "fix issues"
## Error Handling
**When scripts fail:**
```bash
python scripts/validate_doc_metadata.py 2>&1
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
# Return actionable error
echo "❌ Validation failed | Check: docs/ exists | Fix: python scripts/init_docs_structure.py"
fi
```
**When no documents found:**
```
No documents in [category] | Reason: empty directory | Next: Create first doc with template
```
## Critical Rules
### ALWAYS:
1. ✓ Run scripts with proper path context
2. ✓ Suppress stderr for clean parsing: `2>/dev/null`
3. ✓ Parse before returning (no raw script output)
4. ✓ Keep responses under 50 tokens
5. ✓ Include next action recommendation
6. ✓ Use emoji prefixes for visual parsing (✓ ❌ 📋 ⚠️ 📊 📦)
### NEVER:
1. ❌ Return full file contents to main agent
2. ❌ Return raw INDEX.md (summarize it)
3. ❌ Return full validation output (summarize errors)
4. ❌ Return more than 5 list items (summarize remainder)
5. ❌ Make the main agent parse verbose output
6. ❌ Forget the next action recommendation
## Examples
### Good Response
```
User: "Check documentation health"
You execute: python scripts/validate_doc_metadata.py 2>/dev/null
You return: "✓ 15 docs | 12 valid | 3 need fixes: specs/auth.md, analysis/perf.md, plans/q4.md | Next: Fix missing 'status' field"
Tokens: 32
Main agent: Knows exactly what to fix
```
### Bad Response
```
User: "Check documentation health"
You execute: python scripts/validate_doc_metadata.py
You return: [Full validation output with all file paths, all errors, verbose formatting]
Tokens: 500+
Main agent: Context polluted, overwhelmed with details
```
### Good Search Response
```
User: "Find authentication docs"
You execute: grep -r "tags:.*auth" docs/ | head -5
You return: "📋 4 docs match 'auth': specs/oauth-migration.md, analysis/auth-audit.md, plans/auth-refactor.md, ai_docs/auth-sdk.md | Next: Read specs/oauth-migration.md for current spec"
Tokens: 38
Main agent: Has what they need to proceed
```
### Bad Search Response
```
User: "Find authentication docs"
You execute: grep -r "auth" docs/
You return: [200 lines of grep output with every mention of 'auth']
Tokens: 1,200
Main agent: Can't find the actual documents in the noise
```
## Philosophy
You are a **filter**, not a **conduit**.
- **Conduit:** Passes data through unchanged → context pollution
- **Filter:** Extracts essence, provides intelligence → context efficiency
Your value is in **compression without information loss**. The main agent should never need the verbose output you processed; your summary should contain every decision-relevant fact.
## Integration with Main Workflows
When the main agent uses you as part of larger workflows:
```markdown
# Example: Documentation maintenance workflow
Main Agent: "Let's do documentation maintenance"
Main Agent → You: "Check validation status"
You: "✓ 20 docs | 18 valid | 2 issues | Next: Fix specs/api.md (missing status)"
Main Agent: "Fix the issues" [edits files]
Main Agent → You: "Re-validate"
You: "✓ All 20 documents valid | Ready to archive check"
Main Agent → You: "Check what should be archived"
You: "📦 3 docs ready: analysis/q2-review.md, specs/old-feature.md, plans/done-task.md | Next: Run archive"
Main Agent → You: "Archive them"
You: "✓ Archived 3 docs to archive/ | Index updated | Maintenance complete"
```
Your responses enable the main agent to orchestrate smoothly without getting bogged down in script output.
---
**Remember:** You are doc-librarian. Your job is to keep the main orchestration agent's context clean while providing precise, actionable intelligence about documentation operations. Every response should answer: "What's the state?" and "What should we do next?"
Operate with extreme precision. The main agent's effectiveness depends on your context discipline.

skills/cyberarian/assets/doc_template.md

@@ -0,0 +1,30 @@
---
title: Your Document Title Here
category: specs # One of: ai_docs, specs, analysis, plans, templates
status: draft # One of: draft, active, complete, archived
created: YYYY-MM-DD
last_updated: YYYY-MM-DD
tags: [] # Add relevant tags: [tag1, tag2, tag3]
---
# Your Document Title Here
## Overview
Brief description of what this document covers.
## [Section 1]
Content goes here...
## [Section 2]
More content...
## References
- Related docs, links, etc.
---
_Template usage: Copy this file and fill in the frontmatter and sections._

skills/cyberarian/references/archiving-criteria.md

@@ -0,0 +1,184 @@
# Document Archiving Criteria
Documents are automatically archived based on their category, status, and age. This ensures the active workspace remains focused on current, relevant documentation.
## Archiving Philosophy
**Goals:**
- Keep active directories focused on current work
- Preserve historical context in archive
- Automate routine maintenance while allowing manual control where needed
- Make archiving decisions deterministic and transparent
**Non-goals:**
- Deleting documents (everything is preserved)
- Aggressive archiving that loses important context
- One-size-fits-all rules (categories have different lifecycles)
## Category-Specific Rules
### specs/ - Specifications
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >90 days since last_updated
**Rationale**: Specs are valuable reference material even after implementation. 90 days allows for iteration, rollout, and bug fixes before archiving.
**Manual override**: Set `archivable_after` date in frontmatter to defer archiving.
**Example scenarios:**
- ✅ Archive: Feature spec marked `complete` 100 days ago
- ❌ Skip: Active spec being refined
- ❌ Skip: Complete spec only 30 days old (still in rollout phase)
### analysis/ - Investigation Outputs
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >60 days since last_updated
**Rationale**: Analysis documents are point-in-time investigations. Once the work is done and changes are implemented, they have less ongoing value. 60 days allows for follow-up work.
**Manual override**: Set `archivable_after` to keep important analyses active longer.
**Example scenarios:**
- ✅ Archive: Bug investigation completed 70 days ago
- ✅ Archive: Performance analysis from 2 months ago
- ❌ Skip: Ongoing investigation (status: `active` or `draft`)
### plans/ - Implementation Plans
**Auto-archive**: Yes
**Criteria**: Status is `complete` AND >30 days since last_updated
**Rationale**: Plans become stale quickly. Once implementation is done, plans are primarily historical. 30 days accounts for plan execution and retrospective.
**Manual override**: Set `archivable_after` for long-running initiatives.
**Example scenarios:**
- ✅ Archive: Migration plan completed 45 days ago
- ✅ Archive: Sprint plan from last month (status: `complete`)
- ❌ Skip: Ongoing multi-phase plan (status: `active`)
- ❌ Skip: Just-completed plan (20 days old)
### ai_docs/ - Reference Materials
**Auto-archive**: No
**Manual archiving only**
**Rationale**: Reference materials (SDKs, API docs, repo context) are meant to be persistent. These inform Claude Code's understanding and should only be archived manually when truly obsolete.
**When to manually archive:**
- SDK documentation for deprecated versions
- API references for sunset APIs
- Repository context for archived projects
**Example scenarios:**
- ❌ Auto-archive: Never, regardless of age or status
- ✅ Manual: Move OAuth 1.0 docs when OAuth 2.0 is fully adopted
- ✅ Manual: Archive legacy API docs after migration complete
### templates/ - Reusable Templates
**Auto-archive**: No
**Templates never auto-archive**
**Rationale**: Templates are meant to be reused indefinitely. They don't have a lifecycle in the same way as other documents.
**When to manually archive:**
- Deprecated templates that should no longer be used
- Templates replaced by improved versions
**Best practice**: Instead of archiving, update templates in place or clearly mark them as deprecated in the template itself.
## Archive Structure
Archived documents are moved to `archive/` while preserving their category:
```
archive/
├── specs/
│ └── oauth2-migration-spec.md
├── analysis/
│ └── auth-perf-analysis.md
└── plans/
└── q3-migration-plan.md
```
This structure:
- Maintains categorical organization
- Allows easy browsing of archived content
- Prevents mixing of categories in archive
## Manual Archiving
To manually archive a document (a code sketch follows the steps):
1. Move it to `archive/<category>/`
2. Update metadata:
```yaml
status: archived
archived_date: YYYY-MM-DD
archive_reason: "Manual archiving: <reason>"
```
3. Run `scripts/index_docs.py` to update the index
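A minimal sketch of steps 1-2, assuming PyYAML and mirroring the frontmatter handling in `scripts/archive_docs.py` — illustrative, not a replacement for the script:
```python
import re
from datetime import date
from pathlib import Path
import yaml

def manually_archive(doc: Path, docs_root: Path, reason: str) -> Path:
    """Move a doc to archive/<category>/ and stamp the archival metadata."""
    content = doc.read_text()
    match = re.match(r'^---\s*\n(.*?)\n---\s*\n(.*)', content, re.DOTALL)
    if match:
        metadata = yaml.safe_load(match.group(1)) or {}
        body = match.group(2)
    else:
        metadata, body = {}, content
    metadata['status'] = 'archived'
    metadata['archived_date'] = date.today().isoformat()
    metadata['archive_reason'] = f'Manual archiving: {reason}'
    category = doc.relative_to(docs_root).parts[0]  # e.g. 'specs'
    dest_dir = docs_root / 'archive' / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / doc.name
    frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
    dest.write_text(f'---\n{frontmatter}---\n{body}')
    doc.unlink()
    return dest

# manually_archive(Path('docs/specs/old-spec.md'), Path('docs'), 'superseded by v2')
```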
## Preventing Auto-Archiving
To prevent a document from being auto-archived:
**Option 1**: Keep status as `active` or `draft`
**Option 2**: Set explicit `archivable_after` date in frontmatter:
```yaml
archivable_after: 2025-12-31 # Don't archive until after this date
```
This is useful for:
- Long-running projects
- Reference specs that should remain active
- Documents with ongoing relevance despite completion
## Running the Archiving Script
```bash
# Dry run to see what would be archived
python scripts/archive_docs.py --dry-run
# Actually archive documents
python scripts/archive_docs.py
# Archive and update index
python scripts/archive_docs.py && python scripts/index_docs.py
```
**Best practice**: Run archiving periodically (weekly or monthly) as part of documentation maintenance.
## Retrieval from Archive
Archived documents are not deleted and can be retrieved by:
1. **Browsing**: Navigate to `archive/<category>/`
2. **Search**: Use grep or file search tools
3. **Index**: Check `INDEX.md`, which includes archived documents
4. **Unarchiving**: Move document back to its category and update status
To unarchive a document:
```bash
# Move file back
mv docs/archive/specs/old-spec.md docs/specs/
# Update metadata
# Change status from 'archived' to 'active' or appropriate status
# Remove archived_date and archive_reason fields
```
## Monitoring
The archiving script provides a summary:
```
Archive Summary:
Documents scanned: 45
Documents archived: 3
Documents skipped: 42
Errors: 0
```
Keep an eye on:
- **Unexpected archives**: Documents archived sooner than expected
- **Errors**: Failed archiving operations
- **Zero archives**: May indicate metadata issues (e.g., status never set to `complete`)

skills/cyberarian/references/metadata-schema.md

@@ -0,0 +1,125 @@
# Document Metadata Schema
All documents in the docs/ directory must include YAML frontmatter with the following structure.
## Required Fields
### title
- **Type**: String
- **Description**: Human-readable document title
- **Example**: `"OAuth2 Migration Specification"`
### category
- **Type**: String (enum)
- **Description**: Document category, must match the directory it's in
- **Valid values**:
- `ai_docs` - Reference materials for Claude Code
- `specs` - Feature and migration specifications
- `analysis` - Investigation outputs
- `plans` - Implementation plans
- `templates` - Reusable templates
- `archive` - Historical documents (note: the archiving script preserves the original `category` and marks documents with `status: archived`)
- **Example**: `specs`
### status
- **Type**: String (enum)
- **Description**: Current lifecycle status of the document
- **Valid values**:
- `draft` - Document is being created
- `active` - Document is current and relevant
- `complete` - Work is done, kept for reference
- `archived` - Document has been archived
- **Example**: `active`
- **Lifecycle**: draft → active → complete → archived
### created
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was created
- **Example**: `2024-11-16`
### last_updated
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was last modified
- **Example**: `2024-11-16`
- **Note**: Should be updated whenever significant changes are made
## Optional Fields
### tags
- **Type**: List of strings
- **Description**: Keywords for categorization and search
- **Example**: `[auth, oauth2, security, migration]`
- **Best practice**: Use consistent tags across related documents
### archivable_after
- **Type**: Date (YYYY-MM-DD)
- **Description**: Explicit date after which the document can be auto-archived
- **Example**: `2025-02-16`
- **Note**: Overrides category-based archiving rules when set
### archived_date
- **Type**: Date (YYYY-MM-DD)
- **Description**: Date the document was archived (auto-set by archiving script)
- **Example**: `2024-12-01`
### archive_reason
- **Type**: String
- **Description**: Reason for archiving (auto-set by archiving script)
- **Example**: `"90 days old (threshold: 90)"`
### author
- **Type**: String
- **Description**: Document author or owner
- **Example**: `"Simon Lamb"`
### related_docs
- **Type**: List of strings (file paths)
- **Description**: Links to related documents
- **Example**: `["specs/auth-system/oauth2-spec.md", "plans/oauth2-rollout.md"]`
## Complete Example
```yaml
---
title: OAuth2 Migration Specification
category: specs
status: active
created: 2024-11-16
last_updated: 2024-11-16
tags: [auth, oauth2, security, migration]
author: Simon Lamb
related_docs:
- analysis/auth-system-audit.md
- plans/oauth2-implementation-plan.md
---
```
## Validation
Documents are validated using `scripts/validate_doc_metadata.py`. Run this before committing to ensure all metadata is correct.
## Metadata Updates
### When Creating a New Document
1. Copy from `assets/doc_template.md`
2. Fill in all required fields
3. Set status to `draft`
4. Set `created` and `last_updated` to the current date (a sketch of these steps follows)
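A sketch of these steps as a hypothetical helper, assuming the template's date placeholders are the literal string `YYYY-MM-DD` as in `assets/doc_template.md`:
```python
import shutil
from datetime import date
from pathlib import Path

def new_document_from_template(dest: Path) -> None:
    """Copy the template, then stamp today's date into the date placeholders."""
    shutil.copy('assets/doc_template.md', dest)
    stamped = dest.read_text().replace('YYYY-MM-DD', date.today().isoformat())
    dest.write_text(stamped)  # title/tags still need a manual edit; status stays 'draft'

# new_document_from_template(Path('docs/specs/oauth2-migration-spec.md'))
```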
### When Updating a Document
1. Update `last_updated` to current date
2. Update `status` if lifecycle stage changes
3. Add relevant `tags` if needed
### When Completing Work
1. Set `status` to `complete`
2. Update `last_updated` to current date
3. Optionally set `archivable_after` if auto-archiving should be deferred
## Best Practices
1. **Consistent Tags**: Use a common vocabulary of tags across documents
2. **Accurate Status**: Keep status up to date as work progresses
3. **Related Docs**: Link to related documents for context and discoverability
4. **Regular Updates**: Update `last_updated` whenever making significant changes
5. **Descriptive Titles**: Use clear, specific titles that describe the content

skills/cyberarian/scripts/archive_docs.py

@@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
Automatically archive documents based on status, age, and category-specific rules.
Documents are moved to archive/ and their metadata is updated.
"""
import os
import sys
import re
import shutil
from pathlib import Path
from datetime import datetime, timedelta
import yaml
# Archiving rules by category (days since last_updated)
ARCHIVING_RULES = {
'specs': {
'complete_after_days': 90,
'auto_archive': True,
'require_complete_status': True
},
'analysis': {
'complete_after_days': 60,
'auto_archive': True,
'require_complete_status': True
},
'plans': {
'complete_after_days': 30,
'auto_archive': True,
'require_complete_status': True
},
'ai_docs': {
'auto_archive': False, # Manual archiving only for reference docs
},
'templates': {
'auto_archive': False, # Never auto-archive templates
}
}
def extract_frontmatter(file_path: Path) -> tuple[dict, str]:
"""Extract YAML frontmatter and remaining content from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n(.*)', content, re.DOTALL)
if not match:
return {}, content
frontmatter_text = match.group(1)
body = match.group(2)
metadata = yaml.safe_load(frontmatter_text)
return (metadata if isinstance(metadata, dict) else {}), body
except Exception as e:
print(f"⚠️ Warning: Could not parse {file_path}: {e}")
return {}, ""
def update_frontmatter(file_path: Path, metadata: dict) -> None:
"""Update the YAML frontmatter in a markdown file."""
_, body = extract_frontmatter(file_path)
frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
new_content = f"---\n{frontmatter}---\n{body}"
file_path.write_text(new_content)
def should_archive(metadata: dict, category: str, file_modified: datetime) -> tuple[bool, str]:
"""
Determine if a document should be archived based on rules.
Returns (should_archive, reason).
"""
# Skip if already archived
if metadata.get('status') == 'archived':
return False, "already archived"
    # Honor an explicit archivable_after override: it defers auto-archiving
    # until the given date has passed (see references/archiving-criteria.md)
    archivable_after = metadata.get('archivable_after')
    if archivable_after:
        try:
            if isinstance(archivable_after, str):
                defer_until = datetime.strptime(archivable_after, '%Y-%m-%d').date()
            else:
                defer_until = archivable_after  # YAML parses dates natively
            if datetime.now().date() <= defer_until:
                return False, f"archivable_after defers archiving until {defer_until}"
        except (ValueError, TypeError):
            return False, "invalid archivable_after date format"
    # Get category rules
    rules = ARCHIVING_RULES.get(category, {})
    # Skip if auto-archiving is disabled for this category
    if not rules.get('auto_archive', False):
        return False, f"{category} does not auto-archive"
# Check if status is 'complete' (required for most categories)
if rules.get('require_complete_status', False):
if metadata.get('status') != 'complete':
return False, "status is not 'complete'"
# Check age-based archiving
complete_after_days = rules.get('complete_after_days')
if complete_after_days:
last_updated = metadata.get('last_updated')
if not last_updated:
return False, "no last_updated date in metadata"
try:
if isinstance(last_updated, str):
updated_date = datetime.strptime(last_updated, '%Y-%m-%d').date()
else:
# YAML parser returns date objects, convert to date for comparison
updated_date = last_updated if hasattr(last_updated, 'year') else datetime.strptime(str(last_updated), '%Y-%m-%d').date()
days_old = (datetime.now().date() - updated_date).days
if days_old >= complete_after_days:
return True, f"{days_old} days old (threshold: {complete_after_days})"
except ValueError:
return False, "invalid last_updated date format"
return False, "no archiving criteria met"
def archive_document(file_path: Path, docs_path: Path, reason: str, dry_run: bool = False) -> bool:
"""
Archive a document by moving it to archive/ and updating its metadata.
Returns True if successful.
"""
try:
# Read metadata
metadata, body = extract_frontmatter(file_path)
# Determine archive path (preserve subdirectory structure)
relative_path = file_path.relative_to(docs_path)
category = relative_path.parts[0]
# Create archive subdirectory for the category
archive_path = docs_path / 'archive' / category
archive_path.mkdir(parents=True, exist_ok=True)
# Build destination path
archive_file = archive_path / file_path.name
# Handle name conflicts
if archive_file.exists():
base = archive_file.stem
suffix = archive_file.suffix
counter = 1
while archive_file.exists():
archive_file = archive_path / f"{base}_{counter}{suffix}"
counter += 1
if dry_run:
print(f" [DRY RUN] Would archive: {relative_path} → archive/{category}/{archive_file.name}")
print(f" Reason: {reason}")
return True
# Update metadata
metadata['status'] = 'archived'
metadata['archived_date'] = datetime.now().strftime('%Y-%m-%d')
metadata['archive_reason'] = reason
# Write updated file to archive
frontmatter = yaml.dump(metadata, default_flow_style=False, sort_keys=False)
new_content = f"---\n{frontmatter}---\n{body}"
archive_file.write_text(new_content)
# Remove original
file_path.unlink()
print(f" ✅ Archived: {relative_path} → archive/{category}/{archive_file.name}")
print(f" Reason: {reason}")
return True
except Exception as e:
print(f" ❌ Error archiving {file_path}: {e}")
return False
def scan_and_archive(docs_path: Path, dry_run: bool = False) -> dict:
"""
Scan all documents and archive those that meet criteria.
Returns statistics about the archiving operation.
"""
stats = {
'scanned': 0,
'archived': 0,
'skipped': 0,
'errors': 0
}
skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
skip_dirs = {'archive'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name in skip_dirs or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
stats['scanned'] += 1
# Extract metadata
metadata, _ = extract_frontmatter(md_file)
file_stats = md_file.stat()
file_modified = datetime.fromtimestamp(file_stats.st_mtime)
# Check if should archive
should_arch, reason = should_archive(metadata, category_name, file_modified)
if should_arch:
success = archive_document(md_file, docs_path, reason, dry_run)
if success:
stats['archived'] += 1
else:
stats['errors'] += 1
else:
stats['skipped'] += 1
return stats
def main():
"""Main entry point."""
dry_run = '--dry-run' in sys.argv
# Get base path
args = [arg for arg in sys.argv[1:] if not arg.startswith('--')]
if args:
base_path = Path(args[0]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
sys.exit(1)
print(f"Scanning documents in: {docs_path}")
if dry_run:
print("🔍 DRY RUN MODE - No files will be modified")
print()
# Scan and archive
stats = scan_and_archive(docs_path, dry_run)
print()
print("=" * 60)
print("Archive Summary:")
print(f" Documents scanned: {stats['scanned']}")
print(f" Documents archived: {stats['archived']}")
print(f" Documents skipped: {stats['skipped']}")
print(f" Errors: {stats['errors']}")
print()
if not dry_run and stats['archived'] > 0:
print("💡 Tip: Run 'python scripts/index_docs.py' to update the documentation index")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/index_docs.py

@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Generate and update the INDEX.md file by scanning all documents in docs/.
Reads YAML frontmatter to extract metadata and organize the index.
"""
import os
import sys
import re
from pathlib import Path
from datetime import datetime
from collections import defaultdict
import yaml
def extract_frontmatter(file_path: Path) -> dict:
"""Extract YAML frontmatter from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if not match:
return {}
frontmatter_text = match.group(1)
metadata = yaml.safe_load(frontmatter_text)
return metadata if isinstance(metadata, dict) else {}
except Exception as e:
print(f"⚠️ Warning: Could not parse frontmatter in {file_path}: {e}")
return {}
def get_file_stats(file_path: Path) -> dict:
"""Get file statistics."""
stats = file_path.stat()
return {
'size': stats.st_size,
'modified': datetime.fromtimestamp(stats.st_mtime)
}
def scan_documents(docs_path: Path) -> dict:
"""Scan all markdown documents in docs/ and extract metadata."""
categories = defaultdict(list)
    # Skip generated/placeholder files; archive/ is scanned too, so that
    # INDEX.md includes archived documents
    skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
# Extract metadata
metadata = extract_frontmatter(md_file)
stats = get_file_stats(md_file)
# Build document entry
relative_path = md_file.relative_to(docs_path)
doc_entry = {
'path': str(relative_path),
'title': metadata.get('title', md_file.stem),
'status': metadata.get('status', 'unknown'),
                'created': str(metadata.get('created', 'unknown')),
                # Normalize to ISO strings: YAML yields date objects, fallbacks are
                # strings, and a mixed-type sort key would raise TypeError
                'last_updated': str(metadata.get('last_updated', stats['modified'].strftime('%Y-%m-%d'))),
'tags': metadata.get('tags', []),
'category': category_name,
'file_modified': stats['modified']
}
categories[category_name].append(doc_entry)
return categories
def generate_index(categories: dict) -> str:
"""Generate the INDEX.md content."""
total_docs = sum(len(docs) for docs in categories.values())
index_lines = [
"# Documentation Index",
"",
f"Auto-generated index of all documents. Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
"",
"Run `python scripts/index_docs.py` to regenerate this index.",
"",
"---",
"",
"## Summary",
"",
f"Total documents: {total_docs}",
""
]
# Add category breakdown
if categories:
index_lines.append("By category:")
for category in sorted(categories.keys()):
count = len(categories[category])
index_lines.append(f"- **{category}**: {count} document{'s' if count != 1 else ''}")
index_lines.append("")
index_lines.append("---")
index_lines.append("")
# Add documents by category
if not categories:
index_lines.append("_No documents found. Add documents to the category directories and regenerate the index._")
else:
for category in sorted(categories.keys()):
docs = categories[category]
docs.sort(key=lambda d: d['last_updated'], reverse=True)
index_lines.append(f"## {category.replace('_', ' ').title()}")
index_lines.append("")
for doc in docs:
# Format: [Title](path) - status | updated: date | tags
title_link = f"[{doc['title']}]({doc['path']})"
status_badge = f"**{doc['status']}**"
updated = f"updated: {doc['last_updated']}"
tags = f"tags: [{', '.join(doc['tags'])}]" if doc['tags'] else ""
parts = [title_link, status_badge, updated]
if tags:
parts.append(tags)
index_lines.append(f"- {' | '.join(parts)}")
index_lines.append("")
return '\n'.join(index_lines)
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
print("Run 'python scripts/init_docs_structure.py' first to initialize the structure.")
sys.exit(1)
print(f"Scanning documents in: {docs_path}")
# Scan all documents
categories = scan_documents(docs_path)
# Generate index content
index_content = generate_index(categories)
# Write INDEX.md
index_path = docs_path / 'INDEX.md'
index_path.write_text(index_content)
total_docs = sum(len(docs) for docs in categories.values())
print(f"✅ Generated index with {total_docs} documents")
print(f"✅ Updated: {index_path}")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/init_docs_structure.py

@@ -0,0 +1,156 @@
#!/usr/bin/env python3
"""
Initialize the docs/ directory structure for document lifecycle management.
Creates all required directories and initial README.md.
"""
import os
import sys
from pathlib import Path
from datetime import datetime
DIRECTORY_STRUCTURE = {
'ai_docs': 'Reference materials for Claude Code: SDKs, API docs, repo context',
'specs': 'Feature and migration specifications',
'analysis': 'Investigation outputs: bug hunting, optimization, cleanup',
'plans': 'Implementation plans from specs, analysis, or ad-hoc tasks',
'templates': 'Reusable document templates',
'archive': 'Historical and completed documents'
}
README_TEMPLATE = """# Documentation Structure
This directory contains project documentation organized by purpose and lifecycle stage.
## Directory Structure
{directory_descriptions}
## Document Lifecycle
Documents follow a lifecycle managed through YAML frontmatter:
1. **Draft** → Document is being created
2. **Active** → Document is current and relevant
3. **Complete** → Work is done, kept for reference
4. **Archived** → Moved to archive/ when no longer relevant
## Metadata Requirements
All documents should include YAML frontmatter:
```yaml
---
title: Document Title
category: specs|analysis|plans|ai_docs|templates
status: draft|active|complete|archived
created: YYYY-MM-DD
last_updated: YYYY-MM-DD
tags: [tag1, tag2]
---
```
See INDEX.md for a complete list of all documents.
## Temporary Documents
Ephemeral/scratch documents should be created in `/tmp` or system temp directories,
NOT in this docs/ directory. The docs/ directory is for persistent documentation only.
---
Last updated: {timestamp}
"""
def create_directory_structure(base_path: Path) -> None:
"""Create the docs directory structure."""
docs_path = base_path / 'docs'
# Create main docs directory
docs_path.mkdir(exist_ok=True)
print(f"✅ Created: {docs_path}")
# Create category directories
for directory, description in DIRECTORY_STRUCTURE.items():
dir_path = docs_path / directory
dir_path.mkdir(exist_ok=True)
print(f"✅ Created: {dir_path}")
# Create .gitkeep for empty directories
gitkeep = dir_path / '.gitkeep'
if not any(dir_path.iterdir()):
gitkeep.touch()
def create_readme(base_path: Path) -> None:
"""Create the README.md file."""
docs_path = base_path / 'docs'
readme_path = docs_path / 'README.md'
# Format directory descriptions
descriptions = []
for directory, description in DIRECTORY_STRUCTURE.items():
descriptions.append(f"- **{directory}/** - {description}")
readme_content = README_TEMPLATE.format(
directory_descriptions='\n'.join(descriptions),
timestamp=datetime.now().strftime('%Y-%m-%d')
)
readme_path.write_text(readme_content)
print(f"✅ Created: {readme_path}")
def create_index(base_path: Path) -> None:
"""Create initial INDEX.md file."""
docs_path = base_path / 'docs'
index_path = docs_path / 'INDEX.md'
index_content = f"""# Documentation Index
Auto-generated index of all documents. Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Run `python scripts/index_docs.py` to regenerate this index.
---
## Summary
Total documents: 0
---
_No documents found. Add documents to the category directories and regenerate the index._
"""
index_path.write_text(index_content)
print(f"✅ Created: {index_path}")
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
print(f"Initializing docs structure at: {base_path}")
print()
create_directory_structure(base_path)
create_readme(base_path)
create_index(base_path)
print()
print("🎉 Documentation structure initialized successfully!")
print()
print("Next steps:")
print("1. Add documents to the category directories")
print("2. Run 'python scripts/index_docs.py' to update the index")
print("3. Run 'python scripts/archive_docs.py' periodically to maintain the archive")
if __name__ == '__main__':
main()

skills/cyberarian/scripts/validate_doc_metadata.py

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Validate that all documents have proper YAML frontmatter metadata.
Reports documents with missing or invalid metadata.
"""
import sys
import re
from pathlib import Path
from datetime import datetime
import yaml
REQUIRED_FIELDS = ['title', 'category', 'status', 'created', 'last_updated']
VALID_STATUSES = ['draft', 'active', 'complete', 'archived']
VALID_CATEGORIES = ['ai_docs', 'specs', 'analysis', 'plans', 'templates', 'archive']
def extract_frontmatter(file_path: Path) -> dict:
"""Extract YAML frontmatter from a markdown file."""
try:
content = file_path.read_text()
# Match YAML frontmatter between --- delimiters
match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if not match:
return None # No frontmatter found
frontmatter_text = match.group(1)
metadata = yaml.safe_load(frontmatter_text)
return metadata if isinstance(metadata, dict) else None
except Exception as e:
return {'_error': str(e)}
def validate_date(date_str: str) -> bool:
"""Validate date format (YYYY-MM-DD)."""
try:
datetime.strptime(str(date_str), '%Y-%m-%d')
return True
except (ValueError, TypeError):
return False
def validate_metadata(metadata: dict, category_from_path: str) -> list[str]:
"""
Validate metadata against requirements.
Returns list of validation errors (empty if valid).
"""
errors = []
if metadata is None:
return ["No YAML frontmatter found"]
if '_error' in metadata:
return [f"Failed to parse frontmatter: {metadata['_error']}"]
# Check required fields
for field in REQUIRED_FIELDS:
if field not in metadata:
errors.append(f"Missing required field: {field}")
# Validate status
if 'status' in metadata:
if metadata['status'] not in VALID_STATUSES:
errors.append(f"Invalid status '{metadata['status']}'. Must be one of: {', '.join(VALID_STATUSES)}")
    # Validate category
    if 'category' in metadata:
        if metadata['category'] not in VALID_CATEGORIES:
            errors.append(f"Invalid category '{metadata['category']}'. Must be one of: {', '.join(VALID_CATEGORIES)}")
        elif category_from_path == 'archive':
            # Archived documents keep their original category; require archived status instead
            if metadata.get('status') != 'archived':
                errors.append("Document is in archive/ but status is not 'archived'")
        elif metadata['category'] != category_from_path:
            errors.append(f"Category mismatch: metadata says '{metadata['category']}' but file is in '{category_from_path}/'")
# Validate dates
for date_field in ['created', 'last_updated']:
if date_field in metadata:
if not validate_date(metadata[date_field]):
errors.append(f"Invalid {date_field} date format. Must be YYYY-MM-DD")
# Validate tags (optional but must be list if present)
if 'tags' in metadata:
if not isinstance(metadata['tags'], list):
errors.append("Tags must be a list")
return errors
def scan_and_validate(docs_path: Path) -> dict:
"""
Scan all documents and validate their metadata.
Returns validation results.
"""
results = {
'valid': [],
'invalid': [],
'no_frontmatter': [],
'total': 0
}
skip_files = {'README.md', 'INDEX.md', '.gitkeep'}
for category_dir in docs_path.iterdir():
if not category_dir.is_dir() or category_dir.name.startswith('.'):
continue
category_name = category_dir.name
# Find all markdown files
for md_file in category_dir.rglob('*.md'):
if md_file.name in skip_files:
continue
results['total'] += 1
relative_path = md_file.relative_to(docs_path)
# Extract and validate metadata
metadata = extract_frontmatter(md_file)
errors = validate_metadata(metadata, category_name)
if not errors:
results['valid'].append(str(relative_path))
else:
results['invalid'].append({
'path': str(relative_path),
'errors': errors
})
return results
def main():
"""Main entry point."""
if len(sys.argv) > 1:
base_path = Path(sys.argv[1]).resolve()
else:
base_path = Path.cwd()
docs_path = base_path / 'docs'
if not docs_path.exists():
print(f"❌ Error: docs/ directory not found at {docs_path}")
sys.exit(1)
print(f"Validating documents in: {docs_path}")
print()
# Scan and validate
results = scan_and_validate(docs_path)
# Display results
print("=" * 60)
print("Validation Results:")
print(f" Total documents: {results['total']}")
print(f" ✅ Valid: {len(results['valid'])}")
print(f" ❌ Invalid: {len(results['invalid'])}")
print()
if results['invalid']:
print("Invalid Documents:")
print()
for item in results['invalid']:
print(f" 📄 {item['path']}")
for error in item['errors']:
print(f"{error}")
print()
if results['valid'] and not results['invalid']:
print("🎉 All documents have valid metadata!")
# Exit with error code if any invalid documents
sys.exit(1 if results['invalid'] else 0)
if __name__ == '__main__':
main()