zhongwei/gh-cskiro-claudex-claude-code-tools

Files

Zhongwei Li 4e8a12140c Initial commit

2025-11-29 18:16:51 +08:00

17 KiB

Raw Blame History

CLAUDE.md Auditor

Comprehensive validation and optimization tool for CLAUDE.md memory files in Claude Code

An Anthropic Skill that analyzes CLAUDE.md configuration files against official Anthropic documentation, community best practices, and academic research on LLM context optimization.

Quick Start

With Claude Code (Recommended)

# 1. Copy skill to your skills directory
cp -r claude-md-auditor ~/.claude/skills/

# 2. Use in Claude Code
claude
> Audit my CLAUDE.md using the claude-md-auditor skill

Direct Script Usage

# Markdown audit report
python claude-md-auditor/scripts/analyzer.py ./CLAUDE.md

# JSON report (for CI/CD)
python claude-md-auditor/scripts/report_generator.py ./CLAUDE.md json > audit.json

# Generate refactored CLAUDE.md
python claude-md-auditor/scripts/report_generator.py ./CLAUDE.md refactored > CLAUDE_refactored.md

What Does It Do?

Validates Against Three Sources

Source	Authority	Examples
✅ Official Anthropic Docs	Highest	Memory hierarchy, import syntax, "keep them lean"
💡 Community Best Practices	Medium	100-300 line target, 80/20 rule, maintenance cadence
🔬 Academic Research	Medium	"Lost in the middle" positioning, token optimization

Detects Issues

🚨 CRITICAL: Secrets (API keys, passwords), security vulnerabilities
⚠️ HIGH: Generic content, excessive verbosity, vague instructions
📋 MEDIUM: Outdated info, broken links, duplicate sections
ℹ️ LOW: Formatting issues, organizational improvements

Generates Output

Markdown Report: Human-readable audit with detailed findings
JSON Report: Machine-readable for CI/CD integration
Refactored CLAUDE.md: Production-ready improved version

Features

🔒 Security Validation (CRITICAL)

Detects exposed secrets using pattern matching:

API keys (OpenAI, AWS, generic)
Tokens and passwords
Database connection strings
Private keys (PEM format)
Internal IP addresses

Why Critical: CLAUDE.md files are often committed to git. Exposed secrets can leak through history, PRs, logs, or backups.

✅ Official Compliance

Validates against docs.claude.com guidance:

File length ("keep them lean")
Generic content (Claude already knows basic programming)
Import syntax (@path/to/import, max 5 hops)
Vague instructions (specific vs. ambiguous)
Proper markdown structure

💡 Best Practices

Evaluates community recommendations:

Optimal file length (100-300 lines)
Token usage (< 3,000 tokens / <2% of 200K context)
Organization (sections, headers, priority markers)
Maintenance (update dates, version info)
Duplicate or conflicting content

🔬 Research Optimization

Applies academic insights:

"Lost in the Middle" positioning (critical info at top/bottom)
Token efficiency and context utilization
Chunking and information architecture
Attention pattern optimization

Based On:

"Lost in the Middle" (Liu et al., 2023, TACL)
Claude-specific performance studies
Context awareness research (MIT/Google Cloud AI, 2024)

Installation

Option 1: Claude Code Skills (Recommended)

# Clone or copy to your skills directory
mkdir -p ~/.claude/skills
cp -r claude-md-auditor ~/.claude/skills/

# Verify installation
ls ~/.claude/skills/claude-md-auditor/SKILL.md

Option 2: Standalone Scripts

# Clone repository
git clone https://github.com/cskiro/annex.git
cd annex/claude-md-auditor

# Run directly (Python 3.8+ required, no dependencies)
python scripts/analyzer.py path/to/CLAUDE.md

Requirements

Python: 3.8 or higher
Dependencies: None (uses standard library only)
Claude Code: Any version with Skills support (optional)

Usage

Basic Audit

With Claude Code:

Audit my CLAUDE.md using the claude-md-auditor skill.

Direct:

python scripts/analyzer.py ./CLAUDE.md

Output:

============================================================
CLAUDE.md Audit Results: ./CLAUDE.md
============================================================

Overall Health Score: 78/100
Security Score: 100/100
Official Compliance Score: 75/100
Best Practices Score: 70/100
Research Optimization Score: 85/100

============================================================
Findings Summary:
  🚨 Critical: 0
  ⚠️  High: 2
  📋 Medium: 3
  ℹ️  Low: 5
============================================================

Generate JSON Report

For CI/CD Integration:

python scripts/report_generator.py ./CLAUDE.md json > audit.json

Example Output:

{
  "metadata": {
    "file": "./CLAUDE.md",
    "generated_at": "2025-10-26 10:30:00",
    "tier": "Project"
  },
  "scores": {
    "overall": 78,
    "security": 100,
    "official_compliance": 75,
    "critical_count": 0,
    "high_count": 2
  },
  "findings": [...]
}

Generate Refactored File

Create improved CLAUDE.md:

python scripts/report_generator.py ./CLAUDE.md refactored > CLAUDE_refactored.md

This generates a production-ready file with:

✅ Optimal structure (critical at top, reference at bottom)
✅ Research-based positioning ("lost in the middle" mitigation)
✅ All security issues removed
✅ Best practices applied
✅ Inline comments for maintenance

Output Examples

Markdown Report Structure

# CLAUDE.md Audit Report

## Executive Summary
- Overall health: 78/100 (Good)
- Status: ⚠️ **HIGH PRIORITY** - Address this sprint
- Total findings: 10 (0 critical, 2 high, 3 medium, 5 low)

## Score Dashboard
| Category | Score | Status |
|----------|-------|--------|
| Security | 100/100 | ✅ Excellent |
| Official Compliance | 75/100 | 🟢 Good |
| Best Practices | 70/100 | 🟢 Good |

## Detailed Findings

### ⚠️ HIGH Priority

#### 1. Generic Programming Content Detected
**Category**: Official Compliance
**Source**: Official Guidance

**Description**: File contains generic React documentation

**Impact**: Wastes context window. Official guidance:
"Don't include basic programming concepts Claude already understands"

**Remediation**: Remove generic content. Focus on project-specific standards.

---

JSON Report Structure

{
  "metadata": {
    "file_path": "./CLAUDE.md",
    "line_count": 245,
    "token_estimate": 3240,
    "context_usage_200k": 1.62,
    "tier": "Project"
  },
  "scores": {
    "overall": 78,
    "security": 100,
    "official_compliance": 75,
    "best_practices": 70,
    "research_optimization": 85
  },
  "findings": [
    {
      "severity": "high",
      "category": "official_compliance",
      "title": "Generic Programming Content Detected",
      "description": "File contains generic React documentation",
      "line_number": 42,
      "source": "official",
      "remediation": "Remove generic content..."
    }
  ]
}

Reference Documentation

Complete Validation Criteria

All checks are documented in the reference/ directory:

Document	Content	Source
official_guidance.md	Complete official Anthropic documentation	docs.claude.com
best_practices.md	Community recommendations and field experience	Practitioners
research_insights.md	Academic research on LLM context optimization	Peer-reviewed papers
anti_patterns.md	Catalog of common mistakes and violations	Field observations

Key References

Memory Management - Official docs
"Lost in the Middle" - Academic paper
Claude Code Best Practices - Anthropic blog

CI/CD Integration

GitHub Actions Example

name: CLAUDE.md Audit

on:
  pull_request:
    paths:
      - '**/CLAUDE.md'

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run CLAUDE.md Audit
        run: |
          python claude-md-auditor/scripts/analyzer.py CLAUDE.md \
            --format json \
            --output audit.json

      - name: Check Critical Issues
        run: |
          CRITICAL=$(python -c "import json; print(json.load(open('audit.json'))['summary']['critical'])")

          if [ "$CRITICAL" -gt 0 ]; then
            echo "❌ Critical issues found in CLAUDE.md"
            exit 1
          fi

          echo "✅ CLAUDE.md validation passed"

Pre-Commit Hook

#!/bin/bash
# .git/hooks/pre-commit

if git diff --cached --name-only | grep -q "CLAUDE.md"; then
  echo "Validating CLAUDE.md..."

  python claude-md-auditor/scripts/analyzer.py CLAUDE.md > /tmp/audit.txt

  # Check exit code or parse output
  if grep -q "🚨 Critical: [1-9]" /tmp/audit.txt; then
    echo "❌ CLAUDE.md has critical issues"
    cat /tmp/audit.txt
    exit 1
  fi

  echo "✅ CLAUDE.md validation passed"
fi

VS Code Task

Add to .vscode/tasks.json:

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Audit CLAUDE.md",
      "type": "shell",
      "command": "python",
      "args": [
        "${workspaceFolder}/claude-md-auditor/scripts/analyzer.py",
        "${workspaceFolder}/CLAUDE.md"
      ],
      "group": "test",
      "presentation": {
        "reveal": "always",
        "panel": "new"
      }
    }
  ]
}

Understanding Scores

Overall Health Score (0-100)

Range	Status	Action
90-100	✅ Excellent	Minor optimizations only
75-89	🟢 Good	Some improvements recommended
60-74	🟡 Fair	Schedule improvements this quarter
40-59	🟠 Poor	Significant issues to address
0-39	🔴 Critical	Immediate action required

Severity Levels

🚨 CRITICAL: Security risks (fix immediately, within 24 hours)
⚠️ HIGH: Significant issues (fix this sprint, within 2 weeks)
📋 MEDIUM: Moderate improvements (schedule for next quarter)
ℹ️ LOW: Minor optimizations (backlog)

Category Scores

Security: Should always be 100 (any security issue is critical)
Official Compliance: Aim for 80+ (follow Anthropic guidance)
Best Practices: 70+ is good (community recommendations are flexible)
Research Optimization: 60+ is acceptable (optimizations, not requirements)

Real-World Examples

Example 1: Security Violation

Before (Anti-Pattern):

# CLAUDE.md

## API Configuration
- API Key: sk-1234567890abcdefghijklmnop
- Database: postgres://admin:pass@10.0.1.42/db

Audit Finding:

🚨 CRITICAL: API Key Detected
Line: 4
Impact: Security breach risk. Secrets exposed in git history.
Remediation:
1. Remove the API key immediately
2. Rotate the compromised credential
3. Use environment variables (.env file)
4. Clean git history if committed

After (Fixed):

# CLAUDE.md

## API Configuration
- API keys: Stored in .env (see .env.example for template)
- Database: Use AWS Secrets Manager connection string
- Access: Contact team lead for credentials

Example 2: Generic Content