Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:28:57 +08:00
commit 995d97df60
11 changed files with 3977 additions and 0 deletions


@@ -0,0 +1,623 @@
# Context Management Reference Guide
Complete reference for multi-agent workflow context management.
**Quick Navigation:**
- [Context Management Guide](#context-management-guide) - Save/restore operations
- [Workflow Patterns](#workflow-patterns) - Sequential, Parallel, Conditional, Resumable
- [State Persistence](#state-persistence) - Serialization, compression, versioning
- [Performance](#performance) - Optimization strategies
---
## Overview
Context management enables multi-agent workflows by preserving state across agent transitions.
**Core Capabilities:**
- Save agent state at any workflow phase
- Restore state to resume or handoff work
- Validate context integrity
- Optimize context size
- Handle version compatibility
**Key Metrics:**
- Average context size: 5-100 KB
- Serialization time: <200ms (95th percentile)
- Restore success rate: 99.8%
- Context retention: 30 days default
---
## Context Management Guide
### Save Operation
**When to Save:**
- Phase completion (design done, implementation done)
- Agent handoff (passing work to next agent)
- Checkpoint creation (before risky operations)
- Long-running workflows (every 30 minutes)
- User-requested pause
**What to Save:**
```json
{
  "version": "1.0",
  "workflow_id": "unique-identifier",
  "timestamp": "2025-01-15T10:30:00Z",
  "current_agent": "agent-saving-context",
  "next_agent": "agent-to-receive-context",
  "phase": "current-workflow-phase",
  "files_modified": ["src/api.ts", "tests/api.test.ts"],
  "decisions": ["Use REST API", "PostgreSQL for storage"],
  "pending_actions": ["Write tests", "Deploy to staging"],
  "context_summary": "Implemented user API endpoints",
  "constraints": ["Must support v1 API", "< 200ms response"],
  "conversation_history": [...],
  "checkpoints": [...],
  "error_log": [...]
}
```
**Save Best Practices:**
- ✅ Include all decisions with rationale
- ✅ List pending actions in priority order
- ✅ Document constraints explicitly
- ✅ Keep context summary under 500 chars
- ✅ Use relative file paths
- ❌ Don't include sensitive data (API keys, passwords)
- ❌ Don't use absolute file paths
- ❌ Don't save redundant conversation history
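A few of these rules can be checked mechanically before writing. The following is a minimal sketch, not the guide's own implementation: `check_save_rules` and `save_context` are hypothetical helper names, and the atomic temp-file-then-rename write is an assumption about how a save might be made crash-safe.

```python
import json
import os
import tempfile

MAX_SUMMARY_CHARS = 500  # limit taken from the best practices above


def check_save_rules(context):
    """Return a list of best-practice violations found in a context dict."""
    problems = []
    if len(context.get("context_summary", "")) > MAX_SUMMARY_CHARS:
        problems.append("context_summary exceeds 500 chars")
    for path in context.get("files_modified", []):
        if os.path.isabs(path):
            problems.append(f"absolute path: {path}")
    return problems


def save_context(context, directory):
    """Write the context atomically: temp file first, then rename."""
    problems = check_save_rules(context)
    if problems:
        raise ValueError("; ".join(problems))
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, f"{context['workflow_id']}.json")
    # Create the temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(context, f, indent=2)
    os.replace(tmp_path, target)
    return target
```

Secret detection is deliberately left out here; it is covered separately under Security in the best-practices document.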
### Restore Operation
**When to Restore:**
- Starting new session on existing workflow
- Agent handoff receiving work
- Rollback after error
- Resuming paused workflow
- Debugging previous decisions
**Restore Process:**
```python
import json
import os

# validate_context_schema, migrate_context, log_warning, CURRENT_VERSION and
# workflow are provided by the surrounding workflow code.

# 1. Load context file
with open('.claude/context/workflow-id.json') as f:
    context = json.load(f)

# 2. Validate schema
validate_context_schema(context)

# 3. Check version compatibility
if context['version'] != CURRENT_VERSION:
    context = migrate_context(context)

# 4. Verify files exist
for file_path in context.get('files_modified', []):
    if not os.path.exists(file_path):
        log_warning(f"Missing file: {file_path}")

# 5. Reconstruct state
workflow.load_state(context)

# 6. Resume from phase
workflow.resume(context['phase'])
```
**Restore Validation:**
- [ ] Schema validation passes
- [ ] Required fields present
- [ ] Version compatible
- [ ] Referenced files exist
- [ ] No data corruption
### Context Size Optimization
**Target Sizes:**
- **Simple workflows (1-2 agents):** 5-20 KB
- **Medium workflows (3-5 agents):** 20-100 KB
- **Complex workflows (6+ agents):** 100-500 KB
- **Very large (requires optimization):** >500 KB
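To see which tier a given context falls into, its serialized size can be measured directly. This is a small sketch under assumptions: `size_tier` is a hypothetical helper, sizes are measured as UTF-8 bytes of the JSON text, and the boundary values (20, 100, 500 KB) are assigned to the lower tier because the ranges above overlap at their edges.

```python
import json


def size_tier(context):
    """Classify a context dict by its serialized UTF-8 size."""
    size_kb = len(json.dumps(context).encode("utf-8")) / 1024
    if size_kb > 500:
        return "very-large"  # requires optimization
    if size_kb > 100:
        return "complex"
    if size_kb > 20:
        return "medium"
    return "simple"
```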
**Optimization Strategies:**
1. **Conversation History Pruning**
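```javascript
// Keep only critical messages, in original order and without duplicates
function pruneHistory(history, maxMessages = 50) {
  if (history.length <= maxMessages) return history;
  const keep = new Set();
  // Keep first 5 (initial context)
  for (let i = 0; i < Math.min(5, history.length); i++) keep.add(i);
  // Keep last 20 (recent context)
  for (let i = Math.max(0, history.length - 20); i < history.length; i++) keep.add(i);
  // Keep decision points and errors from anywhere in the history
  history.forEach((msg, i) => {
    if (msg.content.includes('DECISION:') || msg.content.includes('ERROR:')) keep.add(i);
  });
  // Note: the result can still exceed maxMessages when many decision points exist
  return history.filter((_, i) => keep.has(i));
}
```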
2. **Checkpoint Compression**
```python
import base64
import gzip
import json

def compress_checkpoint(checkpoint):
    """Compress checkpoint data into a base64 string."""
    json_str = json.dumps(checkpoint)
    compressed = gzip.compress(json_str.encode('utf-8'))
    return base64.b64encode(compressed).decode('utf-8')

def decompress_checkpoint(compressed_data):
    """Decompress checkpoint data produced by compress_checkpoint."""
    compressed = base64.b64decode(compressed_data)
    json_str = gzip.decompress(compressed).decode('utf-8')
    return json.loads(json_str)
```
3. **External Large Data**
```json
{
"files_modified": ["src/api.ts"],
"large_file_refs": {
"src/api.ts": {
"size": 15000,
"hash": "sha256:abc123...",
"storage": ".claude/context/files/api-ts-snapshot.txt"
}
}
}
```
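One way such a reference entry could be produced is sketched below. `make_file_ref` is a hypothetical helper; the snapshot naming (base name with dots replaced by dashes) is inferred from the `api-ts-snapshot.txt` example and would collide for files that share a base name.

```python
import hashlib
import os


def make_file_ref(file_path, snapshot_dir):
    """Copy a file's bytes into snapshot_dir and return a reference entry."""
    with open(file_path, "rb") as f:
        data = f.read()
    os.makedirs(snapshot_dir, exist_ok=True)
    # Naming scheme is an assumption: "api.ts" -> "api-ts-snapshot.txt"
    name = os.path.basename(file_path).replace(".", "-") + "-snapshot.txt"
    storage = os.path.join(snapshot_dir, name)
    with open(storage, "wb") as f:
        f.write(data)
    return {
        "size": len(data),
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
        "storage": storage,
    }
```

On restore, recomputing the hash of the stored snapshot and comparing it with the `hash` field gives a cheap corruption check.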
**Size Reduction Checklist:**
- [ ] Remove completed pending_actions
- [ ] Prune conversation_history (keep <50 messages)
- [ ] Compress checkpoints (use gzip)
- [ ] Externalize large file contents
- [ ] Remove redundant decision descriptions
- [ ] Deduplicate error_log entries
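Two of the checklist items lend themselves to a small helper. This sketch assumes `pending_actions` holds plain strings (as in the save example earlier) and that the caller knows which actions are finished; `reduce_context` is a hypothetical name.

```python
import json


def reduce_context(context, completed_actions=()):
    """Return a copy with completed actions removed and error_log deduplicated."""
    done = set(completed_actions)
    seen = set()
    unique_errors = []
    for entry in context.get("error_log", []):
        # A canonical JSON string works as a key for both dict and string entries
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique_errors.append(entry)
    return {
        **context,
        "pending_actions": [
            a for a in context.get("pending_actions", []) if a not in done
        ],
        "error_log": unique_errors,
    }
```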
---
## Workflow Patterns
### Pattern 1: Sequential Handoff
**Use Case:** Linear workflow where each agent completes before next starts.
```
Agent A (Design) → Context Save → Agent B (Implement)
                 → Context Save → Agent C (Test)
                 → Context Save → Agent D (Deploy)
```
**Context Flow:**
```json
// Agent A saves
{
"current_agent": "architect",
"next_agent": "tdd-typescript",
"phase": "design-complete",
"decisions": ["Use REST API", "PostgreSQL"],
"pending_actions": ["Implement endpoints", "Write tests"]
}
// Agent B loads, works, saves
{
"current_agent": "tdd-typescript",
"next_agent": "test-generator",
"phase": "implementation-complete",
"decisions": [...previous, "FastAPI framework", "Pydantic validation"],
"pending_actions": ["Generate integration tests", "Test error cases"]
}
```
**Benefits:**
- Simple to understand and debug
- Clear responsibility handoff
- Easy to track progress
- Supports different agent types
**When to Use:**
- Feature development pipelines
- Code review workflows
- Deployment processes
### Pattern 2: Parallel Execution
**Use Case:** Multiple agents work concurrently on independent tasks.
```
                   ┌─ Agent B (Frontend) ─┐
Agent A (Design) ──┼─ Agent C (Backend) ──┼─→ Agent E (Integration)
                   └─ Agent D (Tests) ────┘
```
**Context Management:**
```json
// Parent context spawns 3 parallel contexts
{
"workflow_id": "feature-xyz",
"phase": "parallel-execution",
"parallel_tasks": [
{
"task_id": "frontend-impl",
"agent": "react-developer",
"context_ref": ".claude/context/feature-xyz-frontend.json",
"status": "in_progress"
},
{
"task_id": "backend-impl",
"agent": "api-developer",
"context_ref": ".claude/context/feature-xyz-backend.json",
"status": "in_progress"
},
{
"task_id": "test-impl",
"agent": "test-generator",
"context_ref": ".claude/context/feature-xyz-tests.json",
"status": "completed"
}
]
}
```
**Merge Strategy:**
```javascript
async function mergeParallelContexts(parentContext) {
  const results = await Promise.all(
    parentContext.parallel_tasks.map(task =>
      loadContext(task.context_ref)
    )
  );
  return {
    ...parentContext,
    phase: "parallel-complete",
    // Default to [] so a child context without a field adds no undefined entries
    files_modified: results.flatMap(r => r.files_modified || []),
    decisions: results.flatMap(r => r.decisions || []),
    error_log: results.flatMap(r => r.error_log || [])
  };
}
```
### Pattern 3: Conditional Routing
**Use Case:** Workflow branches based on conditions or results.
```
Agent A (Analysis) → Context Save → Condition Check
                                    ├─ If security issue → Security Agent
                                    ├─ If performance issue → Perf Agent
                                    └─ If quality issue → Code Review Agent
```
**Conditional Logic:**
```json
{
"workflow_id": "code-analysis",
"phase": "analysis-complete",
"current_agent": "code-analyzer",
"next_agent": null, // Determined by routing logic
"routing_conditions": {
"has_security_issues": true,
"has_performance_issues": false,
"code_quality_score": 85
},
"routing_rules": [
{
"condition": "has_security_issues == true",
"next_agent": "security-analyzer",
"priority": 1
},
{
"condition": "has_performance_issues == true",
"next_agent": "performance-optimizer",
"priority": 2
},
{
"condition": "code_quality_score < 80",
"next_agent": "code-quality-analyzer",
"priority": 3
}
]
}
```
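The routing rules above are data, so something has to evaluate them. A minimal evaluator might look like the sketch below. It assumes every condition has the form `field op literal` separated by single spaces, with the literal written as JSON (`true`, `80`), and that a lower `priority` number wins; `evaluate` and `pick_next_agent` are hypothetical names.

```python
import json
import operator

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def evaluate(condition, values):
    """Evaluate a 'field op literal' condition against routing_conditions."""
    field, op, literal = condition.split()
    # json.loads turns 'true' into True and '80' into 80
    return OPS[op](values[field], json.loads(literal))


def pick_next_agent(context):
    """Return next_agent of the matching rule with the lowest priority number."""
    matches = [
        rule for rule in context["routing_rules"]
        if evaluate(rule["condition"], context["routing_conditions"])
    ]
    if not matches:
        return None
    return min(matches, key=lambda rule: rule["priority"])["next_agent"]
```

Parsing the condition into three tokens avoids calling `eval` on strings read from a context file.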
### Pattern 4: Resumable Long-Running
**Use Case:** Workflows that span multiple sessions or require human approval.
**Checkpoint Strategy:**
```json
{
"workflow_id": "migration-v2-to-v3",
"phase": "migration-in-progress",
"current_agent": "migration-orchestrator",
"checkpoints": [
{
"id": "checkpoint-1",
"timestamp": "2025-01-15T10:00:00Z",
"phase": "schema-migrated",
"files_modified": ["db/migrations/001_v3_schema.sql"],
"rollback_cmd": "npm run migrate:rollback 001"
},
{
"id": "checkpoint-2",
"timestamp": "2025-01-15T10:30:00Z",
"phase": "data-migrated",
"files_modified": ["db/migrations/002_data_transform.sql"],
"rollback_cmd": "npm run migrate:rollback 002"
}
],
"resume_from": "checkpoint-2",
"pending_actions": [
"Migrate user preferences (50% complete)",
"Update API endpoints",
"Deploy to staging"
]
}
```
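A resuming agent needs to locate the checkpoint named by `resume_from`, and a rollback needs the undo commands in reverse order. The sketch below assumes checkpoints are stored oldest first and that each `rollback_cmd` undoes only its own checkpoint's step; both function names are hypothetical.

```python
def resume_point(context):
    """Return the checkpoint named by resume_from, or None if absent."""
    for checkpoint in context.get("checkpoints", []):
        if checkpoint["id"] == context.get("resume_from"):
            return checkpoint
    return None


def rollback_commands(context, target_id):
    """List rollback_cmd values, newest first, for checkpoints after target_id."""
    ids = [checkpoint["id"] for checkpoint in context["checkpoints"]]
    index = ids.index(target_id)  # raises ValueError for an unknown id
    later = context["checkpoints"][index + 1:]
    return [checkpoint["rollback_cmd"] for checkpoint in reversed(later)]
```

With the example context, returning to `checkpoint-1` would yield only the `002` rollback command, since the schema migration itself is the state being kept.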
---
## State Persistence
### Serialization Formats
**JSON (Default)**
```json
{
"version": "1.0",
"workflow_id": "example",
"timestamp": "2025-01-15T10:30:00Z"
}
```
- ✅ Human-readable
- ✅ Wide tool support
- ✅ Schema validation available
- ❌ Larger file size
- **Use for:** Most workflows
**Compressed JSON (Large Contexts)**
```python
import gzip
import json
context_json = json.dumps(context)
compressed = gzip.compress(context_json.encode())
# Reduces size by 60-80%
```
- ✅ 60-80% size reduction
- ✅ Still JSON underneath
- ❌ Not human-readable without decompression
- **Use for:** Contexts >100KB
### Version Management
**Schema Versions:**
```json
{
"version": "1.0", // Breaking changes increment major
"schema_version": "1.2" // Non-breaking changes increment minor
}
```
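Version strings like these should not be compared as plain strings, because `"1.10"` sorts before `"1.9"` lexically. A small sketch of numeric comparison follows; `parse_version` and `is_breaking_change` are hypothetical helpers and assume purely numeric dot-separated versions.

```python
def parse_version(version):
    """Turn '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_breaking_change(old, new):
    """True when the major component differs between two version strings."""
    return parse_version(old)[0] != parse_version(new)[0]
```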
**Migration Example:**
```javascript
function migrateContext(context) {
  // Each entry upgrades a context FROM the keyed version to the next one
  const migrations = {
    '1.0': migrateFrom1_0,
    '1.1': migrateFrom1_1,
    '2.0': migrateFrom2_0
  };
  let current = context;
  // Apply migrations sequentially until the target version is reached
  while (current.version !== CURRENT_VERSION) {
    const migrateFn = migrations[current.version];
    if (!migrateFn) {
      throw new Error(`No migration path from version ${current.version}`);
    }
    current = migrateFn(current); // each migration must bump current.version
  }
  return current;
}
function migrateFrom1_0(context) {
  // v1.0 → v1.1: Add 'constraints' field
  return {
    ...context,
    version: '1.1',
    constraints: []
  };
}
```
### Error Recovery
**Corrupted Context Detection:**
```python
import json
import os
from datetime import datetime

def validate_context_integrity(context_path):
    """Validate context file integrity."""
    try:
        with open(context_path) as f:
            context = json.load(f)
        # Check required fields
        required = ['version', 'workflow_id', 'timestamp']
        for field in required:
            if field not in context:
                raise ValueError(f"Missing required field: {field}")
        # Validate timestamp format; fromisoformat accepts a trailing 'Z'
        # only from Python 3.11, so normalize it to an explicit offset
        datetime.fromisoformat(context['timestamp'].replace('Z', '+00:00'))
        # Check file references
        for file_path in context.get('files_modified', []):
            if not os.path.exists(file_path):
                log_warning(f"Referenced file missing: {file_path}")
        return True
    except json.JSONDecodeError as e:
        log_error(f"Invalid JSON: {e}")
        return False
    except Exception as e:
        log_error(f"Validation failed: {e}")
        return False
```
**Rollback Strategies:**
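```javascript
const { execSync } = require('child_process');

// A function cannot live in a JSON file, so the rollback helper is shown
// as a method on an in-memory object that mirrors the saved context.
const workflow = {
  workflow_id: "example",
  checkpoints: [
    {
      id: "checkpoint-1",
      timestamp: "2025-01-15T10:00:00Z",
      context_snapshot: ".claude/context/example-cp1.json",
      git_commit: "abc123"
    }
  ],
  rollback_to_checkpoint(checkpoint_id) {
    const checkpoint = this.checkpoints.find(cp => cp.id === checkpoint_id);
    if (!checkpoint) {
      throw new Error(`Unknown checkpoint: ${checkpoint_id}`);
    }
    // Restore context
    const snapshot = loadContext(checkpoint.context_snapshot);
    // Restore code (optional; this discards uncommitted changes)
    if (checkpoint.git_commit) {
      execSync(`git reset --hard ${checkpoint.git_commit}`);
    }
    return snapshot;
  }
};
```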
---
## Performance
### Optimization Benchmarks
| Operation | Target | P50 | P95 | P99 |
|-----------|--------|-----|-----|-----|
| Save (small <20KB) | <50ms | 12ms | 45ms | 80ms |
| Save (large >100KB) | <200ms | 85ms | 180ms | 250ms |
| Restore (small) | <100ms | 35ms | 90ms | 150ms |
| Restore (large) | <500ms | 220ms | 480ms | 600ms |
| Validate | <50ms | 18ms | 40ms | 65ms |
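Percentile figures like these can be gathered with a small timing harness. The sketch below is an assumption about how one might measure, not how the numbers above were produced: `measure_save_ms` is a hypothetical helper, and it times only `json.dumps`, not disk I/O.

```python
import json
import statistics
import time


def measure_save_ms(context, runs=200):
    """Time json.dumps over `runs` iterations and return P50/P95/P99 in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        json.dumps(context)
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index 49 is the median
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Results depend heavily on hardware and context shape, so they are best compared against the targets on the same machine that runs the workflow.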
### Optimization Techniques
**1. Lazy Loading**
```javascript
const fs = require('fs');

class WorkflowContext {
  constructor(contextPath) {
    this.contextPath = contextPath;
    this.metadata = this.loadMetadata(contextPath);
    this._fullContext = null; // Populated only when needed
  }
  loadMetadata(path) {
    // Parses the file but keeps only the essential fields in memory
    const context = JSON.parse(fs.readFileSync(path, 'utf-8'));
    return {
      version: context.version,
      workflow_id: context.workflow_id,
      phase: context.phase,
      current_agent: context.current_agent
    };
  }
  loadFullContext() {
    return JSON.parse(fs.readFileSync(this.contextPath, 'utf-8'));
  }
  get conversationHistory() {
    if (!this._fullContext) {
      this._fullContext = this.loadFullContext();
    }
    return this._fullContext.conversation_history;
  }
}
```
**2. Incremental Updates**
```javascript
const fs = require('fs');

// Merge a partial update into the stored context.
// The file is still rewritten in full; callers just pass the changed fields.
function updateContext(workflow_id, updates) {
  const contextPath = `.claude/context/${workflow_id}.json`;
  const context = JSON.parse(fs.readFileSync(contextPath, 'utf-8'));
  // Apply updates
  Object.assign(context, updates);
  context.timestamp = new Date().toISOString();
  // Atomic write with temp file
  const tempPath = `${contextPath}.tmp`;
  fs.writeFileSync(tempPath, JSON.stringify(context, null, 2));
  fs.renameSync(tempPath, contextPath);
}
```
**3. Caching**
```python
import json
from functools import lru_cache

@lru_cache(maxsize=10)
def load_context(workflow_id):
    """Cache recently loaded contexts.

    Callers share the cached dict, so treat it as read-only or copy it first.
    """
    context_path = f".claude/context/{workflow_id}.json"
    with open(context_path) as f:
        return json.load(f)

def save_context(workflow_id, context):
    with open(f".claude/context/{workflow_id}.json", 'w') as f:
        json.dump(context, f, indent=2)
    load_context.cache_clear()  # Invalidate only after the write succeeds
```
---
## Quick Reference
**Essential Context Fields:**
- `version`, `workflow_id`, `timestamp` (required)
- `current_agent`, `phase` (required)
- ⚠️ `next_agent`, `pending_actions` (important)
- `context_summary`, `decisions` (helpful)
**Size Targets:**
- Simple: 5-20 KB
- Medium: 20-100 KB
- Complex: 100-500 KB
- Optimize if: >500 KB
**Performance Targets:**
- Save: <200ms (large contexts)
- Restore: <500ms (large contexts)
- Validate: <50ms
**Common Patterns:**
1. Sequential handoff (linear workflows)
2. Parallel execution (independent tasks)
3. Conditional routing (branching logic)
4. Resumable long-running (multi-session)
---
**Reference Version**: 1.0
**Last Updated**: 2025-01-15


@@ -0,0 +1,880 @@
# Workflow Best Practices
Production-ready patterns and practices for multi-agent workflows.
**Quick Navigation:**
- [Context Design](#context-design)
- [Handoff Patterns](#handoff-patterns)
- [Error Handling](#error-handling)
- [Performance](#performance)
- [Security](#security)
- [Testing](#testing)
---
## Context Design
### Principle 1: Minimal Context Size
**Why:** Smaller contexts = faster operations, lower memory, easier debugging.
**Practice:**
```javascript
// ❌ BAD: Include everything
const context = {
workflow_id: 'feature-123',
// ... required fields ...
all_git_commits: gitLog(), // Huge!
full_codebase: readAllFiles(), // Unnecessary!
raw_logs: getSystemLogs() // Too much!
};
// ✅ GOOD: Only essential information
const context = {
workflow_id: 'feature-123',
version: '1.0',
timestamp: new Date().toISOString(),
current_agent: 'backend-architect',
phase: 'design-complete',
// Only modified files
files_modified: ['src/api/users.ts', 'tests/api.test.ts'],
// Only key decisions
decisions: [
'Use PostgreSQL for user data',
'Implement JWT authentication',
'Rate limit: 100 req/min per user'
],
// Only next actions
pending_actions: [
'Implement user endpoints',
'Add authentication middleware',
'Write integration tests'
]
};
```
**Guidelines:**
- Target <100KB for 80% of workflows
- Include only information next agent needs
- Reference external data instead of embedding
- Prune completed actions regularly
---
### Principle 2: Explicit Over Implicit
**Why:** Clear intent prevents misunderstandings and errors.
**Practice:**
```javascript
// ❌ BAD: Implicit assumptions
const context = {
files_modified: ['api.ts'],
next_agent: 'test-generator'
// Unclear: What should test-generator do?
// Unclear: Are there constraints?
};
// ✅ GOOD: Explicit requirements
const context = {
files_modified: ['src/api/users.ts'],
next_agent: 'test-generator',
pending_actions: [
{
action: 'generate_integration_tests',
target: 'src/api/users.ts',
requirements: [
'Test all CRUD operations',
'Test authentication',
'Test error cases (401, 403, 404, 500)',
'Test rate limiting'
],
constraints: [
'Use Vitest framework',
'Coverage must be >90%',
'Tests must be idempotent'
]
}
],
context_summary: 'User API implemented with JWT auth. Need comprehensive tests before deployment.'
};
```
**Checklist:**
- [ ] Next agent explicitly stated
- [ ] Actions clearly described
- [ ] Requirements enumerated
- [ ] Constraints documented
- [ ] Success criteria defined
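Most of this checklist can be enforced before a handoff is saved. The sketch below assumes `pending_actions` entries are objects shaped like the example above (`action`, `requirements`, `constraints`); `handoff_problems` is a hypothetical helper, and the success-criteria item is left out because the example defines no field for it.

```python
def handoff_problems(context):
    """Return checklist violations for an explicit handoff context."""
    problems = []
    if not context.get("next_agent"):
        problems.append("next_agent not stated")
    actions = context.get("pending_actions", [])
    if not actions:
        problems.append("no pending_actions")
    for i, action in enumerate(actions):
        # Empty lists count as missing, since they carry no guidance
        for key in ("action", "requirements", "constraints"):
            if not action.get(key):
                problems.append(f"pending_actions[{i}] missing {key}")
    return problems
```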
---
### Principle 3: Versioned Evolution
**Why:** Workflows evolve. Version tracking enables migration and compatibility.
**Practice:**
```json
{
"version": "2.1",
"schema_version": "2.1.0",
"version_history": [
{
"version": "1.0",
"timestamp": "2025-01-10T10:00:00Z",
"changes": "Initial context structure"
},
{
"version": "2.0",
"timestamp": "2025-01-12T14:30:00Z",
"changes": "Added metadata and constraints fields"
},
{
"version": "2.1",
"timestamp": "2025-01-15T09:00:00Z",
"changes": "Added parallel task support"
}
],
"migration_path": {
"1.0_to_2.0": "Add metadata and constraints with defaults",
"2.0_to_2.1": "Add parallel_tasks array if missing"
}
}
```
**Guidelines:**
- Increment major version for breaking changes
- Increment minor version for new fields
- Document migration path
- Maintain backward compatibility for 2 versions
---
## Handoff Patterns
### Pattern 1: Clean Handoff
**When:** Sequential workflow where each agent completes fully before handoff.
**Implementation:**
```javascript
class WorkflowOrchestrator {
async executeCleanHandoff(agents) {
let context = this.initializeContext();
for (const agent of agents) {
console.log(`Starting agent: ${agent.name}`);
// Load previous context
const agentContext = {
...context,
current_agent: agent.name,
next_agent: agents[agents.indexOf(agent) + 1]?.name || null
};
// Execute agent
const result = await agent.execute(agentContext);
// Validate completion
if (result.status !== 'completed') {
throw new Error(`Agent ${agent.name} did not complete successfully`);
}
// Update context with results
context = {
...context,
phase: result.phase,
files_modified: [
...context.files_modified,
...result.files_modified
],
decisions: [
...context.decisions,
...result.decisions
],
pending_actions: result.pending_actions
};
// Save checkpoint
await this.saveContext(context);
console.log(`✅ Completed agent: ${agent.name}`);
}
return context;
}
}
```
**Best Practices:**
- Save context after each agent
- Validate agent completion
- Accumulate decisions and files
- Clear completed actions
---
### Pattern 2: Conditional Handoff
**When:** Next agent depends on results or conditions.
**Implementation:**
```javascript
class ConditionalRouter {
async routeBasedOnResults(context, analysisResults) {
// Define routing rules
const rules = [
{
condition: () => analysisResults.security_score < 70,
next_agent: 'security-analyzer',
priority: 1,
reason: 'Security vulnerabilities detected'
},
{
condition: () => analysisResults.performance_score < 80,
next_agent: 'performance-optimizer',
priority: 2,
reason: 'Performance issues detected'
},
{
condition: () => analysisResults.test_coverage < 85,
next_agent: 'test-generator',
priority: 3,
reason: 'Insufficient test coverage'
}
];
// Find highest priority match
const match = rules
.filter(rule => rule.condition())
.sort((a, b) => a.priority - b.priority)[0];
if (match) {
// Update context for routing
return {
...context,
next_agent: match.next_agent,
routing_reason: match.reason,
routing_data: analysisResults
};
}
// No issues - proceed to deployment
return {
...context,
next_agent: 'deployment-agent',
routing_reason: 'All checks passed'
};
}
}
```
---
### Pattern 3: Parallel Handoff with Merge
**When:** Multiple agents can work concurrently, then merge results.
**Implementation:**
```javascript
class ParallelOrchestrator {
async executeParallel(parentContext, tasks) {
// Spawn parallel contexts
const parallelContexts = tasks.map(task => ({
...parentContext,
workflow_id: `${parentContext.workflow_id}-${task.id}`,
current_agent: task.agent,
task_id: task.id,
task_scope: task.scope
}));
// Execute all tasks in parallel
const results = await Promise.allSettled(
parallelContexts.map((ctx, i) =>
this.executeTask(tasks[i], ctx)
)
);
// Check for failures
const failures = results.filter(r => r.status === 'rejected');
if (failures.length > 0) {
throw new Error(
`${failures.length} parallel tasks failed:\n` +
failures.map(f => f.reason).join('\n')
);
}
// Merge successful results
return this.mergeResults(
parentContext,
results.map(r => r.value)
);
}
mergeResults(parent, childContexts) {
return {
...parent,
phase: 'parallel-complete',
// Union of all modified files
files_modified: [
...new Set(
childContexts.flatMap(ctx => ctx.files_modified)
)
],
// Concatenate all decisions
decisions: childContexts.flatMap(ctx => ctx.decisions),
// Merge pending actions
pending_actions: childContexts.flatMap(ctx =>
ctx.pending_actions
),
// Collect errors from all tasks
error_log: childContexts.flatMap(ctx =>
ctx.error_log || []
),
// Track parallel execution
parallel_execution: {
tasks: childContexts.map(ctx => ({
task_id: ctx.task_id,
agent: ctx.current_agent,
duration_ms: ctx.execution_time,
status: 'completed'
}))
}
};
}
}
```
---
## Error Handling
### Pattern 1: Graceful Degradation
**Practice:**
```javascript
async function executeWithGracefulDegradation(agent, context) {
  try {
    // Attempt full execution
    return await agent.execute(context);
  } catch (error) {
    console.error(`Agent ${agent.name} failed:`, error);
    // Record partial progress (arrays are copied so the caller's context is not mutated)
    const partialContext = {
      ...context,
      phase: `${context.phase}-failed`,
      pending_actions: [...(context.pending_actions || [])],
      error_log: [
        ...(context.error_log || []),
        {
          timestamp: new Date().toISOString(),
          agent: agent.name,
          error: error.message,
          stack: error.stack,
          context_at_failure: {
            phase: context.phase,
            files_modified: context.files_modified
          }
        }
      ]
    };
    // Determine if recoverable
    const recoverable = error.code === 'RECOVERABLE';
    if (recoverable) {
      // Suggest recovery action before saving so it is persisted too
      partialContext.pending_actions.unshift({
        action: 'recover_from_error',
        error_id: partialContext.error_log.length - 1,
        recovery_strategy: error.recoveryStrategy
      });
    }
    // Save failure context
    await saveContext(partialContext);
    if (recoverable) {
      return partialContext;
    }
    // Unrecoverable - rethrow
    throw error;
  }
}
```
---
### Pattern 2: Checkpoint-Based Rollback
**Practice:**
```javascript
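const fs = require('fs');
const path = require('path');
const { execSync } = require('child_process');

class CheckpointManager {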
async createCheckpoint(context, label) {
const checkpoint = {
id: `checkpoint-${Date.now()}`,
label: label,
timestamp: new Date().toISOString(),
context_snapshot: JSON.parse(JSON.stringify(context)),
git_commit: execSync('git rev-parse HEAD').toString().trim()
};
// Save checkpoint
const checkpointPath = `.claude/context/checkpoints/${context.workflow_id}/${checkpoint.id}.json`;
await fs.promises.mkdir(path.dirname(checkpointPath), { recursive: true });
await fs.promises.writeFile(
checkpointPath,
JSON.stringify(checkpoint, null, 2)
);
// Add to context
context.checkpoints = context.checkpoints || [];
context.checkpoints.push({
id: checkpoint.id,
label: label,
timestamp: checkpoint.timestamp
});
return checkpoint.id;
}
async rollbackToCheckpoint(context, checkpointId) {
// Load checkpoint
const checkpointPath = `.claude/context/checkpoints/${context.workflow_id}/${checkpointId}.json`;
const checkpoint = JSON.parse(
await fs.promises.readFile(checkpointPath, 'utf-8')
);
// Restore code state
if (checkpoint.git_commit) {
execSync(`git reset --hard ${checkpoint.git_commit}`);
console.log(`✅ Rolled back code to ${checkpoint.git_commit}`);
}
// Restore context state
const restoredContext = checkpoint.context_snapshot;
// Add rollback metadata
restoredContext.rollback_history = restoredContext.rollback_history || [];
restoredContext.rollback_history.push({
timestamp: new Date().toISOString(),
from_phase: context.phase,
to_checkpoint: checkpointId,
reason: 'manual_rollback'
});
return restoredContext;
}
}
// Usage
const checkpointMgr = new CheckpointManager();
// Before risky operation
const checkpointId = await checkpointMgr.createCheckpoint(
context,
'before-database-migration'
);
try {
await performDatabaseMigration();
} catch (error) {
// Rollback on failure
context = await checkpointMgr.rollbackToCheckpoint(context, checkpointId);
}
```
---
## Performance
### Optimization 1: Lazy Context Loading
**Practice:**
```typescript
interface ContextMetadata {
version: string;
workflow_id: string;
phase: string;
timestamp: string;
current_agent: string;
}
class LazyWorkflowContext {
private _metadata: ContextMetadata | null = null;
private _fullContext: any = null;
constructor(private contextPath: string) {}
// Metadata access (parses the file on first call, then serves the cached summary)
async getMetadata(): Promise<ContextMetadata> {
if (!this._metadata) {
const json = await fs.promises.readFile(this.contextPath, 'utf-8');
const context = JSON.parse(json);
this._metadata = {
version: context.version,
workflow_id: context.workflow_id,
phase: context.phase,
timestamp: context.timestamp,
current_agent: context.current_agent
};
}
return this._metadata;
}
// Load full context only when needed
async getFullContext() {
if (!this._fullContext) {
const json = await fs.promises.readFile(this.contextPath, 'utf-8');
this._fullContext = JSON.parse(json);
}
return this._fullContext;
}
// Access specific fields without full load
async getField<T>(fieldName: string): Promise<T> {
const metadata = await this.getMetadata();
if (fieldName in metadata) {
return (metadata as any)[fieldName];
}
// Need full context for this field
const full = await this.getFullContext();
return full[fieldName];
}
}
// Usage - repeated metadata reads are served from the cached summary
const ctx = new LazyWorkflowContext('.claude/context/workflow-123.json');
const metadata = await ctx.getMetadata(); // First call parses the file; later calls are cached
console.log(metadata.phase); // 'implementation-complete'
// Only load full context when actually needed
if (needFullHistory) {
const full = await ctx.getFullContext(); // Slower
processHistory(full.conversation_history);
}
```
---
### Optimization 2: Incremental Updates
**Practice:**
```javascript
class IncrementalContextManager {
async updateContextField(workflowId, fieldName, value) {
const contextPath = `.claude/context/${workflowId}.json`;
// Read current context
const json = await fs.promises.readFile(contextPath, 'utf-8');
const context = JSON.parse(json);
// Update only changed field
context[fieldName] = value;
context.timestamp = new Date().toISOString();
// Atomic write
const tempPath = `${contextPath}.tmp`;
await fs.promises.writeFile(tempPath, JSON.stringify(context, null, 2));
await fs.promises.rename(tempPath, contextPath);
}
async appendToArrayField(workflowId, fieldName, item) {
const contextPath = `.claude/context/${workflowId}.json`;
const json = await fs.promises.readFile(contextPath, 'utf-8');
const context = JSON.parse(json);
// Append to array
if (!Array.isArray(context[fieldName])) {
context[fieldName] = [];
}
context[fieldName].push(item);
context.timestamp = new Date().toISOString();
// Save
const tempPath = `${contextPath}.tmp`;
await fs.promises.writeFile(tempPath, JSON.stringify(context, null, 2));
await fs.promises.rename(tempPath, contextPath);
}
}
// Usage - change one field per call (each call still reads and rewrites the whole file)
const mgr = new IncrementalContextManager();
await mgr.updateContextField('workflow-123', 'phase', 'testing-complete');
await mgr.appendToArrayField('workflow-123', 'decisions',
'Use Playwright for E2E tests'
);
```
---
## Security
### Practice 1: No Secrets in Context
**Rule:** Never save API keys, passwords, tokens, or sensitive data in context.
**Implementation:**
```javascript
// ❌ BAD: Secrets in context
const context = {
workflow_id: 'deploy-prod',
database_url: 'postgresql://user:password@host:5432/db', // LEAKED!
api_key: 'sk_live_abc123...', // LEAKED!
aws_secret: 'wJalrXUtnFEMI/K7MDENG...' // LEAKED!
};
// ✅ GOOD: Reference secrets by ID
const context = {
workflow_id: 'deploy-prod',
secrets: {
database_url: { ref: 'env:DATABASE_URL' },
api_key: { ref: 'doppler:STRIPE_API_KEY' },
aws_secret: { ref: 'vault:aws/secret_key' }
}
};
// Agent resolves secrets at runtime
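function resolveSecrets(context) {
  return Object.entries(context.secrets || {}).reduce((acc, [key, value]) => {
    // Split on the first ':' only, so secret IDs may themselves contain colons
    const sep = value.ref.indexOf(':');
    const provider = value.ref.slice(0, sep);
    const secretId = value.ref.slice(sep + 1);
    switch (provider) {
      case 'env':
        acc[key] = process.env[secretId];
        break;
      case 'doppler':
        acc[key] = fetchFromDoppler(secretId);
        break;
      case 'vault':
        acc[key] = fetchFromVault(secretId);
        break;
      default:
        throw new Error(`Unknown secret provider: ${provider}`);
    }
    return acc;
  }, {});
}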
```
---
### Practice 2: Context Access Control
**Implementation:**
```javascript
class SecureContextManager {
constructor(workflowId, userId) {
this.workflowId = workflowId;
this.userId = userId;
}
async saveContext(context) {
// Check write permission
if (!await this.canWrite(this.userId, this.workflowId)) {
throw new Error('Permission denied: cannot save context');
}
// Sanitize before saving
const sanitized = this.sanitizeContext(context);
// Save with restricted permissions
const contextPath = this.getContextPath();
await fs.promises.writeFile(
contextPath,
JSON.stringify(sanitized, null, 2),
{ mode: 0o600 } // Owner read/write only
);
// Log access
await this.logAccess('write', contextPath);
}
async loadContext() {
// Check read permission
if (!await this.canRead(this.userId, this.workflowId)) {
throw new Error('Permission denied: cannot read context');
}
const contextPath = this.getContextPath();
const json = await fs.promises.readFile(contextPath, 'utf-8');
// Log access
await this.logAccess('read', contextPath);
return JSON.parse(json);
}
sanitizeContext(context) {
// Remove sensitive fields
const { password, token, secret, ...safe } = context;
// Redact sensitive patterns
const sanitized = JSON.stringify(safe);
const redacted = sanitized
.replace(/sk_live_[a-zA-Z0-9]+/g, 'sk_live_REDACTED')
.replace(/password":\s*"[^"]+"/g, 'password": "REDACTED"');
return JSON.parse(redacted);
}
}
```
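Removing only top-level keys leaves secrets in nested objects untouched. A recursive variant is sketched below, assuming the context is plain JSON data (dicts, lists, strings, numbers); the key list is illustrative rather than complete, and `sanitize` is a hypothetical helper.

```python
import re

SENSITIVE_KEYS = {"password", "token", "secret", "api_key"}  # illustrative list
STRIPE_KEY = re.compile(r"sk_live_[a-zA-Z0-9]+")


def sanitize(value):
    """Recursively drop sensitive keys and redact Stripe-style live keys."""
    if isinstance(value, dict):
        return {
            key: sanitize(item)
            for key, item in value.items()
            if key.lower() not in SENSITIVE_KEYS
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    if isinstance(value, str):
        return STRIPE_KEY.sub("sk_live_REDACTED", value)
    return value
```

Because it walks the structure instead of round-tripping through a JSON string, the redaction cannot corrupt quoting or escape sequences.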
---
## Testing
### Practice 1: Context Validation Tests
**Implementation:**
```typescript
import { describe, test, expect } from 'vitest';
describe('Context Validation', () => {
test('should have all required fields', () => {
const context = createTestContext();
expect(context).toHaveProperty('version');
expect(context).toHaveProperty('workflow_id');
expect(context).toHaveProperty('timestamp');
expect(context).toHaveProperty('current_agent');
expect(context).toHaveProperty('phase');
});
test('should have valid timestamp format', () => {
const context = createTestContext();
// Should parse as a valid date (new Date() never throws; invalid input yields NaN)
const parsed = new Date(context.timestamp);
expect(Number.isNaN(parsed.getTime())).toBe(false);
// Should be recent (within 1 hour)
const timestamp = new Date(context.timestamp);
const now = new Date();
const diffMs = now.getTime() - timestamp.getTime();
expect(diffMs).toBeLessThan(3600000); // 1 hour
});
test('should not contain secrets', () => {
const context = createTestContext();
const json = JSON.stringify(context);
// Check for common secret patterns
expect(json).not.toMatch(/sk_live_[a-zA-Z0-9]+/); // Stripe
expect(json).not.toMatch(/password/i);
expect(json).not.toMatch(/api[_-]?key/i);
});
test('should be under size limit', () => {
const context = createTestContext();
const size = Buffer.byteLength(JSON.stringify(context), 'utf-8'); // bytes, not UTF-16 code units
expect(size).toBeLessThan(500 * 1024); // 500KB
});
});
```
---
### Practice 2: Workflow Integration Tests
**Implementation:**
```typescript
describe('Workflow Integration', () => {
test('should complete sequential workflow', async () => {
const workflow = new WorkflowOrchestrator();
const agents = [
new DesignAgent(),
new ImplementAgent(),
new TestAgent(),
new DeployAgent()
];
const result = await workflow.executeCleanHandoff(agents);
expect(result.phase).toBe('deployed');
expect(result.files_modified.length).toBeGreaterThan(0);
expect(result.decisions.length).toBeGreaterThan(0);
});
test('should handle agent failure gracefully', async () => {
const workflow = new WorkflowOrchestrator();
const agents = [
new DesignAgent(),
new FailingAgent(), // Will throw error
new TestAgent()
];
await expect(
workflow.executeCleanHandoff(agents)
).rejects.toThrow();
// Context should be saved up to failure point
const context = await loadContext(workflow.workflowId);
expect(context.phase).toBe('design-complete');
expect(context.error_log).toHaveLength(1);
});
});
```
---
## Quick Reference
**Context Design:**
- ✅ Keep contexts <100KB
- ✅ Be explicit, avoid assumptions
- ✅ Version your schema
- ❌ Don't embed large files
- ❌ Don't include secrets
**Handoff Patterns:**
- Sequential: Clean, linear progression
- Parallel: Concurrent tasks with merge
- Conditional: Route based on results
**Error Handling:**
- Save partial progress
- Create checkpoints before risky operations
- Implement rollback strategies
**Performance:**
- Use lazy loading for metadata
- Implement incremental updates
- Cache frequently accessed contexts
**Security:**
- Never save secrets in context
- Use secret references instead
- Implement access control
- Sanitize before saving
**Testing:**
- Validate required fields
- Check for secrets
- Verify size limits
- Test workflow integration
---
**Best Practices Version**: 1.0
**Last Updated**: 2025-01-15