Initial commit

2025-11-29 18:20:21 +08:00
commit bbbaf7acad
63 changed files with 38552 additions and 0 deletions
--- a/commands/optimize/analyze.md
+++ b/commands/optimize/analyze.md
@@ -0,0 +1,494 @@
+# Performance Analysis Operation
+
+You are executing the **analyze** operation to perform comprehensive performance analysis and identify bottlenecks across all application layers.
+
+## Parameters
+
+**Received**: `$ARGUMENTS` (after removing 'analyze' operation name)
+
+Expected format: `target:"area" [scope:"frontend|backend|database|infrastructure|all"] [metrics:"baseline|compare"] [baseline:"version-or-timestamp"]`
+
+**Parameter definitions**:
+- `target` (required): Application or component to analyze (e.g., "user dashboard", "checkout flow", "production app")
+- `scope` (optional): Layer to focus on - `frontend`, `backend`, `database`, `infrastructure`, or `all` (default: `all`)
+- `metrics` (optional): Metrics mode, either `baseline` (establish a baseline) or `compare` (compare against an existing baseline) (default: `baseline`)
+- `baseline` (optional): Baseline version or timestamp for comparison (e.g., "v1.2.0", "2025-10-01")
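
The expected format above can be parsed with a small helper. The following is a minimal sketch, assuming the arguments arrive as a single string; `parseAnalyzeArgs`, the constant names, and the error messages are hypothetical and not part of the command itself.

```javascript
// Hypothetical helper: parses the key:"value" pairs described above.
const ALLOWED_SCOPES = ["frontend", "backend", "database", "infrastructure", "all"];
const ALLOWED_METRICS = ["baseline", "compare"];

function parseAnalyzeArgs(raw) {
  // Defaults taken from the parameter definitions.
  const params = { scope: "all", metrics: "baseline" };

  // Matches key:"quoted value" or key:bareword.
  const pattern = /(\w+):(?:"([^"]*)"|(\S+))/g;
  for (const [, key, quoted, bare] of raw.matchAll(pattern)) {
    params[key] = quoted !== undefined ? quoted : bare;
  }

  // `target` is the only required parameter.
  if (!params.target) {
    throw new Error("Missing required parameter: target");
  }
  if (!ALLOWED_SCOPES.includes(params.scope)) {
    throw new Error(`Invalid scope: ${params.scope}`);
  }
  if (!ALLOWED_METRICS.includes(params.metrics)) {
    throw new Error(`Invalid metrics mode: ${params.metrics}`);
  }
  return params;
}

// Usage: scope is given explicitly, metrics falls back to its default.
const parsed = parseAnalyzeArgs('target:"checkout flow" scope:"backend"');
console.log(parsed);
```

Rejecting unknown `scope` and `metrics` values early keeps a typo from silently falling back to a full analysis.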
+
+## Workflow
+
+### 1. Define Analysis Scope
+
+Based on the `target` and `scope` parameters, determine what to analyze:
+
+**Scope: all** (comprehensive analysis):
+- Frontend: Page load, rendering, bundle size
+- Backend: API response times, throughput, error rates
+- Database: Query performance, connection pools, cache hit rates
+- Infrastructure: Resource utilization, scaling efficiency
+
+**Scope: frontend**:
+- Web Vitals (LCP, INP, CLS, TTFB, FCP, plus FID for legacy comparisons; INP replaced FID as a Core Web Vital in March 2024)
+- Bundle sizes and composition
+- Network waterfall analysis
+- Runtime performance (memory, CPU)
+
+**Scope: backend**:
+- API endpoint response times (p50, p95, p99)
+- Throughput and concurrency handling
+- Error rates and types
+- Dependency latency (database, external APIs)
+
+**Scope: database**:
+- Query execution times
+- Index effectiveness
+- Connection pool utilization
+- Cache hit rates
+
+**Scope: infrastructure**:
+- CPU, memory, disk, network utilization
+- Container/instance metrics
+- Auto-scaling behavior
+- CDN effectiveness
+
+### 2. Establish Baseline Metrics
+
+Run comprehensive performance profiling:
+
+**Frontend Profiling**:
+```bash
+# Lighthouse audit
+npx lighthouse [url] --output=json --output-path=./perf-baseline-lighthouse.json
+
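+# Bundle analysis (assumes the build script writes dist/stats.json;
+# the exact stats flag depends on the build tool)
+npm run build -- --stats
+npx webpack-bundle-analyzer dist/stats.json --mode static --report ./perf-baseline-bundle.html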
+
+# Check for unused dependencies
+npx depcheck > ./perf-baseline-deps.txt
+
+# Runtime profiling (if applicable)
+# Use browser DevTools Performance tab
+```
+
+**Backend Profiling**:
+```bash
+# API response times (if monitoring exists)
+# Check APM dashboard or logs
+
+# Profile Node.js application
+node --prof app.js
+# Then process the profile
+node --prof-process isolate-*.log > perf-baseline-profile.txt
+
+# Memory snapshot
+node --inspect app.js
+# Take heap snapshot via Chrome DevTools
+
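+# Load test to get baseline throughput
+# (k6 is distributed as a standalone binary; install it separately rather than running it through npx)
+k6 run --duration 60s --vus 50 load-test.js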
+```
+
+**Database Profiling**:
+```sql
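+-- PostgreSQL: Enable pg_stat_statements
+-- (the module must also be listed in shared_preload_libraries, which requires a server restart)
+CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
+
+-- Capture slow queries
+-- (the *_exec_time columns exist in PostgreSQL 13+; older versions use total_time, mean_time, etc.)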
+SELECT
+  query,
+  calls,
+  total_exec_time,
+  mean_exec_time,
+  max_exec_time,
+  stddev_exec_time
+FROM pg_stat_statements
+ORDER BY mean_exec_time DESC
+LIMIT 50;
+
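+-- Check index usage (least-used indexes first)
+-- pg_stat_user_indexes exposes relname/indexrelname, aliased here for readability
+SELECT
+  schemaname,
+  relname AS tablename,
+  indexrelname AS indexname,
+  idx_scan,
+  idx_tup_read,
+  idx_tup_fetch
+FROM pg_stat_user_indexes
+ORDER BY idx_scan ASC;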
+
+-- Table statistics (pg_stat_user_tables exposes the table name as relname)
+SELECT
+  schemaname,
+  relname AS tablename,
+  n_live_tup,
+  n_dead_tup,
+  last_vacuum,
+  last_autovacuum
+FROM pg_stat_user_tables;
+```
+
+**Infrastructure Profiling**:
+```bash
+# Container metrics (if using Docker/Kubernetes)
+docker stats --no-stream
+
+# Or for Kubernetes
+kubectl top nodes
+kubectl top pods
+
+# Server resource utilization
+top -b -n 1 | head -20
+free -h
+df -h
+iostat -x 1 5
+```
+
+### 3. Identify Bottlenecks
+
+Analyze collected metrics to identify performance bottlenecks:
+
+**Bottleneck Detection Matrix**:
+
+| Layer | Indicator | Severity | Common Causes |
+|-------|-----------|----------|---------------|
+| **Frontend** | LCP > 2.5s | High | Large images, render-blocking resources, slow TTFB |
+| **Frontend** | Bundle > 1MB | Medium | Unused dependencies, no code splitting, large libraries |
+| **Frontend** | CLS > 0.1 | Medium | Missing dimensions, dynamic content injection |
+| **Frontend** | INP > 200ms | High | Long tasks, unoptimized event handlers |
+| **Backend** | p95 > 1000ms | High | Slow queries, N+1 problems, synchronous I/O |
+| **Backend** | p99 > 5000ms | Critical | Database locks, resource exhaustion, cascading failures |
+| **Backend** | Error rate > 1% | High | Unhandled errors, timeout issues, dependency failures |
+| **Database** | Query > 500ms | High | Missing indexes, full table scans, complex joins |
+| **Database** | Cache hit < 80% | Medium | Insufficient cache size, poor cache strategy |
+| **Database** | Connection pool exhaustion | Critical | Connection leaks, insufficient pool size |
+| **Infrastructure** | CPU > 80% | High | Insufficient resources, inefficient algorithms |
+| **Infrastructure** | Memory > 90% | Critical | Memory leaks, oversized caches, insufficient resources |
+
+**Prioritization Framework**:
+
+1. **Critical** - Immediate impact on user experience or system stability
+2. **High** - Significant performance degradation
+3. **Medium** - Noticeable but not blocking
+4. **Low** - Minor optimization opportunity
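
The detection matrix can be applied mechanically to a metrics object. The following is a minimal sketch, assuming timings are in milliseconds, sizes in bytes, and rates as 0-1 ratios. The metric key names, `RULES`, and `detectBottlenecks` are illustrative assumptions; only the thresholds and severities come from the table, and 1MB is taken as 1,000,000 bytes. The per-query and connection-pool rows are omitted because they need more than a single number.

```javascript
// Lower rank sorts first, following the prioritization framework.
const SEVERITY_RANK = { critical: 0, high: 1, medium: 2, low: 3 };

// Thresholds copied from the detection matrix; key names are assumed.
const RULES = [
  { layer: "frontend", metric: "lcp", op: ">", threshold: 2500, severity: "high" },
  { layer: "frontend", metric: "bundle_size", op: ">", threshold: 1000000, severity: "medium" },
  { layer: "frontend", metric: "cls", op: ">", threshold: 0.1, severity: "medium" },
  { layer: "frontend", metric: "inp", op: ">", threshold: 200, severity: "high" },
  { layer: "backend", metric: "p95_response_time", op: ">", threshold: 1000, severity: "high" },
  { layer: "backend", metric: "p99_response_time", op: ">", threshold: 5000, severity: "critical" },
  { layer: "backend", metric: "error_rate", op: ">", threshold: 0.01, severity: "high" },
  { layer: "database", metric: "cache_hit_rate", op: "<", threshold: 0.8, severity: "medium" },
  { layer: "infrastructure", metric: "cpu_utilization", op: ">", threshold: 0.8, severity: "high" },
  { layer: "infrastructure", metric: "memory_utilization", op: ">", threshold: 0.9, severity: "critical" },
];

function detectBottlenecks(metrics) {
  const findings = [];
  for (const rule of RULES) {
    const value = metrics[rule.layer]?.[rule.metric];
    if (typeof value !== "number") continue; // metric not collected
    const breached = rule.op === ">" ? value > rule.threshold : value < rule.threshold;
    if (breached) findings.push({ ...rule, value });
  }
  // Most severe findings first; equal severities keep rule order.
  return findings.sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Usage with a few sample values.
const sampleFindings = detectBottlenecks({
  frontend: { lcp: 3200, cls: 0.15 },
  backend: { p95_response_time: 850, error_rate: 0.02 },
});
for (const f of sampleFindings) {
  console.log(`${f.severity}: ${f.layer}.${f.metric} = ${f.value} (threshold ${f.threshold})`);
}
```

Keeping the thresholds in a data table rather than in branching code makes it easy to adjust them per project without touching the detection logic.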
+
+### 4. Create Optimization Opportunity Matrix
+
+For each identified bottleneck, assess:
+
+**Impact Assessment**:
+- Performance improvement potential (low/medium/high)
+- Implementation effort (hours/days)
+- Risk level (low/medium/high)
+- Dependencies on other optimizations
+
+**Optimization Opportunities**:
+
+```markdown
+## Opportunity Matrix
+
+| ID | Layer | Issue | Impact | Effort | Priority | Recommendation |
+|----|-------|-------|--------|--------|----------|----------------|
+| 1 | Database | Missing index on users.email | High | 1h | Critical | Add index immediately |
+| 2 | Frontend | Bundle size 2.5MB | High | 4h | High | Implement code splitting |
+| 3 | Backend | N+1 query in /api/users | High | 2h | High | Add eager loading |
+| 4 | Infrastructure | No CDN for static assets | Medium | 3h | Medium | Configure CloudFront |
+| 5 | Frontend | Unoptimized images | Medium | 2h | Medium | Add next/image or similar |
+```
+
+### 5. Generate Performance Profile
+
+Create a comprehensive performance profile:
+
+**Performance Snapshot**:
+```json
+{
+  "timestamp": "2025-10-14T12:00:00Z",
+  "version": "v1.2.3",
+  "environment": "production",
+  "metrics": {
+    "frontend": {
+      "lcp": 3200,
+      "fid": 150,
+      "cls": 0.15,
+      "ttfb": 800,
+      "bundle_size": 2500000
+    },
+    "backend": {
+      "p50_response_time": 120,
+      "p95_response_time": 850,
+      "p99_response_time": 2100,
+      "throughput_rps": 450,
+      "error_rate": 0.02
+    },
+    "database": {
+      "avg_query_time": 45,
+      "slow_query_count": 23,
+      "cache_hit_rate": 0.72,
+      "connection_pool_utilization": 0.85
+    },
+    "infrastructure": {
+      "cpu_utilization": 0.68,
+      "memory_utilization": 0.75,
+      "disk_io_wait": 0.03
+    }
+  },
+  "bottlenecks": [
+    {
+      "id": "BTL001",
+      "layer": "frontend",
+      "severity": "high",
+      "issue": "Large LCP time",
+      "metric": "lcp",
+      "value": 3200,
+      "threshold": 2500,
+      "impact": "Poor user experience on initial page load"
+    }
+  ]
+}
+```
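
When `metrics` is set to `compare`, two snapshots of this shape can be diffed metric by metric. The following is a minimal sketch, assuming both snapshots share the layout above; `compareSnapshots` and `HIGHER_IS_BETTER` are hypothetical names, and treating `throughput_rps` and `cache_hit_rate` as the only higher-is-better metrics is an assumption.

```javascript
// Metrics where a larger value is an improvement (assumed list).
const HIGHER_IS_BETTER = new Set(["throughput_rps", "cache_hit_rate"]);

function compareSnapshots(baseline, current) {
  const rows = [];
  for (const [layer, layerMetrics] of Object.entries(baseline.metrics)) {
    for (const [metric, before] of Object.entries(layerMetrics)) {
      const after = current.metrics?.[layer]?.[metric];
      // Skip metrics missing from either snapshot, and zero baselines
      // (a percentage change from zero is undefined).
      if (typeof before !== "number" || typeof after !== "number" || before === 0) continue;

      const changePct = Math.round(((after - before) / before) * 1000) / 10;
      let status = "unchanged";
      if (after !== before) {
        const improved = HIGHER_IS_BETTER.has(metric) ? after > before : after < before;
        status = improved ? "improved" : "regressed";
      }
      rows.push({ layer, metric, before, after, changePct, status });
    }
  }
  return rows;
}

// Usage: compare a baseline against a later measurement.
const report = compareSnapshots(
  { metrics: { frontend: { lcp: 3200 }, database: { cache_hit_rate: 0.72 } } },
  { metrics: { frontend: { lcp: 1600 }, database: { cache_hit_rate: 0.9 } } }
);
console.table(report);
```

The direction check matters because a rising cache hit rate is an improvement while a rising LCP is a regression.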
+
+### 6. Recommend Next Steps
+
+Based on analysis results, recommend:
+
+**Immediate Actions** (Critical bottlenecks):
+- List specific optimizations with highest ROI
+- Estimated improvement for each
+- Implementation order
+
+**Short-term Actions** (High priority):
+- Optimizations to tackle in current sprint
+- Potential dependencies
+
+**Long-term Actions** (Medium/Low priority):
+- Architectural improvements
+- Infrastructure upgrades
+- Technical debt reduction
+
+## Output Format
+
+```markdown
+# Performance Analysis Report: [Target]
+
+**Analysis Date**: [Date and time]
+**Analyzed Version**: [Version or commit]
+**Environment**: [production/staging/development]
+**Scope**: [all/frontend/backend/database/infrastructure]
+
+## Executive Summary
+
+[2-3 paragraph summary of overall findings, critical issues, and recommended priorities]
+
+## Baseline Metrics
+
+### Frontend Performance
+| Metric | Value | Status | Threshold |
+|--------|-------|--------|-----------|
+| LCP (Largest Contentful Paint) | 3.2s | ⚠️ Needs Improvement | < 2.5s |
+| FID (First Input Delay) | 150ms | ⚠️ Needs Improvement | < 100ms |
+| CLS (Cumulative Layout Shift) | 0.15 | ⚠️ Needs Improvement | < 0.1 |
+| TTFB (Time to First Byte) | 800ms | ⚠️ Needs Improvement | < 600ms |
+| Bundle Size (gzipped) | 2.5MB | ❌ Poor | < 500KB |
+
+### Backend Performance
+| Metric | Value | Status | Threshold |
+|--------|-------|--------|-----------|
+| P50 Response Time | 120ms | ✅ Good | < 200ms |
+| P95 Response Time | 850ms | ⚠️ Needs Improvement | < 500ms |
+| P99 Response Time | 2100ms | ❌ Poor | < 1000ms |
+| Throughput | 450 req/s | ✅ Good | > 400 req/s |
+| Error Rate | 2% | ⚠️ Needs Improvement | < 1% |
+
+### Database Performance
+| Metric | Value | Status | Threshold |
+|--------|-------|--------|-----------|
+| Avg Query Time | 45ms | ✅ Good | < 100ms |
+| Slow Query Count (>500ms) | 23 queries | ❌ Poor | 0 queries |
+| Cache Hit Rate | 72% | ⚠️ Needs Improvement | > 85% |
+| Connection Pool Utilization | 85% | ⚠️ Needs Improvement | < 75% |
+
+### Infrastructure Performance
+| Metric | Value | Status | Threshold |
+|--------|-------|--------|-----------|
+| CPU Utilization | 68% | ✅ Good | < 75% |
+| Memory Utilization | 75% | ⚠️ Needs Improvement | < 70% |
+| Disk I/O Wait | 3% | ✅ Good | < 5% |
+
+## Bottlenecks Identified
+
+### Critical Priority
+
+#### BTL001: Frontend - Large LCP Time (3.2s)
+**Impact**: High - Users experience slow initial page load
+**Cause**:
+- Large hero image (1.2MB) loaded synchronously
+- Render-blocking CSS and JavaScript
+- No image optimization
+
+**Recommendation**:
+1. Optimize the hero image (reduce to <200KB) and load it eagerly with high priority; lazy-load only below-the-fold images
+2. Defer non-critical CSS/JS
+3. Implement resource hints (preload critical assets)
+**Expected Improvement**: LCP reduction to ~1.8s (44% improvement)
+
+#### BTL002: Database - Missing Index on users.email
+**Impact**: High - Slow user lookup queries affecting multiple endpoints
+**Queries Affected**:
+```sql
+SELECT * FROM users WHERE email = $1;  -- 450ms avg
+```
+**Recommendation**:
+```sql
+CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
+```
+**Expected Improvement**: Query time reduction to <10ms (~98% improvement)
+
+### High Priority
+
+#### BTL003: Backend - N+1 Query Problem in /api/users Endpoint
+**Impact**: High - p95 response time of 850ms
+**Cause**:
+```javascript
+// Current (N+1 problem)
+const users = await User.findAll();
+for (const user of users) {
+  user.posts = await Post.findAll({ where: { userId: user.id } });
+}
+```
+**Recommendation**:
+```javascript
+// Optimized (eager loading)
+const users = await User.findAll({
+  include: [{ model: Post, as: 'posts' }]
+});
+```
+**Expected Improvement**: Response time reduction to ~200ms (75% improvement)
+
+#### BTL004: Frontend - Bundle Size 2.5MB
+**Impact**: High - Slow initial load especially on mobile
+**Cause**:
+- No code splitting
+- Heavy dependencies (moment.js, full lodash import)
+- No tree shaking
+
+**Recommendation**:
+1. Implement code splitting by route
+2. Replace moment.js with date-fns (tree-shakeable, so only the imported functions are bundled)
+3. Use tree-shakeable imports
+```javascript
+// Before
+import _ from 'lodash';
+import moment from 'moment';
+
+// After
+import { debounce, throttle } from 'lodash-es';
+import { format, parseISO } from 'date-fns';
+```
+**Expected Improvement**: Bundle reduction to ~800KB (68% improvement)
+
+### Medium Priority
+
+[Additional bottlenecks with similar format]
+
+## Optimization Opportunity Matrix
+
+| ID | Layer | Issue | Impact | Effort | Priority | Est. Improvement |
+|----|-------|-------|--------|--------|----------|------------------|
+| BTL001 | Frontend | Large LCP | High | 4h | Critical | 44% LCP reduction |
+| BTL002 | Database | Missing index | High | 1h | Critical | ~98% query time reduction |
+| BTL003 | Backend | N+1 queries | High | 2h | High | 75% response time reduction |
+| BTL004 | Frontend | Bundle size | High | 6h | High | 68% bundle reduction |
+| BTL005 | Infrastructure | No CDN | Medium | 3h | Medium | 30% TTFB reduction |
+| BTL006 | Database | Low cache hit | Medium | 4h | Medium | 15% query improvement |
+
+## Profiling Data
+
+### Frontend Profiling Results
+[Include relevant Lighthouse report summary, bundle analysis, etc.]
+
+### Backend Profiling Results
+[Include relevant API response time distribution, slow endpoint list, etc.]
+
+### Database Profiling Results
+[Include slow query details, table scan frequency, etc.]
+
+### Infrastructure Profiling Results
+[Include resource utilization charts, scaling behavior, etc.]
+
+## Recommended Action Plan
+
+### Phase 1: Critical Fixes (Immediate - 1-2 days)
+1. **Add missing database indexes** (BTL002) - 1 hour
+   - Estimated improvement: ~98% reduction in user lookup query time
+2. **Optimize and prioritize the hero image; lazy-load below-the-fold images** (BTL001) - 4 hours
+   - Estimated improvement: 44% LCP reduction
+
+### Phase 2: High-Priority Optimizations (This week - 3-5 days)
+1. **Fix N+1 query problems** (BTL003) - 2 hours
+   - Estimated improvement: 75% response time reduction on affected endpoints
+2. **Implement bundle optimization** (BTL004) - 6 hours
+   - Estimated improvement: 68% bundle size reduction
+
+### Phase 3: Infrastructure Improvements (Next sprint - 1-2 weeks)
+1. **Configure CDN for static assets** (BTL005) - 3 hours
+   - Estimated improvement: 30% TTFB reduction
+2. **Optimize database caching strategy** (BTL006) - 4 hours
+   - Estimated improvement: 15% overall query performance
+
+## Expected Overall Impact
+
+If all critical and high-priority optimizations are implemented:
+
+| Metric | Current | Expected | Improvement |
+|--------|---------|----------|-------------|
+| LCP | 3.2s | 1.5s | 53% faster |
+| Bundle Size | 2.5MB | 650KB | 74% smaller |
+| P95 Response Time | 850ms | 250ms | 71% faster |
+| User Lookup Query | 450ms | 8ms | 98% faster |
+| Overall Performance Score | 62/100 | 88/100 | +26 points |
+
+## Monitoring Recommendations
+
+After implementing optimizations, monitor these key metrics:
+
+**Frontend**:
+- Real User Monitoring (RUM) for Web Vitals
+- Bundle size in CI/CD pipeline
+- Lighthouse CI for regression detection
+
+**Backend**:
+- APM for endpoint response times
+- Error rate monitoring
+- Database query performance
+
+**Database**:
+- Slow query log monitoring
+- Index hit rate
+- Connection pool metrics
+
+**Infrastructure**:
+- Resource utilization alerts
+- Auto-scaling triggers
+- CDN cache hit rates
+
+## Testing Instructions
+
+### Before Optimization
+1. Run Lighthouse audit: `npx lighthouse [url] --output=json --output-path=baseline.json`
+2. Capture API metrics: [specify how]
+3. Profile database: [SQL queries above]
+4. Save baseline for comparison
+
+### After Optimization
+1. Repeat all baseline measurements
+2. Compare metrics using provided scripts
+3. Verify no functionality regressions
+4. Monitor for 24-48 hours in production
+
+## Next Steps
+
+1. Review and prioritize optimizations with team
+2. Create tasks for Phase 1 critical fixes
+3. Implement optimizations using `/optimize [layer]` operations
+4. Benchmark improvements using `/optimize benchmark`
+5. Document lessons learned and update performance budget