# Performance Analysis Operation
You are executing the **analyze** operation to perform comprehensive performance analysis and identify bottlenecks across all application layers.
## Parameters
**Received**: `$ARGUMENTS` (after removing 'analyze' operation name)
Expected format: `target:"area" [scope:"frontend|backend|database|infrastructure|all"] [metrics:"baseline|compare"] [baseline:"version-or-timestamp"]`
**Parameter definitions**:
- `target` (required): Application or component to analyze (e.g., "user dashboard", "checkout flow", "production app")
- `scope` (optional): Layer to focus on - `frontend`, `backend`, `database`, `infrastructure`, or `all` (default: `all`)
- `metrics` (optional): Metrics mode - `baseline` (establish baseline), `compare` (compare against baseline) (default: `baseline`)
- `baseline` (optional): Baseline version or timestamp for comparison (e.g., "v1.2.0", "2025-10-01")
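Example invocation (illustrative values, composed from the fields above): `analyze target:"checkout flow" scope:"backend" metrics:"compare" baseline:"v1.2.0"`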
## Workflow
### 1. Define Analysis Scope
Based on the `target` and `scope` parameters, determine what to analyze:
**Scope: all** (comprehensive analysis):
- Frontend: Page load, rendering, bundle size
- Backend: API response times, throughput, error rates
- Database: Query performance, connection pools, cache hit rates
- Infrastructure: Resource utilization, scaling efficiency
**Scope: frontend**:
- Web Vitals (LCP, FID, CLS, INP, TTFB, FCP)
- Bundle sizes and composition
- Network waterfall analysis
- Runtime performance (memory, CPU)
**Scope: backend**:
- API endpoint response times (p50, p95, p99)
- Throughput and concurrency handling
- Error rates and types
- Dependency latency (database, external APIs)
**Scope: database**:
- Query execution times
- Index effectiveness
- Connection pool utilization
- Cache hit rates
**Scope: infrastructure**:
- CPU, memory, disk, network utilization
- Container/instance metrics
- Auto-scaling behavior
- CDN effectiveness
### 2. Establish Baseline Metrics
Run comprehensive performance profiling:
**Frontend Profiling**:
```bash
# Lighthouse audit
npx lighthouse [url] --output=json --output-path=./perf-baseline-lighthouse.json

# Bundle analysis
npm run build -- --stats
npx webpack-bundle-analyzer dist/stats.json --mode static --report ./perf-baseline-bundle.html

# Check for unused dependencies
npx depcheck > ./perf-baseline-deps.txt

# Runtime profiling (if applicable): use the browser DevTools Performance tab
```
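Once the Lighthouse report exists, a short script can pull the headline numbers out of it. A minimal sketch, assuming the report path from the command above; the audit keys follow recent Lighthouse versions and may differ in yours:
```javascript
// Extract key metrics from the Lighthouse JSON report generated above.
const fs = require('fs');

const report = JSON.parse(fs.readFileSync('./perf-baseline-lighthouse.json', 'utf8'));
const metric = (key) => report.audits[key]?.numericValue;

console.log({
  lcp_ms: metric('largest-contentful-paint'),
  cls: metric('cumulative-layout-shift'),
  fcp_ms: metric('first-contentful-paint'),
  ttfb_ms: metric('server-response-time'), // older Lighthouse versions name this 'time-to-first-byte'
  performance_score: report.categories.performance.score * 100,
});
```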
**Backend Profiling**:
```bash
# API response times (if monitoring exists): check the APM dashboard or logs

# Profile a Node.js application
node --prof app.js
# Then process the generated V8 log
node --prof-process isolate-*.log > perf-baseline-profile.txt

# Memory snapshot: run with the inspector, then take a heap snapshot via Chrome DevTools
node --inspect app.js

# Load test to get baseline throughput (requires the k6 binary; k6 is not distributed via npm)
k6 run --duration 60s --vus 50 load-test.js
```
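The k6 command above assumes a `load-test.js` script that this section does not define. A minimal sketch of one, with a placeholder URL and illustrative thresholds:
```javascript
// load-test.js - baseline throughput test for k6.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,          // overridden by --vus on the CLI
  duration: '60s',  // overridden by --duration on the CLI
  thresholds: {
    http_req_duration: ['p(95)<1000'], // fail the run if p95 exceeds 1s
    http_req_failed: ['rate<0.01'],    // fail if more than 1% of requests error
  },
};

export default function () {
  const res = http.get('https://example.com/api/users'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```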
**Database Profiling**:
```sql
-- PostgreSQL: enable pg_stat_statements
-- (requires shared_preload_libraries = 'pg_stat_statements' and a server restart;
--  column names below are for PostgreSQL 13+)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Capture slow queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time,
    stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 50;

-- Check index usage (rarely scanned indexes first)
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- Table statistics
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables;
```
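The same slow-query report can be pulled programmatically. A minimal sketch using the `pg` client, assuming a PostgreSQL 13+ server with `pg_stat_statements` enabled and connection settings in the standard `PG*` environment variables:
```javascript
// Fetch the top slow queries from pg_stat_statements via node-postgres.
const { Client } = require('pg');

async function slowQueries() {
  const client = new Client(); // reads PGHOST, PGUSER, PGDATABASE, etc.
  await client.connect();
  const { rows } = await client.query(`
    SELECT query, calls, mean_exec_time
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC
    LIMIT 50
  `);
  await client.end();
  return rows;
}

slowQueries().then((rows) => console.table(rows));
```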
**Infrastructure Profiling**:
```bash
# Container metrics (if using Docker/Kubernetes)
docker stats --no-stream

# Or for Kubernetes
kubectl top nodes
kubectl top pods

# Server resource utilization
top -b -n 1 | head -20
free -h
df -h
iostat -x 1 5
```
### 3. Identify Bottlenecks
Analyze collected metrics to identify performance bottlenecks:
**Bottleneck Detection Matrix**:
| Layer | Indicator | Severity | Common Causes |
|-------|-----------|----------|---------------|
| **Frontend** | LCP > 2.5s | High | Large images, render-blocking resources, slow TTFB |
| **Frontend** | Bundle > 1MB | Medium | Unused dependencies, no code splitting, large libraries |
| **Frontend** | CLS > 0.1 | Medium | Missing dimensions, dynamic content injection |
| **Frontend** | INP > 200ms | High | Long tasks, unoptimized event handlers |
| **Backend** | p95 > 1000ms | High | Slow queries, N+1 problems, synchronous I/O |
| **Backend** | p99 > 5000ms | Critical | Database locks, resource exhaustion, cascading failures |
| **Backend** | Error rate > 1% | High | Unhandled errors, timeout issues, dependency failures |
| **Database** | Query > 500ms | High | Missing indexes, full table scans, complex joins |
| **Database** | Cache hit < 80% | Medium | Insufficient cache size, poor cache strategy |
| **Database** | Connection pool exhaustion | Critical | Connection leaks, insufficient pool size |
| **Infrastructure** | CPU > 80% | High | Insufficient resources, inefficient algorithms |
| **Infrastructure** | Memory > 90% | Critical | Memory leaks, oversized caches, insufficient resources |
**Prioritization Framework**:
1. **Critical** - Immediate impact on user experience or system stability
2. **High** - Significant performance degradation
3. **Medium** - Noticeable but not blocking
4. **Low** - Minor optimization opportunity
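The detection step itself can be mechanical: compare each collected metric against its threshold and emit prioritized findings. A minimal sketch of that logic; the rule encoding and metric names are assumptions, with threshold values mirroring the matrix above:
```javascript
// Flag metrics that cross their thresholds, per the detection matrix.
const RULES = [
  { layer: 'frontend', metric: 'lcp', threshold: 2500, severity: 'high' },
  { layer: 'backend', metric: 'p95_response_time', threshold: 1000, severity: 'high' },
  { layer: 'backend', metric: 'p99_response_time', threshold: 5000, severity: 'critical' },
  { layer: 'database', metric: 'cache_hit_rate', threshold: 0.8, severity: 'medium', below: true },
];

function detectBottlenecks(metrics) {
  return RULES.filter(({ metric, threshold, below }) => {
    const value = metrics[metric];
    if (value === undefined) return false;
    return below ? value < threshold : value > threshold;
  }).map((rule) => ({ ...rule, value: metrics[rule.metric] }));
}

// Flags lcp and cache_hit_rate; p95 of 850ms is under its 1000ms threshold.
console.log(detectBottlenecks({ lcp: 3200, p95_response_time: 850, cache_hit_rate: 0.72 }));
```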
### 4. Create Optimization Opportunity Matrix
For each identified bottleneck, assess:
**Impact Assessment**:
- Performance improvement potential (low/medium/high)
- Implementation effort (hours/days)
- Risk level (low/medium/high)
- Dependencies on other optimizations
**Optimization Opportunities**:
```markdown
## Opportunity Matrix
| ID | Layer | Issue | Impact | Effort | Priority | Recommendation |
|----|-------|-------|--------|--------|----------|----------------|
| 1 | Database | Missing index on users.email | High | 1h | Critical | Add index immediately |
| 2 | Frontend | Bundle size 2.5MB | High | 4h | High | Implement code splitting |
| 3 | Backend | N+1 query in /api/users | High | 2h | High | Add eager loading |
| 4 | Infrastructure | No CDN for static assets | Medium | 3h | Medium | Configure CloudFront |
| 5 | Frontend | Unoptimized images | Medium | 2h | Medium | Add next/image or similar |
```
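One way to derive the Priority column is to score impact against effort and sort; a sketch, with the impact weights as assumptions rather than a prescribed formula:
```javascript
// Rank opportunities by impact-per-hour of effort.
const IMPACT_SCORE = { low: 1, medium: 2, high: 3 };

function rankOpportunities(opportunities) {
  return opportunities
    .map((o) => ({ ...o, score: IMPACT_SCORE[o.impact] / o.effortHours }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankOpportunities([
  { id: 1, issue: 'Missing index on users.email', impact: 'high', effortHours: 1 },
  { id: 2, issue: 'Bundle size 2.5MB', impact: 'high', effortHours: 4 },
  { id: 4, issue: 'No CDN for static assets', impact: 'medium', effortHours: 3 },
]);
// The index fix ranks first: highest impact for the least effort.
console.log(ranked.map((o) => o.issue));
```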
### 5. Generate Performance Profile
Create a comprehensive performance profile:
**Performance Snapshot**:
```json
{
  "timestamp": "2025-10-14T12:00:00Z",
  "version": "v1.2.3",
  "environment": "production",
  "metrics": {
    "frontend": {
      "lcp": 3200,
      "fid": 150,
      "cls": 0.15,
      "ttfb": 800,
      "bundle_size": 2500000
    },
    "backend": {
      "p50_response_time": 120,
      "p95_response_time": 850,
      "p99_response_time": 2100,
      "throughput_rps": 450,
      "error_rate": 0.02
    },
    "database": {
      "avg_query_time": 45,
      "slow_query_count": 23,
      "cache_hit_rate": 0.72,
      "connection_pool_utilization": 0.85
    },
    "infrastructure": {
      "cpu_utilization": 0.68,
      "memory_utilization": 0.75,
      "disk_io_wait": 0.03
    }
  },
  "bottlenecks": [
    {
      "id": "BTL001",
      "layer": "frontend",
      "severity": "high",
      "issue": "Large LCP time",
      "metric": "lcp",
      "value": 3200,
      "threshold": 2500,
      "impact": "Poor user experience on initial page load"
    }
  ]
}
```
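When `metrics:"compare"` is set, diff the current snapshot against the saved baseline in the same JSON shape. A minimal sketch, with file names as placeholders:
```javascript
// Report per-metric percentage change between two performance snapshots.
const fs = require('fs');

const baseline = JSON.parse(fs.readFileSync('./perf-baseline.json', 'utf8'));
const current = JSON.parse(fs.readFileSync('./perf-current.json', 'utf8'));

for (const [layer, metrics] of Object.entries(current.metrics)) {
  for (const [name, value] of Object.entries(metrics)) {
    const before = baseline.metrics?.[layer]?.[name];
    if (before === undefined || before === 0) continue;
    const deltaPct = ((value - before) / before) * 100;
    console.log(`${layer}.${name}: ${before} -> ${value} (${deltaPct.toFixed(1)}%)`);
  }
}
```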
### 6. Recommend Next Steps
Based on analysis results, recommend:
**Immediate Actions** (Critical bottlenecks):
- List specific optimizations with highest ROI
- Estimated improvement for each
- Implementation order
**Short-term Actions** (High priority):
- Optimizations to tackle in current sprint
- Potential dependencies
**Long-term Actions** (Medium/Low priority):
- Architectural improvements
- Infrastructure upgrades
- Technical debt reduction
## Output Format
````markdown
# Performance Analysis Report: [Target]
**Analysis Date**: [Date and time]
**Analyzed Version**: [Version or commit]
**Environment**: [production/staging/development]
**Scope**: [all/frontend/backend/database/infrastructure]
## Executive Summary
[2-3 paragraph summary of overall findings, critical issues, and recommended priorities]
## Baseline Metrics
### Frontend Performance
| Metric | Value | Status | Threshold |
|--------|-------|--------|-----------|
| LCP (Largest Contentful Paint) | 3.2s | ⚠️ Needs Improvement | < 2.5s |
| FID (First Input Delay) | 150ms | ⚠️ Needs Improvement | < 100ms |
| CLS (Cumulative Layout Shift) | 0.15 | ⚠️ Needs Improvement | < 0.1 |
| TTFB (Time to First Byte) | 800ms | ⚠️ Needs Improvement | < 600ms |
| Bundle Size (gzipped) | 2.5MB | ❌ Poor | < 500KB |
### Backend Performance
| Metric | Value | Status | Threshold |
|--------|-------|--------|-----------|
| P50 Response Time | 120ms | ✅ Good | < 200ms |
| P95 Response Time | 850ms | ⚠️ Needs Improvement | < 500ms |
| P99 Response Time | 2100ms | ❌ Poor | < 1000ms |
| Throughput | 450 req/s | ✅ Good | > 400 req/s |
| Error Rate | 2% | ⚠️ Needs Improvement | < 1% |
### Database Performance
| Metric | Value | Status | Threshold |
|--------|-------|--------|-----------|
| Avg Query Time | 45ms | ✅ Good | < 100ms |
| Slow Query Count (>500ms) | 23 queries | ❌ Poor | 0 queries |
| Cache Hit Rate | 72% | ⚠️ Needs Improvement | > 85% |
| Connection Pool Utilization | 85% | ⚠️ Needs Improvement | < 75% |
### Infrastructure Performance
| Metric | Value | Status | Threshold |
|--------|-------|--------|-----------|
| CPU Utilization | 68% | ✅ Good | < 75% |
| Memory Utilization | 75% | ⚠️ Needs Improvement | < 70% |
| Disk I/O Wait | 3% | ✅ Good | < 5% |
## Bottlenecks Identified
### Critical Priority
#### BTL001: Frontend - Large LCP Time (3.2s)
**Impact**: High - Users experience slow initial page load
**Cause**:
- Large hero image (1.2MB) loaded synchronously
- Render-blocking CSS and JavaScript
- No image optimization
**Recommendation**:
1. Optimize and lazy-load hero image (reduce to <200KB)
2. Defer non-critical CSS/JS
3. Implement resource hints (preload critical assets)
**Expected Improvement**: LCP reduction to ~1.8s (44% improvement)
#### BTL002: Database - Missing Index on users.email
**Impact**: High - Slow user lookup queries affecting multiple endpoints
**Queries Affected**:
```sql
SELECT * FROM users WHERE email = $1; -- 450ms avg
```
**Recommendation**:
```sql
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```
**Expected Improvement**: Query time reduction to <10ms (95% improvement)
### High Priority
#### BTL003: Backend - N+1 Query Problem in /api/users Endpoint
**Impact**: High - p95 response time of 850ms
**Cause**:
```javascript
// Current (N+1 problem)
const users = await User.findAll();
for (const user of users) {
  user.posts = await Post.findAll({ where: { userId: user.id } });
}
```
**Recommendation**:
```javascript
// Optimized (eager loading)
const users = await User.findAll({
  include: [{ model: Post, as: 'posts' }]
});
```
**Expected Improvement**: Response time reduction to ~200ms (75% improvement)
#### BTL004: Frontend - Bundle Size 2.5MB
**Impact**: High - Slow initial load especially on mobile
**Cause**:
- No code splitting
- Unused dependencies (moment.js, lodash full import)
- No tree shaking
**Recommendation**:
1. Implement code splitting by route
2. Replace moment.js with date-fns (92% smaller)
3. Use tree-shakeable imports
```javascript
// Before
import _ from 'lodash';
import moment from 'moment';
// After
import { debounce, throttle } from 'lodash-es';
import { format, parseISO } from 'date-fns';
```
**Expected Improvement**: Bundle reduction to ~800KB (68% improvement)
### Medium Priority
[Additional bottlenecks with similar format]
## Optimization Opportunity Matrix
| ID | Layer | Issue | Impact | Effort | Priority | Est. Improvement |
|----|-------|-------|--------|--------|----------|------------------|
| BTL001 | Frontend | Large LCP | High | 4h | Critical | 44% LCP reduction |
| BTL002 | Database | Missing index | High | 1h | Critical | 95% query speedup |
| BTL003 | Backend | N+1 queries | High | 2h | High | 75% response time reduction |
| BTL004 | Frontend | Bundle size | High | 6h | High | 68% bundle reduction |
| BTL005 | Infrastructure | No CDN | Medium | 3h | Medium | 30% TTFB reduction |
| BTL006 | Database | Low cache hit | Medium | 4h | Medium | 15% query improvement |
## Profiling Data
### Frontend Profiling Results
[Include relevant Lighthouse report summary, bundle analysis, etc.]
### Backend Profiling Results
[Include relevant API response time distribution, slow endpoint list, etc.]
### Database Profiling Results
[Include slow query details, table scan frequency, etc.]
### Infrastructure Profiling Results
[Include resource utilization charts, scaling behavior, etc.]
## Recommended Action Plan
### Phase 1: Critical Fixes (Immediate - 1-2 days)
1. **Add missing database indexes** (BTL002) - 1 hour
- Estimated improvement: 95% reduction in user lookup queries
2. **Optimize hero image and implement lazy loading** (BTL001) - 4 hours
- Estimated improvement: 44% LCP reduction
### Phase 2: High-Priority Optimizations (This week - 3-5 days)
1. **Fix N+1 query problems** (BTL003) - 2 hours
- Estimated improvement: 75% response time reduction on affected endpoints
2. **Implement bundle optimization** (BTL004) - 6 hours
- Estimated improvement: 68% bundle size reduction
### Phase 3: Infrastructure Improvements (Next sprint - 1-2 weeks)
1. **Configure CDN for static assets** (BTL005) - 3 hours
- Estimated improvement: 30% TTFB reduction
2. **Optimize database caching strategy** (BTL006) - 4 hours
- Estimated improvement: 15% overall query performance
## Expected Overall Impact
If all critical and high-priority optimizations are implemented:
| Metric | Current | Expected | Improvement |
|--------|---------|----------|-------------|
| LCP | 3.2s | 1.5s | 53% faster |
| Bundle Size | 2.5MB | 650KB | 74% smaller |
| P95 Response Time | 850ms | 250ms | 71% faster |
| User Lookup Query | 450ms | 8ms | 98% faster |
| Overall Performance Score | 62/100 | 88/100 | +26 points |
## Monitoring Recommendations
After implementing optimizations, monitor these key metrics:
**Frontend**:
- Real User Monitoring (RUM) for Web Vitals
- Bundle size in CI/CD pipeline
- Lighthouse CI for regression detection
**Backend**:
- APM for endpoint response times
- Error rate monitoring
- Database query performance
**Database**:
- Slow query log monitoring
- Index hit rate
- Connection pool metrics
**Infrastructure**:
- Resource utilization alerts
- Auto-scaling triggers
- CDN cache hit rates
## Testing Instructions
### Before Optimization
1. Run Lighthouse audit: `npx lighthouse [url] --output=json --output-path=baseline.json`
2. Capture API metrics: [specify how]
3. Profile database: [SQL queries above]
4. Save baseline for comparison
### After Optimization
1. Repeat all baseline measurements
2. Compare metrics using provided scripts
3. Verify no functionality regressions
4. Monitor for 24-48 hours in production
## Next Steps
1. Review and prioritize optimizations with team
2. Create tasks for Phase 1 critical fixes
3. Implement optimizations using `/optimize [layer]` operations
4. Benchmark improvements using `/optimize benchmark`
5. Document lessons learned and update performance budget
````