#!/bin/bash
# Query Expert - Performance Analyzer
# Analyze EXPLAIN output and provide optimization recommendations
set -e
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
print_info() {
    echo -e "${BLUE}$1${NC}"
}
print_warning() {
    echo -e "${YELLOW}$1${NC}"
}
print_success() {
    echo -e "${GREEN}$1${NC}"
}
echo ""
echo "╔════════════════════════════════════════════════════════════╗"
echo "║ ║"
echo "║ Query Expert - Performance Analyzer ║"
echo "║ ║"
echo "╚════════════════════════════════════════════════════════════╝"
echo ""
print_info "Performance Analysis Guide"
echo ""
cat << 'EOF'
## PostgreSQL EXPLAIN Analysis
### Run EXPLAIN
```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
```
### What to Look For:
**Seq Scan (Table Scan) ❌**
→ Reading entire table
→ Fix: Add index on filtered column
→ CREATE INDEX idx_orders_customer ON orders(customer_id);
**Index Scan ✅**
→ Using index efficiently
→ Good performance for specific rows
**Bitmap Heap Scan ✅**
→ Good for returning multiple rows
→ Efficient index usage
**Nested Loop ⚠️**
→ Can be slow with large datasets
→ Consider: Hash Join or Merge Join
**Hash Join ✅**
→ Good for large joins
→ Requires memory
**Cost Numbers**
→ Higher = slower
→ Compare before/after optimization
→ cost=0.00..35.50 rows=10
**Actual Time**
→ Real execution time
→ actual time=0.023..0.156 rows=10 loops=1
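Putting those pieces together, an illustrative plan (representative output assembled from the numbers above, not captured from a real run) looks like:
```
Index Scan using idx_orders_customer on orders
  (cost=0.29..35.50 rows=10 width=64)
  (actual time=0.023..0.156 rows=10 loops=1)
  Index Cond: (customer_id = 123)
Planning Time: 0.080 ms
Execution Time: 0.190 ms
```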
---
## MySQL EXPLAIN Analysis
### Run EXPLAIN
```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```
### Type Column:
**ALL ❌**
→ Full table scan
→ Fix: Add index
**index ⚠️**
→ Full index scan
→ Better than ALL, but could improve
**range ✅**
→ Index range scan
→ Good for WHERE with >, <, BETWEEN
**ref ✅✅**
→ Index lookup
→ Excellent performance
**eq_ref ✅✅✅**
→ Unique index lookup
→ Best performance
**const ✅✅✅**
→ Constant lookup (primary key)
→ Fastest possible
### Extra Column:
**Using filesort ⚠️**
→ Sorting in memory/disk
→ Fix: Add index on ORDER BY columns
**Using temporary ⚠️**
→ Creating temporary table
→ Fix: Optimize GROUP BY or DISTINCT
**Using index ✅**
→ Covering index (index-only scan)
→ Excellent performance
**Using where ✅**
→ Filtering after read
→ Normal for WHERE clauses
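A minimal before/after sketch (the orders table and index name here are hypothetical, and the EXPLAIN comments are representative) showing how one composite index improves both the type and Extra columns:
```sql
-- Before: full scan plus a sort
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at;
-- type: ALL, Extra: Using where; Using filesort

CREATE INDEX idx_orders_cust_created ON orders(customer_id, created_at);

-- After: index lookup, rows come back already sorted
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at;
-- type: ref, Extra: Using where (no filesort)
```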
---
## MongoDB Explain Analysis
### Run Explain
```javascript
db.orders.find({ customer_id: 123 }).explain("executionStats")
```
### What to Look For:
**COLLSCAN ❌**
→ Full collection scan
→ Fix: Create index
→ db.orders.createIndex({ customer_id: 1 })
**IXSCAN ✅**
→ Index scan
→ Good performance
**executionTimeMillis**
→ Total execution time
→ < 100ms good, > 1000ms needs optimization
**nReturned vs totalDocsExamined**
→ Efficiency ratio
→ Ideally close to 1:1
→ If totalDocsExamined >> nReturned, add index
**Index Usage**
→ indexName: "customer_id_1" ✅
→ indexName: null ❌ (no index used)
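A short mongosh sketch (the orders collection is hypothetical) that checks the efficiency ratio and adds the index if needed:
```javascript
// Pull executionStats for the query
const stats = db.orders.find({ customer_id: 123 })
  .explain("executionStats").executionStats;

// Ideally totalDocsExamined / nReturned is close to 1
print(stats.totalDocsExamined + " examined, " + stats.nReturned + " returned");

// If the ratio is far above 1:1, create the index and re-run explain
db.orders.createIndex({ customer_id: 1 });
```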
---
## Index Recommendations
### When to Create Index:
1. **WHERE Clause**
CREATE INDEX idx_table_column ON table(column);
2. **JOIN Columns**
CREATE INDEX idx_table_join_col ON table(join_column);
3. **ORDER BY**
CREATE INDEX idx_table_sort ON table(sort_column);
4. **Composite Index (order matters!)**
CREATE INDEX idx_multi ON table(col1, col2, col3);
→ Works for: col1 | col1,col2 | col1,col2,col3
→ NOT for: col2 | col3 | col2,col3 (see the sketch after this list)
5. **Covering Index**
CREATE INDEX idx_covering ON table(filter_col) INCLUDE (select_cols);
→ INCLUDE is PostgreSQL 11+ syntax; in MySQL, add the columns to the index key
→ Index contains all needed columns
→ Fastest possible (index-only scan)
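A sketch of the leftmost-prefix rule from item 4, using a hypothetical composite index on orders:
```sql
CREATE INDEX idx_multi ON orders(customer_id, status, created_at);

-- Can use idx_multi (leftmost column present):
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';

-- Cannot seek with idx_multi (leftmost column missing):
SELECT * FROM orders WHERE status = 'shipped';
```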
### When NOT to Index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few distinct values)
- Frequently updated columns
- Large text/blob columns
---
## Query Optimization Checklist
Performance Issues:
[ ] Check EXPLAIN plan
[ ] Look for table scans (Seq Scan, ALL, COLLSCAN)
[ ] Identify missing indexes
[ ] Check JOIN types (Nested Loop on large tables)
[ ] Look for filesort or temporary tables
[ ] Verify index usage (Using index)
Optimizations:
[ ] Create indexes on WHERE columns
[ ] Create indexes on JOIN columns
[ ] Use composite indexes (correct order)
[ ] Add covering indexes for frequent queries
[ ] Use LIMIT to reduce result set
[ ] Avoid SELECT * (select only needed columns)
[ ] Avoid functions on indexed columns (see the sketch after this checklist)
[ ] Use EXISTS instead of IN for correlated subqueries
[ ] Filter early (before JOIN)
[ ] Use appropriate JOIN type
Monitoring:
[ ] Run EXPLAIN ANALYZE before optimization
[ ] Create indexes
[ ] Run EXPLAIN ANALYZE after optimization
[ ] Compare execution time and cost
[ ] Test with production-like data volume
[ ] Monitor slow query log
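Two of the optimization items sketched out (hypothetical orders/customers tables; DATE() shown as in MySQL, syntax varies by database):
```sql
-- Anti-pattern: a function on the indexed column blocks index use
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-01';

-- Rewrite as a range predicate, which can use an index on created_at
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02';

-- EXISTS instead of IN for a correlated subquery
SELECT c.customer_id
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
```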
---
## Example Optimization
### Before (Slow)
```sql
-- EXPLAIN shows: Seq Scan, cost=1000.00
SELECT * FROM orders WHERE customer_id = 123;
```
### Optimization Steps
```sql
-- 1. Create index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- 2. Optimize query (avoid SELECT *)
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 100;
-- 3. Check improvement
EXPLAIN ANALYZE
SELECT order_id, total, created_at
FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 100;
```
### After (Fast)
```
→ Index Scan using idx_orders_customer
→ cost=0.29..15.50 (down from 1000.00, a ~98% reduction)
→ actual time=0.015..0.023
```
---
## Tools
**PostgreSQL:**
- EXPLAIN ANALYZE
- pg_stat_statements extension
- pgBadger (log analyzer)
**MySQL:**
- EXPLAIN
- SHOW PROFILE (deprecated; prefer the Performance Schema)
- MySQL Workbench Performance Dashboard
**MongoDB:**
- explain("executionStats")
- MongoDB Compass (GUI)
- Database Profiler
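For example, a starting query against pg_stat_statements (PostgreSQL 13+ column names; the extension must be preloaded and created first):
```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by mean execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```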
---
EOF
print_success "Performance analysis guide displayed"
echo ""
print_info "Next Steps:"
echo " 1. Run EXPLAIN on your slow query"
echo " 2. Identify the bottleneck (table scan, no index, etc.)"
echo " 3. Apply recommended optimization"
echo " 4. Re-run EXPLAIN to verify improvement"
echo " 5. Test with production data volume"
echo ""