name: query-expert
description: Master SQL and database queries across multiple systems. Generate optimized queries, analyze performance, design indexes, and troubleshoot slow queries for PostgreSQL, MySQL, MongoDB, and more.

Query Expert

Master database queries across SQL and NoSQL systems. Generate optimized queries, analyze performance with EXPLAIN plans, design effective indexes, and troubleshoot slow queries.

What This Skill Does

Helps you write efficient, performant database queries:

  • Generate Queries - SQL, MongoDB, GraphQL queries
  • Optimize Queries - Performance tuning and refactoring
  • Design Indexes - Index strategies for faster queries
  • Analyze Performance - EXPLAIN plans and query analysis
  • Troubleshoot - Debug slow queries and bottlenecks
  • Best Practices - Query patterns and anti-patterns

Supported Databases

SQL Databases

  • PostgreSQL - Advanced features, CTEs, window functions
  • MySQL/MariaDB - InnoDB optimization, replication
  • SQLite - Embedded database optimization
  • SQL Server - T-SQL, execution plans, DMVs
  • Oracle - PL/SQL, partitioning, hints

NoSQL Databases

  • MongoDB - Aggregation pipelines, indexes
  • Redis - Key-value queries, Lua scripts
  • Elasticsearch - Full-text search queries
  • Cassandra - CQL, partition keys

Query Languages

  • SQL - Standard and vendor-specific
  • MongoDB Query Language - Find, aggregation
  • GraphQL - Efficient data fetching
  • Cypher - Neo4j graph queries

SQL Query Patterns

SELECT Queries

Basic SELECT

-- ✅ Select only needed columns
SELECT
    user_id,
    email,
    created_at
FROM users
WHERE status = 'active'
    AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 100;

-- ❌ Avoid SELECT *
SELECT * FROM users;  -- Wastes resources

JOINs

-- INNER JOIN (most common)
SELECT
    o.order_id,
    o.total,
    c.name AS customer_name,
    c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at >= '2024-01-01';

-- LEFT JOIN (include all left rows)
SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

-- Multiple JOINs
SELECT
    o.order_id,
    c.name AS customer_name,
    p.product_name,
    oi.quantity,
    oi.price
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.status = 'completed';

Subqueries

-- Subquery in WHERE
SELECT name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE total > 1000
);

-- Correlated subquery
SELECT
    c.name,
    (SELECT COUNT(*)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c;

-- ✅ Better: Use JOIN instead
SELECT
    c.name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Aggregation

-- GROUP BY with aggregates
SELECT
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price,
    SUM(stock_quantity) AS total_stock
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY avg_price DESC;

-- Multiple GROUP BY columns
SELECT
    DATE_TRUNC('month', created_at) AS month,
    category,
    SUM(total) AS monthly_sales
FROM orders
GROUP BY DATE_TRUNC('month', created_at), category
ORDER BY month DESC, monthly_sales DESC;

-- ROLLUP for subtotals
SELECT
    COALESCE(category, 'TOTAL') AS category,
    COALESCE(brand, 'All Brands') AS brand,
    SUM(sales) AS total_sales
FROM products
GROUP BY ROLLUP(category, brand);

Window Functions (PostgreSQL, SQL Server, MySQL 8+)

-- ROW_NUMBER
SELECT
    customer_id,
    order_date,
    total,
    ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY order_date DESC
    ) AS order_rank
FROM orders;

-- Running totals
SELECT
    order_date,
    total,
    SUM(total) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders;

-- RANK vs DENSE_RANK
SELECT
    product_name,
    sales,
    RANK() OVER (ORDER BY sales DESC) AS sales_rank,
    DENSE_RANK() OVER (ORDER BY sales DESC) AS sales_dense_rank,
    NTILE(4) OVER (ORDER BY sales DESC) AS quartile
FROM products;

-- LAG and LEAD
SELECT
    order_date,
    total,
    LAG(total, 1) OVER (ORDER BY order_date) AS prev_total,
    LEAD(total, 1) OVER (ORDER BY order_date) AS next_total,
    total - LAG(total, 1) OVER (ORDER BY order_date) AS change
FROM orders;

CTEs (Common Table Expressions)

-- Simple CTE
WITH active_customers AS (
    SELECT customer_id, name, email
    FROM customers
    WHERE status = 'active'
)
SELECT
    ac.name,
    COUNT(o.order_id) AS order_count
FROM active_customers ac
LEFT JOIN orders o ON ac.customer_id = o.customer_id
GROUP BY ac.customer_id, ac.name;

-- Multiple CTEs
WITH
monthly_sales AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(total) AS sales
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
),
avg_monthly AS (
    SELECT AVG(sales) AS avg_sales
    FROM monthly_sales
)
SELECT
    ms.month,
    ms.sales,
    am.avg_sales,
    ms.sales - am.avg_sales AS variance
FROM monthly_sales ms
CROSS JOIN avg_monthly am
ORDER BY ms.month;

-- Recursive CTE (hierarchies)
WITH RECURSIVE org_tree AS (
    -- Base case
    SELECT
        employee_id,
        name,
        manager_id,
        1 AS level,
        ARRAY[employee_id] AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case
    SELECT
        e.employee_id,
        e.name,
        e.manager_id,
        ot.level + 1,
        ot.path || e.employee_id
    FROM employees e
    INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT * FROM org_tree ORDER BY path;

Query Optimization

1. Use Indexes Effectively

-- Create index on frequently queried columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Composite index (order matters!)
CREATE INDEX idx_orders_composite
ON orders(status, customer_id, order_date);

-- ✅ This query uses the index
SELECT * FROM orders
WHERE status = 'pending'
    AND customer_id = 123
    AND order_date > '2024-01-01';

-- ❌ This doesn't use the index (skips first column)
SELECT * FROM orders
WHERE customer_id = 123;

-- Partial/Filtered index (smaller, faster)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';

-- Covering index (PostgreSQL 11+ / SQL Server syntax; the query can be
-- answered from the index alone)
CREATE INDEX idx_users_covering
ON users(email)
INCLUDE (name, created_at);

2. Avoid SELECT *

-- ❌ Bad: Retrieves all columns
SELECT * FROM users;

-- ✅ Good: Select only needed columns
SELECT user_id, email, name FROM users;

-- ✅ Good: More efficient for joins
SELECT
    u.user_id,
    u.email,
    o.order_id,
    o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

3. Optimize JOINs

-- ❌ Bad: LEFT JOIN with a filter on the right table in WHERE
--    (the filter discards NULL rows, so it silently acts like an INNER JOIN)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed';

-- ✅ Good: Filter before JOIN
SELECT u.name, o.total
FROM users u
INNER JOIN (
    SELECT user_id, total
    FROM orders
    WHERE status = 'completed'
) o ON u.user_id = o.user_id;

-- ✅ Simpler: INNER JOIN with WHERE (optimizers usually push the filter
--    down, so this is typically just as fast and easier to read)
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed';

4. Use EXISTS Instead of IN

-- ❌ Often slower: IN with a subquery (and NOT IN misbehaves with NULLs)
SELECT name FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders WHERE total > 1000
);

-- ✅ Usually faster and NULL-safe: EXISTS
SELECT name FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
        AND o.total > 1000
);
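
A related pattern is the anti-join: finding rows with no match. NOT EXISTS is the safe form, because NOT IN returns nothing at all if the subquery yields a NULL. A minimal sketch against the same customers/orders schema:

-- Customers who have never placed an order (anti-join)
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);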

5. Avoid Functions on Indexed Columns

-- ❌ Bad: Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

-- ✅ Good: Use functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Or use a case-insensitive collation (MySQL syntax shown)
SELECT * FROM users
WHERE email = 'john@example.com' COLLATE utf8_general_ci;

6. Limit Result Sets

-- ✅ Use LIMIT/TOP for pagination
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- ✅ Use WHERE to reduce rows early
SELECT * FROM orders
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
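
Deep pagination with large OFFSET values still scans and discards every skipped row. Keyset (seek) pagination avoids this; the sketch below assumes an index on (created_at, order_id) and that the last values from the previous page are passed back in (the literals are placeholders):

-- ✅ Keyset pagination: seek past the last row of the previous page
SELECT order_id, total, created_at
FROM orders
WHERE (created_at, order_id) < ('2024-06-01 12:00:00', 98765)  -- last row seen
ORDER BY created_at DESC, order_id DESC
LIMIT 20;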

7. Batch Operations

-- ❌ Bad: Multiple single inserts
INSERT INTO users (name, email) VALUES ('User1', 'user1@example.com');
INSERT INTO users (name, email) VALUES ('User2', 'user2@example.com');

-- ✅ Good: Batch insert
INSERT INTO users (name, email) VALUES
    ('User1', 'user1@example.com'),
    ('User2', 'user2@example.com'),
    ('User3', 'user3@example.com');

-- ✅ Good: Batch update
UPDATE products
SET price = price * 1.1
WHERE category IN ('Electronics', 'Computers');
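
When each row needs its own new value, a single UPDATE joined to a VALUES list avoids a round trip per row. A PostgreSQL-flavoured sketch with placeholder IDs and prices:

-- ✅ Good: one statement updates many rows to different values
UPDATE products p
SET price = v.new_price
FROM (VALUES
    (101, 19.99),
    (102, 24.50),
    (103, 9.75)
) AS v(product_id, new_price)
WHERE p.product_id = v.product_id;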

EXPLAIN Plans

PostgreSQL

-- Simple EXPLAIN
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;

-- EXPLAIN ANALYZE (actually runs query)
EXPLAIN ANALYZE
SELECT
    c.name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

-- Look for:
-- - Seq Scan (often a missing index on large tables; fine for small ones)
-- - Index Scan / Index Only Scan (good)
-- - Bitmap Heap Scan (good when many matching rows are scattered)
-- - Hash Join vs Nested Loop (nested loops over large inputs are slow)
-- - High cost estimates and big gaps between estimated and actual rows
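
Adding BUFFERS shows how much data each plan node touched, and the plan shape makes a missing index obvious. The plan lines below are illustrative only, not real output:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;

-- Illustrative plan lines (numbers invented):
--   Without an index:  Seq Scan on orders  (cost=0.00..4580.00 rows=42 ...)
--   With an index:     Index Scan using idx_orders_customer_date on orders
--                      (cost=0.42..8.45 rows=42 ...)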

MySQL

-- EXPLAIN
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;

-- EXPLAIN ANALYZE (MySQL 8.0.18+)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

-- Look for:
-- - type: ALL (full table scan, bad)
-- - type: index (full index scan, better than ALL but still reads everything)
-- - type: range / ref / eq_ref / const (index lookups, good)
-- - Extra: Using filesort (may need an index matching the ORDER BY)
-- - Extra: Using temporary (may need query or index changes)
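
For more detail, MySQL can emit the plan as JSON (cost estimates and attached conditions), and SHOW INDEX lists the indexes the optimizer can pick from:

-- Detailed plan with cost estimates
EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE customer_id = 123;

-- Indexes on the table, with cardinality estimates
SHOW INDEX FROM orders;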

Indexing Strategies

When to Index

Index these columns:

  • Primary keys (automatic)
  • Foreign keys
  • Columns in WHERE clauses
  • Columns in JOIN conditions
  • Columns in ORDER BY
  • Columns in GROUP BY

Don't index:

  • Small tables (< 1000 rows)
  • Columns with low cardinality (few distinct values)
  • Frequently updated columns
  • Large text/blob columns
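
To check whether existing indexes earn their keep, PostgreSQL's pg_stat_user_indexes view counts how often each index is scanned; an index with zero scans over a long period is a candidate for dropping, since it still costs writes and storage. A PostgreSQL-only sketch:

-- Indexes that have never been scanned since statistics were last reset
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;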

Index Types

-- B-Tree (default, most common)
CREATE INDEX idx_users_email ON users(email);

-- Hash index (equality only, PostgreSQL)
CREATE INDEX idx_users_email_hash ON users USING HASH(email);

-- GIN (full-text search, arrays, JSONB)
CREATE INDEX idx_posts_content_gin
ON posts USING GIN(to_tsvector('english', content));

-- GiST (geometric, full-text)
CREATE INDEX idx_locations_gist
ON locations USING GIST(coordinates);

-- Partial index (filtered)
CREATE INDEX idx_orders_pending
ON orders(customer_id)
WHERE status = 'pending';

-- Expression index (indexes the email domain for fast domain lookups)
CREATE INDEX idx_users_email_domain
ON users (split_part(email, '@', 2));

Composite Index Order

-- Index column order matters!
CREATE INDEX idx_orders_search
ON orders(status, customer_id, created_at);

-- ✅ Uses index (left-most column)
WHERE status = 'completed'

-- ✅ Uses index (left-most columns)
WHERE status = 'completed' AND customer_id = 123

-- ✅ Uses full index
WHERE status = 'completed'
    AND customer_id = 123
    AND created_at > '2024-01-01'

-- ❌ Doesn't use index (skips first column)
WHERE customer_id = 123

-- ❌ Doesn't use index (skips first column)
WHERE created_at > '2024-01-01'

MongoDB Queries

Find Queries

// Basic find
db.users.find({ status: 'active' })

// Find with projection
db.users.find(
    { status: 'active' },
    { name: 1, email: 1, _id: 0 }
)

// Find with operators
db.orders.find({
    total: { $gt: 100, $lt: 1000 },
    status: { $in: ['pending', 'processing'] },
    'customer.city': 'New York'
})

// Find with sort and limit
db.products.find({ category: 'Electronics' })
    .sort({ price: -1 })
    .limit(10)

// Count
db.users.countDocuments({ status: 'active' })

Aggregation Pipeline

// Group and count
db.orders.aggregate([
    { $match: { status: 'completed' } },
    { $group: {
        _id: '$customer_id',
        total_orders: { $sum: 1 },
        total_spent: { $sum: '$total' },
        avg_order: { $avg: '$total' }
    }},
    { $sort: { total_spent: -1 } },
    { $limit: 10 }
])

// Lookup (JOIN)
db.orders.aggregate([
    { $lookup: {
        from: 'customers',
        localField: 'customer_id',
        foreignField: '_id',
        as: 'customer'
    }},
    { $unwind: '$customer' },
    { $project: {
        order_id: 1,
        total: 1,
        'customer.name': 1,
        'customer.email': 1
    }}
])

// Complex aggregation
db.sales.aggregate([
    // Filter
    { $match: {
        date: { $gte: ISODate('2024-01-01') }
    }},

    // Add computed fields
    { $addFields: {
        month: { $month: '$date' },
        year: { $year: '$date' }
    }},

    // Group by month
    { $group: {
        _id: { year: '$year', month: '$month' },
        total_sales: { $sum: '$amount' },
        order_count: { $sum: 1 },
        avg_sale: { $avg: '$amount' }
    }},

    // Sort
    { $sort: { '_id.year': 1, '_id.month': 1 } },

    // Reshape
    { $project: {
        _id: 0,
        date: {
            $concat: [
                { $toString: '$_id.year' },
                '-',
                { $toString: '$_id.month' }
            ]
        },
        total_sales: 1,
        order_count: 1,
        avg_sale: { $round: ['$avg_sale', 2] }
    }}
])

MongoDB Indexes

// Single field index
db.users.createIndex({ email: 1 })

// Compound index
db.orders.createIndex({ customer_id: 1, created_at: -1 })

// Unique index
db.users.createIndex({ email: 1 }, { unique: true })

// Partial index
db.orders.createIndex(
    { customer_id: 1 },
    { partialFilterExpression: { status: 'active' } }
)

// Text index
db.products.createIndex({ name: 'text', description: 'text' })

// TTL index (auto-delete after time)
db.sessions.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 3600 }
)

// List indexes
db.users.getIndexes()

// Analyze query performance
db.orders.find({ customer_id: 123 }).explain('executionStats')

GraphQL Queries

# Basic query
query {
  users {
    id
    name
    email
  }
}

# Query with arguments
query {
  user(id: "123") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

# Query with variables
query GetUser($userId: ID!) {
  user(id: $userId) {
    name
    email
    orders(limit: 10, status: COMPLETED) {
      id
      total
      createdAt
    }
  }
}

# Fragments (reusable fields)
fragment UserFields on User {
  id
  name
  email
  createdAt
}

query {
  user(id: "123") {
    ...UserFields
    orders {
      id
      total
    }
  }
}

# Avoid N+1 queries with DataLoader
query {
  orders {
    id
    total
    customer {  # Batched by DataLoader
      name
      email
    }
  }
}

Common Anti-Patterns

N+1 Query Problem

-- Bad: N+1 queries
SELECT * FROM customers;  -- 1 query
-- Then for each customer:
SELECT * FROM orders WHERE customer_id = ?;  -- N queries

-- Good: Single JOIN query
SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.total
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

Using OR on Different Columns

-- Bad: OR across different columns often prevents a single-index lookup
SELECT * FROM products
WHERE name = 'iPhone' OR category = 'Electronics';

-- Good: Use UNION
SELECT * FROM products WHERE name = 'iPhone'
UNION
SELECT * FROM products WHERE category = 'Electronics';

Implicit Type Conversion

-- Bad: '123' is string, user_id is integer
SELECT * FROM users WHERE user_id = '123';

-- Good: Use correct type
SELECT * FROM users WHERE user_id = 123;

Query Performance Checklist

  • Select only needed columns (no SELECT *)
  • Add indexes to WHERE/JOIN/ORDER BY columns
  • Use EXPLAIN to analyze query plan
  • Avoid functions on indexed columns
  • Use EXISTS instead of IN for subqueries
  • Batch INSERT/UPDATE operations
  • Use appropriate JOIN types
  • Filter early (WHERE before JOIN)
  • Use LIMIT for large result sets
  • Monitor slow query logs
  • Update statistics regularly
  • Avoid SELECT DISTINCT when possible
  • Use covering indexes when appropriate
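
Two checklist items above have quick starting points: slow query logging and statistics refreshes. The thresholds below are arbitrary examples, so adjust them to your workload:

-- PostgreSQL: log statements slower than 500 ms, then refresh planner statistics
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();
ANALYZE;

-- MySQL: log statements slower than 0.5 seconds, then refresh table statistics
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.5;
ANALYZE TABLE orders;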

"Premature optimization is the root of all evil, but slow queries are the root of all frustration."