17 KiB
17 KiB
name, description
| name | description |
|---|---|
| query-expert | Master SQL and database queries across multiple systems. Generate optimized queries, analyze performance, design indexes, and troubleshoot slow queries for PostgreSQL, MySQL, MongoDB, and more. |
Query Expert
Master database queries across SQL and NoSQL systems. Generate optimized queries, analyze performance with EXPLAIN plans, design effective indexes, and troubleshoot slow queries.
What This Skill Does
Helps you write efficient, performant database queries:
- Generate Queries - SQL, MongoDB, GraphQL queries
- Optimize Queries - Performance tuning and refactoring
- Design Indexes - Index strategies for faster queries
- Analyze Performance - EXPLAIN plans and query analysis
- Troubleshoot - Debug slow queries and bottlenecks
- Best Practices - Query patterns and anti-patterns
Supported Databases
SQL Databases
- PostgreSQL - Advanced features, CTEs, window functions
- MySQL/MariaDB - InnoDB optimization, replication
- SQLite - Embedded database optimization
- SQL Server - T-SQL, execution plans, DMVs
- Oracle - PL/SQL, partitioning, hints
NoSQL Databases
- MongoDB - Aggregation pipelines, indexes
- Redis - Key-value queries, Lua scripts
- Elasticsearch - Full-text search queries
- Cassandra - CQL, partition keys
Query Languages
- SQL - Standard and vendor-specific
- MongoDB Query Language - Find, aggregation
- GraphQL - Efficient data fetching
- Cypher - Neo4j graph queries
SQL Query Patterns
SELECT Queries
Basic SELECT
-- ✅ Select only needed columns
SELECT
user_id,
email,
created_at
FROM users
WHERE status = 'active'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 100;
-- ❌ Avoid SELECT *
SELECT * FROM users; -- Wastes resources
JOINs
-- INNER JOIN (most common)
SELECT
o.order_id,
o.total,
c.name AS customer_name,
c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at >= '2024-01-01';
-- LEFT JOIN (include all left rows)
SELECT
c.customer_id,
c.name,
COUNT(o.order_id) AS order_count,
COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
-- Multiple JOINs
SELECT
o.order_id,
c.name AS customer_name,
p.product_name,
oi.quantity,
oi.price
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.status = 'completed';
Subqueries
-- Subquery in WHERE
SELECT name, email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE total > 1000
);
-- Correlated subquery
SELECT
c.name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c;
-- ✅ Better: Use JOIN instead
SELECT
c.name,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
Aggregation
-- GROUP BY with aggregates
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price,
SUM(stock_quantity) AS total_stock
FROM products
GROUP BY category
HAVING COUNT(*) > 5
ORDER BY avg_price DESC;
-- Multiple GROUP BY columns
SELECT
DATE_TRUNC('month', created_at) AS month,
category,
SUM(total) AS monthly_sales
FROM orders
GROUP BY DATE_TRUNC('month', created_at), category
ORDER BY month DESC, monthly_sales DESC;
-- ROLLUP for subtotals
SELECT
COALESCE(category, 'TOTAL') AS category,
COALESCE(brand, 'All Brands') AS brand,
SUM(sales) AS total_sales
FROM products
GROUP BY ROLLUP(category, brand);
Window Functions (PostgreSQL, SQL Server, MySQL 8+)
-- ROW_NUMBER
SELECT
customer_id,
order_date,
total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS order_rank
FROM orders;
-- Running totals
SELECT
order_date,
total,
SUM(total) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders;
-- RANK vs DENSE_RANK
SELECT
product_name,
sales,
RANK() OVER (ORDER BY sales DESC) AS rank,
DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank,
NTILE(4) OVER (ORDER BY sales DESC) AS quartile
FROM products;
-- LAG and LEAD
SELECT
order_date,
total,
LAG(total, 1) OVER (ORDER BY order_date) AS prev_total,
LEAD(total, 1) OVER (ORDER BY order_date) AS next_total,
total - LAG(total, 1) OVER (ORDER BY order_date) AS change
FROM orders;
CTEs (Common Table Expressions)
-- Simple CTE
WITH active_customers AS (
SELECT customer_id, name, email
FROM customers
WHERE status = 'active'
)
SELECT
ac.name,
COUNT(o.order_id) AS order_count
FROM active_customers ac
LEFT JOIN orders o ON ac.customer_id = o.customer_id
GROUP BY ac.customer_id, ac.name;
-- Multiple CTEs
WITH
monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total) AS sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
avg_monthly AS (
SELECT AVG(sales) AS avg_sales
FROM monthly_sales
)
SELECT
ms.month,
ms.sales,
am.avg_sales,
ms.sales - am.avg_sales AS variance
FROM monthly_sales ms
CROSS JOIN avg_monthly am
ORDER BY ms.month;
-- Recursive CTE (hierarchies)
WITH RECURSIVE org_tree AS (
-- Base case
SELECT
employee_id,
name,
manager_id,
1 AS level,
ARRAY[employee_id] AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case
SELECT
e.employee_id,
e.name,
e.manager_id,
ot.level + 1,
ot.path || e.employee_id
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT * FROM org_tree ORDER BY path;
Query Optimization
1. Use Indexes Effectively
-- Create index on frequently queried columns
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Composite index (order matters!)
CREATE INDEX idx_orders_composite
ON orders(status, customer_id, order_date);
-- ✅ This query uses the index
SELECT * FROM orders
WHERE status = 'pending'
AND customer_id = 123
AND order_date > '2024-01-01';
-- ❌ This doesn't use the index (skips first column)
SELECT * FROM orders
WHERE customer_id = 123;
-- Partial/Filtered index (smaller, faster)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
-- Covering index (includes all needed columns)
CREATE INDEX idx_users_covering
ON users(email)
INCLUDE (name, created_at);
2. Avoid SELECT *
-- ❌ Bad: Retrieves all columns
SELECT * FROM users;
-- ✅ Good: Select only needed columns
SELECT user_id, email, name FROM users;
-- ✅ Good: More efficient for joins
SELECT
u.user_id,
u.email,
o.order_id,
o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;
3. Optimize JOINs
-- ❌ Bad: Filtering after JOIN
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed';
-- ✅ Good: Filter before JOIN
SELECT u.name, o.total
FROM users u
INNER JOIN (
SELECT user_id, total
FROM orders
WHERE status = 'completed'
) o ON u.user_id = o.user_id;
-- ✅ Even better: Use WHERE with INNER JOIN
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed';
4. Use EXISTS Instead of IN
-- ❌ Slower: IN with subquery
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE total > 1000
);
-- ✅ Faster: EXISTS
SELECT name FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.total > 1000
);
5. Avoid Functions on Indexed Columns
-- ❌ Bad: Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';
-- ✅ Good: Use functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or use case-insensitive collation
SELECT * FROM users
WHERE email = 'john@example.com' COLLATE utf8_general_ci;
6. Limit Result Sets
-- ✅ Use LIMIT/TOP for pagination
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
-- ✅ Use WHERE to reduce rows early
SELECT * FROM orders
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
7. Batch Operations
-- ❌ Bad: Multiple single inserts
INSERT INTO users (name, email) VALUES ('User1', 'user1@example.com');
INSERT INTO users (name, email) VALUES ('User2', 'user2@example.com');
-- ✅ Good: Batch insert
INSERT INTO users (name, email) VALUES
('User1', 'user1@example.com'),
('User2', 'user2@example.com'),
('User3', 'user3@example.com');
-- ✅ Good: Batch update
UPDATE products
SET price = price * 1.1
WHERE category IN ('Electronics', 'Computers');
EXPLAIN Plans
PostgreSQL
-- Simple EXPLAIN
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;
-- EXPLAIN ANALYZE (actually runs query)
EXPLAIN ANALYZE
SELECT
c.name,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
-- Look for:
-- - Seq Scan (bad, needs index)
-- - Index Scan (good)
-- - Bitmap Heap Scan (good for multiple rows)
-- - Hash Join vs Nested Loop
-- - High cost numbers
MySQL
-- EXPLAIN
EXPLAIN
SELECT * FROM orders WHERE customer_id = 123;
-- EXPLAIN ANALYZE (MySQL 8.0.18+)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
-- Look for:
-- - type: ALL (table scan, bad)
-- - type: index (index scan, good)
-- - type: ref (index lookup, great)
-- - Extra: Using filesort (may need index)
-- - Extra: Using temporary (may need optimization)
Indexing Strategies
When to Index
✅ Index these columns:
- Primary keys (automatic)
- Foreign keys
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
- Columns in GROUP BY
❌ Don't index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few distinct values)
- Frequently updated columns
- Large text/blob columns
Index Types
-- B-Tree (default, most common)
CREATE INDEX idx_users_email ON users(email);
-- Hash index (equality only, PostgreSQL)
CREATE INDEX idx_users_email_hash ON users USING HASH(email);
-- GIN (full-text search, arrays, JSONB)
CREATE INDEX idx_posts_content_gin
ON posts USING GIN(to_tsvector('english', content));
-- GiST (geometric, full-text)
CREATE INDEX idx_locations_gist
ON locations USING GIST(coordinates);
-- Partial index (filtered)
CREATE INDEX idx_orders_pending
ON orders(customer_id)
WHERE status = 'pending';
-- Expression index
CREATE INDEX idx_users_email_domain
ON users((email ~~ '%@gmail.com%'));
Composite Index Order
-- Index column order matters!
CREATE INDEX idx_orders_search
ON orders(status, customer_id, created_at);
-- ✅ Uses index (left-most column)
WHERE status = 'completed'
-- ✅ Uses index (left-most columns)
WHERE status = 'completed' AND customer_id = 123
-- ✅ Uses full index
WHERE status = 'completed'
AND customer_id = 123
AND created_at > '2024-01-01'
-- ❌ Doesn't use index (skips first column)
WHERE customer_id = 123
-- ❌ Doesn't use index (skips first column)
WHERE created_at > '2024-01-01'
MongoDB Queries
Find Queries
// Basic find
db.users.find({ status: 'active' })
// Find with projection
db.users.find(
{ status: 'active' },
{ name: 1, email: 1, _id: 0 }
)
// Find with operators
db.orders.find({
total: { $gt: 100, $lt: 1000 },
status: { $in: ['pending', 'processing'] },
'customer.city': 'New York'
})
// Find with sort and limit
db.products.find({ category: 'Electronics' })
.sort({ price: -1 })
.limit(10)
// Count
db.users.countDocuments({ status: 'active' })
Aggregation Pipeline
// Group and count
db.orders.aggregate([
{ $match: { status: 'completed' } },
{ $group: {
_id: '$customer_id',
total_orders: { $sum: 1 },
total_spent: { $sum: '$total' },
avg_order: { $avg: '$total' }
}},
{ $sort: { total_spent: -1 } },
{ $limit: 10 }
])
// Lookup (JOIN)
db.orders.aggregate([
{ $lookup: {
from: 'customers',
localField: 'customer_id',
foreignField: '_id',
as: 'customer'
}},
{ $unwind: '$customer' },
{ $project: {
order_id: 1,
total: 1,
'customer.name': 1,
'customer.email': 1
}}
])
// Complex aggregation
db.sales.aggregate([
// Filter
{ $match: {
date: { $gte: ISODate('2024-01-01') }
}},
// Add computed fields
{ $addFields: {
month: { $month: '$date' },
year: { $year: '$date' }
}},
// Group by month
{ $group: {
_id: { year: '$year', month: '$month' },
total_sales: { $sum: '$amount' },
order_count: { $sum: 1 },
avg_sale: { $avg: '$amount' }
}},
// Sort
{ $sort: { '_id.year': 1, '_id.month': 1 } },
// Reshape
{ $project: {
_id: 0,
date: {
$concat: [
{ $toString: '$_id.year' },
'-',
{ $toString: '$_id.month' }
]
},
total_sales: 1,
order_count: 1,
avg_sale: { $round: ['$avg_sale', 2] }
}}
])
MongoDB Indexes
// Single field index
db.users.createIndex({ email: 1 })
// Compound index
db.orders.createIndex({ customer_id: 1, created_at: -1 })
// Unique index
db.users.createIndex({ email: 1 }, { unique: true })
// Partial index
db.orders.createIndex(
{ customer_id: 1 },
{ partialFilterExpression: { status: 'active' } }
)
// Text index
db.products.createIndex({ name: 'text', description: 'text' })
// TTL index (auto-delete after time)
db.sessions.createIndex(
{ created_at: 1 },
{ expireAfterSeconds: 3600 }
)
// List indexes
db.users.getIndexes()
// Analyze query performance
db.orders.find({ customer_id: 123 }).explain('executionStats')
GraphQL Queries
# Basic query
query {
users {
id
name
email
}
}
# Query with arguments
query {
user(id: "123") {
name
email
orders {
id
total
status
}
}
}
# Query with variables
query GetUser($userId: ID!) {
user(id: $userId) {
name
email
orders(limit: 10, status: COMPLETED) {
id
total
createdAt
}
}
}
# Fragments (reusable fields)
fragment UserFields on User {
id
name
email
createdAt
}
query {
user(id: "123") {
...UserFields
orders {
id
total
}
}
}
# Avoid N+1 queries with DataLoader
query {
orders {
id
total
customer { # Batched by DataLoader
name
email
}
}
}
Common Anti-Patterns
❌ N+1 Query Problem
-- Bad: N+1 queries
SELECT * FROM customers; -- 1 query
-- Then for each customer:
SELECT * FROM orders WHERE customer_id = ?; -- N queries
-- Good: Single JOIN query
SELECT
c.customer_id,
c.name,
o.order_id,
o.total
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
❌ Using OR on Different Columns
-- Bad: Can't use indexes effectively
SELECT * FROM products
WHERE name = 'iPhone' OR category = 'Electronics';
-- Good: Use UNION
SELECT * FROM products WHERE name = 'iPhone'
UNION
SELECT * FROM products WHERE category = 'Electronics';
❌ Implicit Type Conversion
-- Bad: '123' is string, user_id is integer
SELECT * FROM users WHERE user_id = '123';
-- Good: Use correct type
SELECT * FROM users WHERE user_id = 123;
Query Performance Checklist
- Select only needed columns (no SELECT *)
- Add indexes to WHERE/JOIN/ORDER BY columns
- Use EXPLAIN to analyze query plan
- Avoid functions on indexed columns
- Use EXISTS instead of IN for subqueries
- Batch INSERT/UPDATE operations
- Use appropriate JOIN types
- Filter early (WHERE before JOIN)
- Use LIMIT for large result sets
- Monitor slow query logs
- Update statistics regularly
- Avoid SELECT DISTINCT when possible
- Use covering indexes when appropriate
Resources
- PostgreSQL: https://www.postgresql.org/docs/current/performance-tips.html
- MySQL: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
- MongoDB: https://docs.mongodb.com/manual/core/query-optimization/
- Use The Index, Luke: https://use-the-index-luke.com/
"Premature optimization is the root of all evil, but slow queries are the root of all frustration."