Initial commit

2025-11-29 18:52:31 +08:00
commit 7f9211b22c
16 changed files with 1217 additions and 0 deletions
--- a/commands/add-rate-limiting.md
+++ b/commands/add-rate-limiting.md
@@ -0,0 +1,655 @@
+---
+description: Add rate limiting to API endpoints
+shortcut: ratelimit
+---
+
+# Add Rate Limiting to API Endpoints
+
+Implement production-ready rate limiting with token bucket, sliding window, or fixed window algorithms using Redis for distributed state management.
+
+## When to Use This Command
+
+Use `/add-rate-limiting` when you need to:
+- Protect APIs from abuse and DDoS attacks
+- Enforce fair usage policies across user tiers
+- Prevent resource exhaustion from runaway clients
+- Comply with downstream API rate limits
+- Implement freemium pricing models with usage tiers
+- Control costs for expensive operations (AI inference, video processing)
+
+DON'T use this when:
+- Building internal-only APIs with trusted clients (use circuit breakers instead)
+- Building single-user applications (no shared resource contention)
+- Running behind an API gateway that already rate limits (avoid double limiting)
+
+## Design Decisions
+
+This command implements **Token Bucket algorithm with Redis** as the primary approach because:
+- Allows burst traffic while maintaining average rate (better UX)
+- Distributed state enables horizontal scaling
+- Redis atomic operations prevent race conditions
+- Standard algorithm with well-understood behavior
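+
+The burst-then-refill behavior described above can be sketched in memory, without Redis. This is an illustration only: the `TokenBucket` class and its `consume(now, n)` signature are hypothetical names, and the clock is passed in explicitly so the example is deterministic.
+
+```python
+class TokenBucket:
+    """In-memory token bucket; a sketch of the logic, not the Redis-backed version."""
+
+    def __init__(self, capacity, refill_per_sec, now=0.0):
+        self.capacity = capacity
+        self.refill_per_sec = refill_per_sec
+        self.tokens = float(capacity)  # start full so an initial burst is allowed
+        self.last = now
+
+    def consume(self, now, n=1):
+        # Credit tokens for the elapsed time, capped at capacity
+        elapsed = max(0.0, now - self.last)
+        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
+        self.last = now
+        if self.tokens >= n:
+            self.tokens -= n
+            return True
+        return False
+
+
+bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
+burst = [bucket.consume(now=0.0) for _ in range(6)]  # 5 allowed, 6th rejected
+later = [bucket.consume(now=2.0) for _ in range(3)]  # 2 tokens refilled after 2s
+print(burst, later)
+```
+
+A burst up to the capacity is served immediately, after which requests are admitted only as fast as tokens refill.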
+
+**Alternative considered: Sliding Window**
+- More accurate rate limiting (no reset boundary issues)
+- Higher Redis memory usage (stores timestamp per request)
+- Slightly higher computational overhead
+- Recommended for strict compliance requirements
+
+**Alternative considered: Fixed Window**
+- Simplest implementation (single counter)
+- Burst at window boundaries (2x limit possible)
+- Lower memory footprint
+- Recommended only for non-critical rate limiting
+
+**Alternative considered: Leaky Bucket**
+- Constant output rate (smooths bursty traffic)
+- Complex to explain to users
+- Less common in practice
+- Recommended for queuing systems, not APIs
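+
+The "2x limit possible" boundary burst of a fixed window, and how a sliding log avoids it, can be shown with a small simulation. This is a sketch under stated assumptions: the function names are hypothetical, and timestamps are plain floats in seconds rather than Redis state.
+
+```python
+def fixed_window_admitted(timestamps, limit, window):
+    """Admit requests using one counter per fixed window."""
+    counts = {}
+    admitted = []
+    for t in timestamps:
+        bucket = int(t // window)
+        if counts.get(bucket, 0) < limit:
+            counts[bucket] = counts.get(bucket, 0) + 1
+            admitted.append(t)
+    return admitted
+
+
+def sliding_log_admitted(timestamps, limit, window):
+    """Admit requests only if fewer than `limit` were admitted in the last `window` seconds."""
+    admitted = []
+    for t in timestamps:
+        recent = [a for a in admitted if a > t - window]
+        if len(recent) < limit:
+            admitted.append(t)
+    return admitted
+
+
+# 100 requests just before the boundary at t=60s, then 100 just after it
+limit, window = 100, 60
+burst = [59.0 + i * 0.005 for i in range(100)] + [60.0 + i * 0.005 for i in range(100)]
+
+fixed = fixed_window_admitted(burst, limit, window)
+sliding = sliding_log_admitted(burst, limit, window)
+print(len(fixed), len(sliding))  # 200 100
+```
+
+The fixed window admits all 200 requests within about half a second because they straddle two windows, while the sliding log holds the client to 100.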
+
+## Prerequisites
+
+Before running this command:
+1. Redis server installed and accessible (standalone or cluster)
+2. Node.js/Python runtime for middleware implementation
+3. API framework that supports middleware (Express, FastAPI, etc.)
+4. Understanding of your API usage patterns and SLO requirements
+5. Monitoring infrastructure to track rate limit metrics
+
+## Implementation Process
+
+### Step 1: Choose Rate Limiting Strategy
+Select algorithm based on requirements: Token Bucket for user-facing APIs, Sliding Window for strict compliance, Fixed Window for internal APIs.
+
+### Step 2: Configure Redis Connection
+Set up Redis client with connection pooling, retry logic, and failover handling for high availability.
+
+### Step 3: Implement Rate Limiter Middleware
+Create middleware that intercepts requests, checks Redis state, and enforces limits with proper HTTP headers.
+
+### Step 4: Define Rate Limit Tiers
+Configure different limits for user segments (anonymous, free, premium, enterprise) based on business requirements.
+
+### Step 5: Add Monitoring and Alerting
+Instrument rate limiter with metrics for blocked requests, Redis latency, and tier usage patterns.
+
+## Output Format
+
+The command generates:
+- `rate-limiter.js` or `rate_limiter.py` - Core rate limiting middleware
+- `redis-config.js` - Redis connection configuration with failover
+- `rate-limit-tiers.json` - Tiered limit definitions
+- `rate-limiter.test.js` - Comprehensive test suite
+- `README.md` - Integration guide and configuration options
+- `docker-compose.yml` - Redis setup for local development
+
+## Code Examples
+
+### Example 1: Token Bucket Rate Limiter with Express and Redis
+
+```javascript
+// rate-limiter.js
+const Redis = require('ioredis');
+
+class TokenBucketRateLimiter {
+  constructor(redisClient, options = {}) {
+    this.redis = redisClient;
+    this.defaultOptions = {
+      points: 100,        // Bucket capacity (maximum burst size)
+      duration: 60,       // Seconds to refill an empty bucket to capacity
+      blockDuration: 60,  // Block duration after limit exceeded (not used by consume())
+      keyPrefix: 'rl',    // Redis key prefix
+      ...options
+    };
+  }
+
+  /**
+   * Token bucket algorithm using Redis
+   * Returns: { allowed: boolean, remaining: number, resetTime: string (ISO 8601), retryAfter: number | null }
+   */
+  async consume(identifier, points = 1, options = {}) {
+    const opts = { ...this.defaultOptions, ...options };
+    const key = `${opts.keyPrefix}:${identifier}`;
+    const now = Date.now();
+
+    // Lua script for atomic token bucket operations
+    const luaScript = `
+      local key = KEYS[1]
+      local capacity = tonumber(ARGV[1])
+      local refill_rate = tonumber(ARGV[2])
+      local requested = tonumber(ARGV[3])
+      local now = tonumber(ARGV[4])
+      local ttl = tonumber(ARGV[5])
+
+      -- Get current state or initialize
+      local tokens = tonumber(redis.call('HGET', key, 'tokens'))
+      local last_refill = tonumber(redis.call('HGET', key, 'last_refill'))
+
+      if not tokens then
+        tokens = capacity
+        last_refill = now
+      end
+
+      -- Refill for the elapsed time; now is in milliseconds, refill_rate in tokens/second.
+      -- Fractional tokens are kept so frequent requests do not starve the refill.
+      local elapsed_sec = math.max(0, now - last_refill) / 1000
+      tokens = math.min(capacity, tokens + elapsed_sec * refill_rate)
+      last_refill = now
+
+      -- Deduct only if the request can be fulfilled
+      local allowed = 0
+      if tokens >= requested then
+        tokens = tokens - requested
+        allowed = 1
+      end
+
+      -- Persist state either way; Redis truncates Lua numbers to integers in the reply
+      redis.call('HSET', key, 'tokens', tokens, 'last_refill', last_refill)
+      redis.call('EXPIRE', key, ttl)
+      return {allowed, tokens, last_refill}
+    `;
+
+    const refillRate = opts.points / opts.duration;
+    const result = await this.redis.eval(
+      luaScript,
+      1,
+      key,
+      opts.points,
+      refillRate,
+      points,
+      now,
+      opts.duration
+    );
+
+    const [allowed, remaining, lastRefill] = result;
+    const resetTime = lastRefill + (opts.duration * 1000);
+
+    return {
+      allowed: allowed === 1,
+      remaining: Math.floor(remaining),
+      resetTime: new Date(resetTime).toISOString(),
+      // Seconds until enough tokens have refilled to serve this request
+      retryAfter: allowed === 1 ? null : Math.ceil((points - remaining) / refillRate)
+    };
+  }
+
+  /**
+   * Express middleware factory
+   */
+  middleware(getTier = null) {
+    return async (req, res, next) => {
+      try {
+        // Determine identifier (user ID or IP)
+        const identifier = req.user?.id || req.ip;
+
+        // Get tier configuration
+        const tier = getTier ? await getTier(req) : 'default';
+        const tierConfig = this.getTierConfig(tier);
+
+        // Consume tokens
+        const result = await this.consume(identifier, 1, tierConfig);
+
+        // Set rate limit headers
+        res.set({
+          'X-RateLimit-Limit': tierConfig.points,
+          'X-RateLimit-Remaining': result.remaining,
+          'X-RateLimit-Reset': result.resetTime
+        });
+
+        if (!result.allowed) {
+          res.set('Retry-After', result.retryAfter);
+          return res.status(429).json({
+            error: 'Too Many Requests',
+            message: `Rate limit exceeded. Try again in ${result.retryAfter} seconds.`,
+            retryAfter: result.retryAfter
+          });
+        }
+
+        next();
+      } catch (error) {
+        console.error('Rate limiter error:', error);
+        // Fail open to avoid blocking all traffic on Redis failure
+        next();
+      }
+    };
+  }
+
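+  getTierConfig(tier) {
+    // Accept an explicit { points, duration } config as well as a tier name
+    if (tier && typeof tier === 'object') return tier;
+    const tiers = {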
+      anonymous: { points: 20, duration: 60 },
+      free: { points: 100, duration: 60 },
+      premium: { points: 1000, duration: 60 },
+      enterprise: { points: 10000, duration: 60 }
+    };
+    return tiers[tier] || tiers.free;
+  }
+}
+
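+// Usage example (assumes an Express app; handleGenerate is your own route handler)
+const express = require('express');
+const app = express();
+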
+const redis = new Redis({
+  host: process.env.REDIS_HOST || 'localhost',
+  port: process.env.REDIS_PORT || 6379,
+  retryStrategy: (times) => Math.min(times * 50, 2000),
+  maxRetriesPerRequest: 3
+});
+
+const rateLimiter = new TokenBucketRateLimiter(redis);
+
+// Apply to all routes
+app.use(rateLimiter.middleware(async (req) => {
+  if (req.user?.subscription === 'enterprise') return 'enterprise';
+  if (req.user?.subscription === 'premium') return 'premium';
+  if (req.user) return 'free';
+  return 'anonymous';
+}));
+
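+// Apply stricter limit to specific expensive endpoint.
+// A separate keyPrefix keeps this bucket distinct from the global one above.
+app.post('/api/ai/generate',
+  rateLimiter.middleware(() => ({ points: 10, duration: 3600, keyPrefix: 'rl:ai-generate' })),
+  handleGenerate
+);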
+
+module.exports = TokenBucketRateLimiter;
+```
+
+### Example 2: Sliding Window Rate Limiter in Python with FastAPI
+
+```python
+# rate_limiter.py
+import time
+import redis.asyncio as aioredis
+from fastapi import Request, Response, HTTPException
+from typing import Optional, Callable
+import asyncio
+
+class SlidingWindowRateLimiter:
+    def __init__(self, redis_client: aioredis.Redis, window_size: int = 60, max_requests: int = 100):
+        self.redis = redis_client
+        self.window_size = window_size
+        self.max_requests = max_requests
+        self.key_prefix = "rate_limit"
+
+    async def is_allowed(self, identifier: str, tier_config: dict = None) -> dict:
+        """
+        Sliding window algorithm using Redis sorted set
+        Each request is a member with score = timestamp
+        """
+        config = tier_config or {'max_requests': self.max_requests, 'window_size': self.window_size}
+        now = time.time()
+        window_start = now - config['window_size']
+        key = f"{self.key_prefix}:{identifier}"
+
+        # Nanosecond timestamp as the member, so requests in the same instant rarely collide
+        member = str(time.time_ns())
+
+        # redis-py pipelines run as a MULTI/EXEC transaction by default
+        pipe = self.redis.pipeline()
+
+        # Remove old entries outside the window
+        pipe.zremrangebyscore(key, 0, window_start)
+
+        # Count requests in current window
+        pipe.zcard(key)
+
+        # Add current request
+        pipe.zadd(key, {member: now})
+
+        # Set expiration
+        pipe.expire(key, config['window_size'] + 10)
+
+        results = await pipe.execute()
+        request_count = results[1]
+
+        if request_count >= config['max_requests']:
+            # Undo the optimistic add so rejected requests do not prolong the block
+            await self.redis.zrem(key, member)
+
+            # Get oldest request in window to calculate retry time
+            oldest = await self.redis.zrange(key, 0, 0, withscores=True)
+            if oldest:
+                oldest_time = oldest[0][1]
+                retry_after = int(config['window_size'] - (now - oldest_time)) + 1
+            else:
+                retry_after = config['window_size']
+
+            return {
+                'allowed': False,
+                'remaining': 0,
+                'reset_time': int(now + retry_after),
+                'retry_after': retry_after
+            }
+
+        remaining = config['max_requests'] - request_count - 1
+        reset_time = int(now + config['window_size'])
+
+        return {
+            'allowed': True,
+            'remaining': remaining,
+            'reset_time': reset_time,
+            'retry_after': None
+        }
+
+    def middleware(self, get_tier: Optional[Callable] = None):
+        """FastAPI middleware factory"""
+        async def rate_limit_middleware(request: Request, call_next):
+            # Get identifier (user ID or IP)
+            identifier = getattr(request.state, 'user_id', None) or request.client.host
+
+            # Get tier configuration
+            tier_config = None
+            if get_tier:
+                tier_config = await get_tier(request)
+
+            # Check rate limit
+            result = await self.is_allowed(identifier, tier_config)
+
+            # Always set rate limit headers
+            response = None
+            if result['allowed']:
+                response = await call_next(request)
+            else:
+                response = Response(
+                    content=f'{{"error": "Rate limit exceeded", "retry_after": {result["retry_after"]}}}',
+                    status_code=429,
+                    media_type="application/json"
+                )
+
+            response.headers['X-RateLimit-Limit'] = str(tier_config['max_requests'] if tier_config else self.max_requests)
+            response.headers['X-RateLimit-Remaining'] = str(result['remaining'])
+            response.headers['X-RateLimit-Reset'] = str(result['reset_time'])
+
+            if not result['allowed']:
+                response.headers['Retry-After'] = str(result['retry_after'])
+
+            return response
+
+        return rate_limit_middleware
+
+# Usage in FastAPI
+from fastapi import FastAPI
+from contextlib import asynccontextmanager
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    # Startup
+    app.state.redis = await aioredis.from_url("redis://localhost:6379")
+    app.state.rate_limiter = SlidingWindowRateLimiter(app.state.redis)
+    yield
+    # Shutdown
+    await app.state.redis.close()
+
+app = FastAPI(lifespan=lifespan)
+
+async def get_user_tier(request: Request) -> dict:
+    """Determine user tier from request"""
+    user = getattr(request.state, 'user', None)
+    if not user:
+        return {'max_requests': 20, 'window_size': 60}  # Anonymous
+    elif user.get('subscription') == 'enterprise':
+        return {'max_requests': 10000, 'window_size': 60}
+    elif user.get('subscription') == 'premium':
+        return {'max_requests': 1000, 'window_size': 60}
+    else:
+        return {'max_requests': 100, 'window_size': 60}  # Free tier
+
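+# Apply rate limiting middleware. The limiter is looked up per request because
+# app.state.rate_limiter only exists after the lifespan startup has run.
+@app.middleware("http")
+async def rate_limit(request: Request, call_next):
+    limiter = request.app.state.rate_limiter
+    return await limiter.middleware(get_user_tier)(request, call_next)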
+```
+
+### Example 3: DDoS Protection with Multi-Layer Rate Limiting
+
+```javascript
+// advanced-rate-limiter.js - Multi-layer protection
+const Redis = require('ioredis');
+
+class MultiLayerRateLimiter {
+  constructor(redisClient) {
+    this.redis = redisClient;
+  }
+
+  /**
+   * Layered rate limiting strategy:
+   * 1. IP-based (DDoS protection)
+   * 2. User-based (fair usage)
+   * 3. Endpoint-specific (expensive operations)
+   */
+  async checkLayers(req) {
+    const layers = [
+      // Layer 1: IP-based rate limiting (DDoS protection)
+      {
+        name: 'ip',
+        identifier: req.ip,
+        limits: { points: 1000, duration: 60 }, // 1000 req/min per IP
+        priority: 'high'
+      },
+      // Layer 2: User-based rate limiting
+      {
+        name: 'user',
+        identifier: req.user?.id || `anon:${req.ip}`,
+        limits: this.getUserTierLimits(req.user),
+        priority: 'medium'
+      },
+      // Layer 3: Endpoint-specific limiting
+      {
+        name: 'endpoint',
+        identifier: `${req.user?.id || req.ip}:${req.path}`,
+        limits: this.getEndpointLimits(req.path),
+        priority: 'low'
+      }
+    ];
+
+    for (const layer of layers) {
+      const result = await this.checkLimit(layer);
+      if (!result.allowed) {
+        return {
+          blocked: true,
+          layer: layer.name,
+          ...result
+        };
+      }
+    }
+
+    return { blocked: false };
+  }
+
+  async checkLimit(layer) {
+    const key = `rl:${layer.name}:${layer.identifier}`;
+    const now = Date.now();
+
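+    const count = await this.redis.incr(key);
+
+    // Set the window TTL on first hit; also repair keys left without a TTL
+    // (e.g. if a previous process crashed between INCR and EXPIRE)
+    let ttl = await this.redis.ttl(key);
+    if (count === 1 || ttl < 0) {
+      await this.redis.expire(key, layer.limits.duration);
+      ttl = layer.limits.duration;
+    }
+
+    const allowed = count <= layer.limits.points;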
+
+    return {
+      allowed,
+      remaining: Math.max(0, layer.limits.points - count),
+      resetTime: now + (ttl * 1000),
+      retryAfter: allowed ? null : ttl
+    };
+  }
+
+  getUserTierLimits(user) {
+    if (!user) return { points: 20, duration: 60 };
+    const tiers = {
+      free: { points: 100, duration: 60 },
+      premium: { points: 1000, duration: 60 },
+      enterprise: { points: 10000, duration: 60 }
+    };
+    return tiers[user.subscription] || tiers.free;
+  }
+
+  getEndpointLimits(path) {
+    const expensiveEndpoints = {
+      '/api/ai/generate': { points: 10, duration: 3600 },  // 10/hour
+      '/api/video/render': { points: 5, duration: 3600 },  // 5/hour
+      '/api/export/large': { points: 20, duration: 3600 }  // 20/hour
+    };
+    return expensiveEndpoints[path] || { points: 1000, duration: 60 };
+  }
+
+  middleware() {
+    return async (req, res, next) => {
+      try {
+        const result = await this.checkLayers(req);
+
+        if (result.blocked) {
+          res.set({
+            'X-RateLimit-Layer': result.layer,
+            'X-RateLimit-Remaining': result.remaining,
+            'Retry-After': result.retryAfter
+          });
+
+          return res.status(429).json({
+            error: 'Rate limit exceeded',
+            layer: result.layer,
+            retryAfter: result.retryAfter,
+            message: `Too many requests. Please retry after ${result.retryAfter} seconds.`
+          });
+        }
+
+        next();
+      } catch (error) {
+        console.error('Multi-layer rate limiter error:', error);
+        next(); // Fail open
+      }
+    };
+  }
+}
+
+module.exports = MultiLayerRateLimiter;
+```
+
+## Error Handling
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| "Redis connection failed" | Redis server unreachable | Check Redis server status, verify connection string, implement connection retry |
+| "Rate limiter fail-closed" | Redis timeout, middleware blocking all traffic | Implement fail-open strategy with circuit breaker pattern |
+| "Inconsistent rate limits" | Clock skew across servers | Use Redis time (`TIME` command) instead of server time |
+| "Memory exhaustion" | Too many keys, no TTL set | Always set TTL on rate limit keys, use key expiration monitoring |
+| "False positives from NAT" | Multiple users behind same IP | Use authenticated user IDs when available; read X-Forwarded-For only when it is set by a trusted proxy |
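+
+The fail-open strategy with a circuit breaker from the table can be sketched as a thin wrapper around the limiter check. This is a minimal illustration, not part of the generated files: `FailOpenBreaker`, its parameters, and the injected `clock` are hypothetical names, and `check` stands in for any function that queries Redis and may raise.
+
+```python
+import time
+
+
+class FailOpenBreaker:
+    """Allow traffic when the backend fails, and stop calling it for a cooldown period."""
+
+    def __init__(self, check, max_failures=3, cooldown=30.0, clock=time.monotonic):
+        self.check = check              # callable(identifier) -> bool, may raise
+        self.max_failures = max_failures
+        self.cooldown = cooldown        # seconds to skip the backend once tripped
+        self.clock = clock
+        self.failures = 0
+        self.open_until = 0.0
+
+    def allow(self, identifier):
+        if self.clock() < self.open_until:
+            return True  # breaker open: skip the backend and fail open
+        try:
+            allowed = self.check(identifier)
+        except Exception:
+            self.failures += 1
+            if self.failures >= self.max_failures:
+                # Trip the breaker after consecutive failures
+                self.open_until = self.clock() + self.cooldown
+                self.failures = 0
+            return True  # fail open on this request as well
+        self.failures = 0
+        return allowed
+```
+
+While the breaker is open, requests skip the Redis round trip entirely, so a slow or unreachable Redis does not add a timeout to every request.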
+
+## Configuration Options
+
+**Rate Limit Algorithms**
+- **Token Bucket**: Best for user-facing APIs with burst allowance
+- **Sliding Window**: Most accurate, higher memory usage
+- **Fixed Window**: Simplest, allows boundary bursts
+- **Leaky Bucket**: Constant rate, complex UX
+
+**Tier Definitions**
+```json
+{
+  "anonymous": { "points": 20, "duration": 60 },
+  "free": { "points": 100, "duration": 60 },
+  "premium": { "points": 1000, "duration": 60 },
+  "enterprise": { "points": 10000, "duration": 60 }
+}
+```
+
+**Redis Configuration**
+- **Connection pooling**: Minimum 5 connections
+- **Retry strategy**: Exponential backoff up to 2s
+- **Failover**: Redis Sentinel or Cluster for HA
+- **Persistence**: AOF for rate limit state recovery
+
+## Best Practices
+
+DO:
+- Return the conventional rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) plus Retry-After on 429 responses
+- Implement graceful degradation (fail open on Redis failure)
+- Use user ID over IP when authenticated (avoids NAT issues)
+- Set TTL on all Redis keys to prevent memory leaks
+- Monitor rate limiter performance (latency, block rate)
+- Provide clear error messages with retry guidance
+
+DON'T:
+- Block legitimate traffic (tune limits based on real usage)
+- Use client-side rate limiting only (easily bypassed)
+- Forget to handle Redis connection failures (causes complete outage)
+- Make blocking (synchronous) Redis calls on the request path (adds latency to every request)
+- Use rate limiting as only defense against DDoS (need multiple layers)
+
+TIPS:
+- Start conservative, increase limits based on monitoring
+- Use different limits for different operations (read vs write)
+- Implement per-endpoint rate limits for expensive operations
+- Cache tier lookups to reduce database queries
+- Log rate limit violations for security analysis
+- Provide upgrade paths for users hitting limits
+
+## Performance Considerations
+
+**Latency Impact**
+- Token bucket: roughly 1-2ms added per request (single Redis round trip)
+- Sliding window: roughly 2-4ms (multiple Redis operations)
+- With pipelining: can drop below 1ms when Redis is on the same network; measure in your own environment
+
+**Redis Memory Usage**
+- Token bucket: ~100 bytes per user
+- Sliding window: ~50 bytes per request in window
+- Fixed window: ~50 bytes per user per window
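+
+As a rough worked example of the figures above, the arithmetic for one million active users can be written out. The per-entry sizes are the approximate values quoted in this section, and the function names are hypothetical.
+
+```python
+# Per-entry sizes taken from the figures quoted above; treat them as rough estimates.
+TOKEN_BUCKET_BYTES_PER_USER = 100
+SLIDING_WINDOW_BYTES_PER_REQUEST = 50
+
+
+def token_bucket_bytes(active_users):
+    # One small hash per user, independent of request volume
+    return active_users * TOKEN_BUCKET_BYTES_PER_USER
+
+
+def sliding_window_bytes(active_users, requests_in_window):
+    # One sorted-set member per request still inside the window
+    return active_users * requests_in_window * SLIDING_WINDOW_BYTES_PER_REQUEST
+
+
+users = 1_000_000
+print(token_bucket_bytes(users) / 1e6, "MB")        # 100.0 MB
+print(sliding_window_bytes(users, 100) / 1e9, "GB")  # 5.0 GB
+```
+
+Sliding window memory grows with request volume, which is why it is the more expensive choice for high-limit tiers.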
+
+**Throughput**
+- Redis can handle 100k+ operations/second
+- Use Redis Cluster for horizontal scaling
+- Pipeline Redis operations when possible
+- Consider local caching for extremely high throughput
+
+## Security Considerations
+
+1. **DDoS Protection**: Implement IP-based rate limiting as first layer
+2. **Credential Stuffing**: Add stricter limits on authentication endpoints
+3. **API Scraping**: Implement progressive delays for repeated violations
+4. **Distributed Attacks**: Use shared Redis across all API servers
+5. **Bypass Attempts**: Validate X-Forwarded-For headers, don't trust blindly
+6. **State Consistency**: Use Lua scripts or MULTI/EXEC transactions so that check-and-update runs atomically
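+
+For item 5, one common way to avoid trusting X-Forwarded-For blindly is to walk the header from right to left and stop at the first address that is not one of your own proxies. The sketch below assumes a hypothetical `client_ip` helper and a `trusted_proxies` list of CIDR ranges that you maintain; it uses only the standard library `ipaddress` module.
+
+```python
+import ipaddress
+
+
+def client_ip(remote_addr, xff_header, trusted_proxies):
+    """Return the address to rate limit on, ignoring spoofable header entries."""
+    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]
+
+    def is_trusted(addr):
+        try:
+            ip = ipaddress.ip_address(addr)
+        except ValueError:
+            return False
+        return any(ip in net for net in trusted)
+
+    # Only honor the header when the direct peer is one of our proxies
+    if not xff_header or not is_trusted(remote_addr):
+        return remote_addr
+
+    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
+    # Walk right to left: the rightmost untrusted hop is the real client
+    for hop in reversed(hops):
+        if is_trusted(hop):
+            continue
+        try:
+            ipaddress.ip_address(hop)
+        except ValueError:
+            return remote_addr  # malformed entry: fall back to the peer address
+        return hop
+    return remote_addr
+
+
+print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.9, 10.0.0.7", ["10.0.0.0/8"]))
+```
+
+Here the result is `203.0.113.9`: the leftmost entry `1.2.3.4` could have been supplied by the client itself, so it is ignored.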
+
+## Troubleshooting
+
+**Rate Limits Not Enforced**
+```bash
+# Check Redis connectivity
+redis-cli -h localhost -p 6379 ping
+
+# Verify keys are being created
+redis-cli --scan --pattern 'rl:*' | head -10
+
+# Check TTL is set correctly
+redis-cli TTL rl:user:123456
+```
+
+**Too Many False Positives**
+```bash
+# Review blocked requests by IP
+redis-cli --scan --pattern 'rl:ip:*' | xargs redis-cli MGET
+
+# Check tier assignments
+# Review application logs for tier calculation
+
+# Analyze legitimate traffic patterns
+# Adjust limits based on p95/p99 usage
+```
+
+**Redis Memory Issues**
+```bash
+# Check memory usage
+redis-cli INFO memory
+
+# Count rate limit keys
+redis-cli --scan --pattern 'rl:*' | wc -l
+
+# Review keys without TTL
+redis-cli --scan --pattern 'rl:*' | xargs -n 1 redis-cli TTL | grep -c "^-1$"
+```
+
+## Related Commands
+
+- `/create-monitoring` - Monitor rate limit metrics and violations
+- `/api-authentication-builder` - Integrate with auth for user-based limits
+- `/api-load-tester` - Test rate limiter under realistic load
+- `/setup-logging` - Log rate limit violations for analysis
+
+## Version History
+
+- v1.0.0 (2024-10): Initial implementation with token bucket and sliding window
+- Planned v1.1.0: Add adaptive rate limiting based on system load