description: Add rate limiting to API endpoints
shortcut: ratelimit

Add Rate Limiting to API Endpoints

Implement production-ready rate limiting with token bucket, sliding window, or fixed window algorithms using Redis for distributed state management.

When to Use This Command

Use /add-rate-limiting when you need to:

  • Protect APIs from abuse and DDoS attacks
  • Enforce fair usage policies across user tiers
  • Prevent resource exhaustion from runaway clients
  • Comply with downstream API rate limits
  • Implement freemium pricing models with usage tiers
  • Control costs for expensive operations (AI inference, video processing)

DON'T use this when:

  • Building internal-only APIs with trusted clients (use circuit breakers instead)
  • Single-user applications (no shared resource contention)
  • Already behind API gateway with built-in rate limiting (avoid double limiting)

Design Decisions

This command implements the Token Bucket algorithm with Redis as the primary approach because:

  • Allows burst traffic while maintaining average rate (better UX)
  • Distributed state enables horizontal scaling
  • Redis atomic operations prevent race conditions
  • Standard algorithm with well-understood behavior

Alternative considered: Sliding Window

  • More accurate rate limiting (no reset boundary issues)
  • Higher Redis memory usage (stores timestamp per request)
  • Slightly higher computational overhead
  • Recommended for strict compliance requirements

Alternative considered: Fixed Window

  • Simplest implementation (single counter)
  • Allows bursts at window boundaries (up to 2x the limit: a client can spend the full quota at the end of one window and again at the start of the next)
  • Lower memory footprint
  • Recommended only for non-critical rate limiting

Alternative considered: Leaky Bucket

  • Constant output rate (smooths bursty traffic)
  • Complex to explain to users
  • Less common in practice
  • Recommended for queuing systems, not APIs

Prerequisites

Before running this command:

  1. Redis server installed and accessible (standalone or cluster)
  2. Node.js/Python runtime for middleware implementation
  3. API framework that supports middleware (Express, FastAPI, etc.)
  4. Understanding of your API usage patterns and SLO requirements
  5. Monitoring infrastructure to track rate limit metrics

Implementation Process

Step 1: Choose Rate Limiting Strategy

Select algorithm based on requirements: Token Bucket for user-facing APIs, Sliding Window for strict compliance, Fixed Window for internal APIs.

Step 2: Configure Redis Connection

Set up Redis client with connection pooling, retry logic, and failover handling for high availability.
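
A minimal sketch of this step using ioredis, matching the redis-config.js artifact listed below; the Sentinel hosts and the master group name 'mymaster' are placeholders for your own deployment:

// redis-config.js - connection sketch with retry and Sentinel failover.
// Hostnames and the 'mymaster' group name are placeholder values.
const Redis = require('ioredis');

const redis = new Redis({
  // Sentinel-based failover: ioredis asks the sentinels for the current master
  sentinels: [
    { host: 'sentinel-1', port: 26379 },
    { host: 'sentinel-2', port: 26379 }
  ],
  name: 'mymaster',
  // Exponential backoff, capped at 2 seconds
  retryStrategy: (times) => Math.min(times * 50, 2000),
  // Give up quickly so the limiter can fail open instead of hanging requests
  maxRetriesPerRequest: 3,
  enableOfflineQueue: false
});

redis.on('error', (err) => console.error('Redis error:', err.message));

module.exports = redis;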

Step 3: Implement Rate Limiter Middleware

Create middleware that intercepts requests, checks Redis state, and enforces limits with proper HTTP headers.

Step 4: Define Rate Limit Tiers

Configure different limits for user segments (anonymous, free, premium, enterprise) based on business requirements.

Step 5: Add Monitoring and Alerting

Instrument rate limiter with metrics for blocked requests, Redis latency, and tier usage patterns.
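
One way to wire this up, sketched with the prom-client library; the metric and label names here are illustrative, not part of the generated output:

// metrics.js - illustrative instrumentation with prom-client
const client = require('prom-client');

// Counts requests rejected by the limiter, by tier and layer
const blockedRequests = new client.Counter({
  name: 'rate_limit_blocked_total',
  help: 'Requests rejected by the rate limiter',
  labelNames: ['tier', 'layer']
});

// Tracks latency of the Redis calls the limiter makes
const redisLatency = new client.Histogram({
  name: 'rate_limit_redis_seconds',
  help: 'Latency of Redis calls made by the rate limiter',
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1]
});

// Inside the middleware, around the consume() call:
//   const end = redisLatency.startTimer();
//   const result = await rateLimiter.consume(identifier, 1, tierConfig);
//   end();
//   if (!result.allowed) blockedRequests.inc({ tier, layer: 'user' });

module.exports = { blockedRequests, redisLatency };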

Output Format

The command generates:

  • rate-limiter.js or rate_limiter.py - Core rate limiting middleware
  • redis-config.js - Redis connection configuration with failover
  • rate-limit-tiers.json - Tiered limit definitions
  • rate-limiter.test.js - Comprehensive test suite
  • README.md - Integration guide and configuration options
  • docker-compose.yml - Redis setup for local development

Code Examples

Example 1: Token Bucket Rate Limiter with Express and Redis

// rate-limiter.js
const Redis = require('ioredis');

class TokenBucketRateLimiter {
  constructor(redisClient, options = {}) {
    this.redis = redisClient;
    this.defaultOptions = {
      points: 100,        // Number of tokens
      duration: 60,       // Time window in seconds
      blockDuration: 60,  // Block duration after limit exceeded
      keyPrefix: 'rl',    // Redis key prefix
      ...options
    };
  }

  /**
   * Token bucket algorithm using Redis
   * Returns: { allowed: boolean, remaining: number, resetTime: number }
   */
  async consume(identifier, points = 1, options = {}) {
    const opts = { ...this.defaultOptions, ...options };
    const key = `${opts.keyPrefix}:${identifier}`;
    const now = Date.now();

    // Lua script for atomic token bucket operations
    const luaScript = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local requested = tonumber(ARGV[3])
      local now = tonumber(ARGV[4])
      local ttl = tonumber(ARGV[5])

      -- Get current state or initialize
      local tokens = tonumber(redis.call('HGET', key, 'tokens'))
      local last_refill = tonumber(redis.call('HGET', key, 'last_refill'))

      if not tokens then
        tokens = capacity
        last_refill = now
      end

      -- Refill based on elapsed time; now and last_refill are in
      -- milliseconds while refill_rate is tokens per second. Fractional
      -- tokens are kept so frequent calls do not starve the refill.
      local time_passed = now - last_refill
      tokens = math.min(capacity, tokens + (time_passed / 1000) * refill_rate)
      last_refill = now

      -- Check if we can fulfill request
      if tokens >= requested then
        tokens = tokens - requested
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', last_refill)
        redis.call('EXPIRE', key, ttl)
        return {1, tokens, last_refill}
      else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', last_refill)
        redis.call('EXPIRE', key, ttl)
        return {0, tokens, last_refill}
      end
    `;

    const refillRate = opts.points / opts.duration;
    const result = await this.redis.eval(
      luaScript,
      1,
      key,
      opts.points,
      refillRate,
      points,
      now,
      opts.duration
    );

    const [allowed, remaining, lastRefill] = result;
    const resetTime = lastRefill + (opts.duration * 1000);

    return {
      allowed: allowed === 1,
      remaining: Math.floor(remaining),
      resetTime: new Date(resetTime).toISOString(),
      // Estimate the time until enough tokens accrue for the requested amount
      retryAfter: allowed === 1 ? null : Math.ceil((points - remaining) / refillRate)
    };
  }

  /**
   * Express middleware factory
   */
  middleware(getTier = null) {
    return async (req, res, next) => {
      try {
        // Determine identifier (user ID or IP)
        const identifier = req.user?.id || req.ip;

        // Get tier configuration (getTier may return a tier name or a
        // full config object, as in the per-endpoint example below)
        const tier = getTier ? await getTier(req) : 'default';
        const tierConfig = (tier && typeof tier === 'object') ? tier : this.getTierConfig(tier);

        // Consume tokens
        const result = await this.consume(identifier, 1, tierConfig);

        // Set rate limit headers
        res.set({
          'X-RateLimit-Limit': tierConfig.points,
          'X-RateLimit-Remaining': result.remaining,
          'X-RateLimit-Reset': result.resetTime
        });

        if (!result.allowed) {
          res.set('Retry-After', result.retryAfter);
          return res.status(429).json({
            error: 'Too Many Requests',
            message: `Rate limit exceeded. Try again in ${result.retryAfter} seconds.`,
            retryAfter: result.retryAfter
          });
        }

        next();
      } catch (error) {
        console.error('Rate limiter error:', error);
        // Fail open to avoid blocking all traffic on Redis failure
        next();
      }
    };
  }

  getTierConfig(tier) {
    const tiers = {
      anonymous: { points: 20, duration: 60 },
      free: { points: 100, duration: 60 },
      premium: { points: 1000, duration: 60 },
      enterprise: { points: 10000, duration: 60 }
    };
    return tiers[tier] || tiers.free;
  }
}

// Usage example
const express = require('express');
const app = express();
const redis = new Redis({
  host: process.env.REDIS_HOST || 'localhost',
  port: process.env.REDIS_PORT || 6379,
  retryStrategy: (times) => Math.min(times * 50, 2000),
  maxRetriesPerRequest: 3
});

const rateLimiter = new TokenBucketRateLimiter(redis);

// Apply to all routes
app.use(rateLimiter.middleware(async (req) => {
  if (req.user?.subscription === 'enterprise') return 'enterprise';
  if (req.user?.subscription === 'premium') return 'premium';
  if (req.user) return 'free';
  return 'anonymous';
}));

// Apply stricter limit to specific expensive endpoint
app.post('/api/ai/generate',
  rateLimiter.middleware(() => ({ points: 10, duration: 3600 })),
  handleGenerate
);

module.exports = TokenBucketRateLimiter;

Example 2: Sliding Window Rate Limiter in Python with FastAPI

# rate_limiter.py
import time
import redis.asyncio as aioredis
from fastapi import Request, Response
from typing import Optional, Callable

class SlidingWindowRateLimiter:
    def __init__(self, redis_client: aioredis.Redis, window_size: int = 60, max_requests: int = 100):
        self.redis = redis_client
        self.window_size = window_size
        self.max_requests = max_requests
        self.key_prefix = "rate_limit"

    async def is_allowed(self, identifier: str, tier_config: dict = None) -> dict:
        """
        Sliding window algorithm using Redis sorted set
        Each request is a member with score = timestamp
        """
        config = tier_config or {'max_requests': self.max_requests, 'window_size': self.window_size}
        now = time.time()
        window_start = now - config['window_size']
        key = f"{self.key_prefix}:{identifier}"

        # Redis pipeline for atomic operations
        pipe = self.redis.pipeline()

        # Remove old entries outside the window
        pipe.zremrangebyscore(key, 0, window_start)

        # Count requests in current window
        pipe.zcard(key)

        # Record the current request (blocked requests are recorded too,
        # so clients that keep retrying extend their own block)
        pipe.zadd(key, {str(now): now})

        # Set expiration
        pipe.expire(key, config['window_size'] + 10)

        results = await pipe.execute()
        request_count = results[1]

        if request_count >= config['max_requests']:
            # Get oldest request in window to calculate retry time
            oldest = await self.redis.zrange(key, 0, 0, withscores=True)
            if oldest:
                oldest_time = oldest[0][1]
                retry_after = int(config['window_size'] - (now - oldest_time)) + 1
            else:
                retry_after = config['window_size']

            return {
                'allowed': False,
                'remaining': 0,
                'reset_time': int(now + retry_after),
                'retry_after': retry_after
            }

        remaining = config['max_requests'] - request_count - 1
        reset_time = int(now + config['window_size'])

        return {
            'allowed': True,
            'remaining': remaining,
            'reset_time': reset_time,
            'retry_after': None
        }

    def middleware(self, get_tier: Optional[Callable] = None):
        """FastAPI middleware factory"""
        async def rate_limit_middleware(request: Request, call_next):
            # Get identifier (user ID or IP)
            identifier = getattr(request.state, 'user_id', None) or request.client.host

            # Get tier configuration
            tier_config = None
            if get_tier:
                tier_config = await get_tier(request)

            # Check rate limit
            result = await self.is_allowed(identifier, tier_config)

            # Always set rate limit headers
            response = None
            if result['allowed']:
                response = await call_next(request)
            else:
                response = Response(
                    content=f'{{"error": "Rate limit exceeded", "retry_after": {result["retry_after"]}}}',
                    status_code=429,
                    media_type="application/json"
                )

            response.headers['X-RateLimit-Limit'] = str(tier_config['max_requests'] if tier_config else self.max_requests)
            response.headers['X-RateLimit-Remaining'] = str(result['remaining'])
            response.headers['X-RateLimit-Reset'] = str(result['reset_time'])

            if not result['allowed']:
                response.headers['Retry-After'] = str(result['retry_after'])

            return response

        return rate_limit_middleware

# Usage in FastAPI
from fastapi import FastAPI
from contextlib import asynccontextmanager

# redis.asyncio.from_url returns a client immediately (connections are
# opened lazily), so the limiter can be created at import time and is
# available when the middleware is registered below
redis_client = aioredis.from_url("redis://localhost:6379")
rate_limiter = SlidingWindowRateLimiter(redis_client)

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    # Shutdown: release the connection pool
    await redis_client.close()

app = FastAPI(lifespan=lifespan)

async def get_user_tier(request: Request) -> dict:
    """Determine user tier from request"""
    user = getattr(request.state, 'user', None)
    if not user:
        return {'max_requests': 20, 'window_size': 60}  # Anonymous
    elif user.get('subscription') == 'enterprise':
        return {'max_requests': 10000, 'window_size': 60}
    elif user.get('subscription') == 'premium':
        return {'max_requests': 1000, 'window_size': 60}
    else:
        return {'max_requests': 100, 'window_size': 60}  # Free tier

# Apply rate limiting middleware
app.middleware("http")(rate_limiter.middleware(get_user_tier))

Example 3: DDoS Protection with Multi-Layer Rate Limiting

// advanced-rate-limiter.js - Multi-layer protection
const Redis = require('ioredis');

class MultiLayerRateLimiter {
  constructor(redisClient) {
    this.redis = redisClient;
  }

  /**
   * Layered rate limiting strategy:
   * 1. IP-based (DDoS protection)
   * 2. User-based (fair usage)
   * 3. Endpoint-specific (expensive operations)
   */
  async checkLayers(req) {
    const layers = [
      // Layer 1: IP-based rate limiting (DDoS protection)
      {
        name: 'ip',
        identifier: req.ip,
        limits: { points: 1000, duration: 60 }, // 1000 req/min per IP
        priority: 'high'
      },
      // Layer 2: User-based rate limiting
      {
        name: 'user',
        identifier: req.user?.id || `anon:${req.ip}`,
        limits: this.getUserTierLimits(req.user),
        priority: 'medium'
      },
      // Layer 3: Endpoint-specific limiting
      {
        name: 'endpoint',
        identifier: `${req.user?.id || req.ip}:${req.path}`,
        limits: this.getEndpointLimits(req.path),
        priority: 'low'
      }
    ];

    for (const layer of layers) {
      const result = await this.checkLimit(layer);
      if (!result.allowed) {
        return {
          blocked: true,
          layer: layer.name,
          ...result
        };
      }
    }

    return { blocked: false };
  }

  async checkLimit(layer) {
    const key = `rl:${layer.name}:${layer.identifier}`;
    const now = Date.now();

    const count = await this.redis.incr(key);

    if (count === 1) {
      await this.redis.expire(key, layer.limits.duration);
    }

    let ttl = await this.redis.ttl(key);
    // Guard against keys left without a TTL (e.g. a crash between INCR and EXPIRE)
    if (ttl < 0) {
      await this.redis.expire(key, layer.limits.duration);
      ttl = layer.limits.duration;
    }
    const allowed = count <= layer.limits.points;

    return {
      allowed,
      remaining: Math.max(0, layer.limits.points - count),
      resetTime: now + (ttl * 1000),
      retryAfter: allowed ? null : ttl
    };
  }

  getUserTierLimits(user) {
    if (!user) return { points: 20, duration: 60 };
    const tiers = {
      free: { points: 100, duration: 60 },
      premium: { points: 1000, duration: 60 },
      enterprise: { points: 10000, duration: 60 }
    };
    return tiers[user.subscription] || tiers.free;
  }

  getEndpointLimits(path) {
    const expensiveEndpoints = {
      '/api/ai/generate': { points: 10, duration: 3600 },  // 10/hour
      '/api/video/render': { points: 5, duration: 3600 },  // 5/hour
      '/api/export/large': { points: 20, duration: 3600 }  // 20/hour
    };
    return expensiveEndpoints[path] || { points: 1000, duration: 60 };
  }

  middleware() {
    return async (req, res, next) => {
      try {
        const result = await this.checkLayers(req);

        if (result.blocked) {
          res.set({
            'X-RateLimit-Layer': result.layer,
            'X-RateLimit-Remaining': result.remaining,
            'Retry-After': result.retryAfter
          });

          return res.status(429).json({
            error: 'Rate limit exceeded',
            layer: result.layer,
            retryAfter: result.retryAfter,
            message: `Too many requests. Please retry after ${result.retryAfter} seconds.`
          });
        }

        next();
      } catch (error) {
        console.error('Multi-layer rate limiter error:', error);
        next(); // Fail open
      }
    };
  }
}

module.exports = MultiLayerRateLimiter;
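
A minimal wiring sketch for the multi-layer limiter, assuming the same Express and ioredis setup as Example 1:

// server.js - hypothetical wiring, mirroring Example 1's setup
const express = require('express');
const Redis = require('ioredis');
const MultiLayerRateLimiter = require('./advanced-rate-limiter');

const app = express();
const redis = new Redis({ host: process.env.REDIS_HOST || 'localhost' });

// All three layers (IP, user, endpoint) run on every request
app.use(new MultiLayerRateLimiter(redis).middleware());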

Error Handling

  • "Redis connection failed" - Cause: Redis server unreachable. Solution: check Redis server status, verify the connection string, and implement connection retry.
  • "Rate limiter fail-closed" - Cause: a Redis timeout leaves the middleware blocking all traffic. Solution: implement a fail-open strategy with a circuit breaker pattern.
  • "Inconsistent rate limits" - Cause: clock skew across servers. Solution: use Redis time (the TIME command) instead of server time.
  • "Memory exhaustion" - Cause: too many keys with no TTL set. Solution: always set a TTL on rate limit keys and monitor key expiration.
  • "False positives from NAT" - Cause: multiple users behind the same IP. Solution: use authenticated user IDs when available and consider X-Forwarded-For.

Configuration Options

Rate Limit Algorithms

  • Token Bucket: Best for user-facing APIs with burst allowance
  • Sliding Window: Most accurate, higher memory usage
  • Fixed Window: Simplest, allows boundary bursts
  • Leaky Bucket: Constant rate, complex UX

Tier Definitions

{
  "anonymous": { "points": 20, "duration": 60 },
  "free": { "points": 100, "duration": 60 },
  "premium": { "points": 1000, "duration": 60 },
  "enterprise": { "points": 10000, "duration": 60 }
}
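
These definitions can be loaded from the generated rate-limit-tiers.json file rather than hard-coded; a hypothetical loader sketch:

// tiers.js - loads the generated tier file (path is an assumption)
const fs = require('fs');
const path = require('path');

const tiers = JSON.parse(
  fs.readFileSync(path.join(__dirname, 'rate-limit-tiers.json'), 'utf8')
);

// Drop-in replacement for getTierConfig() in Example 1
function getTierConfig(tier) {
  return tiers[tier] || tiers.free;
}

module.exports = { tiers, getTierConfig };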

Redis Configuration

  • Connection pooling: Minimum 5 connections
  • Retry strategy: Exponential backoff up to 2s
  • Failover: Redis Sentinel or Cluster for HA
  • Persistence: AOF for rate limit state recovery

Best Practices

DO:

  • Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset)
  • Implement graceful degradation (fail open on Redis failure)
  • Use user ID over IP when authenticated (avoids NAT issues)
  • Set TTL on all Redis keys to prevent memory leaks
  • Monitor rate limiter performance (latency, block rate)
  • Provide clear error messages with retry guidance

DON'T:

  • Block legitimate traffic (tune limits based on real usage)
  • Use client-side rate limiting only (easily bypassed)
  • Forget to handle Redis connection failures (causes complete outage)
  • Implement synchronous Redis calls (adds latency to every request)
  • Use rate limiting as only defense against DDoS (need multiple layers)

TIPS:

  • Start conservative, increase limits based on monitoring
  • Use different limits for different operations (read vs write)
  • Implement per-endpoint rate limits for expensive operations
  • Cache tier lookups to reduce database queries (see the sketch after this list)
  • Log rate limit violations for security analysis
  • Provide upgrade paths for users hitting limits
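
For the tier-caching tip, a minimal in-memory cache sketch; the 60-second TTL is an arbitrary starting point:

// Cache tier lookups so each request doesn't hit the database;
// entries expire after a fixed TTL (60s here, tune to taste)
const tierCache = new Map();
const TIER_CACHE_TTL_MS = 60 * 1000;

async function getCachedTier(userId, lookupTierFromDb) {
  const cached = tierCache.get(userId);
  if (cached && cached.expires > Date.now()) return cached.tier;

  const tier = await lookupTierFromDb(userId);
  tierCache.set(userId, { tier, expires: Date.now() + TIER_CACHE_TTL_MS });
  return tier;
}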

Performance Considerations

Latency Impact

  • Token bucket: 1-2ms added to request (single Redis call)
  • Sliding window: 2-4ms (multiple Redis operations)
  • With pipelining: <1ms for all algorithms

Redis Memory Usage

  • Token bucket: ~100 bytes per user
  • Sliding window: ~50 bytes per request in window
  • Fixed window: ~50 bytes per user per window

Throughput

  • Redis can handle 100k+ operations/second
  • Use Redis Cluster for horizontal scaling
  • Pipeline Redis operations when possible
  • Consider local caching for extremely high throughput

Security Considerations

  1. DDoS Protection: Implement IP-based rate limiting as first layer
  2. Credential Stuffing: Add stricter limits on authentication endpoints
  3. API Scraping: Implement progressive delays for repeated violations
  4. Distributed Attacks: Use shared Redis across all API servers
  5. Bypass Attempts: Validate X-Forwarded-For headers rather than trusting them blindly (see the sketch after this list)
  6. State Consistency: Use Redis transactions to prevent race conditions
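
For item 5, a sketch of the Express-side configuration using the built-in trust proxy setting, reusing the rateLimiter from Example 1, so req.ip is resolved through a known proxy chain instead of a client-forged header:

// With 'trust proxy' set to 1, Express trusts exactly one hop
// (e.g. your load balancer) and ignores X-Forwarded-For values
// supplied by clients beyond it
const express = require('express');
const app = express();

app.set('trust proxy', 1);

// req.ip now resolves through the trusted proxy chain, so the
// IP-based layer can't be bypassed with a forged header
app.use(rateLimiter.middleware());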

Troubleshooting

Rate Limits Not Enforced

# Check Redis connectivity
redis-cli -h localhost -p 6379 ping

# Verify keys are being created
redis-cli --scan --pattern 'rl:*' | head -10

# Check TTL is set correctly
redis-cli TTL rl:user:123456

Too Many False Positives

# Review blocked requests by IP
redis-cli --scan --pattern 'rl:ip:*' | xargs redis-cli MGET

# Check tier assignments
# Review application logs for tier calculation

# Analyze legitimate traffic patterns
# Adjust limits based on p95/p99 usage

Redis Memory Issues

# Check memory usage
redis-cli INFO memory

# Count rate limit keys
redis-cli --scan --pattern 'rl:*' | wc -l

# Review keys without TTL
redis-cli --scan --pattern 'rl:*' | xargs -n1 redis-cli TTL | grep -c "^-1"

Related Commands

  • /create-monitoring - Monitor rate limit metrics and violations
  • /api-authentication-builder - Integrate with auth for user-based limits
  • /api-load-tester - Test rate limiter under realistic load
  • /setup-logging - Log rate limit violations for analysis

Version History

  • v1.0.0 (2024-10): Initial implementation with token bucket and sliding window
  • Planned v1.1.0: Add adaptive rate limiting based on system load