Files
gh-jezweb-claude-skills-ski…/references/rate-limits.md
2025-11-30 08:24:01 +08:00

2.1 KiB

Rate Limits Guide

Complete guide to Claude API rate limits and how to handle them.

Overview

Claude API uses token bucket algorithm for rate limiting:

  • Capacity continuously replenishes
  • Three types: Requests per minute (RPM), Tokens per minute (TPM), Daily tokens
  • Limits vary by account tier and model

Rate Limit Tiers

Tier Requirements Example Limits (Sonnet 3.5)
Tier 1 New account 50 RPM, 40k TPM
Tier 2 $10 spend 1000 RPM, 100k TPM
Tier 3 $50 spend 2000 RPM, 200k TPM
Tier 4 $500 spend 4000 RPM, 400k TPM

Note: Limits vary by model. Check Console for exact limits.

Response Headers

Every API response includes:

anthropic-ratelimit-requests-limit: 50
anthropic-ratelimit-requests-remaining: 49
anthropic-ratelimit-requests-reset: 2025-10-25T12:00:00Z
anthropic-ratelimit-tokens-limit: 50000
anthropic-ratelimit-tokens-remaining: 49500
anthropic-ratelimit-tokens-reset: 2025-10-25T12:01:00Z

On 429 errors:

retry-after: 60  // Seconds until retry allowed

Handling Rate Limits

Basic Exponential Backoff

async function withRetry(requestFn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        const delay = 1000 * Math.pow(2, attempt);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
}

Respecting retry-after Header

const retryAfter = error.response?.headers?.['retry-after'];
const delay = retryAfter ? parseInt(retryAfter) * 1000 : baseDelay;

Best Practices

  1. Monitor headers - Check remaining requests/tokens
  2. Implement backoff - Exponential delay on 429
  3. Respect retry-after - Use provided wait time
  4. Batch requests - Group when possible
  5. Use caching - Reduce duplicate requests
  6. Upgrade tier - Scale with usage

Official Docs

https://docs.claude.com/en/api/rate-limits