Initial commit

Zhongwei Li
2025-11-30 08:59:27 +08:00
commit adfd3add64
15 changed files with 10018 additions and 0 deletions


@@ -0,0 +1,135 @@
---
name: using-web-backend
description: Use when building web APIs, backend services, or encountering FastAPI/Django/Express/GraphQL questions, microservices architecture, authentication, or message queues - routes to 11 specialist skills rather than giving surface-level generic advice
---
# Using Web Backend Skills
## Overview
**This router directs you to specialized web backend skills. Each specialist provides deep expertise in their domain.**
**Core principle:** Different backend challenges require different specialist knowledge. Routing to the right skill gives better results than generic advice.
## When to Use
Use this router when encountering:
- **Framework-specific questions**: FastAPI, Django, Express implementation details
- **API design**: REST or GraphQL architecture, versioning, schema design
- **Architecture patterns**: Microservices, message queues, event-driven systems
- **Backend infrastructure**: Authentication, database integration, deployment
- **Testing & documentation**: API testing strategies, documentation approaches
## Quick Reference - Routing Table
| User Question Contains | Route To | Why |
|------------------------|----------|-----|
| FastAPI, Pydantic, async Python APIs | [fastapi-development.md](fastapi-development.md) | FastAPI-specific patterns, dependency injection, async |
| Django, ORM, views, middleware | [django-development.md](django-development.md) | Django conventions, ORM optimization, settings |
| Express, Node.js backend, middleware | [express-development.md](express-development.md) | Express patterns, error handling, async flow |
| REST API, endpoints, versioning, pagination | [rest-api-design.md](rest-api-design.md) | REST principles, resource design, hypermedia |
| GraphQL, schema, resolvers, N+1 | [graphql-api-design.md](graphql-api-design.md) | Schema design, query optimization, federation |
| Microservices, service mesh, boundaries | [microservices-architecture.md](microservices-architecture.md) | Service design, communication, consistency |
| Message queues, RabbitMQ, Kafka, events | [message-queues.md](message-queues.md) | Queue patterns, reliability, event-driven |
| JWT, OAuth2, API keys, auth | [api-authentication.md](api-authentication.md) | Auth patterns, token management, security |
| Database connections, ORM, migrations | [database-integration.md](database-integration.md) | Connection pooling, query optimization, migrations |
| API testing, integration tests, mocking | [api-testing.md](api-testing.md) | Testing strategies, contract testing, mocking |
| OpenAPI, Swagger, API docs | [api-documentation.md](api-documentation.md) | API docs (also see: muna-technical-writer) |
## Cross-References to Other Packs
**Before routing, check if these packs are more appropriate:**
- **Security concerns** → `ordis-security-architect` (threat modeling, OWASP, security patterns)
- **API usability/UX** → `lyra-ux-designer` (error messages, API ergonomics)
- **Python code patterns** → `axiom-python-engineering` (Python-specific engineering)
- **Documentation writing** → `muna-technical-writer` (technical writing, clarity)
## How to Route
**STOP: Do not attempt to answer web backend questions yourself.**
**Instead:**
1. **Identify the specialist domain** from the routing table above
2. **State which specialist skill you're using**: "I'll use the `[skill-name]` skill for this"
3. **Apply the specialist skill** to provide deep, expert guidance
## Red Flags - Using Generic Knowledge Instead of Specialists
If you catch yourself doing any of these, STOP and route to a specialist:
- ❌ "I'll answer all these diverse backend questions myself"
- ❌ Giving code examples from "general backend knowledge"
- ❌ Providing "competent but broad" answers across multiple domains
- ❌ Not mentioning specialized skills would give better guidance
- ❌ "The specialist skill isn't available right now, so I'll answer myself"
**All of these mean: You're being a jack-of-all-trades. Route to the specialist.**
## Rationalization Table
| Excuse | Reality |
|--------|---------|
| "I can handle this backend question myself" | Specialists provide deeper expertise than generic knowledge |
| "These are all backend topics" | Backend spans 11 distinct specialist domains |
| "I'll just give a quick answer" | Quick answers miss edge cases and production patterns |
| "The user didn't ask for specialized help" | Better results come from routing to specialists proactively |
| "Specialist skill not available, I'll answer instead" | State which skill would help, explain routing, guide user to install if needed |
## Example Routing
**User**: "I need to build a REST API with FastAPI. How should I structure my dependency injection?"
**Your response**:
"I'll use the [fastapi-development.md](fastapi-development.md) skill to help you structure dependency injection properly. This specialist skill covers FastAPI patterns in depth."
[Then apply fastapi-development skill]
**User**: "My GraphQL API has N+1 problems and we're splitting our monolith into microservices."
**Your response**:
"You have two distinct specialist areas:
1. GraphQL optimization → I'll use [graphql-api-design.md](graphql-api-design.md)
2. Microservices architecture → I'll use [microservices-architecture.md](microservices-architecture.md)
Let me address the GraphQL N+1 problem first with the graphql-api-design skill..."
[Apply each specialist skill to its domain]
## Why This Matters
**Without routing**: Surface-level answers covering multiple domains broadly
**With routing**: Deep expertise addressing edge cases, production patterns, and domain-specific best practices
Specialist skills = better results.
---
## Web Backend Specialist Skills Catalog
After routing, load the appropriate specialist skill for detailed guidance:
### Framework-Specific Skills
1. [fastapi-development.md](fastapi-development.md) - FastAPI patterns, dependency injection, async/await, Pydantic validation, background tasks
2. [django-development.md](django-development.md) - Django conventions, ORM optimization, middleware, settings, management commands
3. [express-development.md](express-development.md) - Express patterns, middleware chains, error handling, async flow control
### API Design Skills
4. [rest-api-design.md](rest-api-design.md) - REST principles, resource design, versioning, pagination, HATEOAS, HTTP semantics
5. [graphql-api-design.md](graphql-api-design.md) - GraphQL schema design, resolver patterns, N+1 query optimization, federation
### Architecture & Infrastructure
6. [microservices-architecture.md](microservices-architecture.md) - Service boundaries, communication patterns, distributed consistency, service mesh
7. [message-queues.md](message-queues.md) - Queue patterns, reliability guarantees, event-driven architecture, RabbitMQ/Kafka
### Cross-Cutting Concerns
8. [api-authentication.md](api-authentication.md) - JWT, OAuth2, API keys, token management, auth patterns
9. [database-integration.md](database-integration.md) - Connection pooling, query optimization, migrations, ORM patterns
10. [api-testing.md](api-testing.md) - Testing strategies, contract testing, integration tests, mocking
11. [api-documentation.md](api-documentation.md) - OpenAPI/Swagger, API documentation patterns, schema generation

File diff suppressed because it is too large


@@ -0,0 +1,944 @@
# API Documentation
## Overview
**API documentation specialist covering OpenAPI specs, documentation-as-code, testing docs, SDK generation, and preventing documentation debt.**
**Core principle**: Documentation is a product feature that directly impacts developer adoption - invest in keeping it accurate, tested, and discoverable.
## When to Use This Skill
Use when encountering:
- **OpenAPI/Swagger**: Auto-generating docs, customizing Swagger UI, maintaining specs
- **Documentation testing**: Ensuring examples work, preventing stale docs
- **Versioning**: Managing multi-version docs, deprecation notices
- **Documentation-as-code**: Keeping docs in sync with code changes
- **SDK generation**: Generating client libraries from OpenAPI specs
- **Documentation debt**: Detecting and preventing outdated documentation
- **Metrics**: Tracking documentation usage and effectiveness
- **Community docs**: Managing contributions, improving discoverability
**Do NOT use for**:
- General technical writing (see `muna-technical-writer` skill)
- API design principles (see `rest-api-design`, `graphql-api-design`)
- Authentication implementation (see `api-authentication`)
## OpenAPI Specification Best Practices
### Production-Quality OpenAPI Specs
**Complete FastAPI example**:
```python
from fastapi import FastAPI, Path, Query, Body
from pydantic import BaseModel, Field
from typing import Optional, List
# Tag organization (defined first so it can be passed to the constructor below)
tags_metadata = [
    {
        "name": "payments",
        "description": "Payment operations",
        "externalDocs": {
            "description": "Payment Guide",
            "url": "https://docs.example.com/guides/payments"
        }
    }
]

app = FastAPI(
    title="Payment Processing API",
    description="""
# Payment API
Process payments with PCI-DSS compliance.
## Features
- Multiple payment methods (cards, ACH, digital wallets)
- Fraud detection
- Webhook notifications
- Test mode for development
## Rate Limits
- Standard: 100 requests/minute
- Premium: 1000 requests/minute
## Support
- Documentation: https://docs.example.com
- Status: https://status.example.com
- Support: api-support@example.com
""",
    version="2.1.0",
    terms_of_service="https://example.com/terms",
    contact={
        "name": "API Support",
        "url": "https://example.com/support",
        "email": "api-support@example.com"
    },
    license_info={
        "name": "Apache 2.0",
        "url": "https://www.apache.org/licenses/LICENSE-2.0.html"
    },
    servers=[
        {"url": "https://api.example.com", "description": "Production"},
        {"url": "https://sandbox-api.example.com", "description": "Sandbox"}
    ],
    # Pass tags here -- creating a second FastAPI() instance just for
    # openapi_tags would silently discard all the metadata above.
    openapi_tags=tags_metadata
)
# Rich schema with examples
class PaymentRequest(BaseModel):
    amount: float = Field(
        ...,
        gt=0,
        le=999999.99,
        description="Payment amount in USD",
        example=99.99
    )
    currency: str = Field(
        default="USD",
        pattern="^[A-Z]{3}$",
        description="ISO 4217 currency code",
        example="USD"
    )
    # Referenced by the examples below, so they must exist on the model
    payment_method: str = Field(..., description="Payment method identifier")
    description: Optional[str] = Field(None, description="Free-text payment description")

    class Config:
        # Pydantic v1 config style; v2 uses model_config = {"json_schema_extra": ...}
        schema_extra = {
            "examples": [
                {
                    "amount": 149.99,
                    "currency": "USD",
                    "payment_method": "card_visa_4242",
                    "description": "Premium subscription"
                },
                {
                    "amount": 29.99,
                    "currency": "EUR",
                    "payment_method": "paypal_account",
                    "description": "Monthly plan"
                }
            ]
        }
# Minimal response model stub (illustrative fields) so the
# `responses` declaration below resolves
class PaymentResponse(BaseModel):
    id: str
    status: str
    amount: float
    currency: str

# Comprehensive error documentation
@app.post(
    "/payments",
    status_code=201,  # FastAPI defaults to 200; the docs below advertise 201
    summary="Create payment",
    description="""
Creates a new payment transaction.
## Processing Time
Typically 2-5 seconds for card payments.
## Idempotency
Use `Idempotency-Key` header to prevent duplicates.
## Test Mode
Use test payment methods in sandbox environment.
""",
    responses={
        201: {"description": "Payment created", "model": PaymentResponse},
        400: {
            "description": "Invalid request",
            "content": {
                "application/json": {
                    "examples": {
                        "invalid_amount": {
                            "summary": "Amount validation failed",
                            "value": {
                                "error_code": "INVALID_AMOUNT",
                                "message": "Amount must be between 0.01 and 999999.99"
                            }
                        }
                    }
                }
            }
        },
        402: {"description": "Payment declined"},
        429: {"description": "Rate limit exceeded"}
    },
    tags=["payments"]
)
async def create_payment(payment: PaymentRequest):
    pass
```
### Custom OpenAPI Generation
**Add security schemes, custom extensions**:
```python
from fastapi.openapi.utils import get_openapi
def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema
    openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        description=app.description,
        routes=app.routes,
    )
    # Security schemes ("components" may be absent on minimal apps)
    openapi_schema.setdefault("components", {})["securitySchemes"] = {
        "ApiKeyAuth": {
            "type": "apiKey",
            "in": "header",
            "name": "X-API-Key",
            "description": "Get your API key at https://dashboard.example.com/api-keys"
        },
        "OAuth2": {
            "type": "oauth2",
            "flows": {
                "authorizationCode": {
                    "authorizationUrl": "https://auth.example.com/oauth/authorize",
                    "tokenUrl": "https://auth.example.com/oauth/token",
                    "scopes": {
                        "payments:read": "Read payment data",
                        "payments:write": "Create payments"
                    }
                },
                "clientCredentials": {
                    "tokenUrl": "https://auth.example.com/oauth/token",
                    "scopes": {
                        "payments:read": "Read payment data",
                        "payments:write": "Create payments"
                    }
                }
            }
        }
    }
    # Global security requirement
    openapi_schema["security"] = [{"ApiKeyAuth": []}]
    # Custom extensions for tooling
    openapi_schema["x-api-id"] = "payments-api-v2"
    openapi_schema["x-audience"] = "external"
    openapi_schema["x-ratelimit-default"] = 100
    # Add code samples extension (rendered by ReDoc; Swagger UI needs a plugin)
    for path_data in openapi_schema["paths"].values():
        for operation in path_data.values():
            if isinstance(operation, dict) and "operationId" in operation:
                # generate_curl_example / generate_python_example are
                # project-specific helpers, not shown here
                operation["x-code-samples"] = [
                    {
                        "lang": "curl",
                        "source": generate_curl_example(operation)
                    },
                    {
                        "lang": "python",
                        "source": generate_python_example(operation)
                    }
                ]
    app.openapi_schema = openapi_schema
    return app.openapi_schema

app.openapi = custom_openapi
```
## Documentation-as-Code
### Keep Docs in Sync with Code
**Anti-pattern**: Docs in separate repo, manually updated, always stale
**Pattern**: Co-locate docs with code, auto-generate from source
**Implementation**:
```python
# Source of truth: Pydantic models
class PaymentRequest(BaseModel):
    """
    Payment request model.

    Example (basic payment, indented rather than fenced so the
    docstring doesn't break the surrounding markdown fence):

        payment = PaymentRequest(
            amount=99.99,
            currency="USD",
            payment_method="pm_card_visa"
        )
    """
    amount: float = Field(..., description="Amount in USD")
    currency: str = Field(default="USD", description="ISO 4217 currency code")

    class Config:
        schema_extra = {
            "examples": [
                {"amount": 99.99, "currency": "USD", "payment_method": "pm_card_visa"}
            ]
        }

# Docs auto-generated from model
# - OpenAPI spec from Field descriptions
# - Examples from schema_extra
# - Code samples from docstring examples
```
**Prevent schema drift**:
```python
import json

from fastapi.testclient import TestClient
from app.main import app

def test_openapi_schema_matches_committed():
    """Ensure OpenAPI spec is committed and up-to-date"""
    client = TestClient(app)
    # Get current OpenAPI spec
    current_spec = client.get("/openapi.json").json()
    # Load committed spec
    with open("docs/openapi.json") as f:
        committed_spec = json.load(f)
    # Fail if specs don't match
    assert current_spec == committed_spec, \
        "OpenAPI spec has changed. Run 'make update-openapi-spec' and commit"

def test_all_endpoints_have_examples():
    """Ensure all endpoints have request/response examples"""
    client = TestClient(app)
    spec = client.get("/openapi.json").json()
    for path, methods in spec["paths"].items():
        for method, details in methods.items():
            if method in ["get", "post", "put", "patch", "delete"]:
                # Check request body has example
                if "requestBody" in details:
                    assert "examples" in details["requestBody"]["content"]["application/json"], \
                        f"{method.upper()} {path} missing request examples"
                # Check responses have examples
                for status_code, response in details.get("responses", {}).items():
                    if "content" in response and "application/json" in response["content"]:
                        assert "examples" in response["content"]["application/json"] or \
                            "example" in response["content"]["application/json"]["schema"], \
                            f"{method.upper()} {path} response {status_code} missing examples"
```
### Documentation Pre-Commit Hook
```bash
#!/bin/bash
# .git/hooks/pre-commit (shebang must be the first line)
# Regenerate OpenAPI spec
python -c "
from app.main import app
import json
with open('docs/openapi.json', 'w') as f:
    json.dump(app.openapi(), f, indent=2)
"
# Stage the regenerated spec
git add docs/openapi.json
# Validate spec
npm run validate:openapi
# Run doc tests
pytest tests/test_documentation.py
```
## Documentation Testing
### Ensure Examples Actually Work
**Problem**: Examples in docs become stale, don't work
**Solution**: Test every code example automatically
```python
# Extract examples from OpenAPI spec
import pytest
from app.main import app

def get_all_examples_from_openapi():
    """Extract all examples from OpenAPI spec"""
    spec = app.openapi()
    examples = []
    for path, methods in spec["paths"].items():
        for method, details in methods.items():
            if "examples" in details.get("requestBody", {}).get("content", {}).get("application/json", {}):
                for example_name, example_data in details["requestBody"]["content"]["application/json"]["examples"].items():
                    examples.append({
                        "path": path,
                        "method": method,
                        "example_name": example_name,
                        "data": example_data["value"]
                    })
    return examples

@pytest.mark.parametrize("example", get_all_examples_from_openapi(), ids=lambda e: f"{e['method']}_{e['path']}_{e['example_name']}")
def test_openapi_examples_are_valid(example, client):
    """Test that all OpenAPI examples are valid requests"""
    method = example["method"]
    path = example["path"]
    data = example["data"]
    response = client.request(method, path, json=data)
    # Examples should either succeed or fail with expected error
    assert response.status_code in [200, 201, 400, 401, 402, 403, 404], \
        f"Example {example['example_name']} for {method.upper()} {path} returned unexpected status {response.status_code}"
```
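The parametrized test above assumes a `client` pytest fixture. A minimal `conftest.py` sketch that could back it, assuming the application module is `app.main` as in the other examples:

```python
# tests/conftest.py - minimal sketch of the `client` fixture assumed above
import pytest
from fastapi.testclient import TestClient

from app.main import app  # application import path assumed from earlier examples

@pytest.fixture
def client():
    """Test client wired to the application under test."""
    return TestClient(app)
```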
**Test markdown code samples**:
```python
import re
import subprocess
import tempfile

def extract_code_blocks_from_markdown(markdown_file):
    """Extract code blocks from markdown"""
    with open(markdown_file) as f:
        content = f.read()
    # Find code blocks with language
    pattern = r'```(\w+)\n(.*?)```'
    return re.findall(pattern, content, re.DOTALL)

def test_python_examples_in_quickstart():
    """Test that Python examples in quickstart.md execute without errors"""
    code_blocks = extract_code_blocks_from_markdown("docs/quickstart.md")
    for lang, code in code_blocks:
        if lang == "python":
            # Write code to temp file
            with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
                # Replace placeholders
                code = code.replace("sk_test_abc123...", "test_api_key")
                code = code.replace("https://api.example.com", "http://localhost:8000")
                f.write(code)
                f.flush()
                # Run code
                result = subprocess.run(
                    ["python", f.name],
                    capture_output=True,
                    text=True,
                    timeout=5
                )
            assert result.returncode == 0, \
                f"Python example failed:\n{code}\n\nError:\n{result.stderr}"
```
### Documentation Coverage Metrics
```python
from fastapi.openapi.utils import get_openapi
from app.main import app

def test_documentation_coverage():
    """Ensure all endpoints are documented"""
    spec = get_openapi(title="Test", version="1.0.0", routes=app.routes)
    missing_docs = []
    for path, methods in spec["paths"].items():
        for method, details in methods.items():
            # Check summary
            if not details.get("summary"):
                missing_docs.append(f"{method.upper()} {path}: Missing summary")
            # Check description
            if not details.get("description"):
                missing_docs.append(f"{method.upper()} {path}: Missing description")
            # Check examples
            if "requestBody" in details:
                content = details["requestBody"].get("content", {}).get("application/json", {})
                if "examples" not in content and "example" not in content.get("schema", {}):
                    missing_docs.append(f"{method.upper()} {path}: Missing request example")
    assert not missing_docs, \
        "Documentation incomplete:\n" + "\n".join(missing_docs)
```
## Interactive Documentation
### Swagger UI Customization
**Custom Swagger UI with branding**:
```python
from fastapi import FastAPI
from fastapi.openapi.docs import get_swagger_ui_html
from fastapi.staticfiles import StaticFiles
app = FastAPI(docs_url=None)  # Disable default docs
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/docs", include_in_schema=False)
async def custom_swagger_ui_html():
    return get_swagger_ui_html(
        openapi_url=app.openapi_url,
        title=f"{app.title} - API Documentation",
        oauth2_redirect_url=app.swagger_ui_oauth2_redirect_url,
        swagger_js_url="/static/swagger-ui-bundle.js",
        swagger_css_url="/static/swagger-ui.css",
        swagger_favicon_url="/static/favicon.png",
        swagger_ui_parameters={
            "deepLinking": True,
            "displayRequestDuration": True,
            "filter": True,
            "showExtensions": True,
            "tryItOutEnabled": True,
            "persistAuthorization": True,
            "defaultModelsExpandDepth": 1,
            "defaultModelExpandDepth": 1
        }
    )
```
**Add "Try It Out" authentication**:
```python
from fastapi.openapi.docs import get_swagger_ui_html
@app.get("/docs")
async def custom_swagger_ui():
return get_swagger_ui_html(
openapi_url="/openapi.json",
title="API Docs",
init_oauth={
"clientId": "swagger-ui-client",
"appName": "API Documentation",
"usePkceWithAuthorizationCodeGrant": True
}
)
```
### ReDoc Customization
```python
from fastapi.openapi.docs import get_redoc_html
@app.get("/redoc", include_in_schema=False)
async def redoc_html():
return get_redoc_html(
openapi_url="/openapi.json",
title="API Documentation - ReDoc",
redoc_js_url="/static/redoc.standalone.js",
redoc_favicon_url="/static/favicon.png",
with_google_fonts=True
)
```
**ReDoc configuration options**:
```html
<!-- static/redoc-config.html -->
<redoc
  spec-url="/openapi.json"
  expand-responses="200,201"
  required-props-first="true"
  sort-props-alphabetically="true"
  hide-download-button="false"
  native-scrollbars="false"
  path-in-middle-panel="true"
  theme='{
    "colors": {
      "primary": {"main": "#32329f"}
    },
    "typography": {
      "fontSize": "14px",
      "fontFamily": "Roboto, sans-serif"
    }
  }'
></redoc>
```
## SDK Generation
### Generate Client SDKs from OpenAPI
**OpenAPI Generator**:
```bash
# Install openapi-generator
npm install -g @openapitools/openapi-generator-cli
# Generate Python SDK
openapi-generator-cli generate \
-i docs/openapi.json \
-g python \
-o sdks/python \
--additional-properties=packageName=payment_api,projectName=payment-api-python
# Generate TypeScript SDK
openapi-generator-cli generate \
-i docs/openapi.json \
-g typescript-fetch \
-o sdks/typescript \
--additional-properties=npmName=@example/payment-api,supportsES6=true
# Generate Go SDK
openapi-generator-cli generate \
-i docs/openapi.json \
-g go \
-o sdks/go \
--additional-properties=packageName=paymentapi
```
**Automate SDK generation in CI**:
```yaml
# .github/workflows/generate-sdks.yml
name: Generate SDKs
on:
  push:
    branches: [main]
    paths:
      - 'docs/openapi.json'
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Generate Python SDK
        run: |
          docker run --rm \
            -v ${PWD}:/local \
            openapitools/openapi-generator-cli generate \
            -i /local/docs/openapi.json \
            -g python \
            -o /local/sdks/python
      - name: Test Python SDK
        run: |
          cd sdks/python
          pip install -e .
          pytest
      - name: Publish to PyPI
        if: github.ref == 'refs/heads/main'
        run: |
          cd sdks/python
          python -m build
          twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```
**Custom SDK templates**:
```
templates/
├── python/
│ ├── api.mustache # Custom API client template
│ ├── model.mustache # Custom model template
│ └── README.mustache # Custom README
```
```bash
# Generate with custom templates
openapi-generator-cli generate \
-i docs/openapi.json \
-g python \
-o sdks/python \
-t templates/python \
--additional-properties=packageName=payment_api
```
## Documentation Versioning
### Version Documentation Separately from API
**Documentation versions**:
```
docs/
├── v1/
│ ├── quickstart.md
│ ├── api-reference.md
│ └── migration-to-v2.md ← Deprecation notice
├── v2/
│ ├── quickstart.md
│ ├── api-reference.md
│ └── whats-new.md
└── latest -> v2/ # Symlink to current version
```
**Documentation routing**:
```python
from fastapi import HTTPException
from fastapi.responses import HTMLResponse, RedirectResponse
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("docs"))

@app.get("/docs")
async def docs_redirect():
    """Redirect to latest docs"""
    return RedirectResponse(url="/docs/v2/")

@app.get("/docs/{version}/{page}")
async def serve_docs(version: str, page: str):
    """Serve versioned documentation"""
    if version not in ["v1", "v2"]:
        raise HTTPException(404)
    # Add deprecation warning for v1
    deprecated = version == "v1"
    template = env.get_template(f"{version}/{page}.md")
    content = template.render(deprecated=deprecated)
    return HTMLResponse(content)
```
**Deprecation banner**:
```html
<!-- docs/templates/base.html -->
{% if deprecated %}
<div class="deprecation-banner">
  ⚠️ <strong>Deprecated</strong>: This documentation is for API v1,
  which will be sunset on June 1, 2025.
  <a href="/docs/v2/migration">Migrate to v2</a>
</div>
{% endif %}
```
## Documentation Debt Detection
### Prevent Stale Documentation
**Detect outdated docs**:
```python
import os
from datetime import datetime, timedelta

def test_documentation_freshness():
    """Ensure docs have been updated recently"""
    docs_modified = datetime.fromtimestamp(
        os.path.getmtime("docs/api-reference.md")
    )
    # Fail if docs haven't been updated in 90 days
    max_age = timedelta(days=90)
    age = datetime.now() - docs_modified
    assert age < max_age, \
        f"API docs are {age.days} days old. Review and update or add exemption comment."
```
**Track documentation TODOs**:
```python
import glob
import re

def test_no_documentation_todos():
    """Ensure no TODO comments in docs"""
    doc_files = glob.glob("docs/**/*.md", recursive=True)
    todos = []
    for doc_file in doc_files:
        with open(doc_file) as f:
            for line_num, line in enumerate(f, 1):
                if re.search(r'TODO|FIXME|XXX', line):
                    todos.append(f"{doc_file}:{line_num}: {line.strip()}")
    assert not todos, \
        f"Documentation has {len(todos)} TODOs:\n" + "\n".join(todos)
```
**Broken link detection**:
```python
import glob
import re
import requests

def extract_links_from_markdown(markdown_file):
    """Extract all HTTP(S) links from markdown"""
    with open(markdown_file) as f:
        content = f.read()
    # Find markdown links [text](url)
    links = re.findall(r'\[([^\]]+)\]\(([^)]+)\)', content)
    return [(text, url) for text, url in links if url.startswith('http')]

def test_no_broken_links_in_docs():
    """Ensure all external links in docs are valid"""
    doc_files = glob.glob("docs/**/*.md", recursive=True)
    broken_links = []
    for doc_file in doc_files:
        for text, url in extract_links_from_markdown(doc_file):
            try:
                response = requests.head(url, timeout=5, allow_redirects=True)
                if response.status_code >= 400:
                    broken_links.append(f"{doc_file}: {url} ({response.status_code})")
            except requests.RequestException as e:
                broken_links.append(f"{doc_file}: {url} (error: {e})")
    assert not broken_links, \
        f"Found {len(broken_links)} broken links:\n" + "\n".join(broken_links)
```
## Documentation Metrics
### Track Documentation Usage
**Analytics integration**:
```python
from fastapi import Request
import analytics  # e.g. Segment's analytics-python client

@app.middleware("http")
async def track_doc_views(request: Request, call_next):
    if request.url.path.startswith("/docs"):
        # Track page view
        path_parts = request.url.path.split("/")
        analytics.track(
            user_id="anonymous",
            event="Documentation Viewed",
            properties={
                "page": request.url.path,
                "version": path_parts[2] if len(path_parts) > 2 else "latest",
                "referrer": request.headers.get("referer")
            }
        )
    return await call_next(request)
```
**Track "Try It Out" usage**:
```javascript
// Inject into Swagger UI
const originalExecute = swagger.presets.apis.execute;
swagger.presets.apis.execute = function(spec) {
  // Track API call from docs
  analytics.track('API Call from Docs', {
    endpoint: spec.path,
    method: spec.method,
    success: spec.response.status < 400
  });
  return originalExecute(spec);
};
```
**Documentation health dashboard**:
```python
import glob
import os
from datetime import datetime, timedelta

from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session  # assumed session type for the get_db dependency

import analytics  # assumed analytics client, as in the middleware above
from app.db import get_db  # assumed project dependency, not shown

router = APIRouter()

def get_file_age(path):
    """File age in days (helper the metrics below rely on)"""
    return (datetime.now() - datetime.fromtimestamp(os.path.getmtime(path))).days

@router.get("/admin/docs-metrics")
async def get_doc_metrics(db: Session = Depends(get_db)):
    """Dashboard for documentation health"""
    # Page views by version
    views_by_version = analytics.query(
        "Documentation Viewed",
        group_by="version",
        since=datetime.now() - timedelta(days=30)
    )
    # Most viewed pages
    top_pages = analytics.query(
        "Documentation Viewed",
        group_by="page",
        since=datetime.now() - timedelta(days=30),
        limit=10
    )
    # Try it out usage
    api_calls = analytics.query(
        "API Call from Docs",
        since=datetime.now() - timedelta(days=30)
    )
    # Documentation freshness
    freshness = {
        "quickstart.md": get_file_age("docs/quickstart.md"),
        "api-reference.md": get_file_age("docs/api-reference.md")
    }
    return {
        "views_by_version": views_by_version,
        "top_pages": top_pages,
        "api_calls_from_docs": api_calls,
        "freshness": freshness,
        "health_score": calculate_doc_health_score()
    }

def calculate_doc_health_score():
    """Calculate documentation health (0-100)"""
    score = 100
    # Deduct for stale docs (>90 days old)
    for doc_file in glob.glob("docs/**/*.md", recursive=True):
        if get_file_age(doc_file) > 90:
            score -= 10
    # Deduct for broken links (count_broken_links: project helper, not shown)
    broken_links = count_broken_links()
    score -= min(broken_links * 5, 30)
    # Deduct for missing examples (count_endpoints_without_examples: project helper, not shown)
    endpoints_without_examples = count_endpoints_without_examples()
    score -= min(endpoints_without_examples * 3, 20)
    return max(score, 0)
```
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **Docs in separate repo** | Always out of sync | Co-locate with code |
| **Manual example updates** | Examples become stale | Test examples in CI |
| **No deprecation notices** | Breaking changes surprise users | Document deprecation 6+ months ahead |
| **Generic descriptions** | Doesn't help developers | Specific use cases, edge cases |
| **No versioned docs** | Can't reference old versions | Version docs separately |
| **Untested SDKs** | Generated SDKs don't work | Test generated SDKs in CI |
| **No documentation metrics** | Can't measure effectiveness | Track page views, usage |
| **Single example per endpoint** | Doesn't show edge cases | Multiple examples (success, errors) |
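On the deprecation row: notices belong in responses as well as in prose. A minimal sketch of middleware that advertises deprecation on old-version routes, assuming v1 is mounted under `/v1/` (the `Sunset` header is RFC 8594; `Deprecation` is an IETF draft):

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def deprecation_headers(request: Request, call_next):
    """Advertise deprecation on every v1 response, not just in the docs."""
    response = await call_next(request)
    if request.url.path.startswith("/v1/"):
        response.headers["Deprecation"] = "true"
        response.headers["Sunset"] = "Sun, 01 Jun 2025 00:00:00 GMT"  # date from the banner example
        response.headers["Link"] = '</docs/v2/migration>; rel="sunset"'
    return response
```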
## Cross-References
**Related skills**:
- **Technical writing** → `muna-technical-writer` (writing style, organization)
- **API design** → `rest-api-design`, `graphql-api-design` (design patterns)
- **API testing** → `api-testing` (contract testing, examples)
- **Authentication** → `api-authentication` (auth flow documentation)
## Further Reading
- **OpenAPI Specification**: https://spec.openapis.org/oas/v3.1.0
- **FastAPI docs**: https://fastapi.tiangolo.com/tutorial/metadata/
- **Swagger UI**: https://swagger.io/docs/open-source-tools/swagger-ui/
- **ReDoc**: https://redoc.ly/docs/
- **Write the Docs**: https://www.writethedocs.org/

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,890 @@
# Django Development
## Overview
**Django development specialist covering Django ORM optimization, DRF best practices, caching strategies, migrations, testing, and production deployment.**
**Core principle**: Django's "batteries included" philosophy is powerful but requires understanding which battery to use when - master Django's tools to avoid reinventing wheels or choosing wrong patterns.
## When to Use This Skill
Use when encountering:
- **ORM optimization**: N+1 queries, select_related vs prefetch_related, query performance
- **DRF patterns**: Serializers, ViewSets, permissions, nested relationships
- **Caching**: Cache framework, per-view caching, template fragment caching
- **Migrations**: Zero-downtime migrations, data migrations, squashing
- **Testing**: Django TestCase, fixtures, factories, mocking
- **Deployment**: Gunicorn, static files, database pooling
- **Async Django**: Channels, async views, WebSockets
- **Admin customization**: Custom admin actions, list filters, inlines
**Do NOT use for**:
- General Python patterns (use `axiom-python-engineering`)
- API design principles (use `rest-api-design`)
- Database-agnostic patterns (use `database-integration`)
- Authentication flows (use `api-authentication`)
## Django ORM Optimization
### select_related vs prefetch_related
**Decision matrix**:
| Relationship | Method | SQL Strategy | Use When |
|--------------|--------|--------------|----------|
| ForeignKey (many-to-one) | `select_related` | JOIN | Book → Author |
| OneToOneField | `select_related` | JOIN | User → Profile |
| Reverse ForeignKey (one-to-many) | `prefetch_related` | Separate query + IN | Author → Books |
| ManyToManyField | `prefetch_related` | Separate query + IN | Book → Tags |
**Example - select_related (JOIN)**:
```python
# BAD: N+1 queries (1 + N)
books = Book.objects.all()
for book in books:
    print(book.author.name)  # Additional query per book

# GOOD: Single JOIN query
books = Book.objects.select_related('author').all()
for book in books:
    print(book.author.name)  # No additional queries

# SQL generated:
# SELECT book.*, author.* FROM book JOIN author ON book.author_id = author.id
```
**Example - prefetch_related (IN query)**:
```python
# BAD: N+1 queries
authors = Author.objects.all()
for author in authors:
    print(author.books.count())  # Query per author

# GOOD: 2 queries total
authors = Author.objects.prefetch_related('books').all()
for author in authors:
    # len() uses the prefetched cache; .count() would still hit the DB
    print(len(author.books.all()))

# SQL generated:
# Query 1: SELECT * FROM author
# Query 2: SELECT * FROM book WHERE author_id IN (1, 2, 3, ...)
```
**Nested prefetching**:
```python
from datetime import timedelta

from django.db.models import Prefetch
from django.utils import timezone

# Fetch authors → books → reviews (3 queries)
authors = Author.objects.prefetch_related(
    Prefetch('books', queryset=Book.objects.prefetch_related('reviews'))
)

# Custom filtering on prefetch
recent_books = Book.objects.filter(
    published_date__gte=timezone.now() - timedelta(days=30)
).order_by('-published_date')

authors = Author.objects.prefetch_related(
    Prefetch('books', queryset=recent_books, to_attr='recent_books')
)

# Access via custom attribute
for author in authors:
    for book in author.recent_books:  # Only recent books
        print(book.title)
```
### Query Debugging
```python
from django.db import connection, reset_queries

# Requires settings.py: DEBUG = True
# Or use django-debug-toolbar

def debug_queries(func):
    """Decorator to debug query counts"""
    def wrapper(*args, **kwargs):
        reset_queries()
        result = func(*args, **kwargs)
        print(f"Queries: {len(connection.queries)}")
        for query in connection.queries:
            print(f"  {query['time']}s: {query['sql'][:100]}")
        return result
    return wrapper

@debug_queries
def get_books():
    return list(Book.objects.select_related('author').prefetch_related('tags'))
```
**Django Debug Toolbar** (production alternative - django-silk):
```python
# settings.py
INSTALLED_APPS = [
    'debug_toolbar',
    # ...
]

MIDDLEWARE = [
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    # ...
]

INTERNAL_IPS = ['127.0.0.1']

# For production: use django-silk for profiling
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']
```
### Annotation and Aggregation
**Annotate** (add computed fields):
```python
from django.db.models import Avg, Count, Sum

# Add book count to each author
authors = Author.objects.annotate(
    book_count=Count('books'),
    avg_rating=Avg('books__rating'),
    total_sales=Sum('books__sales')
)
for author in authors:
    print(f"{author.name}: {author.book_count} books, avg rating {author.avg_rating}")
```
**Aggregate** (single value across queryset):
```python
from django.db.models import Avg, Count, Sum

# Get average rating across all books
avg_rating = Book.objects.aggregate(Avg('rating'))
# Returns: {'rating__avg': 4.2}

# Multiple aggregations
stats = Book.objects.aggregate(
    avg_rating=Avg('rating'),
    total_sales=Sum('sales'),
    book_count=Count('id')
)
```
**Conditional aggregation with Q**:
```python
from django.db.models import Count, Q

# Count books by rating category
Author.objects.annotate(
    high_rated_books=Count('books', filter=Q(books__rating__gte=4.0)),
    low_rated_books=Count('books', filter=Q(books__rating__lt=3.0))
)
```
## Django REST Framework Patterns
### ViewSet vs APIView
**Decision matrix**:
| Use | Pattern | When |
|-----|---------|------|
| Standard CRUD | `ModelViewSet` | Full REST API for model |
| Custom actions only | `ViewSet` | Non-standard endpoints |
| Read-only API | `ReadOnlyModelViewSet` | GET/LIST only |
| Fine control | `APIView` or `@api_view` | Custom business logic |
**ModelViewSet** (full CRUD):
```python
from django.utils import timezone
from rest_framework import viewsets, filters
from rest_framework.decorators import action
from rest_framework.permissions import IsAuthenticatedOrReadOnly
from rest_framework.response import Response

class BookViewSet(viewsets.ModelViewSet):
    """
    Provides: list, create, retrieve, update, partial_update, destroy
    """
    queryset = Book.objects.select_related('author').prefetch_related('tags')
    serializer_class = BookSerializer
    permission_classes = [IsAuthenticatedOrReadOnly]
    filter_backends = [filters.SearchFilter, filters.OrderingFilter]
    search_fields = ['title', 'author__name']
    ordering_fields = ['published_date', 'rating']

    def get_queryset(self):
        """Optimize queryset based on action"""
        queryset = super().get_queryset()
        if self.action == 'list':
            # List doesn't need full detail
            return queryset.only('id', 'title', 'author__name')
        return queryset

    @action(detail=True, methods=['post'])
    def publish(self, request, pk=None):
        """Custom action: POST /books/123/publish/"""
        book = self.get_object()
        book.status = 'published'
        book.published_date = timezone.now()
        book.save()
        return Response({'status': 'published'})

    @action(detail=False, methods=['get'])
    def bestsellers(self, request):
        """Custom list action: GET /books/bestsellers/"""
        books = self.get_queryset().filter(sales__gte=10000).order_by('-sales')[:10]
        serializer = self.get_serializer(books, many=True)
        return Response(serializer.data)
```
### Serializer Patterns
**Basic serializer with validation**:
```python
from django.contrib.auth.models import User
from django.contrib.auth.password_validation import validate_password
from rest_framework import serializers

class UserSerializer(serializers.ModelSerializer):
    password = serializers.CharField(
        write_only=True,
        required=True,
        validators=[validate_password]
    )
    password_confirm = serializers.CharField(write_only=True, required=True)

    class Meta:
        model = User
        fields = ['id', 'username', 'email', 'password', 'password_confirm']
        read_only_fields = ['id']

    # Field-level validation
    def validate_email(self, value):
        if User.objects.filter(email__iexact=value).exists():
            raise serializers.ValidationError("Email already in use")
        return value.lower()

    # Object-level validation (cross-field)
    def validate(self, attrs):
        if attrs['password'] != attrs['password_confirm']:
            raise serializers.ValidationError({
                'password_confirm': "Passwords don't match"
            })
        attrs.pop('password_confirm')
        return attrs

    def create(self, validated_data):
        password = validated_data.pop('password')
        user = User.objects.create(**validated_data)
        user.set_password(password)
        user.save()
        return user
```
**Nested serializers (read-only)**:
```python
class AuthorSerializer(serializers.ModelSerializer):
    book_count = serializers.IntegerField(read_only=True)

    class Meta:
        model = Author
        fields = ['id', 'name', 'bio', 'book_count']

class BookSerializer(serializers.ModelSerializer):
    author = AuthorSerializer(read_only=True)
    author_id = serializers.PrimaryKeyRelatedField(
        queryset=Author.objects.all(),
        source='author',
        write_only=True
    )

    class Meta:
        model = Book
        fields = ['id', 'title', 'author', 'author_id', 'published_date']
```
**Dynamic fields** (include/exclude fields via query params):
```python
class DynamicFieldsModelSerializer(serializers.ModelSerializer):
    """
    Usage: /api/books/?fields=id,title,author
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        request = self.context.get('request')
        if request:
            fields = request.query_params.get('fields')
            if fields:
                allowed = set(fields.split(','))
                existing = set(self.fields.keys())
                for field_name in existing - allowed:
                    self.fields.pop(field_name)

class BookSerializer(DynamicFieldsModelSerializer):
    class Meta:
        model = Book
        fields = '__all__'
```
## Django Caching
### Cache Framework Setup
```python
# settings.py

# Redis cache (production)
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/1',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
            'CONNECTION_POOL_KWARGS': {'max_connections': 50},
            'PARSER_CLASS': 'redis.connection.HiredisParser',
        },
        'KEY_PREFIX': 'myapp',
        'TIMEOUT': 300,  # Default 5 minutes
    }
}

# Memcached (alternative)
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

# Local memory (development only)
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake',
    }
}
```
### Per-View Caching
```python
from django.shortcuts import render
from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_page
from django.views.generic import ListView

# Function-based view
@cache_page(60 * 15)  # Cache for 15 minutes
def book_list(request):
    books = Book.objects.all()
    return render(request, 'books/list.html', {'books': books})

# Class-based view
class BookListView(ListView):
    model = Book

    @method_decorator(cache_page(60 * 15))
    def dispatch(self, *args, **kwargs):
        return super().dispatch(*args, **kwargs)

# DRF ViewSet (drf-extensions)
from rest_framework_extensions.cache.decorators import cache_response

class BookViewSet(viewsets.ModelViewSet):
    @cache_response(timeout=60*15, key_func='calculate_cache_key')
    def list(self, request, *args, **kwargs):
        return super().list(request, *args, **kwargs)

    def calculate_cache_key(self, view_instance, view_method, request, args, kwargs):
        # Custom cache key including user, filters
        return f"books:list:{request.user.id}:{request.GET.urlencode()}"
```
### Low-Level Cache API
```python
from django.core.cache import cache

# Set cache
cache.set('my_key', 'my_value', timeout=300)

# Get cache
value = cache.get('my_key')
if value is None:
    value = expensive_computation()
    cache.set('my_key', value, timeout=300)

# Get or set (atomic)
value = cache.get_or_set('my_key', lambda: expensive_computation(), timeout=300)

# Delete cache
cache.delete('my_key')

# Clear all
cache.clear()

# Multiple keys
cache.set_many({'key1': 'value1', 'key2': 'value2'}, timeout=300)
values = cache.get_many(['key1', 'key2'])

# Increment/decrement
cache.set('counter', 0)
cache.incr('counter')  # 1
cache.incr('counter', delta=5)  # 6
```
### Cache Invalidation Patterns
```python
from django.core.cache import cache
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

@receiver([post_save, post_delete], sender=Book)
def invalidate_book_cache(sender, instance, **kwargs):
    """Invalidate cache when book changes"""
    cache.delete(f'book:{instance.id}')
    cache.delete('books:list')  # Invalidate list cache
    cache.delete(f'author:{instance.author_id}:books')

# Pattern: cache with version tags
def get_books():
    version = cache.get('books:version', 0)
    cache_key = f'books:list:v{version}'
    books = cache.get(cache_key)
    if books is None:
        books = list(Book.objects.all())
        cache.set(cache_key, books, timeout=3600)
    return books

def invalidate_books():
    """Bump version to invalidate all book caches"""
    version = cache.get('books:version', 0)
    cache.set('books:version', version + 1)
```
## Django Migrations
### Zero-Downtime Migration Pattern
**Adding NOT NULL column to large table**:
```python
# Step 1: Add nullable field (migration 0002)
class Migration(migrations.Migration):
    operations = [
        migrations.AddField(
            model_name='user',
            name='department',
            field=models.CharField(max_length=100, null=True, blank=True),
        ),
    ]

# Step 2: Populate data in batches (migration 0003)
from django.db import migrations

def populate_department(apps, schema_editor):
    User = apps.get_model('myapp', 'User')
    # Batch update for performance. Always re-query from the start:
    # offset-based slicing would skip rows, because updated rows drop
    # out of the department__isnull=True filter as we go.
    batch_size = 10000
    while True:
        users = list(User.objects.filter(department__isnull=True)[:batch_size])
        if not users:
            break
        for user in users:
            user.department = determine_department(user)  # Your logic
        User.objects.bulk_update(users, ['department'], batch_size=batch_size)

class Migration(migrations.Migration):
    dependencies = [('myapp', '0002_add_department')]
    operations = [
        migrations.RunPython(populate_department, migrations.RunPython.noop),
    ]

# Step 3: Make NOT NULL (migration 0004)
class Migration(migrations.Migration):
    dependencies = [('myapp', '0003_populate_department')]
    operations = [
        migrations.AlterField(
            model_name='user',
            name='department',
            field=models.CharField(max_length=100),  # NOT NULL
        ),
    ]
```
### Concurrent Index Creation (PostgreSQL)
```python
from django.contrib.postgres.operations import AddIndexConcurrently
from django.db import migrations, models

class Migration(migrations.Migration):
    atomic = False  # Required for CONCURRENTLY operations

    operations = [
        AddIndexConcurrently(
            model_name='book',
            index=models.Index(fields=['published_date'], name='book_published_idx'),
        ),
    ]
```
### Squashing Migrations
```bash
# Squash migrations 0001 through 0020 into single migration
python manage.py squashmigrations myapp 0001 0020
# This creates migrations/0001_squashed_0020.py
# After deploying squashed migration, delete originals:
# migrations/0001.py through migrations/0020.py
```
## Django Testing
### TestCase vs TransactionTestCase
| Feature | TestCase | TransactionTestCase |
|---------|----------|---------------------|
| Speed | Fast (no DB reset between tests) | Slow (resets DB each test) |
| Transactions | Wrapped in transaction, rolled back | No automatic transaction |
| Use for | Most tests | Testing transaction behavior, signals |
**Example - TestCase**:
```python
from django.test import TestCase
from myapp.models import Author, Book

class BookModelTest(TestCase):
    @classmethod
    def setUpTestData(cls):
        """Run once for entire test class (fast)"""
        cls.author = Author.objects.create(name="Test Author")

    def setUp(self):
        """Run before each test method"""
        self.book = Book.objects.create(
            title="Test Book",
            author=self.author
        )

    def test_book_str(self):
        self.assertEqual(str(self.book), "Test Book")

    def test_book_author_relationship(self):
        self.assertEqual(self.book.author.name, "Test Author")
```
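And a minimal sketch for the `TransactionTestCase` column of the table above: `transaction.on_commit` callbacks never fire under plain `TestCase`, because its wrapping transaction is rolled back rather than committed (models reused from the example above):

```python
from django.db import transaction
from django.test import TransactionTestCase
from myapp.models import Author

class OnCommitTest(TransactionTestCase):
    def test_on_commit_callback_fires(self):
        calls = []
        with transaction.atomic():
            Author.objects.create(name="Test Author")
            # Fires only on a real COMMIT, which TransactionTestCase allows
            transaction.on_commit(lambda: calls.append("committed"))
        self.assertEqual(calls, ["committed"])
```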
### API Testing with DRF
```python
from django.contrib.auth.models import User
from rest_framework import status
from rest_framework.test import APIClient, APITestCase

class BookAPITest(APITestCase):
    def setUp(self):
        self.client = APIClient()
        self.user = User.objects.create_user(
            username='testuser',
            password='testpass123'
        )
        # Needed below: test_create_book_authenticated posts author.id
        self.author = Author.objects.create(name="Test Author")
        self.book = Book.objects.create(title="Test Book", author=self.author)

    def test_list_books_unauthenticated(self):
        response = self.client.get('/api/books/')
        self.assertEqual(response.status_code, status.HTTP_200_OK)

    def test_create_book_authenticated(self):
        self.client.force_authenticate(user=self.user)
        data = {'title': 'New Book', 'author': self.author.id}
        response = self.client.post('/api/books/', data)
        self.assertEqual(response.status_code, status.HTTP_201_CREATED)
        self.assertEqual(Book.objects.count(), 2)

    def test_update_book_unauthorized(self):
        other_user = User.objects.create_user(username='other', password='pass')
        self.client.force_authenticate(user=other_user)
        data = {'title': 'Updated Title'}
        response = self.client.patch(f'/api/books/{self.book.id}/', data)
        self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
```
### Factory Pattern with factory_boy
```python
# tests/factories.py
import factory
from myapp.models import Author, Book

class AuthorFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Author

    name = factory.Faker('name')
    bio = factory.Faker('text', max_nb_chars=200)

class BookFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Book

    title = factory.Faker('sentence', nb_words=4)
    author = factory.SubFactory(AuthorFactory)
    published_date = factory.Faker('date_this_decade')
    isbn = factory.Sequence(lambda n: f'978-0-{n:09d}')

# Usage in tests
class BookTest(TestCase):
    def test_book_creation(self):
        book = BookFactory.create()  # Creates Author too
        self.assertIsNotNone(book.id)

    def test_multiple_books(self):
        books = BookFactory.create_batch(10)  # Create 10 books
        self.assertEqual(len(books), 10)

    def test_author_with_books(self):
        author = AuthorFactory.create()
        BookFactory.create_batch(5, author=author)
        self.assertEqual(author.books.count(), 5)
```
## Django Settings Organization
### Multiple Environment Configs
```
myproject/
└── settings/
    ├── __init__.py
    ├── base.py          # Common settings
    ├── development.py   # Dev overrides
    ├── production.py    # Prod overrides
    └── test.py          # Test overrides
```
**settings/base.py**:
```python
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent.parent

SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY')

INSTALLED_APPS = [
    'django.contrib.admin',
    # ...
    'rest_framework',
    'myapp',
]

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ.get('DB_NAME'),
        'USER': os.environ.get('DB_USER'),
        'PASSWORD': os.environ.get('DB_PASSWORD'),
        'HOST': os.environ.get('DB_HOST', 'localhost'),
        'PORT': os.environ.get('DB_PORT', '5432'),
    }
}
```
**settings/development.py**:
```python
from .base import *

DEBUG = True
ALLOWED_HOSTS = ['localhost', '127.0.0.1']

# Use console email backend
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'

# Local cache
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    }
}

# Debug toolbar
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
INTERNAL_IPS = ['127.0.0.1']
```
**settings/production.py**:
```python
from .base import *

DEBUG = False
ALLOWED_HOSTS = [os.environ.get('ALLOWED_HOST')]

# Security settings
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_HSTS_SECONDS = 31536000
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_HSTS_PRELOAD = True

# Redis cache
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': os.environ.get('REDIS_URL'),
    }
}

# Real email
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = os.environ.get('EMAIL_HOST')
EMAIL_PORT = int(os.environ.get('EMAIL_PORT', 587))
EMAIL_USE_TLS = True
```
**Usage**:
```bash
# Development
export DJANGO_SETTINGS_MODULE=myproject.settings.development
python manage.py runserver
# Production
export DJANGO_SETTINGS_MODULE=myproject.settings.production
gunicorn myproject.wsgi:application
```
## Django Deployment
### Gunicorn Configuration
```python
# gunicorn_config.py
import multiprocessing
bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync" # or "gevent" for async
worker_connections = 1000
max_requests = 1000 # Restart workers after N requests (prevent memory leaks)
max_requests_jitter = 100
timeout = 30
keepalive = 2
# Logging
accesslog = "-" # stdout
errorlog = "-" # stderr
loglevel = "info"
# Process naming
proc_name = "myproject"
# Server mechanics
daemon = False
pidfile = "/var/run/gunicorn.pid"
```
**Systemd service**:
```ini
# /etc/systemd/system/myproject.service
[Unit]
Description=MyProject Django Application
After=network.target
[Service]
Type=notify
User=www-data
Group=www-data
WorkingDirectory=/var/www/myproject
Environment="DJANGO_SETTINGS_MODULE=myproject.settings.production"
ExecStart=/var/www/myproject/venv/bin/gunicorn \
--config /var/www/myproject/gunicorn_config.py \
myproject.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
[Install]
WantedBy=multi-user.target
```
### Static and Media Files
```python
# settings/production.py
STATIC_URL = '/static/'
STATIC_ROOT = BASE_DIR / 'staticfiles'

MEDIA_URL = '/media/'
MEDIA_ROOT = BASE_DIR / 'media'

# Use WhiteNoise for static files
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'whitenoise.middleware.WhiteNoiseMiddleware',  # After SecurityMiddleware
    # ...
]
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
```
**Collect static files**:
```bash
python manage.py collectstatic --noinput
```
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **Lazy loading in loops** | N+1 queries | Use `select_related`/`prefetch_related` |
| **No database indexing** | Slow queries | Add `db_index=True` or Meta indexes |
| **Signals for async work** | Blocks requests | Use Celery tasks instead |
| **Generic serializers for everything** | Over-fetching data | Create optimized serializers per use case |
| **No caching** | Repeated expensive queries | Cache querysets, views, template fragments |
| **Migrations in production without testing** | Downtime, data loss | Test on production-sized datasets first |
| **DEBUG=True in production** | Security risk, slow | Always DEBUG=False in production |
| **No connection pooling** | Exhausts DB connections | Use pgBouncer or django-db-geventpool |
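For the signals row above, the fix looks roughly like this: the signal handler only enqueues, and the slow work runs in a Celery worker. A minimal sketch with illustrative names, assuming a configured Celery app; `transaction.on_commit` keeps the worker from racing an uncommitted row:

```python
# tasks.py -- illustrative names; assumes a configured Celery app
from celery import shared_task

@shared_task
def send_welcome_email(user_id):
    """Slow work (email, external API calls) runs in a worker, not the request."""
    ...

# signals.py
from django.contrib.auth.models import User
from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=User)
def on_user_created(sender, instance, created, **kwargs):
    if created:
        # Enqueue only after the transaction commits, so the worker sees the row
        transaction.on_commit(lambda: send_welcome_email.delay(instance.id))
```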
## Cross-References
**Related skills**:
- **Database optimization** → `database-integration` (connection pooling, migrations)
- **API testing** → `api-testing` (DRF testing patterns)
- **Authentication** → `api-authentication` (DRF token auth, JWT)
- **REST API design** → `rest-api-design` (API patterns)
## Further Reading
- **Django docs**: https://docs.djangoproject.com/
- **DRF docs**: https://www.django-rest-framework.org/
- **Two Scoops of Django**: Best practices book
- **Classy Class-Based Views**: https://ccbv.co.uk/
- **Classy Django REST Framework**: https://www.cdrf.co/

View File

@@ -0,0 +1,872 @@
# Express Development
## Overview
**Express.js development specialist covering middleware organization, error handling, validation, database integration, testing, and production deployment.**
**Core principle**: Express's minimalist philosophy requires disciplined patterns - without structure, Express apps become tangled middleware chains with inconsistent error handling and poor testability.
## When to Use This Skill
Use when encountering:
- **Middleware organization**: Ordering, async error handling, custom middleware
- **Error handling**: Centralized handlers, custom error classes, async/await errors
- **Request validation**: Zod, express-validator, type-safe validation
- **Database patterns**: Connection pooling, transactions, graceful shutdown
- **Testing**: Supertest, mocking, middleware isolation
- **Production deployment**: PM2, clustering, Docker, environment management
- **Performance**: Compression, caching, clustering
- **Security**: Helmet, rate limiting, CORS, input sanitization
**Do NOT use for**:
- General TypeScript patterns (use `axiom-python-engineering` equivalents)
- API design principles (use `rest-api-design`)
- Database-agnostic patterns (use `database-integration`)
## Middleware Organization
### Correct Middleware Order
**Order matters** - middleware executes top to bottom:
```typescript
import express from 'express';
import helmet from 'helmet';
import cors from 'cors';
import compression from 'compression';
import morgan from 'morgan';

import { logger } from './config/logger';
import { errorHandler } from './middleware/errorHandler';
// authenticationMiddleware, userRoutes, postRoutes come from your app modules

const app = express();

// 1. Security (FIRST - before any parsing)
app.use(helmet({
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      styleSrc: ["'self'", "'unsafe-inline'"],
    },
  },
}));

// 2. CORS (before routes)
app.use(cors({
  origin: process.env.ALLOWED_ORIGINS?.split(','),
  credentials: true,
  maxAge: 86400, // 24 hours
}));

// 3. Parsing
app.use(express.json({ limit: '10mb' }));
app.use(express.urlencoded({ extended: true, limit: '10mb' }));

// 4. Compression
app.use(compression());

// 5. Logging
app.use(morgan('combined', { stream: logger.stream }));

// 6. Authentication (before routes that need it)
app.use('/api', authenticationMiddleware);

// 7. Routes
app.use('/api/users', userRoutes);
app.use('/api/posts', postRoutes);

// 8. 404 handler (AFTER all routes)
app.use((req, res) => {
  res.status(404).json({
    status: 'error',
    message: 'Route not found',
    path: req.path,
  });
});

// 9. Error handler (LAST)
app.use(errorHandler);
```
### Async Error Wrapper
**Problem**: Express doesn't catch async errors automatically
```typescript
// src/middleware/asyncHandler.ts
import { Request, Response, NextFunction } from 'express';

export const asyncHandler = <T>(
  fn: (req: Request, res: Response, next: NextFunction) => Promise<T>
) => {
  return (req: Request, res: Response, next: NextFunction) => {
    Promise.resolve(fn(req, res, next)).catch(next);
  };
};

// Usage
router.get('/:id', asyncHandler(async (req, res) => {
  const user = await userService.findById(req.params.id);
  if (!user) throw new NotFoundError('User not found');
  res.json(user);
}));
```
**Alternative**: Use express-async-errors (automatic)
```typescript
// At top of app.ts (BEFORE routes)
import 'express-async-errors';

// Now all async route handlers auto-catch errors
router.get('/:id', async (req, res) => {
  const user = await userService.findById(req.params.id);
  res.json(user);
}); // Errors automatically forwarded to error handler
```
## Error Handling
### Custom Error Classes
```typescript
// src/errors/AppError.ts
export class AppError extends Error {
  constructor(
    public readonly message: string,
    public readonly statusCode: number,
    public readonly isOperational: boolean = true
  ) {
    super(message);
    Error.captureStackTrace(this, this.constructor);
  }
}

// src/errors/HttpErrors.ts
export class BadRequestError extends AppError {
  constructor(message: string) {
    super(message, 400);
  }
}

export class UnauthorizedError extends AppError {
  constructor(message = 'Unauthorized') {
    super(message, 401);
  }
}

export class ForbiddenError extends AppError {
  constructor(message = 'Forbidden') {
    super(message, 403);
  }
}

export class NotFoundError extends AppError {
  constructor(message: string) {
    super(message, 404);
  }
}

export class ConflictError extends AppError {
  constructor(message: string) {
    super(message, 409);
  }
}

export class TooManyRequestsError extends AppError {
  constructor(message = 'Too many requests', public retryAfter?: number) {
    super(message, 429);
  }
}
```
### Centralized Error Handler
```typescript
// src/middleware/errorHandler.ts
import { Request, Response, NextFunction } from 'express';
import { AppError } from '../errors/AppError';
import { TooManyRequestsError } from '../errors/HttpErrors';
import { logger } from '../config/logger';
export const errorHandler = (
err: Error,
req: Request,
res: Response,
next: NextFunction
) => {
// Log error with context
logger.error('Error occurred', {
error: {
message: err.message,
stack: err.stack,
name: err.name,
},
request: {
method: req.method,
url: req.url,
ip: req.ip,
userAgent: req.get('user-agent'),
},
});
// Operational errors (expected)
if (err instanceof AppError && err.isOperational) {
const response: any = {
status: 'error',
message: err.message,
};
// Add retry-after for rate limiting
if (err instanceof TooManyRequestsError && err.retryAfter) {
res.setHeader('Retry-After', err.retryAfter);
response.retryAfter = err.retryAfter;
}
return res.status(err.statusCode).json(response);
}
// Validation errors (Zod, express-validator)
if (err.name === 'ZodError') {
return res.status(400).json({
status: 'error',
message: 'Validation failed',
errors: (err as any).errors,
});
}
// Database constraint violations
if ((err as any).code === '23505') { // PostgreSQL unique violation
return res.status(409).json({
status: 'error',
message: 'Resource already exists',
});
}
if ((err as any).code === '23503') { // Foreign key violation
return res.status(400).json({
status: 'error',
message: 'Invalid reference',
});
}
// Unexpected errors (don't leak details in production)
res.status(500).json({
status: 'error',
message: process.env.NODE_ENV === 'production'
? 'Internal server error'
: err.message,
...(process.env.NODE_ENV !== 'production' && { stack: err.stack }),
});
};
```
### Global Error Handlers
```typescript
// src/server.ts
process.on('unhandledRejection', (reason: Error) => {
logger.error('Unhandled Rejection', { reason });
// Graceful shutdown
server.close(() => process.exit(1));
});
process.on('uncaughtException', (error: Error) => {
logger.error('Uncaught Exception', { error });
process.exit(1);
});
```
## Request Validation
### Zod Integration (Type-Safe)
```typescript
// src/schemas/userSchema.ts
import { z } from 'zod';
export const createUserSchema = z.object({
body: z.object({
email: z.string().email('Invalid email'),
password: z.string()
.min(8, 'Password must be at least 8 characters')
.regex(/[A-Z]/, 'Password must contain uppercase')
.regex(/[0-9]/, 'Password must contain number'),
name: z.string().min(2).max(100),
age: z.number().int().positive().max(150).optional(),
}),
});
export const getUserSchema = z.object({
params: z.object({
id: z.string().regex(/^\d+$/, 'ID must be numeric'),
}),
});
export const getUsersSchema = z.object({
query: z.object({
page: z.string().regex(/^\d+$/).transform(Number).default('1'),
limit: z.string().regex(/^\d+$/).transform(Number).default('10'),
search: z.string().optional(),
sortBy: z.enum(['name', 'created_at', 'updated_at']).optional(),
order: z.enum(['asc', 'desc']).optional(),
}),
});
// Type inference
export type CreateUserInput = z.infer<typeof createUserSchema>['body'];
export type GetUserParams = z.infer<typeof getUserSchema>['params'];
export type GetUsersQuery = z.infer<typeof getUsersSchema>['query'];
```
**Validation middleware**:
```typescript
// src/middleware/validate.ts
import { Request, Response, NextFunction } from 'express';
import { AnyZodObject, ZodError } from 'zod';
export const validate = (schema: AnyZodObject) => {
return async (req: Request, res: Response, next: NextFunction) => {
try {
const validated = await schema.parseAsync({
body: req.body,
query: req.query,
params: req.params,
});
// Replace with validated data (transforms applied)
req.body = validated.body || req.body;
req.query = validated.query || req.query;
req.params = validated.params || req.params;
next();
} catch (error) {
if (error instanceof ZodError) {
return res.status(400).json({
status: 'error',
message: 'Validation failed',
errors: error.errors.map(err => ({
field: err.path.join('.'),
message: err.message,
code: err.code,
})),
});
}
next(error);
}
};
};
```
**Usage in routes**:
```typescript
import { Router } from 'express';
import { validate } from '../middleware/validate';
import * as schemas from '../schemas/userSchema';
const router = Router();
router.post('/', validate(schemas.createUserSchema), async (req, res) => {
// req.body is now typed as CreateUserInput
const user = await userService.create(req.body);
res.status(201).json(user);
});
router.get('/:id', validate(schemas.getUserSchema), async (req, res) => {
// req.params.id is validated
const user = await userService.findById(req.params.id);
if (!user) throw new NotFoundError('User not found');
res.json(user);
});
```
## Database Connection Pooling
### PostgreSQL with pg
```typescript
// src/config/database.ts
import { Pool, PoolConfig } from 'pg';
import { logger } from './logger';
const config: PoolConfig = {
host: process.env.DB_HOST || 'localhost',
port: Number(process.env.DB_PORT) || 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: Number(process.env.DB_POOL_MAX) || 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
statement_timeout: 30000, // 30s query timeout
};
export const pool = new Pool(config);
// Event handlers
pool.on('connect', (client) => {
logger.debug('Database client connected');
});
pool.on('acquire', (client) => {
logger.debug('Client acquired from pool');
});
pool.on('error', (err, client) => {
logger.error('Unexpected pool error', { error: err });
process.exit(-1);
});
// Health check
export const testConnection = async () => {
try {
const client = await pool.connect();
const result = await client.query('SELECT NOW()');
client.release();
logger.info('Database connection successful', {
serverTime: result.rows[0].now,
});
} catch (err) {
logger.error('Database connection failed', { error: err });
throw err;
}
};
// Graceful shutdown
export const closePool = async () => {
logger.info('Closing database pool');
await pool.end();
logger.info('Database pool closed');
};
```
### Transaction Helper
```typescript
// src/utils/transaction.ts
import { Pool, PoolClient } from 'pg';
export async function withTransaction<T>(
pool: Pool,
callback: (client: PoolClient) => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
await client.query('BEGIN');
const result = await callback(client);
await client.query('COMMIT');
return result;
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
}
// Usage
import { pool } from '../config/database';
async function createUserWithProfile(userData, profileData) {
return withTransaction(pool, async (client) => {
const userResult = await client.query(
'INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id',
[userData.email, userData.name]
);
const userId = userResult.rows[0].id;
await client.query(
'INSERT INTO profiles (user_id, bio) VALUES ($1, $2)',
[userId, profileData.bio]
);
return userId;
});
}
```
## Testing
### Integration Tests with Supertest
```typescript
// tests/integration/userRoutes.test.ts
import request from 'supertest';
import app from '../../src/app';
import { pool } from '../../src/config/database';
describe('User Routes', () => {
beforeAll(async () => {
await pool.query('CREATE TABLE IF NOT EXISTS users (...)');
});
afterEach(async () => {
await pool.query('TRUNCATE TABLE users CASCADE');
});
afterAll(async () => {
await pool.end();
});
describe('POST /api/users', () => {
it('should create user with valid data', async () => {
const response = await request(app)
.post('/api/users')
.send({
email: 'test@example.com',
name: 'Test User',
password: 'Password123',
})
.expect(201);
expect(response.body).toHaveProperty('id');
expect(response.body.email).toBe('test@example.com');
expect(response.body).not.toHaveProperty('password');
});
it('should return 400 for invalid email', async () => {
const response = await request(app)
.post('/api/users')
.send({
email: 'invalid',
name: 'Test',
password: 'Password123',
})
.expect(400);
expect(response.body.status).toBe('error');
expect(response.body.errors).toContainEqual(
expect.objectContaining({
field: 'body.email',
message: expect.stringContaining('email'),
})
);
});
});
describe('GET /api/users/:id', () => {
it('should return user by ID', async () => {
const createRes = await request(app)
.post('/api/users')
.send({
email: 'test@example.com',
name: 'Test User',
password: 'Password123',
});
const response = await request(app)
.get(`/api/users/${createRes.body.id}`)
.expect(200);
expect(response.body.id).toBe(createRes.body.id);
});
it('should return 404 for non-existent user', async () => {
await request(app)
.get('/api/users/99999')
.expect(404);
});
});
});
```
### Unit Tests with Mocks
```typescript
// tests/unit/userService.test.ts
import { userService } from '../../src/services/userService';
import { pool } from '../../src/config/database';
jest.mock('../../src/config/database');
const mockPool = pool as jest.Mocked<typeof pool>;
describe('UserService', () => {
beforeEach(() => {
jest.clearAllMocks();
});
describe('findById', () => {
it('should return user when found', async () => {
mockPool.query.mockResolvedValue({
rows: [{ id: 1, email: 'test@example.com', name: 'Test' }],
command: 'SELECT',
rowCount: 1,
oid: 0,
fields: [],
});
const result = await userService.findById('1');
expect(result).toEqual(
expect.objectContaining({ id: 1, email: 'test@example.com' })
);
});
it('should return null when not found', async () => {
mockPool.query.mockResolvedValue({
rows: [],
command: 'SELECT',
rowCount: 0,
oid: 0,
fields: [],
});
const result = await userService.findById('999');
expect(result).toBeNull();
});
});
});
```
## Production Deployment
### PM2 Configuration
```javascript
// ecosystem.config.js
module.exports = {
apps: [{
name: 'api',
script: './dist/server.js',
instances: 'max', // Use all CPU cores
exec_mode: 'cluster',
env: {
NODE_ENV: 'production',
PORT: 3000,
},
error_file: './logs/err.log',
out_file: './logs/out.log',
log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
merge_logs: true,
max_memory_restart: '500M',
wait_ready: true,
listen_timeout: 10000,
kill_timeout: 5000,
}],
};
```
**Graceful shutdown with PM2**:
```typescript
// src/server.ts
import app from './app';
import { logger } from './config/logger';
import { closePool } from './config/database';

const PORT = Number(process.env.PORT) || 3000;

const server = app.listen(PORT, () => {
logger.info(`Server started on port ${PORT}`);
// Signal PM2 ready
if (process.send) {
process.send('ready');
}
});
// Graceful shutdown (PM2 sends SIGINT; Docker/Kubernetes send SIGTERM)
const shutdown = (signal: string) => {
  logger.info(`${signal} received, closing server`);
  server.close(async () => {
    await closePool();
    logger.info('Server closed');
    process.exit(0);
  });
  // Force shutdown after 10s
  setTimeout(() => {
    logger.error('Forcing shutdown');
    process.exit(1);
  }, 10000);
};
['SIGINT', 'SIGTERM'].forEach((sig) => process.on(sig, () => shutdown(sig)));
```
### Dockerfile
```dockerfile
# Multi-stage build
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
# Install dependencies
RUN npm ci
# Copy source
COPY src ./src
# Build TypeScript
RUN npm run build
# Production image
FROM node:18-alpine
WORKDIR /app
# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# Copy built files
COPY --from=builder /app/dist ./dist
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/server.js"]
```
### Health Check Endpoint
```typescript
// src/routes/healthRoutes.ts
import { Router } from 'express';
import { pool } from '../config/database';
const router = Router();
router.get('/health', async (req, res) => {
  const health: { uptime: number; message: string; timestamp: number; database?: string } = {
    uptime: process.uptime(),
    message: 'OK',
    timestamp: Date.now(),
  };
try {
await pool.query('SELECT 1');
health.database = 'connected';
} catch (error) {
health.database = 'disconnected';
return res.status(503).json(health);
}
res.json(health);
});
router.get('/health/ready', async (req, res) => {
// Readiness check
try {
await pool.query('SELECT 1');
res.status(200).json({ status: 'ready' });
} catch (error) {
res.status(503).json({ status: 'not ready' });
}
});
router.get('/health/live', (req, res) => {
// Liveness check (simpler)
res.status(200).json({ status: 'alive' });
});
export default router;
```
## Performance Optimization
### Response Caching
```typescript
import { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: Number(process.env.REDIS_PORT),
});
export const cacheMiddleware = (duration: number) => {
return async (req: Request, res: Response, next: NextFunction) => {
if (req.method !== 'GET') return next();
const key = `cache:${req.originalUrl}`;
try {
const cached = await redis.get(key);
if (cached) {
return res.json(JSON.parse(cached));
}
// Capture response
const originalJson = res.json.bind(res);
res.json = (body: any) => {
redis.setex(key, duration, JSON.stringify(body));
return originalJson(body);
};
next();
} catch (error) {
next();
}
};
};
// Usage
router.get('/users', cacheMiddleware(300), async (req, res) => {
const users = await userService.findAll();
res.json(users);
});
```
## Security
### Rate Limiting
```typescript
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import Redis from 'ioredis';
const redis = new Redis();
export const apiLimiter = rateLimit({
  // rate-limit-redis v2-style option; v3+ takes { sendCommand: (...args) => redis.call(...args) }
  store: new RedisStore({ client: redis }),
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 requests per window
message: 'Too many requests, please try again later',
standardHeaders: true,
legacyHeaders: false,
});
export const authLimiter = rateLimit({
store: new RedisStore({ client: redis }),
windowMs: 15 * 60 * 1000,
max: 5, // 5 attempts
skipSuccessfulRequests: true,
});
// Usage
app.use('/api/', apiLimiter);
app.use('/api/auth/login', authLimiter);
```
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **No async error handling** | Crashes server | Use asyncHandler or express-async-errors |
| **Inconsistent error responses** | Poor DX | Centralized error handler |
| **New DB connection per request** | Exhausts connections | Use connection pool |
| **No graceful shutdown** | Data loss, broken requests | Handle SIGTERM/SIGINT |
| **Logging to console in production** | Lost logs, no structure | Use Winston/Pino with transports |
| **No request validation** | Security vulnerabilities | Zod/express-validator |
| **Blocking sync calls in routes** (e.g. `fs.readFileSync`) | Blocks event loop for all requests | Use async APIs or worker threads |
| **No health checks** | Can't monitor service | /health endpoints |
## Cross-References
**Related skills**:
- **Database patterns** → `database-integration` (pooling, transactions)
- **API testing** → `api-testing` (supertest patterns)
- **REST design** → `rest-api-design` (endpoint patterns)
- **Authentication** → `api-authentication` (JWT, sessions)
## Further Reading
- **Express docs**: https://expressjs.com/
- **Express.js Best Practices**: https://expressjs.com/en/advanced/best-practice-performance.html
- **Node.js Production Best Practices**: https://github.com/goldbergyoni/nodebestpractices

# FastAPI Development
## Overview
**FastAPI specialist skill providing production-ready patterns, anti-patterns to avoid, and testing strategies.**
**Core principle**: FastAPI's type hints, dependency injection, and async-first design enable fast, maintainable APIs - but require understanding async/sync boundaries, proper dependency management, and production hardening patterns.
## When to Use This Skill
Use when encountering:
- **Dependency injection**: Database connections, auth, shared resources, testing overrides
- **Async/sync boundaries**: Mixing blocking I/O with async endpoints, performance issues
- **Background tasks**: Choosing between BackgroundTasks, Celery, or other task queues
- **File uploads**: Streaming large files, memory management
- **Testing**: Dependency overrides, async test clients, fixture patterns
- **Production deployment**: ASGI servers, lifespan management, connection pooling
- **Security**: SQL injection, CORS, authentication patterns
- **Performance**: Connection pooling, query optimization, caching
## Quick Reference - Common Patterns
| Pattern | Use Case | Code Snippet |
|---------|----------|--------------|
| **DB dependency with pooling** | Per-request database access | `def get_db(): db = SessionLocal(); try: yield db; finally: db.close()` |
| **Dependency override for testing** | Test with mock/test DB | `app.dependency_overrides[get_db] = override_get_db` |
| **Lifespan events** | Startup/shutdown resources | `@asynccontextmanager async def lifespan(app): ... yield ...` |
| **Streaming file upload** | Large files without memory issues | `async with aiofiles.open(...) as f: while chunk := await file.read(CHUNK_SIZE): await f.write(chunk)` |
| **Background tasks (short)** | < 30 sec tasks | `background_tasks.add_task(func, args)` |
| **Task queue (long)** | > 1 min tasks, retries needed | Use Celery/Arq with Redis |
| **Parameterized queries** | Prevent SQL injection | `cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))` |
## Core Patterns
### 1. Dependency Injection Architecture
**Pattern: Connection pooling with yield dependencies**
```python
from typing import Iterator

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session
from fastapi import Depends, FastAPI
# One-time pool creation at module level
engine = create_engine(
"postgresql://user:pass@localhost/db",
pool_size=20, # Max connections
max_overflow=0, # No overflow beyond pool_size
pool_pre_ping=True, # Verify connection health before use
pool_recycle=3600 # Recycle connections every hour
)
SessionLocal = sessionmaker(bind=engine, expire_on_commit=False)

app = FastAPI()

# Dependency pattern with automatic cleanup
def get_db() -> Iterator[Session]:
"""
Yields database session from pool.
Ensures cleanup even if endpoint raises exception.
"""
db = SessionLocal()
try:
yield db
finally:
db.close()
# Usage in endpoints
@app.get("/items/{item_id}")
def get_item(item_id: int, db: Session = Depends(get_db)):
return db.query(Item).filter(Item.id == item_id).first()
```
**Why this pattern**:
- Pool created once (expensive operation)
- Per-request connections from pool (cheap)
- `yield` ensures cleanup on success AND exceptions
- `pool_pre_ping` prevents stale connection errors
- `pool_recycle` prevents long-lived connection issues
**Testing pattern**:
```python
# conftest.py
# Assumes `app`, `get_db`, and a TestSessionLocal test-session factory are importable from your project
import pytest
from fastapi.testclient import TestClient

@pytest.fixture
def test_db():
"""Test database fixture"""
db = TestSessionLocal()
try:
yield db
finally:
db.rollback()
db.close()
@pytest.fixture
def client(test_db):
"""Test client with overridden dependencies"""
def override_get_db():
yield test_db
app.dependency_overrides[get_db] = override_get_db
with TestClient(app) as c:
yield c
app.dependency_overrides.clear()
# test_items.py
def test_get_item(client, test_db):
# Setup test data
test_db.add(Item(id=1, name="Test"))
test_db.commit()
# Test endpoint
response = client.get("/items/1")
assert response.status_code == 200
```
### 2. Async/Sync Boundary Management
**❌ Anti-pattern: Blocking calls in async endpoints**
```python
# BAD - Blocks event loop
@app.get("/users/{user_id}")
async def get_user(user_id: int):
conn = psycopg2.connect(...) # Blocking!
cursor = conn.cursor()
cursor.execute(...) # Blocking!
return cursor.fetchone()
```
**✅ Pattern: Use async libraries or run_in_threadpool**
```python
# GOOD Option 1: Async database library
from databases import Database
database = Database("postgresql://...")
@app.get("/users/{user_id}")
async def get_user(user_id: int):
query = "SELECT * FROM users WHERE id = :user_id"
return await database.fetch_one(query=query, values={"user_id": user_id})
# GOOD Option 2: Run blocking code in thread pool
from fastapi.concurrency import run_in_threadpool
def blocking_db_call(user_id: int):
conn = psycopg2.connect(...)
cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
return cursor.fetchone()
@app.get("/users/{user_id}")
async def get_user(user_id: int):
return await run_in_threadpool(blocking_db_call, user_id)
```
**Decision table**:
| Scenario | Use |
|----------|-----|
| PostgreSQL with async needed | `asyncpg` or `databases` library |
| PostgreSQL, sync is fine | `psycopg2` with `def` (not `async def`) endpoints |
| MySQL with async | `aiomysql` |
| SQLite | `aiosqlite` (async) or sync with `def` endpoints |
| External API calls | `httpx.AsyncClient` (see sketch below) |
| CPU-intensive work | `run_in_threadpool` or Celery |
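As a concrete sketch of the external-API row, a non-blocking outbound call with `httpx.AsyncClient` (the GitHub URL is purely illustrative):
```python
import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/github/{username}")
async def github_profile(username: str):
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(f"https://api.github.com/users/{username}")
    if resp.status_code == 404:
        raise HTTPException(404, "GitHub user not found")
    resp.raise_for_status()  # surface other upstream errors
    return resp.json()  # awaiting I/O here never blocks the event loop
```
In production, prefer creating one `AsyncClient` in the lifespan handler and reusing it, rather than opening a new connection pool per request.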
### 3. Lifespan Management (Modern Pattern)
**✅ Use lifespan context manager** (replaces deprecated `@app.on_event`)
```python
from contextlib import asynccontextmanager
from fastapi import FastAPI
# Global resources
resources = {}
@asynccontextmanager
async def lifespan(app: FastAPI):
# Startup
resources["db_pool"] = await create_async_pool(
"postgresql://...",
min_size=10,
max_size=20
)
resources["redis"] = await aioredis.create_redis_pool("redis://...")
resources["ml_model"] = load_ml_model() # Can be sync or async
yield # Application runs
# Shutdown
await resources["db_pool"].close()
resources["redis"].close()
await resources["redis"].wait_closed()
resources.clear()
app = FastAPI(lifespan=lifespan)
# Access resources in endpoints
@app.get("/predict")
async def predict(data: dict):
model = resources["ml_model"]
return {"prediction": model.predict(data)}
```
### 4. File Upload Patterns
**For 100MB+ files: Stream to disk, never load into memory**
```python
from fastapi import UploadFile, File, HTTPException
import aiofiles
import os
import uuid
UPLOAD_DIR = "/var/uploads"
CHUNK_SIZE = 1024 * 1024 # 1MB chunks
MAX_FILE_SIZE = 500 * 1024 * 1024 # 500MB
@app.post("/upload")
async def upload_large_file(file: UploadFile = File(...)):
# Validate content type
if not file.content_type.startswith("video/"):
raise HTTPException(400, "Only video files accepted")
filepath = os.path.join(UPLOAD_DIR, f"{uuid.uuid4()}_{file.filename}")
size = 0
try:
async with aiofiles.open(filepath, 'wb') as f:
while chunk := await file.read(CHUNK_SIZE):
size += len(chunk)
if size > MAX_FILE_SIZE:
raise HTTPException(413, "File too large")
await f.write(chunk)
except Exception as e:
# Cleanup on failure
if os.path.exists(filepath):
os.remove(filepath)
raise
return {"filename": file.filename, "size": size}
```
**For very large files (1GB+): Direct S3 upload with presigned URLs**
```python
import boto3
@app.post("/upload/presigned-url")
async def get_presigned_upload_url(filename: str):
s3_client = boto3.client('s3')
presigned_post = s3_client.generate_presigned_post(
Bucket='my-bucket',
Key=f'uploads/{uuid.uuid4()}_{filename}',
ExpiresIn=3600
)
return presigned_post # Client uploads directly to S3
```
### 5. Background Task Decision Matrix
| Task Duration | Needs Retries? | Needs Monitoring? | Solution |
|---------------|----------------|-------------------|----------|
| < 30 seconds | No | No | `BackgroundTasks` |
| < 30 seconds | Yes | Maybe | Celery/Arq |
| > 1 minute | Don't care | Don't care | Celery/Arq |
| Any | Yes | Yes | Celery/Arq with monitoring |
**BackgroundTasks pattern** (simple, in-process):
```python
import asyncio

from fastapi import BackgroundTasks

async def send_email(email: str):
await asyncio.sleep(2) # Async work
print(f"Email sent to {email}")
@app.post("/register")
async def register(email: str, background_tasks: BackgroundTasks):
# ... save user ...
background_tasks.add_task(send_email, email)
return {"status": "registered"} # Returns immediately
```
**Celery pattern** (distributed, persistent):
```python
# celery_app.py
from celery import Celery
celery_app = Celery('tasks', broker='redis://localhost:6379/0')
@celery_app.task(bind=True, max_retries=3)
def process_video(self, filepath: str):
try:
# Long-running work
extract_frames(filepath)
except Exception as exc:
raise self.retry(exc=exc, countdown=60)
# main.py
from celery_app import process_video
@app.post("/upload")
async def upload(file: UploadFile):
filepath = await save_file(file)
task = process_video.delay(filepath)
return {"task_id": task.id}
@app.get("/status/{task_id}")
async def get_status(task_id: str):
from celery_app import celery_app
result = celery_app.AsyncResult(task_id)
return {"status": result.state, "result": result.result}
```
## Security Patterns
### SQL Injection Prevention
**❌ NEVER use f-strings or string concatenation**
```python
# DANGEROUS
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
cursor.execute("SELECT * FROM users WHERE email = '" + email + "'")
```
**✅ ALWAYS use parameterized queries**
```python
# SQLAlchemy ORM (safe)
db.query(User).filter(User.id == user_id).first()
# Raw SQL (safe with parameters)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
cursor.execute("SELECT * FROM users WHERE email = :email", {"email": email})
```
### CORS Configuration
```python
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=["https://yourdomain.com"], # Specific origins, not "*" in production
allow_credentials=True,
allow_methods=["GET", "POST", "PUT", "DELETE"],
allow_headers=["*"],
)
```
### Authentication Pattern
```python
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
security = HTTPBearer()
async def get_current_user(credentials: HTTPAuthorizationCredentials = Depends(security)):
token = credentials.credentials
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
user_id = payload.get("sub")
if not user_id:
raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid token")
return await get_user_by_id(user_id)
except jwt.JWTError:
raise HTTPException(status.HTTP_401_UNAUTHORIZED, "Invalid token")
@app.get("/protected")
async def protected_route(current_user = Depends(get_current_user)):
return {"user": current_user}
```
## Middleware Ordering
**Critical: `add_middleware` prepends to the stack - the last middleware added is outermost, so it runs first on requests and last on responses**
```python
# Add in reverse order of desired request-time execution:
app.add_middleware(AuthenticationMiddleware)     # innermost - closest to routes
app.add_middleware(ErrorHandlingMiddleware)      # catches errors from auth/routes
app.add_middleware(RequestLoggingMiddleware)     # logs the entire request
app.add_middleware(CORSMiddleware, ...)          # outermost - handles preflight first
```
## Common Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| Global database connection | Not thread-safe, connection leaks | Use connection pool with dependency injection |
| `async def` with blocking I/O | Blocks event loop, kills performance | Use async libraries or `run_in_threadpool` |
| `time.sleep()` in async code | Blocks entire event loop | Use `asyncio.sleep()` |
| Loading large files into memory | Memory exhaustion, OOM crashes | Stream with `aiofiles` and chunks |
| BackgroundTasks for long work | Lost on restart, no retries | Use Celery/Arq |
| String formatting in SQL | SQL injection vulnerability | Parameterized queries only |
| `allow_origins=["*"]` with credentials | Security vulnerability | Specify exact origins |
| Not closing database connections | Connection pool exhaustion | Use `yield` in dependencies |
## Testing Best Practices
```python
import pytest
from fastapi.testclient import TestClient
from httpx import AsyncClient
# Sync tests (simpler, faster for most cases)
def test_read_item(client):
response = client.get("/items/1")
assert response.status_code == 200
# Async tests (needed for testing async endpoints with real async operations)
@pytest.mark.asyncio
async def test_async_endpoint():
async with AsyncClient(app=app, base_url="http://test") as ac:
response = await ac.get("/items/1")
assert response.status_code == 200
# Dependency override pattern
def test_with_mock_db(client):
def override_get_db():
yield mock_db
app.dependency_overrides[get_db] = override_get_db
response = client.get("/items/1")
app.dependency_overrides.clear()
assert response.status_code == 200
```
## Production Deployment
**ASGI server configuration** (Uvicorn + Gunicorn):
```bash
# gunicorn with uvicorn workers (production)
gunicorn main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--timeout 120 \
--graceful-timeout 30 \
--keep-alive 5
```
**Environment-based configuration**:
```python
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
database_url: str
redis_url: str
secret_key: str
debug: bool = False
class Config:
env_file = ".env"
settings = Settings()
# Use in app
engine = create_engine(settings.database_url)
```
## Cross-References
**Related skills**:
- **Security** → `ordis-security-architect` (threat modeling, OWASP top 10)
- **Python patterns** → `axiom-python-engineering` (async patterns, type hints)
- **API testing** → `api-testing` (contract testing, integration tests)
- **API documentation** → `api-documentation` or `muna-technical-writer`
- **Database optimization** → `database-integration` (query optimization, migrations)
- **Authentication deep dive** → `api-authentication` (OAuth2, JWT patterns)
- **GraphQL alternative** → `graphql-api-design`
## Performance Tips
1. **Use connection pooling** - Create pool once, not per-request
2. **Enable response caching** - Use `fastapi-cache2` for expensive queries
3. **Limit response size** - Paginate large result sets
4. **Use async for I/O** - Database, HTTP calls, file operations
5. **Profile slow endpoints** - Use `starlette-prometheus` for monitoring
6. **Enable gzip compression** - `GZipMiddleware` for large JSON responses (tips 3 and 6 are sketched below)
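A minimal sketch combining tips 3 and 6 (the in-memory `FAKE_DB` stands in for a real data source):
```python
from fastapi import FastAPI, Query
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)  # only compress bodies over ~1 KB

FAKE_DB = [{"id": i} for i in range(1000)]  # illustrative stand-in

@app.get("/items")
async def list_items(
    offset: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100),  # hard cap keeps responses bounded
):
    return {"items": FAKE_DB[offset : offset + limit], "offset": offset, "limit": limit}
```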
## When NOT to Use FastAPI
- **Simple CRUD with admin panel** → Django (has built-in admin)
- **Heavy template rendering** → Django or Flask
- **Mature ecosystem needed** → Django (more third-party packages)
- **Team unfamiliar with async** → Flask or Django (simpler mental model)
FastAPI excels at: Modern APIs, microservices, ML model serving, real-time features, high performance requirements.

# GraphQL API Design
## Overview
**GraphQL API specialist covering schema design, query optimization, real-time subscriptions, federation, and production patterns.**
**Core principle**: GraphQL enables clients to request exactly the data they need in a single query - but requires careful schema design, batching strategies, and security measures to prevent performance and security issues.
## When to Use This Skill
Use when encountering:
- **N+1 query problems**: Too many database queries for nested resolvers
- **Schema design**: Types, interfaces, unions, input types, directives
- **Pagination**: Connections, cursors, offset patterns
- **Performance**: Query complexity, caching, batching, persisted queries
- **Real-time**: Subscriptions, WebSocket patterns, live queries
- **Federation**: Splitting schema across multiple services
- **Security**: Query depth limiting, cost analysis, allowlisting
- **Testing**: Schema validation, resolver testing, integration tests
- **Migrations**: Schema evolution, deprecation, versioning
**Do NOT use for**:
- REST API design → `rest-api-design`
- Framework-specific implementation → `fastapi-development`, `express-development`
- Microservices architecture → `microservices-architecture` (use with Federation)
## GraphQL vs REST Decision Matrix
| Factor | Choose GraphQL | Choose REST |
|--------|----------------|-------------|
| **Client needs** | Mobile apps, varying data needs | Uniform data requirements |
| **Over/under-fetching** | Problem | Not a problem |
| **Real-time features** | Subscriptions built-in | Need SSE/WebSockets separately |
| **Schema-first** | Strong typing required | Flexible, schema optional |
| **Caching** | Complex (field-level) | Simple (HTTP caching) |
| **File uploads** | Non-standard (multipart) | Native (multipart/form-data) |
| **Team expertise** | GraphQL experience | REST experience |
| **API consumers** | Known clients | Public/third-party |
| **Rate limiting** | Complex (field-level) | Simple (endpoint-level) |
**Hybrid approach**: GraphQL for internal/mobile, REST for public APIs
## Quick Reference - Core Patterns
| Pattern | Use Case | Key Concept |
|---------|----------|-------------|
| **DataLoader** | N+1 queries | Batch and cache within request |
| **Connection** | Pagination | Cursor-based with edges/nodes |
| **Union** | Heterogeneous results | Search, activity feeds |
| **Interface** | Shared fields | Polymorphic types with guarantees |
| **Directive** | Field behavior | @auth, @deprecated, custom logic |
| **Input types** | Mutations | Type-safe input validation |
| **Federation** | Microservices | Distributed schema composition |
| **Subscription** | Real-time | WebSocket-based live updates |
## N+1 Query Optimization
### The Problem
```javascript
// Schema
type Post {
id: ID!
title: String!
author: User! // Requires fetching user
}
type Query {
posts: [Post!]!
}
// Naive resolver (N+1 problem)
const resolvers = {
Query: {
posts: () => db.posts.findAll() // 1 query
},
Post: {
author: (post) => db.users.findOne(post.authorId) // N queries!
}
};
// Result: 100 posts = 101 database queries
```
### DataLoader Solution
```javascript
const DataLoader = require('dataloader');
// Batch loading function
const batchUsers = async (userIds) => {
const users = await db.users.findMany({
where: { id: { in: userIds } }
});
// CRITICAL: Return in same order as requested IDs
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id) || null);
};
// Create loader per-request (avoid stale cache)
const createLoaders = () => ({
user: new DataLoader(batchUsers),
post: new DataLoader(batchPosts),
// ... other loaders
});
// Add to context
const server = new ApolloServer({
typeDefs,
resolvers,
context: () => ({
loaders: createLoaders(),
db,
user: getCurrentUser()
})
});
// Use in resolver
const resolvers = {
Post: {
author: (post, args, { loaders }) => {
return loaders.user.load(post.authorId); // Batched!
}
}
};
```
**Result**: 100 posts = 2 queries (1 for posts, 1 batched for unique authors)
### Advanced DataLoader Patterns
**Composite Keys**:
```javascript
// For multi-field lookups
const batchUsersByEmail = async (keys) => {
// keys = [{domain: 'example.com', email: 'user@example.com'}, ...]
const users = await db.users.findMany({
where: {
OR: keys.map(k => ({ email: k.email, domain: k.domain }))
}
});
const userMap = new Map(
users.map(u => [`${u.domain}:${u.email}`, u])
);
return keys.map(k => userMap.get(`${k.domain}:${k.email}`));
};
const userByEmailLoader = new DataLoader(batchUsersByEmail, {
cacheKeyFn: (key) => `${key.domain}:${key.email}`
});
```
**Priming Cache**:
```javascript
// After fetching posts, prime user loader
const posts = await db.posts.findAll();
posts.forEach(post => {
if (post.authorData) {
loaders.user.prime(post.authorId, post.authorData);
}
});
return posts;
```
**Error Handling in Batch**:
```javascript
const batchUsers = async (userIds) => {
const users = await db.users.findMany({
where: { id: { in: userIds } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => {
const user = userMap.get(id);
if (!user) {
return new Error(`User ${id} not found`); // Per-item error
}
return user;
});
};
```
## Schema Design Patterns
### Interface vs Union
**Interface** (shared fields enforced):
```graphql
interface Node {
id: ID!
}
interface Timestamped {
createdAt: DateTime!
updatedAt: DateTime!
}
type User implements Node & Timestamped {
id: ID!
createdAt: DateTime!
updatedAt: DateTime!
email: String!
name: String!
}
type Post implements Node & Timestamped {
id: ID!
createdAt: DateTime!
updatedAt: DateTime!
title: String!
content: String!
}
type Query {
node(id: ID!): Node # Can return any Node implementer
nodes(ids: [ID!]!): [Node!]!
}
```
**Query**:
```graphql
{
node(id: "user_123") {
id
... on User {
email
name
}
... on Post {
title
}
}
}
```
**Union** (no shared fields required):
```graphql
union SearchResult = User | Post | Comment
type Query {
search(query: String!): [SearchResult!]!
}
```
**When to use each**:
| Use Case | Pattern | Why |
|----------|---------|-----|
| Global ID lookup | Interface (Node) | Guarantees `id` field |
| Polymorphic lists with shared fields | Interface | Can query shared fields without fragments |
| Heterogeneous results | Union | No shared field requirements |
| Activity feeds | Union | Different event types |
| Search results | Union | Mixed content types |
### Input Types and Validation
```graphql
input CreatePostInput {
title: String!
content: String!
tags: [String!]
publishedAt: DateTime
}
input UpdatePostInput {
title: String
content: String
tags: [String!]
}
type Mutation {
createPost(input: CreatePostInput!): Post!
updatePost(id: ID!, input: UpdatePostInput!): Post!
}
```
**Benefits**:
- Reusable across multiple mutations
- Clear separation of create vs update requirements
- Type-safe in generated code
- Can add descriptions per field
### Custom Directives
```graphql
directive @auth(requires: Role = USER) on FIELD_DEFINITION
directive @rateLimit(limit: Int!, window: Int!) on FIELD_DEFINITION
directive @deprecated(reason: String) on FIELD_DEFINITION | ENUM_VALUE
enum Role {
USER
ADMIN
SUPER_ADMIN
}
type Query {
publicData: String
userData: User @auth(requires: USER)
adminData: String @auth(requires: ADMIN)
expensiveQuery: Result @rateLimit(limit: 10, window: 60)
}
type User {
id: ID!
email: String! @auth(requires: USER) # Only authenticated users
internalId: String @deprecated(reason: "Use `id` instead")
}
```
## Pagination Patterns
### Relay Connection Specification
**Standard connection pattern**:
```graphql
type PostConnection {
edges: [PostEdge!]!
pageInfo: PageInfo!
totalCount: Int # Optional
}
type PostEdge {
node: Post!
cursor: String!
}
type PageInfo {
hasNextPage: Boolean!
hasPreviousPage: Boolean!
startCursor: String
endCursor: String
}
type Query {
posts(
first: Int
after: String
last: Int
before: String
): PostConnection!
}
```
**Implementation**:
```javascript
const resolvers = {
Query: {
posts: async (parent, { first, after, last, before }) => {
const limit = first || last || 10;
const cursor = after || before;
// Decode cursor
const offset = cursor ? decodeCursor(cursor) : 0;
// Fetch one extra to determine hasNextPage
const posts = await db.posts.findMany({
skip: offset,
take: limit + 1,
orderBy: { createdAt: 'desc' }
});
const hasNextPage = posts.length > limit;
const edges = posts.slice(0, limit).map((post, index) => ({
node: post,
cursor: encodeCursor(offset + index)
}));
return {
edges,
pageInfo: {
hasNextPage,
hasPreviousPage: offset > 0,
startCursor: edges[0]?.cursor,
endCursor: edges[edges.length - 1]?.cursor
}
};
}
}
};
// Opaque cursor encoding
const encodeCursor = (offset) =>
Buffer.from(`arrayconnection:${offset}`).toString('base64');
const decodeCursor = (cursor) =>
parseInt(Buffer.from(cursor, 'base64').toString().split(':')[1]);
```
**Alternative: Offset pagination** (simpler but less robust):
```graphql
type PostPage {
items: [Post!]!
total: Int!
page: Int!
pageSize: Int!
}
type Query {
posts(page: Int = 1, pageSize: Int = 20): PostPage!
}
```
## Performance Optimization
### Query Complexity Analysis
**Prevent expensive queries**:
```javascript
const depthLimit = require('graphql-depth-limit');
const { createComplexityLimitRule } = require('graphql-validation-complexity');
const server = new ApolloServer({
typeDefs,
resolvers,
validationRules: [
depthLimit(10), // Max 10 levels deep
createComplexityLimitRule(1000, {
scalarCost: 1,
objectCost: 2,
listFactor: 10
})
]
});
```
**Custom complexity**:
```graphql
type Query {
posts(first: Int!): [Post!]! @cost(complexity: 10, multipliers: ["first"])
expensiveAnalytics: AnalyticsReport! @cost(complexity: 1000)
}
```
### Automatic Persisted Queries (APQ)
**Client sends hash instead of full query**:
```javascript
// Client
const query = gql`
query GetUser($id: ID!) {
user(id: $id) { name email }
}
`;
const queryHash = sha256(query);
// First request: Send hash only
fetch('/graphql', {
body: JSON.stringify({
extensions: {
persistedQuery: {
version: 1,
sha256Hash: queryHash
}
},
variables: { id: '123' }
})
});
// If server doesn't have it (PersistedQueryNotFound)
// Second request: Send full query + hash
fetch('/graphql', {
body: JSON.stringify({
query,
extensions: {
persistedQuery: {
version: 1,
sha256Hash: queryHash
}
},
variables: { id: '123' }
})
});
// Future requests: Just send hash
```
**Benefits**:
- Reduced bandwidth (hash << full query)
- CDN caching of GET requests
- Query allowlisting (if configured)
### Field-Level Caching
```javascript
const resolvers = {
Query: {
user: async (parent, { id }, { cache }) => {
const cacheKey = `user:${id}`;
const cached = await cache.get(cacheKey);
if (cached) return JSON.parse(cached);
const user = await db.users.findOne(id);
await cache.set(cacheKey, JSON.stringify(user), { ttl: 300 });
return user;
}
}
};
```
## Subscriptions (Real-Time)
### Basic Subscription
```graphql
type Subscription {
postAdded: Post!
commentAdded(postId: ID!): Comment!
}
type Mutation {
createPost(input: CreatePostInput!): Post!
}
```
**Implementation (Apollo Server)**:
```javascript
const { PubSub } = require('graphql-subscriptions');
const pubsub = new PubSub();
const resolvers = {
Mutation: {
createPost: async (parent, { input }) => {
const post = await db.posts.create(input);
pubsub.publish('POST_ADDED', { postAdded: post });
return post;
}
},
Subscription: {
postAdded: {
subscribe: () => pubsub.asyncIterator(['POST_ADDED'])
},
commentAdded: {
subscribe: (parent, { postId }) =>
pubsub.asyncIterator([`COMMENT_ADDED_${postId}`])
}
}
};
// Client
subscription {
postAdded {
id
title
author { name }
}
}
```
### Scaling Subscriptions
**Problem**: In-memory PubSub doesn't work across servers
**Solution**: Redis PubSub
```javascript
const { RedisPubSub } = require('graphql-redis-subscriptions');
const Redis = require('ioredis');
const pubsub = new RedisPubSub({
publisher: new Redis(),
subscriber: new Redis()
});
// Now works across multiple server instances
```
### Subscription Authorization
```javascript
const { withFilter } = require('graphql-subscriptions');

const resolvers = {
Subscription: {
secretDataUpdated: {
subscribe: withFilter(
() => pubsub.asyncIterator(['SECRET_DATA']),
(payload, variables, context) => {
// Only admin users can subscribe
return context.user?.role === 'ADMIN';
}
)
}
}
};
```
## Federation (Distributed Schema)
**Split schema across multiple services**:
### User Service
```graphql
# user-service schema
type User @key(fields: "id") {
id: ID!
email: String!
name: String!
}
type Query {
user(id: ID!): User
}
```
### Post Service
```graphql
# post-service schema
extend type User @key(fields: "id") {
id: ID! @external
posts: [Post!]!
}
type Post {
id: ID!
title: String!
content: String!
authorId: ID!
author: User!
}
```
### Gateway
Composes schemas and routes requests:
```javascript
const { ApolloGateway } = require('@apollo/gateway');
const gateway = new ApolloGateway({
  serviceList: [ // deprecated in newer @apollo/gateway versions in favor of IntrospectAndCompose
{ name: 'users', url: 'http://user-service:4001/graphql' },
{ name: 'posts', url: 'http://post-service:4002/graphql' }
]
});
const server = new ApolloServer({
gateway,
subscriptions: false // Not yet supported in federation
});
```
**Reference Resolver** (fetch extended fields):
```javascript
// post-service resolvers
const resolvers = {
User: {
__resolveReference: async (user) => {
// Receive { __typename: 'User', id: '123' }
// Don't need to fetch user, just return it for field resolution
return user;
},
posts: async (user) => {
return db.posts.findMany({ where: { authorId: user.id } });
}
}
};
```
## Security Patterns
### Query Depth Limiting
```javascript
const depthLimit = require('graphql-depth-limit');
const server = new ApolloServer({
validationRules: [depthLimit(7)] // Max 7 levels deep
});
// Prevents: user { posts { author { posts { author { ... } } } }
```
### Query Allowlisting (Production)
```javascript
const { GraphQLError } = require('graphql');

const allowedQueries = new Map([
['GetUser', 'query GetUser($id: ID!) { user(id: $id) { name } }'],
['ListPosts', 'query ListPosts { posts { title } }']
]);
const server = new ApolloServer({
validationRules: [
(context) => ({
Document(node) {
const queryName = node.definitions[0]?.name?.value;
if (!allowedQueries.has(queryName)) {
context.reportError(
new GraphQLError('Query not allowed')
);
}
}
})
]
});
```
### Rate Limiting (Field-Level)
```javascript
const { shield, rule } = require('graphql-shield');
const Redis = require('ioredis');

const redis = new Redis();
const isRateLimited = rule({ cache: 'contextual' })(
async (parent, args, ctx, info) => {
const key = `rate:${ctx.user.id}:${info.fieldName}`;
const count = await redis.incr(key);
if (count === 1) {
await redis.expire(key, 60); // 1 minute window
}
return count <= 10; // 10 requests per minute
}
);
const permissions = shield({
Query: {
expensiveQuery: isRateLimited
}
});
```
## Schema Evolution
### Deprecation
```graphql
type User {
id: ID!
username: String @deprecated(reason: "Use `name` instead")
name: String!
}
```
**Tooling (GraphiQL, code generators, Apollo Studio) surfaces deprecation warnings to clients**
### Breaking Changes (Avoid)
**Breaking**:
- Removing fields
- Changing field types
- Making nullable → non-nullable
- Removing enum values
- Changing arguments
**Non-breaking**:
- Adding fields
- Adding types
- Deprecating fields
- Making non-nullable → nullable
- Adding arguments with defaults
### Versioning Strategy
**Don't version schema** - evolve incrementally:
1. Add new field
2. Deprecate old field
3. Monitor usage
4. Remove old field in next major version (if removing)
## Testing Strategies
### Schema Validation
```javascript
const { buildSchema, validateSchema } = require('graphql');
test('schema is valid', () => {
const schema = buildSchema(typeDefs);
const errors = validateSchema(schema);
expect(errors).toHaveLength(0);
});
```
### Resolver Testing
```javascript
const resolvers = require('./resolvers');
test('user resolver fetches user', async () => {
const mockDb = {
users: { findOne: jest.fn().mockResolvedValue({ id: '1', name: 'Alice' }) }
};
const result = await resolvers.Query.user(
null,
{ id: '1' },
{ db: mockDb, loaders: { user: mockDataLoader() } }
);
expect(result).toEqual({ id: '1', name: 'Alice' });
expect(mockDb.users.findOne).toHaveBeenCalledWith('1');
});
```
### Integration Testing
```javascript
const { ApolloServer } = require('apollo-server');
const { createTestClient } = require('apollo-server-testing');
const server = new ApolloServer({ typeDefs, resolvers });
const { query } = createTestClient(server);
test('GetUser query', async () => {
const GET_USER = gql`
query GetUser($id: ID!) {
user(id: $id) {
name
email
}
}
`;
const res = await query({ query: GET_USER, variables: { id: '1' } });
expect(res.errors).toBeUndefined();
expect(res.data.user).toMatchObject({
name: 'Alice',
email: 'alice@example.com'
});
});
```
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **No DataLoader** | N+1 queries kill performance | Use DataLoader for all entity fetching |
| **Offset pagination** | Breaks with real-time data | Use cursor-based connections |
| **No query complexity** | DoS via deeply nested queries | Set depth/complexity limits |
| **Shared DataLoader instances** | Stale cache across requests | Create new loaders per request |
| **No error masking** | Leaks internal errors to clients | Mask in production, log internally (see sketch below) |
| **mutations returning Boolean** | Can't extend response | Return object type |
| **Nullable IDs** | IDs should never be null | Use `ID!` not `ID` |
| **Over-fetching in resolvers** | Selecting * wastes bandwidth | Select only requested fields |
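The error-masking row deserves a sketch. With the apollo-server versions used in this skill, the `formatError` hook is the usual place; the `isOperational` flag is our own convention mirroring the operational-error classes elsewhere in this pack, not an Apollo field:
```javascript
const server = new ApolloServer({
  typeDefs,
  resolvers,
  formatError: (err) => {
    // Full details stay server-side
    console.error(err.message, err.path, err.originalError);
    const expected = err.originalError && err.originalError.isOperational; // assumed convention
    if (process.env.NODE_ENV === 'production' && !expected) {
      return new Error('Internal server error'); // mask unexpected errors
    }
    return err; // validation/operational errors pass through
  }
});
```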
## Common Mistakes
### 1. DataLoader Return Order
```javascript
// ❌ WRONG - Returns in database order
const batchUsers = async (ids) => {
return await db.users.findMany({ where: { id: { in: ids } } });
};
// ✅ CORRECT - Returns in requested order
const batchUsers = async (ids) => {
const users = await db.users.findMany({ where: { id: { in: ids } } });
const userMap = new Map(users.map(u => [u.id, u]));
return ids.map(id => userMap.get(id));
};
```
### 2. Mutations Returning Primitives
```graphql
# ❌ BAD - Can't extend
type Mutation {
deletePost(id: ID!): Boolean!
}
# ✅ GOOD - Extensible
type DeletePostPayload {
success: Boolean!
deletedPostId: ID
message: String
}
type Mutation {
deletePost(id: ID!): DeletePostPayload!
}
```
### 3. No Context in Subscriptions
```javascript
// ❌ Missing auth context
const server = new ApolloServer({
subscriptions: {
onConnect: () => {
return {}; // No user context!
}
}
});
// ✅ Include auth
const server = new ApolloServer({
subscriptions: {
onConnect: (connectionParams) => {
const token = connectionParams.authToken;
const user = verifyToken(token);
return { user };
}
}
});
```
## Tooling Ecosystem
**Schema Management**:
- **Apollo Studio**: Schema registry, operation tracking, metrics
- **GraphQL Inspector**: Schema diffing, breaking change detection
- **Graphql-eslint**: Linting for schema and queries
**Code Generation**:
- **GraphQL Code Generator**: TypeScript types from schema
- **Apollo Codegen**: Client types for queries
**Development**:
- **GraphiQL**: In-browser IDE
- **Apollo Sandbox**: Modern GraphQL explorer
- **Altair**: Desktop GraphQL client
**Testing**:
- **EasyGraphQL Test**: Schema mocking
- **GraphQL Tools**: Schema stitching, mocking
## Cross-References
**Related skills**:
- **REST comparison** → `rest-api-design` (when to use each)
- **FastAPI implementation** → `fastapi-development` (Strawberry, Graphene)
- **Express implementation** → `express-development` (Apollo Server, GraphQL Yoga)
- **Microservices** → `microservices-architecture` (use with Federation)
- **Security** → `ordis-security-architect` (OWASP API Security)
- **Testing** → `api-testing` (integration testing strategies)
- **Authentication** → `api-authentication` (JWT, OAuth2 with GraphQL)
## Further Reading
- **GraphQL Spec**: https://spec.graphql.org/
- **Apollo Docs**: Federation, caching, tooling
- **Relay Spec**: Connection specification
- **DataLoader GitHub**: facebook/dataloader
- **Production Ready GraphQL**: Book by Marc-André Giroux

# Message Queues
## Overview
**Message queue specialist covering technology selection, reliability patterns, ordering guarantees, schema evolution, and production operations.**
**Core principle**: Message queues decouple producers from consumers, enabling async processing, load leveling, and resilience - but require careful design for reliability, ordering, monitoring, and operational excellence.
## When to Use This Skill
Use when encountering:
- **Technology selection**: RabbitMQ vs Kafka vs SQS vs SNS
- **Reliability**: Guaranteed delivery, acknowledgments, retries, DLQ
- **Ordering**: Partition keys, FIFO queues, ordered processing
- **Scaling**: Consumer groups, parallelism, backpressure
- **Schema evolution**: Message versioning, Avro, Protobuf
- **Monitoring**: Lag tracking, alerting, distributed tracing
- **Advanced patterns**: Outbox, saga, CQRS, event sourcing
- **Security**: Encryption, IAM, Kafka authentication
- **Testing**: Local testing, chaos engineering, load testing
**Do NOT use for**:
- Request/response APIs → Use REST or GraphQL instead
- Strong consistency required → Use database transactions
- Real-time streaming analytics → use a stream-processing skill if one is available
## Technology Selection Matrix
| Factor | RabbitMQ | Apache Kafka | AWS SQS | AWS SNS |
|--------|----------|--------------|---------|---------|
| **Use Case** | Task queues, routing | Event streaming, logs | Simple queues | Pub/sub fanout |
| **Throughput** | 10k-50k msg/s | 100k+ msg/s | Nearly unlimited (std), 300 msg/s (FIFO; 3k batched) | 100k+ msg/s |
| **Ordering** | Queue-level | Partition-level (strong) | FIFO queues only | None |
| **Persistence** | Durable queues | Log-based (default) | Managed | Ephemeral (SNS → SQS for durability) |
| **Retention** | Until consumed | Days to weeks | 4 days (std), 14 days max | None (delivery only) |
| **Routing** | Exchanges (topic, fanout, headers) | Topics only | None | Topic-based filtering |
| **Message size** | Up to 128 MB | Up to 1 MB (configurable) | 256 KB | 256 KB |
| **Ops complexity** | Medium (clustering) | High (partitions, replication) | Low (managed) | Low (managed) |
| **Cost** | EC2 self-hosted | Self-hosted or MSK | Pay-per-request | Pay-per-request |
### Decision Tree
```
Are you on AWS and need simple async processing?
→ Yes → **AWS SQS** (start simple)
→ No → Continue...
Do you need event replay or stream processing?
→ Yes → **Kafka** (log-based, replayable)
→ No → Continue...
Do you need complex routing (topic exchange, headers)?
→ Yes → **RabbitMQ** (rich exchange types)
→ No → Continue...
Do you need pub/sub fanout to multiple subscribers?
→ Yes → **SNS** (or Kafka topics with multiple consumer groups)
→ No → **SQS** or **RabbitMQ** for task queues
```
### Migration Path
| Current State | Next Step | Why |
|---------------|-----------|-----|
| No queue | Start with SQS (if AWS) or RabbitMQ | Lowest operational complexity |
| SQS FIFO → sustained high throughput | Consider Kafka or high-throughput FIFO | FIFO throttles at 300 msg/s (3k batched); standard SQS scales but is unordered |
| RabbitMQ → Event sourcing needed | Migrate to Kafka | Kafka's log retention enables replay |
| Kafka → Simple task queue | Consider RabbitMQ or SQS | Kafka is overkill for simple queues |
## Reliability Patterns
### Acknowledgment Modes
| Mode | When Ack Sent | Reliability | Performance | Use Case |
|------|---------------|-------------|-------------|----------|
| **Auto-ack** | On receive | Low (lost on crash) | High | Logs, analytics, best-effort |
| **Manual ack (after processing)** | After success | High (at-least-once) | Medium | Standard production pattern |
| **Transactional** | In transaction | Highest (exactly-once) | Low | Financial, critical data |
### At-Least-Once Delivery Pattern
**SQS**:
```python
import json
import logging

import boto3

logger = logging.getLogger(__name__)
sqs = boto3.client('sqs')

# WRONG: Delete before processing
message = sqs.receive_message(QueueUrl=queue_url)['Messages'][0]
sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
process(message['Body']) # ❌ If this fails, message is lost
# CORRECT: Process, then delete
message = sqs.receive_message(
QueueUrl=queue_url,
VisibilityTimeout=300 # 5 minutes to process
)['Messages'][0]
try:
process(json.loads(message['Body']))
sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
except Exception as e:
# Message becomes visible again after timeout
logger.error(f"Processing failed, will retry: {e}")
```
**Kafka**:
```python
from kafka import KafkaConsumer

# WRONG: Auto-commit before processing
consumer = KafkaConsumer(
'orders',
enable_auto_commit=True, # ❌ Commits offset before processing
auto_commit_interval_ms=5000
)
for msg in consumer:
process(msg.value) # Crash here = message lost
# CORRECT: Manual commit after processing
consumer = KafkaConsumer(
'orders',
enable_auto_commit=False
)
for msg in consumer:
try:
process(msg.value)
consumer.commit() # ✓ Commit only after success
except Exception as e:
logger.error(f"Processing failed, will retry: {e}")
# Don't commit - message will be reprocessed
```
**RabbitMQ**:
```python
import json
import logging

import pika

logger = logging.getLogger(__name__)
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
def callback(ch, method, properties, body):
try:
process(json.loads(body))
ch.basic_ack(delivery_tag=method.delivery_tag) # ✓ Ack after success
except Exception as e:
logger.error(f"Processing failed: {e}")
ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True) # Requeue
channel.basic_consume(
queue='orders',
on_message_callback=callback,
auto_ack=False # ✓ Manual acknowledgment
)
channel.start_consuming()
```
### Idempotency (Critical for At-Least-Once)
Since at-least-once delivery guarantees duplicates, **all processing must be idempotent**:
```python
# Pattern 1: Database unique constraint
def process_order(order_id, data):
db.execute(
"INSERT INTO orders (id, user_id, amount, created_at) "
"VALUES (%s, %s, %s, NOW()) "
"ON CONFLICT (id) DO NOTHING", # Idempotent
(order_id, data['user_id'], data['amount'])
)
# Pattern 2: Distributed lock (Redis) - prevents *concurrent* duplicates;
# combine with Pattern 1 or 3 to also catch replays after completion
def process_order_with_lock(order_id, data):
    lock_key = f"lock:order:{order_id}"
    # Try to acquire lock (60s TTL)
    if not redis.set(lock_key, "1", nx=True, ex=60):
        logger.info(f"Order {order_id} already being processed")
        return  # Duplicate, skip
    try:
        # Process order
        create_order(data)
        charge_payment(data['amount'])
    finally:
        redis.delete(lock_key)
# Pattern 3: Idempotency key table
def process_with_idempotency_key(message_id, data):
with db.transaction():
# Check if already processed
result = db.execute(
"SELECT 1 FROM processed_messages WHERE message_id = %s FOR UPDATE",
(message_id,)
)
if result:
return # Already processed
# Process + record atomically
process_order(data)
db.execute(
"INSERT INTO processed_messages (message_id, processed_at) VALUES (%s, NOW())",
(message_id,)
)
```
## Ordering Guarantees
### Kafka: Partition-Level Ordering
**Kafka guarantees ordering within a partition**, not across partitions.
```python
from kafka import KafkaProducer
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
key_serializer=str.encode,
value_serializer=lambda v: json.dumps(v).encode()
)
# ✓ Partition key ensures ordering
def publish_order_event(user_id, event_type, data):
producer.send(
'orders',
key=str(user_id), # All user_id events go to same partition
value={
'event_type': event_type,
'user_id': user_id,
'data': data,
'timestamp': time.time()
}
)
# User 123's events all hash to the same partition (e.g., partition 2) → strict ordering
publish_order_event(123, 'order_placed', {...})
publish_order_event(123, 'payment_processed', {...})
publish_order_event(123, 'shipped', {...})
```
**Partition count determines max parallelism**:
```
Topic: orders (4 partitions)
Consumer group: order-processors
2 consumers → Each processes 2 partitions
4 consumers → Each processes 1 partition (max parallelism)
5 consumers → 1 consumer idle (wasted)
Rule: partition_count >= max_consumers_needed
```
### SQS FIFO: MessageGroupId Ordering
```python
import boto3
sqs = boto3.client('sqs')
# FIFO queue guarantees ordering per MessageGroupId
sqs.send_message(
QueueUrl='orders.fifo',
MessageBody=json.dumps(event),
MessageGroupId=f"user-{user_id}", # Like Kafka partition key
MessageDeduplicationId=f"{event_id}-{timestamp}" # Prevent duplicates
)
# Throughput limit: ~300 msg/s per FIFO queue (3k with batching)
# Workaround: enable high-throughput FIFO mode and spread MessageGroupIds
```
### RabbitMQ: Single Consumer Ordering
```python
# RabbitMQ guarantees ordering if single consumer
channel.basic_qos(prefetch_count=1) # Process one at a time
channel.basic_consume(
queue='orders',
on_message_callback=callback,
auto_ack=False
)
# Multiple consumers break ordering unless using consistent hashing
```
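If you need both ordering and parallelism in RabbitMQ, the consistent-hash exchange plugin shards by routing key so each key always lands on the same queue. A minimal sketch, assuming the `rabbitmq_consistent_hash_exchange` plugin is enabled and the `channel` from above:

```python
# Consistent-hash exchange: same routing key → same queue → per-key ordering
channel.exchange_declare(exchange='orders-hash', exchange_type='x-consistent-hash')

# With this exchange type, the binding routing key is a *weight*, not a pattern
channel.queue_bind(exchange='orders-hash', queue='orders-1', routing_key='1')
channel.queue_bind(exchange='orders-hash', queue='orders-2', routing_key='1')

# All messages for one user hash to one queue; one consumer per queue keeps order
channel.basic_publish(
    exchange='orders-hash',
    routing_key=f"user-{user_id}",  # hashed to pick the queue
    body=json.dumps(order)
)
```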
## Dead Letter Queues (DLQ)
### Retry Strategy with Exponential Backoff
**SQS with DLQ**:
```python
# Infrastructure setup
main_queue = sqs.create_queue(
QueueName='orders',
Attributes={
'RedrivePolicy': json.dumps({
'deadLetterTargetArn': dlq_arn,
'maxReceiveCount': '3' # After 3 failures → DLQ
}),
'VisibilityTimeout': '300'
}
)
# Consumer with retry logic
def process_with_retry(message):
attempt = int(message.attributes.get('ApproximateReceiveCount', 0))
try:
process_order(json.loads(message.body))
message.delete()
except RetriableError as e:
        # Exponential backoff (ApproximateReceiveCount starts at 1): 20s, 40s, 80s, ... capped at 300s
        backoff = min(300, 2 ** attempt * 10)
message.change_visibility(VisibilityTimeout=backoff)
logger.warning(f"Retriable error (attempt {attempt}), retry in {backoff}s")
except PermanentError as e:
# Send to DLQ immediately
logger.error(f"Permanent error: {e}")
send_to_dlq(message, error=str(e))
message.delete()
# Error classification
class RetriableError(Exception):
"""Network timeout, rate limit, DB unavailable"""
pass
class PermanentError(Exception):
"""Invalid data, missing field, business rule violation"""
pass
```
**Kafka DLQ Pattern**:
```python
from kafka import KafkaConsumer, KafkaProducer
consumer = KafkaConsumer('orders', group_id='processor', enable_auto_commit=False)
retry_producer = KafkaProducer(bootstrap_servers=['kafka:9092'])
dlq_producer = KafkaProducer(bootstrap_servers=['kafka:9092'])
def process_with_dlq(message):
    # kafka-python exposes headers as a list of (key, bytes) tuples
    headers = dict(message.headers or [])
    retry_count = int(headers.get('retry_count', b'0'))
    try:
        process_order(message.value)
        consumer.commit()
    except RetriableError as e:
        if retry_count < 3:
            # Send to retry topic with delay
            delay_minutes = 2 ** retry_count  # 1min, 2min, 4min
            retry_producer.send(
                f'orders-retry-{delay_minutes}min',
                value=message.value,
                headers=[('retry_count', str(retry_count + 1).encode())]
            )
        else:
            # Max retries → DLQ
            dlq_producer.send(
                'orders-dlq',
                value=message.value,
                headers=[('error', str(e).encode()),
                         ('retry_count', str(retry_count).encode())]
            )
        consumer.commit()  # Don't reprocess from main topic
    except PermanentError as e:
        # Immediate DLQ
        dlq_producer.send('orders-dlq', value=message.value,
                          headers=[('error', str(e).encode())])
        consumer.commit()
### DLQ Monitoring & Recovery
```python
# Alert on DLQ depth
def check_dlq_depth():
attrs = sqs.get_queue_attributes(
QueueUrl=dlq_url,
AttributeNames=['ApproximateNumberOfMessages']
)
depth = int(attrs['Attributes']['ApproximateNumberOfMessages'])
if depth > 10:
alert(f"DLQ has {depth} messages - investigate!")
# Manual recovery
def replay_from_dlq():
"""Fix root cause, then replay"""
messages = dlq.receive_messages(MaxNumberOfMessages=10)
for msg in messages:
data = json.loads(msg.body)
# Fix data issue
if 'customer_email' not in data:
data['customer_email'] = lookup_email(data['user_id'])
# Replay to main queue
main_queue.send_message(MessageBody=json.dumps(data))
msg.delete()
```
## Message Schema Evolution
### Versioning Strategies
**Pattern 1: Version field in message**:
```python
# v1 message
{
"version": "1.0",
"order_id": "123",
"amount": 99.99
}
# v2 message (added currency)
{
"version": "2.0",
"order_id": "123",
"amount": 99.99,
"currency": "USD"
}
# Consumer handles both versions
def process_order(message):
    if message['version'] == "1.0":
        amount = message['amount']
        currency = "USD"  # Default for v1
    elif message['version'] == "2.0":
        amount = message['amount']
        currency = message['currency']
    else:
        raise ValueError(f"Unsupported version: {message['version']}")
    return amount, currency
```
**Pattern 2: Apache Avro (Kafka best practice)**:
```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer, AvroConsumer
# Define schema ('currency' has a default, so the change is backward compatible;
# note: JSON does not allow comments inside the schema itself)
value_schema = avro.loads('''
{
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"}
    ]
}
''')
# Producer
producer = AvroProducer({
'bootstrap.servers': 'kafka:9092',
'schema.registry.url': 'http://schema-registry:8081'
}, default_value_schema=value_schema)
producer.produce(topic='orders', value={
'order_id': '123',
'amount': 99.99,
'currency': 'USD'
})
# Consumer automatically validates schema
consumer = AvroConsumer({
'bootstrap.servers': 'kafka:9092',
'group.id': 'processor',
'schema.registry.url': 'http://schema-registry:8081'
})
consumer.subscribe(['orders'])
msg = consumer.poll(1.0)  # value is deserialized and validated via the registry
```
**Avro Schema Evolution Rules**:
| Change | Compatible? | Notes |
|--------|-------------|-------|
| Add field with default | ✓ Backward compatible | Old consumers ignore new field |
| Remove field | ✓ Forward compatible | New consumers must handle missing field |
| Rename field | ❌ Breaking | Requires migration |
| Change field type | ❌ Breaking | Requires new topic or migration |
**Pattern 3: Protobuf (alternative to Avro)**:
```protobuf
syntax = "proto3";
message Order {
string order_id = 1;
double amount = 2;
string currency = 3; // New field, backward compatible
}
```
### Schema Registry (Kafka)
```
Producer → Schema Registry (validate) → Kafka
Consumer → Kafka → Schema Registry (deserialize)
Benefits:
- Centralized schema management
- Automatic validation
- Schema evolution enforcement
- Type safety
```
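You can also ask the registry whether a candidate schema is safe before deploying a producer. A hedged sketch against the Confluent Schema Registry REST API, assuming a registry at `schema-registry:8081` and a subject named `orders-value`:

```python
import json
import requests

candidate = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}

# Ask the registry whether the candidate is compatible with the latest version
resp = requests.post(
    "http://schema-registry:8081/compatibility/subjects/orders-value/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schema": json.dumps(candidate)},
)
print(resp.json())  # e.g. {"is_compatible": true}
```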
## Monitoring & Observability
### Key Metrics
| Metric | Alert Threshold | Why It Matters |
|--------|----------------|----------------|
| **Queue depth** | > 1000 (or 5min processing time) | Consumers can't keep up |
| **Consumer lag** (Kafka) | > 100k messages or > 5 min | Consumers falling behind |
| **DLQ depth** | > 10 | Messages failing repeatedly |
| **Processing time p99** | > 5 seconds | Slow processing blocks queue |
| **Error rate** | > 5% | Widespread failures |
| **Redelivery rate** | > 10% | Idempotency issues or transient errors |
### Consumer Lag Monitoring (Kafka)
```python
from kafka import KafkaAdminClient, KafkaConsumer, TopicPartition
admin = KafkaAdminClient(bootstrap_servers=['kafka:9092'])
def check_consumer_lag(group_id, topic):
    # Get committed offsets for the group
    committed = admin.list_consumer_group_offsets(group_id)
    # Get latest offsets (high-water marks)
    consumer = KafkaConsumer(bootstrap_servers=['kafka:9092'])
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]
    latest = consumer.end_offsets(partitions)
    # Calculate lag
    total_lag = 0
    for partition in partitions:
        committed_offset = committed[partition].offset
        latest_offset = latest[partition]
        lag = latest_offset - committed_offset
        total_lag += lag
        if lag > 10000:
            alert(f"Partition {partition.partition} lag: {lag}")
    return total_lag
# Alert if total lag > 100k
if check_consumer_lag('order-processor', 'orders') > 100000:
    alert("Consumer lag critical!")
```
### Distributed Tracing Across Queues
```python
from opentelemetry import trace
from opentelemetry.propagate import inject, extract
tracer = trace.get_tracer(__name__)
# Producer: Inject trace context
def publish_with_trace(topic, message):
    with tracer.start_as_current_span("publish-order") as span:
        headers = {}
        inject(headers)  # Inject trace context (str → str mapping)
        producer.send(
            topic,
            value=message,
            headers=[(k, v.encode()) for k, v in headers.items()]  # Kafka wants bytes values
        )
# Consumer: Extract trace context
def consume_with_trace(message):
    context = extract({k: v.decode() for k, v in (message.headers or [])})
    with tracer.start_as_current_span("process-order", context=context) as span:
        process_order(message.value)
        span.set_attribute("order.id", message.value['order_id'])
# Trace spans: API → Producer → Queue → Consumer → DB
# Shows end-to-end latency including queue wait time
```
## Backpressure & Circuit Breakers
### Rate Limiting Consumers
```python
import time
from collections import deque
class RateLimitedConsumer:
def __init__(self, max_per_second=100):
self.max_per_second = max_per_second
self.requests = deque()
def consume(self, message):
now = time.time()
# Remove requests older than 1 second
while self.requests and self.requests[0] < now - 1:
self.requests.popleft()
# Check rate limit
if len(self.requests) >= self.max_per_second:
sleep_time = 1 - (now - self.requests[0])
time.sleep(sleep_time)
self.requests.append(time.time())
process(message)
```
### Circuit Breaker for Downstream Dependencies
```python
import requests
from circuitbreaker import circuit, CircuitBreakerError
@circuit(failure_threshold=5, recovery_timeout=60)
def call_payment_service(order_id, amount):
response = requests.post(
'https://payment-service/charge',
json={'order_id': order_id, 'amount': amount},
timeout=5
)
if response.status_code >= 500:
raise ServiceUnavailableError()
return response.json()
def process_order(message):
try:
result = call_payment_service(message['order_id'], message['amount'])
# ... continue processing
except CircuitBreakerError:
# Circuit open - don't overwhelm failing service
logger.warning("Payment service circuit open, requeueing message")
raise RetriableError("Circuit breaker open")
```
## Advanced Patterns
### Outbox Pattern (Reliable Publishing)
**Problem**: How to atomically update database AND publish message?
```python
# ❌ WRONG: Dual write (can fail between DB and queue)
def create_order(data):
db.execute("INSERT INTO orders (...) VALUES (...)")
producer.send('orders', data) # ❌ If this fails, DB updated but no event
# ✓ CORRECT: Outbox pattern
def create_order_with_outbox(data):
with db.transaction():
# 1. Insert order
db.execute("INSERT INTO orders (id, user_id, amount) VALUES (%s, %s, %s)",
(data['id'], data['user_id'], data['amount']))
# 2. Insert into outbox (same transaction)
db.execute("INSERT INTO outbox (event_type, payload) VALUES (%s, %s)",
('order.created', json.dumps(data)))
# Separate process reads outbox and publishes
# Outbox processor (separate worker)
def process_outbox():
while True:
events = db.execute("SELECT * FROM outbox WHERE published_at IS NULL LIMIT 10")
for event in events:
try:
producer.send(event['event_type'], json.loads(event['payload']))
db.execute("UPDATE outbox SET published_at = NOW() WHERE id = %s", (event['id'],))
except Exception as e:
logger.error(f"Failed to publish event {event['id']}: {e}")
# Will retry on next iteration
time.sleep(1)
```
### Saga Pattern (Distributed Transactions)
See `microservices-architecture` skill for full saga patterns (choreography vs orchestration).
**Quick reference for message-based saga**:
```python
# Order saga coordinator publishes commands
def create_order_saga(order_data):
saga_id = str(uuid.uuid4())
# Step 1: Reserve inventory
producer.send('inventory-commands', {
'command': 'reserve',
'saga_id': saga_id,
'order_id': order_data['order_id'],
'items': order_data['items']
})
# Inventory service responds on 'inventory-events'
# If success → proceed to step 2
# If failure → compensate (cancel order)
```
## Security
### Message Encryption
**SQS**: Server-side encryption (SSE) with KMS
```python
sqs.create_queue(
QueueName='orders-encrypted',
Attributes={
'KmsMasterKeyId': 'alias/my-key', # AWS KMS
'KmsDataKeyReusePeriodSeconds': '300'
}
)
```
**Kafka**: Encryption in transit + at rest
```python
# SSL/TLS for in-transit encryption
producer = KafkaProducer(
bootstrap_servers=['kafka:9093'],
security_protocol='SSL',
ssl_cafile='/path/to/ca-cert',
ssl_certfile='/path/to/client-cert',
ssl_keyfile='/path/to/client-key'
)
# Encryption at rest (Kafka broker config)
# log.dirs=/encrypted-volume # Use encrypted EBS volumes
```
### Authentication & Authorization
**SQS**: IAM policies
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456789012:role/OrderService"},
"Action": ["sqs:SendMessage"],
"Resource": "arn:aws:sqs:us-east-1:123456789012:orders"
}]
}
```
**Kafka**: SASL/SCRAM authentication
```python
producer = KafkaProducer(
bootstrap_servers=['kafka:9093'],
security_protocol='SASL_SSL',
sasl_mechanism='SCRAM-SHA-512',
sasl_plain_username='order-service',
sasl_plain_password='secret'
)
```
**Kafka ACLs** (authorization):
```bash
# Grant order-service permission to write to orders topic
kafka-acls --bootstrap-server kafka:9092 --add \
--allow-principal User:order-service \
--operation Write \
--topic orders
```
## Testing Strategies
### Local Testing
**LocalStack for SQS/SNS**:
```yaml
# docker-compose.yml
services:
  localstack:
    image: localstack/localstack
    environment:
      - SERVICES=sqs,sns
```
```python
# Test code
import boto3
sqs = boto3.client(
    'sqs',
    endpoint_url='http://localhost:4566',  # LocalStack
    region_name='us-east-1'
)
queue_url = sqs.create_queue(QueueName='test-orders')['QueueUrl']
sqs.send_message(QueueUrl=queue_url, MessageBody='test')
```
**Kafka in Docker**:
```yaml
# docker-compose.yml
services:
zookeeper:
image: confluentinc/cp-zookeeper:latest
environment:
ZOOKEEPER_CLIENT_PORT: 2181
kafka:
image: confluentinc/cp-kafka:latest
ports:
- "9092:9092"
environment:
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1  # required for a single-broker setup
```
### Integration Testing
```python
import pytest
from testcontainers.kafka import KafkaContainer
@pytest.fixture
def kafka():
with KafkaContainer() as kafka:
yield kafka.get_bootstrap_server()
def test_order_processing(kafka):
producer = KafkaProducer(bootstrap_servers=kafka)
consumer = KafkaConsumer('orders', bootstrap_servers=kafka, auto_offset_reset='earliest')
# Publish message
producer.send('orders', value=b'{"order_id": "123"}')
producer.flush()
# Consume and verify
message = next(consumer)
assert json.loads(message.value)['order_id'] == '123'
```
### Chaos Engineering
```python
# Test consumer failure recovery
def test_consumer_crash_recovery():
# Start consumer
consumer_process = subprocess.Popen(['python', 'consumer.py'])
time.sleep(2)
# Publish message
producer.send('orders', value=test_order)
producer.flush()
# Kill consumer mid-processing
consumer_process.kill()
# Restart consumer
consumer_process = subprocess.Popen(['python', 'consumer.py'])
time.sleep(5)
# Verify message was reprocessed (idempotency!)
assert db.execute("SELECT COUNT(*) FROM orders WHERE id = %s", (test_order['id'],))[0] == 1
```
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **Auto-ack before processing** | Messages lost on crash | Manual ack after processing |
| **No idempotency** | Duplicates cause data corruption | Unique constraints, locks, or idempotency keys |
| **No DLQ** | Poison messages block queue | Configure DLQ with maxReceiveCount |
| **No monitoring** | Can't detect consumer lag or failures | Monitor lag, depth, error rate |
| **Synchronous message processing** | Low throughput | Batch processing, parallel consumers |
| **Large messages** | Exceeds queue limits, slow transfer | Store payload in S3, send a reference (claim check, sketched below) |
| **No schema versioning** | Breaking changes break consumers | Use Avro/Protobuf with schema registry |
| **Shared consumer instances** | Race conditions, duplicate processing | Use consumer groups (Kafka) or visibility timeout (SQS) |
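The "large messages" fix is the claim-check pattern: store the payload externally and publish only a pointer. A minimal sketch, assuming boto3, a pre-created `message-payloads` bucket, and the `producer` from earlier examples:

```python
import json
import uuid
import boto3

s3 = boto3.client('s3')
BUCKET = 'message-payloads'  # assumption: bucket already exists

def publish_large(topic, payload: bytes):
    # Claim check: upload the body, publish only the reference
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    producer.send(topic, value=json.dumps({'s3_bucket': BUCKET, 's3_key': key}).encode())

def consume_large(message):
    ref = json.loads(message.value)
    return s3.get_object(Bucket=ref['s3_bucket'], Key=ref['s3_key'])['Body'].read()
```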
## Technology-Specific Patterns
### RabbitMQ Exchanges
```python
# Topic exchange for routing
channel.exchange_declare(exchange='orders', exchange_type='topic')
# Bind queues with patterns
channel.queue_bind(exchange='orders', queue='us-orders', routing_key='order.us.*')
channel.queue_bind(exchange='orders', queue='eu-orders', routing_key='order.eu.*')
# Publish with routing key
channel.basic_publish(
exchange='orders',
routing_key='order.us.california', # Goes to us-orders queue
body=json.dumps(order)
)
# Fanout exchange for pub/sub
channel.exchange_declare(exchange='analytics', exchange_type='fanout')
# All bound queues receive every message
```
### Kafka Connect (Data Integration)
```json
{
"name": "mysql-source",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/mydb",
"table.whitelist": "orders",
"mode": "incrementing",
"incrementing.column.name": "id",
"topic.prefix": "mysql-"
}
}
```
**Use cases**:
- Stream DB changes to Kafka (CDC)
- Sink Kafka to Elasticsearch, S3, databases (example sink config below)
- No custom code needed for common integrations
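A sink works the same way in reverse. An illustrative S3 sink config, assuming the Confluent S3 sink connector is installed (connector name and bucket are placeholders):

```json
{
  "name": "orders-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "orders",
    "s3.bucket.name": "orders-archive",
    "s3.region": "us-east-1",
    "flush.size": "1000",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat"
  }
}
```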
## Batching Optimizations
### Batch Size Tuning
```python
from concurrent.futures import ThreadPoolExecutor
# SQS batch receiving (up to 10 messages)
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # Fetch up to 10 at once
    WaitTimeSeconds=20       # Long polling (reduces empty receives)
)
messages = response.get('Messages', [])
# Process in parallel
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(process, msg) for msg in messages]
    for future in futures:
        future.result()
# Kafka batch consuming
consumer = KafkaConsumer(
    'orders',
    max_poll_records=500,  # Fetch up to 500 messages per poll
    fetch_min_bytes=1024   # Wait for at least 1KB before returning
)
while True:
    batch = consumer.poll(timeout_ms=1000)  # {TopicPartition: [records]}
    for tp, records in batch.items():
        batch_process(records)  # Process up to 500 at once
```
**Batch size tradeoffs**:
| Batch Size | Throughput | Latency | Memory |
|------------|------------|---------|--------|
| 1 | Low | Low | Low |
| 10-100 | Medium | Medium | Medium |
| 500+ | High | High | High |
**Recommendation**: Start with 10-100, increase for higher throughput if latency allows.
## Cross-References
**Related skills**:
- **Microservices communication** → `microservices-architecture` (saga, event-driven)
- **FastAPI async** → `fastapi-development` (consuming queues in FastAPI)
- **REST vs async** → `rest-api-design` (when to use queues vs HTTP)
- **Security** → `ordis-security-architect` (encryption, IAM, compliance)
- **Testing** → `api-testing` (integration testing strategies)
## Further Reading
- **Enterprise Integration Patterns** by Gregor Hohpe (message patterns)
- **Designing Data-Intensive Applications** by Martin Kleppmann (Kafka internals)
- **RabbitMQ in Action** by Alvaro Videla
- **Kafka: The Definitive Guide** by Neha Narkhede, Gwen Shapira, and Todd Palino
- **AWS SQS Best Practices**: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-best-practices.html

View File

@@ -0,0 +1,592 @@
# Microservices Architecture
## Overview
**Microservices architecture specialist covering service boundaries, communication patterns, data consistency, and operational concerns.**
**Core principle**: Microservices decompose applications into independently deployable services organized around business capabilities - enabling team autonomy and technology diversity at the cost of operational complexity and distributed system challenges.
## When to Use This Skill
Use when encountering:
- **Service boundaries**: Defining service scope, applying domain-driven design
- **Monolith decomposition**: Strategies for splitting existing systems
- **Data consistency**: Sagas, event sourcing, eventual consistency patterns
- **Communication**: Sync (REST/gRPC) vs async (events/messages)
- **API gateways**: Routing, authentication, rate limiting
- **Service discovery**: Registry patterns, DNS, configuration
- **Resilience**: Circuit breakers, retries, timeouts, bulkheads
- **Observability**: Distributed tracing, logging aggregation, metrics
- **Deployment**: Containers, orchestration, blue-green deployments
**Do NOT use for**:
- Monolithic architectures (microservices aren't always better)
- Single-team projects with fewer than ~5 services (overhead exceeds benefits)
- Simple CRUD applications (microservices add unnecessary complexity)
## When NOT to Use Microservices
**Stay monolithic if**:
- Team < 10 engineers
- Domain is not well understood yet
- Strong consistency required everywhere
- Network latency is critical
- You can't invest in observability/DevOps infrastructure
**Microservices require**: Mature DevOps, monitoring, distributed systems expertise, organizational support.
## Service Boundary Patterns (Domain-Driven Design)
### 1. Bounded Contexts
**Pattern: One microservice = One bounded context**
```
❌ Too fine-grained (anemic services):
- UserService (just CRUD)
- OrderService (just CRUD)
- PaymentService (just CRUD)
✅ Business capability alignment:
- CustomerManagementService (user profiles, preferences, history)
- OrderFulfillmentService (order lifecycle, inventory, shipping)
- PaymentProcessingService (payment, billing, invoicing, refunds)
```
**Identifying boundaries**:
1. **Ubiquitous language** - Different terms for same concept = different contexts
2. **Change patterns** - Services that change together should stay together
3. **Team ownership** - One team should own one service
4. **Data autonomy** - Each service owns its data, no shared databases
### 2. Strategic DDD Patterns
| Pattern | Use When | Example |
|---------|----------|---------|
| **Separate Ways** | Contexts are independent | Analytics service, main app service |
| **Partnership** | Teams must collaborate closely | Order + Inventory services |
| **Customer-Supplier** | Upstream/downstream relationship | Payment gateway (upstream) → Order service |
| **Conformist** | Accept upstream model as-is | Third-party API integration |
| **Anti-Corruption Layer** | Isolate from legacy/external systems | ACL between new microservices and legacy monolith |
### 3. Service Sizing Guidelines
**Too small (Nanoservices)**:
- Excessive network calls
- Distributed monolith
- Coordination overhead exceeds benefits
**Too large (Minimonoliths)**:
- Multiple teams modifying same service
- Mixed deployment frequencies
- Tight coupling re-emerges
**Right size indicators**:
- Single team can own it
- Deployable independently
- Changes don't ripple to other services
- Clear business capability
- 100-10,000 LOC (highly variable)
## Communication Patterns
### Synchronous Communication
**REST APIs**:
```python
# Order service calling Payment service
async def create_order(order: Order):
# Synchronous REST call
payment = await payment_service.charge(
amount=order.total,
customer_id=order.customer_id
)
if payment.status == "success":
order.status = "confirmed"
await db.save(order)
return order
else:
raise PaymentFailedException()
```
**Pros**: Simple, request-response, easy to debug
**Cons**: Tight coupling, availability dependency, latency cascades
**gRPC**:
```protobuf
// Proto definition
service OrderService {
  rpc CreateOrder (OrderRequest) returns (OrderResponse);
}
```
```python
# Implementation
class OrderServicer(order_pb2_grpc.OrderServiceServicer):
async def CreateOrder(self, request, context):
# Type-safe, efficient binary protocol
payment = await payment_stub.Charge(
PaymentRequest(amount=request.total)
)
return OrderResponse(order_id=order.id)
```
**Pros**: Type-safe, efficient, streaming support
**Cons**: HTTP/2 required, less human-readable, proto dependencies
### Asynchronous Communication
**Event-Driven (Pub/Sub)**:
```python
# Order service publishes event
await event_bus.publish("order.created", {
"order_id": order.id,
"customer_id": customer.id,
"total": order.total
})
# Inventory service subscribes
@event_bus.subscribe("order.created")
async def reserve_inventory(event):
await inventory.reserve(event["order_id"])
await event_bus.publish("inventory.reserved", {...})
# Notification service subscribes
@event_bus.subscribe("order.created")
async def send_confirmation(event):
await email.send_order_confirmation(event)
```
**Pros**: Loose coupling, services independent, scalable
**Cons**: Eventual consistency, harder to trace, ordering challenges
**Message Queues (Point-to-Point)**:
```python
# Producer
await queue.send("payment-processing", {
"order_id": order.id,
"amount": order.total
})
# Consumer
@queue.consumer("payment-processing")
async def process_payment(message):
result = await payment_gateway.charge(message["amount"])
if result.success:
await message.ack()
else:
await message.nack(requeue=True)
```
**Pros**: Guaranteed delivery, work distribution, retry handling
**Cons**: Queue becomes bottleneck, requires message broker
### Communication Pattern Decision Matrix
| Scenario | Pattern | Why |
|----------|---------|-----|
| User-facing request/response | Sync (REST/gRPC) | Low latency, immediate feedback |
| Background processing | Async (queue) | Don't block user, retry support |
| Cross-service notifications | Async (pub/sub) | Loose coupling, multiple consumers |
| Real-time updates | WebSocket/SSE | Bidirectional, streaming |
| Data replication | Event sourcing | Audit trail, rebuild state |
| High throughput | Async (messaging) | Buffer spikes, backpressure |
## Data Consistency Patterns
### 1. Saga Pattern (Distributed Transactions)
**Choreography (Event-Driven)**:
```python
# Order Service
async def create_order(order):
order.status = "pending"
await db.save(order)
await events.publish("order.created", order)
# Payment Service
@events.subscribe("order.created")
async def handle_order(event):
try:
await charge_customer(event["total"])
await events.publish("payment.completed", event)
except PaymentError:
await events.publish("payment.failed", event)
# Inventory Service
@events.subscribe("payment.completed")
async def reserve_items(event):
try:
await reserve(event["items"])
await events.publish("inventory.reserved", event)
except InventoryError:
await events.publish("inventory.failed", event)
# Order Service (Compensation)
@events.subscribe("payment.failed")
async def cancel_order(event):
order = await db.get(event["order_id"])
order.status = "cancelled"
await db.save(order)
@events.subscribe("inventory.failed")
async def refund_payment(event):
await payment.refund(event["order_id"])
await cancel_order(event)
```
**Orchestration (Coordinator)**:
```python
class OrderSaga:
def __init__(self, order):
self.order = order
self.completed_steps = []
async def execute(self):
try:
# Step 1: Reserve inventory
await self.reserve_inventory()
self.completed_steps.append("inventory")
# Step 2: Process payment
await self.process_payment()
self.completed_steps.append("payment")
# Step 3: Confirm order
await self.confirm_order()
except Exception as e:
# Compensate in reverse order
await self.compensate()
raise
async def compensate(self):
for step in reversed(self.completed_steps):
if step == "inventory":
await inventory_service.release(self.order.id)
elif step == "payment":
await payment_service.refund(self.order.id)
```
**Choreography vs Orchestration**:
| Aspect | Choreography | Orchestration |
|--------|--------------|---------------|
| Coordination | Decentralized (events) | Centralized (orchestrator) |
| Coupling | Loose | Tight to orchestrator |
| Complexity | Distributed across services | Concentrated in orchestrator |
| Tracing | Harder (follow events) | Easier (single coordinator) |
| Failure handling | Implicit (event handlers) | Explicit (orchestrator logic) |
| Best for | Simple workflows | Complex workflows |
### 2. Event Sourcing
**Pattern: Store events, not state**
```python
# Traditional approach (storing state)
class Order:
id: int
status: str # "pending" → "confirmed" → "shipped"
total: float
# Event sourcing (storing events)
class OrderCreated(Event):
order_id: int
total: float
class OrderConfirmed(Event):
order_id: int
class OrderShipped(Event):
order_id: int
# Rebuild state from events
def rebuild_order(order_id):
events = event_store.get_events(order_id)
order = Order()
for event in events:
order.apply(event) # Apply each event to rebuild state
return order
```
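The `apply` method is where each event mutates state deterministically. A minimal concrete variant of the `Order` class above, assuming the event classes expose their fields as attributes:

```python
class Order:
    def __init__(self):
        self.id = None
        self.status = None
        self.total = 0.0

    def apply(self, event):
        # Deterministic state transition per event type
        if isinstance(event, OrderCreated):
            self.id = event.order_id
            self.total = event.total
            self.status = "pending"
        elif isinstance(event, OrderConfirmed):
            self.status = "confirmed"
        elif isinstance(event, OrderShipped):
            self.status = "shipped"
```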
**Pros**: Complete audit trail, time travel, event replay
**Cons**: Complexity, eventual consistency, schema evolution challenges
### 3. CQRS (Command Query Responsibility Segregation)
**Separate read and write models**:
```python
# Write model (commands)
class CreateOrder:
    async def execute(self, data):
        order = Order(**data)
        await db.save(order)
        await event_bus.publish("order.created", order)
# Read model (projections)
class OrderReadModel:
# Denormalized for fast reads
def __init__(self):
self.cache = {}
@event_bus.subscribe("order.created")
async def on_order_created(self, event):
self.cache[event["order_id"]] = {
"id": event["order_id"],
"customer_name": await get_customer_name(event["customer_id"]),
"status": "pending",
"total": event["total"]
}
def get_order(self, order_id):
return self.cache.get(order_id) # Fast read, no joins
```
**Use when**: Read/write patterns differ significantly (e.g., analytics dashboards)
## Resilience Patterns
### 1. Circuit Breaker
```python
from circuitbreaker import circuit
@circuit(failure_threshold=5, recovery_timeout=60)
async def call_payment_service(amount):
response = await http.post("http://payment-service/charge", json={"amount": amount})
if response.status >= 500:
raise PaymentServiceError()
return response.json()
# Circuit states:
# CLOSED → normal operation
# OPEN → fails fast after threshold
# HALF_OPEN → test if service recovered
```
### 2. Retry with Exponential Backoff
```python
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def call_with_retry(url):
return await http.get(url)
# Retries: 2s → 4s → 8s
```
### 3. Timeout
```python
import asyncio
async def call_with_timeout(url):
try:
return await asyncio.wait_for(
http.get(url),
timeout=5.0 # 5 second timeout
)
except asyncio.TimeoutError:
return {"error": "Service timeout"}
```
### 4. Bulkhead
**Isolate resources to prevent cascade failures**:
```python
# Separate thread pools for different services
payment_pool = ThreadPoolExecutor(max_workers=10)
inventory_pool = ThreadPoolExecutor(max_workers=5)
async def call_payment():
return await asyncio.get_event_loop().run_in_executor(
payment_pool,
payment_service.call
)
# If payment service is slow, it only exhausts payment_pool,
# inventory calls still work
```
## API Gateway Pattern
**Centralized entry point for client requests**:
```
Client → API Gateway → [Order, Payment, Inventory services]
```
**Responsibilities**:
- Routing requests to services
- Authentication/authorization
- Rate limiting
- Request/response transformation
- Caching
- Logging/monitoring
**Example (Kong, AWS API Gateway, Nginx)**:
```yaml
# API Gateway config
routes:
- path: /orders
service: order-service
auth: jwt
ratelimit: 100/minute
- path: /payments
service: payment-service
auth: oauth2
ratelimit: 50/minute
```
**Backend for Frontend (BFF) Pattern**:
```
Web Client → Web BFF → Services
Mobile App → Mobile BFF → Services
```
Each client type gets its own optimized gateway.
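For instance, a mobile BFF might fan out to several services and return only what the screen needs. A hedged sketch, assuming an async `http` client that returns parsed JSON and illustrative service URLs:

```python
import asyncio

async def mobile_order_summary(order_id: str):
    # Aggregate three services into one compact, mobile-shaped payload
    order, payment, shipping = await asyncio.gather(
        http.get(f"http://order-service/orders/{order_id}"),
        http.get(f"http://payment-service/payments?order_id={order_id}"),
        http.get(f"http://shipping-service/shipments?order_id={order_id}"),
    )
    return {
        "id": order["id"],
        "status": order["status"],
        "paid": payment["status"] == "captured",
        "eta": shipping.get("eta"),
    }
```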
## Service Discovery
### 1. Client-Side Discovery
```python
# Service registry (Consul, Eureka)
registry = ServiceRegistry("http://consul:8500")
# Client looks up service
instances = registry.get_instances("payment-service")
instance = load_balancer.choose(instances)
response = await http.get(f"http://{instance.host}:{instance.port}/charge")
```
### 2. Server-Side Discovery (Load Balancer)
```
Client → Load Balancer → [Service Instance 1, Instance 2, Instance 3]
```
**DNS-based**: Kubernetes services, AWS ELB
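As a concrete example of DNS-based discovery, a Kubernetes Service gives a set of instances one stable name. A minimal sketch, assuming a Deployment whose pods are labeled `app: payment-service`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service   # matches the Deployment's pod labels
  ports:
    - port: 80
      targetPort: 8000
# Clients just call http://payment-service; kube-proxy load-balances across pods
```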
## Observability
### Distributed Tracing
```python
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
async def create_order(order):
with tracer.start_as_current_span("create-order") as span:
span.set_attribute("order.id", order.id)
span.set_attribute("order.total", order.total)
        # Trace propagates to payment service
        payment = await payment_service.charge(
            amount=order.total,
            trace_context=span.get_span_context()
        )
span.add_event("payment-completed")
return order
```
**Tools**: Jaeger, Zipkin, AWS X-Ray, Datadog APM
### Log Aggregation
**Structured logging with correlation IDs**:
```python
import logging
import uuid
logger = logging.getLogger(__name__)
async def handle_request(request):
correlation_id = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
logger.info("Processing request", extra={
"correlation_id": correlation_id,
"service": "order-service",
"user_id": request.user_id
})
```
**Tools**: ELK stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog
## Monolith Decomposition Strategies
### 1. Strangler Fig Pattern
**Gradually replace monolith with microservices**:
```
Phase 1: Monolith handles everything
Phase 2: Extract service, proxy some requests to it
Phase 3: More services extracted, proxy more requests
Phase 4: Monolith retired
```
### 2. Branch by Abstraction
1. Create abstraction layer in monolith
2. Implement new service
3. Gradually migrate code behind abstraction
4. Remove old implementation
5. Extract as microservice
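A minimal sketch of that abstraction layer, with hypothetical names (`monolith_billing`, `http`, `flags`) standing in for the in-process code, an HTTP client, and a feature-flag service:

```python
from abc import ABC, abstractmethod

class BillingGateway(ABC):
    """Abstraction both implementations conform to."""
    @abstractmethod
    def charge(self, order_id: str, amount: float): ...

class LegacyBilling(BillingGateway):
    def charge(self, order_id, amount):
        return monolith_billing.charge(order_id, amount)  # old in-process path

class BillingServiceClient(BillingGateway):
    def charge(self, order_id, amount):
        return http.post("http://billing-service/charge",
                         json={"order_id": order_id, "amount": amount})

def get_billing() -> BillingGateway:
    # Migrate gradually: flip the flag per tenant or percentage
    return BillingServiceClient() if flags.enabled("new-billing") else LegacyBilling()
```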
### 3. Extract by Bounded Context
Priority order:
1. Services with clear boundaries (authentication, payments)
2. Services changing frequently
3. Services with different scaling needs
4. Services with technology mismatches (e.g., Java monolith, Python ML service)
## Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| **Distributed Monolith** | Services share database, deploy together | One DB per service, independent deployment |
| **Nanoservices** | Too fine-grained, excessive network calls | Merge related services, follow DDD |
| **Shared Database** | Tight coupling, schema changes break multiple services | Database per service |
| **Synchronous Chains** | A→B→C→D, latency adds up, cascading failures | Async events, parallelize where possible |
| **Chatty Services** | N+1 calls, excessive network overhead | Batch APIs, caching, coarser boundaries |
| **No Circuit Breakers** | Cascading failures bring down system | Circuit breakers + timeouts + retries |
| **No Distributed Tracing** | Impossible to debug cross-service issues | OpenTelemetry, correlation IDs |
## Cross-References
**Related skills**:
- **Message queues** → `message-queues` (RabbitMQ, Kafka patterns)
- **REST APIs** → `rest-api-design` (service interface design)
- **gRPC** → Check if gRPC skill exists
- **Security** → `ordis-security-architect` (service-to-service auth, zero trust)
- **Database** → `database-integration` (per-service databases, migrations)
- **Testing** → `api-testing` (contract testing, integration testing)
## Further Reading
- **Building Microservices** by Sam Newman
- **Domain-Driven Design** by Eric Evans
- **Release It!** by Michael Nygard (resilience patterns)
- **Microservices Patterns** by Chris Richardson

View File

@@ -0,0 +1,523 @@
# REST API Design
## Overview
**REST API design specialist covering resource modeling, HTTP semantics, versioning, pagination, and API evolution.**
**Core principle**: REST is an architectural style based on resources, HTTP semantics, and stateless communication. Good REST API design makes resources discoverable, operations predictable, and evolution manageable.
## When to Use This Skill
Use when encountering:
- **Resource modeling**: Designing URL structures, choosing singular vs plural, handling relationships
- **HTTP methods**: GET, POST, PUT, PATCH, DELETE semantics and idempotency
- **Status codes**: Choosing correct 2xx, 4xx, 5xx codes
- **Versioning**: URI vs header versioning, managing API evolution
- **Pagination**: Offset, cursor, or page-based pagination strategies
- **Filtering/sorting**: Query parameter design for collections
- **Error responses**: Standardized error formats
- **HATEOAS**: Hypermedia-driven APIs and discoverability
**Do NOT use for**:
- GraphQL API design → `graphql-api-design`
- Framework-specific implementation → `fastapi-development`, `django-development`, `express-development`
- Authentication patterns → `api-authentication`
## Quick Reference - HTTP Methods
| Method | Semantics | Idempotent? | Safe? | Request Body | Response Body |
|--------|-----------|-------------|-------|--------------|---------------|
| GET | Retrieve resource | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes |
| POST | Create resource | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
| PUT | Replace resource | ✅ Yes | ❌ No | ✅ Yes | ✅ Optional |
| PATCH | Partial update | ❌ No* | ❌ No | ✅ Yes | ✅ Optional |
| DELETE | Remove resource | ✅ Yes | ❌ No | Optional (rare) | ✅ Optional |
| HEAD | Retrieve headers | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
| OPTIONS | Supported methods | ✅ Yes | ✅ Yes | ❌ No | ✅ Yes |
*PATCH can be designed to be idempotent but often isn't
## Quick Reference - Status Codes
| Code | Meaning | Use When |
|------|---------|----------|
| 200 OK | Success | GET, PUT, PATCH succeeded with response body |
| 201 Created | Resource created | POST created new resource |
| 202 Accepted | Async processing | Request accepted, processing continues async |
| 204 No Content | Success, no body | DELETE succeeded, PUT/PATCH succeeded without response |
| 400 Bad Request | Invalid input | Validation failed, malformed request |
| 401 Unauthorized | Authentication failed | Missing or invalid credentials |
| 403 Forbidden | Authorization failed | User authenticated but lacks permission |
| 404 Not Found | Resource missing | Resource doesn't exist |
| 409 Conflict | State conflict | Resource already exists, version conflict |
| 422 Unprocessable Entity | Semantic error | Valid syntax but business logic failed |
| 429 Too Many Requests | Rate limited | User exceeded rate limit |
| 500 Internal Server Error | Server error | Unexpected server failure |
| 503 Service Unavailable | Temporary outage | Maintenance, overload |
## Resource Modeling Patterns
### 1. URL Structure
**✅ Good patterns**:
```
GET /users # List users
POST /users # Create user
GET /users/{id} # Get specific user
PUT /users/{id} # Replace user
PATCH /users/{id} # Update user
DELETE /users/{id} # Delete user
GET /users/{id}/orders # User's orders (nested resource)
POST /users/{id}/orders # Create order for user
GET /orders/{id} # Get specific order (top-level for direct access)
GET /search/users?q=john # Search endpoint
```
**❌ Anti-patterns**:
```
GET /getUsers # Verb in URL (use HTTP method instead)
POST /users/create # Redundant verb
GET /users/123/delete # DELETE operation via GET
POST /api?action=createUser # RPC-style, not REST
GET /users/{id}/orders/{id} # Ambiguous - which {id}?
```
### 2. Singular vs Plural
**Convention: Use plural for collections, even for single-item endpoints**
```
✅ /users/{id} # Consistent plural
✅ /orders/{id} # Consistent plural
❌ /user/{id} # Inconsistent singular
❌ /users/{id}/order/{id} # Mixed singular/plural
```
**Exception**: Non-countable resources can be singular
```
✅ /me # Current user context
✅ /config # Application config (single resource)
✅ /health # Health check endpoint
```
### 3. Nested Resources vs Top-Level
**Nested when showing relationship**:
```
GET /users/{userId}/orders # "Orders belonging to this user"
POST /users/{userId}/orders # "Create order for this user"
```
**Top-level when resource has independent identity**:
```
GET /orders/{orderId} # Direct access to order
DELETE /orders/{orderId} # Delete order directly
```
**Guidelines**:
- Nest ≤ 2 levels deep (`/users/{userId}/orders/{orderId}` is max; name each parameter distinctly)
- Provide top-level access for resources that exist independently
- Use query parameters for filtering instead of deep nesting
```
✅ GET /orders?userId=123 # Better than /users/123/orders/{id}
❌ GET /users/{id}/orders/{id}/items/{id} # Too deep
```
## Pagination Patterns
### Offset Pagination
**Good for**: Small datasets, page numbers, SQL databases
```
GET /users?limit=20&offset=40
Response:
{
"data": [...],
"pagination": {
"limit": 20,
"offset": 40,
"total": 1000,
"hasMore": true
}
}
```
**Pros**: Simple, allows jumping to any page
**Cons**: Performance degrades with large offsets, inconsistent with concurrent modifications
### Cursor Pagination
**Good for**: Large datasets, real-time data, NoSQL databases
```
GET /users?limit=20&after=eyJpZCI6MTIzfQ
Response:
{
"data": [...],
"pagination": {
"nextCursor": "eyJpZCI6MTQzfQ",
"hasMore": true
}
}
```
**Pros**: Consistent results, efficient for large datasets
**Cons**: Can't jump to arbitrary page, cursors are opaque
### Page-Based Pagination
**Good for**: UIs with page numbers
```
GET /users?page=3&pageSize=20
Response:
{
"data": [...],
"pagination": {
"page": 3,
"pageSize": 20,
"totalPages": 50,
"totalCount": 1000
}
}
```
**Choice matrix**:
| Use Case | Pattern |
|----------|---------|
| Admin dashboards, small datasets | Offset or Page |
| Infinite scroll feeds | Cursor |
| Real-time data (chat, notifications) | Cursor |
| Need page numbers in UI | Page |
| Large datasets (millions of rows) | Cursor |
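A minimal cursor implementation sketch using keyset pagination, assuming a base64-encoded JSON cursor over an indexed `id` column and a hypothetical `db` helper that returns rows as dicts:

```python
import base64
import json

def encode_cursor(last_id):
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]

def list_users(limit=20, after=None):
    last_id = decode_cursor(after) if after else 0
    # Keyset pagination: cost stays O(limit) no matter how deep the client pages
    rows = db.execute(
        "SELECT id, name FROM users WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, limit + 1),  # fetch one extra row to compute hasMore
    )
    has_more = len(rows) > limit
    rows = rows[:limit]
    return {
        "data": rows,
        "pagination": {
            "nextCursor": encode_cursor(rows[-1]["id"]) if rows else None,
            "hasMore": has_more,
        },
    }
```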
## Filtering and Sorting
### Query Parameter Conventions
```
GET /users?status=active&role=admin # Simple filtering
GET /users?createdAfter=2024-01-01 # Date filtering
GET /users?search=john # Full-text search
GET /users?sort=createdAt&order=desc # Sorting
GET /users?sort=-createdAt # Alternative: prefix for descending
GET /users?fields=id,name,email # Sparse fieldsets
GET /users?include=orders,profile # Relationship inclusion
```
### Advanced Filtering Patterns
**LHS Brackets (Rails-style)**:
```
GET /users?filter[status]=active&filter[role]=admin
```
**RHS Colon (JSON API style)**:
```
GET /users?filter=status:active,role:admin
```
**Comparison operators**:
```
GET /products?price[gte]=100&price[lte]=500 # Price between 100-500
GET /users?createdAt[gt]=2024-01-01 # Created after date
```
## API Versioning Strategies
### 1. URI Versioning
```
GET /v1/users
GET /v2/users
```
**Pros**: Explicit, easy to route, clear in logs
**Cons**: Violates REST principles (resource identity changes), URL proliferation
**Best for**: Public APIs, major breaking changes
### 2. Header Versioning
```
GET /users
Accept: application/vnd.myapi.v2+json
```
**Pros**: Clean URLs, follows REST principles
**Cons**: Less visible, harder to test in browser
**Best for**: Internal APIs, clients with header control
### 3. Query Parameter Versioning
```
GET /users?version=2
```
**Pros**: Easy to test, optional (can default to latest)
**Cons**: Pollutes query parameters, not semantic
**Best for**: Minor version variants, opt-in features
### Version Deprecation Process
1. **Announce**: Document deprecation timeline (6-12 months recommended)
2. **Warn**: Add `Deprecated` header to responses
3. **Sunset**: Add `Sunset` header with end date (RFC 8594)
4. **Migrate**: Provide migration guides and tooling
5. **Remove**: After sunset date, return 410 Gone
```
HTTP/1.1 200 OK
Deprecated: true
Sunset: Sat, 31 Dec 2024 23:59:59 GMT
Link: </v2/users>; rel="successor-version"
```
## Error Response Format
**Standard JSON error format**:
```json
{
"error": {
"code": "VALIDATION_ERROR",
"message": "One or more fields failed validation",
"details": [
{
"field": "email",
"message": "Invalid email format",
"code": "INVALID_FORMAT"
},
{
"field": "age",
"message": "Must be at least 18",
"code": "OUT_OF_RANGE"
}
],
"requestId": "req_abc123",
"timestamp": "2024-11-14T10:30:00Z"
}
}
```
**Problem Details (RFC 7807)**:
```json
{
"type": "https://api.example.com/errors/validation-error",
"title": "Validation Error",
"status": 400,
"detail": "The request body contains invalid data",
"instance": "/users",
"invalid-params": [
{
"name": "email",
"reason": "Invalid email format"
}
]
}
```
## HATEOAS (Hypermedia)
**Level 3 REST includes hypermedia links**:
```json
{
"id": 123,
"name": "John Doe",
"status": "active",
"_links": {
"self": { "href": "/users/123" },
"orders": { "href": "/users/123/orders" },
"deactivate": {
"href": "/users/123/deactivate",
"method": "POST"
}
}
}
```
**Benefits**:
- Self-documenting API
- Clients discover available actions
- Server controls workflow
- Reduces client-server coupling
**Tradeoffs**:
- Increased response size
- Complexity for simple APIs
- Limited client library support
**When to use**: Complex workflows, long-lived APIs, discoverability requirements
## Idempotency Keys
**For POST operations that should be safely retryable**:
```
POST /orders
Idempotency-Key: key_abc123xyz
{
"items": [...],
"total": 99.99
}
```
**Server behavior**:
1. First request with key → Process and store result
2. Duplicate request with same key → Return stored result (do not reprocess)
3. Different request with same key → Return 409 Conflict
**Implementation**:
```python
import json
from fastapi import Header
from fastapi.responses import JSONResponse

@app.post("/orders")
def create_order(order: Order, idempotency_key: str = Header(None)):
    if idempotency_key:
        # Check if key was used before (FastAPI maps this to the Idempotency-Key header)
        cached = redis.get(f"idempotency:{idempotency_key}")
        if cached:
            return JSONResponse(content=json.loads(cached), status_code=200)
    # Process order
    result = process_order(order)
    if idempotency_key:
        # Cache result for 24 hours
        redis.setex(f"idempotency:{idempotency_key}", 86400, json.dumps(result))
    return result
```
## API Evolution Patterns
### Adding Fields (Non-Breaking)
**✅ Safe changes**:
- Add optional request fields
- Add response fields
- Add new endpoints
- Add new query parameters
**Client requirements**: Ignore unknown fields
### Removing Fields (Breaking)
**Strategies**:
1. **Deprecation period**: Mark field as deprecated, remove in next major version
2. **Versioning**: Create v2 without field
3. **Optional → Required**: Never safe, always breaking
### Changing Field Types (Breaking)
**❌ Breaking**:
- String → Number
- Number → String
- Boolean → String
- Flat → Nested object
**✅ Non-breaking**:
- Widening a numeric range (e.g., int32 → int64)
- Adding nullability (required → optional)
**Strategy**: Add new field with correct type, deprecate old field
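For example (illustrative field names), serve both fields during the transition:

```json
{
  "price": "99.99",
  "priceAmount": 99.99
}
```

Old clients keep reading the deprecated string `price`; new clients use the numeric `priceAmount` until `price` is removed in the next major version.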
## Richardson Maturity Model
| Level | Description | Example |
|-------|-------------|---------|
| 0 | POX (Plain Old XML) | Single endpoint, all operations via POST |
| 1 | Resources | Multiple endpoints, still using POST for everything |
| 2 | HTTP Verbs | Proper HTTP methods (GET, POST, PUT, DELETE) |
| 3 | Hypermedia (HATEOAS) | Responses include links to related resources |
**Most APIs target Level 2** (HTTP verbs + status codes).
**Level 3 is optional** but valuable for complex domains.
## Common Anti-Patterns
| Anti-Pattern | Why Bad | Fix |
|--------------|---------|-----|
| Verbs in URLs (`/createUser`) | Not RESTful, redundant with HTTP methods | Use POST /users |
| GET with side effects | Violates HTTP semantics, not safe | Use POST/PUT/DELETE |
| POST for everything | Loses HTTP semantics, not idempotent | Use appropriate method |
| 200 for errors | Breaks HTTP contract | Use correct 4xx/5xx codes |
| Deeply nested URLs | Hard to navigate, brittle | Max 2 levels, use query params |
| Binary response flags | Unclear semantics | Use proper HTTP status codes |
| Timestamps without timezone | Ambiguous | Use ISO 8601 with timezone |
| Pagination without total | Can't show "Page X of Y" | Include total count or hasMore |
## Best Practices Checklist
**Resource Design**:
- [ ] Resources are nouns, not verbs
- [ ] Plural names for collections
- [ ] Max 2 levels of nesting
- [ ] Consistent naming conventions (snake_case or camelCase)
**HTTP Semantics**:
- [ ] Correct HTTP methods for operations
- [ ] Proper status codes (not just 200/500)
- [ ] Idempotent operations are actually idempotent
- [ ] GET/HEAD have no side effects
**API Evolution**:
- [ ] Versioning strategy defined
- [ ] Backward compatibility maintained within version
- [ ] Deprecation headers for sunset features
- [ ] Migration guides for breaking changes
**Error Handling**:
- [ ] Consistent error response format
- [ ] Detailed field-level validation errors
- [ ] Request IDs for tracing
- [ ] Human-readable error messages
**Performance**:
- [ ] Pagination for large collections
- [ ] ETags for caching
- [ ] Gzip compression enabled
- [ ] Rate limiting implemented
## Cross-References
**Related skills**:
- **GraphQL alternative** → `graphql-api-design`
- **FastAPI implementation** → `fastapi-development`
- **Django implementation** → `django-development`
- **Express implementation** → `express-development`
- **Authentication** → `api-authentication`
- **API testing** → `api-testing`
- **API documentation** → `api-documentation` or `muna-technical-writer`
- **Security** → `ordis-security-architect` (OWASP API Security)
## Further Reading
- **REST Dissertation**: Roy Fielding's original thesis
- **RFC 7807**: Problem Details for HTTP APIs
- **RFC 8594**: Sunset HTTP Header
- **JSON:API**: Opinionated REST specification
- **OpenAPI 3.0**: API documentation standard