Data Validation Checklist

Comprehensive checklist for implementing robust data validation in TypeScript (Zod) and Python (Pydantic) applications.

Pre-Validation Setup

  • Identify all input sources (API requests, forms, file uploads, external APIs)
  • Choose validation library (Zod for TypeScript, Pydantic for Python)
  • Define validation strategy (fail-fast vs collect all errors)
  • Set up error handling (consistent error response format)
  • Document validation requirements (business rules, constraints)

TypeScript + Zod Validation

Schema Definition

  • All API endpoints have Zod schemas defined
  • Schema types exported for use in frontend
  • Schemas colocated with route handlers or in shared location
  • Schema composition used (z.object, z.array, z.union)
  • Reusable schemas extracted (common patterns like email, UUID)

Basic Validations

  • String validations applied:

    • .min() for minimum length
    • .max() for maximum length
    • .email() for email addresses
    • .url() for URLs
    • .uuid() for UUIDs
    • .regex() for custom patterns
    • .trim() to remove leading and trailing whitespace
  • Number validations applied:

    • .int() for integers
    • .positive() for positive numbers
    • .min() / .max() for ranges
    • .finite() to exclude Infinity and -Infinity (z.number() already rejects NaN)
  • Array validations applied:

    • .min() for minimum items
    • .max() for maximum items
    • .nonempty() for required arrays
  • Date validations applied:

    • .min() for earliest date
    • .max() for latest date
    • Proper date parsing (z.coerce.date())

Advanced Validations

  • Custom refinements for complex rules:

    z.object({
      password: z.string(),
      confirmPassword: z.string()
    }).refine(data => data.password === data.confirmPassword, {
      message: "Passwords don't match",
      path: ["confirmPassword"]
    })
    
  • Conditional validations with .superRefine()

  • Transform validations to normalize data (.transform())

  • Discriminated unions for polymorphic data

  • Branded types for domain-specific values

Error Handling

  • Validation errors caught and formatted consistently
  • Error messages user-friendly (not technical jargon)
  • Field-level errors returned (which field failed)
  • Multiple errors collected (not just first error)
  • Error codes standardized (e.g., "INVALID_EMAIL")

Multi-Tenant Context

  • tenant_id validated on all requests requiring tenant context
  • UUID format verified for tenant_id
  • Tenant existence checked (tenant must exist in database)
  • User-tenant relationship verified (user belongs to tenant)
  • Admin permissions validated for admin-only operations

Python + Pydantic Validation

Model Definition

  • All API request models inherit from BaseModel
  • All response models defined with Pydantic
  • SQLModel used for database models (built on Pydantic and SQLAlchemy)
  • Field validators used for custom validation
  • Model validators used for cross-field validation

Basic Field Validators

  • EmailStr used for email fields
  • HttpUrl used for URL fields
  • UUID4 used for UUID fields
  • Field() used with constraints:
    • min_length / max_length for strings
    • ge / le for number ranges (greater/less than or equal)
    • gt / lt for strict ranges
    • pattern for regex matching (this argument was named regex in Pydantic v1)

    from pydantic import BaseModel, EmailStr, Field

    class UserCreate(BaseModel):
        email: EmailStr
        name: str = Field(..., min_length=1, max_length=100)
        age: int = Field(..., ge=0, le=150)

Advanced Validation

  • @field_validator for single field custom validation:

    @field_validator('password')
    @classmethod
    def validate_password(cls, v):
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters')
        return v
    
  • @model_validator for cross-field validation:

    @model_validator(mode='after')
    def check_passwords_match(self):
        if self.password != self.confirm_password:
            raise ValueError('Passwords do not match')
        return self
    
  • Custom validators handle edge cases

  • mode='before' used for preprocessing

  • mode='after' used for post-validation checks
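
The two validator modes above can be combined in one model. The following is a minimal sketch assuming Pydantic v2; the SignupRequest model and its field names are hypothetical and exist only for illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator, model_validator


class SignupRequest(BaseModel):  # hypothetical model, not from the checklist
    username: str
    password: str
    confirm_password: str

    @field_validator("username", mode="before")
    @classmethod
    def normalize_username(cls, v):
        # mode="before" runs on the raw input, ahead of type validation,
        # so it is a natural place to trim and lowercase strings.
        return v.strip().lower() if isinstance(v, str) else v

    @model_validator(mode="after")
    def check_passwords_match(self):
        # mode="after" runs once every field has been validated,
        # so cross-field comparisons are safe here.
        if self.password != self.confirm_password:
            raise ValueError("Passwords do not match")
        return self


user = SignupRequest(
    username="  Alice ", password="s3cret-pw", confirm_password="s3cret-pw"
)
```

Keeping normalization in a before-validator means later constraints (length, pattern) see the cleaned value rather than the raw one.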

Error Handling

  • ValidationError caught in API endpoints
  • Errors formatted to match frontend expectations
  • HTTPException raised with 422 status for validation errors
  • Error details included in response body
  • Logging added for validation failures (security monitoring)
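
One possible way to turn a Pydantic ValidationError into a field-level response body is sketched below, assuming Pydantic v2. The format_errors helper and the shape of the returned dictionary are assumptions for illustration; the real format should match whatever the frontend expects.

```python
from pydantic import BaseModel, Field, ValidationError


class UserCreate(BaseModel):
    name: str = Field(..., min_length=1, max_length=100)
    age: int = Field(..., ge=0, le=150)


def format_errors(exc: ValidationError) -> dict:
    """Convert a ValidationError into a list of per-field error entries."""
    return {
        "errors": [
            {
                # "loc" is a tuple path to the failing field, e.g. ("age",)
                "field": ".".join(str(part) for part in err["loc"]),
                "code": err["type"],
                "message": err["msg"],
            }
            for err in exc.errors()
        ]
    }


try:
    UserCreate(name="", age=200)
except ValidationError as exc:
    # Pydantic reports every failing field, not just the first one
    body = format_errors(exc)
```

In a FastAPI handler, a body like this could be passed as the detail of an HTTPException with status 422.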

Multi-Tenant Context

  • tenant_id field on all multi-tenant request models
  • Tenant UUID validated before database queries
  • Repository pattern enforces tenant filtering
  • Admin flag validated for privileged operations
  • RLS policies configured on database tables
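
The idea of a repository that enforces tenant filtering can be sketched with the standard library alone. Everything here (Document, DocumentRepository, the in-memory row list) is hypothetical; a real repository would apply the same filter in its database queries.

```python
from dataclasses import dataclass
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Document:
    id: int
    tenant_id: UUID
    title: str


class DocumentRepository:
    """Hypothetical in-memory repository; every read is scoped to one tenant."""

    def __init__(self, rows: List[Document], tenant_id: str):
        # UUID() raises ValueError on malformed input, so an invalid
        # tenant_id is rejected before any lookup runs.
        self._tenant_id = UUID(tenant_id)
        self._rows = rows

    def list_all(self) -> List[Document]:
        return [r for r in self._rows if r.tenant_id == self._tenant_id]

    def get(self, doc_id: int) -> Optional[Document]:
        # Built on list_all(), so a row owned by another tenant is never visible
        return next((r for r in self.list_all() if r.id == doc_id), None)


tenant_a, tenant_b = uuid4(), uuid4()
rows = [Document(1, tenant_a, "A's doc"), Document(2, tenant_b, "B's doc")]
repo = DocumentRepository(rows, str(tenant_a))
```

Because callers never pass tenant_id per query, there is no code path that can forget the filter.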

Database Constraints

Schema Constraints

  • NOT NULL constraints on required fields
  • UNIQUE constraints on unique fields (email, username)
  • CHECK constraints for value ranges
  • FOREIGN KEY constraints for relationships
  • Default values defined where appropriate
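
The constraints listed above can be tried out quickly with SQLite from the Python standard library. This is only a sketch: the table and column names are made up, and the production database may use different syntax or types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE tenants (
        id TEXT PRIMARY KEY
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        tenant_id TEXT    NOT NULL REFERENCES tenants(id),
        email     TEXT    NOT NULL UNIQUE,
        age       INTEGER CHECK (age BETWEEN 0 AND 150),
        is_active INTEGER NOT NULL DEFAULT 1
    );
    """
)
conn.execute("INSERT INTO tenants (id) VALUES ('t1')")
conn.execute(
    "INSERT INTO users (tenant_id, email, age) VALUES ('t1', 'a@example.com', 30)"
)


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Each statement violates one constraint: UNIQUE, CHECK, FOREIGN KEY
for bad_sql in (
    "INSERT INTO users (tenant_id, email, age) VALUES ('t1', 'a@example.com', 40)",
    "INSERT INTO users (tenant_id, email, age) VALUES ('t1', 'b@example.com', 200)",
    "INSERT INTO users (tenant_id, email, age) VALUES ('nope', 'c@example.com', 20)",
):
    print(try_insert(bad_sql))
```

Database constraints act as the last line of defense when application-level validation is bypassed or has a bug.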

Index Support

  • Indexes created on frequently queried fields
  • Composite indexes for multi-field queries
  • Partial indexes for filtered queries (WHERE clauses)
  • tenant_id indexed on all multi-tenant tables

File Upload Validation

  • File size limits enforced (e.g., 10MB max)
  • File type validation (MIME type checking)
  • File extension validation (whitelist allowed extensions)
  • Virus scanning (if handling untrusted uploads)
  • Content validation (parse and validate file content)

Image Uploads

  • Image dimensions validated (max width/height)
  • Image format verified (PNG, JPEG, etc.)
  • EXIF data stripped (privacy concern: metadata can include GPS location and device details)
  • Thumbnails generated for large images

CSV/JSON Uploads

  • Parse errors handled gracefully
  • Schema validation applied to each row/object
  • Batch validation with error collection
  • Maximum rows/objects limit enforced
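
Batch validation with error collection might look like the sketch below. The column names, the deliberately crude email check, and the MAX_ROWS value are assumptions; in practice each row would more likely be passed through a Pydantic model.

```python
import csv
import io

MAX_ROWS = 1000  # hypothetical limit; tune to the workload


def validate_row(row: dict) -> list:
    """Return the list of problems found in one parsed CSV row."""
    problems = []
    # Crude placeholder check; a real schema would use a proper email validator
    if "@" not in (row.get("email") or ""):
        problems.append("email: invalid address")
    try:
        age_ok = 0 <= int(row.get("age") or "") <= 150
    except ValueError:
        age_ok = False
    if not age_ok:
        problems.append("age: must be an integer between 0 and 150")
    return problems


def validate_csv(text: str, max_rows: int = MAX_ROWS):
    """Validate every row, collecting errors by data-row number instead of stopping early."""
    valid, errors = [], {}
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        if row_no > max_rows:
            errors[row_no] = [f"row limit of {max_rows} exceeded"]
            break
        problems = validate_row(row)
        if problems:
            errors[row_no] = problems
        else:
            valid.append(row)
    return valid, errors


sample = "email,age\na@example.com,30\nnot-an-email,200\nb@example.com,abc\n"
valid_rows, row_errors = validate_csv(sample)
```

Returning both the valid rows and the per-row errors lets the caller decide whether to reject the whole file or import the good rows and report the rest.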

External API Integration

  • Response schemas defined for external APIs
  • Validation applied to external data
  • Graceful degradation when validation fails
  • Retry logic for transient failures
  • Timeout limits configured
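
Validating external data with graceful degradation can be sketched as follows, assuming Pydantic v2. The ExchangeRate schema and the parse_rate helper are hypothetical; the point is that a malformed third-party payload yields a fallback value instead of an unhandled exception.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ExchangeRate(BaseModel):  # hypothetical schema for an external response
    currency: str = Field(..., min_length=3, max_length=3)
    rate: float = Field(..., gt=0)


def parse_rate(
    payload: dict, fallback: Optional[ExchangeRate] = None
) -> Optional[ExchangeRate]:
    """Validate an external payload; return the fallback instead of raising."""
    try:
        return ExchangeRate.model_validate(payload)
    except ValidationError:
        # A real service would also log the failure here for monitoring
        return fallback


last_known = ExchangeRate(currency="USD", rate=1.0)
result = parse_rate({"currency": "USD", "rate": -5}, fallback=last_known)
```

Here the invalid payload is discarded and the last known good value is used in its place.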

Security Validations

Input Sanitization

  • HTML/script tags stripped from text inputs
  • SQL injection prevented (use an ORM or parameterized queries; never build SQL by concatenating user input)
  • XSS prevention (escape output in templates)
  • Path traversal prevented (validate file paths)
  • Command injection prevented (no shell execution of user input)
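
Path traversal checks are easy to get subtly wrong, so a small sketch may help. UPLOAD_ROOT and resolve_upload_path are hypothetical names; the approach is to resolve the user-supplied path against a fixed root and confirm the result is still inside that root.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads")  # hypothetical storage directory


def resolve_upload_path(user_path: str, root: Path = UPLOAD_ROOT) -> Path:
    """Resolve a user-supplied relative path and reject anything outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments; an absolute user_path replaces root entirely
    candidate = (root / user_path).resolve()
    if root not in candidate.parents:
        raise ValueError("path escapes the upload directory")
    return candidate
```

Checking the resolved path, rather than searching the input string for "..", also covers absolute paths and symlinks that point outside the root.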

Authentication & Authorization

  • JWT tokens validated (signature, expiration)
  • Session tokens verified against database
  • User existence checked before operations
  • Permissions verified for protected resources
  • Rate limiting applied to prevent abuse

Sensitive Data

  • Passwords never logged or returned in responses
  • Credit card numbers validated with Luhn algorithm
  • SSN/Tax ID formats validated
  • PII handling compliant with regulations (GDPR, CCPA)
  • Encryption applied to sensitive stored data
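
The Luhn check mentioned above is short enough to show in full. The function name is an assumption; note that Luhn only catches typing mistakes and says nothing about whether a card actually exists.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits of card_number pass the Luhn checksum."""
    cleaned = card_number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk right to left, doubling every second digit
    for index, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, the widely used test number 4111 1111 1111 1111 passes, while changing its last digit makes it fail.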

Testing Validation Logic

Unit Tests

  • Valid inputs pass validation
  • Invalid inputs fail with correct error messages
  • Edge cases tested (empty strings, null, undefined)
  • Boundary values tested (min/max lengths, ranges)
  • Error messages verified (correct field, message)

Integration Tests

  • API endpoints validated in integration tests
  • Database constraints tested (violate constraint, expect error)
  • Multi-tenant isolation tested (cross-tenant access blocked)
  • File upload validation tested
  • External API mocking with invalid responses

Test Coverage

  • Validation logic 100% covered
  • Error paths tested (not just happy path)
  • Custom validators tested independently
  • Refinements tested with failing cases

Performance Considerations

  • Validation performance measured (avoid expensive validations in hot paths)
  • Async validation for I/O-bound checks (database lookups)
  • Caching applied to repeated validations (e.g., tenant existence)
  • Batch validation for arrays/lists
  • Early returns for fail-fast scenarios
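
Caching a repeated check such as tenant existence can be sketched with functools.lru_cache. KNOWN_TENANTS stands in for a database table and all names here are hypothetical.

```python
from functools import lru_cache
from uuid import UUID

# Stand-in for a tenants table; a real check would query the database
KNOWN_TENANTS = {UUID("11111111-1111-4111-8111-111111111111")}
lookup_calls = []  # records each simulated database hit


def _tenant_in_database(tenant_id: UUID) -> bool:
    lookup_calls.append(tenant_id)
    return tenant_id in KNOWN_TENANTS


@lru_cache(maxsize=1024)
def tenant_exists(tenant_id: UUID) -> bool:
    """Cached existence check; repeated calls for the same ID skip the lookup."""
    return _tenant_in_database(tenant_id)
```

lru_cache has no expiry and also caches negative results, so a deleted or newly created tenant stays stale until tenant_exists.cache_clear() is called; a cache with a time-to-live may be a better fit in production.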

Documentation

  • Validation rules documented in API docs
  • Error responses documented (status codes, error formats)
  • Examples provided (valid and invalid requests)
  • Schema exported for frontend consumption (TypeScript types)
  • Changelog updated when validation changes

Grey Haven Specific

TanStack Start (Frontend)

  • Form validation with Zod + TanStack Form
  • Server function validation (all server functions validate input)
  • Type safety maintained (z.infer<> for types)
  • Error display in UI components
  • Client-side validation mirrors server-side

FastAPI (Backend)

  • Request models use Pydantic
  • Response models use Pydantic
  • Repository methods validate before database operations
  • Service layer handles business rule validation
  • Dependency injection for validation context (tenant_id)

Database (Drizzle/SQLModel)

  • Drizzle schemas include validation constraints
  • SQLModel fields use Pydantic validators
  • Migration scripts add database constraints
  • Indexes support validation queries

Monitoring & Alerting

  • Validation failure metrics tracked
  • High failure rate alerts configured
  • Unusual validation patterns logged (potential attacks)
  • Performance metrics for validation operations
  • Error logs structured for analysis

Scoring

  • 80+ items checked: Excellent - Comprehensive validation
  • 60-79 items: Good - Most validation covered
  • 40-59 items: Fair - Significant gaps exist
  • <40 items: Poor - Inadequate validation

Priority Items

Address these first:

  1. All API endpoints validated - Prevent invalid data entry
  2. Multi-tenant isolation - Security-critical
  3. SQL injection prevention - Use an ORM or parameterized queries, never concatenated SQL
  4. File upload validation - Common attack vector
  5. Error handling - User experience and debugging

Common Pitfalls

Don't:

  • Trust client-side validation alone (always validate server-side)
  • Use overly complex regex (hard to maintain, performance issues)
  • Return technical error messages to users
  • Skip validation on internal endpoints (defense in depth)
  • Log sensitive data in validation errors

Do:

  • Validate at boundaries (API endpoints, file uploads, external APIs)
  • Use standard validators (email, URL, UUID) from libraries
  • Provide clear, actionable error messages
  • Test validation logic thoroughly
  • Document validation requirements

Total Items: 120+ validation checks
Critical Items: API validation, Multi-tenant, Security, File uploads
Coverage: TypeScript, Python, Database, Security
Last Updated: 2025-11-10