12 KiB
Data Validation Checklist
Comprehensive checklist for implementing robust data validation in TypeScript (Zod) and Python (Pydantic) applications.
Pre-Validation Setup
- Identify all input sources (API requests, forms, file uploads, external APIs)
- Choose validation library (Zod for TypeScript, Pydantic for Python)
- Define validation strategy (fail-fast vs collect all errors)
- Set up error handling (consistent error response format)
- Document validation requirements (business rules, constraints)
TypeScript + Zod Validation
Schema Definition
- All API endpoints have Zod schemas defined
- Schema types exported for use in frontend
- Schemas colocated with route handlers or in shared location
- Schema composition used (z.object, z.array, z.union)
- Reusable schemas extracted (common patterns like email, UUID)
Basic Validations
-
String validations applied:
.min()for minimum length.max()for maximum length.email()for email addresses.url()for URLs.uuid()for UUIDs.regex()for custom patterns.trim()to remove whitespace
-
Number validations applied:
.int()for integers.positive()for positive numbers.min()/.max()for ranges.finite()to exclude Infinity/NaN
-
Array validations applied:
.min()for minimum items.max()for maximum items.nonempty()for required arrays
-
Date validations applied:
.min()for earliest date.max()for latest date- Proper date parsing (z.coerce.date())
Advanced Validations
-
Custom refinements for complex rules:
z.object({ password: z.string(), confirmPassword: z.string() }).refine(data => data.password === data.confirmPassword, { message: "Passwords don't match", path: ["confirmPassword"] }) -
Conditional validations with
.superRefine() -
Transform validations to normalize data (
.transform()) -
Discriminated unions for polymorphic data
-
Branded types for domain-specific values
Error Handling
- Validation errors caught and formatted consistently
- Error messages user-friendly (not technical jargon)
- Field-level errors returned (which field failed)
- Multiple errors collected (not just first error)
- Error codes standardized (e.g., "INVALID_EMAIL")
Multi-Tenant Context
- tenant_id validated on all requests requiring tenant context
- UUID format verified for tenant_id
- Tenant existence checked (tenant must exist in database)
- User-tenant relationship verified (user belongs to tenant)
- Admin permissions validated for admin-only operations
Python + Pydantic Validation
Model Definition
- All API request models inherit from BaseModel
- All response models defined with Pydantic
- SQLModel used for database models (includes Pydantic)
- Field validators used for custom validation
- Model validators used for cross-field validation
Basic Field Validators
- EmailStr used for email fields
- HttpUrl used for URL fields
- UUID4 used for UUID fields
- Field() used with constraints:
min_length/max_lengthfor stringsge/lefor number ranges (greater/less than or equal)gt/ltfor strict rangesregexfor pattern matching
from pydantic import BaseModel, EmailStr, Field
class UserCreate(BaseModel):
email: EmailStr
name: str = Field(..., min_length=1, max_length=100)
age: int = Field(..., ge=0, le=150)
Advanced Validation
-
@field_validator for single field custom validation:
@field_validator('password') @classmethod def validate_password(cls, v): if len(v) < 8: raise ValueError('Password must be at least 8 characters') return v -
@model_validator for cross-field validation:
@model_validator(mode='after') def check_passwords_match(self): if self.password != self.confirm_password: raise ValueError('Passwords do not match') return self -
Custom validators handle edge cases
-
Mode='before' used for preprocessing
-
Mode='after' used for post-validation checks
Error Handling
- ValidationError caught in API endpoints
- Errors formatted to match frontend expectations
- HTTPException raised with 422 status for validation errors
- Error details included in response body
- Logging added for validation failures (security monitoring)
Multi-Tenant Context
- tenant_id field on all multi-tenant request models
- Tenant UUID validated before database queries
- Repository pattern enforces tenant filtering
- Admin flag validated for privileged operations
- RLS policies configured on database tables
Database Constraints
Schema Constraints
- NOT NULL constraints on required fields
- UNIQUE constraints on unique fields (email, username)
- CHECK constraints for value ranges
- FOREIGN KEY constraints for relationships
- Default values defined where appropriate
Index Support
- Indexes created on frequently queried fields
- Composite indexes for multi-field queries
- Partial indexes for filtered queries (WHERE clauses)
- tenant_id indexed on all multi-tenant tables
File Upload Validation
- File size limits enforced (e.g., 10MB max)
- File type validation (MIME type checking)
- File extension validation (whitelist allowed extensions)
- Virus scanning (if handling untrusted uploads)
- Content validation (parse and validate file content)
Image Uploads
- Image dimensions validated (max width/height)
- Image format verified (PNG, JPEG, etc.)
- EXIF data stripped (security concern)
- Thumbnails generated for large images
CSV/JSON Uploads
- Parse errors handled gracefully
- Schema validation applied to each row/object
- Batch validation with error collection
- Maximum rows/objects limit enforced
External API Integration
- Response schemas defined for external APIs
- Validation applied to external data
- Graceful degradation when validation fails
- Retry logic for transient failures
- Timeout limits configured
Security Validations
Input Sanitization
- HTML/script tags stripped from text inputs
- SQL injection prevented (use ORM, not raw SQL)
- XSS prevention (escape output in templates)
- Path traversal prevented (validate file paths)
- Command injection prevented (no shell execution of user input)
Authentication & Authorization
- JWT tokens validated (signature, expiration)
- Session tokens verified against database
- User existence checked before operations
- Permissions verified for protected resources
- Rate limiting applied to prevent abuse
Sensitive Data
- Passwords never logged or returned in responses
- Credit card numbers validated with Luhn algorithm
- SSN/Tax ID formats validated
- PII handling compliant with regulations (GDPR, CCPA)
- Encryption applied to sensitive stored data
Testing Validation Logic
Unit Tests
- Valid inputs pass validation
- Invalid inputs fail with correct error messages
- Edge cases tested (empty strings, null, undefined)
- Boundary values tested (min/max lengths, ranges)
- Error messages verified (correct field, message)
Integration Tests
- API endpoints validated in integration tests
- Database constraints tested (violate constraint, expect error)
- Multi-tenant isolation tested (cross-tenant access blocked)
- File upload validation tested
- External API mocking with invalid responses
Test Coverage
- Validation logic 100% covered
- Error paths tested (not just happy path)
- Custom validators tested independently
- Refinements tested with failing cases
Performance Considerations
- Validation performance measured (avoid expensive validations in hot paths)
- Async validation for I/O-bound checks (database lookups)
- Caching applied to repeated validations (e.g., tenant existence)
- Batch validation for arrays/lists
- Early returns for fail-fast scenarios
Documentation
- Validation rules documented in API docs
- Error responses documented (status codes, error formats)
- Examples provided (valid and invalid requests)
- Schema exported for frontend consumption (TypeScript types)
- Changelog updated when validation changes
Grey Haven Specific
TanStack Start (Frontend)
- Form validation with Zod + TanStack Form
- Server function validation (all server functions validate input)
- Type safety maintained (Zod.infer<> for types)
- Error display in UI components
- Client-side validation mirrors server-side
FastAPI (Backend)
- Request models use Pydantic
- Response models use Pydantic
- Repository methods validate before database operations
- Service layer handles business rule validation
- Dependency injection for validation context (tenant_id)
Database (Drizzle/SQLModel)
- Drizzle schemas include validation constraints
- SQLModel fields use Pydantic validators
- Migration scripts add database constraints
- Indexes support validation queries
Monitoring & Alerting
- Validation failure metrics tracked
- High failure rate alerts configured
- Unusual validation patterns logged (potential attacks)
- Performance metrics for validation operations
- Error logs structured for analysis
Scoring
- 80+ items checked: Excellent - Comprehensive validation ✅
- 60-79 items: Good - Most validation covered ⚠️
- 40-59 items: Fair - Significant gaps exist 🔴
- <40 items: Poor - Inadequate validation ❌
Priority Items
Address these first:
- All API endpoints validated - Prevent invalid data entry
- Multi-tenant isolation - Security-critical
- SQL injection prevention - Use ORM, not raw SQL
- File upload validation - Common attack vector
- Error handling - User experience and debugging
Common Pitfalls
❌ Don't:
- Trust client-side validation alone (always validate server-side)
- Use overly complex regex (hard to maintain, performance issues)
- Return technical error messages to users
- Skip validation on internal endpoints (defense in depth)
- Log sensitive data in validation errors
✅ Do:
- Validate at boundaries (API endpoints, file uploads, external APIs)
- Use standard validators (email, URL, UUID) from libraries
- Provide clear, actionable error messages
- Test validation logic thoroughly
- Document validation requirements
Related Resources
Total Items: 120+ validation checks Critical Items: API validation, Multi-tenant, Security, File uploads Coverage: TypeScript, Python, Database, Security Last Updated: 2025-11-10