Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:29:10 +08:00
commit 657f1e3da3
29 changed files with 2738 additions and 0 deletions

@@ -0,0 +1,346 @@
# Data Validation Checklist
Comprehensive checklist for implementing robust data validation in TypeScript (Zod) and Python (Pydantic) applications.
## Pre-Validation Setup
- [ ] **Identify all input sources** (API requests, forms, file uploads, external APIs)
- [ ] **Choose validation library** (Zod for TypeScript, Pydantic for Python)
- [ ] **Define validation strategy** (fail-fast vs collect all errors)
- [ ] **Set up error handling** (consistent error response format)
- [ ] **Document validation requirements** (business rules, constraints)
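The difference between the two validation strategies can be sketched in plain Python. This is a minimal illustration, not a recommended implementation: `FieldError`, `check_signup`, the payload shape, and the error codes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FieldError:
    field: str
    code: str
    message: str


def check_signup(payload: dict, fail_fast: bool = False) -> list[FieldError]:
    """Validate a hypothetical signup payload and return the errors found."""
    errors: list[FieldError] = []

    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append(FieldError("email", "INVALID_EMAIL", "Enter a valid email address."))
        if fail_fast:
            return errors  # fail-fast: stop at the first problem

    name = payload.get("name")
    if not isinstance(name, str) or not (1 <= len(name.strip()) <= 100):
        errors.append(FieldError("name", "INVALID_NAME", "Name must be 1-100 characters."))
        if fail_fast:
            return errors

    return errors  # collect-all: every problem is reported together
```

Collecting all errors usually suits forms, where users want to fix everything in one pass; fail-fast suits hot paths where the first failure is enough to reject the request.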
## TypeScript + Zod Validation
### Schema Definition
- [ ] **All API endpoints have Zod schemas** defined
- [ ] **Schema types exported** for use in frontend
- [ ] **Schemas colocated** with route handlers or in shared location
- [ ] **Schema composition used** (`z.object`, `z.array`, `z.union`)
- [ ] **Reusable schemas extracted** (common patterns like email, UUID)
### Basic Validations
- [ ] **String validations** applied:
  - [ ] `.min()` for minimum length
  - [ ] `.max()` for maximum length
  - [ ] `.email()` for email addresses
  - [ ] `.url()` for URLs
  - [ ] `.uuid()` for UUIDs
  - [ ] `.regex()` for custom patterns
  - [ ] `.trim()` to remove leading and trailing whitespace
- [ ] **Number validations** applied:
  - [ ] `.int()` for integers
  - [ ] `.positive()` for positive numbers
  - [ ] `.min()` / `.max()` for ranges
  - [ ] `.finite()` to exclude `Infinity` and `-Infinity` (`z.number()` already rejects `NaN`)
- [ ] **Array validations** applied:
  - [ ] `.min()` for minimum items
  - [ ] `.max()` for maximum items
  - [ ] `.nonempty()` for required arrays
- [ ] **Date validations** applied:
  - [ ] `.min()` for earliest date
  - [ ] `.max()` for latest date
  - [ ] Proper date parsing (`z.coerce.date()`)
### Advanced Validations
- [ ] **Custom refinements** for complex rules:
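```typescript
import { z } from "zod";

// Cross-field rule: the error is attached to the confirmPassword field.
const passwordSchema = z
  .object({
    password: z.string(),
    confirmPassword: z.string(),
  })
  .refine((data) => data.password === data.confirmPassword, {
    message: "Passwords don't match",
    path: ["confirmPassword"],
  });
```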
- [ ] **Conditional validations** with `.superRefine()`
- [ ] **Transform validations** to normalize data (`.transform()`)
- [ ] **Discriminated unions** for polymorphic data
- [ ] **Branded types** for domain-specific values
### Error Handling
- [ ] **Validation errors caught** and formatted consistently
- [ ] **Error messages user-friendly** (not technical jargon)
- [ ] **Field-level errors** returned (which field failed)
- [ ] **Multiple errors collected** (not just first error)
- [ ] **Error codes standardized** (e.g., "INVALID_EMAIL")
### Multi-Tenant Context
- [ ] **tenant_id validated** on all requests requiring tenant context
- [ ] **UUID format verified** for tenant_id
- [ ] **Tenant existence checked** (tenant must exist in database)
- [ ] **User-tenant relationship verified** (user belongs to tenant)
- [ ] **Admin permissions validated** for admin-only operations
## Python + Pydantic Validation
### Model Definition
- [ ] **All API request models** inherit from BaseModel
- [ ] **All response models** defined with Pydantic
- [ ] **SQLModel used** for database models (built on Pydantic and SQLAlchemy)
- [ ] **Field validators** used for custom validation
- [ ] **Model validators** used for cross-field validation
### Basic Field Validators
- [ ] **EmailStr** used for email fields
- [ ] **HttpUrl** used for URL fields
- [ ] **UUID4** used for UUID fields
- [ ] **Field()** used with constraints:
  - [ ] `min_length` / `max_length` for strings
  - [ ] `ge` / `le` for inclusive number ranges (greater/less than or equal)
  - [ ] `gt` / `lt` for strict ranges
  - [ ] `pattern` for regex matching (named `regex` in Pydantic v1)
```python
from pydantic import BaseModel, EmailStr, Field


class UserCreate(BaseModel):
    email: EmailStr  # EmailStr requires the optional email-validator package
    name: str = Field(..., min_length=1, max_length=100)
    age: int = Field(..., ge=0, le=150)
```
### Advanced Validation
- [ ] **@field_validator** for single field custom validation:
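```python
from pydantic import BaseModel, field_validator


class PasswordModel(BaseModel):  # illustrative host model for the validator
    password: str

    @field_validator('password')
    @classmethod
    def validate_password(cls, v: str) -> str:
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters')
        return v
```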
- [ ] **@model_validator** for cross-field validation:
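```python
from pydantic import BaseModel, model_validator


class PasswordChange(BaseModel):  # illustrative host model for the validator
    password: str
    confirm_password: str

    @model_validator(mode='after')
    def check_passwords_match(self):
        if self.password != self.confirm_password:
            raise ValueError('Passwords do not match')
        return self
```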
- [ ] **Custom validators** handle edge cases
- [ ] **`mode='before'`** used for preprocessing raw input
- [ ] **`mode='after'`** used for post-validation checks
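As a rough sketch of how the two modes differ, the example below normalizes an email before type validation and then checks the parsed value afterwards. It assumes Pydantic v2; `SignupRequest` and both validator names are hypothetical.

```python
from pydantic import BaseModel, field_validator


class SignupRequest(BaseModel):  # hypothetical request model
    email: str

    @field_validator("email", mode="before")
    @classmethod
    def normalize_email(cls, value):
        # Runs on the raw input, before Pydantic checks the type.
        if isinstance(value, str):
            return value.strip().lower()
        return value  # let normal type validation report non-strings

    @field_validator("email", mode="after")
    @classmethod
    def require_at_sign(cls, value: str) -> str:
        # Runs after type validation, so value is guaranteed to be a str.
        if "@" not in value:
            raise ValueError("Email must contain '@'")
        return value
```

A `before` validator has to tolerate any input type, which is why it passes non-strings through untouched instead of calling string methods on them.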
### Error Handling
- [ ] **ValidationError caught** in API endpoints
- [ ] **Errors formatted** to match frontend expectations
- [ ] **HTTPException raised** with 422 status for validation errors
- [ ] **Error details included** in response body
- [ ] **Logging added** for validation failures (security monitoring)
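One possible way to turn a Pydantic `ValidationError` into a consistent, field-level error list is sketched below. It assumes Pydantic v2; the output shape (`field`, `code`, `message`) and the helper names are assumptions for illustration, not a format the frontend is known to expect.

```python
from pydantic import BaseModel, Field, ValidationError


class UserCreate(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)


def format_errors(exc: ValidationError) -> list[dict]:
    """Flatten Pydantic's error list into one dict per failing field."""
    return [
        {
            "field": ".".join(str(part) for part in err["loc"]),
            "code": err["type"].upper(),
            "message": err["msg"],
        }
        for err in exc.errors()
    ]


def validate_user(payload: dict):
    """Return (model, []) on success or (None, errors) on failure."""
    try:
        return UserCreate.model_validate(payload), []
    except ValidationError as exc:
        return None, format_errors(exc)
```

Pydantic reports every failing field in a single `ValidationError`, so this covers the "collect all errors" case without extra work. Note that FastAPI already responds with 422 for request-body validation failures by default; an explicit handler like this matters mainly for models validated manually inside an endpoint.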
### Multi-Tenant Context
- [ ] **tenant_id field** on all multi-tenant request models
- [ ] **Tenant UUID validated** before database queries
- [ ] **Repository pattern enforces** tenant filtering
- [ ] **Admin flag validated** for privileged operations
- [ ] **RLS policies configured** on database tables
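The repository item above can be illustrated with an in-memory sketch in which no unscoped query method exists at all. `Document` and `TenantScopedRepository` are hypothetical; a real repository would issue database queries with the same tenant filter.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    id: int
    tenant_id: uuid.UUID
    title: str


class TenantScopedRepository:
    """Every read goes through the tenant filter; there is no unscoped query."""

    def __init__(self, rows: list[Document], tenant_id: str):
        # uuid.UUID raises ValueError on malformed input, so a bad
        # tenant_id is rejected before any lookup happens.
        self._tenant_id = uuid.UUID(tenant_id)
        self._rows = rows

    def list_documents(self) -> list[Document]:
        return [row for row in self._rows if row.tenant_id == self._tenant_id]

    def get_document(self, doc_id: int) -> Optional[Document]:
        # Looks up by id within the tenant's rows only.
        for row in self.list_documents():
            if row.id == doc_id:
                return row
        return None
```

Because the tenant is fixed at construction time, a caller cannot forget the filter on an individual query. Database-level RLS policies remain a useful second layer behind this.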
## Database Constraints
### Schema Constraints
- [ ] **NOT NULL constraints** on required fields
- [ ] **UNIQUE constraints** on unique fields (email, username)
- [ ] **CHECK constraints** for value ranges
- [ ] **FOREIGN KEY constraints** for relationships
- [ ] **Default values** defined where appropriate
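The constraints above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `tenants` and `users` tables and the `try_insert` helper are made up for the example, and constraint syntax differs slightly between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE tenants (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL REFERENCES tenants(id),
        email     TEXT NOT NULL UNIQUE,
        age       INTEGER CHECK (age BETWEEN 0 AND 150),
        is_active INTEGER NOT NULL DEFAULT 1
    );
    """
)
conn.execute("INSERT INTO tenants (id, name) VALUES (1, 'acme')")
conn.execute("INSERT INTO users (tenant_id, email, age) VALUES (1, 'a@example.com', 30)")
conn.commit()  # commit the seed rows so a later rollback cannot undo them


def try_insert(tenant_id, email, age) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (tenant_id, email, age) VALUES (?, ?, ?)",
                (tenant_id, email, age),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The database rejects bad rows even when application-level validation is bypassed, which is the point of keeping both layers.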
### Index Support
- [ ] **Indexes created** on frequently queried fields
- [ ] **Composite indexes** for multi-field queries
- [ ] **Partial indexes** for filtered queries (WHERE clauses)
- [ ] **tenant_id indexed** on all multi-tenant tables
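For reference, the three index types look like this in SQLite, again via the standard `sqlite3` module. The `orders` table and index names are hypothetical, and partial-index syntax varies by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        tenant_id  TEXT NOT NULL,
        status     TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    -- Single-column index for tenant filtering
    CREATE INDEX idx_orders_tenant ON orders (tenant_id);
    -- Composite index for "this tenant's orders by date" queries
    CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at);
    -- Partial index covering only rows that match the WHERE clause
    CREATE INDEX idx_orders_pending ON orders (tenant_id) WHERE status = 'pending';
    """
)
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'"
    )
)
```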
## File Upload Validation
- [ ] **File size limits** enforced (e.g., 10MB max)
- [ ] **File type validation** (check the MIME type against the file content, not only the client-supplied Content-Type header)
- [ ] **File extension validation** (whitelist allowed extensions)
- [ ] **Virus scanning** (if handling untrusted uploads)
- [ ] **Content validation** (parse and validate file content)
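A minimal sketch of the size, extension, and content checks, using only the standard library. The allow-list, the error codes, and `check_upload` are assumptions for illustration; a real allow-list depends on what the application accepts.

```python
from pathlib import PurePosixPath

MAX_BYTES = 10 * 1024 * 1024  # 10MB, matching the example limit above

# Allowed extensions mapped to the leading bytes ("magic number") of each format.
ALLOWED_SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def check_upload(filename: str, data: bytes) -> list[str]:
    """Return a list of error codes; an empty list means the upload passed."""
    errors = []
    if len(data) > MAX_BYTES:
        errors.append("FILE_TOO_LARGE")
    extension = PurePosixPath(filename).suffix.lower()
    signature = ALLOWED_SIGNATURES.get(extension)
    if signature is None:
        errors.append("EXTENSION_NOT_ALLOWED")
    elif not data.startswith(signature):
        # The content does not match what the extension claims.
        errors.append("CONTENT_TYPE_MISMATCH")
    return errors
```

A signature check only confirms the first few bytes. It does not replace parsing the file or virus scanning for untrusted uploads.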
### Image Uploads
- [ ] **Image dimensions validated** (max width/height)
- [ ] **Image format verified** (PNG, JPEG, etc.)
- [ ] **EXIF data stripped** (privacy concern: may embed GPS location and device details)
- [ ] **Thumbnails generated** for large images
### CSV/JSON Uploads
- [ ] **Parse errors handled gracefully**
- [ ] **Schema validation** applied to each row/object
- [ ] **Batch validation** with error collection
- [ ] **Maximum rows/objects** limit enforced
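Row-by-row validation with error collection and a row limit might look like the sketch below. The column names (`email`, `age`), the rules, and the default limit of 1000 are hypothetical, and malformed-CSV handling (`csv.Error`) is left out for brevity.

```python
import csv
import io


def validate_row(row: dict) -> list[str]:
    """Validate one parsed CSV row and return its error messages."""
    errors = []
    if "@" not in (row.get("email") or ""):
        errors.append("email: invalid address")
    try:
        age_ok = 0 <= int(row.get("age") or "") <= 150
    except ValueError:
        age_ok = False
    if not age_ok:
        errors.append("age: must be an integer from 0 to 150")
    return errors


def validate_csv(text: str, max_rows: int = 1000):
    """Return (valid_rows, errors), where errors are (row_number, message) pairs."""
    valid_rows, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=1):
        if row_no > max_rows:
            errors.append((row_no, "file: row limit exceeded"))
            break  # stop reading instead of processing an oversized file
        row_errors = validate_row(row)
        if row_errors:
            errors.extend((row_no, message) for message in row_errors)
        else:
            valid_rows.append(row)
    return valid_rows, errors
```

Returning row numbers with each message lets the user fix the file in one pass rather than re-uploading after every error.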
## External API Integration
- [ ] **Response schemas defined** for external APIs
- [ ] **Validation applied** to external data
- [ ] **Graceful degradation** when validation fails
- [ ] **Retry logic** for transient failures
- [ ] **Timeout limits** configured
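The items above can be combined into one small wrapper, sketched here with the standard library only. The response shape (`{"rate": <number>}`), `parse_rate`, and `get_rate` are hypothetical, and `fetch` stands in for whatever HTTP call the application makes, which is assumed to apply its own timeout.

```python
import math
import time
from typing import Callable, Optional


def parse_rate(payload: object) -> float:
    """Validate a hypothetical {"rate": <positive number>} response body."""
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    rate = payload.get("rate")
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise ValueError("rate must be a number")
    if not math.isfinite(rate) or rate <= 0:
        raise ValueError("rate must be a positive, finite number")
    return float(rate)


def get_rate(
    fetch: Callable[[], object],
    fallback: Optional[float] = None,
    attempts: int = 3,
    base_delay: float = 0.0,
) -> Optional[float]:
    for attempt in range(attempts):
        try:
            payload = fetch()
        except (TimeoutError, ConnectionError):
            # Transient failure: back off exponentially, then retry.
            time.sleep(base_delay * 2 ** attempt)
            continue
        try:
            return parse_rate(payload)
        except ValueError:
            # Invalid data will not improve on retry: degrade gracefully.
            return fallback
    return fallback
```

The split matters: network errors are retried, while a response that fails validation goes straight to the fallback.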
## Security Validations
### Input Sanitization
- [ ] **HTML/script tags stripped** from text inputs
- [ ] **SQL injection prevented** (use parameterized queries or an ORM; never build SQL by concatenating user input)
- [ ] **XSS prevention** (escape output in templates)
- [ ] **Path traversal prevented** (validate file paths)
- [ ] **Command injection prevented** (no shell execution of user input)
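Two of these checks are short enough to sketch with the standard library: rejecting paths that escape a base directory, and escaping text on output. `resolve_upload_path` and `render_comment` are hypothetical helper names, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
import html
from pathlib import Path


def resolve_upload_path(base_dir: str, user_path: str) -> Path:
    """Resolve user_path inside base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments; an absolute user_path replaces base entirely.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload directory")
    return candidate


def render_comment(text: str) -> str:
    """Escape user text at output time so it cannot inject markup."""
    return f"<p>{html.escape(text)}</p>"
```

Escaping on output is generally more reliable than stripping tags on input, because the right escaping depends on where the text is rendered.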
### Authentication & Authorization
- [ ] **JWT tokens validated** (signature, expiration)
- [ ] **Session tokens verified** against database
- [ ] **User existence checked** before operations
- [ ] **Permissions verified** for protected resources
- [ ] **Rate limiting applied** to prevent abuse
### Sensitive Data
- [ ] **Passwords never logged** or returned in responses
- [ ] **Credit card numbers validated** with Luhn algorithm
- [ ] **SSN/Tax ID formats validated**
- [ ] **PII handling compliant** with regulations (GDPR, CCPA)
- [ ] **Encryption applied** to sensitive stored data
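For reference, a Luhn check fits in a few lines. The sketch below assumes card numbers of 12 to 19 digits, optionally separated by spaces or hyphens; that length range and the `luhn_valid` name are assumptions for illustration.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if card_number passes the Luhn checksum."""
    cleaned = card_number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    if not 12 <= len(cleaned) <= 19:
        return False
    total = 0
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A passing checksum only catches typos. It says nothing about whether the card exists or is authorized.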
## Testing Validation Logic
### Unit Tests
- [ ] **Valid inputs pass** validation
- [ ] **Invalid inputs fail** with correct error messages
- [ ] **Edge cases tested** (empty strings, null, undefined)
- [ ] **Boundary values tested** (min/max lengths, ranges)
- [ ] **Error messages verified** (correct field, message)
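A table-driven layout keeps boundary and edge cases visible in one place. The sketch below uses a hypothetical `validate_name` rule (1 to 100 characters after trimming); in a real suite the same table would feed a test framework's parametrization.

```python
def validate_name(value) -> list[str]:
    """Hypothetical rule: a string of 1-100 characters after trimming."""
    if not isinstance(value, str):
        return ["name: must be a string"]
    length = len(value.strip())
    if length < 1:
        return ["name: must not be empty"]
    if length > 100:
        return ["name: must be at most 100 characters"]
    return []


# (input, should_pass) pairs covering boundaries and edge cases
CASES = [
    ("a", True),         # minimum length
    ("a" * 100, True),   # maximum length
    ("a" * 101, False),  # one past the maximum
    ("", False),         # empty string
    ("   ", False),      # whitespace only
    (None, False),       # missing value
]


def failing_cases() -> list:
    """Return the cases whose actual result differs from the expectation."""
    return [
        (value, expected)
        for value, expected in CASES
        if (validate_name(value) == []) != expected
    ]
```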
### Integration Tests
- [ ] **API endpoints validated** in integration tests
- [ ] **Database constraints tested** (violate constraint, expect error)
- [ ] **Multi-tenant isolation tested** (cross-tenant access blocked)
- [ ] **File upload validation tested**
- [ ] **External API mocking** with invalid responses
### Test Coverage
- [ ] **Validation logic 100% covered**
- [ ] **Error paths tested** (not just happy path)
- [ ] **Custom validators tested** independently
- [ ] **Refinements tested** with failing cases
## Performance Considerations
- [ ] **Validation performance measured** (avoid expensive validations in hot paths)
- [ ] **Async validation** for I/O-bound checks (database lookups)
- [ ] **Caching applied** to repeated validations (e.g., tenant existence)
- [ ] **Batch validation** for arrays/lists
- [ ] **Early returns** for fail-fast scenarios
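Caching a repeated check such as tenant existence could be sketched as a small time-based cache. `TenantExistenceCache` and the 60-second default are assumptions, and `lookup` stands in for the real database query.

```python
import time
from typing import Callable


class TenantExistenceCache:
    """Cache tenant-existence lookups for a short time-to-live."""

    def __init__(
        self,
        lookup: Callable[[str], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, bool]] = {}

    def exists(self, tenant_id: str) -> bool:
        now = self._clock()
        cached = self._entries.get(tenant_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh entry: skip the lookup
        result = self._lookup(tenant_id)
        self._entries[tenant_id] = (now, result)
        return result
```

A short TTL bounds how long a deleted tenant can still appear to exist. Security-sensitive paths may prefer explicit invalidation over time-based expiry.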
## Documentation
- [ ] **Validation rules documented** in API docs
- [ ] **Error responses documented** (status codes, error formats)
- [ ] **Examples provided** (valid and invalid requests)
- [ ] **Schema exported** for frontend consumption (TypeScript types)
- [ ] **Changelog updated** when validation changes
## Grey Haven Specific
### TanStack Start (Frontend)
- [ ] **Form validation** with Zod + TanStack Form
- [ ] **Server function validation** (all server functions validate input)
- [ ] **Type safety** maintained (`z.infer<typeof schema>` for types)
- [ ] **Error display** in UI components
- [ ] **Client-side validation** mirrors server-side
### FastAPI (Backend)
- [ ] **Request models** use Pydantic
- [ ] **Response models** use Pydantic
- [ ] **Repository methods** validate before database operations
- [ ] **Service layer** handles business rule validation
- [ ] **Dependency injection** for validation context (tenant_id)
### Database (Drizzle/SQLModel)
- [ ] **Drizzle schemas** include validation constraints
- [ ] **SQLModel fields** use Pydantic validators
- [ ] **Migration scripts** add database constraints
- [ ] **Indexes support** validation queries
## Monitoring & Alerting
- [ ] **Validation failure metrics** tracked
- [ ] **High failure rate alerts** configured
- [ ] **Unusual validation patterns** logged (potential attacks)
- [ ] **Performance metrics** for validation operations
- [ ] **Error logs** structured for analysis
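Structured logging and failure counting can be sketched with the standard library. The event fields, the logger name, and `record_validation_failure` are hypothetical; a production setup would likely export the counter to a metrics system instead of keeping it in memory.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("validation")
failure_counts: Counter = Counter()  # in-memory stand-in for a metrics backend


def record_validation_failure(endpoint: str, field: str, code: str) -> str:
    """Count the failure and log it as one JSON object per line."""
    failure_counts[(endpoint, code)] += 1
    event = json.dumps(
        {
            "event": "validation_failed",
            "endpoint": endpoint,
            "field": field,
            "code": code,
        },
        sort_keys=True,
    )
    # Only the field name and error code are logged, never the rejected
    # value, so passwords and other sensitive input stay out of the logs.
    logger.warning(event)
    return event
```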
## Scoring
- **80+ items checked**: Excellent - Comprehensive validation ✅
- **60-79 items**: Good - Most validation covered ⚠️
- **40-59 items**: Fair - Significant gaps exist 🔴
- **<40 items**: Poor - Inadequate validation ❌
## Priority Items
Address these first:
1. **All API endpoints validated** - Prevent invalid data entry
2. **Multi-tenant isolation** - Security-critical
3. **SQL injection prevention** - Use parameterized queries or an ORM, never string-built SQL
4. **File upload validation** - Common attack vector
5. **Error handling** - User experience and debugging
## Common Pitfalls
**Don't:**
- Trust client-side validation alone (always validate server-side)
- Use overly complex regex (hard to maintain, and prone to catastrophic backtracking on hostile input)
- Return technical error messages to users
- Skip validation on internal endpoints (defense in depth)
- Log sensitive data in validation errors
**Do:**
- Validate at boundaries (API endpoints, file uploads, external APIs)
- Use standard validators (email, URL, UUID) from libraries
- Provide clear, actionable error messages
- Test validation logic thoroughly
- Document validation requirements
## Related Resources
- [Zod Documentation](https://zod.dev)
- [Pydantic Documentation](https://docs.pydantic.dev)
- [OWASP Input Validation](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html)
- [data-validation skill](../SKILL.md)
---
**Total Items**: 120+ validation checks
**Critical Items**: API validation, Multi-tenant, Security, File uploads
**Coverage**: TypeScript, Python, Database, Security
**Last Updated**: 2025-11-10