Initial commit

2025-11-29 18:29:10 +08:00
commit 657f1e3da3
29 changed files with 2738 additions and 0 deletions
--- a/skills/data-validation/checklists/data-validation-checklist.md
+++ b/skills/data-validation/checklists/data-validation-checklist.md
@@ -0,0 +1,346 @@
+# Data Validation Checklist
+
+Comprehensive checklist for implementing robust data validation in TypeScript (Zod) and Python (Pydantic) applications.
+
+## Pre-Validation Setup
+
+- [ ] **Identify all input sources** (API requests, forms, file uploads, external APIs)
+- [ ] **Choose validation library** (Zod for TypeScript, Pydantic for Python)
+- [ ] **Define validation strategy** (fail-fast vs collect all errors)
+- [ ] **Set up error handling** (consistent error response format)
+- [ ] **Document validation requirements** (business rules, constraints)
+
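The "fail-fast vs collect all errors" choice and the "consistent error response format" item can be illustrated with a small standard-library sketch. Everything named here (`FieldError`, `validate_signup`, the error codes, the signup fields) is a hypothetical illustration, not part of the committed checklist, and the email test is deliberately crude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldError:
    field: str    # which input field failed
    code: str     # stable, machine-readable code such as "INVALID_EMAIL"
    message: str  # user-facing text


def validate_signup(payload: dict, fail_fast: bool = False) -> list[FieldError]:
    """Return validation errors for a hypothetical signup payload."""
    errors: list[FieldError] = []

    email = payload.get("email")
    # Crude placeholder check; a real validator would use a library.
    if not isinstance(email, str) or "@" not in email:
        errors.append(FieldError("email", "INVALID_EMAIL", "Enter a valid email address."))
        if fail_fast:
            return errors  # stop at the first failure

    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append(FieldError("name", "REQUIRED", "Name is required."))
        if fail_fast:
            return errors

    return errors
```

With `fail_fast=False` a form can show every problem at once; `fail_fast=True` suits hot paths where the first failure is enough to reject the request.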
+## TypeScript + Zod Validation
+
+### Schema Definition
+
+- [ ] **All API endpoints have Zod schemas** defined
+- [ ] **Schema types exported** for use in frontend
+- [ ] **Schemas colocated** with route handlers or in shared location
+- [ ] **Schema composition used** (z.object, z.array, z.union)
+- [ ] **Reusable schemas extracted** (common patterns like email, UUID)
+
+### Basic Validations
+
+- [ ] **String validations** applied:
+  - [ ] `.min()` for minimum length
+  - [ ] `.max()` for maximum length
+  - [ ] `.email()` for email addresses
+  - [ ] `.url()` for URLs
+  - [ ] `.uuid()` for UUIDs
+  - [ ] `.regex()` for custom patterns
+  - [ ] `.trim()` to remove whitespace
+
+- [ ] **Number validations** applied:
+  - [ ] `.int()` for integers
+  - [ ] `.positive()` for positive numbers
+  - [ ] `.min()` / `.max()` for ranges
+  - [ ] `.finite()` to exclude `Infinity` and `-Infinity` (`z.number()` already rejects `NaN`)
+
+- [ ] **Array validations** applied:
+  - [ ] `.min()` for minimum items
+  - [ ] `.max()` for maximum items
+  - [ ] `.nonempty()` for required arrays
+
+- [ ] **Date validations** applied:
+  - [ ] `.min()` for earliest date
+  - [ ] `.max()` for latest date
+  - [ ] Proper date parsing (z.coerce.date())
+
+### Advanced Validations
+
+- [ ] **Custom refinements** for complex rules:
+  ```typescript
+  import { z } from "zod";
+
+  // Cross-field rule: the mismatch is reported on the confirmPassword field.
+  z.object({
+    password: z.string(),
+    confirmPassword: z.string()
+  }).refine(data => data.password === data.confirmPassword, {
+    message: "Passwords don't match",
+    path: ["confirmPassword"]
+  })
+  ```
+
+- [ ] **Conditional validations** with `.superRefine()`
+- [ ] **Transform validations** to normalize data (`.transform()`)
+- [ ] **Discriminated unions** for polymorphic data
+- [ ] **Branded types** for domain-specific values
+
+### Error Handling
+
+- [ ] **Validation errors caught** and formatted consistently
+- [ ] **Error messages user-friendly** (not technical jargon)
+- [ ] **Field-level errors** returned (which field failed)
+- [ ] **Multiple errors collected** (not just first error)
+- [ ] **Error codes standardized** (e.g., "INVALID_EMAIL")
+
+### Multi-Tenant Context
+
+- [ ] **tenant_id validated** on all requests requiring tenant context
+- [ ] **UUID format verified** for tenant_id
+- [ ] **Tenant existence checked** (tenant must exist in database)
+- [ ] **User-tenant relationship verified** (user belongs to tenant)
+- [ ] **Admin permissions validated** for admin-only operations
+
+## Python + Pydantic Validation
+
+### Model Definition
+
+- [ ] **All API request models** inherit from BaseModel
+- [ ] **All response models** defined with Pydantic
+- [ ] **SQLModel used** for database models (built on Pydantic and SQLAlchemy)
+- [ ] **Field validators** used for custom validation
+- [ ] **Model validators** used for cross-field validation
+
+### Basic Field Validators
+
+- [ ] **EmailStr** used for email fields
+- [ ] **HttpUrl** used for URL fields
+- [ ] **UUID4** used for UUID fields
+- [ ] **Field()** used with constraints:
+  - [ ] `min_length` / `max_length` for strings
+  - [ ] `ge` / `le` for number ranges (greater/less than or equal)
+  - [ ] `gt` / `lt` for strict ranges
+  - [ ] `pattern` for pattern matching (named `regex` in Pydantic v1)
+
+```python
+from pydantic import BaseModel, EmailStr, Field
+
+class UserCreate(BaseModel):
+    email: EmailStr
+    name: str = Field(..., min_length=1, max_length=100)
+    age: int = Field(..., ge=0, le=150)
+```
+
+### Advanced Validation
+
+- [ ] **@field_validator** for single field custom validation:
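+  ```python
+  from pydantic import BaseModel, field_validator
+
+  class PasswordInput(BaseModel):  # hypothetical model name
+      password: str
+
+      @field_validator('password')
+      @classmethod
+      def validate_password(cls, v: str) -> str:
+          if len(v) < 8:
+              raise ValueError('Password must be at least 8 characters')
+          return v
+  ```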
+
+- [ ] **@model_validator** for cross-field validation:
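+  ```python
+  from pydantic import BaseModel, model_validator
+
+  class PasswordChange(BaseModel):  # hypothetical model name
+      password: str
+      confirm_password: str
+      @model_validator(mode='after')
+      def check_passwords_match(self):
+          if self.password != self.confirm_password:
+              raise ValueError('Passwords do not match')
+          return self
+  ```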
+
+- [ ] **Custom validators** handle edge cases
+- [ ] **`mode='before'`** used for preprocessing raw input
+- [ ] **`mode='after'`** used for post-validation checks
+
+### Error Handling
+
+- [ ] **ValidationError caught** in API endpoints
+- [ ] **Errors formatted** to match frontend expectations
+- [ ] **HTTPException raised** with 422 status for validation errors
+- [ ] **Error details included** in response body
+- [ ] **Logging added** for validation failures (security monitoring)
+
+### Multi-Tenant Context
+
+- [ ] **tenant_id field** on all multi-tenant request models
+- [ ] **Tenant UUID validated** before database queries
+- [ ] **Repository pattern enforces** tenant filtering
+- [ ] **Admin flag validated** for privileged operations
+- [ ] **RLS policies configured** on database tables
+
+## Database Constraints
+
+### Schema Constraints
+
+- [ ] **NOT NULL constraints** on required fields
+- [ ] **UNIQUE constraints** on unique fields (email, username)
+- [ ] **CHECK constraints** for value ranges
+- [ ] **FOREIGN KEY constraints** for relationships
+- [ ] **Default values** defined where appropriate
+
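The schema constraints listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical `tenants`/`users` schema; the table names, columns, and the `violates_constraint` helper are illustrative, and a production database such as PostgreSQL would declare the same constraints in its own migration syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE tenants (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL REFERENCES tenants(id),
        email     TEXT NOT NULL UNIQUE,
        age       INTEGER CHECK (age BETWEEN 0 AND 150),
        is_active INTEGER NOT NULL DEFAULT 1
    );
    """
)
conn.execute("INSERT INTO tenants (id, name) VALUES (?, ?)", ("t1", "Acme"))
conn.execute(
    "INSERT INTO users (tenant_id, email, age) VALUES (?, ?, ?)",
    ("t1", "ada@example.com", 36),
)
conn.commit()  # persist the seed rows so a later rollback cannot remove them


def violates_constraint(sql: str, params: tuple) -> bool:
    """Return True if the statement is rejected by a database constraint."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

The same helper shape works for the "violate constraint, expect error" integration tests mentioned later in the checklist.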
+### Index Support
+
+- [ ] **Indexes created** on frequently queried fields
+- [ ] **Composite indexes** for multi-field queries
+- [ ] **Partial indexes** for filtered queries (WHERE clauses)
+- [ ] **tenant_id indexed** on all multi-tenant tables
+
+## File Upload Validation
+
+- [ ] **File size limits** enforced (e.g., 10MB max)
+- [ ] **File type validation** (MIME type checking)
+- [ ] **File extension validation** (whitelist allowed extensions)
+- [ ] **Virus scanning** (if handling untrusted uploads)
+- [ ] **Content validation** (parse and validate file content)
+
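A rough sketch of the size, extension, and content checks above, using only the standard library. The allowlist, the byte signatures chosen, and the `validate_upload` name are assumptions for illustration; checking leading "magic" bytes is one simple stand-in for MIME sniffing, not a complete content validator.

```python
from pathlib import PurePath

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB, the example limit from the checklist

# Assumed allowlist: extension -> leading bytes expected for that format.
ALLOWED_SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, content: bytes) -> list[str]:
    """Return a list of problems; an empty list means the upload passed."""
    problems = []
    if len(content) > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the size limit")

    extension = PurePath(filename).suffix.lower()
    signature = ALLOWED_SIGNATURES.get(extension)
    if signature is None:
        problems.append(f"extension {extension!r} is not allowed")
    elif not content.startswith(signature):
        # The declared extension and the actual bytes disagree.
        problems.append("file content does not match its extension")
    return problems
```

Returning every problem, instead of raising on the first, matches the "collect all errors" strategy; virus scanning and full parsing would still be separate steps.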
+### Image Uploads
+
+- [ ] **Image dimensions validated** (max width/height)
+- [ ] **Image format verified** (PNG, JPEG, etc.)
+- [ ] **EXIF data stripped** (privacy concern: may include GPS location and device details)
+- [ ] **Thumbnails generated** for large images
+
+### CSV/JSON Uploads
+
+- [ ] **Parse errors handled gracefully**
+- [ ] **Schema validation** applied to each row/object
+- [ ] **Batch validation** with error collection
+- [ ] **Maximum rows/objects** limit enforced
+
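For CSV uploads, per-row validation with error collection and a row cap might look like the following sketch. The `email` and `age` columns, the `MAX_ROWS` value, and the `validate_csv` name are hypothetical; the per-field checks are intentionally simple placeholders.

```python
import csv
import io

MAX_ROWS = 1000  # assumed limit; tune per endpoint


def validate_csv(text: str, max_rows: int = MAX_ROWS):
    """Validate rows with 'email' and 'age' columns.

    Returns (valid_rows, errors); errors holds (row_number, message) pairs.
    """
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    try:
        for row_no, row in enumerate(reader, start=1):
            if row_no > max_rows:
                errors.append((row_no, "row limit exceeded; remaining rows skipped"))
                break
            email = (row.get("email") or "").strip()
            age_text = (row.get("age") or "").strip()
            problems = []
            if "@" not in email:  # deliberately crude email check
                problems.append("invalid email")
            if not (age_text.isascii() and age_text.isdigit() and int(age_text) <= 150):
                problems.append("age must be an integer from 0 to 150")
            if problems:
                errors.append((row_no, "; ".join(problems)))
            else:
                valid.append({"email": email, "age": int(age_text)})
    except csv.Error as exc:
        # Parse failures are reported by physical line number, not raised.
        errors.append((reader.line_num, f"unparseable CSV: {exc}"))
    return valid, errors
```

Good rows are kept even when others fail, so the caller can decide whether to import partially or reject the whole file.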
+## External API Integration
+
+- [ ] **Response schemas defined** for external APIs
+- [ ] **Validation applied** to external data
+- [ ] **Graceful degradation** when validation fails
+- [ ] **Retry logic** for transient failures
+- [ ] **Timeout limits** configured
+
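The retry item for external APIs could be sketched as a small exponential-backoff helper. `TransientError` and `call_with_retry` are hypothetical names; which failures count as transient (timeouts, HTTP 503, and so on) is an assumption the caller has to encode, and validation failures of the response body should not be retried this way.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503, ...)."""


def call_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a per-request timeout would be configured on the HTTP client inside `operation`.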
+## Security Validations
+
+### Input Sanitization
+
+- [ ] **HTML/script tags stripped** from text inputs
+- [ ] **SQL injection prevented** (use parameterized queries or an ORM; never interpolate user input into SQL)
+- [ ] **XSS prevention** (escape output in templates)
+- [ ] **Path traversal prevented** (validate file paths)
+- [ ] **Command injection prevented** (no shell execution of user input)
+
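One common way to implement the path traversal check is to resolve the requested path and confirm it still sits under the intended directory. The sketch below assumes a hypothetical `UPLOAD_ROOT` and helper name; it is an illustration of the idea, not the project's actual storage code.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/app/uploads")  # hypothetical storage directory


def resolve_upload_path(user_supplied: str, root: Path = UPLOAD_ROOT) -> Path:
    """Resolve a user-supplied relative path, refusing anything outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows existing symlinks;
    # an absolute user path replaces root entirely and is caught below.
    candidate = (root / user_supplied).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes upload directory: {user_supplied!r}") from None
    return candidate
```

Comparing resolved paths avoids the pitfalls of string checks such as searching for `".."`, which miss absolute paths and symlinked directories.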
+### Authentication & Authorization
+
+- [ ] **JWT tokens validated** (signature, expiration)
+- [ ] **Session tokens verified** against database
+- [ ] **User existence checked** before operations
+- [ ] **Permissions verified** for protected resources
+- [ ] **Rate limiting applied** to prevent abuse
+
+### Sensitive Data
+
+- [ ] **Passwords never logged** or returned in responses
+- [ ] **Credit card numbers validated** with Luhn algorithm
+- [ ] **SSN/Tax ID formats validated**
+- [ ] **PII handling compliant** with regulations (GDPR, CCPA)
+- [ ] **Encryption applied** to sensitive stored data
+
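The Luhn check named above is short enough to show in full. This is a minimal sketch with a hypothetical function name; it only verifies the checksum, so a passing number is well-formed, not necessarily a real or active card.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = number.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0
```

For example, the widely used test number `4111 1111 1111 1111` passes, while changing its last digit to `2` fails.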
+## Testing Validation Logic
+
+### Unit Tests
+
+- [ ] **Valid inputs pass** validation
+- [ ] **Invalid inputs fail** with correct error messages
+- [ ] **Edge cases tested** (empty strings, null, undefined)
+- [ ] **Boundary values tested** (min/max lengths, ranges)
+- [ ] **Error messages verified** (correct field, message)
+
+### Integration Tests
+
+- [ ] **API endpoints validated** in integration tests
+- [ ] **Database constraints tested** (violate constraint, expect error)
+- [ ] **Multi-tenant isolation tested** (cross-tenant access blocked)
+- [ ] **File upload validation tested**
+- [ ] **External API mocking** with invalid responses
+
+### Test Coverage
+
+- [ ] **Validation logic 100% covered**
+- [ ] **Error paths tested** (not just happy path)
+- [ ] **Custom validators tested** independently
+- [ ] **Refinements tested** with failing cases
+
+## Performance Considerations
+
+- [ ] **Validation performance measured** (avoid expensive validations in hot paths)
+- [ ] **Async validation** for I/O-bound checks (database lookups)
+- [ ] **Caching applied** to repeated validations (e.g., tenant existence)
+- [ ] **Batch validation** for arrays/lists
+- [ ] **Early returns** for fail-fast scenarios
+
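Caching a repeated check such as tenant existence might be sketched as a small time-bounded cache. `TenantExistenceCache`, the `lookup` callable, and the 60-second default are assumptions for illustration; a shared cache (for example Redis) would be needed across multiple processes.

```python
import time


class TenantExistenceCache:
    """Cache results of an expensive tenant lookup for ttl_seconds."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup   # hypothetical callable: tenant_id -> bool (e.g. a DB query)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}      # tenant_id -> (exists, expires_at)

    def exists(self, tenant_id: str) -> bool:
        now = self._clock()
        cached = self._entries.get(tenant_id)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: skip the lookup
        result = self._lookup(tenant_id)
        self._entries[tenant_id] = (result, now + self._ttl)
        return result
```

The TTL bounds how long a deleted or newly created tenant can be misreported; note that this sketch also caches negative results, which may or may not be desirable.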
+## Documentation
+
+- [ ] **Validation rules documented** in API docs
+- [ ] **Error responses documented** (status codes, error formats)
+- [ ] **Examples provided** (valid and invalid requests)
+- [ ] **Schema exported** for frontend consumption (TypeScript types)
+- [ ] **Changelog updated** when validation changes
+
+## Grey Haven Specific
+
+### TanStack Start (Frontend)
+
+- [ ] **Form validation** with Zod + TanStack Form
+- [ ] **Server function validation** (all server functions validate input)
+- [ ] **Type safety** maintained (`z.infer<>` for types)
+- [ ] **Error display** in UI components
+- [ ] **Client-side validation** mirrors server-side
+
+### FastAPI (Backend)
+
+- [ ] **Request models** use Pydantic
+- [ ] **Response models** use Pydantic
+- [ ] **Repository methods** validate before database operations
+- [ ] **Service layer** handles business rule validation
+- [ ] **Dependency injection** for validation context (tenant_id)
+
+### Database (Drizzle/SQLModel)
+
+- [ ] **Drizzle schemas** include validation constraints
+- [ ] **SQLModel fields** use Pydantic validators
+- [ ] **Migration scripts** add database constraints
+- [ ] **Indexes support** validation queries
+
+## Monitoring & Alerting
+
+- [ ] **Validation failure metrics** tracked
+- [ ] **High failure rate alerts** configured
+- [ ] **Unusual validation patterns** logged (potential attacks)
+- [ ] **Performance metrics** for validation operations
+- [ ] **Error logs** structured for analysis
+
+## Scoring
+
+- **80+ items checked**: Excellent - Comprehensive validation ✅
+- **60-79 items**: Good - Most validation covered ⚠️
+- **40-59 items**: Fair - Significant gaps exist 🔴
+- **<40 items**: Poor - Inadequate validation ❌
+
+## Priority Items
+
+Address these first:
+1. **All API endpoints validated** - Prevent invalid data entry
+2. **Multi-tenant isolation** - Security-critical
+3. **SQL injection prevention** - Use parameterized queries or an ORM, never string-built SQL
+4. **File upload validation** - Common attack vector
+5. **Error handling** - User experience and debugging
+
+## Common Pitfalls
+
+❌ **Don't:**
+- Trust client-side validation alone (always validate server-side)
+- Use overly complex regex (hard to maintain, performance issues)
+- Return technical error messages to users
+- Skip validation on internal endpoints (defense in depth)
+- Log sensitive data in validation errors
+
+✅ **Do:**
+- Validate at boundaries (API endpoints, file uploads, external APIs)
+- Use standard validators (email, URL, UUID) from libraries
+- Provide clear, actionable error messages
+- Test validation logic thoroughly
+- Document validation requirements
+
+## Related Resources
+
+- [Zod Documentation](https://zod.dev)
+- [Pydantic Documentation](https://docs.pydantic.dev)
+- [OWASP Input Validation](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html)
+- [data-validation skill](../SKILL.md)
+
+---
+
+**Total Items**: 120+ validation checks
+**Critical Items**: API validation, Multi-tenant, Security, File uploads
+**Coverage**: TypeScript, Python, Database, Security
+**Last Updated**: 2025-11-10