---
name: ingest-validate-wf
description: Validate Digdag workflow and configuration files against production quality gates
---

# Validate Ingestion Workflow

## ⚠️ CRITICAL: This validates against strict production quality gates

I'll validate your ingestion workflow for compliance with production standards and best practices.

---

## What I'll Validate

### Quality Gates (ALL MUST PASS)

#### 1. Template Compliance
- ✅ Code matches documented templates 100%
- ✅ No unauthorized deviations from patterns
- ✅ All template sections present
- ✅ Exact formatting and structure

#### 2. Logging Requirements
- ✅ Start logging before data processing
- ✅ Success logging after td_load
- ✅ Error logging in `_error` blocks
- ✅ Minimum 3 logging blocks per data source
- ✅ Correct SQL template usage

#### 3. Error Handling
- ✅ `_error:` blocks present in all workflows
- ✅ Error logging with SQL present
- ✅ Proper error message capture
- ✅ Job ID and URL captured in errors

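As a rough illustration of the first check in this gate, the sketch below scans workflow text for an `_error:` block. It is only a text-level approximation, not the validator itself: it does not parse the YAML, and the function name and the sample file names (`config/example_load.yml`, `sql/log_error.sql`) are hypothetical.

```python
import re

# Matches a line that contains only `_error:` (any indentation, optional comment).
ERROR_BLOCK = re.compile(r"^[ \t]*_error:[ \t]*(#.*)?$", re.MULTILINE)


def has_error_block(workflow_text: str) -> bool:
    """Return True if the workflow text declares an `_error:` block."""
    return ERROR_BLOCK.search(workflow_text) is not None


# Hypothetical workflow used only to exercise the check.
sample = """\
timezone: UTC

+load:
  td_load>: config/example_load.yml

_error:
  +log_error:
    td>: sql/log_error.sql
"""

print(has_error_block(sample))
```

A text match like this cannot tell whether the block actually contains error logging SQL, so the remaining items in this gate still need a closer read of the block's contents.
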
#### 4. Timestamp Format
- ✅ Correct format for connector type:
  - Google BigQuery: SQL Server format (`CONVERT(varchar, ..., 121)`)
  - Klaviyo: `.000000` (6 decimals, NO Z)
  - OneTrust: `.000Z` (3 decimals, WITH Z)
  - Shopify v2: ISO 8601
- ✅ Matches `docs/patterns/timestamp-formats.md`

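The Klaviyo and OneTrust rules above differ only in the fractional-second suffix, which makes them easy to confuse. The following sketch checks just that suffix. It assumes nothing about the date and time portion, which is defined in `docs/patterns/timestamp-formats.md`; the function name and the sample values are illustrative only.

```python
import re

# Suffix rules taken from the list above. Only the fractional-second
# suffix is checked; the date/time prefix is deliberately left alone.
SUFFIX_RULES = {
    "klaviyo": re.compile(r"\.\d{6}$"),    # 6 decimals, no trailing Z
    "onetrust": re.compile(r"\.\d{3}Z$"),  # 3 decimals, trailing Z
}


def timestamp_suffix_ok(connector: str, value: str) -> bool:
    """Return True if `value` ends with the suffix expected for `connector`."""
    return SUFFIX_RULES[connector].search(value) is not None


# Sample values are made up for illustration.
print(timestamp_suffix_ok("klaviyo", "2024-01-15T10:30:00.000000"))
print(timestamp_suffix_ok("onetrust", "2024-01-15T10:30:00.000000"))
```
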
#### 5. Incremental Field Handling
- ✅ Correct field names (table vs. API)
- ✅ Dual field handling where needed (Klaviyo campaigns)
- ✅ Proper COALESCE fallback logic
- ✅ Matches `docs/patterns/incremental-patterns.md`

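To make the two ideas in this gate concrete, here is a small sketch in Python rather than SQL. `coalesce` mirrors the semantics of SQL `COALESCE` (first non-null value wins), and `FIELDS` shows a table column and an API parameter with different names. The field names and the default start value are hypothetical; the real ones come from `docs/patterns/incremental-patterns.md` and the source-specific docs.

```python
from typing import Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical names: the table column and the API parameter can differ.
FIELDS = {"table": "updated_at", "api": "updated"}


def api_field_ok(configured: str) -> bool:
    """Return True if the configured incremental field is the API-side name."""
    return configured == FIELDS["api"]


# On a first run there is no previous load time, so the default is used.
last_loaded = None
print(coalesce(last_loaded, "2020-01-01T00:00:00.000000"))
print(api_field_ok("updated_at"))
```
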
#### 6. Workflow Structure
- ✅ Matches `docs/patterns/workflow-patterns.md`
- ✅ Proper timezone declaration (`timezone: UTC`)
- ✅ Correct `_export` includes
- ✅ Proper task naming conventions
- ✅ Correct file organization
- ✅ Parallel processing limits appropriate for source

#### 7. Configuration Files
- ✅ YAML syntax validity
- ✅ Secret references (`${secret:name}`) used correctly
- ✅ No hardcoded credentials
- ✅ Required parameters present
- ✅ Database references correct
- ✅ Mode set appropriately (`append`, `replace`)

#### 8. File Organization
- ✅ `.dig` files in `ingestion/` directory
- ✅ YAML configs in `ingestion/config/` subdirectory
- ✅ SQL files in `ingestion/sql/` subdirectory
- ✅ Proper file naming conventions

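The directory rules above can be expressed as a simple path check. This is a minimal sketch, assuming paths are given relative to the project root and use forward slashes; the function name and `ingestion/sql/log_start.sql` are hypothetical, and naming conventions are not checked.

```python
from pathlib import PurePosixPath


def location_ok(path: str) -> bool:
    """Return True if a file sits in the directory its type requires."""
    p = PurePosixPath(path)
    parent = p.parent.as_posix()
    if p.suffix == ".dig":
        return parent == "ingestion"
    if p.suffix in {".yml", ".yaml"}:
        return parent == "ingestion/config"
    if p.suffix == ".sql":
        return parent == "ingestion/sql"
    return True  # other file types are outside this gate


for candidate in (
    "ingestion/klaviyo_ingest_inc.dig",
    "ingestion/klaviyo_profiles_load.yml",  # config outside ingestion/config/
    "ingestion/sql/log_start.sql",
):
    print(candidate, location_ok(candidate))
```
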
#### 9. Security
- ✅ No hardcoded credentials in any file
- ✅ Proper `${secret:name}` syntax usage
- ✅ `credentials_ingestion.json` NOT in version control
- ✅ `.gitignore` includes credentials file

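One way to approximate the hardcoded-credential check is a line scan over a config file. The sketch below flags any key that looks sensitive and whose value is not a `${secret:name}` reference. The list of sensitive key names is an assumption, the scan does not parse YAML, and a quoted secret reference would be reported as a finding, so treat it as a starting point only.

```python
import re

# `${secret:name}` reference, as required by the gates above.
SECRET_REF = re.compile(r"\$\{secret:[A-Za-z0-9_.]+\}")
# Assumed list of key fragments that suggest a credential.
SENSITIVE_KEY = re.compile(r"(api_key|password|secret|token)", re.IGNORECASE)


def find_hardcoded_credentials(config_text: str) -> list[tuple[int, str]]:
    """Return (line number, key) pairs for sensitive keys with literal values."""
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # not a `key: value` line
        if SENSITIVE_KEY.search(key) and not SECRET_REF.fullmatch(value):
            findings.append((lineno, key.strip()))
    return findings


# Made-up config used only to exercise the scan.
sample = """\
in:
  type: klaviyo
  api_key: pk_live_123
  password: ${secret:klaviyo_password}
"""

print(find_hardcoded_credentials(sample))
```

Reporting line numbers alongside the key makes it straightforward to produce findings in the `file:line` style used in the fail report.
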
---

## Validation Options

### Option 1: Validate Specific Workflow
Provide:
- **Workflow file path**: e.g., `ingestion/klaviyo_ingest_inc.dig`
- **Related config files**: optional; if omitted, I'll find them automatically

I will:
1. Read the workflow file
2. Find all related config files
3. Check against ALL quality gates
4. Report detailed findings with line numbers

### Option 2: Validate Entire Source
Provide:
- **Source name**: e.g., `klaviyo`, `shopify_v2`, `google_bigquery`

I will:
1. Find all workflows for the source
2. Find all config files for the source
3. Validate against source-specific documentation
4. Check all quality gates
5. Report comprehensive findings

### Option 3: Validate All
Say: **"validate all"**

I will:
1. Find all workflows in `ingestion/`
2. Find all configs in `ingestion/config/`
3. Validate each against its source documentation
4. Check all quality gates
5. Report full project compliance status

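The "find all workflows for the source" step used by Options 2 and 3 might look something like the sketch below. It assumes workflow files follow a `{source}_*.dig` naming pattern directly under `ingestion/`, which is inferred from the example paths in this document rather than stated as a rule; the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def workflows_for_source(paths: list[str], source: str) -> list[str]:
    """Return the workflow files under ingestion/ that belong to `source`."""
    pattern = f"ingestion/{source}_*.dig"  # assumed naming convention
    return sorted(p for p in paths if fnmatchcase(p, pattern))


# Example project listing, made up for illustration.
project_files = [
    "ingestion/klaviyo_ingest_inc.dig",
    "ingestion/shopify_v2_ingest_inc.dig",
    "ingestion/config/klaviyo_profiles_load.yml",
]

print(workflows_for_source(project_files, "klaviyo"))
```
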
---

## Validation Process

### Step 1: Read Documentation
I will read relevant documentation to verify compliance:
- Source-specific docs: `docs/sources/{source-name}.md`
- Pattern docs: `docs/patterns/*.md`

### Step 2: Load Files
I will read all specified workflow and config files.

### Step 3: Check Quality Gates
I will verify each file against ALL quality gates listed above.

### Step 4: Report Findings

#### Pass Report (if all gates pass)
```
✅ VALIDATION PASSED

Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}

Quality Gates: 9/9 PASSED
✅ Template Compliance
✅ Logging Requirements
✅ Error Handling
✅ Timestamp Format
✅ Incremental Fields
✅ Workflow Structure
✅ Configuration Files
✅ File Organization
✅ Security

No issues found. Workflow is production-ready.
```

#### Fail Report (if any gate fails)
```
❌ VALIDATION FAILED

Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}

Quality Gates: 6/9 PASSED

✅ Template Compliance
✅ Logging Requirements
❌ Error Handling - FAILED
  - Missing _error block in main workflow
  - Error logging SQL not found

✅ Timestamp Format
❌ Incremental Fields - FAILED
  - Using wrong field name: 'updated_at' should be 'updated' for API
  - Line 45: incremental_field parameter incorrect

✅ Workflow Structure
✅ Configuration Files
✅ File Organization
❌ Security - FAILED
  - Hardcoded API key found in config/klaviyo_profiles_load.yml:12
  - Should use ${secret:klaviyo_api_key}

RECOMMENDATIONS:
1. Add _error block to main workflow (see docs/patterns/workflow-patterns.md)
2. Fix incremental field name (see docs/sources/klaviyo.md)
3. Replace hardcoded credential with secret reference

Re-validate after fixing issues.
```

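Both report layouts share one structure: a status line, the workflow and source, a pass count, and one line per gate with any issues listed beneath it. The sketch below shows one possible way to assemble that text from per-gate findings. The function name and the `failures` mapping are hypothetical, and the recommendations section is omitted for brevity.

```python
GATES = [
    "Template Compliance", "Logging Requirements", "Error Handling",
    "Timestamp Format", "Incremental Fields", "Workflow Structure",
    "Configuration Files", "File Organization", "Security",
]


def render_report(workflow: str, source: str, failures: dict[str, list[str]]) -> str:
    """Build the report text; `failures` maps a gate name to its issues."""
    passed = sum(1 for gate in GATES if not failures.get(gate))
    all_passed = passed == len(GATES)
    lines = [
        "✅ VALIDATION PASSED" if all_passed else "❌ VALIDATION FAILED",
        "",
        f"Workflow: {workflow}",
        f"Source: {source}",
        "",
        f"Quality Gates: {passed}/{len(GATES)} PASSED",
    ]
    for gate in GATES:
        issues = failures.get(gate, [])
        if issues:
            lines.append(f"❌ {gate} - FAILED")
            lines.extend(f"  - {issue}" for issue in issues)
        else:
            lines.append(f"✅ {gate}")
    return "\n".join(lines)


print(render_report(
    "ingestion/klaviyo_ingest_inc.dig",
    "klaviyo",
    {"Security": ["Hardcoded API key found in config/klaviyo_profiles_load.yml:12"]},
))
```

Deriving the pass count from the same mapping that drives the per-gate lines keeps the summary and the details from drifting apart.
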
---

## Common Issues Detected

### Template Violations
- Simplified or "optimized" templates
- Removed "redundant" sections
- Modified variable names
- Changed structure

### Logging Violations
- Missing start/success/error logging
- Incorrect SQL template usage
- Missing job ID or URL capture

### Timestamp Format Errors
- Wrong decimal count
- Missing or incorrect timezone marker
- Using default instead of connector-specific format

### Incremental Field Errors
- Using table field name in API parameter
- Using API field name in SQL queries
- Missing COALESCE fallback

### Security Issues
- Hardcoded credentials
- Incorrect secret syntax
- Credentials file in version control

---

## Next Steps After Validation

### If Validation Passes
✅ Workflow is production-ready
- Deploy with confidence
- Monitor ingestion_log for ongoing health

### If Validation Fails
❌ Fix reported issues:
1. Re-read relevant documentation
2. Apply exact templates
3. Fix specific line numbers mentioned
4. Re-validate until all gates pass

**DO NOT deploy failing workflows to production**

---

## Production Quality Assurance

This validation helps ensure:
- ✅ Code that is more likely to run correctly on first deployment
- ✅ Consistent patterns across sources
- ✅ Complete error handling and logging
- ✅ Maintainable and documented code
- ✅ No hardcoded credentials or exposed secrets
- ✅ Compliance with team standards

---

**What would you like to validate?**

Options:
1. Validate specific workflow: Provide workflow file path
2. Validate entire source: Provide source name
3. Validate all: Say "validate all"