Files
gh-treasure-data-aps-claude…/commands/ingest-validate-wf.md
2025-11-30 09:02:41 +08:00

256 lines
6.6 KiB
Markdown

---
name: ingest-validate-wf
description: Validate Digdag workflow and configuration files against production quality gates
---
# Validate Ingestion Workflow
## ⚠️ CRITICAL: This validates against strict production quality gates
I'll validate your ingestion workflow for compliance with production standards and best practices.
---
## What I'll Validate
### Quality Gates (ALL MUST PASS)
#### 1. Template Compliance
- ✅ Code matches documented templates 100%
- ✅ No unauthorized deviations from patterns
- ✅ All template sections present
- ✅ Exact formatting and structure
#### 2. Logging Requirements
- ✅ Start logging before data processing
- ✅ Success logging after td_load
- ✅ Error logging in `_error` blocks
- ✅ Minimum 3 logging blocks per data source
- ✅ Correct SQL template usage
#### 3. Error Handling
-`_error:` blocks present in all workflows
- ✅ Error logging with SQL present
- ✅ Proper error message capture
- ✅ Job ID and URL captured in errors
#### 4. Timestamp Format
- ✅ Correct format for connector type:
- Google BigQuery: SQL Server format (`CONVERT(varchar, ..., 121)`)
- Klaviyo: `.000000` (6 decimals, NO Z)
- OneTrust: `.000Z` (3 decimals, WITH Z)
- Shopify v2: ISO 8601
- ✅ Matches `docs/patterns/timestamp-formats.md`
#### 5. Incremental Field Handling
- ✅ Correct field names (table vs. API)
- ✅ Dual field handling where needed (Klaviyo campaigns)
- ✅ Proper COALESCE fallback logic
- ✅ Matches `docs/patterns/incremental-patterns.md`
#### 6. Workflow Structure
- ✅ Must match `docs/patterns/workflow-patterns.md`
- ✅ Proper timezone declaration (`timezone: UTC`)
- ✅ Correct `_export` includes
- ✅ Proper task naming conventions
- ✅ Correct file organization
- ✅ Parallel processing limits appropriate for source
#### 7. Configuration Files
- ✅ YAML syntax validity
- ✅ Secret references (`${secret:name}`) used correctly
- ✅ No hardcoded credentials
- ✅ Required parameters present
- ✅ Database references correct
- ✅ Mode set appropriately (`append`, `replace`)
#### 8. File Organization
-`.dig` files in `ingestion/` directory
- ✅ YAML configs in `ingestion/config/` subdirectory
- ✅ SQL files in `ingestion/sql/` subdirectory
- ✅ Proper file naming conventions
#### 9. Security
- ✅ No hardcoded credentials in any file
- ✅ Proper `${secret:name}` syntax usage
-`credentials_ingestion.json` NOT in version control
-`.gitignore` includes credentials file
---
## Validation Options
### Option 1: Validate Specific Workflow
Provide:
- **Workflow file path**: e.g., `ingestion/klaviyo_ingest_inc.dig`
- **Related config files**: (or I'll find them automatically)
I will:
1. Read the workflow file
2. Find all related config files
3. Check against ALL quality gates
4. Report detailed findings with line numbers
### Option 2: Validate Entire Source
Provide:
- **Source name**: e.g., `klaviyo`, `shopify_v2`, `google_bigquery`
I will:
1. Find all workflows for the source
2. Find all config files for the source
3. Validate against source-specific documentation
4. Check all quality gates
5. Report comprehensive findings
### Option 3: Validate All
Say: **"validate all"**
I will:
1. Find all workflows in `ingestion/`
2. Find all configs in `ingestion/config/`
3. Validate each against its source documentation
4. Check all quality gates
5. Report full project compliance status
---
## Validation Process
### Step 1: Read Documentation
I will read relevant documentation to verify compliance:
- Source-specific docs: `docs/sources/{source-name}.md`
- Pattern docs: `docs/patterns/*.md`
### Step 2: Load Files
I will read all specified workflow and config files
### Step 3: Check Quality Gates
I will verify each file against ALL quality gates listed above
### Step 4: Report Findings
#### Pass Report (if all gates pass)
```
✅ VALIDATION PASSED
Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}
Quality Gates: 9/9 PASSED
✅ Template Compliance
✅ Logging Requirements
✅ Error Handling
✅ Timestamp Format
✅ Incremental Fields
✅ Workflow Structure
✅ Configuration Files
✅ File Organization
✅ Security
No issues found. Workflow is production-ready.
```
#### Fail Report (if any gate fails)
```
❌ VALIDATION FAILED
Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}
Quality Gates: 6/9 PASSED
✅ Template Compliance
✅ Logging Requirements
❌ Error Handling - FAILED
- Missing _error block in main workflow
- Error logging SQL not found
✅ Timestamp Format
❌ Incremental Fields - FAILED
- Using wrong field name: 'updated_at' should be 'updated' for API
- Line 45: incremental_field parameter incorrect
✅ Workflow Structure
✅ Configuration Files
✅ File Organization
❌ Security - FAILED
- Hardcoded API key found in config/klaviyo_profiles_load.yml:12
- Should use ${secret:klaviyo_api_key}
RECOMMENDATIONS:
1. Add _error block to main workflow (see docs/patterns/workflow-patterns.md)
2. Fix incremental field name (see docs/sources/klaviyo.md)
3. Replace hardcoded credential with secret reference
Re-validate after fixing issues.
```
---
## Common Issues Detected
### Template Violations
- Simplified or "optimized" templates
- Removed "redundant" sections
- Modified variable names
- Changed structure
### Logging Violations
- Missing start/success/error logging
- Incorrect SQL template usage
- Missing job ID or URL capture
### Timestamp Format Errors
- Wrong decimal count
- Missing or incorrect timezone marker
- Using default instead of connector-specific format
### Incremental Field Errors
- Using table field name in API parameter
- Using API field name in SQL queries
- Missing COALESCE fallback
### Security Issues
- Hardcoded credentials
- Incorrect secret syntax
- Credentials file in version control
---
## Next Steps After Validation
### If Validation Passes
✅ Workflow is production-ready
- Deploy with confidence
- Monitor ingestion_log for ongoing health
### If Validation Fails
❌ Fix reported issues:
1. Re-read relevant documentation
2. Apply exact templates
3. Fix specific line numbers mentioned
4. Re-validate until all gates pass
**DO NOT deploy failing workflows to production**
---
## Production Quality Assurance
This validation ensures:
- ✅ Code works the first time
- ✅ Consistent patterns across sources
- ✅ Complete error handling and logging
- ✅ Maintainable and documented code
- ✅ No security vulnerabilities
- ✅ Compliance with team standards
---
**What would you like to validate?**
Options:
1. Validate specific workflow: Provide workflow file path
2. Validate entire source: Provide source name
3. Validate all: Say "validate all"