---
name: hybrid-unif-config-validate
description: Validate YAML configuration for hybrid ID unification before SQL generation
---

# Validate Hybrid ID Unification YAML

## Overview

Validate your `unify.yml` configuration file to ensure it is properly structured and ready for SQL generation. This command checks syntax, structure, and validation rules, and it recommends optimizations.

---

## What You Need

### Required Input

1. **YAML Configuration File**: Path to your `unify.yml`

---

## What I'll Do

### Step 1: File Validation

- Verify that the file exists and is readable
- Check YAML syntax (indentation, quoting, etc.)
- Ensure all required sections are present
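
The Step 1 checks can be sketched in Python as shown below. This is a minimal illustration, not this command's actual implementation. `check_file`, `check_required_sections`, and `REQUIRED_SECTIONS` are hypothetical names. The sketch assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, which raises `yaml.YAMLError` on syntax errors), so it needs only the standard library. The required section names come from the Required Checks list in this document.

```python
import os
from pathlib import Path
from typing import Any

# Hypothetical: top-level sections this sketch treats as required
# (master_tables is optional, so it is not listed).
REQUIRED_SECTIONS = ("name", "keys", "tables", "canonical_ids")


def check_file(path: str) -> list[str]:
    """Return an error if the file is missing or not readable."""
    p = Path(path)
    if not p.is_file():
        return [f"File not found: {path}"]
    if not os.access(p, os.R_OK):
        return [f"File is not readable: {path}"]
    return []


def check_required_sections(config: Any) -> list[str]:
    """Return one error per required top-level section that is absent."""
    if not isinstance(config, dict):
        return ["Top level of the YAML file must be a mapping"]
    return [f"Missing required section: {s}" for s in REQUIRED_SECTIONS if s not in config]


# Usage with an already-parsed config that lacks canonical_ids.
print(check_required_sections({"name": "customer_unification", "keys": [], "tables": []}))
# ['Missing required section: canonical_ids']
```

The sketch keeps file access separate from content checks. That way, a missing file is reported before any parsing is attempted.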

### Step 2: Structure Validation

Check the presence and structure of:

- **name**: Unification project name
- **keys**: Key definitions with validation rules
- **tables**: Source tables with key column mappings
- **canonical_ids**: Canonical ID configuration
- **master_tables**: Master table definitions (optional)
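
Below is one possible shape for the Step 2 structure check, assuming the YAML has been parsed into a Python dict. The expected types (`str` for `name`, lists for the other sections) are inferred from the example configuration in this document. `check_structure`, `EXPECTED_TYPES`, and `NON_EMPTY` are hypothetical. Absent sections are skipped, because presence is already a Step 1 concern.

```python
from typing import Any

# Expected Python types after parsing, inferred from the example unify.yml.
EXPECTED_TYPES = {
    "name": str,
    "keys": list,
    "tables": list,
    "canonical_ids": list,
    "master_tables": list,  # optional section
}
# Sections that must contain at least one entry (per the Required Checks list).
NON_EMPTY = {"keys", "tables"}


def check_structure(config: dict[str, Any]) -> list[str]:
    """Return errors for sections that have the wrong type or are empty."""
    errors: list[str] = []
    for section, expected in EXPECTED_TYPES.items():
        if section not in config:
            continue  # presence is checked in Step 1
        value = config[section]
        if not isinstance(value, expected):
            errors.append(
                f"Section '{section}' must be a {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        elif section in NON_EMPTY and not value:
            errors.append(f"Empty {section} section")
    return errors
```

A section written as `keys:` with nothing after it parses to `None`. This sketch would report that as `got NoneType` rather than as an empty section.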

### Step 3: Content Validation

Validate each section individually:

**Keys Section**:
- ✓ Each key has a unique name
- ✓ `valid_regexp` is a valid regex pattern (if provided)
- ✓ `invalid_texts` is an array (if provided)
- ⚠ Recommend validation rules if missing
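
The Keys Section rules might be sketched as follows. `check_keys` is a hypothetical helper that returns errors and warnings separately. One caveat: Python's `re` module is only an approximation here. The generated SQL evaluates `valid_regexp` with the target engine's own regex dialect (for example, Java regular expressions in Spark SQL), so a pattern that compiles in Python is not guaranteed to behave identically on the platform.

```python
import re
from typing import Any


def check_keys(keys: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the keys section."""
    errors: list[str] = []
    warnings: list[str] = []
    seen: set[str] = set()
    for i, key in enumerate(keys, start=1):
        name = key.get("name")
        if not name:
            errors.append(f"Key #{i} has no name")
            continue
        if name in seen:
            errors.append(f"Duplicate key name: {name}")
        seen.add(name)

        pattern = key.get("valid_regexp")
        if pattern is not None:
            try:
                re.compile(pattern)  # Python dialect only; the SQL engine may differ
            except (re.error, TypeError) as exc:
                errors.append(f"Key '{name}': invalid valid_regexp {pattern!r} ({exc})")

        texts = key.get("invalid_texts")
        if texts is not None and not isinstance(texts, list):
            errors.append(f"Key '{name}': invalid_texts must be an array")

        if pattern is None and texts is None:
            warnings.append(f"Key '{name}' has no validation rules (valid_regexp or invalid_texts)")
    return errors, warnings
```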

**Tables Section**:
- ✓ Each entry has a `table` name
- ✓ Each table has at least one `key_columns` entry
- ✓ All referenced keys exist in the `keys` section
- ✓ Column names are valid identifiers
- ⚠ Check for duplicate table definitions
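
A sketch of the Tables Section checks follows. `check_tables` and `IDENTIFIER` are hypothetical. The identifier rule (a letter or underscore, followed by letters, digits, or underscores) is a conservative assumption for unquoted SQL identifiers; both platforms accept more than this when names are quoted. This sketch follows the Required Checks list and reports duplicate table names as errors. The list above marks them with ⚠, so a variant could demote them to warnings.

```python
import re
from typing import Any

# Conservative assumption: unquoted identifier = letter/underscore, then letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_tables(tables: list[dict[str, Any]], defined_keys: set[str]) -> list[str]:
    """Return errors for the tables section; defined_keys are the names from `keys`."""
    errors: list[str] = []
    seen: set[str] = set()
    for i, entry in enumerate(tables, start=1):
        table = entry.get("table")
        if not table:
            errors.append(f"Table #{i} has no 'table' name")
            continue
        if table in seen:
            errors.append(f"Duplicate table name: {table}")
        seen.add(table)

        key_columns = entry.get("key_columns") or []
        if not key_columns:
            errors.append(f"Table '{table}' has no key_columns")
        for mapping in key_columns:
            column, key = mapping.get("column"), mapping.get("key")
            if not isinstance(column, str) or not IDENTIFIER.fullmatch(column):
                errors.append(f"Table '{table}': invalid column name {column!r}")
            if key not in defined_keys:
                errors.append(
                    f"Key '{key}' referenced in table '{table}' but not defined in keys section"
                )
    return errors
```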

**Canonical IDs Section**:
- ✓ Has a `name` (used as the canonical ID column name)
- ✓ `merge_by_keys` references existing keys
- ✓ `merge_iterations` is a positive integer (if provided)
- ⚠ Suggest an iteration count if none is specified
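
Below is a minimal sketch of the Canonical IDs checks; `check_canonical_ids` and `is_positive_int` are hypothetical helpers. The error strings mirror the examples under Common Validation Errors. Requiring a non-empty `merge_by_keys` is an assumption, based on the failure suggestion to "Add canonical_ids section with name and merge_by_keys". Booleans are rejected explicitly because YAML `true` parses to Python `True`, which is an `int` subclass.

```python
from typing import Any


def is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_canonical_ids(
    canonical_ids: list[dict[str, Any]], defined_keys: set[str]
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the canonical_ids section."""
    errors: list[str] = []
    warnings: list[str] = []
    for i, cid in enumerate(canonical_ids, start=1):
        name = cid.get("name")
        if not name:
            errors.append(f"Canonical ID #{i} has no name")
            continue
        merge_keys = cid.get("merge_by_keys") or []
        if not merge_keys:
            errors.append(f"Canonical ID '{name}' has no merge_by_keys")
        for key in merge_keys:
            if key not in defined_keys:
                errors.append(f"Merge key '{key}' not found in keys section")
        if "merge_iterations" not in cid:
            warnings.append(f"Canonical ID '{name}': merge_iterations not set (will be auto-calculated)")
        elif not is_positive_int(cid["merge_iterations"]):
            errors.append(f"merge_iterations must be a positive integer, got: {cid['merge_iterations']!r}")
    return errors, warnings
```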

**Master Tables Section** (if present):
- ✓ Each master table has a `name` and a `canonical_id`
- ✓ The referenced `canonical_id` exists
- ✓ Attributes have the proper structure
- ✓ Source tables referenced in attributes exist
- ✓ Priority values are positive integers
- ⚠ Check for attribute conflicts
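
The Master Tables checks could look roughly like this. `check_master_tables` is hypothetical. The field names (`canonical_id`, `attributes`, `source_columns`, `table`, `priority`) are taken from the example configuration, and the messages follow Common Validation Errors. This sketch does not attempt the ⚠ attribute-conflict check, because the document does not define what counts as a conflict.

```python
from typing import Any


def check_master_tables(
    master_tables: list[dict[str, Any]],
    canonical_names: set[str],
    table_names: set[str],
) -> list[str]:
    """Return errors for master_tables; the two sets come from canonical_ids and tables."""
    errors: list[str] = []
    for mt in master_tables:
        name = mt.get("name")
        if not name:
            errors.append("Master table entry has no name")
            continue
        cid = mt.get("canonical_id")
        if cid not in canonical_names:
            errors.append(f"Master table '{name}': canonical_id {cid!r} is not defined in canonical_ids")
        for attr in mt.get("attributes") or []:
            sources = attr.get("source_columns") or []
            if not attr.get("name") or not sources:
                errors.append(f"Master table '{name}': each attribute needs a name and source_columns")
                continue
            for src in sources:
                table = src.get("table")
                if table not in table_names:
                    errors.append(f"Master table source '{table}' not found in tables section")
                priority = src.get("priority")
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(priority, bool) or not isinstance(priority, int) or priority < 1:
                    errors.append(f"Priority must be a positive integer, got: {priority!r}")
    return errors
```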

### Step 4: Cross-Reference Validation

- ✓ All `merge_by_keys` exist in the `keys` section
- ✓ All `key_columns` reference defined keys
- ✓ All master table source tables exist in the `tables` section
- ✓ Canonical ID names don't conflict with existing columns
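
Most cross-references are covered by the per-section checks above. The canonical ID name conflict is the one that spans sections, and a rough sketch follows. `find_canonical_id_conflicts` is hypothetical. The main limitation is that `unify.yml` lists only key columns, so this checks against those alone. Catching conflicts with every column would require reading each table's real schema from the warehouse. The comparison is case-insensitive because unquoted identifiers resolve case-insensitively by default on both Databricks and Snowflake.

```python
from typing import Any


def find_canonical_id_conflicts(config: dict[str, Any]) -> list[str]:
    """Flag canonical ID names that match a key column mapped in `tables`.

    Only columns listed under key_columns are known from the YAML; a full check
    would need each table's real schema from the warehouse.
    """
    mapped = sorted(
        (entry["table"], col["column"])
        for entry in config.get("tables", [])
        for col in entry.get("key_columns", [])
    )
    errors: list[str] = []
    for cid in config.get("canonical_ids", []):
        for table, column in mapped:
            # Unquoted identifiers are case-insensitive by default on both platforms.
            if column.lower() == cid["name"].lower():
                errors.append(
                    f"Canonical ID '{cid['name']}' conflicts with column '{column}' in table '{table}'"
                )
    return errors
```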

### Step 5: Best Practices Check

Provide recommendations for:

- Key validation rules
- Iteration count optimization
- Master table attribute priorities
- Performance considerations
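
Recommendations are advisory, so they can be computed as plain hints that never block generation. In the sketch below, `recommend` is a hypothetical helper. The "looks like an email pattern" test (a key named `email` whose `valid_regexp` contains `@`) is a deliberately crude heuristic. The list of common null values comes from the Best Practice Checks in this document.

```python
from typing import Any

# Common null-like values named in the Best Practice Checks.
COMMON_NULL_TEXTS = ("", "N/A", "null")


def recommend(config: dict[str, Any]) -> list[str]:
    """Return non-blocking recommendations for a parsed unify.yml."""
    tips: list[str] = []
    for key in config.get("keys", []):
        name = key["name"]
        pattern = key.get("valid_regexp")
        if pattern is None:
            tips.append(f"Consider adding valid_regexp for {name}")
        elif "email" in name.lower() and "@" not in pattern:
            # Heuristic only: an email pattern would normally mention '@'.
            tips.append(f"valid_regexp for {name} does not look like an email pattern")
        texts = key.get("invalid_texts") or []
        missing = [t for t in COMMON_NULL_TEXTS if t not in texts]
        if missing:
            tips.append(f"Add {missing} to invalid_texts for {name}")
    for cid in config.get("canonical_ids", []):
        if "merge_iterations" not in cid:
            tips.append(f"Consider setting merge_iterations for {cid['name']} (otherwise auto-calculated)")
    if not config.get("master_tables"):
        tips.append("Consider defining master_tables for a unified customer view")
    return tips
```

Applied to the example configuration in this document, this sketch would also suggest adding `'null'` to `customer_id`'s `invalid_texts`. The example report below does not include that hint, so treat the rules here as one possible policy.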

### Step 6: Validation Report

Generate a comprehensive report with:

- ✅ Passed checks
- ⚠ Warnings (non-critical issues)
- ❌ Errors (must fix before generation)
- 💡 Recommendations for improvement
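
The four report categories map naturally onto a small container. Only errors make the configuration "not ready"; warnings and recommendations do not. The sketch below is hypothetical (`ValidationReport`, `render`, `_count`). Its output format loosely imitates the example report in this document and is not the command's real output.

```python
from dataclasses import dataclass, field


def _count(n: int, noun: str) -> str:
    """Return a count with a naively pluralized noun, e.g. '1 warning', '3 recommendations'."""
    return f"{n} {noun}" + ("" if n == 1 else "s")


@dataclass
class ValidationReport:
    passed: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)

    @property
    def ready(self) -> bool:
        # Only errors block SQL generation; warnings and recommendations do not.
        return not self.errors

    def render(self) -> str:
        lines = [f"✅ {msg}" for msg in self.passed]
        lines += [f"⚠ {msg}" for msg in self.warnings]
        lines += [f"❌ {msg}" for msg in self.errors]
        lines += [f"💡 {msg}" for msg in self.recommendations]
        lines.append("")
        lines.append(
            "Summary: "
            + ", ".join([
                _count(len(self.errors), "error"),
                _count(len(self.warnings), "warning"),
                _count(len(self.recommendations), "recommendation"),
            ])
        )
        lines.append(
            "✓ Configuration is ready for SQL generation!"
            if self.ready
            else "❌ Configuration has errors that must be fixed"
        )
        return "\n".join(lines)
```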

---

## Command Usage

### Basic Usage

```
/cdp-hybrid-idu:hybrid-unif-config-validate
```

I'll prompt you for:

- YAML file path

### Direct Usage

```
YAML file: /path/to/unify.yml
```

---

## Example Validation

### Input YAML

```yaml
name: customer_unification

keys:
  - name: email
    valid_regexp: ".*@.*"
    invalid_texts: ['', 'N/A', 'null']
  - name: customer_id
    invalid_texts: ['', 'N/A']

tables:
  - table: customer_profiles
    key_columns:
      - {column: email_std, key: email}
      - {column: customer_id, key: customer_id}
  - table: orders
    key_columns:
      - {column: email_address, key: email}

canonical_ids:
  - name: unified_id
    merge_by_keys: [email, customer_id]
    merge_iterations: 15

master_tables:
  - name: customer_master
    canonical_id: unified_id
    attributes:
      - name: best_email
        source_columns:
          - {table: customer_profiles, column: email_std, priority: 1}
          - {table: orders, column: email_address, priority: 2}
```

### Validation Report

```
✅ YAML VALIDATION SUCCESSFUL

File Structure:
  ✅ Valid YAML syntax
  ✅ All required sections present
  ✅ Proper indentation and formatting

Keys Section (2 keys):
  ✅ email: Valid regex pattern, invalid_texts defined
  ✅ customer_id: invalid_texts defined
  ⚠ Consider adding valid_regexp for customer_id for better validation

Tables Section (2 tables):
  ✅ customer_profiles: 2 key columns mapped
  ✅ orders: 1 key column mapped
  ✅ All referenced keys exist

Canonical IDs Section:
  ✅ Name: unified_id
  ✅ Merge keys: email, customer_id (both exist)
  ✅ Iterations: 15 (recommended range: 10-20)

Master Tables Section (1 master table):
  ✅ customer_master: References unified_id
  ✅ Attribute 'best_email': 2 sources with priorities
  ✅ All source tables exist

Cross-References:
  ✅ All merge_by_keys defined in keys section
  ✅ All key_columns reference existing keys
  ✅ All master table sources exist
  ✅ No canonical ID name conflicts

Recommendations:
  💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
  💡 Add more master table attributes for richer customer profiles
  💡 Consider array attributes (top_3_emails) for historical tracking

Summary:
  ✅ 0 errors
  ⚠ 1 warning
  💡 3 recommendations

✓ Configuration is ready for SQL generation!
```

---

## Validation Checks

### Required Checks (Must Pass)

- [ ] File exists and is readable
- [ ] Valid YAML syntax
- [ ] `name` field present
- [ ] `keys` section present with at least one key
- [ ] `tables` section present with at least one table
- [ ] `canonical_ids` section present
- [ ] All `merge_by_keys` exist in the `keys` section
- [ ] All `key_columns` reference defined keys
- [ ] No duplicate key names
- [ ] No duplicate table names

### Warning Checks (Recommended)

- [ ] Keys have validation rules (`valid_regexp` or `invalid_texts`)
- [ ] `merge_iterations` specified (otherwise auto-calculated)
- [ ] Master tables defined for a unified customer view
- [ ] Source tables have unique key combinations
- [ ] Attribute priorities are sequential
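
The "sequential priorities" warning needs a concrete definition. The sketch below assumes it means exactly `1..n` with no gaps or duplicates, which is an interpretation rather than a documented rule. `priorities_are_sequential` and `check_priority_sequences` are hypothetical. Attributes with non-integer priorities are skipped, because those are errors rather than warnings.

```python
from typing import Any


def priorities_are_sequential(priorities: list[int]) -> bool:
    """True when priorities are exactly 1..n in some order (no gaps, no duplicates)."""
    return sorted(priorities) == list(range(1, len(priorities) + 1))


def check_priority_sequences(master_tables: list[dict[str, Any]]) -> list[str]:
    """Return a warning for each attribute whose priorities are not 1..n."""
    warnings: list[str] = []
    for mt in master_tables:
        for attr in mt.get("attributes", []):
            prios = [src.get("priority") for src in attr.get("source_columns", [])]
            # Non-integer priorities are reported as errors elsewhere, so skip them here.
            if all(isinstance(p, int) for p in prios) and not priorities_are_sequential(prios):
                warnings.append(
                    f"{mt['name']}.{attr['name']}: priorities {sorted(prios)} are not sequential from 1"
                )
    return warnings
```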

### Best Practice Checks

- [ ] Email keys have an email regex pattern
- [ ] Phone keys have phone validation
- [ ] `invalid_texts` include common null values (`''`, `'N/A'`, `'null'`)
- [ ] Master tables use a time-based `order_by` for recency
- [ ] Array attributes for historical data (`top_3_emails`, etc.)

---

## Common Validation Errors

### Syntax Errors

**Error**: `Invalid YAML: mapping values are not allowed here`
**Solution**: Check indentation (use spaces, not tabs), make sure every colon in a key-value pair is followed by a space, and quote values that themselves contain `: `

**Error**: `Invalid YAML: could not find expected ':'`
**Solution**: Check for missing colons in key-value pairs

### Structure Errors

**Error**: `Missing required section: keys`
**Solution**: Add a `keys` section with at least one key definition

**Error**: `Empty tables section`
**Solution**: Add at least one table with `key_columns`

### Reference Errors

**Error**: `Key 'phone' referenced in table 'orders' but not defined in keys section`
**Solution**: Add a `phone` key to the `keys` section or remove the reference

**Error**: `Merge key 'phone_number' not found in keys section`
**Solution**: Add `phone_number` to the `keys` section or remove it from `merge_by_keys`

**Error**: `Master table source 'customer_360' not found in tables section`
**Solution**: Add `customer_360` to the `tables` section or use the correct table name

### Value Errors

**Error**: `merge_iterations must be a positive integer, got: 'auto'`
**Solution**: Either remove `merge_iterations` (so it is auto-calculated) or specify an integer (e.g., `15`)

**Error**: `Priority must be a positive integer, got: 'high'`
**Solution**: Use a numeric priority (`1` for highest, `2` for second, etc.)

---

## Validation Levels

### Strict Mode (Default)

- Fails on any structural errors
- Warns on missing best practices
- Recommends optimizations

### Lenient Mode

- Only fails on critical syntax errors
- Allows missing optional fields
- Minimal warnings
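
One way to implement the two modes is to tag every finding with a category and decide severity at the end. In the sketch below, `Category`, `Finding`, and `classify` are hypothetical. The lenient-mode mapping is an assumption based on the bullets above: structural problems are demoted to warnings, and optional or best-practice findings are dropped to keep warnings minimal.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SYNTAX = "syntax"                # YAML could not be parsed
    STRUCTURE = "structure"          # missing sections, broken references, bad values
    OPTIONAL = "optional"            # optional field or section not provided
    BEST_PRACTICE = "best_practice"  # e.g. key without valid_regexp


@dataclass
class Finding:
    category: Category
    message: str


def classify(findings: list[Finding], mode: str = "strict") -> tuple[list[str], list[str]]:
    """Split findings into (errors, warnings) for the given validation mode."""
    if mode not in ("strict", "lenient"):
        raise ValueError(f"unknown mode: {mode!r}")
    errors: list[str] = []
    warnings: list[str] = []
    for f in findings:
        if f.category is Category.SYNTAX:
            errors.append(f.message)  # fatal in both modes
        elif mode == "strict":
            if f.category is Category.STRUCTURE:
                errors.append(f.message)
            else:
                warnings.append(f.message)
        elif f.category is Category.STRUCTURE:
            # Assumption: lenient mode demotes structural problems to warnings
            # and drops optional/best-practice findings ("minimal warnings").
            warnings.append(f.message)
    return errors, warnings
```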

---

## Platform-Specific Validation

### Databricks-Specific

- ✓ Table names compatible with Unity Catalog
- ✓ Column names valid for Spark SQL
- ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)

### Snowflake-Specific

- ✓ Table names compatible with Snowflake
- ✓ Column names valid for Snowflake SQL
- ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
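
The reserved-keyword warning reduces to a case-insensitive set lookup over every table, column, and canonical ID name in the config. In the sketch below, `reserved_keyword_warnings` and `RESERVED_EXAMPLES` are hypothetical. The keyword sets contain only the examples named above and are not the platforms' full lists; a real check should load each platform's official reserved-keyword list.

```python
from typing import Any

# Illustrative subsets taken from the examples above, not the full keyword lists.
RESERVED_EXAMPLES = {
    "databricks": {"DATABASE", "TABLE"},
    "snowflake": {"ACCOUNT", "SCHEMA"},
}


def reserved_keyword_warnings(config: dict[str, Any], platform: str) -> list[str]:
    """Warn about table, column, or canonical ID names that match a reserved keyword."""
    reserved = RESERVED_EXAMPLES[platform]
    names: list[tuple[str, str]] = []
    for entry in config.get("tables", []):
        names.append(("table", entry["table"]))
        names += [("column", col["column"]) for col in entry.get("key_columns", [])]
    names += [("canonical ID", cid["name"]) for cid in config.get("canonical_ids", [])]
    # Compare upper-cased, since unquoted identifiers are case-insensitive.
    return [
        f"{kind} name '{name}' is a reserved keyword on {platform}"
        for kind, name in names
        if name.upper() in reserved
    ]
```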

---

## What Happens Next

### If Validation Passes

```
✅ Configuration validated successfully!

Ready for:
  • SQL generation (Databricks or Snowflake)
  • Direct execution after generation

Next steps:
  1. /cdp-hybrid-idu:hybrid-generate-databricks
  2. /cdp-hybrid-idu:hybrid-generate-snowflake
  3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
```

### If Validation Fails

```
❌ Configuration has errors that must be fixed

Errors (must fix):
  1. Missing required section: canonical_ids
  2. Undefined key 'phone' referenced in table 'orders'

Suggestions:
  • Add canonical_ids section with name and merge_by_keys
  • Add phone key to keys section or remove from orders

Would you like help fixing these issues? (y/n)
```

I can help you:

- Fix syntax errors
- Add missing sections
- Define proper validation rules
- Optimize the configuration

---

## Success Criteria

Validation passes when:

- ✅ YAML syntax is valid
- ✅ All required sections are present
- ✅ All references are resolved
- ✅ No structural errors are found
- ✅ The configuration is ready for SQL generation

---

**Ready to validate your YAML configuration?**

Provide your `unify.yml` file path to begin validation!