9.0 KiB
name, description
| name | description |
|---|---|
| hybrid-unif-config-validate | Validate YAML configuration for hybrid ID unification before SQL generation |
Validate Hybrid ID Unification YAML
Overview
Validate your unify.yml configuration file to ensure it's properly structured and ready for SQL generation. This command checks syntax, structure, validation rules, and provides recommendations for optimization.
What You Need
Required Input
- YAML Configuration File: Path to your
unify.yml
What I'll Do
Step 1: File Validation
- Verify file exists and is readable
- Check YAML syntax (proper indentation, quotes, etc.)
- Ensure all required sections are present
Step 2: Structure Validation
Check presence and structure of:
- name: Unification project name
- keys: Key definitions with validation rules
- tables: Source tables with key column mappings
- canonical_ids: Canonical ID configuration
- master_tables: Master table definitions (optional)
Step 3: Content Validation
Validate individual sections:
Keys Section:
- ✓ Each key has a unique name
- ✓
valid_regexpis a valid regex pattern (if provided) - ✓
invalid_textsis an array (if provided) - ⚠ Recommend validation rules if missing
Tables Section:
- ✓ Each table has a name
- ✓ Each table has at least one key_column
- ✓ All referenced keys exist in keys section
- ✓ Column names are valid identifiers
- ⚠ Check for duplicate table definitions
Canonical IDs Section:
- ✓ Has a name (will be canonical ID column name)
- ✓
merge_by_keysreferences existing keys - ✓
merge_iterationsis a positive integer (if provided) - ⚠ Suggest optimal iteration count if not specified
Master Tables Section (if present):
- ✓ Each master table has a name and canonical_id
- ✓ Referenced canonical_id exists
- ✓ Attributes have proper structure
- ✓ Source tables in attributes exist
- ✓ Priority values are valid
- ⚠ Check for attribute conflicts
Step 4: Cross-Reference Validation
- ✓ All merge_by_keys exist in keys section
- ✓ All key_columns reference defined keys
- ✓ All master table source tables exist in tables section
- ✓ Canonical ID names don't conflict with existing columns
Step 5: Best Practices Check
Provide recommendations for:
- Key validation rules
- Iteration count optimization
- Master table attribute priorities
- Performance considerations
Step 6: Validation Report
Generate comprehensive report with:
- ✅ Passed checks
- ⚠ Warnings (non-critical issues)
- ❌ Errors (must fix before generation)
- 💡 Recommendations for improvement
Command Usage
Basic Usage
/cdp-hybrid-idu:hybrid-unif-config-validate
I'll prompt you for:
- YAML file path
Direct Usage
YAML file: /path/to/unify.yml
Example Validation
Input YAML
name: customer_unification
keys:
- name: email
valid_regexp: ".*@.*"
invalid_texts: ['', 'N/A', 'null']
- name: customer_id
invalid_texts: ['', 'N/A']
tables:
- table: customer_profiles
key_columns:
- {column: email_std, key: email}
- {column: customer_id, key: customer_id}
- table: orders
key_columns:
- {column: email_address, key: email}
canonical_ids:
- name: unified_id
merge_by_keys: [email, customer_id]
merge_iterations: 15
master_tables:
- name: customer_master
canonical_id: unified_id
attributes:
- name: best_email
source_columns:
- {table: customer_profiles, column: email_std, priority: 1}
- {table: orders, column: email_address, priority: 2}
Validation Report
✅ YAML VALIDATION SUCCESSFUL
File Structure:
✅ Valid YAML syntax
✅ All required sections present
✅ Proper indentation and formatting
Keys Section (2 keys):
✅ email: Valid regex pattern, invalid_texts defined
✅ customer_id: Invalid_texts defined
⚠ Consider adding valid_regexp for customer_id for better validation
Tables Section (2 tables):
✅ customer_profiles: 2 key columns mapped
✅ orders: 1 key column mapped
✅ All referenced keys exist
Canonical IDs Section:
✅ Name: unified_id
✅ Merge keys: email, customer_id (both exist)
✅ Iterations: 15 (recommended range: 10-20)
Master Tables Section (1 master table):
✅ customer_master: References unified_id
✅ Attribute 'best_email': 2 sources with priorities
✅ All source tables exist
Cross-References:
✅ All merge_by_keys defined in keys section
✅ All key_columns reference existing keys
✅ All master table sources exist
✅ No canonical ID name conflicts
Recommendations:
💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
💡 Add more master table attributes for richer customer profiles
💡 Consider array attributes (top_3_emails) for historical tracking
Summary:
✅ 0 errors
⚠ 1 warning
💡 3 recommendations
✓ Configuration is ready for SQL generation!
Validation Checks
Required Checks (Must Pass)
- File exists and is readable
- Valid YAML syntax
namefield presentkeyssection present with at least one keytablessection present with at least one tablecanonical_idssection present- All merge_by_keys exist in keys section
- All key_columns reference defined keys
- No duplicate key names
- No duplicate table names
Warning Checks (Recommended)
- Keys have validation rules (valid_regexp or invalid_texts)
- Merge_iterations specified (otherwise auto-calculated)
- Master tables defined for unified customer view
- Source tables have unique key combinations
- Attribute priorities are sequential
Best Practice Checks
- Email keys have email regex pattern
- Phone keys have phone validation
- Invalid_texts include common null values ('', 'N/A', 'null')
- Master tables use time-based order_by for recency
- Array attributes for historical data (top_3_emails, etc.)
Common Validation Errors
Syntax Errors
Error: Invalid YAML: mapping values are not allowed here
Solution: Check indentation (use spaces, not tabs), ensure colons have space after them
Error: Invalid YAML: could not find expected ':'
Solution: Check for missing colons in key-value pairs
Structure Errors
Error: Missing required section: keys
Solution: Add keys section with at least one key definition
Error: Empty tables section
Solution: Add at least one table with key_columns
Reference Errors
Error: Key 'phone' referenced in table 'orders' but not defined in keys section
Solution: Add phone key to keys section or remove reference
Error: Merge key 'phone_number' not found in keys section
Solution: Add phone_number to keys section or remove from merge_by_keys
Error: Master table source 'customer_360' not found in tables section
Solution: Add customer_360 to tables section or use correct table name
Value Errors
Error: merge_iterations must be a positive integer, got: 'auto'
Solution: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)
Error: Priority must be a positive integer, got: 'high'
Solution: Use numeric priority (1 for highest, 2 for second, etc.)
Validation Levels
Strict Mode (Default)
- Fails on any structural errors
- Warns on missing best practices
- Recommends optimizations
Lenient Mode
- Only fails on critical syntax errors
- Allows missing optional fields
- Minimal warnings
Platform-Specific Validation
Databricks-Specific
- ✓ Table names compatible with Unity Catalog
- ✓ Column names valid for Spark SQL
- ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)
Snowflake-Specific
- ✓ Table names compatible with Snowflake
- ✓ Column names valid for Snowflake SQL
- ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
What Happens Next
If Validation Passes
✅ Configuration validated successfully!
Ready for:
• SQL generation (Databricks or Snowflake)
• Direct execution after generation
Next steps:
1. /cdp-hybrid-idu:hybrid-generate-databricks
2. /cdp-hybrid-idu:hybrid-generate-snowflake
3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
If Validation Fails
❌ Configuration has errors that must be fixed
Errors (must fix):
1. Missing required section: canonical_ids
2. Undefined key 'phone' referenced in table 'orders'
Suggestions:
• Add canonical_ids section with name and merge_by_keys
• Add phone key to keys section or remove from orders
Would you like help fixing these issues? (y/n)
I can help you:
- Fix syntax errors
- Add missing sections
- Define proper validation rules
- Optimize configuration
Success Criteria
Validation passes when:
- ✅ YAML syntax is valid
- ✅ All required sections present
- ✅ All references resolved
- ✅ No structural errors
- ✅ Ready for SQL generation
Ready to validate your YAML configuration?
Provide your unify.yml file path to begin validation!