Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 09:02:39 +08:00
commit 515e7bf6be
18 changed files with 5770 additions and 0 deletions

View File

@@ -0,0 +1,337 @@
---
name: hybrid-unif-config-validate
description: Validate YAML configuration for hybrid ID unification before SQL generation
---
# Validate Hybrid ID Unification YAML
## Overview
Validate your `unify.yml` configuration file to ensure it's properly structured and ready for SQL generation. This command checks syntax, structure, validation rules, and provides recommendations for optimization.
---
## What You Need
### Required Input
1. **YAML Configuration File**: Path to your `unify.yml`
---
## What I'll Do
### Step 1: File Validation
- Verify file exists and is readable
- Check YAML syntax (proper indentation, quotes, etc.)
- Ensure all required sections are present
### Step 2: Structure Validation
Check presence and structure of:
- **name**: Unification project name
- **keys**: Key definitions with validation rules
- **tables**: Source tables with key column mappings
- **canonical_ids**: Canonical ID configuration
- **master_tables**: Master table definitions (optional)
### Step 3: Content Validation
Validate individual sections:
**Keys Section**:
- ✓ Each key has a unique name
-`valid_regexp` is a valid regex pattern (if provided)
-`invalid_texts` is an array (if provided)
- ⚠ Recommend validation rules if missing
**Tables Section**:
- ✓ Each table has a name
- ✓ Each table has at least one key_column
- ✓ All referenced keys exist in keys section
- ✓ Column names are valid identifiers
- ⚠ Check for duplicate table definitions
**Canonical IDs Section**:
- ✓ Has a name (will be canonical ID column name)
-`merge_by_keys` references existing keys
-`merge_iterations` is a positive integer (if provided)
- ⚠ Suggest optimal iteration count if not specified
**Master Tables Section** (if present):
- ✓ Each master table has a name and canonical_id
- ✓ Referenced canonical_id exists
- ✓ Attributes have proper structure
- ✓ Source tables in attributes exist
- ✓ Priority values are valid
- ⚠ Check for attribute conflicts
### Step 4: Cross-Reference Validation
- ✓ All merge_by_keys exist in keys section
- ✓ All key_columns reference defined keys
- ✓ All master table source tables exist in tables section
- ✓ Canonical ID names don't conflict with existing columns
### Step 5: Best Practices Check
Provide recommendations for:
- Key validation rules
- Iteration count optimization
- Master table attribute priorities
- Performance considerations
### Step 6: Validation Report
Generate comprehensive report with:
- ✅ Passed checks
- ⚠ Warnings (non-critical issues)
- ❌ Errors (must fix before generation)
- 💡 Recommendations for improvement
---
## Command Usage
### Basic Usage
```
/cdp-hybrid-idu:hybrid-unif-config-validate
I'll prompt you for:
- YAML file path
```
### Direct Usage
```
YAML file: /path/to/unify.yml
```
---
## Example Validation
### Input YAML
```yaml
name: customer_unification
keys:
- name: email
valid_regexp: ".*@.*"
invalid_texts: ['', 'N/A', 'null']
- name: customer_id
invalid_texts: ['', 'N/A']
tables:
- table: customer_profiles
key_columns:
- {column: email_std, key: email}
- {column: customer_id, key: customer_id}
- table: orders
key_columns:
- {column: email_address, key: email}
canonical_ids:
- name: unified_id
merge_by_keys: [email, customer_id]
merge_iterations: 15
master_tables:
- name: customer_master
canonical_id: unified_id
attributes:
- name: best_email
source_columns:
- {table: customer_profiles, column: email_std, priority: 1}
- {table: orders, column: email_address, priority: 2}
```
### Validation Report
```
✅ YAML VALIDATION SUCCESSFUL
File Structure:
✅ Valid YAML syntax
✅ All required sections present
✅ Proper indentation and formatting
Keys Section (2 keys):
✅ email: Valid regex pattern, invalid_texts defined
✅ customer_id: Invalid_texts defined
⚠ Consider adding valid_regexp for customer_id for better validation
Tables Section (2 tables):
✅ customer_profiles: 2 key columns mapped
✅ orders: 1 key column mapped
✅ All referenced keys exist
Canonical IDs Section:
✅ Name: unified_id
✅ Merge keys: email, customer_id (both exist)
✅ Iterations: 15 (recommended range: 10-20)
Master Tables Section (1 master table):
✅ customer_master: References unified_id
✅ Attribute 'best_email': 2 sources with priorities
✅ All source tables exist
Cross-References:
✅ All merge_by_keys defined in keys section
✅ All key_columns reference existing keys
✅ All master table sources exist
✅ No canonical ID name conflicts
Recommendations:
💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
💡 Add more master table attributes for richer customer profiles
💡 Consider array attributes (top_3_emails) for historical tracking
Summary:
✅ 0 errors
⚠ 1 warning
💡 3 recommendations
✓ Configuration is ready for SQL generation!
```
---
## Validation Checks
### Required Checks (Must Pass)
- [ ] File exists and is readable
- [ ] Valid YAML syntax
- [ ] `name` field present
- [ ] `keys` section present with at least one key
- [ ] `tables` section present with at least one table
- [ ] `canonical_ids` section present
- [ ] All merge_by_keys exist in keys section
- [ ] All key_columns reference defined keys
- [ ] No duplicate key names
- [ ] No duplicate table names
### Warning Checks (Recommended)
- [ ] Keys have validation rules (valid_regexp or invalid_texts)
- [ ] Merge_iterations specified (otherwise auto-calculated)
- [ ] Master tables defined for unified customer view
- [ ] Source tables have unique key combinations
- [ ] Attribute priorities are sequential
### Best Practice Checks
- [ ] Email keys have email regex pattern
- [ ] Phone keys have phone validation
- [ ] Invalid_texts include common null values ('', 'N/A', 'null')
- [ ] Master tables use time-based order_by for recency
- [ ] Array attributes for historical data (top_3_emails, etc.)
---
## Common Validation Errors
### Syntax Errors
**Error**: `Invalid YAML: mapping values are not allowed here`
**Solution**: Check indentation (use spaces, not tabs), ensure colons have space after them
**Error**: `Invalid YAML: could not find expected ':'`
**Solution**: Check for missing colons in key-value pairs
### Structure Errors
**Error**: `Missing required section: keys`
**Solution**: Add keys section with at least one key definition
**Error**: `Empty tables section`
**Solution**: Add at least one table with key_columns
### Reference Errors
**Error**: `Key 'phone' referenced in table 'orders' but not defined in keys section`
**Solution**: Add phone key to keys section or remove reference
**Error**: `Merge key 'phone_number' not found in keys section`
**Solution**: Add phone_number to keys section or remove from merge_by_keys
**Error**: `Master table source 'customer_360' not found in tables section`
**Solution**: Add customer_360 to tables section or use correct table name
### Value Errors
**Error**: `merge_iterations must be a positive integer, got: 'auto'`
**Solution**: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)
**Error**: `Priority must be a positive integer, got: 'high'`
**Solution**: Use numeric priority (1 for highest, 2 for second, etc.)
---
## Validation Levels
### Strict Mode (Default)
- Fails on any structural errors
- Warns on missing best practices
- Recommends optimizations
### Lenient Mode
- Only fails on critical syntax errors
- Allows missing optional fields
- Minimal warnings
---
## Platform-Specific Validation
### Databricks-Specific
- ✓ Table names compatible with Unity Catalog
- ✓ Column names valid for Spark SQL
- ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)
### Snowflake-Specific
- ✓ Table names compatible with Snowflake
- ✓ Column names valid for Snowflake SQL
- ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
---
## What Happens Next
### If Validation Passes
```
✅ Configuration validated successfully!
Ready for:
• SQL generation (Databricks or Snowflake)
• Direct execution after generation
Next steps:
1. /cdp-hybrid-idu:hybrid-generate-databricks
2. /cdp-hybrid-idu:hybrid-generate-snowflake
3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
```
### If Validation Fails
```
❌ Configuration has errors that must be fixed
Errors (must fix):
1. Missing required section: canonical_ids
2. Undefined key 'phone' referenced in table 'orders'
Suggestions:
• Add canonical_ids section with name and merge_by_keys
• Add phone key to keys section or remove from orders
Would you like help fixing these issues? (y/n)
```
I can help you:
- Fix syntax errors
- Add missing sections
- Define proper validation rules
- Optimize configuration
---
## Success Criteria
Validation passes when:
- ✅ YAML syntax is valid
- ✅ All required sections present
- ✅ All references resolved
- ✅ No structural errors
- ✅ Ready for SQL generation
---
**Ready to validate your YAML configuration?**
Provide your `unify.yml` file path to begin validation!