gh-treasure-data-aps-claude…/commands/hybrid-unif-config-validate.md
2025-11-30 09:02:39 +08:00

name: hybrid-unif-config-validate
description: Validate YAML configuration for hybrid ID unification before SQL generation

Validate Hybrid ID Unification YAML

Overview

Validate your unify.yml configuration file to ensure it is properly structured and ready for SQL generation. This command checks YAML syntax, section structure, and validation rules, and it recommends optimizations.


What You Need

Required Input

  1. YAML Configuration File: Path to your unify.yml

What I'll Do

Step 1: File Validation

  • Verify file exists and is readable
  • Check YAML syntax (proper indentation, quotes, etc.)
  • Ensure all required sections are present
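
As a rough illustration of the file-level checks, the sketch below uses only the Python standard library. The function names `precheck_file` and `find_tab_indentation` are hypothetical and not part of this command. Full syntax checking would still require a YAML parser; this sketch only catches a missing or unreadable file and tab indentation, which YAML does not allow.

```python
import os
from pathlib import Path


def find_tab_indentation(text: str) -> list[str]:
    """Flag lines whose leading whitespace contains a tab (YAML forbids tab indentation)."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            errors.append(f"Line {number}: tab used for indentation")
    return errors


def precheck_file(path: str) -> list[str]:
    """Return error strings; an empty list means the file can be handed to a YAML parser."""
    p = Path(path)
    if not p.is_file():
        return [f"File not found: {path}"]
    if not os.access(p, os.R_OK):
        return [f"File is not readable: {path}"]
    return find_tab_indentation(p.read_text(encoding="utf-8"))
```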

Step 2: Structure Validation

Check presence and structure of:

  • name: Unification project name
  • keys: Key definitions with validation rules
  • tables: Source tables with key column mappings
  • canonical_ids: Canonical ID configuration
  • master_tables: Master table definitions (optional)
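
A minimal sketch of the structure check, assuming the YAML has already been parsed into a Python dict by a YAML library. `check_structure` is a hypothetical helper name, and the error strings simply mirror the ones listed under Structure Errors.

```python
REQUIRED_SECTIONS = ("name", "keys", "tables", "canonical_ids")
LIST_SECTIONS = ("keys", "tables", "canonical_ids", "master_tables")
NON_EMPTY_SECTIONS = ("keys", "tables")  # need at least one entry


def check_structure(config: dict) -> list[str]:
    """Check that required sections exist and that list sections have the right shape."""
    errors = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            errors.append(f"Missing required section: {section}")
    for section in LIST_SECTIONS:
        if section not in config:
            continue  # master_tables is optional; missing required ones are reported above
        value = config[section]
        if not isinstance(value, list):
            errors.append(f"Section '{section}' must be a list")
        elif not value and section in NON_EMPTY_SECTIONS:
            errors.append(f"Empty {section} section")
    return errors
```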

Step 3: Content Validation

Validate individual sections:

Keys Section:

  • ✓ Each key has a unique name
  • ✓ valid_regexp is a valid regex pattern (if provided)
  • ✓ invalid_texts is an array (if provided)
  • ⚠ Recommend validation rules if missing
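
The key checks above could look roughly like the following. `check_keys` is a hypothetical helper that takes the parsed `keys` list. Note one assumption: the pattern is compiled with Python's `re` module, whose dialect may differ from the regex engine of the target SQL platform, so a pattern that compiles here is not guaranteed to behave identically there.

```python
import re


def check_keys(keys: list[dict]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the keys section."""
    errors, warnings = [], []
    seen = set()
    for key in keys:
        name = key.get("name")
        if name in seen:
            errors.append(f"Duplicate key name: {name}")
        seen.add(name)

        pattern = key.get("valid_regexp")
        if pattern is not None:
            try:
                re.compile(pattern)
            except re.error as exc:
                errors.append(f"Key '{name}': invalid valid_regexp ({exc})")

        invalid_texts = key.get("invalid_texts")
        if invalid_texts is not None and not isinstance(invalid_texts, list):
            errors.append(f"Key '{name}': invalid_texts must be an array")

        # No rule at all is allowed, but worth a warning.
        if pattern is None and invalid_texts is None:
            warnings.append(f"Key '{name}': consider adding valid_regexp or invalid_texts")
    return errors, warnings
```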

Tables Section:

  • ✓ Each table has a name (the table field)
  • ✓ Each table has at least one key_column
  • ✓ All referenced keys exist in keys section
  • ✓ Column names are valid identifiers
  • ⚠ Check for duplicate table definitions
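
A sketch of the table checks, assuming each entry uses the `table` and `key_columns` fields shown in the example YAML. `check_tables` is a hypothetical helper, and the identifier pattern (letters, digits, underscores, not starting with a digit) is an assumption about what counts as a valid column name; the real rule depends on the target platform. Duplicates are reported as warnings here, following the ⚠ marker in the list above.

```python
import re

# Assumed identifier rule; adjust for the target SQL platform.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def check_tables(tables: list[dict], key_names: set[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the tables section."""
    errors, warnings = [], []
    seen = set()
    for entry in tables:
        table = entry.get("table")
        if not table:
            errors.append("Table entry is missing a name")
            continue
        if table in seen:
            warnings.append(f"Duplicate table definition: {table}")
        seen.add(table)

        key_columns = entry.get("key_columns") or []
        if not key_columns:
            errors.append(f"Table '{table}' has no key_columns")
        for mapping in key_columns:
            column, key = mapping.get("column"), mapping.get("key")
            if not column or not IDENTIFIER.match(column):
                errors.append(f"Table '{table}': invalid column name {column!r}")
            if key not in key_names:
                errors.append(
                    f"Key '{key}' referenced in table '{table}' but not defined in keys section"
                )
    return errors, warnings
```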

Canonical IDs Section:

  • ✓ Has a name (will be canonical ID column name)
  • ✓ merge_by_keys references existing keys
  • ✓ merge_iterations is a positive integer (if provided)
  • ⚠ Suggest optimal iteration count if not specified
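
The canonical ID checks might be sketched as follows. `check_canonical_ids` is a hypothetical helper; the error strings follow the ones shown under Reference Errors and Value Errors. Python treats `True` and `False` as integers, so booleans are rejected explicitly.

```python
def check_canonical_ids(canonical_ids: list[dict], key_names: set[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the canonical_ids section."""
    errors, warnings = [], []
    for cid in canonical_ids:
        name = cid.get("name")
        if not name:
            errors.append("Canonical ID is missing a name")
        for key in cid.get("merge_by_keys") or []:
            if key not in key_names:
                errors.append(f"Merge key '{key}' not found in keys section")

        if "merge_iterations" not in cid:
            warnings.append(f"Canonical ID '{name}': merge_iterations not set (auto-calculated)")
            continue
        iterations = cid["merge_iterations"]
        is_int = isinstance(iterations, int) and not isinstance(iterations, bool)
        if not is_int or iterations <= 0:
            errors.append(f"merge_iterations must be a positive integer, got: {iterations!r}")
    return errors, warnings
```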

Master Tables Section (if present):

  • ✓ Each master table has a name and canonical_id
  • ✓ Referenced canonical_id exists
  • ✓ Attributes have proper structure
  • ✓ Source tables in attributes exist
  • ✓ Priority values are valid
  • ⚠ Check for attribute conflicts
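
For master tables, a sketch along these lines covers the reference and priority checks. `check_master_tables` is a hypothetical helper and assumes the `attributes` / `source_columns` layout from the example YAML; it does not attempt the attribute-conflict check.

```python
def check_master_tables(
    master_tables: list[dict],
    table_names: set[str],
    canonical_id_names: set[str],
) -> list[str]:
    """Return errors for the optional master_tables section."""
    errors = []
    for master in master_tables:
        name = master.get("name")
        canonical_id = master.get("canonical_id")
        if canonical_id not in canonical_id_names:
            errors.append(f"Master table '{name}': unknown canonical_id {canonical_id!r}")
        for attribute in master.get("attributes") or []:
            for source in attribute.get("source_columns") or []:
                table = source.get("table")
                if table not in table_names:
                    errors.append(f"Master table source '{table}' not found in tables section")
                priority = source.get("priority")
                valid = (
                    isinstance(priority, int)
                    and not isinstance(priority, bool)
                    and priority > 0
                )
                if not valid:
                    errors.append(f"Priority must be a positive integer, got: {priority!r}")
    return errors
```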

Step 4: Cross-Reference Validation

  • ✓ All merge_by_keys exist in keys section
  • ✓ All key_columns reference defined keys
  • ✓ All master table source tables exist in tables section
  • ✓ Canonical ID names don't conflict with existing columns
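
The last cross-reference check can only be approximated from the YAML alone. The sketch below compares canonical ID names against the columns listed under `key_columns`; detecting conflicts with other columns in the source tables would require querying the warehouse, which is outside this sketch. `canonical_id_conflicts` is a hypothetical helper name.

```python
def canonical_id_conflicts(config: dict) -> list[str]:
    """Return canonical ID names that collide with a column mapped in the tables section."""
    columns = set()
    for table in config.get("tables") or []:
        for mapping in table.get("key_columns") or []:
            column = mapping.get("column")
            if column:
                columns.add(column)

    conflicts = []
    for cid in config.get("canonical_ids") or []:
        name = cid.get("name")
        if name and name in columns:
            conflicts.append(name)
    return sorted(conflicts)
```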

Step 5: Best Practices Check

Provide recommendations for:

  • Key validation rules
  • Iteration count optimization
  • Master table attribute priorities
  • Performance considerations

Step 6: Validation Report

Generate comprehensive report with:

  • ✅ Passed checks
  • ⚠ Warnings (non-critical issues)
  • ❌ Errors (must fix before generation)
  • 💡 Recommendations for improvement
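
One way to hold the four report categories is a small data class. This is only a sketch: `ValidationReport` and its members are hypothetical names, and the summary format is modeled on the example report in this document.

```python
from dataclasses import dataclass, field


def _count(n: int, noun: str) -> str:
    """Format a count with a naive English plural, e.g. '1 warning', '0 errors'."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


@dataclass
class ValidationReport:
    passed: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)

    @property
    def ready_for_generation(self) -> bool:
        # Only errors block SQL generation; warnings and recommendations do not.
        return not self.errors

    def summary(self) -> str:
        return ", ".join([
            _count(len(self.errors), "error"),
            _count(len(self.warnings), "warning"),
            _count(len(self.recommendations), "recommendation"),
        ])
```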

Command Usage

Basic Usage

/cdp-hybrid-idu:hybrid-unif-config-validate

I'll prompt you for:
- YAML file path

Direct Usage

YAML file: /path/to/unify.yml

Example Validation

Input YAML

name: customer_unification

keys:
  - name: email
    valid_regexp: ".*@.*"
    invalid_texts: ['', 'N/A', 'null']
  - name: customer_id
    invalid_texts: ['', 'N/A']

tables:
  - table: customer_profiles
    key_columns:
      - {column: email_std, key: email}
      - {column: customer_id, key: customer_id}
  - table: orders
    key_columns:
      - {column: email_address, key: email}

canonical_ids:
  - name: unified_id
    merge_by_keys: [email, customer_id]
    merge_iterations: 15

master_tables:
  - name: customer_master
    canonical_id: unified_id
    attributes:
      - name: best_email
        source_columns:
          - {table: customer_profiles, column: email_std, priority: 1}
          - {table: orders, column: email_address, priority: 2}

Validation Report

✅ YAML VALIDATION SUCCESSFUL

File Structure:
  ✅ Valid YAML syntax
  ✅ All required sections present
  ✅ Proper indentation and formatting

Keys Section (2 keys):
  ✅ email: Valid regex pattern, invalid_texts defined
  ✅ customer_id: invalid_texts defined
  ⚠ Consider adding valid_regexp for customer_id for better validation

Tables Section (2 tables):
  ✅ customer_profiles: 2 key columns mapped
  ✅ orders: 1 key column mapped
  ✅ All referenced keys exist

Canonical IDs Section:
  ✅ Name: unified_id
  ✅ Merge keys: email, customer_id (both exist)
  ✅ Iterations: 15 (recommended range: 10-20)

Master Tables Section (1 master table):
  ✅ customer_master: References unified_id
  ✅ Attribute 'best_email': 2 sources with priorities
  ✅ All source tables exist

Cross-References:
  ✅ All merge_by_keys defined in keys section
  ✅ All key_columns reference existing keys
  ✅ All master table sources exist
  ✅ No canonical ID name conflicts

Recommendations:
  💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
  💡 Add more master table attributes for richer customer profiles
  💡 Consider array attributes (top_3_emails) for historical tracking

Summary:
  ✅ 0 errors
  ⚠ 1 warning
  💡 3 recommendations

✓ Configuration is ready for SQL generation!

Validation Checks

Required Checks (Must Pass)

  • File exists and is readable
  • Valid YAML syntax
  • name field present
  • keys section present with at least one key
  • tables section present with at least one table
  • canonical_ids section present
  • All merge_by_keys exist in keys section
  • All key_columns reference defined keys
  • No duplicate key names
  • No duplicate table names

Warning Checks (Non-Critical)

  • Keys have validation rules (valid_regexp or invalid_texts)
  • merge_iterations specified (otherwise auto-calculated)
  • Master tables defined for unified customer view
  • Source tables have unique key combinations
  • Attribute priorities are sequential

Best Practice Checks

  • Email keys have email regex pattern
  • Phone keys have phone validation
  • invalid_texts includes common null values ('', 'N/A', 'null')
  • Master tables use time-based order_by for recency
  • Array attributes for historical data (top_3_emails, etc.)
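
As an example of a best-practice check, the following sketch reports which of the common null markers named above are absent from a key's invalid_texts. `missing_null_markers` is a hypothetical helper, and the marker list is just the three values this document mentions.

```python
COMMON_NULL_MARKERS = ("", "N/A", "null")


def missing_null_markers(key: dict) -> list[str]:
    """Return the common null markers not yet listed in the key's invalid_texts."""
    present = set(key.get("invalid_texts") or [])
    return [marker for marker in COMMON_NULL_MARKERS if marker not in present]
```

Applied to the customer_id key from the example YAML, which lists only '' and 'N/A', this would suggest adding 'null'.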

Common Validation Errors

Syntax Errors

Error: Invalid YAML: mapping values are not allowed here
Solution: Check indentation (use spaces, not tabs) and ensure each colon is followed by a space

Error: Invalid YAML: could not find expected ':'
Solution: Check for missing colons in key-value pairs

Structure Errors

Error: Missing required section: keys
Solution: Add keys section with at least one key definition

Error: Empty tables section
Solution: Add at least one table with key_columns

Reference Errors

Error: Key 'phone' referenced in table 'orders' but not defined in keys section
Solution: Add phone key to keys section or remove reference

Error: Merge key 'phone_number' not found in keys section
Solution: Add phone_number to keys section or remove from merge_by_keys

Error: Master table source 'customer_360' not found in tables section
Solution: Add customer_360 to tables section or use correct table name

Value Errors

Error: merge_iterations must be a positive integer, got: 'auto'
Solution: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)

Error: Priority must be a positive integer, got: 'high'
Solution: Use numeric priority (1 for highest, 2 for second, etc.)


Validation Levels

Strict Mode (Default)

  • Fails on any structural errors
  • Warns on missing best practices
  • Recommends optimizations

Lenient Mode

  • Only fails on critical syntax errors
  • Allows missing optional fields
  • Minimal warnings
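
The difference between the two modes can be pictured as a severity rule. This document does not say how a mode is selected, so the `Mode` enum, the issue kinds, and `is_failure` below are all hypothetical names for illustration.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    LENIENT = "lenient"


def is_failure(kind: str, mode: Mode) -> bool:
    """Decide whether an issue of the given kind fails validation.

    kind is one of "syntax", "structure", or "best_practice" (assumed labels).
    """
    if kind == "syntax":
        return True  # critical in both modes
    if kind == "structure":
        return mode is Mode.STRICT  # lenient mode lets structural issues through
    return False  # best-practice findings are warnings at most
```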

Platform-Specific Validation

Databricks-Specific

  • ✓ Table names compatible with Unity Catalog
  • ✓ Column names valid for Spark SQL
  • ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)

Snowflake-Specific

  • ✓ Table names compatible with Snowflake
  • ✓ Column names valid for Snowflake SQL
  • ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
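
A reserved-keyword check could be sketched like this. The keyword sets contain only the examples named above and are deliberately incomplete; a real check would load each platform's full reserved-word list. `reserved_identifiers` is a hypothetical helper name.

```python
# Incomplete on purpose: only the keywords this document names as examples.
RESERVED_KEYWORDS = {
    "databricks": {"DATABASE", "TABLE"},
    "snowflake": {"ACCOUNT", "SCHEMA"},
}


def reserved_identifiers(identifiers: list[str], platform: str) -> list[str]:
    """Return the identifiers that match a reserved keyword, ignoring case."""
    reserved = RESERVED_KEYWORDS[platform]
    return sorted(name for name in identifiers if name.upper() in reserved)
```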

What Happens Next

If Validation Passes

✅ Configuration validated successfully!

Ready for:
  • SQL generation (Databricks or Snowflake)
  • Direct execution after generation

Next steps:
  1. /cdp-hybrid-idu:hybrid-generate-databricks
  2. /cdp-hybrid-idu:hybrid-generate-snowflake
  3. /cdp-hybrid-idu:hybrid-setup (complete workflow)

If Validation Fails

❌ Configuration has errors that must be fixed

Errors (must fix):
  1. Missing required section: canonical_ids
  2. Undefined key 'phone' referenced in table 'orders'

Suggestions:
  • Add canonical_ids section with name and merge_by_keys
  • Add phone key to keys section or remove from orders

Would you like help fixing these issues? (y/n)

I can help you:

  • Fix syntax errors
  • Add missing sections
  • Define proper validation rules
  • Optimize configuration

Success Criteria

Validation passes when:

  • YAML syntax is valid
  • All required sections present
  • All references resolved
  • No structural errors
  • Ready for SQL generation

Ready to validate your YAML configuration?

Provide your unify.yml file path to begin validation!