Files
gh-ihistand-claude-plugins-…/skills/acuantia-dataform.md
2025-11-29 18:48:08 +08:00

20 KiB

name: acuantia-dataform description: Use when working on Acuantia's BigQuery Dataform pipeline (acuantia-gcp-dataform project) - adds Acuantia-specific patterns on top of dataform-engineering-fundamentals: ODS two-arg ref() syntax, looker_ filename prefix, Looker integration (looker_prod/looker_dev), acuantia dataset conventions, coordination with callrail_data_export/dialpad_data_integration/looker projects

Acuantia Dataform Engineering

REQUIRED PREREQUISITE

YOU MUST USE dataform-engineering-fundamentals SKILL FIRST.

This skill is a thin extension layer that adds Acuantia-specific patterns on top of the generic dataform-engineering-fundamentals skill.

Before using this skill:

  1. Read and follow dataform-engineering-fundamentals completely
  2. Apply ALL generic Dataform practices from that skill
  3. Then apply the Acuantia-specific patterns below

This skill does NOT repeat generic practices. If you're looking for:

  • TDD workflow → See dataform-engineering-fundamentals
  • Safety practices (--schema-suffix dev, --dry-run) → See dataform-engineering-fundamentals
  • ${ref()} enforcement → See dataform-engineering-fundamentals
  • Documentation standards → See dataform-engineering-fundamentals
  • Architecture patterns → See dataform-engineering-fundamentals

This skill ONLY adds: Acuantia-specific conventions that differ from or extend generic patterns.

When to Use

Use this skill when working on:

  • acuantia-gcp-dataform project
  • Tables that integrate with Acuantia's Looker instance
  • Transformations using Acuantia's ODS (Operational Data Store) architecture
  • Pipelines coordinating with callrail_data_export or dialpad_data_integration projects

Acuantia-Specific Patterns

1. ODS Architecture and Two-Argument ref()

Acuantia uses a special ODS (Operational Data Store) architecture that requires two-argument ref() syntax.

ODS Architecture:

  • acuantia.ods - Source of truth (master operational data)
  • acuantia.ods_dev - Development/staging dataset
  • acuantia.ods_prod - Production staging dataset

CRITICAL: Use two-argument ref() for ODS tables to avoid suffix duplication:

-- CORRECT: Two-argument ref() for ODS
FROM ${ref("ods", "sap_customers")}
FROM ${ref("ods", "magento_orders")}

-- WRONG: Single-argument causes ods_dev_dev with --schema-suffix dev
FROM ${ref("sap_customers")}  -- Creates ods_dev_dev ❌

Why: The ODS schema name itself gets the suffix applied. Two-argument ref() prevents ods_dev_dev when using --schema-suffix dev.

All other tables: Use single-argument ref() as per dataform-engineering-fundamentals.

2. Looker Table Naming Convention and Layering Architecture

CRITICAL WORKFLOW: All Looker tables follow a mandatory two-layer architecture:

  • definitions/intermediate/looker/ - Views with business logic (prefix: vw_looker*)
  • definitions/output/looker/ - Tables that materialize the views (prefix: looker*)

REQUIRED PATTERN:

  1. Create intermediate view in definitions/intermediate/looker/ with vw_looker* prefix
  2. Create output table in definitions/output/looker/ with looker* prefix (matching view name without vw_ prefix)
  3. Output table MUST be type "table" (never "view") and source from the intermediate view

File naming convention:

# CORRECT - Two-layer pattern
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx  (view with logic)
definitions/output/looker/looker_customer_metrics.sqlx           (table from view)

definitions/intermediate/looker/vw_looker_sales_summary.sqlx     (view with logic)
definitions/output/looker/looker_sales_summary.sqlx              (table from view)

# WRONG - Table directly in output without intermediate view
definitions/output/looker/looker_customer_metrics.sqlx           ❌
definitions/output/looker/customer_metrics.sqlx                  ❌ (also missing prefix)

# WRONG - Using "view" type in output folder
definitions/output/looker/looker_sales_summary.sqlx              ❌ (if type: "view")

Implementation pattern:

-- definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
config {
  type: "view",
  schema: "dataform",  // Intermediate views live in dataform schema
  description: "Customer metrics calculation logic",
  columns: {
    customer_id: "Unique customer identifier",
    lifetime_value: "Total customer lifetime value",
    order_count: "Total number of orders"
  }
}

SELECT
  customer_id,
  SUM(order_total) as lifetime_value,
  COUNT(DISTINCT order_id) as order_count
FROM ${ref("ods", "orders")}
GROUP BY customer_id

-- definitions/output/looker/looker_customer_metrics.sqlx
config {
  type: "table",  // MUST be "table", never "view"
  schema: "looker",  // Base name - suffix appended automatically
  tags: ["looker", "daily"],
  columns: {
    customer_id: "Unique customer identifier",
    lifetime_value: "Total customer lifetime value",
    order_count: "Total number of orders"
  }
}

SELECT * FROM ${ref("vw_looker_customer_metrics")}

Why this layering pattern:

  • Separates business logic (intermediate views) from materialization (output tables)
  • Intermediate views can be modified and tested without recreating Looker tables
  • Multiple output tables can reference the same intermediate logic
  • Clear naming convention makes the view-to-table relationship obvious
  • Makes Looker-specific tables immediately identifiable
  • Prevents naming conflicts with other intermediate tables
  • Aligns with Looker project conventions in looker/ directory

Schema configuration requirements:

  • Intermediate views: schema: "dataform" (base name, suffix is appended)
  • Output tables: schema: "looker" (base name, NOT "looker_prod" - suffix is appended automatically)
    • With --schema-suffix devlooker_dev
    • With --schema-suffix prodlooker_prod

Config name convention:

  • DO NOT include name: in config if it matches the filename
  • Dataform automatically uses the filename (without .sqlx) as the table/view name
  • Only specify name: when it must differ from the filename (e.g., for deduplication or dependency reasons)
-- WRONG: Redundant name specification
-- File: looker_customer_metrics.sqlx
config {
  type: "table",
  name: "looker_customer_metrics",  //  Unnecessary - matches filename
  schema: "looker"
}

-- CORRECT: Omit name when it matches filename
-- File: looker_customer_metrics.sqlx
config {
  type: "table",  //  Name defaults to filename
  schema: "looker"
}

-- CORRECT: Specify name only when different from filename
-- File: looker_customer_metrics_v2.sqlx
config {
  type: "table",
  name: "looker_customer_metrics",  //  Necessary - differs from filename
  schema: "looker"
}

3. Acuantia Dataset Conventions

Primary Datasets:

  • acuantia.ods - Master operational data store (source of truth)
  • acuantia.ods_dev / acuantia.ods_prod - ODS staging datasets
  • acuantia.looker_prod - Production Looker tables
  • acuantia.looker_dev - Development Looker tables
  • acuantia.reports_prod - Production reports and views used outside Looker (typically linked to Google Sheets)
  • acuantia.reports_dev - Development reports and views used outside Looker
  • acuantia.dataform - Operations and temp tables
  • acuantia.callrail_api - CallRail raw data
  • acuantia.dialpad_api - Dialpad raw data
  • acuantia.hubspot - HubSpot data (via Fivetran)
  • acuantia.magento_rotoplas_me_22_prod - Magento/Adobe Commerce data (via Fivetran)

Note on reports datasets: When creating tables/views in reports_* datasets, use case-insensitive collation for string columns to ensure compatibility with Google Sheets connections. Apply collation at table/column creation time using COLLATE 'und:ci'.

Schema suffix behavior:

# With --schema-suffix dev
looker_prod → looker_dev
reports_prod → reports_dev
ods → ods (no suffix, use two-arg ref)
dataform → dataform_dev

4. Looker Integration Context

Tables in definitions/output/looker/ feed Acuantia's Looker instance at https://looker.acuantia.com.

Optimization requirements:

  • Add partitioning/clustering for query performance (Looker users run ad-hoc queries)
  • Use descriptive column names (Looker dimension names derive from these)
  • Include comprehensive column descriptions (synced to Looker metadata via scripts)
  • Consider common Looker user query patterns (filters, aggregations)

Looker-specific config pattern:

config {
  type: "table",
  schema: "looker",  // Base name - DO NOT use "looker_prod"
  tags: ["looker", "daily"],
  bigquery: {
    partitionBy: "DATE(order_date)",
    clusterBy: ["customer_id", "region"]
  },
  columns: {
    customer_id: "Unique customer identifier from SAP (KUNNR field)",
    order_date: "Date when order was placed",
    region: "Geographic region for reporting (matches Looker region dimension)"
  }
}

Metadata sync: Use node scripts/updateLookerDescriptions.js in acuantia-gcp-dataform to sync column descriptions to Looker views.

5. Reports Dataset Integration

Tables in definitions/output/reports/ are designed for use outside the Looker subsystem, typically connected to Google Sheets for ad-hoc reporting and data sharing.

Use cases:

  • Google Sheets data connections for business users
  • Ad-hoc reports shared with stakeholders
  • Export tables for external systems
  • Simplified views for non-technical users

CRITICAL: Reports dataset schema configuration:

NEVER use schema: "reports_prod" or schema: "reports_dev" in SQLX files.

The --schema-suffix flag automatically appends _prod or _dev to the schema name. Using schema: "reports_prod" would incorrectly create reports_prod_prod.

CORRECT pattern:

config {
  type: "table",  // or "view" depending on use case
  schema: "reports",  // Will resolve to reports_dev or reports_prod
  tags: ["reports", "google_sheets"],
  description: "Customer summary for Google Sheets integration",
  columns: {
    customer_name: {
      description: "Customer full name",
      bigqueryPolicyTags: ["COLLATE 'und:ci'"]  // Case-insensitive for Google Sheets
    },
    email: {
      description: "Customer email address",
      bigqueryPolicyTags: ["COLLATE 'und:ci'"]
    }
  }
}

Schema resolution:

  • schema: "reports" + --schema-suffix devacuantia.reports_dev
  • schema: "reports" + --schema-suffix prodacuantia.reports_prod

Case-insensitive collation requirement: When creating tables in reports_* datasets, apply case-insensitive collation to string columns:

CREATE TABLE `acuantia.reports_prod.customer_export` (
  customer_name STRING COLLATE 'und:ci',
  email STRING COLLATE 'und:ci',
  order_count INT64
);

Why case-insensitivity matters:

  • Google Sheets connections often require case-insensitive string matching
  • Prevents duplicate entries from case variations (e.g., "John" vs "JOHN")
  • Simplifies filtering and lookups for business users

IMPORTANT: Always use base schema name without suffix:

  • Use schema: "reports" in SQLX files (NOT "reports_dev" or "reports_prod")
  • The --schema-suffix flag handles the environment:
    • --schema-suffix devreports_dev
    • --schema-suffix prodreports_prod

6. Source System Integration

Acuantia integrates data from multiple source systems. Use specific terminology when documenting columns:

SAP ERP:

columns: {
  customer_id: "SAP Customer Number (KUNNR) - unique identifier in SAP ERP",
  customer_name: "Customer name (NAME1 field) - legal business name",
  account_group: "Customer Account Group (KTOKD) - classification code"
}

Dialpad API (from dialpad_data_integration project):

-- Source declaration
-- definitions/sources/dialpad/calls.sqlx
config {
  type: "declaration",
  database: "acuantia",
  schema: "dialpad_api",
  name: "calls",
  description: "Dialpad call records with transcripts and sentiment analysis",
  columns: {
    call_id: "Unique call identifier from Dialpad API",
    transcript: "Full call transcript from Dialpad AI",
    sentiment: "Overall call sentiment: positive/negative/neutral/mixed"
  }
}

CallRail API (from callrail_data_export project):

-- Source declaration
-- definitions/sources/callrail/calls.sqlx
config {
  type: "declaration",
  database: "acuantia",
  schema: "callrail_api",
  name: "calls",
  columns: {
    call_id: "Unique CallRail call identifier",
    tracking_phone_number: "CallRail tracking number that received the call",
    attribution: "Nested attribution data (source, medium, campaign)"
  }
}

HubSpot CRM (via Fivetran):

config {
  type: "declaration",
  database: "acuantia",
  schema: "hubspot",
  name: "contact"
}

Magento/Adobe Commerce (via Fivetran):

config {
  type: "declaration",
  database: "acuantia",
  schema: "magento_rotoplas_me_22_prod",
  name: "sales_order"
}

7. Cross-Project Coordination

Acuantia's data platform spans multiple projects that work together:

callrail_data_export/          → acuantia.callrail_api.*
dialpad_data_integration/      → acuantia.dialpad_api.*
acuantia-gcp-dataform/         → Transform and model
looker/                        → Visualize and report

When modifying schemas:

  1. Source changes (callrail_data_export or dialpad_data_integration):

    • Update Python schema definitions
    • Test with small data exports
    • Deploy to production
  2. Dataform updates (acuantia-gcp-dataform):

    • Update source declarations in definitions/sources/
    • Modify transformations if needed
    • Update definitions/output/looker/ tables
    • Test with --schema-suffix dev
  3. Looker updates (looker project):

    • Update view definitions
    • Add new dimensions/measures
    • Test in development environment

Schema change protocol: Always coordinate changes across all three layers (raw → transformed → visualization).

8. Business Context

Acuantia serves four main product verticals:

  • Septic: Septic tank systems
  • General: General purpose containers
  • Industrial: Industrial containers and equipment
  • Chemical: Chemical storage containers

Key business entities:

  • TankHolding: Key business vertical with specialized recovery operations
  • Customer Journey: Multi-touch attribution across CallRail, HubSpot, and Magento
  • Voice of Customer (VoC): Dialpad call transcripts analyzed for sentiment and topics

When creating tables, consider how they support these business verticals and use cases.

Validation Queries (Acuantia-Specific)

# Check Looker dev tables
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`acuantia.looker_dev.looker_customer_metrics\`"

# Check reports dev tables
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`acuantia.reports_dev.customer_export\`"

# Check ODS tables
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`acuantia.ods.sap_customers\`"

# Verify CallRail data freshness
bq query --use_legacy_sql=false \
  "SELECT MAX(start_time) FROM \`acuantia.callrail_api.calls\`"

# Verify Dialpad data freshness
bq query --use_legacy_sql=false \
  "SELECT MAX(start_time) FROM \`acuantia.dialpad_api.calls\`"

Common Acuantia-Specific Mistakes

Mistake 1: Using single-argument ref() for ODS tables

-- WRONG: Creates ods_dev_dev with --schema-suffix dev
FROM ${ref("sap_customers")}

-- CORRECT: Two-argument ref() for ODS
FROM ${ref("ods", "sap_customers")}

Mistake 2: Skipping intermediate view layer for Looker tables

# WRONG: Table directly in output folder without intermediate view
definitions/output/looker/looker_customer_metrics.sqlx
config { type: "table", schema: "looker_prod" }
SELECT customer_id, SUM(order_total) FROM ...

# CORRECT: Two-layer pattern
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
config { type: "view", schema: "dataform" }
SELECT customer_id, SUM(order_total) FROM ...

definitions/output/looker/looker_customer_metrics.sqlx
config { type: "table", schema: "looker_prod" }
SELECT * FROM ${ref("vw_looker_customer_metrics")}

Mistake 3: Missing looker_ prefix or wrong view prefix

# WRONG - Missing prefix
definitions/output/looker/customer_metrics.sqlx

# WRONG - View without vw_ prefix
definitions/intermediate/looker/looker_customer_metrics.sqlx

# CORRECT - Proper naming
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
definitions/output/looker/looker_customer_metrics.sqlx

Mistake 4: Using "view" type in output/looker folder

-- WRONG: View in output folder (should be table)
-- definitions/output/looker/looker_sales.sqlx
config {
  type: "view",  //  Wrong - must be "table"
  schema: "looker_prod"
}

-- CORRECT: Table in output folder
-- definitions/output/looker/looker_sales.sqlx
config {
  type: "table",  //  Correct
  schema: "looker_prod"
}

Mistake 5: Using wrong schema for Looker tables or including suffix

-- WRONG: Wrong schema name
config {
  type: "table",
  schema: "reporting"  // Not Looker-specific
}

-- WRONG: Including suffix in schema name
config {
  type: "table",
  schema: "looker_prod"  // Will create looker_prod_prod 
}

-- CORRECT: Base schema name
config {
  type: "table",
  schema: "looker"  // Suffix appended automatically 
}

Mistake 6: Using suffixed schema names for reports datasets

-- WRONG: Including suffix in schema name
config {
  type: "table",
  schema: "reports_prod"  // Will create reports_prod_prod 
}

-- WRONG: Using dev suffix
config {
  type: "table",
  schema: "reports_dev"  // Will create reports_dev_dev 
}

-- CORRECT: Base schema name
config {
  type: "table",
  schema: "reports"  // Suffix appended automatically 
}

Mistake 7: Hardcoding acuantia.ods in queries

-- WRONG: Hardcoded path
FROM `acuantia.ods.sap_customers`

-- CORRECT: Use two-argument ref()
FROM ${ref("ods", "sap_customers")}

Red Flags - Acuantia-Specific

If you're thinking any of these thoughts, STOP:

  • "I'll use single-argument ref() for ODS tables (it's simpler)"
  • "I don't need an intermediate view, I'll put the logic directly in the output table"
  • "I'll create a view in definitions/output/looker/ instead of a table"
  • "I don't need the looker_ prefix for this Looker table"
  • "I don't need the vw_ prefix for this intermediate view"
  • "I'll use schema: 'looker_prod' instead of schema: 'looker'"
  • "I'll use schema: 'reports_prod' instead of schema: 'reports'"
  • "The schema suffix is just for dev, I should hardcode _prod for production"
  • "I'll add name: to the config even though it matches the filename"
  • "I'll skip coordinating with the looker/ project team"
  • "I don't need to check CallRail/Dialpad data freshness"

All of these mean: You're about to break Acuantia conventions. Follow the patterns above.

Quick Reference

Pattern Acuantia Convention
ODS tables Two-argument ref(): ${ref("ods", "table_name")}
Looker intermediate views definitions/intermediate/looker/vw_looker_*.sqlx (type: "view")
Looker output tables definitions/output/looker/looker_*.sqlx (type: "table", sources from vw_looker_*)
Looker schema Intermediate: schema: "dataform", Output: schema: "looker" (NOT "looker_prod")
Reports tables definitions/output/reports/*.sqlx (use schema: "reports" NOT "reports_prod")
Reports collation Use COLLATE 'und:ci' for string columns (case-insensitive for Google Sheets)
Schema suffixes NEVER hardcode _prod or _dev - use --schema-suffix flag instead
Config name Omit name: if it matches filename - only specify when different
CallRail data acuantia.callrail_api.*
Dialpad data acuantia.dialpad_api.*
HubSpot data acuantia.hubspot.*
Magento data acuantia.magento_rotoplas_me_22_prod.*
Dev testing --schema-suffix dev (see dataform-engineering-fundamentals)
Looker metadata Run node scripts/updateLookerDescriptions.js

Summary

This skill adds only Acuantia-specific patterns. For all generic Dataform practices:

  • TDD workflow
  • Safety practices
  • ${ref()} enforcement (general cases)
  • Documentation standards
  • Architecture patterns
  • Troubleshooting

→ See dataform-engineering-fundamentals skill.

The patterns in this skill (ODS two-arg ref, looker_ prefix, Acuantia datasets, cross-project coordination) are required additions to the generic foundation, not replacements.