20 KiB
Acuantia Dataform Engineering
REQUIRED PREREQUISITE
YOU MUST USE dataform-engineering-fundamentals SKILL FIRST.
This skill is a thin extension layer that adds Acuantia-specific patterns on top of the generic dataform-engineering-fundamentals skill.
Before using this skill:
- Read and follow
dataform-engineering-fundamentalscompletely - Apply ALL generic Dataform practices from that skill
- Then apply the Acuantia-specific patterns below
This skill does NOT repeat generic practices. If you're looking for:
- TDD workflow → See
dataform-engineering-fundamentals - Safety practices (--schema-suffix dev, --dry-run) → See
dataform-engineering-fundamentals - ${ref()} enforcement → See
dataform-engineering-fundamentals - Documentation standards → See
dataform-engineering-fundamentals - Architecture patterns → See
dataform-engineering-fundamentals
This skill ONLY adds: Acuantia-specific conventions that differ from or extend generic patterns.
When to Use
Use this skill when working on:
acuantia-gcp-dataformproject- Tables that integrate with Acuantia's Looker instance
- Transformations using Acuantia's ODS (Operational Data Store) architecture
- Pipelines coordinating with
callrail_data_exportordialpad_data_integrationprojects
Acuantia-Specific Patterns
1. ODS Architecture and Two-Argument ref()
Acuantia uses a special ODS (Operational Data Store) architecture that requires two-argument ref() syntax.
ODS Architecture:
acuantia.ods- Source of truth (master operational data)acuantia.ods_dev- Development/staging datasetacuantia.ods_prod- Production staging dataset
CRITICAL: Use two-argument ref() for ODS tables to avoid suffix duplication:
-- CORRECT: Two-argument ref() for ODS
FROM ${ref("ods", "sap_customers")}
FROM ${ref("ods", "magento_orders")}
-- WRONG: Single-argument causes ods_dev_dev with --schema-suffix dev
FROM ${ref("sap_customers")} -- Creates ods_dev_dev ❌
Why: The ODS schema name itself gets the suffix applied. Two-argument ref() prevents ods_dev_dev when using --schema-suffix dev.
All other tables: Use single-argument ref() as per dataform-engineering-fundamentals.
2. Looker Table Naming Convention and Layering Architecture
CRITICAL WORKFLOW: All Looker tables follow a mandatory two-layer architecture:
definitions/intermediate/looker/- Views with business logic (prefix:vw_looker*)definitions/output/looker/- Tables that materialize the views (prefix:looker*)
REQUIRED PATTERN:
- Create intermediate view in
definitions/intermediate/looker/withvw_looker*prefix - Create output table in
definitions/output/looker/withlooker*prefix (matching view name withoutvw_prefix) - Output table MUST be type "table" (never "view") and source from the intermediate view
File naming convention:
# CORRECT - Two-layer pattern
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx (view with logic)
definitions/output/looker/looker_customer_metrics.sqlx (table from view)
definitions/intermediate/looker/vw_looker_sales_summary.sqlx (view with logic)
definitions/output/looker/looker_sales_summary.sqlx (table from view)
# WRONG - Table directly in output without intermediate view
definitions/output/looker/looker_customer_metrics.sqlx ❌
definitions/output/looker/customer_metrics.sqlx ❌ (also missing prefix)
# WRONG - Using "view" type in output folder
definitions/output/looker/looker_sales_summary.sqlx ❌ (if type: "view")
Implementation pattern:
-- definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
config {
type: "view",
schema: "dataform", // Intermediate views live in dataform schema
description: "Customer metrics calculation logic",
columns: {
customer_id: "Unique customer identifier",
lifetime_value: "Total customer lifetime value",
order_count: "Total number of orders"
}
}
SELECT
customer_id,
SUM(order_total) as lifetime_value,
COUNT(DISTINCT order_id) as order_count
FROM ${ref("ods", "orders")}
GROUP BY customer_id
-- definitions/output/looker/looker_customer_metrics.sqlx
config {
type: "table", // MUST be "table", never "view"
schema: "looker", // Base name - suffix appended automatically
tags: ["looker", "daily"],
columns: {
customer_id: "Unique customer identifier",
lifetime_value: "Total customer lifetime value",
order_count: "Total number of orders"
}
}
SELECT * FROM ${ref("vw_looker_customer_metrics")}
Why this layering pattern:
- Separates business logic (intermediate views) from materialization (output tables)
- Intermediate views can be modified and tested without recreating Looker tables
- Multiple output tables can reference the same intermediate logic
- Clear naming convention makes the view-to-table relationship obvious
- Makes Looker-specific tables immediately identifiable
- Prevents naming conflicts with other intermediate tables
- Aligns with Looker project conventions in
looker/directory
Schema configuration requirements:
- Intermediate views:
schema: "dataform"(base name, suffix is appended) - Output tables:
schema: "looker"(base name, NOT"looker_prod"- suffix is appended automatically)- With
--schema-suffix dev→looker_dev - With
--schema-suffix prod→looker_prod
- With
Config name convention:
- DO NOT include
name:in config if it matches the filename - Dataform automatically uses the filename (without .sqlx) as the table/view name
- Only specify
name:when it must differ from the filename (e.g., for deduplication or dependency reasons)
-- WRONG: Redundant name specification
-- File: looker_customer_metrics.sqlx
config {
type: "table",
name: "looker_customer_metrics", // ❌ Unnecessary - matches filename
schema: "looker"
}
-- CORRECT: Omit name when it matches filename
-- File: looker_customer_metrics.sqlx
config {
type: "table", // ✅ Name defaults to filename
schema: "looker"
}
-- CORRECT: Specify name only when different from filename
-- File: looker_customer_metrics_v2.sqlx
config {
type: "table",
name: "looker_customer_metrics", // ✅ Necessary - differs from filename
schema: "looker"
}
3. Acuantia Dataset Conventions
Primary Datasets:
acuantia.ods- Master operational data store (source of truth)acuantia.ods_dev/acuantia.ods_prod- ODS staging datasetsacuantia.looker_prod- Production Looker tablesacuantia.looker_dev- Development Looker tablesacuantia.reports_prod- Production reports and views used outside Looker (typically linked to Google Sheets)acuantia.reports_dev- Development reports and views used outside Lookeracuantia.dataform- Operations and temp tablesacuantia.callrail_api- CallRail raw dataacuantia.dialpad_api- Dialpad raw dataacuantia.hubspot- HubSpot data (via Fivetran)acuantia.magento_rotoplas_me_22_prod- Magento/Adobe Commerce data (via Fivetran)
Note on reports datasets: When creating tables/views in reports_* datasets, use case-insensitive collation for string columns to ensure compatibility with Google Sheets connections. Apply collation at table/column creation time using COLLATE 'und:ci'.
Schema suffix behavior:
# With --schema-suffix dev
looker_prod → looker_dev
reports_prod → reports_dev
ods → ods (no suffix, use two-arg ref)
dataform → dataform_dev
4. Looker Integration Context
Tables in definitions/output/looker/ feed Acuantia's Looker instance at https://looker.acuantia.com.
Optimization requirements:
- Add partitioning/clustering for query performance (Looker users run ad-hoc queries)
- Use descriptive column names (Looker dimension names derive from these)
- Include comprehensive column descriptions (synced to Looker metadata via scripts)
- Consider common Looker user query patterns (filters, aggregations)
Looker-specific config pattern:
config {
type: "table",
schema: "looker", // Base name - DO NOT use "looker_prod"
tags: ["looker", "daily"],
bigquery: {
partitionBy: "DATE(order_date)",
clusterBy: ["customer_id", "region"]
},
columns: {
customer_id: "Unique customer identifier from SAP (KUNNR field)",
order_date: "Date when order was placed",
region: "Geographic region for reporting (matches Looker region dimension)"
}
}
Metadata sync: Use node scripts/updateLookerDescriptions.js in acuantia-gcp-dataform to sync column descriptions to Looker views.
5. Reports Dataset Integration
Tables in definitions/output/reports/ are designed for use outside the Looker subsystem, typically connected to Google Sheets for ad-hoc reporting and data sharing.
Use cases:
- Google Sheets data connections for business users
- Ad-hoc reports shared with stakeholders
- Export tables for external systems
- Simplified views for non-technical users
CRITICAL: Reports dataset schema configuration:
NEVER use schema: "reports_prod" or schema: "reports_dev" in SQLX files.
The --schema-suffix flag automatically appends _prod or _dev to the schema name. Using schema: "reports_prod" would incorrectly create reports_prod_prod.
CORRECT pattern:
config {
type: "table", // or "view" depending on use case
schema: "reports", // Will resolve to reports_dev or reports_prod
tags: ["reports", "google_sheets"],
description: "Customer summary for Google Sheets integration",
columns: {
customer_name: {
description: "Customer full name",
bigqueryPolicyTags: ["COLLATE 'und:ci'"] // Case-insensitive for Google Sheets
},
email: {
description: "Customer email address",
bigqueryPolicyTags: ["COLLATE 'und:ci'"]
}
}
}
Schema resolution:
schema: "reports"+--schema-suffix dev→acuantia.reports_devschema: "reports"+--schema-suffix prod→acuantia.reports_prod
Case-insensitive collation requirement:
When creating tables in reports_* datasets, apply case-insensitive collation to string columns:
CREATE TABLE `acuantia.reports_prod.customer_export` (
customer_name STRING COLLATE 'und:ci',
email STRING COLLATE 'und:ci',
order_count INT64
);
Why case-insensitivity matters:
- Google Sheets connections often require case-insensitive string matching
- Prevents duplicate entries from case variations (e.g., "John" vs "JOHN")
- Simplifies filtering and lookups for business users
IMPORTANT: Always use base schema name without suffix:
- Use
schema: "reports"in SQLX files (NOT"reports_dev"or"reports_prod") - The
--schema-suffixflag handles the environment:--schema-suffix dev→reports_dev--schema-suffix prod→reports_prod
6. Source System Integration
Acuantia integrates data from multiple source systems. Use specific terminology when documenting columns:
SAP ERP:
columns: {
customer_id: "SAP Customer Number (KUNNR) - unique identifier in SAP ERP",
customer_name: "Customer name (NAME1 field) - legal business name",
account_group: "Customer Account Group (KTOKD) - classification code"
}
Dialpad API (from dialpad_data_integration project):
-- Source declaration
-- definitions/sources/dialpad/calls.sqlx
config {
type: "declaration",
database: "acuantia",
schema: "dialpad_api",
name: "calls",
description: "Dialpad call records with transcripts and sentiment analysis",
columns: {
call_id: "Unique call identifier from Dialpad API",
transcript: "Full call transcript from Dialpad AI",
sentiment: "Overall call sentiment: positive/negative/neutral/mixed"
}
}
CallRail API (from callrail_data_export project):
-- Source declaration
-- definitions/sources/callrail/calls.sqlx
config {
type: "declaration",
database: "acuantia",
schema: "callrail_api",
name: "calls",
columns: {
call_id: "Unique CallRail call identifier",
tracking_phone_number: "CallRail tracking number that received the call",
attribution: "Nested attribution data (source, medium, campaign)"
}
}
HubSpot CRM (via Fivetran):
config {
type: "declaration",
database: "acuantia",
schema: "hubspot",
name: "contact"
}
Magento/Adobe Commerce (via Fivetran):
config {
type: "declaration",
database: "acuantia",
schema: "magento_rotoplas_me_22_prod",
name: "sales_order"
}
7. Cross-Project Coordination
Acuantia's data platform spans multiple projects that work together:
callrail_data_export/ → acuantia.callrail_api.*
dialpad_data_integration/ → acuantia.dialpad_api.*
acuantia-gcp-dataform/ → Transform and model
looker/ → Visualize and report
When modifying schemas:
-
Source changes (callrail_data_export or dialpad_data_integration):
- Update Python schema definitions
- Test with small data exports
- Deploy to production
-
Dataform updates (acuantia-gcp-dataform):
- Update source declarations in
definitions/sources/ - Modify transformations if needed
- Update
definitions/output/looker/tables - Test with
--schema-suffix dev
- Update source declarations in
-
Looker updates (looker project):
- Update view definitions
- Add new dimensions/measures
- Test in development environment
Schema change protocol: Always coordinate changes across all three layers (raw → transformed → visualization).
8. Business Context
Acuantia serves four main product verticals:
- Septic: Septic tank systems
- General: General purpose containers
- Industrial: Industrial containers and equipment
- Chemical: Chemical storage containers
Key business entities:
- TankHolding: Key business vertical with specialized recovery operations
- Customer Journey: Multi-touch attribution across CallRail, HubSpot, and Magento
- Voice of Customer (VoC): Dialpad call transcripts analyzed for sentiment and topics
When creating tables, consider how they support these business verticals and use cases.
Validation Queries (Acuantia-Specific)
# Check Looker dev tables
bq query --use_legacy_sql=false \
"SELECT COUNT(*) FROM \`acuantia.looker_dev.looker_customer_metrics\`"
# Check reports dev tables
bq query --use_legacy_sql=false \
"SELECT COUNT(*) FROM \`acuantia.reports_dev.customer_export\`"
# Check ODS tables
bq query --use_legacy_sql=false \
"SELECT COUNT(*) FROM \`acuantia.ods.sap_customers\`"
# Verify CallRail data freshness
bq query --use_legacy_sql=false \
"SELECT MAX(start_time) FROM \`acuantia.callrail_api.calls\`"
# Verify Dialpad data freshness
bq query --use_legacy_sql=false \
"SELECT MAX(start_time) FROM \`acuantia.dialpad_api.calls\`"
Common Acuantia-Specific Mistakes
Mistake 1: Using single-argument ref() for ODS tables
-- WRONG: Creates ods_dev_dev with --schema-suffix dev
FROM ${ref("sap_customers")}
-- CORRECT: Two-argument ref() for ODS
FROM ${ref("ods", "sap_customers")}
Mistake 2: Skipping intermediate view layer for Looker tables
# WRONG: Table directly in output folder without intermediate view
definitions/output/looker/looker_customer_metrics.sqlx
config { type: "table", schema: "looker_prod" }
SELECT customer_id, SUM(order_total) FROM ...
# CORRECT: Two-layer pattern
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
config { type: "view", schema: "dataform" }
SELECT customer_id, SUM(order_total) FROM ...
definitions/output/looker/looker_customer_metrics.sqlx
config { type: "table", schema: "looker_prod" }
SELECT * FROM ${ref("vw_looker_customer_metrics")}
Mistake 3: Missing looker_ prefix or wrong view prefix
# WRONG - Missing prefix
definitions/output/looker/customer_metrics.sqlx
# WRONG - View without vw_ prefix
definitions/intermediate/looker/looker_customer_metrics.sqlx
# CORRECT - Proper naming
definitions/intermediate/looker/vw_looker_customer_metrics.sqlx
definitions/output/looker/looker_customer_metrics.sqlx
Mistake 4: Using "view" type in output/looker folder
-- WRONG: View in output folder (should be table)
-- definitions/output/looker/looker_sales.sqlx
config {
type: "view", // ❌ Wrong - must be "table"
schema: "looker_prod"
}
-- CORRECT: Table in output folder
-- definitions/output/looker/looker_sales.sqlx
config {
type: "table", // ✅ Correct
schema: "looker_prod"
}
Mistake 5: Using wrong schema for Looker tables or including suffix
-- WRONG: Wrong schema name
config {
type: "table",
schema: "reporting" // Not Looker-specific
}
-- WRONG: Including suffix in schema name
config {
type: "table",
schema: "looker_prod" // Will create looker_prod_prod ❌
}
-- CORRECT: Base schema name
config {
type: "table",
schema: "looker" // Suffix appended automatically ✅
}
Mistake 6: Using suffixed schema names for reports datasets
-- WRONG: Including suffix in schema name
config {
type: "table",
schema: "reports_prod" // Will create reports_prod_prod ❌
}
-- WRONG: Using dev suffix
config {
type: "table",
schema: "reports_dev" // Will create reports_dev_dev ❌
}
-- CORRECT: Base schema name
config {
type: "table",
schema: "reports" // Suffix appended automatically ✅
}
Mistake 7: Hardcoding acuantia.ods in queries
-- WRONG: Hardcoded path
FROM `acuantia.ods.sap_customers`
-- CORRECT: Use two-argument ref()
FROM ${ref("ods", "sap_customers")}
Red Flags - Acuantia-Specific
If you're thinking any of these thoughts, STOP:
- "I'll use single-argument ref() for ODS tables (it's simpler)"
- "I don't need an intermediate view, I'll put the logic directly in the output table"
- "I'll create a view in definitions/output/looker/ instead of a table"
- "I don't need the looker_ prefix for this Looker table"
- "I don't need the vw_ prefix for this intermediate view"
- "I'll use
schema: 'looker_prod'instead ofschema: 'looker'" - "I'll use
schema: 'reports_prod'instead ofschema: 'reports'" - "The schema suffix is just for dev, I should hardcode _prod for production"
- "I'll add
name:to the config even though it matches the filename" - "I'll skip coordinating with the looker/ project team"
- "I don't need to check CallRail/Dialpad data freshness"
All of these mean: You're about to break Acuantia conventions. Follow the patterns above.
Quick Reference
| Pattern | Acuantia Convention |
|---|---|
| ODS tables | Two-argument ref(): ${ref("ods", "table_name")} |
| Looker intermediate views | definitions/intermediate/looker/vw_looker_*.sqlx (type: "view") |
| Looker output tables | definitions/output/looker/looker_*.sqlx (type: "table", sources from vw_looker_*) |
| Looker schema | Intermediate: schema: "dataform", Output: schema: "looker" (NOT "looker_prod") |
| Reports tables | definitions/output/reports/*.sqlx (use schema: "reports" NOT "reports_prod") |
| Reports collation | Use COLLATE 'und:ci' for string columns (case-insensitive for Google Sheets) |
| Schema suffixes | NEVER hardcode _prod or _dev - use --schema-suffix flag instead |
| Config name | Omit name: if it matches filename - only specify when different |
| CallRail data | acuantia.callrail_api.* |
| Dialpad data | acuantia.dialpad_api.* |
| HubSpot data | acuantia.hubspot.* |
| Magento data | acuantia.magento_rotoplas_me_22_prod.* |
| Dev testing | --schema-suffix dev (see dataform-engineering-fundamentals) |
| Looker metadata | Run node scripts/updateLookerDescriptions.js |
Summary
This skill adds only Acuantia-specific patterns. For all generic Dataform practices:
- TDD workflow
- Safety practices
- ${ref()} enforcement (general cases)
- Documentation standards
- Architecture patterns
- Troubleshooting
→ See dataform-engineering-fundamentals skill.
The patterns in this skill (ODS two-arg ref, looker_ prefix, Acuantia datasets, cross-project coordination) are required additions to the generic foundation, not replacements.