Initial commit
.claude-plugin/plugin.json (new file, 16 lines)
@@ -0,0 +1,16 @@
{
  "name": "dataform-toolkit",
  "description": "Comprehensive toolkit for BigQuery Dataform development with engineering best practices. Enforces TDD workflow, proper ref() usage, comprehensive documentation, and safe development patterns. Includes quick-access commands for common ETL workflows.",
  "version": "1.0.0",
  "author": {
    "name": "Ivan Histand",
    "email": "ihistand@rotoplas.com",
    "url": "https://github.com/ihistand"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
README.md (new file, 3 lines)
@@ -0,0 +1,3 @@
# dataform-toolkit

Comprehensive toolkit for BigQuery Dataform development with engineering best practices. Enforces TDD workflow, proper ref() usage, comprehensive documentation, and safe development patterns. Includes quick-access commands for common ETL workflows.
commands/dataform-deploy.md (new file, 21 lines)
@@ -0,0 +1,21 @@
---
description: Deploy tested Dataform table to production
---

You are deploying a Dataform table to production using best practices from the dataform-engineering-fundamentals skill.

**Workflow:**

1. Invoke the dataform-engineering-fundamentals skill
2. Ask the user which table they want to deploy
3. **Pre-deployment verification:**
   - Confirm the table has been tested in the dev environment
   - Verify all tests are passing
   - Check that documentation (columns: {}) is complete
4. **Deployment:**
   - Run `dataform run --dry-run --actions <table_name>` (production dry run)
   - If successful, run `dataform run --actions <table_name>` (production execution)
   - Verify the deployment with validation queries (see the example query below)
5. Report deployment results

**Critical**: Never deploy without testing in dev first. Wrong results delivered quickly are worse than correct results delivered with a small delay.
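For the validation step, a minimal post-deployment sanity check might look like the sketch below; the table path and key column (`customer_id`) are placeholders for the table you actually deployed:

```sql
-- Hypothetical post-deployment checks; replace the table path and key column.
SELECT
  COUNT(*) AS row_count,                         -- should not be 0
  COUNT(DISTINCT customer_id) AS distinct_keys,  -- should equal row_count if customer_id is the unique key
  COUNTIF(customer_id IS NULL) AS null_keys      -- should be 0
FROM `project.reporting.customer_metrics`;
```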
commands/dataform-etl.md (new file, 24 lines)
@@ -0,0 +1,24 @@
---
description: Launch ETL agent for BigQuery Dataform development
---

You are launching the ETL Dataform engineer agent to handle data transformation pipeline work.

**Purpose**: The ETL agent specializes in BigQuery Dataform projects, SQLX files, data quality, and pipeline development. Use this for:
- Complex Dataform transformations
- Pipeline troubleshooting
- Data quality implementations
- ELT workflow coordination

**Task**: Use the Task tool with `subagent_type="etl"` to launch the ETL agent. Pass the user's request as the prompt, including:
- What they need to accomplish
- Any relevant context about tables, datasets, or business logic
- Whether this is new development, modification, or troubleshooting

The ETL agent has access to the dataform-engineering-fundamentals skill and will follow best practices for BigQuery Dataform development.

**Example**:
```
User asks: "Help me create a customer metrics table"
You launch: Task tool with subagent_type="etl" and prompt="Create a customer metrics table in Dataform following TDD workflow. Ask user about required metrics and data sources."
```
commands/dataform-new-table.md (new file, 33 lines)
@@ -0,0 +1,33 @@
---
description: Create new Dataform table using TDD workflow
---

You are creating a new Dataform table following the Test-Driven Development (TDD) workflow from the dataform-engineering-fundamentals skill.

**Workflow:**

1. Invoke the dataform-engineering-fundamentals skill
2. Ask the user about the table requirements:
   - Table name and purpose
   - Expected columns and their descriptions
   - Data sources (for creating source declarations if needed)
   - Business logic and transformations
3. **RED Phase - Write tests first:**
   - Create an assertion file in `definitions/assertions/` (see the skeleton below)
   - Write data quality tests (duplicates, nulls, invalid values)
   - Run the tests - they should FAIL (the table doesn't exist yet)
4. **GREEN Phase - Write the minimal implementation:**
   - Create source declarations if needed
   - Create the table SQLX file with:
     - A proper config block with type and schema
     - Complete columns: {} documentation
     - The SQL transformation
   - Create the table: `dataform run --schema-suffix dev --actions <table_name>`
   - Run the tests - they should PASS
5. **REFACTOR Phase - Improve while keeping tests passing:**
   - Optimize query performance if needed
   - Add partitioning/clustering if appropriate
   - Improve documentation clarity
6. Report completion with file locations and next steps

**Critical**: Always write tests FIRST, then the implementation. Tests-after means you're checking what it does, not defining what it should do.
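As a starting point for the RED phase, an assertion skeleton might look like the sketch below; the table name (`new_table`) and `key_column` are placeholders for the user's actual table and key:

```sql
-- definitions/assertions/assert_new_table.sqlx (hypothetical names)
config {
  type: "assertion",
  description: "new_table must have unique, non-null keys"
}

-- Duplicate keys
SELECT 'duplicate_key' AS failed_test, key_column
FROM ${ref("new_table")}
GROUP BY key_column
HAVING COUNT(*) > 1

UNION ALL

-- Null keys
SELECT 'null_key' AS failed_test, key_column
FROM ${ref("new_table")}
WHERE key_column IS NULL
```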
commands/dataform-test.md (new file, 18 lines)
@@ -0,0 +1,18 @@
---
description: Test Dataform table in dev environment with safety checks
---

You are testing a Dataform table using best practices from the dataform-engineering-fundamentals skill.

**Workflow:**

1. Invoke the dataform-engineering-fundamentals skill
2. Ask the user which table they want to test
3. Follow the safety checklist:
   - Run `dataform compile` to check syntax
   - Run `dataform run --schema-suffix dev --dry-run --actions <table_name>` to validate the SQL
   - Run `dataform run --schema-suffix dev --actions <table_name>` to execute in dev
   - Run basic validation queries to verify the results (see the example below)
4. Report results and any issues found

**Critical**: Always use `--schema-suffix dev` for testing. Never test directly in production.
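A minimal validation query for the last checklist item might look like the sketch below; the dev table path, `key_field`, and `updated_at` are placeholders for the table just built:

```sql
-- Hypothetical dev-environment sanity check; replace the table path and columns.
SELECT
  COUNT(*) AS row_count,                    -- 0 rows usually means a bad join or filter
  COUNTIF(key_field IS NULL) AS null_keys,  -- should be 0
  MIN(updated_at) AS earliest_record,       -- spot-check the loaded date range
  MAX(updated_at) AS latest_record
FROM `project.schema_dev.my_table`;
```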
plugin.lock.json (new file, 61 lines)
@@ -0,0 +1,61 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:ihistand/claude-plugins:dataform-toolkit",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "e6a636048c3bff11a032604ed8e612d6079ef75e",
    "treeHash": "4e905a8c3fcb1cb4ea7483eacc456319fedc9a71f085b8911c35ce88c517ac48",
    "generatedAt": "2025-11-28T10:17:39.224711Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "dataform-toolkit",
    "description": "Comprehensive toolkit for BigQuery Dataform development with engineering best practices. Enforces TDD workflow, proper ref() usage, comprehensive documentation, and safe development patterns. Includes quick-access commands for common ETL workflows.",
    "version": "1.0.0"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "7f16ecada38db3846022fa1c6b3fdd21735f2636a1ab283b0f47b9962503f3db"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "c06b33b8c1ae866c964c1ab0bf108ba3ba595d481b820602eefac4dd1e0bcf1c"
      },
      {
        "path": "commands/dataform-test.md",
        "sha256": "3d63082fec4d207dd69025cfc255eaca7dc7914e7f292bdd7d567930f9cfac17"
      },
      {
        "path": "commands/dataform-new-table.md",
        "sha256": "44b5b4f4357e91d8f49b9e370df6658f13d53388f11ddb03a5158ceb2ca96eb9"
      },
      {
        "path": "commands/dataform-etl.md",
        "sha256": "d3bc64ebbff09f2f489da91806dfc0b8f892f472f93766bb1ea3b401a99df04a"
      },
      {
        "path": "commands/dataform-deploy.md",
        "sha256": "6496fa0a56fffaa0e9f0669efa1261d2d6117a85a281be522ec3599779609a25"
      },
      {
        "path": "skills/dataform-engineering-fundamentals.md",
        "sha256": "5d0e9bf09c2eed1c7ad7fd443ec04372670c07bbf4e6187fdc24d03fcdb35eb8"
      }
    ],
    "dirSha256": "4e905a8c3fcb1cb4ea7483eacc456319fedc9a71f085b8911c35ce88c517ac48"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
skills/dataform-engineering-fundamentals.md (new file, 661 lines)
@@ -0,0 +1,661 @@
---
name: dataform-engineering-fundamentals
description: Use when developing BigQuery Dataform transformations, SQLX files, source declarations, or troubleshooting pipelines - enforces the TDD workflow (tests first), ALWAYS using ${ref()} instead of hardcoded table paths, comprehensive columns: {} documentation, safety practices (--schema-suffix dev, --dry-run), proper ref() syntax, .sqlx for new declarations, no schema config in operations/tests, and architecture patterns that prevent technical debt under time pressure
---

# Dataform Engineering Fundamentals

## Overview

**Core principle**: Safety practices and proper architecture are NEVER optional in Dataform development, regardless of time pressure or business urgency.

**REQUIRED FOUNDATION:** This skill builds upon superpowers:test-driven-development. All TDD principles from that skill apply to Dataform development. This skill adapts TDD specifically for BigQuery Dataform SQLX files.

**Official Documentation:** For Dataform syntax, configuration options, and API reference, see https://cloud.google.com/dataform/docs

**Best Practices Guide:** For repository structure, naming conventions, and managing large workflows, see https://cloud.google.com/dataform/docs/best-practices-repositories

Time pressure does not justify skipping safety checks or creating technical debt. The time "saved" by shortcuts gets multiplied into hours of debugging, broken dependencies, and production issues.

## When to Use

Use this skill for ANY Dataform work:
- Creating new SQLX transformations
- Modifying existing tables
- Adding data sources
- Troubleshooting pipeline failures
- "Quick" reports or ad-hoc analysis

**Especially** use it when:
- Under time pressure or deadlines
- Stakeholders are waiting
- Working late at night (exhausted)
- Tempted to "just make it work"

**Related Skills**:
- **Before designing new features**: Use superpowers:brainstorming to refine requirements into clear designs before writing any code
- **When troubleshooting failures**: Use superpowers:systematic-debugging for structured problem-solving
- **When debugging complex issues**: Use superpowers:root-cause-tracing to trace errors back to their source
- **When writing documentation, commit messages, or any prose**: Use elements-of-style:writing-clearly-and-concisely to apply Strunk's timeless writing rules for clarity and conciseness

## Non-Negotiable Safety Practices

These are ALWAYS required. No exceptions for deadlines, urgency, or "simple" tasks:

### 1. Always Use `--schema-suffix dev` for Testing

```bash
# WRONG: Testing in production
dataform run --actions my_table

# CORRECT: Test in dev first
dataform run --schema-suffix dev --actions my_table
```

**Why**: Writes to `schema_dev.my_table` instead of `schema_prod.my_table` (or adds a `_dev` suffix, depending on your configuration). This allows safe testing without impacting production data or dashboards.

### 2. Always Use `--dry-run` Before Execution

```bash
# Check compilation
dataform compile

# Validate SQL without executing
dataform run --schema-suffix dev --dry-run --actions my_table

# Only then execute
dataform run --schema-suffix dev --actions my_table
```

**Why**: Catches SQL errors and missing dependencies, and estimates cost, before using BigQuery slots.

### 3. Source Declarations Before ref()

**WRONG**: Using tables without source declarations
```sql
-- This will break dependency tracking
FROM `project_id.external_schema.table_name`
```

**CORRECT**: Create the source declaration first
```sql
-- definitions/sources/external_system/table_name.sqlx
config {
  type: "declaration",
  database: "project_id",
  schema: "external_schema",
  name: "table_name"
}

-- Then reference it
FROM ${ref("table_name")}
```

### 4. ALWAYS Use ${ref()} - NEVER Hardcoded Table Paths

**WRONG**: Hardcoded table paths
```sql
-- NEVER do this
FROM `project.external_schema.table_name`
FROM `project.reporting_schema.customer_metrics`
SELECT * FROM project.source_schema.customers
```

**CORRECT**: Always use ${ref()}
```sql
-- Create a source declaration first, then reference it
FROM ${ref("table_name")}
FROM ${ref("customer_metrics")}
SELECT * FROM ${ref("customers")}
```

**Why**:
- Dataform tracks dependencies automatically with ref()
- Hardcoded paths break the dependency graph
- ref() is what lets --schema-suffix work correctly
- Refactoring is easier when references are managed

**Exception**: None. There is NO valid reason to use hardcoded table paths in SQLX files.

### 5. Proper ref() Syntax

**WRONG**: Including the schema in ref() unnecessarily
```sql
FROM ${ref("external_schema", "sales_order")}
```

**CORRECT**: Use a single argument when the source is declared
```sql
FROM ${ref("sales_order")}
```

**When to use two-argument ref()**:
- Source declarations that haven't been imported yet
- Special schema architectures where schema-suffix behavior needs explicit control
- Cross-database references in multi-project setups

**Why**:
- Single-argument ref() works for most tables
- Dataform resolves the full path from source declarations
- The two-argument form is only needed for special cases

### 6. Basic Validation Queries

Always verify your output:
```bash
# Check row counts
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`project.schema_dev.my_table\`"

# Check for nulls in critical fields
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`project.schema_dev.my_table\`
   WHERE key_field IS NULL"
```

**Why**: Catches silent failures (empty tables, null values, bad joins) immediately.

## Architecture Patterns (Not Optional)

Even for "quick" work, follow these patterns:

**Reference:** For detailed guidance on repository structure, naming conventions, and managing large workflows, see https://cloud.google.com/dataform/docs/best-practices-repositories

### Layered Structure

```
definitions/
  sources/        # External data declarations
  intermediate/   # Transformations and business logic
  output/         # Final tables for consumption
    reports/      # Reporting tables
    marts/        # Data marts for specific use cases
```

**Don't**: Create monolithic queries directly in the output layer

**Do**: Break logic into intermediate steps for reusability and testing (see the sketch below)
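To illustrate the layered pattern, here is a minimal sketch of an intermediate table that an output-layer table can then build on. The file path, table, and column names (`int_daily_orders`, `orders`, `order_total`) are hypothetical:

```sql
-- definitions/intermediate/int_daily_orders.sqlx (hypothetical example)
config {
  type: "table",
  columns: {
    order_date: "Calendar date of the order",
    customer_id: "Unique customer identifier",
    daily_order_total: "Sum of order amounts for the customer on that date, in USD"
  }
}

SELECT
  DATE(order_timestamp) AS order_date,
  customer_id,
  SUM(order_total) AS daily_order_total
FROM ${ref("orders")}
GROUP BY order_date, customer_id
```

An output-layer table then selects `FROM ${ref("int_daily_orders")}`, so each step stays small, reusable, and independently testable.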
### Incremental vs Full Refresh

```sql
config {
  type: "incremental",
  uniqueKey: "order_id",
  bigquery: {
    partitionBy: "DATE(order_date)",
    clusterBy: ["customer_id", "product_id"]
  }
}
```

**When to use incremental**: Tables that grow daily (events, transactions, logs)

**When to use full refresh**: Small dimension tables, aggregations with lookback windows
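To make the incremental pattern concrete, here is a minimal sketch of the query body of an incremental table using Dataform's built-in `when()`, `incremental()`, and `self()` helpers; the table and column names are placeholders:

```sql
-- Hypothetical body of an incremental SQLX file (config block as above).
SELECT
  order_id,
  customer_id,
  order_date,
  order_total
FROM ${ref("orders")}
-- On incremental runs, only scan rows newer than what is already in the table.
${when(incremental(),
  `WHERE order_date > (SELECT MAX(order_date) FROM ${self()})`
)}
```

On the first run (or a full refresh) the `WHERE` clause is omitted and the whole source is scanned.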
### Dataform Assertions

```sql
config {
  type: "table",
  assertions: {
    uniqueKey: ["call_id"],
    nonNull: ["customer_phone_number", "start_time"],
    rowConditions: ["duration >= 0"]
  }
}
```

**Why**: Catches data quality issues automatically during pipeline runs.

### Source Declarations: Prefer .sqlx Files

**STRONGLY PREFER**: .sqlx files for ALL new declarations
```sql
-- definitions/sources/external_system/table_name.sqlx
config {
  type: "declaration",
  database: "project_id",
  schema: "external_schema",
  name: "table_name",
  columns: {
    id: "Unique identifier for records",
    // ... more columns
  }
}
```

**ACCEPTABLE (legacy only)**: .js files for existing declarations
```javascript
// definitions/sources/legacy_declarations.js (existing file)
declare({
  database: "project_id",
  schema: "source_schema",
  name: "customers"
});
```

**Rule**: ALL NEW source declarations MUST be .sqlx files. Existing .js declarations can remain, but should be migrated to .sqlx when they are modified.

**Why**: .sqlx files support column documentation, are more maintainable, and integrate better with Dataform's dependency tracking.

### Schema Configuration Rules

**Operations**: Files in `definitions/operations/` should NOT include `schema:` config
```sql
-- CORRECT
config {
  type: "operations",
  tags: ["daily"]
}

-- WRONG
config {
  type: "operations",
  schema: "dataform", // DON'T specify schema
  tags: ["daily"]
}
```

**Tests/Assertions**: Files in `definitions/test/` should NOT include `schema:` config
```sql
-- CORRECT
config {
  type: "assertion",
  description: "Check for duplicates"
}

-- WRONG
config {
  type: "assertion",
  schema: "dataform_assertions", // DON'T specify schema
  description: "Check for duplicates"
}
```

**Why**: Operations live in the default `dataform` schema and assertions live in the `dataform_assertions` schema (both configured in `workflow_settings.yaml`). Specifying the schema explicitly can cause conflicts.

## Documentation Standards (Non-Negotiable)

All tables with `type: "table"` MUST include comprehensive `columns: {}` documentation in the config block.

**Writing Clear Documentation**: When writing column descriptions, commit messages, or any prose that humans will read, use elements-of-style:writing-clearly-and-concisely to ensure clarity and conciseness.

### columns: {} Requirement

**WRONG**: Table without column documentation
```sql
config {
  type: "table",
  schema: "reporting"
}

SELECT customer_id, total_revenue FROM ${ref("orders")}
```

**CORRECT**: Complete column documentation
```sql
config {
  type: "table",
  schema: "reporting",
  columns: {
    customer_id: "Unique customer identifier from source system",
    total_revenue: "Sum of all order amounts in USD, excluding refunds"
  }
}

SELECT customer_id, total_revenue FROM ${ref("orders")}
```

### Where to Get Column Descriptions

Column descriptions should be derived from:

1. **Source Declarations**: Copy descriptions from upstream source tables
2. **Third-party Documentation**: Use official API documentation for external systems (CRM, ERP, analytics platforms)
3. **Business Logic**: Document calculated fields, transformations, and business rules
4. **BI Tool Requirements**: Include context that dashboard builders and analysts need
5. **Dataform Documentation**: Reference https://cloud.google.com/dataform/docs for Dataform-specific configuration and built-in functions

**Example with ERP source documentation**:
```sql
config {
  type: "table",
  schema: "reporting",
  columns: {
    customer_id: "Unique customer identifier from ERP system",
    customer_name: "Customer legal business name",
    account_group: "Customer classification code for account management",
    credit_limit: "Maximum allowed credit in USD"
  }
}
```

### Source Declarations Should Include columns: {}

When applicable, source declarations should also document columns:

```sql
-- definitions/sources/external_api/events.sqlx
config {
  type: "declaration",
  database: "project_id",
  schema: "external_api",
  name: "events",
  description: "Event records from external API with enriched data",
  columns: {
    event_id: "Unique event identifier from API",
    user_id: "User identifier who triggered the event",
    event_type: "Type of event (click, view, purchase, etc.)",
    timestamp: "UTC timestamp when event occurred",
    properties: "JSON object containing event-specific properties"
  }
}
```

**Why document sources**: Downstream tables inherit and extend these descriptions, creating documentation consistency across the pipeline.
## Test-Driven Development (TDD) Workflow

**REQUIRED BACKGROUND:** You MUST understand and follow superpowers:test-driven-development

**BEFORE TDD:** When creating NEW features with unclear requirements, use superpowers:brainstorming FIRST to refine rough ideas into clear designs. Only start TDD once you have a clear understanding of what needs to be built.

When creating NEW features or tables in Dataform, apply the TDD cycle:

1. **RED**: Write tests first, watch them fail
2. **GREEN**: Write minimal code to make tests pass
3. **REFACTOR**: Clean up while keeping tests passing

The superpowers:test-driven-development skill provides the foundational TDD principles. This section adapts those principles specifically for Dataform tables and SQLX files.

### TDD for Dataform Tables

**WRONG: Implementation-first approach**
```
1. Write SQLX transformation
2. Test manually with bq query
3. "It works, ship it"
```

**CORRECT: Test-first approach**
```
1. Write data quality assertions first
2. Write unit tests for business logic
3. Run tests - they should FAIL (table doesn't exist yet)
4. Write SQLX transformation
5. Run tests - they should PASS
6. Refactor transformation if needed
```

### Example TDD Workflow

**Step 1: Write assertions first** (definitions/assertions/assert_customer_metrics.sqlx)
```sql
config {
  type: "assertion",
  description: "Customer metrics must have valid data"
}

-- This WILL fail initially (table doesn't exist)
SELECT 'Duplicate customer_id' AS test
FROM ${ref("customer_metrics")}
GROUP BY customer_id
HAVING COUNT(*) > 1

UNION ALL

SELECT 'Negative lifetime value' AS test
FROM ${ref("customer_metrics")}
WHERE lifetime_value < 0
```

**Step 2: Run tests - watch them fail**
```bash
dataform run --schema-suffix dev --run-tests --actions assert_customer_metrics
# ERROR: Table customer_metrics does not exist ✓ EXPECTED
```

**Step 3: Write minimal implementation** (definitions/output/reports/customer_metrics.sqlx)
```sql
config {
  type: "table",
  schema: "reporting",
  columns: {
    customer_id: "Unique customer identifier",
    lifetime_value: "Total revenue from customer in USD"
  }
}

SELECT
  customer_id,
  SUM(order_total) AS lifetime_value
FROM ${ref("orders")}
GROUP BY customer_id
```

**Step 4: Run tests - watch them pass**
```bash
dataform run --schema-suffix dev --actions customer_metrics
dataform run --schema-suffix dev --run-tests --actions assert_customer_metrics
# No rows returned ✓ TESTS PASS
```

### Why TDD Matters in Dataform

- **Catches bugs before production**: Tests fail when logic is wrong
- **Documents expected behavior**: Tests show what the table should do
- **Prevents regressions**: Future changes won't break existing logic
- **Faster debugging**: Test failures pinpoint exact issues
- **Confidence in refactoring**: Change code safely with test coverage

### TDD Red Flags

If you're thinking:
- "I'll write tests after the implementation" → **NO, write tests FIRST**
- "Tests are overkill for this simple table" → **NO, simple tables break too**
- "I'll test manually with bq query" → **NO, manual tests aren't repeatable**
- "Tests after achieve the same result" → **NO, tests-first catches design flaws**

**All of these mean**: You're skipping TDD. Write tests first, then implementation.

**See also**: The superpowers:test-driven-development skill contains additional TDD rationalizations and red flags that apply universally to all code, including Dataform SQLX files.
## Quick Reference

| Task | Command | Notes |
|------|---------|-------|
| Compile only | `dataform compile` | Check syntax; no BigQuery execution |
| Dry run | `dataform run --schema-suffix dev --dry-run --actions table_name` | Validate SQL, estimate cost |
| Test in dev | `dataform run --schema-suffix dev --actions table_name` | Safe execution in dev environment |
| Run with dependencies | `dataform run --schema-suffix dev --include-deps --actions table_name` | Run upstream dependencies first |
| Run by tag | `dataform run --schema-suffix dev --tags looker` | Run all tables with the tag |
| Production deploy | `dataform run --actions table_name` | Only after dev testing succeeds |

## Common Rationalizations (And Why They're Wrong)

| Excuse | Reality | Fix |
|--------|---------|-----|
| "Too urgent to test in dev" | Production failures waste MORE time than dev testing | 3 minutes of testing saves 60 minutes of debugging |
| "It's just a quick report" | "Quick" reports become permanent tables | Use proper architecture from the start |
| "Business is waiting" | Broken output wastes stakeholder time | Correct results delivered 10 minutes later > wrong results now |
| "Hardcoding the table path is faster than ${ref()}" | Breaks dependency tracking, creates a maintenance nightmare | Create a source declaration and use ${ref()} (30 seconds) |
| "I'll refactor it later" | Technical debt rarely gets fixed | Do it right the first time (saves time overall) |
| "Correctness over elegance" | Architecture = maintainability, not elegance | Proper structure IS correctness |
| "I'll add tests after" | After = never | Write tests FIRST (TDD), then implementation |
| "I'll add documentation after" | After = never | Add columns: {} in the config block immediately |
| "Working late, just need it working" | Exhaustion causes mistakes | Discipline matters MORE when tired |
| "Column docs are optional for internal tables" | All tables become external eventually | Document everything, always |
| "Tests after achieve the same result" | Tests-after = checking what it does; tests-first = defining what it should do | TDD catches design flaws early |

## Red Flags - STOP Immediately

If you're thinking any of these thoughts, STOP and follow the skill:

- "I'll skip `--schema-suffix dev` this once"
- "No time for `--dry-run`"
- "I'll just hardcode the table path instead of using ${ref()}"
- "I'll use backticks instead of ${ref()} (it's faster)"
- "I'll just create one file instead of intermediate layers"
- "Tests are optional for ad-hoc work"
- "I'll write tests after the implementation"
- "I'll add column documentation later"
- "This table doesn't need a columns: {} block"
- "I'll use a .js file for declarations (faster to write)"
- "I'll add schema: config to this operation/test file"
- "I'll fix the technical debt later"
- "This is different because [business reason]"

**All of these mean**: You're about to create problems. Follow the non-negotiable practices.
## Common Mistakes

### Mistake 1: Using tables before declaring sources

```sql
-- WRONG: Direct table reference
FROM `project.external_schema.contacts`

-- CORRECT: Declare the source first
FROM ${ref("contacts")}
```

**Fix**: Create a source declaration in `definitions/sources/` before using the table in queries.

### Mistake 2: Mixing ref() with manual schema qualification

```sql
-- WRONG: When a source declaration exists
FROM ${ref("dataset_name", "table_name")}

-- CORRECT
FROM ${ref("table_name")}
```

**Fix**: Use single-argument `ref()` when a source declaration exists. Dataform handles full path resolution.

### Mistake 3: Skipping dev testing under pressure

**Symptom**: "I'll deploy directly to production because it's urgent"

**Fix**: `--schema-suffix dev` takes 30 seconds longer than a production deploy. Production failures take hours to fix.

### Mistake 4: Creating monolithic transformations

**Symptom**: A 200-line SQLX file with 5 CTEs doing multiple transformations

**Fix**: Break it into intermediate tables. Each table should do ONE transformation clearly.

### Mistake 5: Missing columns: {} documentation

**Symptom**: Table config without column descriptions

**Fix**: Add a comprehensive `columns: {}` block to EVERY table with `type: "table"`. Get descriptions from source docs, upstream tables, or business logic.

### Mistake 6: Writing implementation before tests

**Symptom**: Creating the SQLX file, then adding assertions afterward (or never)

**Fix**: Follow the TDD cycle - write assertions first, watch them fail, write the implementation, watch the tests pass.

### Mistake 7: Using .js files for NEW source declarations

**Symptom**: Creating NEW `definitions/sources/sources.js` files with declare() functions

**Fix**: Create .sqlx files in `definitions/sources/[system]/[table].sqlx` with proper config blocks and column documentation. Existing .js files can remain until they need modification.

### Mistake 8: Hardcoded table paths instead of ${ref()}

**Symptom**: Using backtick-quoted table paths in queries
```sql
FROM `project.external_api.events`
SELECT * FROM project.source_schema.customers
```

**Fix**: ALWAYS use ${ref()} after creating source declarations
```sql
FROM ${ref("events")}
SELECT * FROM ${ref("customers")}
```

**Why critical**: Hardcoded paths break dependency tracking, prevent --schema-suffix from working, and make refactoring impossible.

### Mistake 9: Adding schema: config to operations or tests

**Symptom**: Operations or test files with explicit schema configuration
```sql
config {
  type: "operations",
  schema: "dataform", // Wrong!
}
```

**Fix**: Remove the schema: config - operations and tests use the default schemas from workflow_settings.yaml
## Time Pressure Protocol

When under extreme time pressure (board meeting in 2 hours, production down, stakeholder waiting):

1. ✅ **Still use dev testing** - 3 minutes saves 60 minutes of debugging
2. ✅ **Still use --dry-run** - Catches errors before wasting BigQuery slots
3. ✅ **Still create source declarations** - Broken dependencies waste MORE time
4. ✅ **Still add columns: {} documentation** - Takes 2 minutes, saves hours explaining to Looker users
5. ✅ **Still write tests first (TDD)** - 5 minutes writing assertions prevents production bugs
6. ✅ **Still do basic validation** - Wrong results are worse than delayed results
7. ⚠️ **Can skip**: Extensive documentation files, peer review, performance optimization
8. ⚠️ **Must document**: Tag the work as "technical_debt" and create a TODO with follow-up tasks (see the sketch below)
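For item 8, assuming your project uses Dataform tags the same way the Quick Reference does, a minimal sketch of flagging the shortcut directly in the config block looks like this; the tag name and TODO wording are illustrative:

```sql
-- Hypothetical config for a table shipped under time pressure.
config {
  type: "table",
  schema: "reporting",
  tags: ["technical_debt"],  // makes shortcuts visible and runnable via --tags technical_debt
  columns: {
    customer_id: "Unique customer identifier",
    total_revenue: "Sum of order amounts in USD"
  }
}
-- TODO(follow-up): add partitioning and move the heavy CTEs into an intermediate table.
```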
**The bottom line**: Safety practices save time. Skipping them wastes time. Even under pressure.

## Troubleshooting Dataform Errors

**RECOMMENDED APPROACH:** When encountering ANY bug, test failure, or unexpected behavior, use superpowers:systematic-debugging before attempting fixes. For errors deep in execution or cascading failures, use superpowers:root-cause-tracing to identify the original trigger.

**Official Reference:** For Dataform-specific errors, configuration issues, or syntax questions, consult https://cloud.google.com/dataform/docs

### "Table not found" errors

**Quick fixes:**
1. Check that a source declaration exists in `definitions/sources/`
2. Verify the ref() syntax (single argument if the source exists)
3. Check that the schema/database match in the source config
4. Run `dataform compile` to see the resolved SQL

**If the issue persists:** Use superpowers:systematic-debugging for a structured root-cause investigation.

### Dependency cycle errors

**Quick fixes:**
1. Use `${ref("table_name")}`, not direct table references
2. Check for circular dependencies (A → B → A)
3. Review the dependency graph in the Dataform UI

**If the issue persists:** Use superpowers:root-cause-tracing to trace the dependency chain back to the source of the cycle.

### Timeout errors

**Quick fixes:**
1. Add partitioning/clustering to the config
2. Use incremental updates instead of a full refresh
3. Break large transformations into smaller intermediate tables

**If the issue persists:** Use superpowers:systematic-debugging to investigate query performance systematically.

## Real-World Impact

**Scenario**: A "quick" report created without source declarations, skipping dev testing.

**Cost**:
- 10 minutes saved initially
- 2 hours debugging "table not found" errors in production
- 3 stakeholder escalations
- 1 broken morning dashboard
- Net loss: 110 minutes

**With proper practices**:
- 13 minutes total (3 extra for dev testing)
- Zero production issues
- Zero escalations
- Net gain: 97 minutes

**Takeaway**: Discipline is faster than shortcuts.