Initial commit
commands/ingest-add-klaviyo.md
@@ -0,0 +1,160 @@
---
name: ingest-add-klaviyo
description: Generate complete Klaviyo ingestion workflow with all data sources using exact templates
---

# Add Klaviyo Ingestion

## ⚠️ CRITICAL: This command generates ALL files at ONCE using exact templates

I'll create a complete Klaviyo ingestion setup based on proven templates from `docs/sources/klaviyo.md`.

---

## What I'll Generate

### MANDATORY: All files created in ONE response

I will generate ALL of the following files in a SINGLE response using multiple Write tool calls:

### Workflow Files
1. **`ingestion/klaviyo_ingest_inc.dig`** - Incremental ingestion workflow
2. **`ingestion/klaviyo_ingest_hist.dig`** - Historical backfill workflow

### Configuration Files (in `ingestion/config/`)
3. **`klaviyo_datasources.yml`** - Datasource definitions for all objects
4. **`klaviyo_profiles_load.yml`** - Profiles configuration
5. **`klaviyo_events_load.yml`** - Events configuration
6. **`klaviyo_campaigns_load.yml`** - Campaigns configuration
7. **`klaviyo_lists_load.yml`** - Lists configuration
8. **`klaviyo_email_templates_load.yml`** - Email templates configuration
9. **`klaviyo_metrics_load.yml`** - Metrics configuration

### Credentials Template
10. Updated `credentials_ingestion.json` with Klaviyo credentials section

**Total: 10 files created in ONE response**

---

## Prerequisites

Please provide the following information:

### Required
1. **Klaviyo API Key**: Your Klaviyo private API key (will be stored as a secret)
2. **TD Authentication ID**: Treasure Data authentication ID for the Klaviyo connector (e.g., `klaviyo_auth_default`)
3. **Default Start Date**: Initial historical load start date
   - Format: `YYYY-MM-DDTHH:mm:ss.000000`
   - Example: `2023-09-01T00:00:00.000000`
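
The start-date format above can be checked before the value is written into any config. The following is a minimal Python sketch, not part of the Klaviyo templates: the function name `is_valid_klaviyo_start_date` is hypothetical, and it assumes any six-digit fraction is acceptable as long as there is no trailing `Z`.

```python
from datetime import datetime

# Assumed layout: date and time joined by "T", six fractional digits, no "Z".
KLAVIYO_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def is_valid_klaviyo_start_date(value: str) -> bool:
    """Return True if value looks like YYYY-MM-DDTHH:mm:ss.000000."""
    try:
        parsed = datetime.strptime(value, KLAVIYO_TS_FORMAT)
    except ValueError:
        # Wrong layout, trailing "Z", or missing fraction.
        return False
    # %f accepts 1 to 6 digits when parsing, so round-trip the value
    # to insist on exactly six fractional digits.
    return parsed.strftime(KLAVIYO_TS_FORMAT) == value


print(is_valid_klaviyo_start_date("2023-09-01T00:00:00.000000"))
print(is_valid_klaviyo_start_date("2023-09-01T00:00:00.000Z"))
```

The first call accepts the documented example; the second rejects a OneTrust-style value with three decimals and a `Z` suffix.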

### Optional
4. **Target Database**: Default is `client_src` (leave blank to use default)

---

## Process I'll Follow

### Step 1: Read Klaviyo Documentation (MANDATORY)
I will READ these documentation files BEFORE generating ANY code:
- `docs/sources/klaviyo.md` - Klaviyo exact templates
- `docs/patterns/workflow-patterns.md` - Workflow patterns
- `docs/patterns/logging-patterns.md` - Logging templates
- `docs/patterns/timestamp-formats.md` - Klaviyo timestamp format (`.000000`)
- `docs/patterns/incremental-patterns.md` - Dual field names for campaigns

### Step 2: Generate ALL 10 Files in ONE Response
Using multiple Write tool calls in a SINGLE message:
- Write workflow files (2 files)
- Write datasource config (1 file)
- Write load configs (6 files)
- Write credentials template update (1 file)

### Step 3: Copy Exact Templates
Templates will be copied character-for-character from documentation:
- Klaviyo-specific timestamp format: `.000000` (6 decimals, NO Z)
- Dual incremental fields for campaigns: `updated_at` in table, `updated` in API
- Events with NO incremental field parameter
- Exact SQL logging blocks
- Exact error handling blocks

### Step 4: Verify Quality Gates
Before delivering, I will verify:
- ✅ All 10 files created
- ✅ Klaviyo timestamp format: `.000000` (6 decimals, NO Z)
- ✅ Campaigns dual field names correct
- ✅ Events config has NO incremental_field parameter
- ✅ All logging blocks present (start, success, error)
- ✅ All error handling blocks present
- ✅ Parallel processing with limit: 3
- ✅ COALESCE fallback to historical table

---

## Klaviyo-Specific Configuration

### Objects Included
1. **Profiles**: Customer profiles (incremental: `updated`)
2. **Events**: Customer events (NO incremental field)
3. **Campaigns**: Email campaigns (incremental: `updated_at` in table, `updated` in API)
4. **Lists**: Email lists (incremental: `updated`)
5. **Email Templates**: Campaign templates (incremental: `updated`)
6. **Metrics**: Event metrics (incremental: `updated`)

### Key Features
- **Dual incremental fields**: Campaigns use different field names in table vs API
- **Events handling**: No incremental parameter in config
- **Timestamp format**: `.000000` (6 decimals, NO Z suffix)
- **Parallel processing**: Limit of 3 for API rate limits
- **Fallback logic**: COALESCE from incremental → historical → default
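
The fallback order listed above (incremental → historical → default) behaves like "first non-null value wins". The Python sketch below is only an illustration of that ordering: the real logic lives in the SQL templates in `docs/sources/klaviyo.md`, which are not reproduced here, and the function and parameter names are hypothetical.

```python
from typing import Optional


def resolve_start_time(
    incremental_max: Optional[str],
    historical_max: Optional[str],
    default_start: str,
) -> str:
    """Mimic SQL COALESCE: return the first value that is not None."""
    for candidate in (incremental_max, historical_max):
        if candidate is not None:
            return candidate
    # Neither table has rows yet, so fall back to the configured default.
    return default_start


# First run: both tables are empty, so the default start date is used.
print(resolve_start_time(None, None, "2023-09-01T00:00:00.000000"))
```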

---

## After Generation

### 1. Upload Credentials
```bash
# Navigate to your ingestion directory
cd ingestion/
td wf secrets --project ingestion --set @credentials_ingestion.json
```

### 2. Test Syntax
```bash
td wf check klaviyo_ingest_inc.dig
td wf check klaviyo_ingest_hist.dig
```

### 3. Run Historical Backfill (First Time)
```bash
td wf run klaviyo_ingest_hist.dig
```

### 4. Run Incremental (Ongoing)
```bash
td wf run klaviyo_ingest_inc.dig
```

### 5. Monitor Ingestion
```sql
SELECT * FROM client_src.ingestion_log
WHERE source_name LIKE 'klaviyo%'
ORDER BY time DESC
LIMIT 20
```

---

## Production-Ready Guarantee

All generated code will:
- ✅ Follow exact Klaviyo templates from `docs/sources/klaviyo.md`
- ✅ Use correct timestamp format (`.000000`)
- ✅ Handle dual incremental fields correctly
- ✅ Include all 6 data objects
- ✅ Include comprehensive logging and error handling
- ✅ Work the first time without modifications

---

**Ready to proceed? Provide the required information (API key, TD auth ID, start date) and I'll generate all 10 files in ONE response using exact templates from documentation.**
commands/ingest-add-object.md
@@ -0,0 +1,186 @@
---
name: ingest-add-object
description: Add a new object/table to an existing data source workflow
---

# Add Object to Existing Source

## ⚠️ CRITICAL: This command generates ALL required files at ONCE

I'll help you add a new object/table to an existing ingestion source following exact templates from documentation.

---

## Required Information

Please provide:

### 1. Source Information
- **Existing Source Name**: Which source? (e.g., `klaviyo`, `shopify_v2`, `salesforce`)
- **New Object Name**: What object are you adding? (e.g., `orders`, `products`, `contacts`)

### 2. Object Details
- **Table Name**: Desired table name in Treasure Data (e.g., `shopify_orders`)
- **Incremental Field**: Field indicating record updates (e.g., `updated_at`, `modified_date`)
- **Default Start Date**: Initial load start date (format: `2023-09-01T00:00:00.000000`)

### 3. Ingestion Mode
- **Mode**: Which workflow?
  - `incremental` - Add to incremental workflow
  - `historical` - Add to historical workflow
  - `both` - Add to both workflows

---

## What I'll Do

### MANDATORY: All files created/updated in ONE response

I will generate/update ALL of the following in a SINGLE response:

#### Files to Create
1. **`ingestion/config/{source}_{object}_load.yml`** - New load configuration

#### Files to Update
2. **`ingestion/config/{source}_datasources.yml`** - Add object to datasource list
3. **`ingestion/{source}_ingest_inc.dig`** - Updated workflow (if incremental mode)
4. **`ingestion/{source}_ingest_hist.dig`** - Updated workflow (if historical mode)

**Total: 1 new file + 2-3 updated files in ONE response**

---

## Process I'll Follow

### Step 1: Read Source Documentation (MANDATORY)
I will READ the source-specific documentation BEFORE making ANY changes:
- `docs/sources/{source}.md` - Source-specific exact templates
- `docs/patterns/timestamp-formats.md` - Correct timestamp format
- `docs/patterns/incremental-patterns.md` - Incremental field handling

### Step 2: Read Existing Files
I will read the existing workflow and datasource config to understand the current structure.

### Step 3: Generate/Update ALL Files in ONE Response
Using multiple Write/Edit tool calls in a SINGLE message:
- Write new load config
- Edit datasource config to add new object
- Edit workflow(s) to include new object processing

### Step 4: Copy Exact Templates
I will use exact templates for the new object:
- Match existing object patterns exactly
- Use correct timestamp format for the source
- Use correct incremental field names
- Include all logging blocks
- Include all error handling

### Step 5: Verify Quality Gates
Before delivering, I will verify:
- ✅ New load config matches template for source
- ✅ Datasource config updated correctly
- ✅ Workflow(s) updated with proper structure
- ✅ Timestamp format correct for source
- ✅ Incremental field handling correct
- ✅ All logging blocks present
- ✅ All error handling blocks present

---

## Source-Specific Considerations

### Google BigQuery
- Use `inc_field` (NOT `incremental_field`)
- Use SQL Server timestamp format
- Add to appropriate datasource list (BigQuery or inc)

### Klaviyo
- Use `.000000` timestamp format (6 decimals, NO Z)
- Check if dual field names needed (like campaigns)
- Add to `inc_datasources` or `hist_datasources` list

### OneTrust
- Use `.000Z` timestamp format (3 decimals, WITH Z)
- Consider monthly batch processing for historical
- Add to appropriate datasource list

### Shopify v2
- Use ISO 8601 timestamp format
- Historical uses `created_at`, incremental uses `updated_at`
- Add to appropriate datasource list
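
To make the difference between the Klaviyo and OneTrust suffixes concrete, here is a small Python sketch. It covers only the two formats spelled out above and assumes the date and time are joined by `T`; the helper names are hypothetical, and the authoritative formats remain those in `docs/patterns/timestamp-formats.md`.

```python
from datetime import datetime, timezone


def format_klaviyo(ts: datetime) -> str:
    """Six fractional digits and no trailing Z."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.%f")


def format_onetrust(ts: datetime) -> str:
    """Three fractional digits followed by a trailing Z."""
    millis = ts.microsecond // 1000  # truncate microseconds to milliseconds
    return ts.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


start = datetime(2023, 9, 1, 0, 0, 0, tzinfo=timezone.utc)
print(format_klaviyo(start))   # 2023-09-01T00:00:00.000000
print(format_onetrust(start))  # 2023-09-01T00:00:00.000Z
```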

---

## Example Output

For adding `orders` object to `shopify_v2`:

### Files Created/Updated:
1. ✅ Created: `ingestion/config/shopify_v2_orders_load.yml`
2. ✅ Updated: `ingestion/config/shopify_v2_datasources.yml` (added orders to inc_datasources)
3. ✅ Updated: `ingestion/shopify_v2_ingest_inc.dig` (workflow already handles new datasource)

### Verification Complete:
- ✅ Load config uses ISO 8601 timestamp format
- ✅ Incremental field set to `updated_at`
- ✅ Datasource config updated with orders entry
- ✅ Workflow will automatically process new object
- ✅ All logging blocks present
- ✅ Error handling present

---

## After Generation

### 1. Upload Credentials (if new credentials needed)
```bash
cd ingestion
td wf secrets --project ingestion --set @credentials_ingestion.json
```

### 2. Test Syntax
```bash
td wf check {source}_ingest_inc.dig
# or
td wf check {source}_ingest_hist.dig
```

### 3. Run Workflow to Ingest New Object
```bash
td wf run {source}_ingest_inc.dig
# or
td wf run {source}_ingest_hist.dig
```

### 4. Monitor Ingestion
```sql
SELECT * FROM client_src.ingestion_log
WHERE source_name = '{source}'
  AND table_name = '{source}_{object}'
ORDER BY time DESC
LIMIT 10
```

### 5. Verify Data
```sql
SELECT COUNT(*) as row_count,
       MIN(time) as first_record,
       MAX(time) as last_record
FROM client_src.{source}_{object}
```

---

## Production-Ready Guarantee

All generated/updated code will:
- ✅ Match existing patterns exactly
- ✅ Use correct timestamp format for source
- ✅ Include all required logging
- ✅ Include all error handling
- ✅ Work seamlessly with existing workflow
- ✅ Be production-ready immediately

---

**Ready to proceed? Provide the required information (source name, object name, table name, incremental field, start date, mode) and I'll generate/update all required files in ONE response using exact templates from documentation.**
commands/ingest-new.md
@@ -0,0 +1,289 @@
---
name: ingest-new
description: Create a complete ingestion workflow for a new data source
---

# STOP - READ THIS FIRST

You are about to create a CDP ingestion workflow. You MUST collect configuration parameters interactively using the `AskUserQuestion` tool.

DO NOT ask all questions at once. DO NOT use markdown lists. DO NOT explain what you're going to do.

EXECUTE the AskUserQuestion tool calls below IN ORDER.

---

## EXECUTION SEQUENCE - FOLLOW EXACTLY

### ACTION 1: Ask Data Source Question

USE the AskUserQuestion tool RIGHT NOW to ask question 1.

DO NOT PROCEED until you execute this tool call:

```
AskUserQuestion with:
- Question: "What data source are you ingesting from?"
- Header: "Data Source"
- Options:
  * Klaviyo (API-based connector)
  * Shopify (E-commerce platform)
  * Salesforce (CRM system)
  * Custom API (REST-based)
```

STOP. EXECUTE THIS TOOL NOW. DO NOT READ FURTHER UNTIL COMPLETE.

---

### ACTION 2: Ask Ingestion Mode Question

**CHECKPOINT**: Did you get an answer to question 1? If NO, go back to ACTION 1.

NOW ask question 2 using AskUserQuestion tool:

```
AskUserQuestion with:
- Question: "What ingestion mode do you need?"
- Header: "Mode"
- Options:
  * Both (historical + incremental) - Recommended for complete setup
  * Incremental only - Ongoing sync only
  * Historical only - One-time backfill
```

STOP. EXECUTE THIS TOOL NOW. DO NOT READ FURTHER UNTIL COMPLETE.

---

### ACTION 3: Ask Tables/Objects

**CHECKPOINT**: Did you get an answer to question 2? If NO, go back to ACTION 2.

This is a free-text question. Tell the user:

"Please provide the table or object names to ingest (comma-separated)."

Example: `orders, customers, products`

WAIT for user response. DO NOT PROCEED.

---

### ACTION 4: Ask Incremental Field (CONDITIONAL)

**CHECKPOINT**:
- Did user select "Incremental only" OR "Both" in question 2?
  - YES → Ask this question
  - NO → Skip to ACTION 6

NOW ask question 4 using AskUserQuestion tool:

```
AskUserQuestion with:
- Question: "What field tracks record updates?"
- Header: "Incremental Field"
- Options:
  * updated_at (Timestamp field)
  * modified_date (Date field)
  * last_modified_time (Datetime field)
```

STOP. EXECUTE THIS TOOL NOW. DO NOT READ FURTHER UNTIL COMPLETE.

---

### ACTION 5: Ask Start Date (CONDITIONAL)

**CHECKPOINT**:
- Did user select "Incremental only" OR "Both" in question 2?
  - YES → Ask this question
  - NO → Skip to ACTION 6

This is a free-text question. Tell the user:

"What is the initial load start date?"

Format: `YYYY-MM-DDTHH:mm:ss.000000`
Example: `2024-01-01T00:00:00.000000`

WAIT for user response. DO NOT PROCEED.

---

### ACTION 6: Ask Target Database

**CHECKPOINT**: Did you complete all previous questions? If NO, go back.

NOW ask question 6 using AskUserQuestion tool:

```
AskUserQuestion with:
- Question: "Which target database should data be loaded into?"
- Header: "Target DB"
- Options:
  * mck_src (Standard client database)
  * Custom database (Specify custom name)
```

STOP. EXECUTE THIS TOOL NOW. DO NOT READ FURTHER UNTIL COMPLETE.

---

### ACTION 7: Ask Authentication

**CHECKPOINT**: Did you get an answer to question 6? If NO, go back to ACTION 6.

NOW ask question 7 using AskUserQuestion tool:

```
AskUserQuestion with:
- Question: "What type of authentication is required?"
- Header: "Auth Type"
- Options:
  * API Key (Single key authentication)
  * OAuth Token (Token-based auth)
  * Service Account JSON (Google/Cloud auth)
  * Username & Password (Basic auth)
```

STOP. EXECUTE THIS TOOL NOW. DO NOT READ FURTHER UNTIL COMPLETE.

---

### ACTION 8: Show Configuration Summary

**CHECKPOINT**: Have you collected ALL required parameters? If NO, go back and complete missing questions.

NOW display the configuration summary:

```
📋 Configuration Summary:

Source: {source_name}
Connector Type: {connector_type}
Ingestion Mode: {mode}
Tables/Objects: {objects}
Target Database: {target_database}
[If applicable] Incremental Field: {field}
[If applicable] Start Date: {date}
Authentication: {auth_type}
```

ASK: "Does this configuration look correct? Type 'yes' to proceed with generation."

WAIT for user confirmation. DO NOT PROCEED until user types "yes".

---

## WORKFLOW GENERATION (ONLY AFTER USER CONFIRMS)

**CHECKPOINT**: Did user confirm with "yes"? If NO, STOP and wait.

### Step 1: Read Documentation Templates

Read these files in this EXACT order:

1. `docs/sources/template-new-source.md`
2. `docs/patterns/workflow-patterns.md`
3. `docs/patterns/logging-patterns.md`
4. `docs/patterns/timestamp-formats.md`
5. `docs/patterns/incremental-patterns.md`

Check if source-specific template exists:
- `docs/sources/{source-name}.md` (e.g., `docs/sources/klaviyo.md`)

### Step 2: Generate Files (ALL IN ONE RESPONSE)

Use multiple Write tool calls in a SINGLE message to create:

#### For "Incremental only" mode:
1. `ingestion/{source}_ingest_inc.dig`
2. `ingestion/config/{source}_datasources.yml`
3. `ingestion/config/{source}_{object1}_load.yml`
4. `ingestion/config/{source}_{object2}_load.yml` (if multiple objects)

#### For "Historical only" mode:
1. `ingestion/{source}_ingest_hist.dig`
2. `ingestion/config/{source}_datasources.yml`
3. `ingestion/config/{source}_{object}_load.yml`

#### For "Both" mode:
1. `ingestion/{source}_ingest_hist.dig`
2. `ingestion/{source}_ingest_inc.dig`
3. `ingestion/config/{source}_datasources.yml`
4. `ingestion/config/{source}_{object}_load.yml` (per object)
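
The three per-mode lists above follow one naming rule, which can be sketched in Python as follows. This is an illustration only: `planned_files` is a hypothetical helper, and the lowercase mode strings (`incremental`, `historical`, `both`) are an assumed encoding of the answers to question 2.

```python
from typing import List


def planned_files(source: str, objects: List[str], mode: str) -> List[str]:
    """List the files to generate for a source, in the order shown above."""
    if mode not in ("incremental", "historical", "both"):
        raise ValueError(f"unknown ingestion mode: {mode!r}")

    files = []
    if mode in ("historical", "both"):
        files.append(f"ingestion/{source}_ingest_hist.dig")
    if mode in ("incremental", "both"):
        files.append(f"ingestion/{source}_ingest_inc.dig")
    # One shared datasource file, then one load config per object.
    files.append(f"ingestion/config/{source}_datasources.yml")
    files.extend(f"ingestion/config/{source}_{obj}_load.yml" for obj in objects)
    return files


for path in planned_files("klaviyo", ["profiles", "events"], "both"):
    print(path)
```

Listing the expected paths up front gives a simple checklist to compare against the Write tool calls actually issued.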

### Step 3: Template Rules (MANDATORY)

- Copy templates EXACTLY character-for-character
- NO simplification, NO optimization, NO improvements
- ONLY replace placeholders: `{source_name}`, `{object_name}`, `{database}`, `{connector_type}`
- Keep ALL logging blocks
- Keep ALL error handling blocks
- Keep ALL timestamp functions
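
One way to honor the "ONLY replace placeholders" rule is plain string replacement restricted to the four named placeholders, so that other brace syntax such as `${secret:...}` passes through untouched. The sketch below is an assumption about how this could be done, not the documented procedure; `fill_template` and the sample template text are hypothetical.

```python
# The only placeholders the rules above allow to be replaced.
PLACEHOLDERS = ("source_name", "object_name", "database", "connector_type")


def fill_template(template: str, values: dict) -> str:
    """Replace the four allowed placeholders and nothing else."""
    missing = [
        name
        for name in PLACEHOLDERS
        if "{" + name + "}" in template and name not in values
    ]
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")

    result = template
    for name in PLACEHOLDERS:
        if name in values:
            # str.replace leaves every other character, including
            # ${secret:...} references, exactly as written.
            result = result.replace("{" + name + "}", values[name])
    return result


sample = "database: {database}\napi_key: ${secret:klaviyo_api_key}\n"
print(fill_template(sample, {"database": "client_src"}))
```

Using `str.replace` rather than `str.format` is deliberate here: `str.format` would treat `{secret:klaviyo_api_key}` as a field and fail.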

### Step 4: Quality Verification

Before showing output to user, verify:
- ✅ All template sections present
- ✅ All logging blocks included (start, success, error)
- ✅ All error handling blocks present
- ✅ Timestamp format matches connector type
- ✅ Incremental field handling correct
- ✅ No deviations from template

---

## Post-Generation Instructions

After successfully creating all files, show the user:

### Next Steps:

1. **Upload credentials**:
   ```bash
   cd ingestion
   td wf secrets --project ingestion --set @credentials_ingestion.json
   ```

2. **Test workflow syntax**:
   ```bash
   td wf check {source}_ingest_inc.dig
   ```

3. **Deploy to Treasure Data**:
   ```bash
   td wf push ingestion
   ```

4. **Run the workflow**:
   ```bash
   td wf start ingestion {source}_ingest_inc --session now
   ```

5. **Monitor ingestion log**:
   ```sql
   SELECT * FROM {target_database}.ingestion_log
   WHERE source_name = '{source}'
   ORDER BY time DESC
   LIMIT 10
   ```

---

## ERROR RECOVERY

IF you did NOT use the AskUserQuestion tool for each multiple-choice question (ACTIONS 1, 2, 4, 6, and 7):
- Print: "❌ ERROR: I failed to follow the interactive collection process."
- Print: "🔄 Restarting from ACTION 1..."
- GO BACK to ACTION 1 and start over

IF user says "skip questions" or "just ask all at once":
- Print: "❌ Cannot skip interactive collection - this ensures accuracy and prevents errors."
- Print: "✅ I'll collect parameters one at a time to ensure we get the configuration right."
- PROCEED with ACTION 1

---

**NOW BEGIN: Execute ACTION 1 immediately. Use AskUserQuestion tool for the first question.**
commands/ingest-validate-wf.md
@@ -0,0 +1,255 @@
---
name: ingest-validate-wf
description: Validate Digdag workflow and configuration files against production quality gates
---

# Validate Ingestion Workflow

## ⚠️ CRITICAL: This validates against strict production quality gates

I'll validate your ingestion workflow for compliance with production standards and best practices.

---

## What I'll Validate

### Quality Gates (ALL MUST PASS)

#### 1. Template Compliance
- ✅ Code matches documented templates 100%
- ✅ No unauthorized deviations from patterns
- ✅ All template sections present
- ✅ Exact formatting and structure

#### 2. Logging Requirements
- ✅ Start logging before data processing
- ✅ Success logging after td_load
- ✅ Error logging in `_error` blocks
- ✅ Minimum 3 logging blocks per data source
- ✅ Correct SQL template usage

#### 3. Error Handling
- ✅ `_error:` blocks present in all workflows
- ✅ Error logging with SQL present
- ✅ Proper error message capture
- ✅ Job ID and URL captured in errors

#### 4. Timestamp Format
- ✅ Correct format for connector type:
  - Google BigQuery: SQL Server format (`CONVERT(varchar, ..., 121)`)
  - Klaviyo: `.000000` (6 decimals, NO Z)
  - OneTrust: `.000Z` (3 decimals, WITH Z)
  - Shopify v2: ISO 8601
- ✅ Matches `docs/patterns/timestamp-formats.md`

#### 5. Incremental Field Handling
- ✅ Correct field names (table vs. API)
- ✅ Dual field handling where needed (Klaviyo campaigns)
- ✅ Proper COALESCE fallback logic
- ✅ Matches `docs/patterns/incremental-patterns.md`

#### 6. Workflow Structure
- ✅ Must match `docs/patterns/workflow-patterns.md`
- ✅ Proper timezone declaration (`timezone: UTC`)
- ✅ Correct `_export` includes
- ✅ Proper task naming conventions
- ✅ Correct file organization
- ✅ Parallel processing limits appropriate for source

#### 7. Configuration Files
- ✅ YAML syntax validity
- ✅ Secret references (`${secret:name}`) used correctly
- ✅ No hardcoded credentials
- ✅ Required parameters present
- ✅ Database references correct
- ✅ Mode set appropriately (`append`, `replace`)

#### 8. File Organization
- ✅ `.dig` files in `ingestion/` directory
- ✅ YAML configs in `ingestion/config/` subdirectory
- ✅ SQL files in `ingestion/sql/` subdirectory
- ✅ Proper file naming conventions

#### 9. Security
- ✅ No hardcoded credentials in any file
- ✅ Proper `${secret:name}` syntax usage
- ✅ `credentials_ingestion.json` NOT in version control
- ✅ `.gitignore` includes credentials file
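
As a rough illustration of the "no hardcoded credentials" check, the Python sketch below scans config text line by line and flags credential-looking keys whose value is not a `${secret:name}` reference. It is a heuristic, not a YAML parser, and both the key-name pattern and the function name are assumptions rather than part of the documented gates.

```python
import re
from typing import List, Tuple

# A value counts as safe only if it is exactly one ${secret:name} reference.
SECRET_REF = re.compile(r"^\$\{secret:[A-Za-z0-9_.]+\}$")
# Assumed naming convention for keys that hold credentials.
CREDENTIAL_KEY = re.compile(r"(api_key|password|token|secret)", re.IGNORECASE)


def find_hardcoded_credentials(yaml_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, key) pairs for credential keys with literal values."""
    findings = []
    for line_no, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        value = value.strip().strip("'\"")
        if CREDENTIAL_KEY.search(key) and value and not SECRET_REF.match(value):
            findings.append((line_no, key.strip()))
    return findings


config = "in:\n  type: klaviyo\n  api_key: pk_12345\n"
print(find_hardcoded_credentials(config))  # [(3, 'api_key')]
```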

---

## Validation Options

### Option 1: Validate Specific Workflow
Provide:
- **Workflow file path**: e.g., `ingestion/klaviyo_ingest_inc.dig`
- **Related config files**: (or I'll find them automatically)

I will:
1. Read the workflow file
2. Find all related config files
3. Check against ALL quality gates
4. Report detailed findings with line numbers

### Option 2: Validate Entire Source
Provide:
- **Source name**: e.g., `klaviyo`, `shopify_v2`, `google_bigquery`

I will:
1. Find all workflows for the source
2. Find all config files for the source
3. Validate against source-specific documentation
4. Check all quality gates
5. Report comprehensive findings

### Option 3: Validate All
Say: **"validate all"**

I will:
1. Find all workflows in `ingestion/`
2. Find all configs in `ingestion/config/`
3. Validate each against its source documentation
4. Check all quality gates
5. Report full project compliance status
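
The file-discovery steps for "validate all" could look roughly like the Python sketch below. It assumes the directory layout from gate 8 and the `{source}_ingest_inc.dig` / `{source}_ingest_hist.dig` naming used throughout; `find_project_files` and `source_of` are hypothetical helpers, not an existing tool.

```python
from pathlib import Path
from typing import Dict, List


def find_project_files(root: str) -> Dict[str, List[str]]:
    """Collect workflow and config file names under <root>/ingestion."""
    base = Path(root) / "ingestion"
    return {
        "workflows": sorted(p.name for p in base.glob("*.dig")),
        "configs": sorted(p.name for p in (base / "config").glob("*.yml")),
    }


def source_of(workflow_file: str) -> str:
    """Derive the source name from a workflow file name."""
    for suffix in ("_ingest_inc.dig", "_ingest_hist.dig"):
        if workflow_file.endswith(suffix):
            return workflow_file[: -len(suffix)]
    raise ValueError(f"unexpected workflow file name: {workflow_file}")


print(source_of("shopify_v2_ingest_inc.dig"))  # shopify_v2
```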

---

## Validation Process

### Step 1: Read Documentation
I will read relevant documentation to verify compliance:
- Source-specific docs: `docs/sources/{source-name}.md`
- Pattern docs: `docs/patterns/*.md`

### Step 2: Load Files
I will read all specified workflow and config files.

### Step 3: Check Quality Gates
I will verify each file against ALL quality gates listed above.

### Step 4: Report Findings

#### Pass Report (if all gates pass)
```
✅ VALIDATION PASSED

Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}

Quality Gates: 9/9 PASSED
✅ Template Compliance
✅ Logging Requirements
✅ Error Handling
✅ Timestamp Format
✅ Incremental Fields
✅ Workflow Structure
✅ Configuration Files
✅ File Organization
✅ Security

No issues found. Workflow is production-ready.
```

#### Fail Report (if any gate fails)
```
❌ VALIDATION FAILED

Workflow: ingestion/{source}_ingest_inc.dig
Source: {source}

Quality Gates: 6/9 PASSED

✅ Template Compliance
✅ Logging Requirements
❌ Error Handling - FAILED
  - Missing _error block in main workflow
  - Error logging SQL not found

✅ Timestamp Format
❌ Incremental Fields - FAILED
  - Using wrong field name: 'updated_at' should be 'updated' for API
  - Line 45: incremental_field parameter incorrect

✅ Workflow Structure
✅ Configuration Files
✅ File Organization
❌ Security - FAILED
  - Hardcoded API key found in config/klaviyo_profiles_load.yml:12
  - Should use ${secret:klaviyo_api_key}

RECOMMENDATIONS:
1. Add _error block to main workflow (see docs/patterns/workflow-patterns.md)
2. Fix incremental field name (see docs/sources/klaviyo.md)
3. Replace hardcoded credential with secret reference

Re-validate after fixing issues.
```

---

## Common Issues Detected

### Template Violations
- Simplified or "optimized" templates
- Removed "redundant" sections
- Modified variable names
- Changed structure

### Logging Violations
- Missing start/success/error logging
- Incorrect SQL template usage
- Missing job ID or URL capture

### Timestamp Format Errors
- Wrong decimal count
- Missing or incorrect timezone marker
- Using default instead of connector-specific format

### Incremental Field Errors
- Using table field name in API parameter
- Using API field name in SQL queries
- Missing COALESCE fallback

### Security Issues
- Hardcoded credentials
- Incorrect secret syntax
- Credentials file in version control

---

## Next Steps After Validation

### If Validation Passes
✅ Workflow is production-ready
- Deploy with confidence
- Monitor ingestion_log for ongoing health

### If Validation Fails
❌ Fix reported issues:
1. Re-read relevant documentation
2. Apply exact templates
3. Fix specific line numbers mentioned
4. Re-validate until all gates pass

**DO NOT deploy failing workflows to production**

---

## Production Quality Assurance

This validation ensures:
- ✅ Code works the first time
- ✅ Consistent patterns across sources
- ✅ Complete error handling and logging
- ✅ Maintainable and documented code
- ✅ No security vulnerabilities
- ✅ Compliance with team standards

---

**What would you like to validate?**

Options:
1. Validate specific workflow: Provide workflow file path
2. Validate entire source: Provide source name
3. Validate all: Say "validate all"