Initial commit

2025-11-29 18:30:23 +08:00
commit d765cdd7eb
13 changed files with 1286 additions and 0 deletions
--- a/commands/clean-database.md
+++ b/commands/clean-database.md
@@ -0,0 +1,35 @@
+---
+name: clean-database
+description: Normalize, deduplicate, and validate enriched datasets to maintain accuracy and compliance.
+usage: /data-enrichment:clean-database --input enriched.csv --rules rules.yaml
+---
+
+# Clean Database Command
+
+## Purpose
+Run data quality workflows (formatting, deduplication, validation, suppression) before syncing enriched records into downstream systems.
+
+## Syntax
+```bash
+/data-enrichment:clean-database \
+  --input enriched.csv \
+  --rules rules.yaml \
+  --output clean.csv \
+  --gdpr true
+```
+
+### Parameters
+- `--input`: Source CSV/JSON/Parquet file.
+- `--rules`: YAML/JSON config defining normalization rules, required fields, dedupe logic.
+- `--output`: File path or system destination (Salesforce, HubSpot, Snowflake).
+- `--gdpr`: Apply regional compliance filters (default true).
+- `--suppress-list`: Path to opt-out or customer suppression list.
+- `--format`: Output format (csv, json, parquet, api-sync).
+
+## Features
+- Email/phone format correction, country normalization, timezone calculation.
+- Deduping via fuzzy matching and configurable keys.
+- Confidence scoring and rejection report for records failing validation.
+- Audit log of transformations for compliance.
+
+---