Initial commit

Zhongwei Li committed 2025-11-29 18:30:46 +08:00
commit ca9abd0543
12 changed files with 419 additions and 0 deletions

.claude-plugin/plugin.json Normal file

@@ -0,0 +1,24 @@
{
"name": "growth-experiments",
"description": "Experiment backlog, launch, and learning governance across the funnel",
"version": "1.0.0",
"author": {
"name": "GTM Agents",
"email": "opensource@intentgpt.ai"
},
"skills": [
"./skills/hypothesis-library/SKILL.md",
"./skills/experiment-design-kit/SKILL.md",
"./skills/guardrail-scorecard/SKILL.md"
],
"agents": [
"./agents/experimentation-strategist.md",
"./agents/test-engineer.md",
"./agents/insight-analyst.md"
],
"commands": [
"./commands/prioritize-hypotheses.md",
"./commands/launch-experiment.md",
"./commands/synthesize-learnings.md"
]
}
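
Before publishing, it is worth confirming that every path in the `skills`, `agents`, and `commands` arrays above actually exists. A minimal sketch, assuming it runs from the plugin root with the manifest at `.claude-plugin/plugin.json` (the path recorded in the lock file below):

```python
import json
from pathlib import Path

# Load the manifest and check each referenced file on disk.
root = Path(".")
manifest = json.loads((root / ".claude-plugin" / "plugin.json").read_text())

for key in ("skills", "agents", "commands"):
    for ref in manifest.get(key, []):
        if not (root / ref).is_file():  # Path() normalizes the leading "./"
            print(f"missing {key} entry: {ref}")
```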

README.md Normal file

@@ -0,0 +1,3 @@
# growth-experiments
Experiment backlog, launch, and learning governance across the funnel

agents/experimentation-strategist.md Normal file

@@ -0,0 +1,30 @@
---
name: experimentation-strategist
description: Prioritizes hypotheses, capacity, and portfolio governance for growth experiments.
model: sonnet
---
# Experimentation Strategist Agent
## Responsibilities
- Maintain experiment pipeline with hypotheses, confidence, and projected impact.
- Align backlog with product, marketing, and lifecycle OKRs.
- Facilitate governance rituals (triage, launch reviews, readouts).
- Track guardrails and ensure learnings are codified into playbooks.
## Workflow
1. **Backlog Intake**: capture experiment ideas, assumptions, and impact estimates.
2. **Prioritization**: score using ICE/RICE + guardrail requirements; balance the portfolio mix.
3. **Planning**: assign owners, timelines, instrumentation, and a dependency map.
4. **Governance**: run weekly standups, decision logs, and escalation paths.
5. **Learning Ops**: publish readouts, update playbooks, and trigger follow-on tests.
## Outputs
- Prioritized experiment roadmap with scoring matrix.
- Governance calendar + decision/exception logs.
- Learning digests and next-step recommendations.
---

agents/insight-analyst.md Normal file

@@ -0,0 +1,31 @@
---
name: insight-analyst
description: Synthesizes experiment results, derives insights, and recommends next bets.
model: sonnet
---
# Insight Analyst Agent
## Responsibilities
- Define success metrics, guardrails, and statistical power requirements.
- Analyze interim and final experiment results, including segmentation and interaction effects.
- Translate findings into action plans and prioritized follow-ups.
- Maintain experiment knowledge base for reuse across teams.
## Workflow
1. **Experiment Design Support**: align measurement plans, uplift expectations, and data capture.
2. **Monitoring**: check guardrails, traffic balance, and anomaly alerts during the run.
3. **Analysis**: run statistical tests, audience splits, and sensitivity analyses (see the sketch after this list).
4. **Storytelling**: craft exec-ready readouts with implications and decision recommendations.
5. **Knowledge Ops**: update the centralized repository with learnings, templates, and tags.
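
For the analysis step, the simplest case is a two-proportion z-test on a conversion metric. A minimal sketch using SciPy; the counts are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Illustrative numbers: control converts 480/12000, variant 545/12000.
z, p = two_proportion_ztest(480, 12000, 545, 12000)
print(f"z={z:.2f}, p={p:.4f}")
```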
## Outputs
- Interim monitoring dashboards and guardrail alerts.
- Final readout with insights, decision, and rollout guidance.
- Learning cards linked to hypothesis library + future test ideas.
---

agents/test-engineer.md Normal file

@@ -0,0 +1,27 @@
---
name: test-engineer
description: Designs experiment architecture, instrumentation, and QA for growth initiatives.
model: haiku
---
# Test Engineer Agent
## Responsibilities
- Translate hypotheses into technical specs, variants, and routing logic.
- Configure experimentation platforms, feature flags, and rollout safeguards.
- Ensure telemetry, conversion events, and attribution are reliable before launch.
- Manage QA, holdouts, and rollback plans during live tests.
## Workflow
1. **Technical Scoping**: evaluate feasibility, dependencies, and required integrations.
2. **Instrumentation Setup**: implement events, guardrail metrics, and data validation checks (see the sketch after this list).
3. **Variant Build & QA**: configure branches, ensure parity, and run automated/manual QA.
4. **Launch Management**: monitor ramp rules, performance, and guardrails in real time.
5. **Post-Test Handoff**: archive configs, document learnings, and prep the next iteration.
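
For the instrumentation step, a pre-launch check can confirm that every required event was observed per variant in QA. A minimal sketch; the event names and payload shape are illustrative, not a specific platform's API:

```python
# Required telemetry events that every variant must emit before launch.
REQUIRED_EVENTS = {"exposure_logged", "cta_clicked", "signup_completed"}

def instrumentation_gaps(qa_events: list[dict]) -> dict[str, set[str]]:
    """Return, per variant, the required events never observed in QA."""
    seen: dict[str, set[str]] = {}
    for e in qa_events:
        seen.setdefault(e["variant"], set()).add(e["name"])
    return {variant: REQUIRED_EVENTS - names
            for variant, names in seen.items()
            if REQUIRED_EVENTS - names}

gaps = instrumentation_gaps([
    {"variant": "control", "name": "exposure_logged"},
    {"variant": "treatment", "name": "exposure_logged"},
    {"variant": "treatment", "name": "cta_clicked"},
])
print(gaps)  # control lacks two events; treatment lacks signup_completed
```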
## Outputs
- Experiment technical spec with instrumentation checklist.
- QA + launch readiness report covering guardrails and fallback steps.
- Post-experiment teardown + tech debt log.
---

commands/launch-experiment.md Normal file

@@ -0,0 +1,35 @@
---
name: launch-experiment
description: Converts an approved hypothesis into a fully instrumented test with guardrails and a rollout plan.
usage: /growth-experiments:launch-experiment --id EXP-142 --surface onboarding --variant-count 3 --ramp 5,25,50,100
---
# Command: launch-experiment
## Inputs
- **id**: experiment or hypothesis identifier.
- **surface**: product area/channel (onboarding, pricing page, lifecycle email, in-app).
- **variant-count**: number of variants/arms including control.
- **ramp**: comma-separated rollout schedule (%) or JSON file reference.
- **holdout**: optional holdout/ghost-experiment definition for measurement.
- **notes**: free text for special approvals or exception handling.
## Workflow
1. **Readiness Check**: confirm design sign-off, instrumentation coverage, and guardrails.
2. **Variant Assembly**: pull specs, assets, and targeting rules for each arm.
3. **Rollout Plan**: configure the flag/experimentation platform with the ramp schedule + alerts (see the sketch after this list).
4. **QA & Approvals**: run smoke tests, capture screenshots, and gather stakeholder approval.
5. **Launch & Monitoring**: activate the test, enable telemetry dashboards, and notify channels.
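
A minimal sketch of how a `--ramp 5,25,50,100` schedule could be parsed and stepped; `set_traffic` and `guardrails_ok` are stand-in hooks, not a real platform API:

```python
import time

def run_ramp(ramp_arg: str, hold_seconds: int, set_traffic, guardrails_ok) -> None:
    """Step exposure through each percentage, pausing to watch guardrails."""
    schedule = [int(pct) for pct in ramp_arg.split(",")]  # e.g. "5,25,50,100"
    for pct in schedule:
        set_traffic(pct)              # assumed platform call
        time.sleep(hold_seconds)      # soak period per step
        if not guardrails_ok():       # assumed guardrail monitor hook
            set_traffic(0)            # roll back on a breach
            raise RuntimeError(f"guardrail breach at {pct}% ramp")

run_ramp("5,25,50,100", hold_seconds=0,
         set_traffic=lambda p: print(f"traffic -> {p}%"),
         guardrails_ok=lambda: True)
```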
## Outputs
- Launch packet with specs, QA evidence, approvals, and rollout timeline.
- Experiment platform configuration export + guardrail monitors.
- Stakeholder announcement + escalation matrix.
## Agent/Skill Invocations
- `test-engineer` builds variants, instrumentation, and QA evidence.
- `experimentation-strategist` confirms governance + approvals.
- `guardrail-scorecard` skill validates guardrail coverage + thresholds.
- `experiment-design-kit` skill ensures templates + best practices are applied.
---

commands/prioritize-hypotheses.md Normal file

@@ -0,0 +1,34 @@
---
name: prioritize-hypotheses
description: Scores experiment backlog using impact, confidence, effort, and guardrail readiness.
usage: /growth-experiments:prioritize-hypotheses --source backlog.csv --capacity 6 --framework rice
---
# Command: prioritize-hypotheses
## Inputs
- **source**: backlog file, experiment tracker, or Notion database ID.
- **capacity**: number of experiments that can run in the next sprint/cycle.
- **framework**: ice | rice | custom; determines scoring weights.
- **guardrails**: optional JSON/CSV for mandatory guardrail requirements.
- **filters**: tags or OKRs to focus on (acquisition, activation, retention, monetization).
## Workflow
1. **Data Ingestion**: load the backlog, normalize fields, and enrich with the latest metrics.
2. **Scoring Engine**: calculate ICE/RICE/custom scores, factoring in guardrail readiness (see the sketch after this list).
3. **Portfolio Mix**: ensure balance across funnel stages and surfaces; flag conflicts.
4. **Capacity Planning**: fit the highest-value tests into available slots, accounting for owners + effort.
5. **Decision Pack**: generate the prioritized list, rationale, and trade-off notes for approval.
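
A minimal sketch of the scoring step under the `rice` framework, assuming a `backlog.csv` export with `id`, `reach`, `impact`, `confidence`, `effort`, and `guardrails_ready` columns (all field names are illustrative):

```python
import csv

def rice(row: dict) -> float:
    reach = float(row["reach"])            # users affected per cycle
    impact = float(row["impact"])          # e.g. 0.25 / 0.5 / 1 / 2 / 3 scale
    confidence = float(row["confidence"])  # 0..1
    effort = float(row["effort"])          # person-weeks
    return reach * impact * confidence / effort

with open("backlog.csv") as f:
    # Gate on guardrail readiness before anything is eligible for a slot.
    rows = [r for r in csv.DictReader(f) if r.get("guardrails_ready") == "yes"]

capacity = 6
ranked = sorted(rows, key=rice, reverse=True)
selected, waitlist = ranked[:capacity], ranked[capacity:]
for row in selected:
    print(f"{row['id']}: RICE={rice(row):.1f}")
```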
## Outputs
- Ranked backlog with scores, dependencies, and guardrail status.
- Capacity plan showing selected tests plus waitlist.
- Decision memo summarizing trade-offs and next actions.
## Agent/Skill Invocations
- `experimentation-strategist` orchestrates prioritization + governance alignment.
- `insight-analyst` validates data quality and metric assumptions.
- `hypothesis-library` skill links past learnings to current ideas.
- `guardrail-scorecard` skill enforces readiness requirements.
---

commands/synthesize-learnings.md Normal file

@@ -0,0 +1,34 @@
---
name: synthesize-learnings
description: Creates experiment readouts, codifies learnings, and routes follow-up actions.
usage: /growth-experiments:synthesize-learnings --source exp-log.db --scope "Q4 funnel" --audience exec
---
# Command: synthesize-learnings
## Inputs
- **source**: experiment tracker, warehouse table, or analytics workspace.
- **scope**: filters (timeframe, product area, funnel stage, persona).
- **audience**: exec | pod | growth-guild | async memo; controls fidelity/tone.
- **format**: deck | memo | dashboard | loom.
- **follow-ups**: optional CSV/JSON for linking Jira/Asana action items.
## Workflow
1. **Data Consolidation**: assemble final metrics, guardrail outcomes, and qualitative notes.
2. **Insight Extraction**: group learnings by hypothesis theme, persona, or funnel stage (see the sketch after this list).
3. **Decision Encoding**: record ship, iterate, or archive outcomes plus rationale.
4. **Action Routing**: create follow-up stories, backlog items, or automation triggers.
5. **Knowledge Base Update**: tag learnings in the centralized library with attribution + status.
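
A minimal sketch of the insight-extraction step, grouping decided experiments by hypothesis theme; the record fields are illustrative:

```python
from collections import defaultdict

experiments = [
    {"theme": "onboarding", "decision": "ship", "lift": 0.042},
    {"theme": "onboarding", "decision": "archive", "lift": -0.003},
    {"theme": "pricing", "decision": "iterate", "lift": 0.011},
]

# Bucket decided experiments by theme for the readout.
by_theme: dict[str, list[dict]] = defaultdict(list)
for exp in experiments:
    by_theme[exp["theme"]].append(exp)

for theme, exps in by_theme.items():
    ships = sum(1 for e in exps if e["decision"] == "ship")
    avg_lift = sum(e["lift"] for e in exps) / len(exps)
    print(f"{theme}: {len(exps)} tests, {ships} shipped, avg lift {avg_lift:+.1%}")
```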
## Outputs
- Executive-ready readout with insights, decisions, and KPIs.
- Learning cards mapped to hypothesis taxonomy + next bets.
- Action log synced to backlog/project tools.
## Agent/Skill Invocations
- `insight-analyst` leads analysis and storytelling.
- `experimentation-strategist` ensures learnings feed roadmap + governance.
- `hypothesis-library` skill indexes learnings against taxonomy.
- `experiment-design-kit` skill suggests iteration ideas based on patterns.
---

plugin.lock.json Normal file

@@ -0,0 +1,77 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:gtmagents/gtm-agents:plugins/growth-experiments",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "f71552aec1f22b8c42cf2a780cc173b967f34418",
"treeHash": "cb9022c07228292196db7a9a857f93acd3ddbbd1518f2715ff5d98c0c23461b8",
"generatedAt": "2025-11-28T10:17:14.617812Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "growth-experiments",
"description": "Experiment backlog, launch, and learning governance across the funnel",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "683f99f26c911cbcafbc37163793072097659e1fce64f81de1c6c9cb66c4936c"
},
{
"path": "agents/insight-analyst.md",
"sha256": "d2874eece334595ddfe6b0c383fbfd16d2e378f57d36a3ddd0a86b95248cb774"
},
{
"path": "agents/test-engineer.md",
"sha256": "d55a239ae6d577c42afae902509863106866888765bf2ca2f5db0b8178405478"
},
{
"path": "agents/experimentation-strategist.md",
"sha256": "6ed1aa05e1f9a60ad9daff451aeb37452b390143f4ac0f55a1242f33cf5993c3"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "1f147e5fd6975d3155505764c74b08e384aecbc0a944bb942ef52455c1822572"
},
{
"path": "commands/synthesize-learnings.md",
"sha256": "0b103dbb5165d31b1ab9a5f7065f7e0198477d90be80074c4b8af3aa552ac3f8"
},
{
"path": "commands/prioritize-hypotheses.md",
"sha256": "f2edd3494e0b7d153f0408eaba41e84bd0d9fbbcffdb3c568b3eba0b961ff808"
},
{
"path": "commands/launch-experiment.md",
"sha256": "f087401e2a9650c086d5b6409a6dd8c031002241d44be2958f0c0ec59fd65cab"
},
{
"path": "skills/hypothesis-library/SKILL.md",
"sha256": "89e222dee10d57bbf336fe09345705a6026853614a59d0d29a43e4b94d10fd3a"
},
{
"path": "skills/guardrail-scorecard/SKILL.md",
"sha256": "5796941ec8ce4a6f8c29886f8a116d6055d36e41ad4dd27c77ed74f5150fa744"
},
{
"path": "skills/experiment-design-kit/SKILL.md",
"sha256": "3ab9b7015c111edae825d24057f08970a544f1766e535c341bcc3575dcb41928"
}
],
"dirSha256": "cb9022c07228292196db7a9a857f93acd3ddbbd1518f2715ff5d98c0c23461b8"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}
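
The per-file `sha256` entries make the lock verifiable. A minimal sketch that recomputes each digest from disk, run from the plugin root (the `dirSha256` scheme is tool-specific and not reproduced here):

```python
import hashlib
import json
from pathlib import Path

lock = json.loads(Path("plugin.lock.json").read_text())
for entry in lock["content"]["files"]:
    # Hash the file bytes and compare against the recorded digest.
    digest = hashlib.sha256(Path(entry["path"]).read_bytes()).hexdigest()
    status = "ok" if digest == entry["sha256"] else "MISMATCH"
    print(f"{status}  {entry['path']}")
```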

skills/experiment-design-kit/SKILL.md Normal file

@@ -0,0 +1,62 @@
---
name: experiment-design-kit
description: Toolkit for structuring hypotheses, variants, guardrails, and measurement plans.
---
# Experiment Design Kit Skill
## When to Use
- Translating raw ideas into testable hypotheses with clear success metrics.
- Ensuring experiment briefs include guardrails, instrumentation, and rollout details.
- Coaching pods on best practices for multi-variant or multi-surface tests.
## Framework
1. **Problem Framing**: define the user problem, business impact, and north-star metric.
2. **Hypothesis Structure**: "If we do X for Y persona, we expect Z change," with assumptions.
3. **Measurement Plan**: primary metric, guardrails, min detectable effect, power calc (see the sketch after this list).
4. **Variant Strategy**: control definition, variant catalog, targeting, and exclusion rules.
5. **Operational Plan**: owners, timeline, dependencies, QA/rollback steps.
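
For the measurement plan, a back-of-envelope sample size for a conversion metric, using the standard normal approximation for a two-proportion test; the baseline and MDE values are illustrative:

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    """n per arm to detect an absolute lift of mde_abs over p_base (two-sided)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p1, p2 = p_base, p_base + mde_abs
    p_bar = (p1 + p2) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2) / mde_abs ** 2
    return ceil(n)

print(sample_size_per_arm(0.04, 0.005))  # 4% baseline, +0.5pt MDE -> ~25.6k per arm
```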
## Templates
- Experiment brief (context, hypothesis, design, metrics, launch checklist).
- Guardrail register with thresholds + alerting rules.
- Variant matrix for surfaces, messaging, and states.
- **GTM Agents Growth Backlog Board**: capture idea → sizing → prioritization scoring (ICE/RICE); see @puerto/README.md#183-212.
- **Weekly Experiment Packet**: includes KPI guardrails, qualitative notes, and next bets for the Marketing Director + Sales Director.
- **Rollback Playbook**: pre-built checklist tied to lifecycle-mapping rip-cord procedures.
## Tips
- Pressure-test hypotheses with counter-metrics to avoid local optima.
- Document data constraints early to avoid rework during build.
- Pair with `guardrail-scorecard` to ensure sign-off before launch.
- Apply GTM Agents cadence: Monday backlog groom, Wednesday build review, Friday learnings sync.
- Require KPI guardrails per stage (activation, engagement, monetization) before authorizing build.
- If a test risks Sales velocity, include Sales Director in approval routing per GTM Agents governance.
## GTM Agents Experiment Operating Model
1. **Backlog Intake**: ideas flow from GTM pods; the Growth Marketer tags theme, objective, and expected impact.
2. **Prioritization**: score with RICE + a qualitative "strategic fit" modifier; surface the top 3 bets weekly.
3. **Design & Instrumentation**: reference Serena/Context7 to patch code + confirm documentation.
4. **Launch & Monitor**: use guardrail-scorecard to watch leading indicators (churn, complaints, latency).
5. **Learning Loop**: run a Sequential Thinking retro; document hypothesis, result, decision, and follow-up in the backlog card.
## KPI Guardrails (GTM Agents Reference)
- Activation rate change must stay within ±3% of baseline for Tier-1 segments.
- Revenue per visitor cannot drop more than 2% for more than 48h.
- Support tickets tied to experiment variant must remain <5% of total volume.
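
A minimal sketch encoding these three reference guardrails as checks; the metric snapshot fields are illustrative:

```python
def check_guardrails(m: dict) -> list[str]:
    """Return the list of breached reference guardrails for one snapshot."""
    breaches = []
    if abs(m["activation_delta"]) > 0.03:                      # ±3% Tier-1 activation band
        breaches.append("activation outside ±3% band")
    if m["rpv_delta"] < -0.02 and m["rpv_breach_hours"] > 48:  # RPV down >2% for >48h
        breaches.append("revenue per visitor below threshold for >48h")
    if m["variant_ticket_share"] >= 0.05:                      # tickets must stay <5% of volume
        breaches.append("variant support tickets >=5% of volume")
    return breaches

print(check_guardrails({"activation_delta": -0.01, "rpv_delta": -0.025,
                        "rpv_breach_hours": 60, "variant_ticket_share": 0.02}))
```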
## Weekly Experiment Packet Outline
```
Week Ending: <Date>
1. Portfolio Snapshot: tests live, status, KPI trend (guardrail vs actual)
2. Key Wins: hypothesis, uplift, next action (ship, iterate, expand)
3. Guardrail Alerts: what tripped, mitigation taken (rollback? scope adjust?)
4. Pipeline Impact: SQLs, ARR influenced, notable customer anecdotes
5. Upcoming Launches: dependencies, owners, open questions
```
Share the packet with Growth, the Marketing Director, the Sales Director, and RevOps to mirror GTM Agents' cross-functional communication rhythm.
---

skills/guardrail-scorecard/SKILL.md Normal file

@@ -0,0 +1,31 @@
---
name: guardrail-scorecard
description: Framework for defining, monitoring, and enforcing guardrail metrics across experiments.
---
# Guardrail Scorecard Skill
## When to Use
- Setting non-negotiable metrics (stability, churn, latency, compliance) before launching tests.
- Monitoring live experiments to ensure guardrails stay within thresholds.
- Reporting guardrail status in launch packets and post-test readouts.
## Framework
1. **Metric Inventory**: list guardrail metrics, owners, data sources, and refresh cadence.
2. **Threshold Matrix**: define warning vs critical bands per metric / persona / region (see the sketch after this list).
3. **Alerting & Escalation**: map notification channels, the DRI, and decision timelines.
4. **Exception Handling**: document when guardrail overrides are acceptable and the required approvals.
5. **Retrospective Loop**: log breaches, mitigations, and rule updates for future tests.
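
A minimal sketch of the threshold matrix with warning and critical bands per metric; the metric names and band values are illustrative:

```python
THRESHOLDS = {
    "checkout_latency_ms": {"warning": 400, "critical": 600},
    "churn_rate_delta":    {"warning": 0.005, "critical": 0.01},
}

def classify(metric: str, value: float) -> str:
    """Map a live metric value onto its guardrail band."""
    bands = THRESHOLDS[metric]
    if value >= bands["critical"]:
        return "critical"   # page the DRI, start the rollback clock
    if value >= bands["warning"]:
        return "warning"    # notify the channel, tighten monitoring
    return "ok"

print(classify("checkout_latency_ms", 450))  # -> warning
```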
## Templates
- Guardrail register (metric, threshold, owner, alert channel).
- Live monitoring dashboard layout.
- Exception memo structure for approvals.
## Tips
- Tie guardrails to downstream systems (billing, support) to catch second-order impacts.
- Keep thresholds dynamic for seasonality but document logic.
- Pair with `launch-experiment` to ensure readiness before flipping flags.
---

skills/hypothesis-library/SKILL.md Normal file

@@ -0,0 +1,31 @@
---
name: hypothesis-library
description: Curated repository of experiment hypotheses, assumptions, and historical learnings.
---
# Hypothesis Library Skill
## When to Use
- Capturing new experiment ideas with consistent metadata.
- Referencing past wins/losses before prioritizing the backlog.
- Sharing reusable learnings across pods and channels.
## Framework
1. **Metadata Schema**: hypothesis ID, theme, persona, funnel stage, metrics (see the sketch after this list).
2. **Assumptions Matrix**: belief statements, supporting evidence, confidence rating.
3. **Status Tracking**: idea → scoped → running → decided → archived.
4. **Learning Tags**: impact summary, guardrail notes, follow-up ideas.
5. **Governance Hooks**: approvals, owners, review cadence.
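
A minimal sketch of the metadata schema and status flow as a dataclass; the field names and example values are illustrative:

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    IDEA = "idea"
    SCOPED = "scoped"
    RUNNING = "running"
    DECIDED = "decided"
    ARCHIVED = "archived"

@dataclass
class Hypothesis:
    hypothesis_id: str       # e.g. "HYP-0042"
    theme: str               # taxonomy tag, e.g. "onboarding"
    persona: str
    funnel_stage: str        # acquisition | activation | retention | monetization
    primary_metric: str
    status: Status = Status.IDEA
    learning_tags: list[str] = field(default_factory=list)

card = Hypothesis("HYP-0042", "onboarding", "self-serve admin",
                  "activation", "day-7 activation rate")
```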
## Templates
- Intake form for new hypotheses.
- Learning card format (context, result, recommendation).
- Portfolio dashboard summarizing mix by theme/metric.
## Tips
- Require at least one supporting data point before moving to prioritization.
- Use consistent tagging so search/filtering works across teams.
- Link to `synthesize-learnings` outputs to keep narratives fresh.
---