Initial commit

Zhongwei Li
2025-11-29 18:28:37 +08:00
commit ccc65b3f07
180 changed files with 53970 additions and 0 deletions


@@ -0,0 +1,584 @@
# Human Checkpoints in Plans
Plans execute autonomously. Checkpoints formalize the interaction points where human verification or decisions are needed.
**Core principle:** Claude automates everything with CLI/API. Checkpoints are for verification and decisions, not manual work.
## Checkpoint Types
### 1. `checkpoint:human-verify` (Most Common)
**When:** Claude completed automated work, human confirms it works correctly.
**Use for:**
- Visual UI checks (layout, styling, responsiveness)
- Interactive flows (click through wizard, test user flows)
- Functional verification (feature works as expected)
- Audio/video playback quality
- Animation smoothness
- Accessibility testing
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[What Claude automated and deployed/built]</what-built>
<how-to-verify>
[Exact steps to test - URLs, commands, expected behavior]
</how-to-verify>
<resume-signal>[How to continue - "approved", "yes", or describe issues]</resume-signal>
</task>
```
**Key elements:**
- `<what-built>`: What Claude automated (deployed, built, configured)
- `<how-to-verify>`: Exact steps to confirm it works (numbered, specific)
- `<resume-signal>`: Clear indication of how to continue
**Example: Vercel Deployment**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to create project and deploy. Capture deployment URL from output.</action>
<verify>vercel ls shows deployment, curl {url} returns 200</verify>
<done>App deployed, URL captured</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Deployed to Vercel at https://myapp-abc123.vercel.app</what-built>
<how-to-verify>
Visit https://myapp-abc123.vercel.app and confirm:
- Homepage loads without errors
- Login form is visible
- No console errors in browser DevTools
</how-to-verify>
<resume-signal>Type "approved" to continue, or describe issues to fix</resume-signal>
</task>
```
**Example: UI Component**
```xml
<task type="auto">
<name>Build responsive dashboard layout</name>
<files>src/components/Dashboard.tsx, src/app/dashboard/page.tsx</files>
<action>Create dashboard with sidebar, header, and content area. Use Tailwind responsive classes for mobile.</action>
<verify>npm run build succeeds, no TypeScript errors</verify>
<done>Dashboard component builds without errors</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout at /dashboard</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Desktop (>1024px): Verify sidebar left, content right, header top
4. Tablet (768px): Verify sidebar collapses to hamburger
5. Mobile (375px): Verify single column, bottom nav
6. Check: No layout shift, no horizontal scroll
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
**Example: Xcode Build**
```xml
<task type="auto">
<name>Build macOS app with Xcode</name>
<files>App.xcodeproj, Sources/</files>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check for compilation errors in output.</action>
<verify>Build output contains "BUILD SUCCEEDED", no errors</verify>
<done>App builds successfully</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>
Open App.app and test:
- App launches without crashes
- Menu bar icon appears
- Preferences window opens correctly
- No visual glitches or layout issues
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
### 2. `checkpoint:decision`
**When:** Human must make a choice that affects implementation direction.
**Use for:**
- Technology selection (which auth provider, which database)
- Architecture decisions (monorepo vs separate repos)
- Design choices (color scheme, layout approach)
- Feature prioritization (which variant to build)
- Data model decisions (schema structure)
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>[What's being decided]</decision>
<context>[Why this decision matters]</context>
<options>
<option id="option-a">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
<option id="option-b">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
</options>
<resume-signal>[How to indicate choice]</resume-signal>
</task>
```
**Key elements:**
- `<decision>`: What's being decided
- `<context>`: Why this matters
- `<options>`: Each option with balanced pros/cons (not prescriptive)
- `<resume-signal>`: How to indicate choice
**Example: Auth Provider Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>
Need user authentication for the app. Three solid options with different tradeoffs.
</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase DB we're using, generous free tier, row-level security integration</pros>
<cons>Less customizable UI, tied to Supabase ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best developer experience, excellent docs</pros>
<cons>Paid after 10k MAU, vendor lock-in</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control, widely adopted</pros>
<cons>More setup work, you manage security updates, UI is DIY</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
### 3. `checkpoint:human-action` (Rare)
**When:** Action has NO CLI/API and requires human-only interaction, OR Claude hit an authentication gate during automation.
**Use ONLY for:**
- **Authentication gates** - Claude tried to use CLI/API but needs credentials to continue (this is NOT a failure)
- Email verification links (account creation requires clicking email)
- SMS 2FA codes (phone verification)
- Manual account approvals (platform requires human review before API access)
- Credit card 3D Secure flows (web-based payment authorization)
- OAuth app approvals (some platforms require web-based approval)
**Do NOT use for pre-planned manual work:**
- Manually deploying to Vercel (use `vercel` CLI - auth gate if needed)
- Manually creating Stripe webhooks (use Stripe API - auth gate if needed)
- Manually creating databases (use provider CLI - auth gate if needed)
- Running builds/tests manually (use Bash tool)
- Creating files manually (use Write tool)
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[What human must do - Claude already did everything automatable]</action>
<instructions>
[What Claude already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What Claude can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
**Key principle:** Claude automates EVERYTHING possible first, only asks human for the truly unavoidable manual step.
**Example: Email Verification**
```xml
<task type="auto">
<name>Create SendGrid account via API</name>
<action>Use SendGrid API to create subuser account with provided email. Request verification email.</action>
<verify>API returns 201, account created</verify>
<done>Account created, verification email sent</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for SendGrid verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when email verified</resume-signal>
</task>
```
**Example: Credit Card 3D Secure**
```xml
<task type="auto">
<name>Create Stripe payment intent</name>
<action>Use Stripe API to create payment intent for $99. Generate checkout URL.</action>
<verify>Stripe API returns payment intent ID and URL</verify>
<done>Payment intent created</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete 3D Secure authentication</action>
<instructions>
I created the payment intent: https://checkout.stripe.com/pay/cs_test_abc123
Visit that URL and complete the 3D Secure verification flow with your test card.
</instructions>
<verification>Stripe webhook receives payment_intent.succeeded event</verification>
<resume-signal>Type "done" when payment completes</resume-signal>
</task>
```
**Example: Authentication Gate (Dynamic Checkpoint)**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
<!-- If vercel returns "Error: Not authenticated", Claude creates checkpoint on the fly -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, Claude retries the deployment -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
```
**Key distinction:** Authentication gates are created dynamically when Claude encounters auth errors during automation. They're NOT pre-planned - Claude tries to automate first, only asks for credentials when blocked.
See references/cli-automation.md "Authentication Gates" section for more examples and full protocol.
## Execution Protocol
When Claude encounters `type="checkpoint:*"`:
1. **Stop immediately** - do not proceed to next task
2. **Display checkpoint clearly:**
```
════════════════════════════════════════
CHECKPOINT: [Type]
════════════════════════════════════════
Task [X] of [Y]: [Name]
[Display checkpoint-specific content]
[Resume signal instruction]
════════════════════════════════════════
```
3. **Wait for user response** - do not hallucinate completion
4. **Verify if possible** - check files, run tests, whatever is specified
5. **Resume execution** - continue to next task only after confirmation
**For checkpoint:human-verify:**
```
════════════════════════════════════════
CHECKPOINT: Verification Required
════════════════════════════════════════
Task 5 of 8: Responsive dashboard layout
I built: Responsive dashboard at /dashboard
How to verify:
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Test: Resize browser window to mobile/tablet/desktop
4. Confirm: No layout shift, proper responsive behavior
Type "approved" to continue, or describe issues.
════════════════════════════════════════
```
**For checkpoint:decision:**
```
════════════════════════════════════════
CHECKPOINT: Decision Required
════════════════════════════════════════
Task 2 of 6: Select authentication provider
Decision: Which auth provider should we use?
Context: Need user authentication. Three options with different tradeoffs.
Options:
1. supabase - Built-in with our DB, free tier
2. clerk - Best DX, paid after 10k users
3. nextauth - Self-hosted, maximum control
Select: supabase, clerk, or nextauth
════════════════════════════════════════
```
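**For checkpoint:human-action** (same display pattern as above; content illustrative):
```
════════════════════════════════════════
CHECKPOINT: Action Required
════════════════════════════════════════
Task 3 of 7: Complete email verification
I automated: SendGrid account created, verification email requested
Action needed: Click the verification link in your inbox
Type "done" when verified.
════════════════════════════════════════
```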
## Writing Good Checkpoints
**DO:**
- Automate everything with CLI/API before checkpoint
- Be specific: "Visit https://myapp.vercel.app" not "check deployment"
- Number verification steps: easier to follow
- State expected outcomes: "You should see X"
- Provide context: why this checkpoint exists
- Make verification executable: clear, testable steps
**DON'T:**
- Ask human to do work Claude can automate (deploy, create resources, run builds)
- Assume knowledge: "Configure the usual settings" ❌
- Skip steps: "Set up database" ❌ (too vague)
- Mix multiple verifications in one checkpoint (split them)
- Make verification impossible (Claude can't check visual appearance without user confirmation)
## When to Use Checkpoints
**Use checkpoint:human-verify for:**
- Visual verification (UI, layouts, animations)
- Interactive testing (click flows, user journeys)
- Quality checks (audio/video playback, animation smoothness)
- Confirming deployed apps are accessible
**Use checkpoint:decision for:**
- Technology selection (auth providers, databases, frameworks)
- Architecture choices (monorepo, deployment strategy)
- Design decisions (color schemes, layout approaches)
- Feature prioritization
**Use checkpoint:human-action for:**
- Email verification links (no API)
- SMS 2FA codes (no API)
- Manual approvals with no automation
- 3D Secure payment flows
**Don't use checkpoints for:**
- Things Claude can verify programmatically (tests pass, build succeeds)
- File operations (Claude can read files to verify)
- Code correctness (use tests and static analysis)
- Anything automatable via CLI/API
## Checkpoint Placement
Place checkpoints:
- **After automation completes** - not before Claude does the work
- **After UI buildout** - before declaring phase complete
- **Before dependent work** - decisions before implementation
- **At integration points** - after configuring external services
Bad placement:
- Before Claude automates (asking human to do automatable work) ❌
- Too frequent (every other task is a checkpoint) ❌
- Too late (checkpoint is last task, but earlier tasks needed its result) ❌
## Complete Examples
### Example 1: Deployment Flow (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json, package.json</files>
<action>
1. Run `vercel --yes` to create project and deploy
2. Capture deployment URL from output
3. Set environment variables with `vercel env add`
4. Trigger production deployment with `vercel --prod`
</action>
<verify>
- vercel ls shows deployment
- curl {url} returns 200
- Environment variables set correctly
</verify>
<done>App deployed to production, URL captured</done>
</task>
<!-- Human verifies visual/functional correctness -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Deployed to https://myapp.vercel.app</what-built>
<how-to-verify>
Visit https://myapp.vercel.app and confirm:
- Homepage loads correctly
- All images/assets load
- Navigation works
- No console errors
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
### Example 2: Database Setup (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Create Upstash Redis database</name>
<files>.env</files>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL from output
3. Write to .env: UPSTASH_REDIS_URL={url}
4. Verify connection with test command
</action>
<verify>
- upstash redis list shows database
- .env contains UPSTASH_REDIS_URL
- Test connection succeeds
</verify>
<done>Redis database created and configured</done>
</task>
<!-- NO CHECKPOINT NEEDED - Claude automated everything and verified programmatically -->
```
### Example 3: Stripe Webhooks (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Configure Stripe webhooks</name>
<files>.env, src/app/api/webhooks/route.ts</files>
<action>
1. Use Stripe API to create webhook endpoint pointing to /api/webhooks
2. Subscribe to events: payment_intent.succeeded, customer.subscription.updated
3. Save webhook signing secret to .env
4. Implement webhook handler in route.ts
</action>
<verify>
- Stripe API returns webhook endpoint ID
- .env contains STRIPE_WEBHOOK_SECRET
- curl webhook endpoint returns 200
</verify>
<done>Stripe webhooks configured and handler implemented</done>
</task>
<!-- Human verifies in Stripe dashboard -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Stripe webhook configured via API</what-built>
<how-to-verify>
Visit Stripe Dashboard > Developers > Webhooks
Confirm: Endpoint shows https://myapp.com/api/webhooks with correct events
</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
## Anti-Patterns
### ❌ BAD: Asking human to automate
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Deploy to Vercel</action>
<instructions>
1. Visit vercel.com/new
2. Import Git repository
3. Click Deploy
4. Copy deployment URL
</instructions>
<verification>Deployment exists</verification>
<resume-signal>Paste URL</resume-signal>
</task>
```
**Why bad:** Vercel has a CLI. Claude should run `vercel --yes`.
### ✅ GOOD: Claude automates, human verifies
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes`. Capture URL.</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url}, check homepage loads</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Too many checkpoints
```xml
<task type="auto">Create schema</task>
<task type="checkpoint:human-verify">Check schema</task>
<task type="auto">Create API route</task>
<task type="checkpoint:human-verify">Check API</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">Check form</task>
```
**Why bad:** Verification fatigue. Combine into one checkpoint at end.
### ✅ GOOD: Single verification checkpoint
```xml
<task type="auto">Create schema</task>
<task type="auto">Create API route</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">
<what-built>Complete auth flow (schema + API + UI)</what-built>
<how-to-verify>Test full flow: register, login, access protected page</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Asking for automatable file operations
```xml
<task type="checkpoint:human-action">
<action>Create .env file</action>
<instructions>
1. Create .env in project root
2. Add: DATABASE_URL=...
3. Add: STRIPE_KEY=...
</instructions>
</task>
```
**Why bad:** Claude has Write tool. This should be `type="auto"`.
## Summary
Checkpoints formalize human-in-the-loop points. Use them when Claude cannot complete a task autonomously OR when human verification is required for correctness.
**The golden rule:** If Claude CAN automate it, Claude MUST automate it.
**Checkpoint priority:**
1. **checkpoint:human-verify** (90% of checkpoints) - Claude automated everything, human confirms visual/functional correctness
2. **checkpoint:decision** (9% of checkpoints) - Human makes architectural/technology choices
3. **checkpoint:human-action** (1% of checkpoints) - Truly unavoidable manual steps with no API/CLI
**See also:** references/cli-automation.md for exhaustive list of what Claude can automate.


@@ -0,0 +1,497 @@
# CLI and API Automation Reference
**Core principle:** If it has a CLI or API, Claude does it. Never ask the human to perform manual steps that Claude can automate.
This reference documents what Claude CAN and SHOULD automate during plan execution.
## Deployment Platforms
### Vercel
**CLI:** `vercel`
**What Claude automates:**
- Create and deploy projects: `vercel --yes`
- Set environment variables: `vercel env add KEY production`
- Link to git repo: `vercel link`
- Trigger deployments: `vercel --prod`
- Get deployment URLs: `vercel ls`
- Manage domains: `vercel domains add example.com`
**Never ask human to:**
- Visit vercel.com/new to create project
- Click through dashboard to add env vars
- Manually link repository
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes` to deploy. Capture deployment URL.</action>
<verify>vercel ls shows deployment, curl {url} returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url} - check homepage loads</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
### Railway
**CLI:** `railway`
**What Claude automates:**
- Initialize project: `railway init`
- Link to repo: `railway link`
- Deploy: `railway up`
- Set variables: `railway variables set KEY=value`
- Get deployment URL: `railway domain`
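**Checkpoint pattern** (a sketch mirroring the Vercel example; the `{url}` placeholder and task wording are illustrative):
```xml
<task type="auto">
<name>Deploy to Railway</name>
<action>Run `railway up` to deploy. Capture the deployment URL with `railway domain`.</action>
<verify>railway status shows the linked project, curl {url} returns 200</verify>
</task>
```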
### Fly.io
**CLI:** `fly`
**What Claude automates:**
- Launch app: `fly launch --no-deploy`
- Deploy: `fly deploy`
- Set secrets: `fly secrets set KEY=value`
- Scale: `fly scale count 2`
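**Checkpoint pattern** (a sketch; same shape as the other deployment examples, `{url}` assumed):
```xml
<task type="auto">
<name>Deploy to Fly.io</name>
<action>Run `fly launch --no-deploy` to configure the app, then `fly deploy`. Capture the app URL from output.</action>
<verify>fly status shows the app running, curl {url} returns 200</verify>
</task>
```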
## Payment & Billing
### Stripe
**CLI:** `stripe`
**What Claude automates:**
- Create webhook endpoints: Stripe API via curl/fetch
- Forward events to local dev: `stripe listen --forward-to localhost:3000/api/webhooks`
- Trigger test events: `stripe trigger payment_intent.succeeded`
- Create products/prices: Stripe API via curl/fetch
- Manage customers: Stripe API via curl/fetch
- List webhook endpoints: `stripe webhook_endpoints list`
**Never ask human to:**
- Visit dashboard.stripe.com to create webhook
- Click through UI to create products
- Manually copy webhook signing secret
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Configure Stripe webhooks</name>
<action>Use Stripe API to create webhook endpoint at /api/webhooks. Save signing secret to .env.</action>
<verify>stripe webhook_endpoints list shows endpoint, .env contains STRIPE_WEBHOOK_SECRET</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Stripe webhook configured</what-built>
<how-to-verify>Check Stripe dashboard > Developers > Webhooks shows endpoint with correct URL</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
## Databases & Backend
### Supabase
**CLI:** `supabase`
**What Claude automates:**
- Initialize project: `supabase init`
- Link to remote: `supabase link --project-ref {ref}`
- Create migrations: `supabase migration new {name}`
- Push migrations: `supabase db push`
- Generate types: `supabase gen types typescript`
- Deploy functions: `supabase functions deploy {name}`
**Never ask human to:**
- Visit supabase.com to create project manually
- Click through dashboard to run migrations
- Copy/paste connection strings
**Note:** Project creation may require web dashboard initially (no CLI for initial project creation), but all subsequent work (migrations, functions, etc.) is CLI-automated.
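**Checkpoint pattern** (a sketch; the migration name is illustrative):
```xml
<task type="auto">
<name>Create and push Supabase migration</name>
<action>Run `supabase migration new add_users_table`, write the SQL, then `supabase db push`.</action>
<verify>supabase migration list shows the migration as applied</verify>
</task>
```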
### Upstash (Redis/Kafka)
**CLI:** `upstash`
**What Claude automates:**
- Create Redis database: `upstash redis create {name} --region {region}`
- Get connection details: `upstash redis get {id}`
- Create Kafka cluster: `upstash kafka create {name} --region {region}`
**Never ask human to:**
- Visit console.upstash.com
- Click through UI to create database
- Copy/paste connection URLs manually
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<action>Run `upstash redis create myapp-cache --region us-east-1`. Save URL to .env.</action>
<verify>.env contains UPSTASH_REDIS_URL, upstash redis list shows database</verify>
</task>
```
### PlanetScale
**CLI:** `pscale`
**What Claude automates:**
- Create database: `pscale database create {name} --region {region}`
- Create branch: `pscale branch create {db} {branch}`
- Deploy request: `pscale deploy-request create {db} {branch}`
- Connection string: `pscale connect {db} {branch}`
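**Checkpoint pattern** (a sketch; database, branch, and region names are illustrative):
```xml
<task type="auto">
<name>Create PlanetScale database and branch</name>
<action>Run `pscale database create myapp --region us-east`, then `pscale branch create myapp add-users`.</action>
<verify>pscale database list shows myapp, pscale branch list myapp shows add-users</verify>
</task>
```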
## Version Control & CI/CD
### GitHub
**CLI:** `gh`
**What Claude automates:**
- Create repo: `gh repo create {name} --public/--private`
- Create issues: `gh issue create --title "{title}" --body "{body}"`
- Create PR: `gh pr create --title "{title}" --body "{body}"`
- Manage secrets: `gh secret set {KEY}`
- Trigger workflows: `gh workflow run {name}`
- Check status: `gh run list`
**Never ask human to:**
- Visit github.com to create repo
- Click through UI to add secrets
- Manually create issues/PRs
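**Checkpoint pattern** (a sketch; the repo name is illustrative):
```xml
<task type="auto">
<name>Create GitHub repository and push</name>
<action>Run `gh repo create myapp --private --source=. --push` to create the repo and push the current branch.</action>
<verify>gh repo view myapp shows the repository</verify>
</task>
```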
## Build Tools & Testing
### Node/npm/pnpm/bun
**What Claude automates:**
- Install dependencies: `npm install`, `pnpm install`, `bun install`
- Run builds: `npm run build`
- Run tests: `npm test`, `npm run test:e2e`
- Type checking: `tsc --noEmit`
**Never ask human to:** Run these commands manually
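**Pattern** (a sketch combining the commands above):
```xml
<task type="auto">
<name>Install dependencies and validate</name>
<action>Run `npm install`, then `npm run build` and `tsc --noEmit`.</action>
<verify>Install and build succeed, type check reports no errors</verify>
</task>
```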
### Xcode (macOS/iOS)
**CLI:** `xcodebuild`
**What Claude automates:**
- Build project: `xcodebuild -project App.xcodeproj -scheme App build`
- Run tests: `xcodebuild test -project App.xcodeproj -scheme App`
- Archive: `xcodebuild archive -project App.xcodeproj -scheme App`
- Check compilation: Parse xcodebuild output for errors
**Never ask human to:**
- Open Xcode and click Product > Build
- Click Product > Test manually
- Check for errors by looking at Xcode UI
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Build macOS app</name>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check output for errors.</action>
<verify>Build succeeds with "BUILD SUCCEEDED" in output</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>Open App.app and check: login flow works, no visual glitches</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
## Environment Configuration
### .env Files
**Tool:** Write tool
**What Claude automates:**
- Create .env files: Use Write tool
- Append variables: Use Edit tool
- Read current values: Use Read tool
**Never ask human to:**
- Manually create .env file
- Copy/paste values into .env
- Edit .env in text editor
**Pattern:**
```xml
<task type="auto">
<name>Configure environment variables</name>
<action>Write .env file with: DATABASE_URL, STRIPE_KEY, JWT_SECRET (generated).</action>
<verify>Read .env confirms all variables present</verify>
</task>
```
## Email & Communication
### Resend
**API:** Resend API via HTTP
**What Claude automates:**
- Create API keys: Resend API where available; otherwise a one-time manual setup step
- Send emails: Resend API
- Configure domains: Resend API
### SendGrid
**API:** SendGrid API via HTTP
**What Claude automates:**
- Create API keys via API
- Send emails: SendGrid API
- Configure webhooks: SendGrid API
**Note:** Initial account setup may require email verification (checkpoint:human-action), but all subsequent work is API-automated.
## Authentication Gates
**Critical distinction:** When Claude tries to use a CLI/API and gets an authentication error, this is NOT a failure - it's a gate that requires human input to unblock automation.
**Pattern: Claude encounters auth error → creates checkpoint → you authenticate → Claude continues**
### Example: Vercel CLI Not Authenticated
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment</verify>
</task>
<!-- If vercel returns "Error: Not authenticated" -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, Claude retries automatically -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
```
### Example: Stripe CLI Needs API Key
```xml
<task type="auto">
<name>Create Stripe webhook endpoint</name>
<action>Use Stripe API to create webhook at /api/webhooks</action>
</task>
<!-- If API returns 401 Unauthorized -->
<task type="checkpoint:human-action" gate="blocking">
<action>Provide Stripe API key so I can continue webhook configuration</action>
<instructions>
I need your Stripe API key to create webhooks.
1. Visit dashboard.stripe.com/apikeys
2. Copy your "Secret key" (starts with sk_test_ or sk_live_)
3. Paste it here or run: export STRIPE_SECRET_KEY=sk_...
</instructions>
<verification>Stripe API key works: curl test succeeds</verification>
<resume-signal>Type "done" or paste the key</resume-signal>
</task>
<!-- After key provided, Claude writes to .env and continues -->
<task type="auto">
<name>Save Stripe key and create webhook</name>
<action>
1. Write STRIPE_SECRET_KEY to .env
2. Create webhook endpoint via Stripe API
3. Save webhook secret to .env
</action>
<verify>.env contains both keys, webhook endpoint exists</verify>
</task>
```
### Example: GitHub CLI Not Logged In
```xml
<task type="auto">
<name>Create GitHub repository</name>
<action>Run `gh repo create myapp --public`</action>
</task>
<!-- If gh returns "Not logged in" -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate GitHub CLI so I can create repository</action>
<instructions>
I need GitHub authentication to create the repo.
Run: gh auth login
Follow the prompts to authenticate (browser or token).
</instructions>
<verification>gh auth status shows "Logged in"</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<task type="auto">
<name>Create repository (authenticated)</name>
<action>Run `gh repo create myapp --public`</action>
<verify>gh repo view shows repository exists</verify>
</task>
```
### Example: Upstash CLI Needs API Key
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<action>Run `upstash redis create myapp-cache --region us-east-1`</action>
</task>
<!-- If upstash returns auth error -->
<task type="checkpoint:human-action" gate="blocking">
<action>Configure Upstash CLI credentials so I can create database</action>
<instructions>
I need Upstash authentication to create Redis database.
1. Visit console.upstash.com/account/api
2. Copy your API key
3. Run: upstash auth login
4. Paste your API key when prompted
</instructions>
<verification>upstash auth status shows authenticated</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<task type="auto">
<name>Create Redis database (authenticated)</name>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL
3. Write to .env: UPSTASH_REDIS_URL={url}
</action>
<verify>upstash redis list shows database, .env contains URL</verify>
</task>
```
### Authentication Gate Protocol
**When Claude encounters authentication error during execution:**
1. **Recognize it's not a failure** - Missing auth is expected, not a bug
2. **Stop current task** - Don't retry repeatedly
3. **Create checkpoint:human-action on the fly** - Dynamic checkpoint, not pre-planned
4. **Provide exact authentication steps** - CLI commands, where to get keys
5. **Verify authentication** - Test that auth works before continuing
6. **Retry the original task** - Resume automation where it left off
7. **Continue normally** - One auth gate doesn't break the flow
**Key difference from pre-planned checkpoints:**
- Pre-planned: "I need you to do X" (wrong - Claude should automate)
- Auth gate: "I tried to automate X but need credentials to continue" (correct - unblocks automation)
**This preserves agentic flow:**
- Claude tries automation first
- Only asks for help when blocked by credentials
- Continues automating after unblocked
- You never manually deploy/create resources - just provide keys
## When checkpoint:human-action is REQUIRED
**Truly rare cases where no CLI/API exists:**
1. **Email verification links** - Account signup requires clicking verification email
2. **SMS verification codes** - 2FA requiring phone
3. **Manual account approvals** - Platform requires human review before API access
4. **Domain DNS records at registrar** - Some registrars have no API
5. **Credit card input** - Payment methods requiring 3D Secure web flow
6. **OAuth app approval** - Some platforms require web-based app approval flow
**For these rare cases:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when verified</resume-signal>
</task>
```
**Key difference:** Claude does EVERYTHING possible first (account creation, API requests), only asks human for the one thing with no automation path.
## Quick Reference: "Can Claude automate this?"
| Action | CLI/API? | Claude does it? |
|--------|----------|-----------------|
| Deploy to Vercel | ✅ `vercel` | YES |
| Create Stripe webhook | ✅ Stripe API | YES |
| Run xcodebuild | ✅ `xcodebuild` | YES |
| Write .env file | ✅ Write tool | YES |
| Create Upstash DB | ✅ `upstash` CLI | YES |
| Install npm packages | ✅ `npm` | YES |
| Create GitHub repo | ✅ `gh` | YES |
| Run tests | ✅ `npm test` | YES |
| Create Supabase project | ⚠️ Web dashboard | NO (then CLI for everything else) |
| Click email verification link | ❌ No API | NO |
| Enter credit card with 3DS | ❌ No API | NO |
**Default answer: YES.** Unless explicitly in the "NO" category, Claude automates it.
## Decision Tree
```
┌─────────────────────────────────────┐
│ Task requires external resource?    │
└──────────────┬──────────────────────┘
               ▼
┌─────────────────────────────────────┐
│ Does it have CLI/API/tool access?   │
└──────────────┬──────────────────────┘
         ┌─────┴─────┐
         │           │
         ▼           ▼
        YES          NO
         │           │
         │           ▼
         │   ┌──────────────────────────────┐
         │   │ checkpoint:human-action      │
         │   │ (email links, 2FA, etc.)     │
         │   └──────────────────────────────┘
         ▼
┌────────────────────────────────────────┐
│ task type="auto"                       │
│ Claude automates via CLI/API           │
└────────────┬───────────────────────────┘
             ▼
┌────────────────────────────────────────┐
│ checkpoint:human-verify                │
│ Human confirms visual/functional       │
└────────────────────────────────────────┘
```
## Summary
**The rule:** If Claude CAN do it, Claude MUST do it.
Checkpoints are for:
- **Verification** - Confirming Claude's automated work looks/behaves correctly
- **Decisions** - Choosing between valid approaches
- **True blockers** - Rare actions with literally no API/CLI (email links, 2FA)
Checkpoints are NOT for:
- Deploying (use CLI)
- Creating resources (use CLI/API)
- Running builds (use Bash)
- Writing files (use Write tool)
- Anything with automation available
**This keeps the agentic coding workflow intact - Claude does the work, you verify results.**


@@ -0,0 +1,138 @@
<overview>
Claude has a finite context window. This reference defines how to monitor usage and handle approaching limits gracefully.
</overview>
<context_awareness>
Claude receives system warnings showing token usage:
```
Token usage: 150000/200000; 50000 remaining
```
This information appears in `<system_warning>` tags during the conversation.
</context_awareness>
<thresholds>
<threshold level="comfortable" remaining="50%+">
**Status**: Plenty of room
**Action**: Work normally
</threshold>
<threshold level="getting_full" remaining="25%">
**Status**: Context accumulating
**Action**: Mention to user: "Context getting full. Consider wrapping up or creating handoff soon."
**No immediate action required.**
</threshold>
<threshold level="low" remaining="15%">
**Status**: Running low
**Action**:
1. Pause at next safe point (complete current atomic operation)
2. Ask user: "Running low on context (~30k tokens remaining). Options:
- Create handoff now and resume in fresh session
- Push through (risky if complex work remains)"
3. Await user decision
**Do not start new large operations.**
</threshold>
<threshold level="critical" remaining="10%">
**Status**: Must stop
**Action**:
1. Complete current atomic task (don't leave broken state)
2. **Automatically create handoff** without asking
3. Tell user: "Context limit reached. Created handoff at [location]. Start fresh session to continue."
4. **Stop working** - do not start any new tasks
This is non-negotiable. Running out of context mid-task is worse than stopping early.
</threshold>
</thresholds>
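Applied to a 200,000-token window, the thresholds map to warnings like:
```
Token usage: 150000/200000; 50000 remaining  → 25% left → getting_full
Token usage: 170000/200000; 30000 remaining  → 15% left → low
Token usage: 180000/200000; 20000 remaining  → 10% left → critical
```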
<what_counts_as_atomic>
An atomic operation is one that shouldn't be interrupted:
**Atomic (finish before stopping)**:
- Writing a single file
- Running a validation command
- Completing a single task from the plan
**Not atomic (can pause between)**:
- Multiple tasks in sequence
- Multi-file changes (can pause between files)
- Research + implementation (can pause between)
When hitting 10% threshold, finish current atomic operation, then stop.
</what_counts_as_atomic>
<handoff_content_at_limit>
When auto-creating handoff at 10%, include:
```yaml
---
phase: [current phase]
task: [current task number]
total_tasks: [total]
status: context_limit_reached
last_updated: [timestamp]
---
```
Body must capture:
1. What was just completed
2. What task was in progress (and how far)
3. What remains
4. Any decisions/context from this session
Be thorough - the next session starts fresh.
</handoff_content_at_limit>
<preventing_context_bloat>
Strategies to extend context life:
**Don't re-read files unnecessarily**
- Read once, remember content
- Don't cat the same file multiple times
**Summarize rather than quote**
- "The schema has 5 models including User and Session"
- Not: [paste entire schema]
**Use targeted reads**
- Read specific functions, not entire files
- Use grep to find relevant sections (see the sketch after this list)
**Clear completed work from "memory"**
- Once a task is done, don't keep referencing it
- Move forward, don't re-explain
**Avoid verbose output**
- Concise responses
- Don't repeat user's question back
- Don't over-explain obvious things
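For example, a targeted search (file paths and symbol names hypothetical) beats re-reading whole files:
```bash
# Locate the relevant definition without loading every file into context
grep -rn "function createUser" src/
```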
</preventing_context_bloat>
<user_signals>
Watch for user signals that suggest context concern:
- "Let's wrap up"
- "Save my place"
- "I need to step away"
- "Pack it up"
- "Create a handoff"
- "Running low on context?"
Any of these → trigger handoff workflow immediately.
</user_signals>
<fresh_session_guidance>
When user returns in fresh session:
1. They invoke skill
2. Context scan finds handoff
3. Resume workflow activates
4. Load handoff, present summary
5. Delete handoff after confirmation
6. Continue from saved state
The fresh session has full context available again.
</fresh_session_guidance>


@@ -0,0 +1,170 @@
# Domain Expertise Structure
Guide for creating domain expertise skills that work efficiently with create-plans.
## Purpose
Domain expertise provides context-specific knowledge (Swift/macOS patterns, Next.js conventions, Unity workflows) that makes plans more accurate and actionable.
**Critical:** Domain skills must be context-efficient. Loading 20k+ tokens of references defeats the purpose.
## File Structure
```
~/.claude/skills/expertise/[domain-name]/
├── SKILL.md # Core principles + references_index (5-7k tokens)
├── references/ # Selective loading based on phase type
│ ├── always-useful.md # Conventions, patterns used in all phases
│ ├── database.md # Database-specific guidance
│ ├── ui-layout.md # UI-specific guidance
│ ├── api-routes.md # API-specific guidance
│ └── ...
└── workflows/ # Optional: domain-specific workflows
└── ...
```
## SKILL.md Template
```markdown
---
name: [domain-name]
description: [What this expertise covers]
---
<principles>
## Core Principles
[Fundamental patterns that apply to ALL work in this domain]
[Should be complete enough to plan without loading references]
Examples:
- File organization patterns
- Naming conventions
- Architecture patterns
- Common gotchas to avoid
- Framework-specific requirements
**Keep this section comprehensive but concise (~3-5k tokens).**
</principles>
<references_index>
## Reference Loading Guide
When planning phases, load references based on phase type:
**For [phase-type-1] phases:**
- references/[file1].md - [What it contains]
- references/[file2].md - [What it contains]
**For [phase-type-2] phases:**
- references/[file3].md - [What it contains]
- references/[file4].md - [What it contains]
**Always useful (load for any phase):**
- references/conventions.md - [What it contains]
- references/common-patterns.md - [What it contains]
**Examples of phase type mapping:**
- Database/persistence phases → database.md, migrations.md
- UI/layout phases → ui-patterns.md, design-system.md
- API/backend phases → api-routes.md, auth.md
- Integration phases → system-apis.md, third-party.md
</references_index>
<workflows>
## Optional Workflows
[If domain has specific workflows, list them here]
[These are NOT auto-loaded - only used when specifically invoked]
</workflows>
```
## Reference File Guidelines
Each reference file should be:
**1. Focused** - Single concern (database patterns, UI layout, API design)
**2. Actionable** - Contains patterns Claude can directly apply
```markdown
# Database Patterns
## Table Naming
- Singular nouns (User, not Users)
- snake_case for SQL, PascalCase for models
## Common Patterns
- Soft deletes: deleted_at timestamp
- Audit columns: created_at, updated_at
- Foreign keys: [table]_id format
```
**3. Sized appropriately** - 500-2000 lines (~1-5k tokens)
- Too small: Not worth separate file
- Too large: Split into more focused files
**4. Self-contained** - Can be understood without reading other references
## Context Efficiency Examples
**Bad (old approach):**
```
Load all references: 10,728 lines = ~27k tokens
Result: 50% context before planning starts
```
**Good (new approach):**
```
Load SKILL.md: ~5k tokens
Planning UI phase → load ui-layout.md + conventions.md: ~7k tokens
Total: ~12k tokens (saves 15k for workspace)
```
## Phase Type Classification
Help create-plans determine which references to load:
**Common phase types:**
- **Foundation/Setup** - Project structure, dependencies, configuration
- **Database/Data** - Schema, models, migrations, queries
- **API/Backend** - Routes, controllers, business logic, auth
- **UI/Frontend** - Components, layouts, styling, interactions
- **Integration** - External APIs, system services, third-party SDKs
- **Features** - Domain-specific functionality
- **Polish** - Performance, accessibility, error handling
**References should map to these types** so create-plans can load the right context.
## Migration Guide
If you have an existing domain skill with many references:
1. **Audit references** - What's actually useful vs. reference dumps?
2. **Consolidate principles** - Move core patterns into SKILL.md principles section
3. **Create references_index** - Map phase types to relevant references
4. **Test loading** - Verify you can plan a phase with <15k token overhead
5. **Iterate** - Adjust groupings based on actual planning needs
## Example: macos-apps
**Before (inefficient):**
- 20 reference files
- Load all: 10,728 lines (~27k tokens)
**After (efficient):**
SKILL.md contains:
- Swift/SwiftUI core principles
- macOS app architecture patterns
- Common patterns (MVVM, data flow)
- references_index mapping:
- UI phases → swiftui-layout.md, appleHIG.md (~4k)
- Data phases → core-data.md, swift-concurrency.md (~5k)
- System phases → appkit-integration.md, menu-bar.md (~3k)
- Always → swift-conventions.md (~2k)
**Result:** 5-12k tokens instead of 27k (saves 15-22k for planning)


@@ -0,0 +1,106 @@
# Git Integration Reference
## Core Principle
**Commit outcomes, not process.**
The git log should read like a changelog of what shipped, not a diary of planning activity.
## Commit Points (Only 3)
| Event | Commit? | Why |
|-------|---------|-----|
| BRIEF + ROADMAP created | YES | Project initialization |
| PLAN.md created | NO | Intermediate - commit with completion |
| RESEARCH.md created | NO | Intermediate |
| FINDINGS.md created | NO | Intermediate |
| **Phase completed** | YES | Actual code shipped |
| Handoff created | YES | WIP state preserved |
## Git Check on Invocation
```bash
git rev-parse --git-dir 2>/dev/null || echo "NO_GIT_REPO"
```
If NO_GIT_REPO:
- Inline: "No git repo found. Initialize one? (Recommended for version control)"
- If yes: `git init`
## Commit Message Formats
### 1. Project Initialization (brief + roadmap together)
```
docs: initialize [project-name] ([N] phases)
[One-liner from BRIEF.md]
Phases:
1. [phase-name]: [goal]
2. [phase-name]: [goal]
3. [phase-name]: [goal]
```
What to commit:
```bash
git add .planning/
git commit
```
### 2. Phase Completion
```
feat([domain]): [one-liner from SUMMARY.md]
- [Key accomplishment 1]
- [Key accomplishment 2]
- [Key accomplishment 3]
[If issues encountered:]
Note: [issue and resolution]
```
Use `fix([domain])` for bug fix phases.
What to commit:
```bash
git add .planning/phases/XX-name/ # PLAN.md + SUMMARY.md
git add src/ # Actual code created
git commit
```
### 3. Handoff (WIP)
```
wip: [phase-name] paused at task [X]/[Y]
Current: [task name]
[If blocked:] Blocked: [reason]
```
What to commit:
```bash
git add .planning/
git commit
```
## Example Clean Git Log
```
a7f2d1 feat(checkout): Stripe payments with webhook verification
b3e9c4 feat(products): catalog with search, filters, and pagination
c8a1b2 feat(auth): JWT with refresh rotation using jose
d5c3d7 feat(foundation): Next.js 15 + Prisma + Tailwind scaffold
e2f4a8 docs: initialize ecommerce-app (5 phases)
```
## What NOT To Commit Separately
- PLAN.md creation (wait for phase completion)
- RESEARCH.md (intermediate)
- FINDINGS.md (intermediate)
- Minor planning tweaks
- "Fixed typo in roadmap"
These create noise. Commit outcomes, not process.


@@ -0,0 +1,142 @@
<overview>
The planning hierarchy ensures context flows down and progress flows up.
Each level builds on the previous and enables the next.
</overview>
<hierarchy>
```
BRIEF.md ← Vision (human-focused)
ROADMAP.md ← Structure (phases)
phases/XX/PLAN.md ← Implementation (Claude-executable)
prompts/ ← Execution (via create-meta-prompts)
```
</hierarchy>
<level name="brief">
**Purpose**: Capture vision, goals, constraints
**Audience**: Human (the user)
**Contains**: What we're building, why, success criteria, out of scope
**Creates**: `.planning/BRIEF.md`
**Requires**: Nothing (can start here)
**Enables**: Roadmap creation
This is the ONLY document optimized for human reading.
</level>
<level name="roadmap">
**Purpose**: Define phases and sequence
**Audience**: Both human and Claude
**Contains**: Phase names, goals, dependencies, progress tracking
**Creates**: `.planning/ROADMAP.md`, `.planning/phases/` directories
**Requires**: Brief (or quick context if skipping)
**Enables**: Phase planning
Roadmap looks UP to Brief for scope, looks DOWN to track phase completion.
</level>
<level name="phase_plan">
**Purpose**: Define Claude-executable tasks
**Audience**: Claude (the implementer)
**Contains**: Tasks with Files/Action/Verification/Done-when
**Creates**: `.planning/phases/XX-name/PLAN.md`
**Requires**: Roadmap (to know phase scope)
**Enables**: Prompt generation, direct execution
Phase plan looks UP to Roadmap for scope, produces implementation details.
</level>
<level name="prompts">
**Purpose**: Optimized execution instructions
**Audience**: Claude (via create-meta-prompts)
**Contains**: Research/Plan/Do prompts with metadata
**Creates**: `.planning/phases/XX-name/prompts/`
**Requires**: Phase plan (tasks to execute)
**Enables**: Autonomous execution
Prompts are generated from phase plan via create-meta-prompts skill.
</level>
<navigation_rules>
<looking_up>
When creating a lower-level artifact, ALWAYS read higher levels for context:
- Creating Roadmap → Read Brief
- Planning Phase → Read Roadmap AND Brief
- Generating Prompts → Read Phase Plan AND Roadmap
This ensures alignment with overall vision.
</looking_up>
<looking_down>
When updating a higher-level artifact, check lower levels for status:
- Updating Roadmap progress → Check which phase PLANs exist, completion state
- Reviewing Brief → See how far we've come via Roadmap
This enables progress tracking.
</looking_down>
<missing_prerequisites>
If a prerequisite doesn't exist:
```
Creating phase plan but no roadmap exists.
Options:
1. Create roadmap first (recommended)
2. Create quick roadmap placeholder
3. Proceed anyway (not recommended - loses hierarchy benefits)
```
Always offer to create missing pieces rather than skipping.
</missing_prerequisites>
</navigation_rules>
<file_locations>
All planning artifacts in `.planning/`:
```
.planning/
├── BRIEF.md # One per project
├── ROADMAP.md # One per project
└── phases/
├── 01-phase-name/
│ ├── PLAN.md # One per phase
│ ├── .continue-here.md # Temporary (when paused)
│ └── prompts/ # Generated execution prompts
├── 02-phase-name/
│ ├── PLAN.md
│ └── prompts/
└── ...
```
Phase directories use `XX-kebab-case` for consistent ordering.
</file_locations>
<scope_inheritance>
Each level inherits and narrows scope:
**Brief**: "Build a task management app"
**Roadmap**: "Phase 1: Core task CRUD, Phase 2: Projects, Phase 3: Collaboration"
**Phase 1 Plan**: "Task 1: Database schema, Task 2: API endpoints, Task 3: UI"
Scope flows DOWN and gets more specific.
Progress flows UP and gets aggregated.
</scope_inheritance>
<cross_phase_context>
When planning Phase N, Claude should understand:
- What Phase N-1 delivered (completed work)
- What Phase N should build on (foundations)
- What Phase N+1 will need (don't paint into corner)
Read previous phase's PLAN.md to understand current state.
</cross_phase_context>


@@ -0,0 +1,495 @@
# Milestone Management & Greenfield/Brownfield Planning
Milestones mark shipped versions. They solve the "what happens after v1.0?" problem.
## The Core Problem
**After shipping v1.0:**
- Planning artifacts optimized for greenfield (starting from scratch)
- But now you have: existing code, users, constraints, shipped features
- Need brownfield awareness without losing planning structure
**Solution:** Milestone-bounded extensions with updated BRIEF.
## Three Planning Modes
### 1. Greenfield (v1.0 Initial Development)
**Characteristics:**
- No existing code
- No users
- No constraints from shipped versions
- Pure "build from scratch" mode
**Planning structure:**
```
.planning/
├── BRIEF.md # Original vision
├── ROADMAP.md # Phases 1-4
└── phases/
├── 01-foundation/
├── 02-features/
├── 03-polish/
└── 04-launch/
```
**BRIEF.md looks like:**
```markdown
# Project Brief: AppName
**Vision:** Build a thing that does X
**Purpose:** Solve problem Y
**Scope:**
- Feature A
- Feature B
- Feature C
**Success:** Ships and works
```
**Workflow:** Normal planning → execution → transition flow
---
### 2. Brownfield Extensions (v1.1, v1.2 - Same Codebase)
**Characteristics:**
- v1.0 shipped and in use
- Adding features / fixing issues
- Same codebase, continuous evolution
- Existing code referenced in new plans
**Planning structure:**
```
.planning/
├── BRIEF.md # Updated with "Current State"
├── ROADMAP.md # Phases 1-6 (grouped by milestone)
├── MILESTONES.md # v1.0 entry
└── phases/
├── 01-foundation/ # ✓ v1.0
├── 02-features/ # ✓ v1.0
├── 03-polish/ # ✓ v1.0
├── 04-launch/ # ✓ v1.0
├── 05-security/ # 🚧 v1.1 (in progress)
└── 06-performance/ # 📋 v1.1 (planned)
```
**BRIEF.md updated:**
```markdown
# Project Brief: AppName
## Current State (Updated: 2025-12-01)
**Shipped:** v1.0 MVP (2025-11-25)
**Users:** 500 downloads, 50 daily actives
**Feedback:** Requesting dark mode, occasional crashes on network errors
**Codebase:** 2,450 lines Swift, macOS 13.0+, AppKit
## v1.1 Goals
**Vision:** Harden reliability and add dark mode based on user feedback
**Motivation:**
- 5 crash reports related to network errors
- 15 users requested dark mode
- Want to improve before marketing push
**Scope (v1.1):**
- Comprehensive error handling
- Dark mode support
- Crash reporting integration
---
<details>
<summary>Original Vision (v1.0 - Archived)</summary>
[Original brief content]
</details>
```
**ROADMAP.md updated:**
```markdown
# Roadmap: AppName
## Milestones
- ✅ **v1.0 MVP** - Phases 1-4 (shipped 2025-11-25)
- 🚧 **v1.1 Hardening** - Phases 5-6 (in progress)
## Phases
<details>
<summary>✅ v1.0 MVP (Phases 1-4) - SHIPPED 2025-11-25</summary>
- [x] Phase 1: Foundation
- [x] Phase 2: Core Features
- [x] Phase 3: Polish
- [x] Phase 4: Launch
</details>
### 🚧 v1.1 Hardening (In Progress)
- [ ] Phase 5: Error Handling & Stability
- [ ] Phase 6: Dark Mode UI
```
**How plans become brownfield-aware:**
When planning Phase 5, the PLAN.md automatically gets context:
```markdown
<context>
@.planning/BRIEF.md # Knows: v1.0 shipped, codebase exists
@.planning/MILESTONES.md # Knows: what v1.0 delivered
@AppName/NetworkManager.swift # Existing code to improve
@AppName/APIClient.swift # Existing code to fix
</context>
<tasks>
<task type="auto">
<name>Add comprehensive error handling to NetworkManager</name>
<files>AppName/NetworkManager.swift</files>
<action>Existing NetworkManager has basic try/catch. Add: retry logic (3 attempts with exponential backoff), specific error types (NetworkError enum), user-friendly error messages. Maintain existing public API - internal improvements only.</action>
<verify>Build succeeds, existing tests pass, new error tests pass</verify>
<done>All network calls have retry logic, error messages are user-friendly</done>
</task>
```
**Key difference from greenfield:**
- PLAN references existing files in `<context>`
- Tasks say "update existing X" not "create X"
- Verify includes "existing tests pass" (regression check)
- Checkpoints may verify existing behavior still works
---
### 3. Major Iterations (v2.0+ - Still Same Codebase)
**Characteristics:**
- Large rewrites within same codebase
- 8-15+ phases planned
- Breaking changes, new architecture
- Still continuous from v1.x
**Planning structure:**
```
.planning/
├── BRIEF.md # Updated for v2.0 vision
├── ROADMAP.md # Phases 1-14 (grouped)
├── MILESTONES.md # v1.0, v1.1 entries
└── phases/
├── 01-foundation/ # ✓ v1.0
├── 02-features/ # ✓ v1.0
├── 03-polish/ # ✓ v1.0
├── 04-launch/ # ✓ v1.0
├── 05-security/ # ✓ v1.1
├── 06-performance/ # ✓ v1.1
├── 07-swiftui-core/ # 🚧 v2.0 (in progress)
├── 08-swiftui-views/ # 📋 v2.0 (planned)
├── 09-new-arch/ # 📋 v2.0
└── ... # Up to 14
```
**ROADMAP.md:**
```markdown
## Milestones
- ✅ **v1.0 MVP** - Phases 1-4 (shipped 2025-11-25)
- ✅ **v1.1 Hardening** - Phases 5-6 (shipped 2025-12-10)
- 🚧 **v2.0 SwiftUI Redesign** - Phases 7-14 (in progress)
## Phases
<details>
<summary>✅ v1.0 MVP (Phases 1-4)</summary>
[Collapsed]
</details>
<details>
<summary>✅ v1.1 Hardening (Phases 5-6)</summary>
[Collapsed]
</details>
### 🚧 v2.0 SwiftUI Redesign (In Progress)
- [ ] Phase 7: SwiftUI Core Migration
- [ ] Phase 8: SwiftUI Views
- [ ] Phase 9: New Architecture
- [ ] Phase 10: Widget Support
- [ ] Phase 11: iOS Companion
- [ ] Phase 12: Performance
- [ ] Phase 13: Testing
- [ ] Phase 14: Launch
```
**Same rules apply:** Continuous phase numbering, milestone groupings, brownfield-aware plans.
---
## When to Archive and Start Fresh
**Archive ONLY for these scenarios:**
### Scenario 1: Separate Codebase
**Example:**
- Built: WeatherBar (macOS app) ✓ shipped
- Now building: WeatherBar-iOS (separate Xcode project, different repo or workspace)
**Action:**
```
.planning/
├── archive/
│ └── v1-macos/
│ ├── BRIEF.md
│ ├── ROADMAP.md
│ ├── MILESTONES.md
│ └── phases/
├── BRIEF.md # Fresh: iOS app
├── ROADMAP.md # Fresh: starts at phase 01
└── phases/
└── 01-ios-foundation/
```
**Why:** Different codebase = different planning context. Old planning doesn't help with iOS-specific decisions.
### Scenario 2: Complete Rewrite (Different Repo)
**Example:**
- Built: AppName v1 (AppKit, shipped) ✓
- Now building: AppName v2 (complete SwiftUI rewrite, new git repo)
**Action:** Same as Scenario 1 - archive v1, fresh planning for v2
**Why:** New repo, starting from scratch, v1 planning doesn't transfer.
### Scenario 3: Different Product
**Example:**
- Built: WeatherBar (weather app) ✓
- Now building: TaskBar (task management app)
**Action:** New project entirely, new `.planning/` directory
**Why:** Completely different product, no relationship.
---
## Decision Tree
```
Starting new work?
│
├─ Same codebase/repo?
│  │
│  ├─ YES → Extend existing roadmap
│  │        ├─ Add phases 5-6+ to ROADMAP
│  │        ├─ Update BRIEF "Current State"
│  │        ├─ Plans reference existing code in @context
│  │        └─ Continue normal workflow
│  │
│  └─ NO → Is it a separate platform/codebase for same product?
│     │
│     ├─ YES (e.g., iOS version of Mac app)
│     │  └─ Archive existing planning
│     │     └─ Start fresh with new BRIEF/ROADMAP
│     │        └─ Reference original in "Context" section
│     │
│     └─ NO (completely different product)
│        └─ New project, new planning directory
│
└─ Is this v1.0 initial delivery?
   └─ YES → Greenfield mode
      └─ Just follow normal workflow
```
---
## Milestone Workflow Triggers
### When completing v1.0 (first ship):
**User:** "I'm ready to ship v1.0"
**Action:**
1. Verify phases 1-4 complete (all summaries exist)
2. `/milestone:complete "v1.0 MVP"`
3. Creates MILESTONES.md entry (example below)
4. Updates BRIEF with "Current State"
5. Reorganizes ROADMAP with milestone grouping
6. Git tag v1.0
7. Commit milestone changes
**Result:** Historical record created, ready for v1.1 work
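For reference, the MILESTONES.md entry from step 3 might look like this (structure illustrative; entries are prepended, newest first):
```markdown
## v1.0 MVP (Shipped: 2025-11-25)

**Phases:** 1-4 (foundation, features, polish, launch)
**Delivered:** Menu bar weather app - current conditions, 30-min
auto-refresh, signed and notarized build
**Tag:** v1.0
```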
### When adding v1.1 work:
**User:** "Add dark mode and notifications"
**Action:**
1. Check BRIEF "Current State" - sees v1.0 shipped
2. Ask: "Add phases 5-6 to existing roadmap? (yes / archive and start fresh)"
3. User: "yes"
4. Update BRIEF with v1.1 goals
5. Add Phase 5-6 to ROADMAP under "v1.1" milestone heading
6. Continue normal planning workflow
**Result:** Phases 5-6 added, brownfield-aware through updated BRIEF
### When completing v1.1:
**User:** "Ship v1.1"
**Action:**
1. Verify phases 5-6 complete
2. `/milestone:complete "v1.1 Dark Mode"`
3. Add v1.1 entry to MILESTONES.md (prepended, newest first)
4. Update BRIEF current state to v1.1
5. Collapse phases 5-6 in ROADMAP
6. Git tag v1.1
**Result:** v1.0 and v1.1 both in MILESTONES.md, ROADMAP shows history
---
## Brownfield Plan Patterns
**How a brownfield plan differs from greenfield:**
### Greenfield Plan (v1.0):
```markdown
<objective>
Create authentication system from scratch.
</objective>
<context>
@.planning/BRIEF.md
@.planning/ROADMAP.md
</context>
<tasks>
<task type="auto">
<name>Create User model</name>
<files>src/models/User.ts</files>
<action>Create User interface with id, email, passwordHash, createdAt fields. Export from models/index.</action>
<verify>TypeScript compiles, User type exported</verify>
<done>User model exists and is importable</done>
</task>
```
### Brownfield Plan (v1.1):
```markdown
<objective>
Add MFA to existing authentication system.
</objective>
<context>
@.planning/BRIEF.md # Shows v1.0 shipped, auth exists
@.planning/MILESTONES.md # Shows what v1.0 delivered
@src/models/User.ts # Existing User model
@src/auth/AuthService.ts # Existing auth logic
</context>
<tasks>
<task type="auto">
<name>Add MFA fields to User model</name>
<files>src/models/User.ts</files>
<action>Add to existing User interface: mfaEnabled (boolean), mfaSecret (string | null), mfaBackupCodes (string[]). Maintain backward compatibility - all new fields optional or have defaults.</action>
<verify>TypeScript compiles, existing User usages still work</verify>
<done>User model has MFA fields, no breaking changes</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>MFA enrollment flow</what-built>
<how-to-verify>
1. Run: npm run dev
2. Login as existing user (test@example.com)
3. Navigate to Settings → Security
4. Click "Enable MFA" - should show QR code
5. Scan with authenticator app (Google Authenticator)
6. Enter code - should enable successfully
7. Logout, login again - should prompt for MFA code
8. Verify: existing users without MFA can still login (backward compat)
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
**Key differences:**
1. **@context** includes existing code files
2. **Actions** say "add to existing" / "update existing" / "maintain backward compat"
3. **Verification** includes regression checks ("existing X still works")
4. **Checkpoints** may verify existing user flows still work
---
## BRIEF Current State Section
The "Current State" section in BRIEF.md is what makes plans brownfield-aware.
**After v1.0 ships:**
```markdown
## Current State (Updated: 2025-11-25)
**Shipped:** v1.0 MVP (2025-11-25)
**Status:** Production
**Users:** 500 downloads, 50 daily actives, growing 10% weekly
**Feedback:**
- "Love the simplicity" (common theme)
- 15 requests for dark mode
- 5 crash reports on network errors
- 3 requests for multiple accounts
**Codebase:**
- 2,450 lines of Swift
- macOS 13.0+ (AppKit)
- OpenWeather API integration
- Auto-refresh every 30 min
- Signed and notarized
**Known Issues:**
- Network errors crash app (no retry logic)
- Memory leak in auto-refresh timer
- No dark mode support
```
When planning Phase 5 (v1.1), Claude reads this and knows:
- Code exists (2,450 lines Swift)
- Users exist (500 downloads)
- Feedback exists (15 want dark mode)
- Issues exist (network crashes, memory leak)
Plans automatically become brownfield-aware because BRIEF says "this is what we have."
---
## Summary
**Greenfield (v1.0):**
- Fresh BRIEF with vision
- Phases 1-4 (or however many)
- Plans create from scratch
- Ship → complete milestone
**Brownfield (v1.1+):**
- Update BRIEF "Current State"
- Add phases 5-6+ to ROADMAP
- Plans reference existing code
- Plans include regression checks
- Ship → complete milestone
**Archive (rare):**
- Only for separate codebases or different products
- Move `.planning/` to `.planning/archive/v1-name/`
- Start fresh with new BRIEF/ROADMAP
- New planning references old in context
**Key insight:** Same roadmap, continuous phase numbering (01-99), milestone groupings keep it organized. BRIEF "Current State" makes everything brownfield-aware automatically.
This scales from "hello world" to 100 shipped versions.

View File

@@ -0,0 +1,377 @@
<overview>
Claude-executable plans have a specific format that enables Claude to implement without interpretation. This reference defines what makes a plan executable vs. vague.
**Key insight:** PLAN.md IS the executable prompt. It contains everything Claude needs to execute the phase, including objective, context references, tasks, verification, success criteria, and output specification.
</overview>
<core_principle>
A plan is Claude-executable when Claude can read the PLAN.md and immediately start implementing without asking clarifying questions.
If Claude has to guess, interpret, or make assumptions - the task is too vague.
</core_principle>
<prompt_structure>
Every PLAN.md follows this XML structure:
```markdown
---
phase: XX-name
type: execute
domain: [optional]
---
<objective>
[What and why]
Purpose: [...]
Output: [...]
</objective>
<context>
@.planning/BRIEF.md
@.planning/ROADMAP.md
@relevant/source/files.ts
</context>
<tasks>
<task type="auto">
<name>Task N: [Name]</name>
<files>[paths]</files>
<action>[what to do, what to avoid and WHY]</action>
<verify>[command/check]</verify>
<done>[criteria]</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[what Claude automated]</what-built>
<how-to-verify>[numbered verification steps]</how-to-verify>
<resume-signal>[how to continue - "approved" or describe issues]</resume-signal>
</task>
<task type="checkpoint:decision" gate="blocking">
<decision>[what needs deciding]</decision>
<context>[why this matters]</context>
<options>
<option id="option-a"><name>[Name]</name><pros>[pros]</pros><cons>[cons]</cons></option>
<option id="option-b"><name>[Name]</name><pros>[pros]</pros><cons>[cons]</cons></option>
</options>
<resume-signal>[how to indicate choice]</resume-signal>
</task>
</tasks>
<verification>
[Overall phase checks]
</verification>
<success_criteria>
[Measurable completion]
</success_criteria>
<output>
[SUMMARY.md specification]
</output>
```
</prompt_structure>
<task_anatomy>
Every task has four required fields:
<field name="files">
**What it is**: Exact file paths that will be created or modified.
**Good**: `src/app/api/auth/login/route.ts`, `prisma/schema.prisma`
**Bad**: "the auth files", "relevant components"
Be specific. If you don't know the file path, figure it out first.
</field>
<field name="action">
**What it is**: Specific implementation instructions, including what to avoid and WHY.
**Good**: "Create POST endpoint that accepts {email, password}, validates using bcrypt against User table, returns JWT in httpOnly cookie with 15-min expiry. Use jose library (not jsonwebtoken - CommonJS issues with Next.js Edge runtime)."
**Bad**: "Add authentication", "Make login work"
Include: technology choices, data structures, behavior details, pitfalls to avoid.
</field>
<field name="verify">
**What it is**: How to prove the task is complete.
**Good**:
- `npm test` passes
- `curl -X POST /api/auth/login` returns 200 with Set-Cookie header
- Build completes without errors
**Bad**: "It works", "Looks good", "User can log in"
Must be executable - a command, a test, an observable behavior.
</field>
<field name="done">
**What it is**: Acceptance criteria - the measurable state of completion.
**Good**: "Valid credentials return 200 + JWT cookie, invalid credentials return 401"
**Bad**: "Authentication is complete"
Should be testable without subjective judgment.
</field>
</task_anatomy>
<task_types>
Tasks have a `type` attribute that determines how they execute:
<type name="auto">
**Default task type** - Claude executes autonomously.
**Structure:**
```xml
<task type="auto">
<name>Task 3: Create login endpoint with JWT</name>
<files>src/app/api/auth/login/route.ts</files>
<action>POST endpoint accepting {email, password}. Query User by email, compare password with bcrypt. On match, create JWT with jose library, set as httpOnly cookie (15-min expiry). Return 200. On mismatch, return 401.</action>
<verify>curl -X POST localhost:3000/api/auth/login returns 200 with Set-Cookie header</verify>
<done>Valid credentials → 200 + cookie. Invalid → 401.</done>
</task>
```
Use for: Everything Claude can do independently (code, tests, builds, file operations).
</type>
<type name="checkpoint:human-action">
**RARELY USED** - Only for actions with NO CLI/API. Claude automates everything possible first.
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[Unavoidable manual step - email link, 2FA code]</action>
<instructions>
[What Claude already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What Claude can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
Use ONLY for: Email verification links, SMS 2FA codes, manual approvals with no API, 3D Secure payment flows.
Do NOT use for: Anything with a CLI (Vercel, Stripe, Upstash, Railway, GitHub), builds, tests, file creation, deployments.
See: references/cli-automation.md for what Claude can automate.
**Execution:** Claude automates everything with CLI/API, stops only for truly unavoidable manual steps.
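A concrete instance (illustrative - the service and verification call are examples, not a prescribed integration):
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Click the sender verification link emailed to admin@example.com</action>
<instructions>
Claude already created the sender identity via the SendGrid API.
The ONE remaining step: SendGrid requires clicking the link in the
verification email - no API exists for this.
</instructions>
<verification>Query the SendGrid API and confirm sender status is "verified"</verification>
<resume-signal>Type "done" after clicking the link</resume-signal>
</task>
```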
</type>
<type name="checkpoint:human-verify">
**Human must verify Claude's work** - Visual checks, UX testing.
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Desktop (>1024px): Verify sidebar left, content right
4. Tablet (768px): Verify sidebar collapses to hamburger
5. Mobile (375px): Verify single column, bottom nav
6. Check: No layout shift, no horizontal scroll
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
Use for: UI/UX verification, visual design checks, animation smoothness, accessibility testing.
**Execution:** Claude builds the feature, stops, provides testing instructions, waits for approval/feedback.
</type>
<type name="checkpoint:decision">
**Human must make implementation choice** - Direction-setting decisions.
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>We need user authentication. Three approaches with different tradeoffs:</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase, generous free tier</pros>
<cons>Less customizable UI, tied to ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best DX</pros>
<cons>Paid after 10k MAU</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control</pros>
<cons>More setup, you manage security</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
Use for: Technology selection, architecture decisions, design choices, feature prioritization.
**Execution:** Claude presents options with balanced pros/cons, waits for decision, proceeds with chosen direction.
</type>
**When to use checkpoints:**
- Visual/UX verification (after Claude builds) → `checkpoint:human-verify`
- Implementation direction choice → `checkpoint:decision`
- Truly unavoidable manual actions (email links, 2FA) → `checkpoint:human-action` (rare)
**When NOT to use checkpoints:**
- Anything with CLI/API (Claude automates it) → `type="auto"`
- Deployments (Vercel, Railway, Fly) → `type="auto"` with CLI
- Creating resources (Upstash, Stripe, GitHub) → `type="auto"` with CLI/API
- File operations, tests, builds → `type="auto"`
**Golden rule:** If Claude CAN automate it, Claude MUST automate it. See: references/cli-automation.md
See `references/checkpoints.md` for comprehensive checkpoint guidance.
</task_types>
<context_references>
Use @file references to load context for the prompt:
```markdown
<context>
@.planning/BRIEF.md # Project vision
@.planning/ROADMAP.md # Phase structure
@.planning/phases/02-auth/FINDINGS.md # Research results
@src/lib/db.ts # Existing database setup
@src/types/user.ts # Existing type definitions
</context>
```
Reference files that Claude needs to understand before implementing.
</context_references>
<verification_section>
Overall phase verification (beyond individual task verification):
```markdown
<verification>
Before declaring phase complete:
- [ ] `npm run build` succeeds without errors
- [ ] `npm test` passes all tests
- [ ] No TypeScript errors
- [ ] Feature works end-to-end manually
</verification>
```
</verification_section>
<success_criteria_section>
Measurable criteria for phase completion:
```markdown
<success_criteria>
- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- JWT auth flow works end-to-end
- Protected routes redirect unauthenticated users
</success_criteria>
```
</success_criteria_section>
<output_section>
Specify the SUMMARY.md structure:
```markdown
<output>
After completion, create `.planning/phases/XX-name/SUMMARY.md`:
# Phase X: Name Summary
**[Substantive one-liner]**
## Accomplishments
## Files Created/Modified
## Decisions Made
## Issues Encountered
## Next Phase Readiness
</output>
```
</output_section>
<specificity_levels>
<too_vague>
```xml
<task type="auto">
<name>Task 1: Add authentication</name>
<files>???</files>
<action>Implement auth</action>
<verify>???</verify>
<done>Users can authenticate</done>
</task>
```
Claude: "How? What type? What library? Where?"
</too_vague>
<just_right>
```xml
<task type="auto">
<name>Task 1: Create login endpoint with JWT</name>
<files>src/app/api/auth/login/route.ts</files>
<action>POST endpoint accepting {email, password}. Query User by email, compare password with bcrypt. On match, create JWT with jose library, set as httpOnly cookie (15-min expiry). Return 200. On mismatch, return 401. Use jose instead of jsonwebtoken (CommonJS issues with Edge).</action>
<verify>curl -X POST localhost:3000/api/auth/login -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"test123"}' returns 200 with Set-Cookie header containing JWT</verify>
<done>Valid credentials → 200 + cookie. Invalid → 401. Missing fields → 400.</done>
</task>
```
Claude can implement this immediately.
</just_right>
<too_detailed>
Writing the actual code in the plan. Trust Claude to implement from clear instructions.
</too_detailed>
</specificity_levels>
<anti_patterns>
<vague_actions>
- "Set up the infrastructure"
- "Handle edge cases"
- "Make it production-ready"
- "Add proper error handling"
These require Claude to decide WHAT to do. Specify it.
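One illustrative rewrite, turning a vague action into a specific one (the retry policy and UX details are examples):
```xml
<!-- Vague -->
<action>Add proper error handling</action>

<!-- Specific -->
<action>Wrap the weather API fetch in try/catch. On network failure, retry twice with exponential backoff (1s, 2s). If still failing, render the last cached data with a "stale data" banner instead of crashing. Log the failure; never throw to the UI.</action>
```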
</vague_actions>
<unverifiable_completion>
- "It works correctly"
- "User experience is good"
- "Code is clean"
- "Tests pass" (which tests? do they exist?)
These require subjective judgment. Make it objective.
</unverifiable_completion>
<missing_context>
- "Use the standard approach"
- "Follow best practices"
- "Like the other endpoints"
Claude doesn't know your standards. Be explicit.
</missing_context>
</anti_patterns>
<sizing_tasks>
Good task size: 15-60 minutes of Claude work.
**Too small**: "Add import statement for bcrypt" (combine with related task)
**Just right**: "Create login endpoint with JWT validation" (focused, specific)
**Too big**: "Implement full authentication system" (split into multiple plans)
If a task takes multiple sessions, break it down.
If a task is trivial, combine with related tasks.
**Note on scope:** If a phase has more than 3 tasks or spans multiple subsystems, split it into multiple plans using the naming convention `{phase}-{plan}-PLAN.md`. See `references/scope-estimation.md` for guidance.
</sizing_tasks>

View File

@@ -0,0 +1,198 @@
# Research Pitfalls - Known Patterns to Avoid
## Purpose
This document catalogs research mistakes discovered in production use, providing specific patterns to avoid and verification strategies to prevent recurrence.
## Known Pitfalls
### Pitfall 1: Configuration Scope Assumptions
**What**: Assuming global configuration means no project-scoping exists
**Example**: Concluding "MCP servers are configured GLOBALLY only" while missing project-scoped `.mcp.json`
**Why it happens**: Not explicitly checking all known configuration patterns
**Prevention**:
```xml
<verification_checklist>
**CRITICAL**: Verify ALL configuration scopes:
□ User/global scope - System-wide configuration
□ Project scope - Project-level configuration files
□ Local scope - Project-specific user overrides
□ Workspace scope - IDE/tool workspace settings
□ Environment scope - Environment variables
</verification_checklist>
```
### Pitfall 2: "Search for X" Vagueness
**What**: Asking researchers to "search for documentation" without specifying where
**Example**: "Research MCP documentation" → finds outdated community blog instead of official docs
**Why it happens**: Vague research instructions don't specify exact sources
**Prevention**:
```xml
<sources>
Official sources (use WebFetch):
- https://exact-url-to-official-docs
- https://exact-url-to-api-reference
Search queries (use WebSearch):
- "specific search query {current_year}"
- "another specific query {current_year}"
</sources>
```
### Pitfall 3: Deprecated vs Current Features
**What**: Finding archived/old documentation and concluding feature doesn't exist
**Example**: Finding 2022 docs saying "feature not supported" when current version added it
**Why it happens**: Not checking multiple sources or recent updates
**Prevention**:
```xml
<verification_checklist>
□ Check current official documentation
□ Review changelog/release notes for recent updates
□ Verify version numbers and publication dates
□ Cross-reference multiple authoritative sources
</verification_checklist>
```
### Pitfall 4: Tool-Specific Variations
**What**: Conflating capabilities across different tools/environments
**Example**: "Claude Desktop supports X" ≠ "Claude Code supports X"
**Why it happens**: Not explicitly checking each environment separately
**Prevention**:
```xml
<verification_checklist>
□ Claude Desktop capabilities
□ Claude Code capabilities
□ VS Code extension capabilities
□ API/SDK capabilities
Document which environment supports which features
</verification_checklist>
```
### Pitfall 5: Confident Negative Claims Without Citations
**What**: Making definitive "X is not possible" statements without official source verification
**Example**: "Folder-scoped MCP configuration is not supported" (missing `.mcp.json`)
**Why it happens**: Drawing conclusions from absence of evidence rather than evidence of absence
**Prevention**:
```xml
<critical_claims_audit>
For any "X is not possible" or "Y is the only way" statement:
- [ ] Is this verified by official documentation stating it explicitly?
- [ ] Have I checked for recent updates that might change this?
- [ ] Have I verified all possible approaches/mechanisms?
- [ ] Am I confusing "I didn't find it" with "it doesn't exist"?
</critical_claims_audit>
```
### Pitfall 6: Missing Enumeration
**What**: Investigating open-ended scope without enumerating known possibilities first
**Example**: "Research configuration options" instead of listing specific options to verify
**Why it happens**: Not creating explicit checklist of items to investigate
**Prevention**:
```xml
<verification_checklist>
Enumerate ALL known options FIRST:
□ Option 1: [specific item]
□ Option 2: [specific item]
□ Option 3: [specific item]
□ Check for additional unlisted options
For each option above, document:
- Existence (confirmed/not found/unclear)
- Official source URL
- Current status (active/deprecated/beta)
</verification_checklist>
```
### Pitfall 7: Single-Source Verification
**What**: Relying on a single source for critical claims
**Example**: Using only Stack Overflow answer from 2021 for current best practices
**Why it happens**: Not cross-referencing multiple authoritative sources
**Prevention**:
```xml
<source_verification>
For critical claims, require multiple sources:
- [ ] Official documentation (primary)
- [ ] Release notes/changelog (for currency)
- [ ] Additional authoritative source (for verification)
- [ ] Contradiction check (ensure sources agree)
</source_verification>
```
### Pitfall 8: Assumed Completeness
**What**: Assuming search results are complete and authoritative
**Example**: First Google result is outdated but assumed current
**Why it happens**: Not verifying publication dates and source authority
**Prevention**:
```xml
<source_verification>
For each source consulted:
- [ ] Publication/update date verified (prefer recent/current)
- [ ] Source authority confirmed (official docs, not blogs)
- [ ] Version relevance checked (matches current version)
- [ ] Multiple search queries tried (not just one)
</source_verification>
```
## Red Flags in Research Outputs
### 🚩 Red Flag 1: Zero "Not Found" Results
**Warning**: Every investigation succeeds perfectly
**Problem**: Real research encounters dead ends, ambiguity, and unknowns
**Action**: Expect honest reporting of limitations, contradictions, and gaps
### 🚩 Red Flag 2: No Confidence Indicators
**Warning**: All findings presented as equally certain
**Problem**: Can't distinguish verified facts from educated guesses
**Action**: Require confidence levels (High/Medium/Low) for key findings
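A findings entry with honest confidence annotation might look like (format illustrative):
```markdown
- Project-scoped MCP servers via `.mcp.json`: CONFIRMED (Confidence: High)
  Source: https://exact-url-to-official-docs
- Per-directory server overrides: NOT FOUND in official docs (Confidence: Low)
  Checked: configuration reference, changelog, release notes
```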
### 🚩 Red Flag 3: Missing URLs
**Warning**: "According to documentation..." without specific URL
**Problem**: Can't verify claims or check for updates
**Action**: Require actual URLs for all official documentation claims
### 🚩 Red Flag 4: Definitive Statements Without Evidence
**Warning**: "X cannot do Y" or "Z is the only way" without citation
**Problem**: Strong claims require strong evidence
**Action**: Flag for verification against official sources
### 🚩 Red Flag 5: Incomplete Enumeration
**Warning**: Verification checklist lists 4 items, output covers 2
**Problem**: Systematic gaps in coverage
**Action**: Ensure all enumerated items addressed or marked "not found"
## Continuous Improvement
When research gaps occur:
1. **Document the gap**
- What was missed or incorrect?
- What was the actual correct information?
- What was the impact?
2. **Root cause analysis**
- Why wasn't it caught?
- Which verification step would have prevented it?
- What pattern does this reveal?
3. **Update this document**
- Add new pitfall entry (skeleton below)
- Update relevant checklists
- Share lesson learned
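A skeleton for new entries, mirroring the format above:
```markdown
### Pitfall N: [Short name]
**What**: [The mistake pattern]
**Example**: [Concrete instance where it occurred]
**Why it happens**: [Root cause]
**Prevention**: [Checklist or verification step that would have caught it]
```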
## Quick Reference Checklist
Before submitting research, verify:
- [ ] All enumerated items investigated (not just some)
- [ ] Negative claims verified with official docs
- [ ] Multiple sources cross-referenced for critical claims
- [ ] URLs provided for all official documentation
- [ ] Publication dates checked (prefer recent/current)
- [ ] Tool/environment-specific variations documented
- [ ] Confidence levels assigned honestly
- [ ] Assumptions distinguished from verified facts
- [ ] "What might I have missed?" review completed
---
**Living Document**: Update after each significant research gap
**Lessons From**: MCP configuration research gap (missed `.mcp.json`)

View File

@@ -0,0 +1,415 @@
# Scope Estimation & Quality-Driven Plan Splitting
Plans must maintain consistent quality from first task to last. This requires understanding the **quality degradation curve** and splitting aggressively to stay in the peak quality zone.
## The Quality Degradation Curve
**Critical insight:** Claude doesn't degrade at arbitrary percentages - it degrades when it *perceives* context pressure and enters "completion mode."
```
Context Usage │ Quality Level │ Claude's Mental State
─────────────────────────────────────────────────────────
0-30% │ ████████ PEAK │ "I can be thorough and comprehensive"
│ │ No anxiety, full detail, best work
30-50% │ ██████ GOOD │ "Still have room, maintaining quality"
│ │ Engaged, confident, solid work
50-70% │ ███ DEGRADING │ "Getting tight, need to be efficient"
│ │ Efficiency mode, compression begins
70%+ │ █ POOR │ "Running out, must finish quickly"
│ │ Self-lobotomization, rushed, minimal
```
**The 40-50% inflection point:**
This is where quality breaks. Claude sees context mounting and thinks "I'd better conserve now or I won't finish." Result: The classic mid-execution statement "I'll complete the remaining tasks more concisely" = quality crash.
**The fundamental rule:** Stop BEFORE quality degrades, not at context limit.
## Target: 50% Context Maximum
**Plans should complete within ~50% of context usage.**
Why 50% not 80%?
- Huge safety buffer
- No context anxiety possible
- Quality maintained from start to finish
- Room for unexpected complexity
- Space for iteration and fixes
**If you target 80%, you're planning for failure.** By the time you hit 80%, you've already spent 40% in degradation mode.
## The 2-3 Task Rule
**Each plan should contain 2-3 tasks maximum.**
Why this number?
**Task 1 (0-15% context):**
- Fresh context
- Peak quality
- Comprehensive implementation
- Full testing
- Complete documentation
**Task 2 (15-35% context):**
- Still in peak zone
- Quality maintained
- Buffer feels safe
- No anxiety
**Task 3 (35-50% context):**
- Beginning to feel pressure
- Quality still good but managing it
- Natural stopping point
- Better to commit here
**Task 4+ (50%+ context):**
- DEGRADATION ZONE
- "I'll do this concisely" appears
- Quality crashes
- Should have split before this
**The principle:** Each task is independently committable. 2-3 focused changes per commit creates beautiful, surgical git history.
## Signals to Split Into Multiple Plans
### Always Split If:
**1. More than 3 tasks**
- Even if tasks seem small
- Each additional task increases degradation risk
- Split into logical groups of 2-3
**2. Multiple subsystems**
```
❌ Bad (1 plan):
- Database schema (3 files)
- API routes (5 files)
- UI components (8 files)
Total: 16 files, 1 plan → guaranteed degradation
✅ Good (3 plans):
- 01-01-PLAN.md: Database schema (3 files, 2 tasks)
- 01-02-PLAN.md: API routes (5 files, 3 tasks)
- 01-03-PLAN.md: UI components (8 files, 3 tasks)
Total: 16 files, 3 plans → consistent quality
```
**3. Any task with >5 file modifications**
- Large tasks burn context fast
- Split by file groups or logical units
- Better: 3 plans of 2 files each vs 1 plan of 6 files
**4. Checkpoint + implementation work**
- Checkpoints require user interaction (context preserved)
- Implementation after checkpoint should be separate plan
```
✅ Good split:
- 02-01-PLAN.md: Setup (checkpoint: decision on auth provider)
- 02-02-PLAN.md: Implement chosen auth solution
```
**5. Research + implementation**
- Research produces FINDINGS.md (separate plan)
- Implementation consumes FINDINGS.md (separate plan)
- Clear boundary, clean handoff
### Consider Splitting If:
**1. Estimated >5 files modified total**
- Context from reading existing code
- Context from diffs
- Context from responses
- Adds up faster than expected
**2. Complex domains (auth, payments, data modeling)**
- These require careful thinking
- Burns more context per task than simple CRUD
- Split more aggressively
**3. Any uncertainty about approach**
- "Figure out X" phase separate from "implement X" phase
- Don't mix exploration and implementation
**4. Natural semantic boundaries**
- Setup → Core → Features
- Backend → Frontend
- Configuration → Implementation → Testing
## Splitting Strategies
### By Subsystem
**Phase:** "Authentication System"
**Split:**
```
- 03-01-PLAN.md: Database models (User, Session tables + relations)
- 03-02-PLAN.md: Auth API (register, login, logout endpoints)
- 03-03-PLAN.md: Protected routes (middleware, JWT validation)
- 03-04-PLAN.md: UI components (login form, registration form)
```
Each plan: 2-3 tasks, single subsystem, clean commits.
### By Dependency
**Phase:** "Payment Integration"
**Split:**
```
- 04-01-PLAN.md: Stripe setup (webhook endpoints via API, env vars, test mode)
- 04-02-PLAN.md: Subscription logic (plans, checkout, customer portal)
- 04-03-PLAN.md: Frontend integration (pricing page, payment flow)
```
Later plans depend on earlier completion. Sequential execution, fresh context each time.
### By Complexity
**Phase:** "Dashboard Buildout"
**Split:**
```
- 05-01-PLAN.md: Layout shell (simple: sidebar, header, routing)
- 05-02-PLAN.md: Data fetching (moderate: TanStack Query setup, API integration)
- 05-03-PLAN.md: Data visualization (complex: charts, tables, real-time updates)
```
Complex work gets its own plan with full context budget.
### By Verification Points
**Phase:** "Deployment Pipeline"
**Split:**
```
- 06-01-PLAN.md: Vercel setup (deploy via CLI, configure domains)
→ Ends with checkpoint:human-verify "check xyz.vercel.app loads"
- 06-02-PLAN.md: Environment config (secrets via CLI, env vars)
→ Autonomous (no checkpoints) → subagent execution
- 06-03-PLAN.md: CI/CD (GitHub Actions, preview deploys)
→ Ends with checkpoint:human-verify "check PR preview works"
```
Verification checkpoints create natural boundaries. Autonomous plans between checkpoints execute via subagent with fresh context.
## Autonomous vs Interactive Plans
**Critical optimization:** Plans without checkpoints don't need main context.
### Autonomous Plans (No Checkpoints)
- Contains only `type="auto"` tasks
- No user interaction needed
- **Execute via subagent with fresh 200k context**
- Impossible to degrade (always starts at 0%)
- Creates SUMMARY, commits, reports back
- Can run in parallel (multiple subagents)
### Interactive Plans (Has Checkpoints)
- Contains `checkpoint:human-verify` or `checkpoint:decision` tasks
- Requires user interaction
- Must execute in main context
- Still target 50% context (2-3 tasks)
**Planning guidance:** If splitting a phase, try to:
- Group autonomous work together (→ subagent)
- Separate interactive work (→ main context)
- Maximize autonomous plans (more fresh contexts)
Example:
```
Phase: Feature X
- 07-01-PLAN.md: Backend (autonomous) → subagent
- 07-02-PLAN.md: Frontend (autonomous) → subagent
- 07-03-PLAN.md: Integration test (has checkpoint:human-verify) → main context
```
Two fresh contexts, one interactive verification. Perfect.
## Anti-Patterns
### ❌ The "Comprehensive Plan" Anti-Pattern
```
Plan: "Complete Authentication System"
Tasks:
1. Database models
2. Migration files
3. Auth API endpoints
4. JWT utilities
5. Protected route middleware
6. Password hashing
7. Login form component
8. Registration form component
Result: 8 tasks, 80%+ context, degradation at task 4-5
```
**Why this fails:**
- Task 1-3: Good quality
- Task 4-5: "I'll do these concisely" = degradation begins
- Task 6-8: Rushed, minimal, poor quality
### ✅ The "Atomic Plan" Pattern
```
Split into 4 plans:
Plan 1: "Auth Database Models" (2 tasks)
- Database schema (User, Session)
- Migration files
Plan 2: "Auth API Core" (3 tasks)
- Register endpoint
- Login endpoint
- JWT utilities
Plan 3: "Auth API Protection" (2 tasks)
- Protected route middleware
- Logout endpoint
Plan 4: "Auth UI Components" (2 tasks)
- Login form
- Registration form
```
**Why this succeeds:**
- Each plan: 2-3 tasks, 30-40% context
- All tasks: Peak quality throughout
- Git history: 4 focused commits
- Easy to verify each piece
- Rollback is surgical
### ❌ The "Efficiency Trap" Anti-Pattern
```
Thinking: "These tasks are small, let's do 6 to be efficient"
Result: Task 1-2 are good, task 3-4 begin degrading, task 5-6 are rushed
```
**Why this fails:** You're optimizing for fewer plans, not quality. The "efficiency" is false - poor quality requires more rework.
### ✅ The "Quality First" Pattern
```
Thinking: "These tasks are small, but let's do 2-3 to guarantee quality"
Result: All tasks peak quality, clean commits, no rework needed
```
**Why this succeeds:** You optimize for quality, which is true efficiency. No rework = faster overall.
## Estimating Context Usage
**Rough heuristics for plan size:**
### File Counts
- 0-3 files modified: Small task (~10-15% context)
- 4-6 files modified: Medium task (~20-30% context)
- 7+ files modified: Large task (~40%+ context) - split this
### Complexity
- Simple CRUD: ~15% per task
- Business logic: ~25% per task
- Complex algorithms: ~40% per task
- Domain modeling: ~35% per task
### 2-Task Plan (Safe)
- 2 simple tasks: ~30% total ✅ Plenty of room
- 2 medium tasks: ~50% total ✅ At target
- 2 complex tasks: ~80% total ❌ Too tight, split
### 3-Task Plan (Risky)
- 3 simple tasks: ~45% total ✅ Good
- 3 medium tasks: ~75% total ⚠️ Pushing it
- 3 complex tasks: ~120% total ❌ Impossible, split
**Conservative principle:** When in doubt, split. Better to have an extra plan than degraded quality.
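Putting the heuristics together, a quick estimate might look like this (numbers are rough and illustrative):
```
Plan: "Notification Preferences" (3 tasks)
- Preferences model (simple CRUD, 2 files)       → ~15%
- Preferences API (business logic, 3 files)      → ~25%
- Digest scheduler (complex algorithm, 2 files)  → ~40%
Estimated total: ~80% → too tight. Move the scheduler into its
own plan; the remaining two tasks land at ~40%.
```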
## The Atomic Commit Philosophy
**What we're optimizing for:** Beautiful git history where each commit is:
- Focused (2-3 related changes)
- Complete (fully implemented, tested)
- Documented (clear commit message)
- Reviewable (small enough to understand)
- Revertable (surgical rollback possible)
**Bad git history (large plans):**
```
feat(auth): Complete authentication system
- Added 16 files
- Modified 8 files
- 1200 lines changed
- Contains: models, API, UI, middleware, utilities
```
Impossible to review, hard to understand, can't revert without losing everything.
**Good git history (atomic plans):**
```
feat(auth-01): Add User and Session database models
- Added schema files
- Added migration
- 45 lines changed
feat(auth-02): Implement register and login API endpoints
- Added /api/auth/register
- Added /api/auth/login
- Added JWT utilities
- 120 lines changed
feat(auth-03): Add protected route middleware
- Added middleware/auth.ts
- Added tests
- 60 lines changed
feat(auth-04): Build login and registration forms
- Added LoginForm component
- Added RegisterForm component
- 90 lines changed
```
Each commit tells a story. Each is reviewable. Each is revertable. This is craftsmanship.
## Quality Assurance Through Scope Control
**The guarantee:** When you follow the 2-3 task rule with 50% context target:
1. **Consistency:** First task has same quality as last task
2. **Thoroughness:** No "I'll complete X concisely" degradation
3. **Documentation:** Full context budget for comments/tests
4. **Error handling:** Space for proper validation and edge cases
5. **Testing:** Room for comprehensive test coverage
**The cost:** More plans to manage.
**The benefit:** Consistent excellence. No rework. Clean history. Maintainable code.
**The trade-off is worth it.**
## Summary
**Old way (3-6 tasks, 80% target):**
- Tasks 1-2: Good
- Tasks 3-4: Degrading
- Tasks 5-6: Poor
- Git: Large, unreviewable commits
- Quality: Inconsistent
**New way (2-3 tasks, 50% target):**
- All tasks: Peak quality
- Git: Atomic, surgical commits
- Quality: Consistent excellence
- Autonomous plans: Subagent execution (fresh context)
**The principle:** Aggressive atomicity. More plans, smaller scope, consistent quality.
**The rule:** If in doubt, split. Quality over consolidation. Always.

View File

@@ -0,0 +1,72 @@
# User Gates Reference
User gates prevent Claude from charging ahead at critical decision points.
## Question Types
### AskUserQuestion Tool
Use for **structured choices** (2-4 options):
- Selecting from distinct approaches
- Domain/type selection
- When user needs to see options to decide
Examples:
- "What type of project?" (macos-app / iphone-app / web-app / other)
- "Research confidence is low. How to proceed?" (dig deeper / proceed anyway / pause)
- "Multiple valid approaches exist:" (Option A / Option B / Option C)
### Inline Questions
Use for **simple confirmations**:
- Yes/no decisions
- "Does this look right?"
- "Ready to proceed?"
Examples:
- "Here's the task breakdown: [list]. Does this look right?"
- "Proceed with this approach?"
- "I'll initialize a git repo. OK?"
## Decision Gate Loop
After gathering context, ALWAYS offer:
```
Ready to [action], or would you like me to ask more questions?
1. Proceed - I have enough context
2. Ask more questions - There are details to clarify
3. Let me add context - I want to provide additional information
```
Loop continues until user selects "Proceed".
## Mandatory Gate Points
| Location | Gate Type | Trigger |
|----------|-----------|---------|
| plan-phase | Inline | Confirm task breakdown |
| plan-phase | AskUserQuestion | Multiple valid approaches |
| plan-phase | AskUserQuestion | Decision gate before writing |
| research-phase | AskUserQuestion | Low confidence findings |
| research-phase | Inline | Open questions acknowledgment |
| execute-phase | Inline | Verification failure |
| execute-phase | Inline | Issues review before proceeding |
| execute-phase | AskUserQuestion | Previous phase had issues |
| create-brief | AskUserQuestion | Decision gate before writing |
| create-roadmap | Inline | Confirm phase breakdown |
| create-roadmap | AskUserQuestion | Decision gate before writing |
| handoff | Inline | Handoff acknowledgment |
## Good vs Bad Gating
### Good
- Gate before writing artifacts (not after)
- Gate when genuinely ambiguous
- Gate when issues affect next steps
- Quick inline for simple confirmations
### Bad
- Asking obvious choices ("Should I save the file?")
- Multiple gates for same decision
- AskUserQuestion for yes/no
- Gates after the fact