Initial commit

Zhongwei Li
2025-11-29 18:28:37 +08:00
commit ccc65b3f07
180 changed files with 53970 additions and 0 deletions


@@ -0,0 +1,584 @@
# Human Checkpoints in Plans
Plans execute autonomously. Checkpoints formalize the interaction points where human verification or decisions are needed.
**Core principle:** Claude automates everything with CLI/API. Checkpoints are for verification and decisions, not manual work.
## Checkpoint Types
### 1. `checkpoint:human-verify` (Most Common)
**When:** Claude completed automated work, human confirms it works correctly.
**Use for:**
- Visual UI checks (layout, styling, responsiveness)
- Interactive flows (click through wizard, test user flows)
- Functional verification (feature works as expected)
- Audio/video playback quality
- Animation smoothness
- Accessibility testing
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[What Claude automated and deployed/built]</what-built>
<how-to-verify>
[Exact steps to test - URLs, commands, expected behavior]
</how-to-verify>
<resume-signal>[How to continue - "approved", "yes", or describe issues]</resume-signal>
</task>
```
**Key elements:**
- `<what-built>`: What Claude automated (deployed, built, configured)
- `<how-to-verify>`: Exact steps to confirm it works (numbered, specific)
- `<resume-signal>`: Clear indication of how to continue
**Example: Vercel Deployment**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to create project and deploy. Capture deployment URL from output.</action>
<verify>vercel ls shows deployment, curl {url} returns 200</verify>
<done>App deployed, URL captured</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Deployed to Vercel at https://myapp-abc123.vercel.app</what-built>
<how-to-verify>
Visit https://myapp-abc123.vercel.app and confirm:
- Homepage loads without errors
- Login form is visible
- No console errors in browser DevTools
</how-to-verify>
<resume-signal>Type "approved" to continue, or describe issues to fix</resume-signal>
</task>
```
**Example: UI Component**
```xml
<task type="auto">
<name>Build responsive dashboard layout</name>
<files>src/components/Dashboard.tsx, src/app/dashboard/page.tsx</files>
<action>Create dashboard with sidebar, header, and content area. Use Tailwind responsive classes for mobile.</action>
<verify>npm run build succeeds, no TypeScript errors</verify>
<done>Dashboard component builds without errors</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout at /dashboard</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Desktop (>1024px): Verify sidebar left, content right, header top
4. Tablet (768px): Verify sidebar collapses to hamburger
5. Mobile (375px): Verify single column, bottom nav
6. Check: No layout shift, no horizontal scroll
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
**Example: Xcode Build**
```xml
<task type="auto">
<name>Build macOS app with Xcode</name>
<files>App.xcodeproj, Sources/</files>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check for compilation errors in output.</action>
<verify>Build output contains "BUILD SUCCEEDED", no errors</verify>
<done>App builds successfully</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>
Open App.app and test:
- App launches without crashes
- Menu bar icon appears
- Preferences window opens correctly
- No visual glitches or layout issues
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
### 2. `checkpoint:decision`
**When:** Human must make a choice that affects implementation direction.
**Use for:**
- Technology selection (which auth provider, which database)
- Architecture decisions (monorepo vs separate repos)
- Design choices (color scheme, layout approach)
- Feature prioritization (which variant to build)
- Data model decisions (schema structure)
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>[What's being decided]</decision>
<context>[Why this decision matters]</context>
<options>
<option id="option-a">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
<option id="option-b">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
</options>
<resume-signal>[How to indicate choice]</resume-signal>
</task>
```
**Key elements:**
- `<decision>`: What's being decided
- `<context>`: Why this matters
- `<options>`: Each option with balanced pros/cons (not prescriptive)
- `<resume-signal>`: How to indicate choice
**Example: Auth Provider Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>
Need user authentication for the app. Three solid options with different tradeoffs.
</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase DB we're using, generous free tier, row-level security integration</pros>
<cons>Less customizable UI, tied to Supabase ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best developer experience, excellent docs</pros>
<cons>Paid after 10k MAU, vendor lock-in</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control, widely adopted</pros>
<cons>More setup work, you manage security updates, UI is DIY</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
### 3. `checkpoint:human-action` (Rare)
**When:** Action has NO CLI/API and requires human-only interaction, OR Claude hit an authentication gate during automation.
**Use ONLY for:**
- **Authentication gates** - Claude tried to use CLI/API but needs credentials to continue (this is NOT a failure)
- Email verification links (account creation requires clicking email)
- SMS 2FA codes (phone verification)
- Manual account approvals (platform requires human review before API access)
- Credit card 3D Secure flows (web-based payment authorization)
- OAuth app approvals (some platforms require web-based approval)
**Do NOT use for pre-planned manual work:**
- Manually deploying to Vercel (use `vercel` CLI - auth gate if needed)
- Manually creating Stripe webhooks (use Stripe API - auth gate if needed)
- Manually creating databases (use provider CLI - auth gate if needed)
- Running builds/tests manually (use Bash tool)
- Creating files manually (use Write tool)
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[What human must do - Claude already did everything automatable]</action>
<instructions>
[What Claude already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What Claude can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
**Key principle:** Claude automates EVERYTHING possible first, only asks human for the truly unavoidable manual step.
**Example: Email Verification**
```xml
<task type="auto">
<name>Create SendGrid account via API</name>
<action>Use SendGrid API to create subuser account with provided email. Request verification email.</action>
<verify>API returns 201, account created</verify>
<done>Account created, verification email sent</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for SendGrid verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when email verified</resume-signal>
</task>
```
**Example: Credit Card 3D Secure**
```xml
<task type="auto">
<name>Create Stripe payment intent</name>
<action>Use Stripe API to create payment intent for $99. Generate checkout URL.</action>
<verify>Stripe API returns payment intent ID and URL</verify>
<done>Payment intent created</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete 3D Secure authentication</action>
<instructions>
I created the payment intent: https://checkout.stripe.com/pay/cs_test_abc123
Visit that URL and complete the 3D Secure verification flow with your test card.
</instructions>
<verification>Stripe webhook receives payment_intent.succeeded event</verification>
<resume-signal>Type "done" when payment completes</resume-signal>
</task>
```
**Example: Authentication Gate (Dynamic Checkpoint)**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
<!-- If vercel returns "Error: Not authenticated", Claude creates checkpoint on the fly -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, Claude retries the deployment -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
```
**Key distinction:** Authentication gates are created dynamically when Claude encounters auth errors during automation. They're NOT pre-planned - Claude tries to automate first, only asks for credentials when blocked.
See references/cli-automation.md "Authentication Gates" section for more examples and full protocol.
## Execution Protocol
When Claude encounters `type="checkpoint:*"`:
1. **Stop immediately** - do not proceed to next task
2. **Display checkpoint clearly:**
```
════════════════════════════════════════
CHECKPOINT: [Type]
════════════════════════════════════════
Task [X] of [Y]: [Name]
[Display checkpoint-specific content]
[Resume signal instruction]
════════════════════════════════════════
```
3. **Wait for user response** - do not hallucinate completion
4. **Verify if possible** - check files, run tests, whatever is specified
5. **Resume execution** - continue to next task only after confirmation
**For checkpoint:human-verify:**
```
════════════════════════════════════════
CHECKPOINT: Verification Required
════════════════════════════════════════
Task 5 of 8: Responsive dashboard layout
I built: Responsive dashboard at /dashboard
How to verify:
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Test: Resize browser window to mobile/tablet/desktop
4. Confirm: No layout shift, proper responsive behavior
Type "approved" to continue, or describe issues.
════════════════════════════════════════
```
**For checkpoint:decision:**
```
════════════════════════════════════════
CHECKPOINT: Decision Required
════════════════════════════════════════
Task 2 of 6: Select authentication provider
Decision: Which auth provider should we use?
Context: Need user authentication. Three options with different tradeoffs.
Options:
1. supabase - Built-in with our DB, free tier
2. clerk - Best DX, paid after 10k users
3. nextauth - Self-hosted, maximum control
Select: supabase, clerk, or nextauth
════════════════════════════════════════
```
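**For checkpoint:human-action** (same display pattern as above; content illustrative):
```
════════════════════════════════════════
CHECKPOINT: Action Required
════════════════════════════════════════
Task 3 of 7: Complete email verification
I automated: SendGrid account created, verification email requested
Action needed: Click the verification link in your inbox
Type "done" when verified.
════════════════════════════════════════
```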
## Writing Good Checkpoints
**DO:**
- Automate everything with CLI/API before checkpoint
- Be specific: "Visit https://myapp.vercel.app" not "check deployment"
- Number verification steps: easier to follow
- State expected outcomes: "You should see X"
- Provide context: why this checkpoint exists
- Make verification executable: clear, testable steps
**DON'T:**
- Ask human to do work Claude can automate (deploy, create resources, run builds)
- Assume knowledge: "Configure the usual settings" ❌
- Skip steps: "Set up database" ❌ (too vague)
- Mix multiple verifications in one checkpoint (split them)
- Make verification impossible (Claude can't check visual appearance without user confirmation)
## When to Use Checkpoints
**Use checkpoint:human-verify for:**
- Visual verification (UI, layouts, animations)
- Interactive testing (click flows, user journeys)
- Quality checks (audio/video playback, animation smoothness)
- Confirming deployed apps are accessible
**Use checkpoint:decision for:**
- Technology selection (auth providers, databases, frameworks)
- Architecture choices (monorepo, deployment strategy)
- Design decisions (color schemes, layout approaches)
- Feature prioritization
**Use checkpoint:human-action for:**
- Email verification links (no API)
- SMS 2FA codes (no API)
- Manual approvals with no automation
- 3D Secure payment flows
**Don't use checkpoints for:**
- Things Claude can verify programmatically (tests pass, build succeeds)
- File operations (Claude can read files to verify)
- Code correctness (use tests and static analysis)
- Anything automatable via CLI/API
## Checkpoint Placement
Place checkpoints:
- **After automation completes** - not before Claude does the work
- **After UI buildout** - before declaring phase complete
- **Before dependent work** - decisions before implementation
- **At integration points** - after configuring external services
Bad placement:
- Before Claude automates (asking human to do automatable work) ❌
- Too frequent (every other task is a checkpoint) ❌
- Too late (checkpoint is last task, but earlier tasks needed its result) ❌
## Complete Examples
### Example 1: Deployment Flow (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json, package.json</files>
<action>
1. Run `vercel --yes` to create project and deploy
2. Capture deployment URL from output
3. Set environment variables with `vercel env add`
4. Trigger production deployment with `vercel --prod`
</action>
<verify>
- vercel ls shows deployment
- curl {url} returns 200
- Environment variables set correctly
</verify>
<done>App deployed to production, URL captured</done>
</task>
<!-- Human verifies visual/functional correctness -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Deployed to https://myapp.vercel.app</what-built>
<how-to-verify>
Visit https://myapp.vercel.app and confirm:
- Homepage loads correctly
- All images/assets load
- Navigation works
- No console errors
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
### Example 2: Database Setup (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Create Upstash Redis database</name>
<files>.env</files>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL from output
3. Write to .env: UPSTASH_REDIS_URL={url}
4. Verify connection with test command
</action>
<verify>
- upstash redis list shows database
- .env contains UPSTASH_REDIS_URL
- Test connection succeeds
</verify>
<done>Redis database created and configured</done>
</task>
<!-- NO CHECKPOINT NEEDED - Claude automated everything and verified programmatically -->
```
### Example 3: Stripe Webhooks (Correct)
```xml
<!-- Claude automates everything -->
<task type="auto">
<name>Configure Stripe webhooks</name>
<files>.env, src/app/api/webhooks/route.ts</files>
<action>
1. Use Stripe API to create webhook endpoint pointing to /api/webhooks
2. Subscribe to events: payment_intent.succeeded, customer.subscription.updated
3. Save webhook signing secret to .env
4. Implement webhook handler in route.ts
</action>
<verify>
- Stripe API returns webhook endpoint ID
- .env contains STRIPE_WEBHOOK_SECRET
- curl webhook endpoint returns 200
</verify>
<done>Stripe webhooks configured and handler implemented</done>
</task>
<!-- Human verifies in Stripe dashboard -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Stripe webhook configured via API</what-built>
<how-to-verify>
Visit Stripe Dashboard > Developers > Webhooks
Confirm: Endpoint shows https://myapp.com/api/webhooks with correct events
</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
## Anti-Patterns
### ❌ BAD: Asking human to automate
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Deploy to Vercel</action>
<instructions>
1. Visit vercel.com/new
2. Import Git repository
3. Click Deploy
4. Copy deployment URL
</instructions>
<verification>Deployment exists</verification>
<resume-signal>Paste URL</resume-signal>
</task>
```
**Why bad:** Vercel has a CLI. Claude should run `vercel --yes`.
### ✅ GOOD: Claude automates, human verifies
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes`. Capture URL.</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url}, check homepage loads</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Too many checkpoints
```xml
<task type="auto">Create schema</task>
<task type="checkpoint:human-verify">Check schema</task>
<task type="auto">Create API route</task>
<task type="checkpoint:human-verify">Check API</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">Check form</task>
```
**Why bad:** Verification fatigue. Combine into one checkpoint at end.
### ✅ GOOD: Single verification checkpoint
```xml
<task type="auto">Create schema</task>
<task type="auto">Create API route</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">
<what-built>Complete auth flow (schema + API + UI)</what-built>
<how-to-verify>Test full flow: register, login, access protected page</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Asking for automatable file operations
```xml
<task type="checkpoint:human-action">
<action>Create .env file</action>
<instructions>
1. Create .env in project root
2. Add: DATABASE_URL=...
3. Add: STRIPE_KEY=...
</instructions>
</task>
```
**Why bad:** Claude has Write tool. This should be `type="auto"`.
## Summary
Checkpoints formalize human-in-the-loop points. Use them when Claude cannot complete a task autonomously OR when human verification is required for correctness.
**The golden rule:** If Claude CAN automate it, Claude MUST automate it.
**Checkpoint priority:**
1. **checkpoint:human-verify** (90% of checkpoints) - Claude automated everything, human confirms visual/functional correctness
2. **checkpoint:decision** (9% of checkpoints) - Human makes architectural/technology choices
3. **checkpoint:human-action** (1% of checkpoints) - Truly unavoidable manual steps with no API/CLI
**See also:** references/cli-automation.md for exhaustive list of what Claude can automate.


@@ -0,0 +1,497 @@
# CLI and API Automation Reference
**Core principle:** If it has a CLI or API, Claude does it. Never ask the human to perform manual steps that Claude can automate.
This reference documents what Claude CAN and SHOULD automate during plan execution.
## Deployment Platforms
### Vercel
**CLI:** `vercel`
**What Claude automates:**
- Create and deploy projects: `vercel --yes`
- Set environment variables: `vercel env add KEY production`
- Link to git repo: `vercel link`
- Trigger deployments: `vercel --prod`
- Get deployment URLs: `vercel ls`
- Manage domains: `vercel domains add example.com`
**Never ask human to:**
- Visit vercel.com/new to create project
- Click through dashboard to add env vars
- Manually link repository
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes` to deploy. Capture deployment URL.</action>
<verify>vercel ls shows deployment, curl {url} returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url} - check homepage loads</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
### Railway
**CLI:** `railway`
**What Claude automates:**
- Initialize project: `railway init`
- Link to repo: `railway link`
- Deploy: `railway up`
- Set variables: `railway variables set KEY=value`
- Get deployment URL: `railway domain`
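**Checkpoint pattern** (a sketch mirroring the Vercel example; the `{url}` placeholder and task wording are illustrative):
```xml
<task type="auto">
<name>Deploy to Railway</name>
<action>Run `railway up` to deploy. Capture the deployment URL with `railway domain`.</action>
<verify>railway status shows the linked project, curl {url} returns 200</verify>
</task>
```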
### Fly.io
**CLI:** `fly`
**What Claude automates:**
- Launch app: `fly launch --no-deploy`
- Deploy: `fly deploy`
- Set secrets: `fly secrets set KEY=value`
- Scale: `fly scale count 2`
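**Checkpoint pattern** (a sketch; same shape as the other deployment examples, `{url}` assumed):
```xml
<task type="auto">
<name>Deploy to Fly.io</name>
<action>Run `fly launch --no-deploy` to configure the app, then `fly deploy`. Capture the app URL from output.</action>
<verify>fly status shows the app running, curl {url} returns 200</verify>
</task>
```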
## Payment & Billing
### Stripe
**CLI:** `stripe`
**What Claude automates:**
- Create webhook endpoints: Stripe API via curl/fetch
- Forward events to local dev: `stripe listen --forward-to localhost:3000/api/webhooks`
- Trigger test events: `stripe trigger payment_intent.succeeded`
- Create products/prices: Stripe API via curl/fetch
- Manage customers: Stripe API via curl/fetch
- List webhook endpoints: `stripe webhook_endpoints list`
**Never ask human to:**
- Visit dashboard.stripe.com to create webhook
- Click through UI to create products
- Manually copy webhook signing secret
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Configure Stripe webhooks</name>
<action>Use Stripe API to create webhook endpoint at /api/webhooks. Save signing secret to .env.</action>
<verify>stripe webhook_endpoints list shows endpoint, .env contains STRIPE_WEBHOOK_SECRET</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Stripe webhook configured</what-built>
<how-to-verify>Check Stripe dashboard > Developers > Webhooks shows endpoint with correct URL</how-to-verify>
<resume-signal>Type "yes" if correct</resume-signal>
</task>
```
## Databases & Backend
### Supabase
**CLI:** `supabase`
**What Claude automates:**
- Initialize project: `supabase init`
- Link to remote: `supabase link --project-ref {ref}`
- Create migrations: `supabase migration new {name}`
- Push migrations: `supabase db push`
- Generate types: `supabase gen types typescript`
- Deploy functions: `supabase functions deploy {name}`
**Never ask human to:**
- Visit supabase.com to create project manually
- Click through dashboard to run migrations
- Copy/paste connection strings
**Note:** Project creation may require web dashboard initially (no CLI for initial project creation), but all subsequent work (migrations, functions, etc.) is CLI-automated.
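**Checkpoint pattern** (a sketch; the migration name is illustrative):
```xml
<task type="auto">
<name>Create and push Supabase migration</name>
<action>Run `supabase migration new add_users_table`, write the SQL, then `supabase db push`.</action>
<verify>supabase migration list shows the migration as applied</verify>
</task>
```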
### Upstash (Redis/Kafka)
**CLI:** `upstash`
**What Claude automates:**
- Create Redis database: `upstash redis create {name} --region {region}`
- Get connection details: `upstash redis get {id}`
- Create Kafka cluster: `upstash kafka create {name} --region {region}`
**Never ask human to:**
- Visit console.upstash.com
- Click through UI to create database
- Copy/paste connection URLs manually
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<action>Run `upstash redis create myapp-cache --region us-east-1`. Save URL to .env.</action>
<verify>.env contains UPSTASH_REDIS_URL, upstash redis list shows database</verify>
</task>
```
### PlanetScale
**CLI:** `pscale`
**What Claude automates:**
- Create database: `pscale database create {name} --region {region}`
- Create branch: `pscale branch create {db} {branch}`
- Deploy request: `pscale deploy-request create {db} {branch}`
- Connection string: `pscale connect {db} {branch}`
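**Checkpoint pattern** (a sketch; database, branch, and region names are illustrative):
```xml
<task type="auto">
<name>Create PlanetScale database and branch</name>
<action>Run `pscale database create myapp --region us-east`, then `pscale branch create myapp add-users`.</action>
<verify>pscale database list shows myapp, pscale branch list myapp shows add-users</verify>
</task>
```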
## Version Control & CI/CD
### GitHub
**CLI:** `gh`
**What Claude automates:**
- Create repo: `gh repo create {name} --public/--private`
- Create issues: `gh issue create --title "{title}" --body "{body}"`
- Create PR: `gh pr create --title "{title}" --body "{body}"`
- Manage secrets: `gh secret set {KEY}`
- Trigger workflows: `gh workflow run {name}`
- Check status: `gh run list`
**Never ask human to:**
- Visit github.com to create repo
- Click through UI to add secrets
- Manually create issues/PRs
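**Checkpoint pattern** (a sketch; the repo name is illustrative):
```xml
<task type="auto">
<name>Create GitHub repository and push</name>
<action>Run `gh repo create myapp --private --source=. --push` to create the repo and push the current branch.</action>
<verify>gh repo view myapp shows the repository</verify>
</task>
```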
## Build Tools & Testing
### Node/npm/pnpm/bun
**What Claude automates:**
- Install dependencies: `npm install`, `pnpm install`, `bun install`
- Run builds: `npm run build`
- Run tests: `npm test`, `npm run test:e2e`
- Type checking: `tsc --noEmit`
**Never ask human to:** Run these commands manually
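**Pattern** (a sketch combining the commands above):
```xml
<task type="auto">
<name>Install dependencies and validate</name>
<action>Run `npm install`, then `npm run build` and `tsc --noEmit`.</action>
<verify>Install and build succeed, type check reports no errors</verify>
</task>
```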
### Xcode (macOS/iOS)
**CLI:** `xcodebuild`
**What Claude automates:**
- Build project: `xcodebuild -project App.xcodeproj -scheme App build`
- Run tests: `xcodebuild test -project App.xcodeproj -scheme App`
- Archive: `xcodebuild archive -project App.xcodeproj -scheme App`
- Check compilation: Parse xcodebuild output for errors
**Never ask human to:**
- Open Xcode and click Product > Build
- Click Product > Test manually
- Check for errors by looking at Xcode UI
**Checkpoint pattern:**
```xml
<task type="auto">
<name>Build macOS app</name>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check output for errors.</action>
<verify>Build succeeds with "BUILD SUCCEEDED" in output</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>Open App.app and check: login flow works, no visual glitches</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
## Environment Configuration
### .env Files
**Tool:** Write tool
**What Claude automates:**
- Create .env files: Use Write tool
- Append variables: Use Edit tool
- Read current values: Use Read tool
**Never ask human to:**
- Manually create .env file
- Copy/paste values into .env
- Edit .env in text editor
**Pattern:**
```xml
<task type="auto">
<name>Configure environment variables</name>
<action>Write .env file with: DATABASE_URL, STRIPE_KEY, JWT_SECRET (generated).</action>
<verify>Read .env confirms all variables present</verify>
</task>
```
## Email & Communication
### Resend
**API:** Resend API via HTTP
**What Claude automates:**
- Create API keys: Resend API where available; otherwise a one-time manual setup step
- Send emails: Resend API
- Configure domains: Resend API
### SendGrid
**API:** SendGrid API via HTTP
**What Claude automates:**
- Create API keys via API
- Send emails: SendGrid API
- Configure webhooks: SendGrid API
**Note:** Initial account setup may require email verification (checkpoint:human-action), but all subsequent work is API-automated.
## Authentication Gates
**Critical distinction:** When Claude tries to use a CLI/API and gets an authentication error, this is NOT a failure - it's a gate that requires human input to unblock automation.
**Pattern: Claude encounters auth error → creates checkpoint → you authenticate → Claude continues**
### Example: Vercel CLI Not Authenticated
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment</verify>
</task>
<!-- If vercel returns "Error: Not authenticated" -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, Claude retries automatically -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, curl returns 200</verify>
</task>
```
### Example: Stripe CLI Needs API Key
```xml
<task type="auto">
<name>Create Stripe webhook endpoint</name>
<action>Use Stripe API to create webhook at /api/webhooks</action>
</task>
<!-- If API returns 401 Unauthorized -->
<task type="checkpoint:human-action" gate="blocking">
<action>Provide Stripe API key so I can continue webhook configuration</action>
<instructions>
I need your Stripe API key to create webhooks.
1. Visit dashboard.stripe.com/apikeys
2. Copy your "Secret key" (starts with sk_test_ or sk_live_)
3. Paste it here or run: export STRIPE_SECRET_KEY=sk_...
</instructions>
<verification>Stripe API key works: curl test succeeds</verification>
<resume-signal>Type "done" or paste the key</resume-signal>
</task>
<!-- After key provided, Claude writes to .env and continues -->
<task type="auto">
<name>Save Stripe key and create webhook</name>
<action>
1. Write STRIPE_SECRET_KEY to .env
2. Create webhook endpoint via Stripe API
3. Save webhook secret to .env
</action>
<verify>.env contains both keys, webhook endpoint exists</verify>
</task>
```
### Example: GitHub CLI Not Logged In
```xml
<task type="auto">
<name>Create GitHub repository</name>
<action>Run `gh repo create myapp --public`</action>
</task>
<!-- If gh returns "Not logged in" -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate GitHub CLI so I can create repository</action>
<instructions>
I need GitHub authentication to create the repo.
Run: gh auth login
Follow the prompts to authenticate (browser or token).
</instructions>
<verification>gh auth status shows "Logged in"</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<task type="auto">
<name>Create repository (authenticated)</name>
<action>Run `gh repo create myapp --public`</action>
<verify>gh repo view shows repository exists</verify>
</task>
```
### Example: Upstash CLI Needs API Key
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<action>Run `upstash redis create myapp-cache --region us-east-1`</action>
</task>
<!-- If upstash returns auth error -->
<task type="checkpoint:human-action" gate="blocking">
<action>Configure Upstash CLI credentials so I can create database</action>
<instructions>
I need Upstash authentication to create Redis database.
1. Visit console.upstash.com/account/api
2. Copy your API key
3. Run: upstash auth login
4. Paste your API key when prompted
</instructions>
<verification>upstash auth status shows authenticated</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<task type="auto">
<name>Create Redis database (authenticated)</name>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL
3. Write to .env: UPSTASH_REDIS_URL={url}
</action>
<verify>upstash redis list shows database, .env contains URL</verify>
</task>
```
### Authentication Gate Protocol
**When Claude encounters authentication error during execution:**
1. **Recognize it's not a failure** - Missing auth is expected, not a bug
2. **Stop current task** - Don't retry repeatedly
3. **Create checkpoint:human-action on the fly** - Dynamic checkpoint, not pre-planned
4. **Provide exact authentication steps** - CLI commands, where to get keys
5. **Verify authentication** - Test that auth works before continuing
6. **Retry the original task** - Resume automation where it left off
7. **Continue normally** - One auth gate doesn't break the flow
**Key difference from pre-planned checkpoints:**
- Pre-planned: "I need you to do X" (wrong - Claude should automate)
- Auth gate: "I tried to automate X but need credentials to continue" (correct - unblocks automation)
**This preserves agentic flow:**
- Claude tries automation first
- Only asks for help when blocked by credentials
- Continues automating after unblocked
- You never manually deploy/create resources - just provide keys
## When checkpoint:human-action is REQUIRED
**Truly rare cases where no CLI/API exists:**
1. **Email verification links** - Account signup requires clicking verification email
2. **SMS verification codes** - 2FA requiring phone
3. **Manual account approvals** - Platform requires human review before API access
4. **Domain DNS records at registrar** - Some registrars have no API
5. **Credit card input** - Payment methods requiring 3D Secure web flow
6. **OAuth app approval** - Some platforms require web-based app approval flow
**For these rare cases:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when verified</resume-signal>
</task>
```
**Key difference:** Claude does EVERYTHING possible first (account creation, API requests), only asks human for the one thing with no automation path.
## Quick Reference: "Can Claude automate this?"
| Action | CLI/API? | Claude does it? |
|--------|----------|-----------------|
| Deploy to Vercel | ✅ `vercel` | YES |
| Create Stripe webhook | ✅ Stripe API | YES |
| Run xcodebuild | ✅ `xcodebuild` | YES |
| Write .env file | ✅ Write tool | YES |
| Create Upstash DB | ✅ `upstash` CLI | YES |
| Install npm packages | ✅ `npm` | YES |
| Create GitHub repo | ✅ `gh` | YES |
| Run tests | ✅ `npm test` | YES |
| Create Supabase project | ⚠️ Web dashboard | NO (then CLI for everything else) |
| Click email verification link | ❌ No API | NO |
| Enter credit card with 3DS | ❌ No API | NO |
**Default answer: YES.** Unless explicitly in the "NO" category, Claude automates it.
## Decision Tree
```
┌─────────────────────────────────────┐
│ Task requires external resource?    │
└──────────────┬──────────────────────┘
               ▼
┌─────────────────────────────────────┐
│ Does it have CLI/API/tool access?   │
└──────────────┬──────────────────────┘
         ┌─────┴─────┐
         │           │
         ▼           ▼
        YES          NO
         │           │
         │           ▼
         │   ┌──────────────────────────────┐
         │   │ checkpoint:human-action      │
         │   │ (email links, 2FA, etc.)     │
         │   └──────────────────────────────┘
         ▼
┌────────────────────────────────────────┐
│ task type="auto"                       │
│ Claude automates via CLI/API           │
└────────────┬───────────────────────────┘
             ▼
┌────────────────────────────────────────┐
│ checkpoint:human-verify                │
│ Human confirms visual/functional       │
└────────────────────────────────────────┘
```
## Summary
**The rule:** If Claude CAN do it, Claude MUST do it.
Checkpoints are for:
- **Verification** - Confirming Claude's automated work looks/behaves correctly
- **Decisions** - Choosing between valid approaches
- **True blockers** - Rare actions with literally no API/CLI (email links, 2FA)
Checkpoints are NOT for:
- Deploying (use CLI)
- Creating resources (use CLI/API)
- Running builds (use Bash)
- Writing files (use Write tool)
- Anything with automation available
**This keeps the agentic coding workflow intact - Claude does the work, you verify results.**


@@ -0,0 +1,138 @@
<overview>
Claude has a finite context window. This reference defines how to monitor usage and handle approaching limits gracefully.
</overview>
<context_awareness>
Claude receives system warnings showing token usage:
```
Token usage: 150000/200000; 50000 remaining
```
This information appears in `<system_warning>` tags during the conversation.
</context_awareness>
<thresholds>
<threshold level="comfortable" remaining="50%+">
**Status**: Plenty of room
**Action**: Work normally
</threshold>
<threshold level="getting_full" remaining="25%">
**Status**: Context accumulating
**Action**: Mention to user: "Context getting full. Consider wrapping up or creating handoff soon."
**No immediate action required.**
</threshold>
<threshold level="low" remaining="15%">
**Status**: Running low
**Action**:
1. Pause at next safe point (complete current atomic operation)
2. Ask user: "Running low on context (~30k tokens remaining). Options:
- Create handoff now and resume in fresh session
- Push through (risky if complex work remains)"
3. Await user decision
**Do not start new large operations.**
</threshold>
<threshold level="critical" remaining="10%">
**Status**: Must stop
**Action**:
1. Complete current atomic task (don't leave broken state)
2. **Automatically create handoff** without asking
3. Tell user: "Context limit reached. Created handoff at [location]. Start fresh session to continue."
4. **Stop working** - do not start any new tasks
This is non-negotiable. Running out of context mid-task is worse than stopping early.
</threshold>
</thresholds>
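Applied to a 200,000-token window, the thresholds map to warnings like:
```
Token usage: 150000/200000; 50000 remaining  → 25% left → getting_full
Token usage: 170000/200000; 30000 remaining  → 15% left → low
Token usage: 180000/200000; 20000 remaining  → 10% left → critical
```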
<what_counts_as_atomic>
An atomic operation is one that shouldn't be interrupted:
**Atomic (finish before stopping)**:
- Writing a single file
- Running a validation command
- Completing a single task from the plan
**Not atomic (can pause between)**:
- Multiple tasks in sequence
- Multi-file changes (can pause between files)
- Research + implementation (can pause between)
When hitting 10% threshold, finish current atomic operation, then stop.
</what_counts_as_atomic>
<handoff_content_at_limit>
When auto-creating handoff at 10%, include:
```yaml
---
phase: [current phase]
task: [current task number]
total_tasks: [total]
status: context_limit_reached
last_updated: [timestamp]
---
```
Body must capture:
1. What was just completed
2. What task was in progress (and how far)
3. What remains
4. Any decisions/context from this session
Be thorough - the next session starts fresh.
</handoff_content_at_limit>
<preventing_context_bloat>
Strategies to extend context life:
**Don't re-read files unnecessarily**
- Read once, remember content
- Don't cat the same file multiple times
**Summarize rather than quote**
- "The schema has 5 models including User and Session"
- Not: [paste entire schema]
**Use targeted reads**
- Read specific functions, not entire files
- Use grep to find relevant sections (see the sketch after this list)
**Clear completed work from "memory"**
- Once a task is done, don't keep referencing it
- Move forward, don't re-explain
**Avoid verbose output**
- Concise responses
- Don't repeat user's question back
- Don't over-explain obvious things
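For example, a targeted search (file paths and symbol names hypothetical) beats re-reading whole files:
```bash
# Locate the relevant definition without loading every file into context
grep -rn "function createUser" src/
```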
</preventing_context_bloat>
<user_signals>
Watch for user signals that suggest context concern:
- "Let's wrap up"
- "Save my place"
- "I need to step away"
- "Pack it up"
- "Create a handoff"
- "Running low on context?"
Any of these → trigger handoff workflow immediately.
</user_signals>
<fresh_session_guidance>
When user returns in fresh session:
1. They invoke skill
2. Context scan finds handoff
3. Resume workflow activates
4. Load handoff, present summary
5. Delete handoff after confirmation
6. Continue from saved state
The fresh session has full context available again.
</fresh_session_guidance>


@@ -0,0 +1,170 @@
# Domain Expertise Structure
Guide for creating domain expertise skills that work efficiently with create-plans.
## Purpose
Domain expertise provides context-specific knowledge (Swift/macOS patterns, Next.js conventions, Unity workflows) that makes plans more accurate and actionable.
**Critical:** Domain skills must be context-efficient. Loading 20k+ tokens of references defeats the purpose.
## File Structure
```
~/.claude/skills/expertise/[domain-name]/
├── SKILL.md # Core principles + references_index (5-7k tokens)
├── references/ # Selective loading based on phase type
│ ├── always-useful.md # Conventions, patterns used in all phases
│ ├── database.md # Database-specific guidance
│ ├── ui-layout.md # UI-specific guidance
│ ├── api-routes.md # API-specific guidance
│ └── ...
└── workflows/ # Optional: domain-specific workflows
└── ...
```
## SKILL.md Template
```markdown
---
name: [domain-name]
description: [What this expertise covers]
---
<principles>
## Core Principles
[Fundamental patterns that apply to ALL work in this domain]
[Should be complete enough to plan without loading references]
Examples:
- File organization patterns
- Naming conventions
- Architecture patterns
- Common gotchas to avoid
- Framework-specific requirements
**Keep this section comprehensive but concise (~3-5k tokens).**
</principles>
<references_index>
## Reference Loading Guide
When planning phases, load references based on phase type:
**For [phase-type-1] phases:**
- references/[file1].md - [What it contains]
- references/[file2].md - [What it contains]
**For [phase-type-2] phases:**
- references/[file3].md - [What it contains]
- references/[file4].md - [What it contains]
**Always useful (load for any phase):**
- references/conventions.md - [What it contains]
- references/common-patterns.md - [What it contains]
**Examples of phase type mapping:**
- Database/persistence phases → database.md, migrations.md
- UI/layout phases → ui-patterns.md, design-system.md
- API/backend phases → api-routes.md, auth.md
- Integration phases → system-apis.md, third-party.md
</references_index>
<workflows>
## Optional Workflows
[If domain has specific workflows, list them here]
[These are NOT auto-loaded - only used when specifically invoked]
</workflows>
```
## Reference File Guidelines
Each reference file should be:
**1. Focused** - Single concern (database patterns, UI layout, API design)
**2. Actionable** - Contains patterns Claude can directly apply
```markdown
# Database Patterns
## Table Naming
- Singular nouns (User, not Users)
- snake_case for SQL, PascalCase for models
## Common Patterns
- Soft deletes: deleted_at timestamp
- Audit columns: created_at, updated_at
- Foreign keys: [table]_id format
```
**3. Sized appropriately** - 500-2000 lines (~1-5k tokens)
- Too small: Not worth separate file
- Too large: Split into more focused files
**4. Self-contained** - Can be understood without reading other references
## Context Efficiency Examples
**Bad (old approach):**
```
Load all references: 10,728 lines = ~27k tokens
Result: 50% context before planning starts
```
**Good (new approach):**
```
Load SKILL.md: ~5k tokens
Planning UI phase → load ui-layout.md + conventions.md: ~7k tokens
Total: ~12k tokens (saves 15k for workspace)
```
## Phase Type Classification
Help create-plans determine which references to load:
**Common phase types:**
- **Foundation/Setup** - Project structure, dependencies, configuration
- **Database/Data** - Schema, models, migrations, queries
- **API/Backend** - Routes, controllers, business logic, auth
- **UI/Frontend** - Components, layouts, styling, interactions
- **Integration** - External APIs, system services, third-party SDKs
- **Features** - Domain-specific functionality
- **Polish** - Performance, accessibility, error handling
**References should map to these types** so create-plans can load the right context.
## Migration Guide
If you have an existing domain skill with many references:
1. **Audit references** - What's actually useful vs. reference dumps?
2. **Consolidate principles** - Move core patterns into SKILL.md principles section
3. **Create references_index** - Map phase types to relevant references
4. **Test loading** - Verify you can plan a phase with <15k token overhead
5. **Iterate** - Adjust groupings based on actual planning needs
## Example: macos-apps
**Before (inefficient):**
- 20 reference files
- Load all: 10,728 lines (~27k tokens)
**After (efficient):**
SKILL.md contains:
- Swift/SwiftUI core principles
- macOS app architecture patterns
- Common patterns (MVVM, data flow)
- references_index mapping:
- UI phases → swiftui-layout.md, appleHIG.md (~4k)
- Data phases → core-data.md, swift-concurrency.md (~5k)
- System phases → appkit-integration.md, menu-bar.md (~3k)
- Always → swift-conventions.md (~2k)
**Result:** 5-12k tokens instead of 27k (saves 15-22k for planning)


@@ -0,0 +1,106 @@
# Git Integration Reference
## Core Principle
**Commit outcomes, not process.**
The git log should read like a changelog of what shipped, not a diary of planning activity.
## Commit Points (Only 3)
| Event | Commit? | Why |
|-------|---------|-----|
| BRIEF + ROADMAP created | YES | Project initialization |
| PLAN.md created | NO | Intermediate - commit with completion |
| RESEARCH.md created | NO | Intermediate |
| FINDINGS.md created | NO | Intermediate |
| **Phase completed** | YES | Actual code shipped |
| Handoff created | YES | WIP state preserved |
## Git Check on Invocation
```bash
git rev-parse --git-dir 2>/dev/null || echo "NO_GIT_REPO"
```
If NO_GIT_REPO:
- Inline: "No git repo found. Initialize one? (Recommended for version control)"
- If yes: `git init`
## Commit Message Formats
### 1. Project Initialization (brief + roadmap together)
```
docs: initialize [project-name] ([N] phases)
[One-liner from BRIEF.md]
Phases:
1. [phase-name]: [goal]
2. [phase-name]: [goal]
3. [phase-name]: [goal]
```
What to commit:
```bash
git add .planning/
git commit
```
### 2. Phase Completion
```
feat([domain]): [one-liner from SUMMARY.md]
- [Key accomplishment 1]
- [Key accomplishment 2]
- [Key accomplishment 3]
[If issues encountered:]
Note: [issue and resolution]
```
Use `fix([domain])` for bug fix phases.
What to commit:
```bash
git add .planning/phases/XX-name/ # PLAN.md + SUMMARY.md
git add src/ # Actual code created
git commit
```
### 3. Handoff (WIP)
```
wip: [phase-name] paused at task [X]/[Y]
Current: [task name]
[If blocked:] Blocked: [reason]
```
What to commit:
```bash
git add .planning/
git commit
```
## Example Clean Git Log
```
a7f2d1 feat(checkout): Stripe payments with webhook verification
b3e9c4 feat(products): catalog with search, filters, and pagination
c8a1b2 feat(auth): JWT with refresh rotation using jose
d5c3d7 feat(foundation): Next.js 15 + Prisma + Tailwind scaffold
e2f4a8 docs: initialize ecommerce-app (5 phases)
```
## What NOT To Commit Separately
- PLAN.md creation (wait for phase completion)
- RESEARCH.md (intermediate)
- FINDINGS.md (intermediate)
- Minor planning tweaks
- "Fixed typo in roadmap"
These create noise. Commit outcomes, not process.


@@ -0,0 +1,142 @@
<overview>
The planning hierarchy ensures context flows down and progress flows up.
Each level builds on the previous and enables the next.
</overview>
<hierarchy>
```
BRIEF.md ← Vision (human-focused)
ROADMAP.md ← Structure (phases)
phases/XX/PLAN.md ← Implementation (Claude-executable)
prompts/ ← Execution (via create-meta-prompts)
```
</hierarchy>
<level name="brief">
**Purpose**: Capture vision, goals, constraints
**Audience**: Human (the user)
**Contains**: What we're building, why, success criteria, out of scope
**Creates**: `.planning/BRIEF.md`
**Requires**: Nothing (can start here)
**Enables**: Roadmap creation
This is the ONLY document optimized for human reading.
</level>
<level name="roadmap">
**Purpose**: Define phases and sequence
**Audience**: Both human and Claude
**Contains**: Phase names, goals, dependencies, progress tracking
**Creates**: `.planning/ROADMAP.md`, `.planning/phases/` directories
**Requires**: Brief (or quick context if skipping)
**Enables**: Phase planning
Roadmap looks UP to Brief for scope, looks DOWN to track phase completion.
</level>
<level name="phase_plan">
**Purpose**: Define Claude-executable tasks
**Audience**: Claude (the implementer)
**Contains**: Tasks with Files/Action/Verification/Done-when
**Creates**: `.planning/phases/XX-name/PLAN.md`
**Requires**: Roadmap (to know phase scope)
**Enables**: Prompt generation, direct execution
Phase plan looks UP to Roadmap for scope, produces implementation details.
</level>
<level name="prompts">
**Purpose**: Optimized execution instructions
**Audience**: Claude (via create-meta-prompts)
**Contains**: Research/Plan/Do prompts with metadata
**Creates**: `.planning/phases/XX-name/prompts/`
**Requires**: Phase plan (tasks to execute)
**Enables**: Autonomous execution
Prompts are generated from phase plan via create-meta-prompts skill.
</level>
<navigation_rules>
<looking_up>
When creating a lower-level artifact, ALWAYS read higher levels for context:
- Creating Roadmap → Read Brief
- Planning Phase → Read Roadmap AND Brief
- Generating Prompts → Read Phase Plan AND Roadmap
This ensures alignment with overall vision.
</looking_up>
<looking_down>
When updating a higher-level artifact, check lower levels for status:
- Updating Roadmap progress → Check which phase PLANs exist, completion state
- Reviewing Brief → See how far we've come via Roadmap
This enables progress tracking.
</looking_down>
<missing_prerequisites>
If a prerequisite doesn't exist:
```
Creating phase plan but no roadmap exists.
Options:
1. Create roadmap first (recommended)
2. Create quick roadmap placeholder
3. Proceed anyway (not recommended - loses hierarchy benefits)
```
Always offer to create missing pieces rather than skipping.
</missing_prerequisites>
</navigation_rules>
<file_locations>
All planning artifacts in `.planning/`:
```
.planning/
├── BRIEF.md # One per project
├── ROADMAP.md # One per project
└── phases/
├── 01-phase-name/
│ ├── PLAN.md # One per phase
│ ├── .continue-here.md # Temporary (when paused)
│ └── prompts/ # Generated execution prompts
├── 02-phase-name/
│ ├── PLAN.md
│ └── prompts/
└── ...
```
Phase directories use `XX-kebab-case` for consistent ordering.
</file_locations>
<scope_inheritance>
Each level inherits and narrows scope:
**Brief**: "Build a task management app"
**Roadmap**: "Phase 1: Core task CRUD, Phase 2: Projects, Phase 3: Collaboration"
**Phase 1 Plan**: "Task 1: Database schema, Task 2: API endpoints, Task 3: UI"
Scope flows DOWN and gets more specific.
Progress flows UP and gets aggregated.
</scope_inheritance>
<cross_phase_context>
When planning Phase N, Claude should understand:
- What Phase N-1 delivered (completed work)
- What Phase N should build on (foundations)
- What Phase N+1 will need (don't paint into corner)
Read previous phase's PLAN.md to understand current state.
</cross_phase_context>


@@ -0,0 +1,495 @@
# Milestone Management & Greenfield/Brownfield Planning
Milestones mark shipped versions. They solve the "what happens after v1.0?" problem.
## The Core Problem
**After shipping v1.0:**
- Planning artifacts optimized for greenfield (starting from scratch)
- But now you have: existing code, users, constraints, shipped features
- Need brownfield awareness without losing planning structure
**Solution:** Milestone-bounded extensions with updated BRIEF.
## Three Planning Modes
### 1. Greenfield (v1.0 Initial Development)
**Characteristics:**
- No existing code
- No users
- No constraints from shipped versions
- Pure "build from scratch" mode
**Planning structure:**
```
.planning/
├── BRIEF.md # Original vision
├── ROADMAP.md # Phases 1-4
└── phases/
├── 01-foundation/
├── 02-features/
├── 03-polish/
└── 04-launch/
```
**BRIEF.md looks like:**
```markdown
# Project Brief: AppName
**Vision:** Build a thing that does X
**Purpose:** Solve problem Y
**Scope:**
- Feature A
- Feature B
- Feature C
**Success:** Ships and works
```
**Workflow:** Normal planning → execution → transition flow
---
### 2. Brownfield Extensions (v1.1, v1.2 - Same Codebase)
**Characteristics:**
- v1.0 shipped and in use
- Adding features / fixing issues
- Same codebase, continuous evolution
- Existing code referenced in new plans
**Planning structure:**
```
.planning/
├── BRIEF.md # Updated with "Current State"
├── ROADMAP.md # Phases 1-6 (grouped by milestone)
├── MILESTONES.md # v1.0 entry
└── phases/
├── 01-foundation/ # ✓ v1.0
├── 02-features/ # ✓ v1.0
├── 03-polish/ # ✓ v1.0
├── 04-launch/ # ✓ v1.0
├── 05-security/ # 🚧 v1.1 (in progress)
└── 06-performance/ # 📋 v1.1 (planned)
```
**BRIEF.md updated:**
```markdown
# Project Brief: AppName
## Current State (Updated: 2025-12-01)
**Shipped:** v1.0 MVP (2025-11-25)
**Users:** 500 downloads, 50 daily actives
**Feedback:** Requesting dark mode, occasional crashes on network errors
**Codebase:** 2,450 lines Swift, macOS 13.0+, AppKit
## v1.1 Goals
**Vision:** Harden reliability and add dark mode based on user feedback
**Motivation:**
- 5 crash reports related to network errors
- 15 users requested dark mode
- Want to improve before marketing push
**Scope (v1.1):**
- Comprehensive error handling
- Dark mode support
- Crash reporting integration
---
<details>
<summary>Original Vision (v1.0 - Archived)</summary>
[Original brief content]
</details>
```
**ROADMAP.md updated:**
```markdown
# Roadmap: AppName
## Milestones
- ✅ **v1.0 MVP** - Phases 1-4 (shipped 2025-11-25)
- 🚧 **v1.1 Hardening** - Phases 5-6 (in progress)
## Phases
<details>
<summary>✅ v1.0 MVP (Phases 1-4) - SHIPPED 2025-11-25</summary>
- [x] Phase 1: Foundation
- [x] Phase 2: Core Features
- [x] Phase 3: Polish
- [x] Phase 4: Launch
</details>
### 🚧 v1.1 Hardening (In Progress)
- [ ] Phase 5: Error Handling & Stability
- [ ] Phase 6: Dark Mode UI
```
**How plans become brownfield-aware:**
When planning Phase 5, the PLAN.md automatically gets context:
```markdown
<context>
@.planning/BRIEF.md # Knows: v1.0 shipped, codebase exists
@.planning/MILESTONES.md # Knows: what v1.0 delivered
@AppName/NetworkManager.swift # Existing code to improve
@AppName/APIClient.swift # Existing code to fix
</context>
<tasks>
<task type="auto">
<name>Add comprehensive error handling to NetworkManager</name>
<files>AppName/NetworkManager.swift</files>
<action>Existing NetworkManager has basic try/catch. Add: retry logic (3 attempts with exponential backoff), specific error types (NetworkError enum), user-friendly error messages. Maintain existing public API - internal improvements only.</action>
<verify>Build succeeds, existing tests pass, new error tests pass</verify>
<done>All network calls have retry logic, error messages are user-friendly</done>
</task>
```
**Key difference from greenfield:**
- PLAN references existing files in `<context>`
- Tasks say "update existing X" not "create X"
- Verify includes "existing tests pass" (regression check)
- Checkpoints may verify existing behavior still works
---
### 3. Major Iterations (v2.0+ - Still Same Codebase)
**Characteristics:**
- Large rewrites within same codebase
- 8-15+ phases planned
- Breaking changes, new architecture
- Still continuous from v1.x
**Planning structure:**
```
.planning/
├── BRIEF.md # Updated for v2.0 vision
├── ROADMAP.md # Phases 1-14 (grouped)
├── MILESTONES.md # v1.0, v1.1 entries
└── phases/
├── 01-foundation/ # ✓ v1.0
├── 02-features/ # ✓ v1.0
├── 03-polish/ # ✓ v1.0
├── 04-launch/ # ✓ v1.0
├── 05-security/ # ✓ v1.1
├── 06-performance/ # ✓ v1.1
├── 07-swiftui-core/ # 🚧 v2.0 (in progress)
├── 08-swiftui-views/ # 📋 v2.0 (planned)
├── 09-new-arch/ # 📋 v2.0
└── ... # Up to 14
```
**ROADMAP.md:**
```markdown
## Milestones
- ✅ **v1.0 MVP** - Phases 1-4 (shipped 2025-11-25)
- ✅ **v1.1 Hardening** - Phases 5-6 (shipped 2025-12-10)
- 🚧 **v2.0 SwiftUI Redesign** - Phases 7-14 (in progress)
## Phases
<details>
<summary>✅ v1.0 MVP (Phases 1-4)</summary>
[Collapsed]
</details>
<details>
<summary>✅ v1.1 Hardening (Phases 5-6)</summary>
[Collapsed]
</details>
### 🚧 v2.0 SwiftUI Redesign (In Progress)
- [ ] Phase 7: SwiftUI Core Migration
- [ ] Phase 8: SwiftUI Views
- [ ] Phase 9: New Architecture
- [ ] Phase 10: Widget Support
- [ ] Phase 11: iOS Companion
- [ ] Phase 12: Performance
- [ ] Phase 13: Testing
- [ ] Phase 14: Launch
```
**Same rules apply:** Continuous phase numbering, milestone groupings, brownfield-aware plans.
---
## When to Archive and Start Fresh
**Archive ONLY for these scenarios:**
### Scenario 1: Separate Codebase
**Example:**
- Built: WeatherBar (macOS app) ✓ shipped
- Now building: WeatherBar-iOS (separate Xcode project, different repo or workspace)
**Action:**
```
.planning/
├── archive/
│ └── v1-macos/
│ ├── BRIEF.md
│ ├── ROADMAP.md
│ ├── MILESTONES.md
│ └── phases/
├── BRIEF.md # Fresh: iOS app
├── ROADMAP.md # Fresh: starts at phase 01
└── phases/
└── 01-ios-foundation/
```
**Why:** Different codebase = different planning context. Old planning doesn't help with iOS-specific decisions.
### Scenario 2: Complete Rewrite (Different Repo)
**Example:**
- Built: AppName v1 (AppKit, shipped) ✓
- Now building: AppName v2 (complete SwiftUI rewrite, new git repo)
**Action:** Same as Scenario 1 - archive v1, fresh planning for v2
**Why:** New repo, starting from scratch, v1 planning doesn't transfer.
### Scenario 3: Different Product
**Example:**
- Built: WeatherBar (weather app) ✓
- Now building: TaskBar (task management app)
**Action:** New project entirely, new `.planning/` directory
**Why:** Completely different product, no relationship.
---
## Decision Tree
```
Starting new work?
│
├─ Same codebase/repo?
│  │
│  ├─ YES → Extend existing roadmap
│  │        ├─ Add phases 5-6+ to ROADMAP
│  │        ├─ Update BRIEF "Current State"
│  │        ├─ Plans reference existing code in @context
│  │        └─ Continue normal workflow
│  │
│  └─ NO → Is it a separate platform/codebase for same product?
│     │
│     ├─ YES (e.g., iOS version of Mac app)
│     │  └─ Archive existing planning
│     │     └─ Start fresh with new BRIEF/ROADMAP
│     │        └─ Reference original in "Context" section
│     │
│     └─ NO (completely different product)
│        └─ New project, new planning directory
│
└─ Is this v1.0 initial delivery?
   └─ YES → Greenfield mode
      └─ Just follow normal workflow
```
---
## Milestone Workflow Triggers
### When completing v1.0 (first ship):
**User:** "I'm ready to ship v1.0"
**Action:**
1. Verify phases 1-4 complete (all summaries exist)
2. `/milestone:complete "v1.0 MVP"`
3. Creates MILESTONES.md entry (example below)
4. Updates BRIEF with "Current State"
5. Reorganizes ROADMAP with milestone grouping
6. Git tag v1.0
7. Commit milestone changes
**Result:** Historical record created, ready for v1.1 work
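For reference, the MILESTONES.md entry from step 3 might look like this (structure illustrative; entries are prepended, newest first):
```markdown
## v1.0 MVP (Shipped: 2025-11-25)

**Phases:** 1-4 (foundation, features, polish, launch)
**Delivered:** Menu bar weather app - current conditions, 30-min
auto-refresh, signed and notarized build
**Tag:** v1.0
```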
### When adding v1.1 work:
**User:** "Add dark mode and notifications"
**Action:**
1. Check BRIEF "Current State" - sees v1.0 shipped
2. Ask: "Add phases 5-6 to existing roadmap? (yes / archive and start fresh)"
3. User: "yes"
4. Update BRIEF with v1.1 goals
5. Add Phase 5-6 to ROADMAP under "v1.1" milestone heading
6. Continue normal planning workflow
**Result:** Phases 5-6 added, brownfield-aware through updated BRIEF
### When completing v1.1:
**User:** "Ship v1.1"
**Action:**
1. Verify phases 5-6 complete
2. `/milestone:complete "v1.1 Dark Mode"`
3. Add v1.1 entry to MILESTONES.md (prepended, newest first)
4. Update BRIEF current state to v1.1
5. Collapse phases 5-6 in ROADMAP
6. Git tag v1.1
**Result:** v1.0 and v1.1 both in MILESTONES.md, ROADMAP shows history
---
## Brownfield Plan Patterns
**How a brownfield plan differs from greenfield:**
### Greenfield Plan (v1.0):
```markdown
<objective>
Create authentication system from scratch.
</objective>
<context>
@.planning/BRIEF.md
@.planning/ROADMAP.md
</context>
<tasks>
<task type="auto">
<name>Create User model</name>
<files>src/models/User.ts</files>
<action>Create User interface with id, email, passwordHash, createdAt fields. Export from models/index.</action>
<verify>TypeScript compiles, User type exported</verify>
<done>User model exists and is importable</done>
</task>
```
### Brownfield Plan (v1.1):
```markdown
<objective>
Add MFA to existing authentication system.
</objective>
<context>
@.planning/BRIEF.md # Shows v1.0 shipped, auth exists
@.planning/MILESTONES.md # Shows what v1.0 delivered
@src/models/User.ts # Existing User model
@src/auth/AuthService.ts # Existing auth logic
</context>
<tasks>
<task type="auto">
<name>Add MFA fields to User model</name>
<files>src/models/User.ts</files>
<action>Add to existing User interface: mfaEnabled (boolean), mfaSecret (string | null), mfaBackupCodes (string[]). Maintain backward compatibility - all new fields optional or have defaults.</action>
<verify>TypeScript compiles, existing User usages still work</verify>
<done>User model has MFA fields, no breaking changes</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>MFA enrollment flow</what-built>
<how-to-verify>
1. Run: npm run dev
2. Login as existing user (test@example.com)
3. Navigate to Settings → Security
4. Click "Enable MFA" - should show QR code
5. Scan with authenticator app (Google Authenticator)
6. Enter code - should enable successfully
7. Logout, login again - should prompt for MFA code
8. Verify: existing users without MFA can still login (backward compat)
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
**Key differences:**
1. **@context** includes existing code files
2. **Actions** say "add to existing" / "update existing" / "maintain backward compat"
3. **Verification** includes regression checks ("existing X still works")
4. **Checkpoints** may verify existing user flows still work
---
## BRIEF Current State Section
The "Current State" section in BRIEF.md is what makes plans brownfield-aware.
**After v1.0 ships:**
```markdown
## Current State (Updated: 2025-11-25)
**Shipped:** v1.0 MVP (2025-11-25)
**Status:** Production
**Users:** 500 downloads, 50 daily actives, growing 10% weekly
**Feedback:**
- "Love the simplicity" (common theme)
- 15 requests for dark mode
- 5 crash reports on network errors
- 3 requests for multiple accounts
**Codebase:**
- 2,450 lines of Swift
- macOS 13.0+ (AppKit)
- OpenWeather API integration
- Auto-refresh every 30 min
- Signed and notarized
**Known Issues:**
- Network errors crash app (no retry logic)
- Memory leak in auto-refresh timer
- No dark mode support
```
When planning Phase 5 (v1.1), Claude reads this and knows:
- Code exists (2,450 lines Swift)
- Users exist (500 downloads)
- Feedback exists (15 want dark mode)
- Issues exist (network crashes, memory leak)
Plans automatically become brownfield-aware because BRIEF says "this is what we have."
---
## Summary
**Greenfield (v1.0):**
- Fresh BRIEF with vision
- Phases 1-4 (or however many)
- Plans create from scratch
- Ship → complete milestone
**Brownfield (v1.1+):**
- Update BRIEF "Current State"
- Add phases 5-6+ to ROADMAP
- Plans reference existing code
- Plans include regression checks
- Ship → complete milestone
**Archive (rare):**
- Only for separate codebases or different products
- Move `.planning/` to `.planning/archive/v1-name/`
- Start fresh with new BRIEF/ROADMAP
- New planning references old in context
**Key insight:** Same roadmap, continuous phase numbering (01-99), milestone groupings keep it organized. BRIEF "Current State" makes everything brownfield-aware automatically.
This scales from "hello world" to 100 shipped versions.

View File

@@ -0,0 +1,377 @@
<overview>
Claude-executable plans have a specific format that enables Claude to implement without interpretation. This reference defines what makes a plan executable vs. vague.
**Key insight:** PLAN.md IS the executable prompt. It contains everything Claude needs to execute the phase, including objective, context references, tasks, verification, success criteria, and output specification.
</overview>
<core_principle>
A plan is Claude-executable when Claude can read the PLAN.md and immediately start implementing without asking clarifying questions.
If Claude has to guess, interpret, or make assumptions - the task is too vague.
</core_principle>
<prompt_structure>
Every PLAN.md follows this XML structure:
```markdown
---
phase: XX-name
type: execute
domain: [optional]
---
<objective>
[What and why]
Purpose: [...]
Output: [...]
</objective>
<context>
@.planning/BRIEF.md
@.planning/ROADMAP.md
@relevant/source/files.ts
</context>
<tasks>
<task type="auto">
<name>Task N: [Name]</name>
<files>[paths]</files>
<action>[what to do, what to avoid and WHY]</action>
<verify>[command/check]</verify>
<done>[criteria]</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[what Claude automated]</what-built>
<how-to-verify>[numbered verification steps]</how-to-verify>
<resume-signal>[how to continue - "approved" or describe issues]</resume-signal>
</task>
<task type="checkpoint:decision" gate="blocking">
<decision>[what needs deciding]</decision>
<context>[why this matters]</context>
<options>
<option id="option-a"><name>[Name]</name><pros>[pros]</pros><cons>[cons]</cons></option>
<option id="option-b"><name>[Name]</name><pros>[pros]</pros><cons>[cons]</cons></option>
</options>
<resume-signal>[how to indicate choice]</resume-signal>
</task>
</tasks>
<verification>
[Overall phase checks]
</verification>
<success_criteria>
[Measurable completion]
</success_criteria>
<output>
[SUMMARY.md specification]
</output>
```
</prompt_structure>
<task_anatomy>
Every task has four required fields:
<field name="files">
**What it is**: Exact file paths that will be created or modified.
**Good**: `src/app/api/auth/login/route.ts`, `prisma/schema.prisma`
**Bad**: "the auth files", "relevant components"
Be specific. If you don't know the file path, figure it out first.
</field>
<field name="action">
**What it is**: Specific implementation instructions, including what to avoid and WHY.
**Good**: "Create POST endpoint that accepts {email, password}, validates using bcrypt against User table, returns JWT in httpOnly cookie with 15-min expiry. Use jose library (not jsonwebtoken - CommonJS issues with Next.js Edge runtime)."
**Bad**: "Add authentication", "Make login work"
Include: technology choices, data structures, behavior details, pitfalls to avoid.
</field>
<field name="verify">
**What it is**: How to prove the task is complete.
**Good**:
- `npm test` passes
- `curl -X POST /api/auth/login` returns 200 with Set-Cookie header
- Build completes without errors
**Bad**: "It works", "Looks good", "User can log in"
Must be executable - a command, a test, an observable behavior.
</field>
<field name="done">
**What it is**: Acceptance criteria - the measurable state of completion.
**Good**: "Valid credentials return 200 + JWT cookie, invalid credentials return 401"
**Bad**: "Authentication is complete"
Should be testable without subjective judgment.
</field>
</task_anatomy>
<task_types>
Tasks have a `type` attribute that determines how they execute:
<type name="auto">
**Default task type** - Claude executes autonomously.
**Structure:**
```xml
<task type="auto">
<name>Task 3: Create login endpoint with JWT</name>
<files>src/app/api/auth/login/route.ts</files>
<action>POST endpoint accepting {email, password}. Query User by email, compare password with bcrypt. On match, create JWT with jose library, set as httpOnly cookie (15-min expiry). Return 200. On mismatch, return 401.</action>
<verify>curl -X POST localhost:3000/api/auth/login returns 200 with Set-Cookie header</verify>
<done>Valid credentials → 200 + cookie. Invalid → 401.</done>
</task>
```
Use for: Everything Claude can do independently (code, tests, builds, file operations).
</type>
<type name="checkpoint:human-action">
**RARELY USED** - Only for actions with NO CLI/API. Claude automates everything possible first.
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[Unavoidable manual step - email link, 2FA code]</action>
<instructions>
[What Claude already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What Claude can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
Use ONLY for: Email verification links, SMS 2FA codes, manual approvals with no API, 3D Secure payment flows.
Do NOT use for: Anything with a CLI (Vercel, Stripe, Upstash, Railway, GitHub), builds, tests, file creation, deployments.
See: references/cli-automation.md for what Claude can automate.
**Execution:** Claude automates everything with CLI/API, stops only for truly unavoidable manual steps.
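A concrete instance (illustrative - the service and verification call are examples, not a prescribed integration):
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>Click the sender verification link emailed to admin@example.com</action>
<instructions>
Claude already created the sender identity via the SendGrid API.
The ONE remaining step: SendGrid requires clicking the link in the
verification email - no API exists for this.
</instructions>
<verification>Query the SendGrid API and confirm sender status is "verified"</verification>
<resume-signal>Type "done" after clicking the link</resume-signal>
</task>
```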
</type>
<type name="checkpoint:human-verify">
**Human must verify Claude's work** - Visual checks, UX testing.
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Desktop (>1024px): Verify sidebar left, content right
4. Tablet (768px): Verify sidebar collapses to hamburger
5. Mobile (375px): Verify single column, bottom nav
6. Check: No layout shift, no horizontal scroll
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
Use for: UI/UX verification, visual design checks, animation smoothness, accessibility testing.
**Execution:** Claude builds the feature, stops, provides testing instructions, waits for approval/feedback.
</type>
<type name="checkpoint:decision">
**Human must make implementation choice** - Direction-setting decisions.
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>We need user authentication. Three approaches with different tradeoffs:</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase, generous free tier</pros>
<cons>Less customizable UI, tied to ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best DX</pros>
<cons>Paid after 10k MAU</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control</pros>
<cons>More setup, you manage security</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
Use for: Technology selection, architecture decisions, design choices, feature prioritization.
**Execution:** Claude presents options with balanced pros/cons, waits for decision, proceeds with chosen direction.
</type>
**When to use checkpoints:**
- Visual/UX verification (after Claude builds) → `checkpoint:human-verify`
- Implementation direction choice → `checkpoint:decision`
- Truly unavoidable manual actions (email links, 2FA) → `checkpoint:human-action` (rare)
**When NOT to use checkpoints:**
- Anything with CLI/API (Claude automates it) → `type="auto"`
- Deployments (Vercel, Railway, Fly) → `type="auto"` with CLI
- Creating resources (Upstash, Stripe, GitHub) → `type="auto"` with CLI/API
- File operations, tests, builds → `type="auto"`
**Golden rule:** If Claude CAN automate it, Claude MUST automate it. See: references/cli-automation.md
See `references/checkpoints.md` for comprehensive checkpoint guidance.
</task_types>
<context_references>
Use @file references to load context for the prompt:
```markdown
<context>
@.planning/BRIEF.md # Project vision
@.planning/ROADMAP.md # Phase structure
@.planning/phases/02-auth/FINDINGS.md # Research results
@src/lib/db.ts # Existing database setup
@src/types/user.ts # Existing type definitions
</context>
```
Reference files that Claude needs to understand before implementing.
</context_references>
<verification_section>
Overall phase verification (beyond individual task verification):
```markdown
<verification>
Before declaring phase complete:
- [ ] `npm run build` succeeds without errors
- [ ] `npm test` passes all tests
- [ ] No TypeScript errors
- [ ] Feature works end-to-end manually
</verification>
```
</verification_section>
<success_criteria_section>
Measurable criteria for phase completion:
```markdown
<success_criteria>
- All tasks completed
- All verification checks pass
- No errors or warnings introduced
- JWT auth flow works end-to-end
- Protected routes redirect unauthenticated users
</success_criteria>
```
</success_criteria_section>
<output_section>
Specify the SUMMARY.md structure:
```markdown
<output>
After completion, create `.planning/phases/XX-name/SUMMARY.md`:
# Phase X: Name Summary
**[Substantive one-liner]**
## Accomplishments
## Files Created/Modified
## Decisions Made
## Issues Encountered
## Next Phase Readiness
</output>
```
</output_section>
<specificity_levels>
<too_vague>
```xml
<task type="auto">
<name>Task 1: Add authentication</name>
<files>???</files>
<action>Implement auth</action>
<verify>???</verify>
<done>Users can authenticate</done>
</task>
```
Claude: "How? What type? What library? Where?"
</too_vague>
<just_right>
```xml
<task type="auto">
<name>Task 1: Create login endpoint with JWT</name>
<files>src/app/api/auth/login/route.ts</files>
<action>POST endpoint accepting {email, password}. Query User by email, compare password with bcrypt. On match, create JWT with jose library, set as httpOnly cookie (15-min expiry). Return 200. On mismatch, return 401. Use jose instead of jsonwebtoken (CommonJS issues with Edge).</action>
<verify>curl -X POST localhost:3000/api/auth/login -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"test123"}' returns 200 with Set-Cookie header containing JWT</verify>
<done>Valid credentials → 200 + cookie. Invalid → 401. Missing fields → 400.</done>
</task>
```
Claude can implement this immediately.
</just_right>
<too_detailed>
Writing the actual code in the plan. Trust Claude to implement from clear instructions.
</too_detailed>
</specificity_levels>
<anti_patterns>
<vague_actions>
- "Set up the infrastructure"
- "Handle edge cases"
- "Make it production-ready"
- "Add proper error handling"
These require Claude to decide WHAT to do. Specify it.
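One illustrative rewrite, turning a vague action into a specific one (the retry policy and UX details are examples):
```xml
<!-- Vague -->
<action>Add proper error handling</action>

<!-- Specific -->
<action>Wrap the weather API fetch in try/catch. On network failure, retry twice with exponential backoff (1s, 2s). If still failing, render the last cached data with a "stale data" banner instead of crashing. Log the failure; never throw to the UI.</action>
```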
</vague_actions>
<unverifiable_completion>
- "It works correctly"
- "User experience is good"
- "Code is clean"
- "Tests pass" (which tests? do they exist?)
These require subjective judgment. Make it objective.
</unverifiable_completion>
<missing_context>
- "Use the standard approach"
- "Follow best practices"
- "Like the other endpoints"
Claude doesn't know your standards. Be explicit.
</missing_context>
</anti_patterns>
<sizing_tasks>
Good task size: 15-60 minutes of Claude work.
**Too small**: "Add import statement for bcrypt" (combine with related task)
**Just right**: "Create login endpoint with JWT validation" (focused, specific)
**Too big**: "Implement full authentication system" (split into multiple plans)
If a task takes multiple sessions, break it down.
If a task is trivial, combine with related tasks.
**Note on scope:** If a phase has more than 3 tasks or spans multiple subsystems, split it into multiple plans using the naming convention `{phase}-{plan}-PLAN.md`. See `references/scope-estimation.md` for guidance.
</sizing_tasks>

View File

@@ -0,0 +1,198 @@
# Research Pitfalls - Known Patterns to Avoid
## Purpose
This document catalogs research mistakes discovered in production use, providing specific patterns to avoid and verification strategies to prevent recurrence.
## Known Pitfalls
### Pitfall 1: Configuration Scope Assumptions
**What**: Assuming global configuration means no project-scoping exists
**Example**: Concluding "MCP servers are configured GLOBALLY only" while missing project-scoped `.mcp.json`
**Why it happens**: Not explicitly checking all known configuration patterns
**Prevention**:
```xml
<verification_checklist>
**CRITICAL**: Verify ALL configuration scopes:
□ User/global scope - System-wide configuration
□ Project scope - Project-level configuration files
□ Local scope - Project-specific user overrides
□ Workspace scope - IDE/tool workspace settings
□ Environment scope - Environment variables
</verification_checklist>
```
### Pitfall 2: "Search for X" Vagueness
**What**: Asking researchers to "search for documentation" without specifying where
**Example**: "Research MCP documentation" → finds outdated community blog instead of official docs
**Why it happens**: Vague research instructions don't specify exact sources
**Prevention**:
```xml
<sources>
Official sources (use WebFetch):
- https://exact-url-to-official-docs
- https://exact-url-to-api-reference
Search queries (use WebSearch):
- "specific search query {current_year}"
- "another specific query {current_year}"
</sources>
```
### Pitfall 3: Deprecated vs Current Features
**What**: Finding archived/old documentation and concluding feature doesn't exist
**Example**: Finding 2022 docs saying "feature not supported" when current version added it
**Why it happens**: Not checking multiple sources or recent updates
**Prevention**:
```xml
<verification_checklist>
□ Check current official documentation
□ Review changelog/release notes for recent updates
□ Verify version numbers and publication dates
□ Cross-reference multiple authoritative sources
</verification_checklist>
```
### Pitfall 4: Tool-Specific Variations
**What**: Conflating capabilities across different tools/environments
**Example**: "Claude Desktop supports X" ≠ "Claude Code supports X"
**Why it happens**: Not explicitly checking each environment separately
**Prevention**:
```xml
<verification_checklist>
□ Claude Desktop capabilities
□ Claude Code capabilities
□ VS Code extension capabilities
□ API/SDK capabilities
Document which environment supports which features
</verification_checklist>
```
### Pitfall 5: Confident Negative Claims Without Citations
**What**: Making definitive "X is not possible" statements without official source verification
**Example**: "Folder-scoped MCP configuration is not supported" (missing `.mcp.json`)
**Why it happens**: Drawing conclusions from absence of evidence rather than evidence of absence
**Prevention**:
```xml
<critical_claims_audit>
For any "X is not possible" or "Y is the only way" statement:
- [ ] Is this verified by official documentation stating it explicitly?
- [ ] Have I checked for recent updates that might change this?
- [ ] Have I verified all possible approaches/mechanisms?
- [ ] Am I confusing "I didn't find it" with "it doesn't exist"?
</critical_claims_audit>
```
### Pitfall 6: Missing Enumeration
**What**: Investigating open-ended scope without enumerating known possibilities first
**Example**: "Research configuration options" instead of listing specific options to verify
**Why it happens**: Not creating explicit checklist of items to investigate
**Prevention**:
```xml
<verification_checklist>
Enumerate ALL known options FIRST:
□ Option 1: [specific item]
□ Option 2: [specific item]
□ Option 3: [specific item]
□ Check for additional unlisted options
For each option above, document:
- Existence (confirmed/not found/unclear)
- Official source URL
- Current status (active/deprecated/beta)
</verification_checklist>
```
### Pitfall 7: Single-Source Verification
**What**: Relying on a single source for critical claims
**Example**: Using only Stack Overflow answer from 2021 for current best practices
**Why it happens**: Not cross-referencing multiple authoritative sources
**Prevention**:
```xml
<source_verification>
For critical claims, require multiple sources:
- [ ] Official documentation (primary)
- [ ] Release notes/changelog (for currency)
- [ ] Additional authoritative source (for verification)
- [ ] Contradiction check (ensure sources agree)
</source_verification>
```
### Pitfall 8: Assumed Completeness
**What**: Assuming search results are complete and authoritative
**Example**: First Google result is outdated but assumed current
**Why it happens**: Not verifying publication dates and source authority
**Prevention**:
```xml
<source_verification>
For each source consulted:
- [ ] Publication/update date verified (prefer recent/current)
- [ ] Source authority confirmed (official docs, not blogs)
- [ ] Version relevance checked (matches current version)
- [ ] Multiple search queries tried (not just one)
</source_verification>
```
## Red Flags in Research Outputs
### 🚩 Red Flag 1: Zero "Not Found" Results
**Warning**: Every investigation succeeds perfectly
**Problem**: Real research encounters dead ends, ambiguity, and unknowns
**Action**: Expect honest reporting of limitations, contradictions, and gaps
### 🚩 Red Flag 2: No Confidence Indicators
**Warning**: All findings presented as equally certain
**Problem**: Can't distinguish verified facts from educated guesses
**Action**: Require confidence levels (High/Medium/Low) for key findings
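A findings entry with honest confidence annotation might look like (format illustrative):
```markdown
- Project-scoped MCP servers via `.mcp.json`: CONFIRMED (Confidence: High)
  Source: https://exact-url-to-official-docs
- Per-directory server overrides: NOT FOUND in official docs (Confidence: Low)
  Checked: configuration reference, changelog, release notes
```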
### 🚩 Red Flag 3: Missing URLs
**Warning**: "According to documentation..." without specific URL
**Problem**: Can't verify claims or check for updates
**Action**: Require actual URLs for all official documentation claims
### 🚩 Red Flag 4: Definitive Statements Without Evidence
**Warning**: "X cannot do Y" or "Z is the only way" without citation
**Problem**: Strong claims require strong evidence
**Action**: Flag for verification against official sources
### 🚩 Red Flag 5: Incomplete Enumeration
**Warning**: Verification checklist lists 4 items, output covers 2
**Problem**: Systematic gaps in coverage
**Action**: Ensure all enumerated items addressed or marked "not found"
## Continuous Improvement
When research gaps occur:
1. **Document the gap**
- What was missed or incorrect?
- What was the actual correct information?
- What was the impact?
2. **Root cause analysis**
- Why wasn't it caught?
- Which verification step would have prevented it?
- What pattern does this reveal?
3. **Update this document**
- Add new pitfall entry (skeleton below)
- Update relevant checklists
- Share lesson learned
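A skeleton for new entries, mirroring the format above:
```markdown
### Pitfall N: [Short name]
**What**: [The mistake pattern]
**Example**: [Concrete instance where it occurred]
**Why it happens**: [Root cause]
**Prevention**: [Checklist or verification step that would have caught it]
```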
## Quick Reference Checklist
Before submitting research, verify:
- [ ] All enumerated items investigated (not just some)
- [ ] Negative claims verified with official docs
- [ ] Multiple sources cross-referenced for critical claims
- [ ] URLs provided for all official documentation
- [ ] Publication dates checked (prefer recent/current)
- [ ] Tool/environment-specific variations documented
- [ ] Confidence levels assigned honestly
- [ ] Assumptions distinguished from verified facts
- [ ] "What might I have missed?" review completed
---
**Living Document**: Update after each significant research gap
**Lessons From**: MCP configuration research gap (missed `.mcp.json`)

View File

@@ -0,0 +1,415 @@
# Scope Estimation & Quality-Driven Plan Splitting
Plans must maintain consistent quality from first task to last. This requires understanding the **quality degradation curve** and splitting aggressively to stay in the peak quality zone.
## The Quality Degradation Curve
**Critical insight:** Claude doesn't degrade at arbitrary percentages - it degrades when it *perceives* context pressure and enters "completion mode."
```
Context Usage │ Quality Level │ Claude's Mental State
─────────────────────────────────────────────────────────
0-30% │ ████████ PEAK │ "I can be thorough and comprehensive"
│ │ No anxiety, full detail, best work
30-50% │ ██████ GOOD │ "Still have room, maintaining quality"
│ │ Engaged, confident, solid work
50-70% │ ███ DEGRADING │ "Getting tight, need to be efficient"
│ │ Efficiency mode, compression begins
70%+ │ █ POOR │ "Running out, must finish quickly"
│ │ Self-lobotomization, rushed, minimal
```
**The 40-50% inflection point:**
This is where quality breaks. Claude sees context mounting and thinks "I'd better conserve now or I won't finish." Result: The classic mid-execution statement "I'll complete the remaining tasks more concisely" = quality crash.
**The fundamental rule:** Stop BEFORE quality degrades, not at context limit.
## Target: 50% Context Maximum
**Plans should complete within ~50% of context usage.**
Why 50% not 80%?
- Huge safety buffer
- No context anxiety possible
- Quality maintained from start to finish
- Room for unexpected complexity
- Space for iteration and fixes
**If you target 80%, you're planning for failure.** By the time you hit 80%, you've already spent 40% in degradation mode.
## The 2-3 Task Rule
**Each plan should contain 2-3 tasks maximum.**
Why this number?
**Task 1 (0-15% context):**
- Fresh context
- Peak quality
- Comprehensive implementation
- Full testing
- Complete documentation
**Task 2 (15-35% context):**
- Still in peak zone
- Quality maintained
- Buffer feels safe
- No anxiety
**Task 3 (35-50% context):**
- Beginning to feel pressure
- Quality still good but managing it
- Natural stopping point
- Better to commit here
**Task 4+ (50%+ context):**
- DEGRADATION ZONE
- "I'll do this concisely" appears
- Quality crashes
- Should have split before this
**The principle:** Each task is independently committable. 2-3 focused changes per commit creates beautiful, surgical git history.
## Signals to Split Into Multiple Plans
### Always Split If:
**1. More than 3 tasks**
- Even if tasks seem small
- Each additional task increases degradation risk
- Split into logical groups of 2-3
**2. Multiple subsystems**
```
❌ Bad (1 plan):
- Database schema (3 files)
- API routes (5 files)
- UI components (8 files)
Total: 16 files, 1 plan → guaranteed degradation
✅ Good (3 plans):
- 01-01-PLAN.md: Database schema (3 files, 2 tasks)
- 01-02-PLAN.md: API routes (5 files, 3 tasks)
- 01-03-PLAN.md: UI components (8 files, 3 tasks)
Total: 16 files, 3 plans → consistent quality
```
**3. Any task with >5 file modifications**
- Large tasks burn context fast
- Split by file groups or logical units
- Better: 3 plans of 2 files each vs 1 plan of 6 files
**4. Checkpoint + implementation work**
- Checkpoints require user interaction (context preserved)
- Implementation after checkpoint should be separate plan
```
✅ Good split:
- 02-01-PLAN.md: Setup (checkpoint: decision on auth provider)
- 02-02-PLAN.md: Implement chosen auth solution
```
**5. Research + implementation**
- Research produces FINDINGS.md (separate plan)
- Implementation consumes FINDINGS.md (separate plan)
- Clear boundary, clean handoff
### Consider Splitting If:
**1. Estimated >5 files modified total**
- Context from reading existing code
- Context from diffs
- Context from responses
- Adds up faster than expected
**2. Complex domains (auth, payments, data modeling)**
- These require careful thinking
- Burns more context per task than simple CRUD
- Split more aggressively
**3. Any uncertainty about approach**
- "Figure out X" phase separate from "implement X" phase
- Don't mix exploration and implementation
**4. Natural semantic boundaries**
- Setup → Core → Features
- Backend → Frontend
- Configuration → Implementation → Testing
## Splitting Strategies
### By Subsystem
**Phase:** "Authentication System"
**Split:**
```
- 03-01-PLAN.md: Database models (User, Session tables + relations)
- 03-02-PLAN.md: Auth API (register, login, logout endpoints)
- 03-03-PLAN.md: Protected routes (middleware, JWT validation)
- 03-04-PLAN.md: UI components (login form, registration form)
```
Each plan: 2-3 tasks, single subsystem, clean commits.
### By Dependency
**Phase:** "Payment Integration"
**Split:**
```
- 04-01-PLAN.md: Stripe setup (webhook endpoints via API, env vars, test mode)
- 04-02-PLAN.md: Subscription logic (plans, checkout, customer portal)
- 04-03-PLAN.md: Frontend integration (pricing page, payment flow)
```
Later plans depend on earlier completion. Sequential execution, fresh context each time.
### By Complexity
**Phase:** "Dashboard Buildout"
**Split:**
```
- 05-01-PLAN.md: Layout shell (simple: sidebar, header, routing)
- 05-02-PLAN.md: Data fetching (moderate: TanStack Query setup, API integration)
- 05-03-PLAN.md: Data visualization (complex: charts, tables, real-time updates)
```
Complex work gets its own plan with full context budget.
### By Verification Points
**Phase:** "Deployment Pipeline"
**Split:**
```
- 06-01-PLAN.md: Vercel setup (deploy via CLI, configure domains)
→ Ends with checkpoint:human-verify "check xyz.vercel.app loads"
- 06-02-PLAN.md: Environment config (secrets via CLI, env vars)
→ Autonomous (no checkpoints) → subagent execution
- 06-03-PLAN.md: CI/CD (GitHub Actions, preview deploys)
→ Ends with checkpoint:human-verify "check PR preview works"
```
Verification checkpoints create natural boundaries. Autonomous plans between checkpoints execute via subagent with fresh context.
## Autonomous vs Interactive Plans
**Critical optimization:** Plans without checkpoints don't need main context.
### Autonomous Plans (No Checkpoints)
- Contains only `type="auto"` tasks
- No user interaction needed
- **Execute via subagent with fresh 200k context**
- Impossible to degrade (always starts at 0%)
- Creates SUMMARY, commits, reports back
- Can run in parallel (multiple subagents)
### Interactive Plans (Has Checkpoints)
- Contains `checkpoint:human-verify` or `checkpoint:decision` tasks
- Requires user interaction
- Must execute in main context
- Still target 50% context (2-3 tasks)
**Planning guidance:** If splitting a phase, try to:
- Group autonomous work together (→ subagent)
- Separate interactive work (→ main context)
- Maximize autonomous plans (more fresh contexts)
Example:
```
Phase: Feature X
- 07-01-PLAN.md: Backend (autonomous) → subagent
- 07-02-PLAN.md: Frontend (autonomous) → subagent
- 07-03-PLAN.md: Integration test (has checkpoint:human-verify) → main context
```
Two fresh contexts, one interactive verification. Perfect.
## Anti-Patterns
### ❌ The "Comprehensive Plan" Anti-Pattern
```
Plan: "Complete Authentication System"
Tasks:
1. Database models
2. Migration files
3. Auth API endpoints
4. JWT utilities
5. Protected route middleware
6. Password hashing
7. Login form component
8. Registration form component
Result: 8 tasks, 80%+ context, degradation at task 4-5
```
**Why this fails:**
- Task 1-3: Good quality
- Task 4-5: "I'll do these concisely" = degradation begins
- Task 6-8: Rushed, minimal, poor quality
### ✅ The "Atomic Plan" Pattern
```
Split into 4 plans:
Plan 1: "Auth Database Models" (2 tasks)
- Database schema (User, Session)
- Migration files
Plan 2: "Auth API Core" (3 tasks)
- Register endpoint
- Login endpoint
- JWT utilities
Plan 3: "Auth API Protection" (2 tasks)
- Protected route middleware
- Logout endpoint
Plan 4: "Auth UI Components" (2 tasks)
- Login form
- Registration form
```
**Why this succeeds:**
- Each plan: 2-3 tasks, 30-40% context
- All tasks: Peak quality throughout
- Git history: 4 focused commits
- Easy to verify each piece
- Rollback is surgical
### ❌ The "Efficiency Trap" Anti-Pattern
```
Thinking: "These tasks are small, let's do 6 to be efficient"
Result: Task 1-2 are good, task 3-4 begin degrading, task 5-6 are rushed
```
**Why this fails:** You're optimizing for fewer plans, not quality. The "efficiency" is false - poor quality requires more rework.
### ✅ The "Quality First" Pattern
```
Thinking: "These tasks are small, but let's do 2-3 to guarantee quality"
Result: All tasks peak quality, clean commits, no rework needed
```
**Why this succeeds:** You optimize for quality, which is true efficiency. No rework = faster overall.
## Estimating Context Usage
**Rough heuristics for plan size:**
### File Counts
- 0-3 files modified: Small task (~10-15% context)
- 4-6 files modified: Medium task (~20-30% context)
- 7+ files modified: Large task (~40%+ context) - split this
### Complexity
- Simple CRUD: ~15% per task
- Business logic: ~25% per task
- Complex algorithms: ~40% per task
- Domain modeling: ~35% per task
### 2-Task Plan (Safe)
- 2 simple tasks: ~30% total ✅ Plenty of room
- 2 medium tasks: ~50% total ✅ At target
- 2 complex tasks: ~80% total ❌ Too tight, split
### 3-Task Plan (Risky)
- 3 simple tasks: ~45% total ✅ Good
- 3 medium tasks: ~75% total ⚠️ Pushing it
- 3 complex tasks: ~120% total ❌ Impossible, split
**Conservative principle:** When in doubt, split. Better to have an extra plan than degraded quality.
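Putting the heuristics together, a quick estimate might look like this (numbers are rough and illustrative):
```
Plan: "Notification Preferences" (3 tasks)
- Preferences model (simple CRUD, 2 files)       → ~15%
- Preferences API (business logic, 3 files)      → ~25%
- Digest scheduler (complex algorithm, 2 files)  → ~40%
Estimated total: ~80% → too tight. Move the scheduler into its
own plan; the remaining two tasks land at ~40%.
```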
## The Atomic Commit Philosophy
**What we're optimizing for:** Beautiful git history where each commit is:
- Focused (2-3 related changes)
- Complete (fully implemented, tested)
- Documented (clear commit message)
- Reviewable (small enough to understand)
- Revertable (surgical rollback possible)
**Bad git history (large plans):**
```
feat(auth): Complete authentication system
- Added 16 files
- Modified 8 files
- 1200 lines changed
- Contains: models, API, UI, middleware, utilities
```
Impossible to review, hard to understand, can't revert without losing everything.
**Good git history (atomic plans):**
```
feat(auth-01): Add User and Session database models
- Added schema files
- Added migration
- 45 lines changed
feat(auth-02): Implement register and login API endpoints
- Added /api/auth/register
- Added /api/auth/login
- Added JWT utilities
- 120 lines changed
feat(auth-03): Add protected route middleware
- Added middleware/auth.ts
- Added tests
- 60 lines changed
feat(auth-04): Build login and registration forms
- Added LoginForm component
- Added RegisterForm component
- 90 lines changed
```
Each commit tells a story. Each is reviewable. Each is revertable. This is craftsmanship.
## Quality Assurance Through Scope Control
**The guarantee:** When you follow the 2-3 task rule with 50% context target:
1. **Consistency:** First task has same quality as last task
2. **Thoroughness:** No "I'll complete X concisely" degradation
3. **Documentation:** Full context budget for comments/tests
4. **Error handling:** Space for proper validation and edge cases
5. **Testing:** Room for comprehensive test coverage
**The cost:** More plans to manage.
**The benefit:** Consistent excellence. No rework. Clean history. Maintainable code.
**The trade-off is worth it.**
## Summary
**Old way (3-6 tasks, 80% target):**
- Tasks 1-2: Good
- Tasks 3-4: Degrading
- Tasks 5-6: Poor
- Git: Large, unreviewable commits
- Quality: Inconsistent
**New way (2-3 tasks, 50% target):**
- All tasks: Peak quality
- Git: Atomic, surgical commits
- Quality: Consistent excellence
- Autonomous plans: Subagent execution (fresh context)
**The principle:** Aggressive atomicity. More plans, smaller scope, consistent quality.
**The rule:** If in doubt, split. Quality over consolidation. Always.

View File

@@ -0,0 +1,72 @@
# User Gates Reference
User gates prevent Claude from charging ahead at critical decision points.
## Question Types
### AskUserQuestion Tool
Use for **structured choices** (2-4 options):
- Selecting from distinct approaches
- Domain/type selection
- When user needs to see options to decide
Examples:
- "What type of project?" (macos-app / iphone-app / web-app / other)
- "Research confidence is low. How to proceed?" (dig deeper / proceed anyway / pause)
- "Multiple valid approaches exist:" (Option A / Option B / Option C)
### Inline Questions
Use for **simple confirmations**:
- Yes/no decisions
- "Does this look right?"
- "Ready to proceed?"
Examples:
- "Here's the task breakdown: [list]. Does this look right?"
- "Proceed with this approach?"
- "I'll initialize a git repo. OK?"
## Decision Gate Loop
After gathering context, ALWAYS offer:
```
Ready to [action], or would you like me to ask more questions?
1. Proceed - I have enough context
2. Ask more questions - There are details to clarify
3. Let me add context - I want to provide additional information
```
Loop continues until user selects "Proceed".
## Mandatory Gate Points
| Location | Gate Type | Trigger |
|----------|-----------|---------|
| plan-phase | Inline | Confirm task breakdown |
| plan-phase | AskUserQuestion | Multiple valid approaches |
| plan-phase | AskUserQuestion | Decision gate before writing |
| research-phase | AskUserQuestion | Low confidence findings |
| research-phase | Inline | Open questions acknowledgment |
| execute-phase | Inline | Verification failure |
| execute-phase | Inline | Issues review before proceeding |
| execute-phase | AskUserQuestion | Previous phase had issues |
| create-brief | AskUserQuestion | Decision gate before writing |
| create-roadmap | Inline | Confirm phase breakdown |
| create-roadmap | AskUserQuestion | Decision gate before writing |
| handoff | Inline | Handoff acknowledgment |
## Good vs Bad Gating
### Good
- Gate before writing artifacts (not after)
- Gate when genuinely ambiguous
- Gate when issues affect next steps
- Quick inline for simple confirmations
### Bad
- Asking obvious choices ("Should I save the file?")
- Multiple gates for same decision
- AskUserQuestion for yes/no
- Gates after the fact