---
description: Batch process multiple repos with StackShift analysis running in parallel. Analyzes 5 repos at a time, tracks progress, and aggregates results. Perfect for analyzing monorepo services or multiple related projects.
---
# StackShift Batch Processing
**Analyze multiple repositories in parallel**
Run StackShift on 10, 50, or 100+ repos simultaneously with progress tracking and result aggregation.
---
## Quick Start
**Analyze all services in a monorepo:**
```bash
# From monorepo services directory
cd ~/git/my-monorepo/services
# Let me analyze all service-* directories in batches of 5
```
I'll:
1. ✅ Find all service-* directories
2. ✅ Filter to valid repos (has package.json)
3. ✅ Process in batches of 5 (configurable)
4. ✅ Track progress in `batch-results/`
5. ✅ Aggregate results when complete
---
## What I'll Do
### Step 1: Discovery
```bash
echo "=== Discovering repositories in ~/git/my-monorepo/services ==="
# Find all service directories
find ~/git/my-monorepo/services -maxdepth 1 -type d -name "service-*" | sort > /tmp/services-to-analyze.txt
# Count
SERVICE_COUNT=$(wc -l < /tmp/services-to-analyze.txt)
echo "Found $SERVICE_COUNT services"
# Show first 10
head -10 /tmp/services-to-analyze.txt
```
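Discovery alone doesn't confirm each directory is a real repo. A minimal sketch of the package.json filter mentioned in the Quick Start checklist, reusing the same temp file as the discovery step above:

```bash
# Filter to valid repos: keep only directories that contain a package.json
> /tmp/services-valid.txt
while read -r dir; do
  if [ -f "$dir/package.json" ]; then
    echo "$dir" >> /tmp/services-valid.txt
  else
    echo "Skipping $dir (no package.json)"
  fi
done < /tmp/services-to-analyze.txt

mv /tmp/services-valid.txt /tmp/services-to-analyze.txt
echo "Valid repos: $(wc -l < /tmp/services-to-analyze.txt)"
```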
### Step 2: Batch Configuration
**IMPORTANT:** I'll ask all configuration questions upfront, once. Your answers are saved to a batch session file and applied automatically to every repo in every batch, so you won't be asked again during this run.
I'll ask you:
**Question 1: How many to process?**
- A) All services ($SERVICE_COUNT total)
- B) First 10 (test run)
- C) First 25 (small batch)
- D) Custom number
**Question 2: Parallel batch size?**
- A) 3 at a time (conservative)
- B) 5 at a time (recommended)
- C) 10 at a time (aggressive, may slow down)
- D) Sequential (1 at a time, safest)
**Question 3: What route?**
- A) Auto-detect (monorepo-service for service-*, ask for others)
- B) Force monorepo-service for all
- C) Force greenfield for all
- D) Force brownfield for all
**Question 4: Brownfield mode?** _(If route = brownfield)_
- A) Standard - Just create specs for current state
- B) Upgrade - Create specs + upgrade all dependencies
**Question 5: Transmission?**
- A) Manual - Review each gear before proceeding
- B) Cruise Control - Shift through all gears automatically
**Question 6: Clarifications strategy?** _(If transmission = cruise control)_
- A) Defer - Mark them, continue around them
- B) Prompt - Stop and ask questions
- C) Skip - Only implement fully-specified features
**Question 7: Implementation scope?** _(If transmission = cruise control)_
- A) None - Stop after specs are ready
- B) P0 only - Critical features only
- C) P0 + P1 - Critical + high-value features
- D) All - Every feature
**Question 8: Spec output location?** _(If route = greenfield)_
- A) Current repository (default)
- B) New application repository
- C) Separate documentation repository
- D) Custom location
**Question 9: Target stack?** _(If greenfield + implementation scope != none)_
- Examples:
- Next.js 15 + TypeScript + Prisma + PostgreSQL
- Python/FastAPI + SQLAlchemy + PostgreSQL
- Your choice: [specify]
**Question 10: Build location?** _(If greenfield + implementation scope != none)_
- A) Subfolder (recommended) - e.g., greenfield/, v2/
- B) Separate directory - e.g., ~/git/my-new-app
- C) Replace in place (destructive)
**Then I'll:**
1. ✅ Save all answers to `.stackshift-batch-session.json` (in current directory)
2. ✅ Show batch session summary
3. ✅ Start processing batches with auto-applied configuration
4. ✅ Clear batch session when complete (or keep if you want)
**Why directory-scoped?**
- Multiple batch sessions can run simultaneously in different directories
- Each batch (monorepo services, etc.) has its own isolated configuration
- No conflicts between parallel batch runs
- Session file is co-located with the repos being processed
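For reference, the auto-detection that spawned agents rely on (see Step 3) can be as simple as walking up the directory tree; a minimal sketch:

```bash
# Sketch: auto-detect the batch session by walking up from the repo
# directory through its parents (stops at the filesystem root)
dir="$(pwd)"
while [ "$dir" != "/" ]; do
  if [ -f "$dir/.stackshift-batch-session.json" ]; then
    echo "📦 Found batch session: $dir/.stackshift-batch-session.json"
    break
  fi
  dir="$(dirname "$dir")"
done
```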
### Step 3: Create Batch Session & Spawn Agents
**First: Create batch session with all answers**
```bash
# After collecting all configuration answers, create batch session
# Stored in current directory for isolation from other batch runs
cat > .stackshift-batch-session.json <<EOF
{
  "sessionId": "batch-$(date +%s)",
  "startedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "batchRootDirectory": "$(pwd)",
  "totalRepos": ${TOTAL_REPOS},
  "batchSize": ${BATCH_SIZE},
  "answers": {
    "route": "${ROUTE}",
    "transmission": "${TRANSMISSION}",
    "spec_output_location": "${SPEC_OUTPUT}",
    "target_stack": "${TARGET_STACK}",
    "build_location": "${BUILD_LOCATION}",
    "clarifications_strategy": "${CLARIFICATIONS}",
    "implementation_scope": "${SCOPE}"
  },
  "processedRepos": []
}
EOF
echo "✅ Batch session created: $(pwd)/.stackshift-batch-session.json"
echo "📦 Configuration will be auto-applied to all ${TOTAL_REPOS} repos"
```
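Later steps (and retries) can reload the saved answers with jq; a minimal sketch using the field names from the session file above:

```bash
# Sketch: load the saved answers back into shell variables
# (field names match the session file created above)
SESSION=.stackshift-batch-session.json
ROUTE=$(jq -r '.answers.route' "$SESSION")
TRANSMISSION=$(jq -r '.answers.transmission' "$SESSION")
SPEC_OUTPUT=$(jq -r '.answers.spec_output_location' "$SESSION")
SCOPE=$(jq -r '.answers.implementation_scope' "$SESSION")
echo "Route: $ROUTE | Transmission: $TRANSMISSION | Scope: $SCOPE"
```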
**Then: Spawn parallel agents (they'll auto-use batch session)**
```typescript
// Use the Task tool to spawn parallel agents
const batch1 = [
  'service-user-api',
  'service-inventory',
  'service-contact',
  'service-search',
  'service-pricing'
];

// Spawn 5 agents in parallel
const agents = batch1.map(service => ({
  task: `Analyze ${service} service with StackShift`,
  description: `StackShift analysis: ${service}`,
  subagent_type: 'general-purpose',
  prompt: `
    cd ~/git/my-monorepo/services/${service}

    IMPORTANT: A batch session is active (auto-detected by walking up
    to the parent directory, which has .stackshift-batch-session.json).
    All configuration will be auto-applied. DO NOT ask configuration questions.

    Run StackShift Gear 1: Analyze
    - Will auto-detect route (batch session: ${ROUTE})
    - Will use spec output location: ${SPEC_OUTPUT}
    - Analyze service + shared packages
    - Generate analysis-report.md

    Then run Gear 2: Reverse Engineer
    - Extract business logic
    - Document all shared package dependencies
    - Create comprehensive documentation

    Then run Gear 3: Create Specifications
    - Generate .specify/ structure
    - Create constitution
    - Generate feature specs

    Save all results to:
    ${SPEC_OUTPUT}/${service}/

    When complete, create completion marker:
    ${SPEC_OUTPUT}/${service}/.complete
  `
}));

// Launch all 5 in parallel
agents.forEach(agent => spawnAgent(agent));
```
### Step 4: Progress Tracking
```bash
# Create tracking directory
mkdir -p ~/git/stackshift-batch-results
# Monitor progress
while true; do
  COMPLETE=$(find ~/git/stackshift-batch-results -name ".complete" | wc -l)
  echo "Completed: $COMPLETE / $SERVICE_COUNT"

  # Check if this batch is done
  if [ "$COMPLETE" -ge 5 ]; then
    echo "✅ Batch 1 complete"
    break
  fi
  sleep 30
done
# Start next batch...
```
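The batch-progress.json file shown in the Result Structure below can be kept current from the same completion markers; a sketch with an assumed minimal schema (`{"completed": [...]}`, illustrative only):

```bash
# Sketch: record finished repos in batch-progress.json
# (schema is an assumption; adapt to whatever tracking shape you prefer)
PROGRESS=~/git/stackshift-batch-results/batch-progress.json
[ -f "$PROGRESS" ] || echo '{"completed": []}' > "$PROGRESS"

find ~/git/stackshift-batch-results -name ".complete" | while read -r marker; do
  service_name=$(basename "$(dirname "$marker")")
  jq --arg s "$service_name" \
    '.completed |= (. + [$s] | unique)' "$PROGRESS" > "$PROGRESS.tmp" \
    && mv "$PROGRESS.tmp" "$PROGRESS"
done
```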
### Step 5: Result Aggregation
```bash
# After all batches complete
echo "=== Aggregating Results ==="
# Create master report
cat > ~/git/stackshift-batch-results/BATCH_SUMMARY.md <<EOF
# StackShift Batch Analysis Results

**Date:** $(date)
**Services Analyzed:** $SERVICE_COUNT
**Batches:** $(( (SERVICE_COUNT + 4) / 5 ))
**Total Time:** [calculated]

## Completion Status

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/"$service_name"/.complete ]; then
    echo "- ✅ $service_name - Complete"
  else
    echo "- ❌ $service_name - Failed or incomplete"
  fi
done)

## Results by Service

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/"$service_name"/.complete ]; then
    echo "### $service_name"
    echo ""
    echo "**Specs created:** $(find ~/git/stackshift-batch-results/"$service_name"/.specify/memory/specifications -name "*.md" 2>/dev/null | wc -l)"
    echo "**Modules analyzed:** $(jq -r '.metadata.modulesAnalyzed // 0' ~/git/stackshift-batch-results/"$service_name"/.stackshift-state.json 2>/dev/null)"
    echo ""
  fi
done)

## Next Steps

All specifications are ready for review:
- Review specs in each service's batch-results directory
- Merge specs to actual repos if satisfied
- Run Gears 4-6 as needed
EOF
cat ~/git/stackshift-batch-results/BATCH_SUMMARY.md
```
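The `[calculated]` placeholder for Total Time can be derived from the session's `startedAt` timestamp; a minimal sketch, run from the batch root directory (assumes GNU date; on macOS substitute `date -j -f`):

```bash
# Sketch: compute elapsed time from the batch session's startedAt field
STARTED=$(jq -r '.startedAt' .stackshift-batch-session.json)
ELAPSED=$(( $(date +%s) - $(date -d "$STARTED" +%s) ))
echo "Total Time: $(( ELAPSED / 60 )) minutes"
```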
---
## Result Structure
```
~/git/stackshift-batch-results/
├── BATCH_SUMMARY.md                   # Master summary
├── batch-progress.json                # Real-time tracking
├── service-user-api/
│   ├── .complete                      # Marker file
│   ├── .stackshift-state.json         # State
│   ├── analysis-report.md             # Gear 1 output
│   ├── docs/reverse-engineering/      # Gear 2 output
│   │   ├── functional-specification.md
│   │   ├── service-logic.md
│   │   ├── modules/
│   │   │   ├── shared-pricing-utils.md
│   │   │   └── shared-discount-utils.md
│   │   └── [7 more docs]
│   └── .specify/                      # Gear 3 output
│       └── memory/
│           ├── constitution.md
│           └── specifications/
│               ├── pricing-display.md
│               ├── incentive-logic.md
│               └── [more specs]
├── service-inventory/
│   └── [same structure]
└── [88 more services...]
```
---
## Monitoring Progress
**Real-time status:**
```bash
# I'll show you periodic updates
echo "=== Batch Progress ==="
echo "Batch 1 (5 services): 3/5 complete"
echo " ✅ service-user-api - Complete (12 min)"
echo " ✅ service-inventory - Complete (8 min)"
echo " ✅ service-contact - Complete (15 min)"
echo " 🔄 service-search - Running (7 min elapsed)"
echo " ⏳ service-pricing - Queued"
echo ""
echo "Estimated time remaining: 25 minutes"
```
---
## Error Handling
**If a service fails:**
```bash
# Retry failed services
failed_services=(service-search service-pricing)
for service in "${failed_services[@]}"; do
echo "Retrying: $service"
# Spawn new agent for retry
done
```
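Rather than hardcoding the failed list, it can be derived from the same completion markers used for progress tracking; a minimal sketch:

```bash
# Sketch: build the failed list from missing .complete markers
failed_services=()
while read -r service; do
  service_name=$(basename "$service")
  if [ ! -f ~/git/stackshift-batch-results/"$service_name"/.complete ]; then
    failed_services+=("$service_name")
  fi
done < /tmp/services-to-analyze.txt
echo "Failed: ${failed_services[*]:-none}"
```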
**Common failures:**
- Missing package.json
- Tests failing (can continue anyway)
- Module source not found (prompt for location)
---
## Use Cases
**1. Entire monorepo migration:**
```
Analyze all 90+ service-* services for migration planning
Result: Complete business logic extracted from entire platform
Use specs to plan Next.js migration strategy
```
**2. Selective analysis:**
```
Analyze just the 10 high-priority services first
Review results
Then batch process remaining 80
```
**3. Module analysis:**
```
cd ~/git/my-monorepo/services
Analyze all shared packages (not services)
Result: Shared module documentation
Understand dependencies before service migration
```
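For use case 3, discovery changes only in where it looks; a sketch, assuming shared packages live under a `packages/` directory with a `shared-*` prefix (both are assumptions; adjust to your monorepo's layout):

```bash
# Sketch: discover shared packages instead of services
# (packages/ path and shared-* pattern are assumptions)
find ~/git/my-monorepo/packages -maxdepth 1 -type d -name "shared-*" \
  | sort > /tmp/packages-to-analyze.txt
echo "Found $(wc -l < /tmp/packages-to-analyze.txt) shared packages"
```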
---
## Configuration Options
I'll ask you to configure:
- **Repository list:** All in folder, or custom list?
- **Batch size:** How many parallel (3/5/10)?
- **Gears to run:** 1-3 only or full 1-6?
- **Route:** Auto-detect or force specific route?
- **Output location:** Central results dir or per-repo?
- **Error handling:** Stop on failure or continue?
---
## Comparison with thoth-cli
**thoth-cli (Upgrades):**
- Orchestrates 90+ service upgrades
- 3 phases: coverage → discovery → implementation
- Tracks in .upgrade-state.json
- Parallel processing (2-5 at a time)
**StackShift Batch (Analysis):**
- Orchestrates 90+ service analyses
- 6 gears: analyze → reverse-engineer → create-specs → gap → clarify → implement
- Tracks in .stackshift-state.json
- Parallel processing (3-10 at a time)
- Can output to central location
---
## Example Session
```
You: "I want to analyze all Osiris services in ~/git/my-monorepo/services"
Me: "Found 92 services! Let me configure batch processing..."
[Asks questions via AskUserQuestion]
- Process all 92? ✅
- Batch size: 5
- Gears: 1-3 (just analyze and spec, no implementation)
- Output: Central results directory
Me: "Starting batch analysis..."
Batch 1 (5 services): service-user-api, service-inventory, service-contact, service-search, service-pricing
[Spawns 5 parallel agents using Task tool]
[15 minutes later]
"Batch 1 complete! Starting batch 2..."
[3 hours later]
"✅ All 92 services analyzed!
Results: ~/git/stackshift-batch-results/
- 92 analysis reports
- 92 sets of specifications
- 890 total specs extracted
- Multiple shared packages documented
Next: Review specs and begin migration planning"
```
---
## Managing Batch Sessions
### View Current Batch Session
```bash
# Check if batch session exists in current directory and view configuration
if [ -f .stackshift-batch-session.json ]; then
echo "📦 Active Batch Session in $(pwd)"
jq '.' .stackshift-batch-session.json
else
echo "No active batch session in current directory"
fi
```
### View All Batch Sessions
```bash
# Find all active batch sessions
echo "🔍 Finding all active batch sessions..."
find ~/git -name ".stackshift-batch-session.json" -type f 2>/dev/null | while read -r session; do
  echo ""
  echo "📦 $(dirname "$session")"
  jq -r '"  Route: \(.answers.route) | Repos: \(.processedRepos | length)/\(.totalRepos)"' "$session"
done
```
### Clear Batch Session
**After batch completes:**
```bash
# I'll ask you:
# "Batch processing complete! Clear batch session? (Y/n)"
# If yes:
rm .stackshift-batch-session.json
echo "✅ Batch session cleared"
# If no:
echo "✅ Batch session kept (will be used for next batch run in this directory)"
```
**Manual clear (current directory):**
```bash
# Clear batch session in current directory
rm .stackshift-batch-session.json
```
**Manual clear (specific directory):**
```bash
# Clear batch session in specific directory
rm ~/git/my-monorepo/services/.stackshift-batch-session.json
```
**Why keep batch session?**
- Run another batch with same configuration
- Process more repos later in same directory
- Continue interrupted batch
- Consistent settings for related batches
**Why clear batch session?**
- Done with current migration
- Want different configuration for next batch
- Starting fresh analysis
- Free up directory for different batch type
---
## Batch Session Benefits
**Without batch session (old way):**
```
Batch 1: Answer 10 questions ⏱️ 2 min
↓ Process 3 repos (15 min)
Batch 2: Answer 10 questions AGAIN ⏱️ 2 min
↓ Process 3 repos (15 min)
Batch 3: Answer 10 questions AGAIN ⏱️ 2 min
↓ Process 3 repos (15 min)
Total: 30 questions answered, 6 min wasted
```
**With batch session (new way):**
```
Setup: Answer 10 questions ONCE ⏱️ 2 min
↓ Batch 1: Process 3 repos (15 min)
↓ Batch 2: Process 3 repos (15 min)
↓ Batch 3: Process 3 repos (15 min)
Total: 10 questions answered, 0 min wasted
Saved: 4 minutes per 9 repos processed
```
**For 90 repos in batches of 3:**
- Old way: 300 questions answered (60 min of clicking)
- New way: 10 questions answered (2 min of clicking)
- **Time saved: 58 minutes!** ⚡
---
**This batch processing system is perfect for:**
- Monorepo migration (90+ services)
- Multi-repo analysis of related projects
- Department-wide code audits
- Portfolio modernization projects