---
description: Batch process multiple repos with StackShift analysis running in parallel. Analyzes 5 repos at a time, tracks progress, and aggregates results. Perfect for analyzing monorepo services or multiple related projects.
---

# StackShift Batch Processing

**Analyze multiple repositories in parallel**

Run StackShift on 10, 50, or 100+ repos simultaneously with progress tracking and result aggregation.

---

## Quick Start

**Analyze all services in a monorepo:**

```bash
# From monorepo services directory
cd ~/git/my-monorepo/services

# Let me analyze all service-* directories in batches of 5
```

I'll:

1. ✅ Find all service-* directories
2. ✅ Filter to valid repos (has package.json)
3. ✅ Process in batches of 5 (configurable)
4. ✅ Track progress in `~/git/stackshift-batch-results/`
5. ✅ Aggregate results when complete

---

## What I'll Do

### Step 1: Discovery

```bash
echo "=== Discovering repositories in ~/git/my-monorepo/services ==="

# Find all service directories
find ~/git/my-monorepo/services -maxdepth 1 -type d -name "service-*" | sort > /tmp/services-to-analyze.txt

# Count
SERVICE_COUNT=$(wc -l < /tmp/services-to-analyze.txt)
echo "Found $SERVICE_COUNT services"

# Show first 10
head -10 /tmp/services-to-analyze.txt
```

### Step 2: Batch Configuration

**IMPORTANT:** I'll ask ALL configuration questions upfront, ONCE. Your answers are saved to a batch session file and automatically applied to ALL repos in all batches. You won't need to answer these questions again during this batch run!

I'll ask you:

**Question 1: How many to process?**
- A) All services ($SERVICE_COUNT total)
- B) First 10 (test run)
- C) First 25 (small batch)
- D) Custom number

**Question 2: Parallel batch size?**
- A) 3 at a time (conservative)
- B) 5 at a time (recommended)
- C) 10 at a time (aggressive, may slow down)
- D) Sequential (1 at a time, safest)

**Question 3: What route?**
- A) Auto-detect (monorepo-service for service-*, ask for others)
- B) Force monorepo-service for all
- C) Force greenfield for all
- D) Force brownfield for all

**Question 4: Brownfield mode?** _(If route = brownfield)_
- A) Standard - Just create specs for current state
- B) Upgrade - Create specs + upgrade all dependencies

**Question 5: Transmission?**
- A) Manual - Review each gear before proceeding
- B) Cruise Control - Shift through all gears automatically

**Question 6: Clarifications strategy?** _(If transmission = cruise control)_
- A) Defer - Mark them, continue around them
- B) Prompt - Stop and ask questions
- C) Skip - Only implement fully-specified features

**Question 7: Implementation scope?** _(If transmission = cruise control)_
- A) None - Stop after specs are ready
- B) P0 only - Critical features only
- C) P0 + P1 - Critical + high-value features
- D) All - Every feature

**Question 8: Spec output location?** _(If route = greenfield)_
- A) Current repository (default)
- B) New application repository
- C) Separate documentation repository
- D) Custom location

**Question 9: Target stack?** _(If greenfield + implementation scope != none)_
- Examples:
  - Next.js 15 + TypeScript + Prisma + PostgreSQL
  - Python/FastAPI + SQLAlchemy + PostgreSQL
  - Your choice: [specify]

**Question 10: Build location?** _(If greenfield + implementation scope != none)_
- A) Subfolder (recommended) - e.g., greenfield/, v2/
- B) Separate directory - e.g., ~/git/my-new-app
- C) Replace in place (destructive)

**Then I'll:**

1. ✅ Save all answers to `.stackshift-batch-session.json` (in current directory)
2. ✅ Show a batch session summary
3. ✅ Start processing batches with the auto-applied configuration
4. ✅ Clear the batch session when complete (or keep it if you want)

**Why directory-scoped?**

- Multiple batch sessions can run simultaneously in different directories
- Each batch (monorepo services, etc.) has its own isolated configuration
- No conflicts between parallel batch runs
- The session file is co-located with the repos being processed
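**How session auto-detection might work:** each spawned agent finds the batch session by walking up from its repo directory to the parent that holds `.stackshift-batch-session.json`. Here is a minimal bash sketch of that lookup — the `find_batch_session` helper is illustrative, not StackShift's actual implementation:

```bash
# Hypothetical lookup: walk up from the current directory until a
# .stackshift-batch-session.json is found (or the filesystem root is reached).
find_batch_session() {
  local dir="$PWD"
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/.stackshift-batch-session.json" ]; then
      echo "$dir/.stackshift-batch-session.json"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1  # no session anywhere above the current directory
}

SESSION=$(find_batch_session) && echo "Using batch session: $SESSION"
```

Because the lookup stops at the first match, a session in `~/git/my-monorepo/services/` is picked up by every service below it without leaking into unrelated directories.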
### Step 3: Create Batch Session & Spawn Agents

**First: Create the batch session with all answers**

```bash
# After collecting all configuration answers, create the batch session.
# Stored in the current directory for isolation from other batch runs.
# (Schema shown is illustrative; the .answers.route, .processedRepos, and
# .totalRepos fields match the jq queries used later in this doc.)
cat > .stackshift-batch-session.json <<EOF
{
  "answers": {
    "route": "$ROUTE",
    "transmission": "$TRANSMISSION",
    "implementationScope": "$IMPLEMENTATION_SCOPE",
    "specOutputLocation": "$SPEC_OUTPUT"
  },
  "totalRepos": $SERVICE_COUNT,
  "processedRepos": []
}
EOF
```

**Then: Spawn parallel agents for the first batch**

```javascript
// Orchestrator pseudocode: one agent per service in this batch.
// `batch`, ROUTE, and SPEC_OUTPUT come from the saved session;
// spawnAgent stands in for the Task-tool invocation.
const agents = batch.map(service => ({
  task: `Analyze ${service} service with StackShift`,
  description: `StackShift analysis: ${service}`,
  subagent_type: 'general-purpose',
  prompt: `
    cd ~/git/my-monorepo/services/${service}

    IMPORTANT: Batch session is active (will be auto-detected by walking up to parent)
    Parent directory has: .stackshift-batch-session.json
    All configuration will be auto-applied. DO NOT ask configuration questions.

    Run StackShift Gear 1: Analyze
    - Will auto-detect route (batch session: ${ROUTE})
    - Will use spec output location: ${SPEC_OUTPUT}
    - Analyze service + shared packages
    - Generate analysis-report.md

    Then run Gear 2: Reverse Engineer
    - Extract business logic
    - Document all shared package dependencies
    - Create comprehensive documentation

    Then run Gear 3: Create Specifications
    - Generate .specify/ structure
    - Create constitution
    - Generate feature specs

    Save all results to: ${SPEC_OUTPUT}/${service}/
    When complete, create completion marker: ${SPEC_OUTPUT}/${service}/.complete
  `
}));

// Launch all 5 in parallel
agents.forEach(agent => spawnAgent(agent));
```

### Step 4: Progress Tracking

```bash
# Create tracking directory
mkdir -p ~/git/stackshift-batch-results

# Monitor progress
while true; do
  COMPLETE=$(find ~/git/stackshift-batch-results -name ".complete" | wc -l)
  echo "Completed: $COMPLETE / $SERVICE_COUNT"

  # Check if this batch is done (5 services per batch)
  if [ "$COMPLETE" -ge 5 ]; then
    echo "✅ Batch 1 complete"
    break
  fi

  sleep 30
done

# Start next batch...
```

### Step 5: Result Aggregation

```bash
# After all batches complete
echo "=== Aggregating Results ==="

# Create master report
# (Summary header and spec-count path are reconstructed; adjust the
# find target if your specs live elsewhere.)
cat > ~/git/stackshift-batch-results/BATCH_SUMMARY.md <<EOF
# StackShift Batch Analysis Summary

$(for dir in ~/git/stackshift-batch-results/*/; do
  service_name=$(basename "$dir")
  if [ -f "$dir/.complete" ]; then
    echo "### $service_name"
    echo "**Specs created:** $(find "$dir/.specify" -name "*.md" 2>/dev/null | wc -l)"
    echo "**Modules analyzed:** $(cat ~/git/stackshift-batch-results/$service_name/.stackshift-state.json 2>/dev/null | jq -r '.metadata.modulesAnalyzed // 0')"
    echo ""
  fi
done)

## Next Steps

All specifications are ready for review:
- Review specs in each service's batch-results directory
- Merge specs to actual repos if satisfied
- Run Gears 4-6 as needed
EOF

cat ~/git/stackshift-batch-results/BATCH_SUMMARY.md
```

---

## Result Structure

```
~/git/stackshift-batch-results/
├── BATCH_SUMMARY.md                  # Master summary
├── batch-progress.json               # Real-time tracking
│
├── service-user-api/
│   ├── .complete                     # Marker file
│   ├── .stackshift-state.json        # State
│   ├── analysis-report.md            # Gear 1 output
│   ├── docs/reverse-engineering/     # Gear 2 output
│   │   ├── functional-specification.md
│   │   ├── service-logic.md
│   │   ├── modules/
│   │   │   ├── shared-pricing-utils.md
│   │   │   └── shared-discount-utils.md
│   │   └── [7 more docs]
│   └── .specify/                     # Gear 3 output
│       └── memory/
│           ├── constitution.md
│           └── specifications/
│               ├── pricing-display.md
│               ├── incentive-logic.md
│               └── [more specs]
│
├── service-inventory/
│   └── [same structure]
│
└── [88 more services...]
```
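**Keeping `batch-progress.json` current:** one way to maintain the real-time tracking file shown above is to recount completion markers on each pass of the Step 4 monitor loop. A minimal sketch, assuming the result layout above and `jq` on the PATH — the `update_progress` helper is illustrative, not StackShift's actual tracker:

```bash
# Hypothetical helper: rebuild batch-progress.json from .complete markers.
RESULTS_DIR=~/git/stackshift-batch-results
TOTAL=$(wc -l < /tmp/services-to-analyze.txt)

update_progress() {
  local complete
  complete=$(find "$RESULTS_DIR" -name ".complete" 2>/dev/null | wc -l)
  # Emit a small JSON status object; `now | todate` gives an ISO timestamp.
  jq -n \
    --argjson total "$TOTAL" \
    --argjson complete "$complete" \
    '{total: $total, complete: $complete, updatedAt: (now | todate)}' \
    > "$RESULTS_DIR/batch-progress.json"
}

update_progress
cat "$RESULTS_DIR/batch-progress.json"
```

Calling `update_progress` inside the monitor loop keeps the file fresh enough for an external dashboard (or a quick `watch cat`) without any extra bookkeeping.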
```

---

## Monitoring Progress

**Real-time status:**

```bash
# I'll show you periodic updates
echo "=== Batch Progress ==="
echo "Batch 1 (5 services): 3/5 complete"
echo "  ✅ service-user-api - Complete (12 min)"
echo "  ✅ service-inventory - Complete (8 min)"
echo "  ✅ service-contact - Complete (15 min)"
echo "  🔄 service-search - Running (7 min elapsed)"
echo "  ⏳ service-pricing - Queued"
echo ""
echo "Estimated time remaining: 25 minutes"
```

---

## Error Handling

**If a service fails:**

```bash
# Retry failed services
failed_services=(service-search service-pricing)

for service in "${failed_services[@]}"; do
  echo "Retrying: $service"
  # Spawn a new agent for the retry (same prompt as in Step 3)
done
```

**Common failures:**
- Missing package.json
- Tests failing (can continue anyway)
- Module source not found (prompt for location)

---

## Use Cases

**1. Entire monorepo migration:**

```
Analyze all 90+ service-* services for migration planning
  ↓
Result: Complete business logic extracted from entire platform
  ↓
Use specs to plan Next.js migration strategy
```

**2. Selective analysis:**

```
Analyze just the 10 high-priority services first
  ↓
Review results
  ↓
Then batch process the remaining 80
```

**3. Module analysis:**

```
cd ~/git/my-monorepo/services
Analyze all shared packages (not services)
  ↓
Result: Shared module documentation
  ↓
Understand dependencies before service migration
```

---

## Configuration Options

I'll ask you to configure:

- **Repository list:** All in folder, or custom list?
- **Batch size:** How many in parallel (3/5/10)?
- **Gears to run:** 1-3 only, or full 1-6?
- **Route:** Auto-detect or force a specific route?
- **Output location:** Central results dir or per-repo?
- **Error handling:** Stop on failure or continue?

---

## Comparison with thoth-cli

**thoth-cli (Upgrades):**
- Orchestrates 90+ service upgrades
- 3 phases: coverage → discovery → implementation
- Tracks in .upgrade-state.json
- Parallel processing (2-5 at a time)

**StackShift Batch (Analysis):**
- Orchestrates 90+ service analyses
- 6 gears: analyze → reverse-engineer → create-specs → gap → clarify → implement
- Tracks in .stackshift-state.json
- Parallel processing (3-10 at a time)
- Can output to central location

---

## Example Session

```
You: "I want to analyze all Osiris services in ~/git/my-monorepo/services"

Me: "Found 92 services! Let me configure batch processing..."

[Asks questions via AskUserQuestion]
- Process all 92? ✅
- Batch size: 5
- Gears: 1-3 (just analyze and spec, no implementation)
- Output: Central results directory

Me: "Starting batch analysis..."

Batch 1 (5 services): service-user-api, service-inventory, service-contact, service-search, service-pricing
[Spawns 5 parallel agents using Task tool]

[15 minutes later]
"Batch 1 complete! Starting batch 2..."

[3 hours later]
"✅ All 92 services analyzed!

Results: ~/git/stackshift-batch-results/
- 92 analysis reports
- 92 sets of specifications
- 890 total specs extracted
- Multiple shared packages documented

Next: Review specs and begin migration planning"
```

---

## Managing Batch Sessions

### View Current Batch Session

```bash
# Check if a batch session exists in the current directory and view its configuration
if [ -f .stackshift-batch-session.json ]; then
  echo "📦 Active Batch Session in $(pwd)"
  jq '.' .stackshift-batch-session.json
else
  echo "No active batch session in current directory"
fi
```

### View All Batch Sessions

```bash
# Find all active batch sessions
echo "🔍 Finding all active batch sessions..."

find ~/git -name ".stackshift-batch-session.json" -type f 2>/dev/null | while read -r session; do
  echo ""
  echo "📦 $(dirname "$session")"
  jq -r '"  Route: \(.answers.route) | Repos: \(.processedRepos | length)/\(.totalRepos)"' "$session"
done
```
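**Resuming an interrupted batch:** since the session tracks `processedRepos`, you can diff it against the Step 1 discovery list to see what's still pending. A minimal sketch, assuming `processedRepos` stores repo basenames — that schema detail is an assumption, not confirmed StackShift behavior:

```bash
# Hypothetical resume check: print repos from the discovery list that are
# not yet recorded in the session's .processedRepos array.
SESSION=.stackshift-batch-session.json

if [ -f "$SESSION" ]; then
  while read -r repo; do
    # jq -e exits non-zero when index() yields null (repo not processed)
    if ! jq -e --arg r "$(basename "$repo")" \
         '.processedRepos | index($r)' "$SESSION" >/dev/null; then
      echo "Pending: $repo"
    fi
  done < /tmp/services-to-analyze.txt
fi
```

Feeding the "Pending" list back into the Step 3 spawner continues the run without re-analyzing anything already marked complete.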
### Clear Batch Session

**After the batch completes:**

```bash
# I'll ask you:
# "Batch processing complete! Clear batch session? (Y/n)"

# If yes:
rm .stackshift-batch-session.json
echo "✅ Batch session cleared"

# If no:
echo "✅ Batch session kept (will be used for the next batch run in this directory)"
```

**Manual clear (current directory):**

```bash
# Clear batch session in current directory
rm .stackshift-batch-session.json
```

**Manual clear (specific directory):**

```bash
# Clear batch session in a specific directory
rm ~/git/my-monorepo/services/.stackshift-batch-session.json
```

**Why keep the batch session?**
- Run another batch with the same configuration
- Process more repos later in the same directory
- Continue an interrupted batch
- Consistent settings for related batches

**Why clear the batch session?**
- Done with the current migration
- Want a different configuration for the next batch
- Starting a fresh analysis
- Free up the directory for a different batch type

---

## Batch Session Benefits

**Without batch session (old way):**

```
Batch 1: Answer 10 questions  ⏱️ 2 min
  ↓
Process 3 repos (15 min)

Batch 2: Answer 10 questions AGAIN  ⏱️ 2 min
  ↓
Process 3 repos (15 min)

Batch 3: Answer 10 questions AGAIN  ⏱️ 2 min
  ↓
Process 3 repos (15 min)

Total: 30 questions answered, 6 min of clicking
```

**With batch session (new way):**

```
Setup: Answer 10 questions ONCE  ⏱️ 2 min
  ↓
Batch 1: Process 3 repos (15 min)
  ↓
Batch 2: Process 3 repos (15 min)
  ↓
Batch 3: Process 3 repos (15 min)

Total: 10 questions answered, 2 min of clicking
Saved: 4 minutes per 9 repos processed
```

**For 90 repos in batches of 3:**
- Old way: 300 questions answered (60 min of clicking)
- New way: 10 questions answered (2 min of clicking)
- **Time saved: 58 minutes!** ⚡

---

**This batch processing system is perfect for:**
- Monorepo migration (90+ services)
- Multi-repo analysis of related projects
- Department-wide code audits
- Portfolio modernization projects