---
description: Batch process multiple repos with StackShift analysis running in parallel. Analyzes 5 repos at a time, tracks progress, and aggregates results. Perfect for analyzing monorepo services or multiple related projects.
---

# StackShift Batch Processing

**Analyze multiple repositories in parallel**

Run StackShift on 10, 50, or 100+ repos simultaneously with progress tracking and result aggregation.

---

## Quick Start

**Analyze all services in a monorepo:**

```bash
# From monorepo services directory
cd ~/git/my-monorepo/services

# Let me analyze all service-* directories in batches of 5
```

I'll:

1. ✅ Find all service-* directories
2. ✅ Filter to valid repos (has package.json)
3. ✅ Process in batches of 5 (configurable)
4. ✅ Track progress in `batch-results/`
5. ✅ Aggregate results when complete
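
The batching in step 3 can be sketched as a plain loop over the discovered list. This is illustrative only — the demo file name and service names are placeholders, and real batches are spawned as parallel agents:

```bash
# Demo list — in a real run this is /tmp/services-to-analyze.txt from discovery
printf '%s\n' service-a service-b service-c service-d service-e service-f service-g > /tmp/demo-list.txt

BATCH_SIZE=5

# Read the list into an array
repos=()
while IFS= read -r r; do repos+=("$r"); done < /tmp/demo-list.txt

# Walk the list in slices of BATCH_SIZE
total=${#repos[@]}
for ((i = 0; i < total; i += BATCH_SIZE)); do
  batch=("${repos[@]:i:BATCH_SIZE}")
  echo "Batch $((i / BATCH_SIZE + 1)): ${batch[*]}"
done
```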

---

## What I'll Do

### Step 1: Discovery

```bash
echo "=== Discovering repositories in ~/git/my-monorepo/services ==="

# Find all service directories
find ~/git/my-monorepo/services -maxdepth 1 -type d -name "service-*" | sort > /tmp/services-to-analyze.txt

# Count
SERVICE_COUNT=$(wc -l < /tmp/services-to-analyze.txt)
echo "Found $SERVICE_COUNT services"

# Show first 10
head -10 /tmp/services-to-analyze.txt
```
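
The "filter to valid repos" step from the overview can be sketched like this — the `/tmp/demo-svcs` fixture paths are placeholders standing in for the real list produced above:

```bash
# Demo fixture — in a real run the input is /tmp/services-to-analyze.txt
mkdir -p /tmp/demo-svcs/service-a /tmp/demo-svcs/service-b
echo '{}' > /tmp/demo-svcs/service-a/package.json
printf '%s\n' /tmp/demo-svcs/service-a /tmp/demo-svcs/service-b > /tmp/demo-svcs/list.txt

# Keep only directories that contain a package.json
: > /tmp/demo-svcs/valid.txt
while IFS= read -r dir; do
  if [ -f "$dir/package.json" ]; then
    echo "$dir" >> /tmp/demo-svcs/valid.txt
  else
    echo "⚠️  Skipping $dir (no package.json)"
  fi
done < /tmp/demo-svcs/list.txt

echo "Valid repos: $(wc -l < /tmp/demo-svcs/valid.txt)"
```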

### Step 2: Batch Configuration

**IMPORTANT:** I'll ask ALL configuration questions upfront, ONCE. Your answers will be saved to a batch session file and automatically applied to ALL repos in all batches. You won't need to answer these questions again during this batch run!

I'll ask you:

**Question 1: How many to process?**
- A) All services ($SERVICE_COUNT total)
- B) First 10 (test run)
- C) First 25 (small batch)
- D) Custom number

**Question 2: Parallel batch size?**
- A) 3 at a time (conservative)
- B) 5 at a time (recommended)
- C) 10 at a time (aggressive, may slow down)
- D) Sequential (1 at a time, safest)

**Question 3: What route?**
- A) Auto-detect (monorepo-service for service-* dirs; ask for others)
- B) Force monorepo-service for all
- C) Force greenfield for all
- D) Force brownfield for all
**Question 4: Brownfield mode?** _(If route = brownfield)_
|
|
- A) Standard - Just create specs for current state
|
|
- B) Upgrade - Create specs + upgrade all dependencies
|
|
|
|
**Question 5: Transmission?**
|
|
- A) Manual - Review each gear before proceeding
|
|
- B) Cruise Control - Shift through all gears automatically
|
|
|
|
**Question 6: Clarifications strategy?** _(If transmission = cruise control)_
|
|
- A) Defer - Mark them, continue around them
|
|
- B) Prompt - Stop and ask questions
|
|
- C) Skip - Only implement fully-specified features
|
|
|
|
**Question 7: Implementation scope?** _(If transmission = cruise control)_
|
|
- A) None - Stop after specs are ready
|
|
- B) P0 only - Critical features only
|
|
- C) P0 + P1 - Critical + high-value features
|
|
- D) All - Every feature
|
|
|
|
**Question 8: Spec output location?** _(If route = greenfield)_
|
|
- A) Current repository (default)
|
|
- B) New application repository
|
|
- C) Separate documentation repository
|
|
- D) Custom location
|
|
|
|
**Question 9: Target stack?** _(If greenfield + implementation scope != none)_
|
|
- Examples:
|
|
- Next.js 15 + TypeScript + Prisma + PostgreSQL
|
|
- Python/FastAPI + SQLAlchemy + PostgreSQL
|
|
- Your choice: [specify]
|
|
|
|
**Question 10: Build location?** _(If greenfield + implementation scope != none)_
|
|
- A) Subfolder (recommended) - e.g., greenfield/, v2/
|
|
- B) Separate directory - e.g., ~/git/my-new-app
|
|
- C) Replace in place (destructive)
|
|
|

**Then I'll:**

1. ✅ Save all answers to `.stackshift-batch-session.json` (in current directory)
2. ✅ Show batch session summary
3. ✅ Start processing batches with auto-applied configuration
4. ✅ Clear batch session when complete (or keep it if you want)

**Why directory-scoped?**

- Multiple batch sessions can run simultaneously in different directories
- Each batch (monorepo services, etc.) has its own isolated configuration
- No conflicts between parallel batch runs
- The session file is co-located with the repos being processed
|
### Step 3: Create Batch Session & Spawn Agents
|
|
|
|
**First: Create batch session with all answers**
|
|
|
|
```bash
|
|
# After collecting all configuration answers, create batch session
|
|
# Stored in current directory for isolation from other batch runs
|
|
cat > .stackshift-batch-session.json <<EOF
|
|
{
|
|
"sessionId": "batch-$(date +%s)",
|
|
"startedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
|
|
"batchRootDirectory": "$(pwd)",
|
|
"totalRepos": ${TOTAL_REPOS},
|
|
"batchSize": ${BATCH_SIZE},
|
|
"answers": {
|
|
"route": "${ROUTE}",
|
|
"transmission": "${TRANSMISSION}",
|
|
"spec_output_location": "${SPEC_OUTPUT}",
|
|
"target_stack": "${TARGET_STACK}",
|
|
"build_location": "${BUILD_LOCATION}",
|
|
"clarifications_strategy": "${CLARIFICATIONS}",
|
|
"implementation_scope": "${SCOPE}"
|
|
},
|
|
"processedRepos": []
|
|
}
|
|
EOF
|
|
|
|
echo "✅ Batch session created: $(pwd)/.stackshift-batch-session.json"
|
|
echo "📦 Configuration will be auto-applied to all ${TOTAL_REPOS} repos"
|
|
```

**Then: Spawn parallel agents (they'll auto-use the batch session)**

```typescript
// Use the Task tool to spawn parallel agents
const batch1 = [
  'service-user-api',
  'service-inventory',
  'service-contact',
  'service-search',
  'service-pricing'
];

// Spawn 5 agents in parallel
const agents = batch1.map(service => ({
  task: `Analyze ${service} service with StackShift`,
  description: `StackShift analysis: ${service}`,
  subagent_type: 'general-purpose',
  prompt: `
cd ~/git/my-monorepo/services/${service}

IMPORTANT: Batch session is active (auto-detected by walking up to the parent directory).
Parent directory has: .stackshift-batch-session.json
All configuration will be auto-applied. DO NOT ask configuration questions.

Run StackShift Gear 1: Analyze
- Will auto-detect route (batch session: ${ROUTE})
- Will use spec output location: ${SPEC_OUTPUT}
- Analyze service + shared packages
- Generate analysis-report.md

Then run Gear 2: Reverse Engineer
- Extract business logic
- Document all shared package dependencies
- Create comprehensive documentation

Then run Gear 3: Create Specifications
- Generate .specify/ structure
- Create constitution
- Generate feature specs

Save all results to:
${SPEC_OUTPUT}/${service}/

When complete, create completion marker:
${SPEC_OUTPUT}/${service}/.complete
`
}));

// Launch all 5 in parallel
agents.forEach(agent => spawnAgent(agent));
```
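
The "walking up to the parent" detection mentioned in the prompt can be sketched as a small shell helper. This is a hypothetical function — the actual agent-side detection may differ:

```bash
# Walk up from the current directory looking for .stackshift-batch-session.json
find_batch_session() {
  local dir
  dir=$(pwd)
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/.stackshift-batch-session.json" ]; then
      echo "$dir/.stackshift-batch-session.json"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1
}
```

An agent started in `services/service-user-api` would find the session file created in `services/`.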

### Step 4: Progress Tracking

```bash
# Create tracking directory
mkdir -p ~/git/stackshift-batch-results

# Monitor progress
while true; do
  COMPLETE=$(find ~/git/stackshift-batch-results -name ".complete" | wc -l)
  echo "Completed: $COMPLETE / $SERVICE_COUNT"

  # Check if batch done
  if [ "$COMPLETE" -ge 5 ]; then
    echo "✅ Batch 1 complete"
    break
  fi

  sleep 30
done

# Start next batch...
```
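
The `batch-progress.json` file listed under Result Structure can be refreshed inside that polling loop. A minimal sketch — the demo path and marker are placeholders; a real run writes under `~/git/stackshift-batch-results/`:

```bash
# Refresh batch-progress.json after each poll
RESULTS_DIR=/tmp/demo-results            # real run: ~/git/stackshift-batch-results
mkdir -p "$RESULTS_DIR/service-a"
touch "$RESULTS_DIR/service-a/.complete" # demo marker

# Count finished repos and write a small status file
COMPLETE=$(find "$RESULTS_DIR" -name ".complete" | wc -l | tr -d ' ')
cat > "$RESULTS_DIR/batch-progress.json" <<EOF
{
  "updatedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "completed": $COMPLETE,
  "totalRepos": 5
}
EOF
```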

### Step 5: Result Aggregation

```bash
# After all batches complete
echo "=== Aggregating Results ==="

# Create master report
cat > ~/git/stackshift-batch-results/BATCH_SUMMARY.md <<EOF
# StackShift Batch Analysis Results

**Date:** $(date)
**Services Analyzed:** $SERVICE_COUNT
**Batches:** $(( (SERVICE_COUNT + 4) / 5 ))
**Total Time:** [calculated]

## Completion Status

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/$service_name/.complete ]; then
    echo "- ✅ $service_name - Complete"
  else
    echo "- ❌ $service_name - Failed or incomplete"
  fi
done)

## Results by Service

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/$service_name/.complete ]; then
    echo "### $service_name"
    echo ""
    echo "**Specs created:** $(find ~/git/stackshift-batch-results/$service_name/.specify/memory/specifications -name "*.md" 2>/dev/null | wc -l)"
    echo "**Modules analyzed:** $(cat ~/git/stackshift-batch-results/$service_name/.stackshift-state.json 2>/dev/null | jq -r '.metadata.modulesAnalyzed // 0')"
    echo ""
  fi
done)

## Next Steps

All specifications are ready for review:
- Review specs in each service's batch-results directory
- Merge specs into the actual repos if satisfied
- Run Gears 4-6 as needed
EOF

cat ~/git/stackshift-batch-results/BATCH_SUMMARY.md
```

---

## Result Structure

```
~/git/stackshift-batch-results/
├── BATCH_SUMMARY.md               # Master summary
├── batch-progress.json            # Real-time tracking
│
├── service-user-api/
│   ├── .complete                  # Marker file
│   ├── .stackshift-state.json     # State
│   ├── analysis-report.md         # Gear 1 output
│   ├── docs/reverse-engineering/  # Gear 2 output
│   │   ├── functional-specification.md
│   │   ├── service-logic.md
│   │   ├── modules/
│   │   │   ├── shared-pricing-utils.md
│   │   │   └── shared-discount-utils.md
│   │   └── [7 more docs]
│   └── .specify/                  # Gear 3 output
│       └── memory/
│           ├── constitution.md
│           └── specifications/
│               ├── pricing-display.md
│               ├── incentive-logic.md
│               └── [more specs]
│
├── service-inventory/
│   └── [same structure]
│
└── [88 more services...]
```

---

## Monitoring Progress

**Real-time status:**

```bash
# I'll show you periodic updates
echo "=== Batch Progress ==="
echo "Batch 1 (5 services): 3/5 complete"
echo "  ✅ service-user-api - Complete (12 min)"
echo "  ✅ service-inventory - Complete (8 min)"
echo "  ✅ service-contact - Complete (15 min)"
echo "  🔄 service-search - Running (7 min elapsed)"
echo "  ⏳ service-pricing - Queued"
echo ""
echo "Estimated time remaining: 25 minutes"
```

---

## Error Handling

**If a service fails:**

```bash
# Retry failed services
failed_services=(service-search service-pricing)

for service in "${failed_services[@]}"; do
  echo "Retrying: $service"
  # Spawn a new agent for the retry
done
```
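
The `failed_services` list can also be derived from the completion markers rather than typed by hand. A sketch with demo paths — the real run would check `~/git/stackshift-batch-results/<service>/.complete`:

```bash
# Demo layout: service-a completed, service-b did not
mkdir -p /tmp/demo-retry/service-a /tmp/demo-retry/service-b
touch /tmp/demo-retry/service-a/.complete

# Any service directory without a .complete marker goes on the retry list
failed_services=()
for dir in /tmp/demo-retry/*/; do
  name=$(basename "$dir")
  if [ ! -f "$dir/.complete" ]; then
    failed_services+=("$name")
  fi
done
echo "To retry: ${failed_services[*]}"
```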

**Common failures:**
- Missing package.json
- Tests failing (can continue anyway)
- Module source not found (prompt for location)

---

## Use Cases

**1. Entire monorepo migration:**
```
Analyze all 90+ service-* repos for migration planning
↓
Result: Complete business logic extracted from the entire platform
↓
Use specs to plan a Next.js migration strategy
```

**2. Selective analysis:**
```
Analyze just the 10 high-priority services first
↓
Review results
↓
Then batch process the remaining 80
```

**3. Module analysis:**
```
cd ~/git/my-monorepo/services
Analyze all shared packages (not services)
↓
Result: Shared module documentation
↓
Understand dependencies before service migration
```

---

## Configuration Options

I'll ask you to configure:

- **Repository list:** All in the folder, or a custom list?
- **Batch size:** How many in parallel (3/5/10)?
- **Gears to run:** 1-3 only, or the full 1-6?
- **Route:** Auto-detect, or force a specific route?
- **Output location:** Central results dir, or per-repo?
- **Error handling:** Stop on failure, or continue?

---

## Comparison with thoth-cli

**thoth-cli (Upgrades):**
- Orchestrates 90+ service upgrades
- 3 phases: coverage → discovery → implementation
- Tracks in .upgrade-state.json
- Parallel processing (2-5 at a time)

**StackShift Batch (Analysis):**
- Orchestrates 90+ service analyses
- 6 gears: analyze → reverse-engineer → create-specs → gap → clarify → implement
- Tracks in .stackshift-state.json
- Parallel processing (3-10 at a time)
- Can output to a central location

---

## Example Session

```
You: "I want to analyze all Osiris services in ~/git/my-monorepo/services"

Me: "Found 92 services! Let me configure batch processing..."

[Asks questions via AskUserQuestion]
- Process all 92? ✅
- Batch size: 5
- Gears: 1-3 (just analyze and spec, no implementation)
- Output: Central results directory

Me: "Starting batch analysis..."

Batch 1 (5 services): service-user-api, service-inventory, service-contact, service-search, service-pricing
[Spawns 5 parallel agents using Task tool]

[15 minutes later]
"Batch 1 complete! Starting batch 2..."

[3 hours later]
"✅ All 92 services analyzed!

Results: ~/git/stackshift-batch-results/
- 92 analysis reports
- 92 sets of specifications
- 890 total specs extracted
- Multiple shared packages documented

Next: Review specs and begin migration planning"
```

---

## Managing Batch Sessions

### View Current Batch Session

```bash
# Check if a batch session exists in the current directory and view its configuration
if [ -f .stackshift-batch-session.json ]; then
  echo "📦 Active Batch Session in $(pwd)"
  jq '.' .stackshift-batch-session.json
else
  echo "No active batch session in current directory"
fi
```

### View All Batch Sessions

```bash
# Find all active batch sessions
echo "🔍 Finding all active batch sessions..."
find ~/git -name ".stackshift-batch-session.json" -type f 2>/dev/null | while read -r session; do
  echo ""
  echo "📦 $(dirname "$session")"
  jq -r '"  Route: \(.answers.route) | Repos: \(.processedRepos | length)/\(.totalRepos)"' "$session"
done
```

### Clear Batch Session

**After batch completes:**

```bash
# I'll ask you:
# "Batch processing complete! Clear batch session? (Y/n)"

# If yes:
rm .stackshift-batch-session.json
echo "✅ Batch session cleared"

# If no:
echo "✅ Batch session kept (will be used for the next batch run in this directory)"
```

**Manual clear (current directory):**

```bash
# Clear the batch session in the current directory
rm .stackshift-batch-session.json
```

**Manual clear (specific directory):**

```bash
# Clear the batch session in a specific directory
rm ~/git/my-monorepo/services/.stackshift-batch-session.json
```

**Why keep the batch session?**
- Run another batch with the same configuration
- Process more repos later in the same directory
- Continue an interrupted batch
- Consistent settings for related batches

**Why clear the batch session?**
- Done with the current migration
- Want a different configuration for the next batch
- Starting a fresh analysis
- Free up the directory for a different batch type

---

## Batch Session Benefits

**Without batch session (old way):**
```
Batch 1: Answer 10 questions ⏱️ 2 min
↓ Process 3 repos (15 min)

Batch 2: Answer 10 questions AGAIN ⏱️ 2 min
↓ Process 3 repos (15 min)

Batch 3: Answer 10 questions AGAIN ⏱️ 2 min
↓ Process 3 repos (15 min)

Total: 30 questions answered, 6 min of question time
```

**With batch session (new way):**
```
Setup: Answer 10 questions ONCE ⏱️ 2 min
↓ Batch 1: Process 3 repos (15 min)
↓ Batch 2: Process 3 repos (15 min)
↓ Batch 3: Process 3 repos (15 min)

Total: 10 questions answered, 2 min of question time
Saved: 4 minutes per 9 repos processed
```

**For 90 repos in batches of 3:**
- Old way: 300 questions answered (60 min of clicking)
- New way: 10 questions answered (2 min of clicking)
- **Time saved: 58 minutes!** ⚡

---

**This batch processing system is perfect for:**

- Monorepo migration (90+ services)
- Multi-repo analysis of related projects
- Department-wide code audits
- Portfolio modernization projects