---
description: Batch process multiple repos with StackShift analysis running in parallel. Analyzes 5 repos at a time, tracks progress, and aggregates results. Perfect for analyzing monorepo services or multiple related projects.
---

# StackShift Batch Processing

**Analyze multiple repositories in parallel.**

Run StackShift on 10, 50, or 100+ repos simultaneously with progress tracking and result aggregation.


## Quick Start

Analyze all services in a monorepo:

```bash
# From monorepo services directory
cd ~/git/my-monorepo/services

# Let me analyze all service-* directories in batches of 5
```

I'll:

  1. Find all service-* directories
  2. Filter to valid repos (those with a package.json)
  3. Process in batches of 5 (configurable)
  4. Track progress in batch-results/
  5. Aggregate results when complete

## What I'll Do

### Step 1: Discovery

echo "=== Discovering repositories in ~/git/my-monorepo/services ==="

# Find all service directories
find ~/git/my-monorepo/services -maxdepth 1 -type d -name "service-*" | sort > /tmp/services-to-analyze.txt

# Count
SERVICE_COUNT=$(wc -l < /tmp/services-to-analyze.txt)
echo "Found $SERVICE_COUNT services"

# Show first 10
head -10 /tmp/services-to-analyze.txt
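
The discovery above matches on directory name only; a minimal sketch of the validity filter promised in the Quick Start list, keeping only directories that actually contain a package.json (the `/tmp/valid-services.txt` name is illustrative):

```bash
# Keep only directories that contain a package.json (valid Node repos)
: > /tmp/valid-services.txt
while IFS= read -r dir; do
  if [ -f "$dir/package.json" ]; then
    echo "$dir" >> /tmp/valid-services.txt
  else
    echo "Skipping $dir (no package.json)"
  fi
done < /tmp/services-to-analyze.txt

SERVICE_COUNT=$(wc -l < /tmp/valid-services.txt)
echo "$SERVICE_COUNT valid services to analyze"
```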

### Step 2: Batch Configuration

**IMPORTANT:** I'll ask ALL configuration questions upfront, ONCE. Your answers will be saved to a batch session file and automatically applied to ALL repos in all batches. You won't need to answer these questions again during this batch run!

I'll ask you:

Question 1: How many to process?

  • A) All services ($SERVICE_COUNT total)
  • B) First 10 (test run)
  • C) First 25 (small batch)
  • D) Custom number

Question 2: Parallel batch size?

  • A) 3 at a time (conservative)
  • B) 5 at a time (recommended)
  • C) 10 at a time (aggressive, may slow down)
  • D) Sequential (1 at a time, safest)

Question 3: What route?

  • A) Auto-detect (monorepo-service for service-*, ask for others)
  • B) Force monorepo-service for all
  • C) Force greenfield for all
  • D) Force brownfield for all

Question 4: Brownfield mode? (If route = brownfield)

  • A) Standard - Just create specs for current state
  • B) Upgrade - Create specs + upgrade all dependencies

Question 5: Transmission?

  • A) Manual - Review each gear before proceeding
  • B) Cruise Control - Shift through all gears automatically

Question 6: Clarifications strategy? (If transmission = cruise control)

  • A) Defer - Mark them, continue around them
  • B) Prompt - Stop and ask questions
  • C) Skip - Only implement fully-specified features

Question 7: Implementation scope? (If transmission = cruise control)

  • A) None - Stop after specs are ready
  • B) P0 only - Critical features only
  • C) P0 + P1 - Critical + high-value features
  • D) All - Every feature

Question 8: Spec output location? (If route = greenfield)

  • A) Current repository (default)
  • B) New application repository
  • C) Separate documentation repository
  • D) Custom location

Question 9: Target stack? (If greenfield + implementation scope != none)

  • Examples:
    • Next.js 15 + TypeScript + Prisma + PostgreSQL
    • Python/FastAPI + SQLAlchemy + PostgreSQL
    • Your choice: [specify]

Question 10: Build location? (If greenfield + implementation scope != none)

  • A) Subfolder (recommended) - e.g., greenfield/, v2/
  • B) Separate directory - e.g., ~/git/my-new-app
  • C) Replace in place (destructive)

Then I'll:

  1. Save all answers to .stackshift-batch-session.json (in current directory)
  2. Show batch session summary
  3. Start processing batches with auto-applied configuration
  4. Clear batch session when complete (or keep if you want)

Why directory-scoped?

  • Multiple batch sessions can run simultaneously in different directories
  • Each batch (monorepo services, etc.) has its own isolated configuration
  • No conflicts between parallel batch runs
  • Session file is co-located with the repos being processed
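
To make the auto-detection concrete: a repo-level agent can find the directory-scoped session by walking up from its working directory. A minimal sketch (`find_batch_session` is an illustrative helper, not a StackShift API):

```bash
# Walk up from the current directory until a batch session file is found
find_batch_session() {
  local dir="$PWD"
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/.stackshift-batch-session.json" ]; then
      echo "$dir/.stackshift-batch-session.json"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1  # no session found; fall back to asking configuration questions
}

SESSION=$(find_batch_session) && echo "Using batch session: $SESSION"
```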

### Step 3: Create Batch Session & Spawn Agents

First: Create batch session with all answers

```bash
# After collecting all configuration answers, create the batch session
# Stored in the current directory for isolation from other batch runs
cat > .stackshift-batch-session.json <<EOF
{
  "sessionId": "batch-$(date +%s)",
  "startedAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "batchRootDirectory": "$(pwd)",
  "totalRepos": ${TOTAL_REPOS},
  "batchSize": ${BATCH_SIZE},
  "answers": {
    "route": "${ROUTE}",
    "transmission": "${TRANSMISSION}",
    "spec_output_location": "${SPEC_OUTPUT}",
    "target_stack": "${TARGET_STACK}",
    "build_location": "${BUILD_LOCATION}",
    "clarifications_strategy": "${CLARIFICATIONS}",
    "implementation_scope": "${SCOPE}"
  },
  "processedRepos": []
}
EOF

echo "✅ Batch session created: $(pwd)/.stackshift-batch-session.json"
echo "📦 Configuration will be auto-applied to all ${TOTAL_REPOS} repos"
```

Then: Spawn parallel agents (they'll auto-use batch session)

```javascript
// Use Task tool to spawn parallel agents
const batch1 = [
  'service-user-api',
  'service-inventory',
  'service-contact',
  'service-search',
  'service-pricing'
];

// Spawn 5 agents in parallel
const agents = batch1.map(service => ({
  task: `Analyze ${service} service with StackShift`,
  description: `StackShift analysis: ${service}`,
  subagent_type: 'general-purpose',
  prompt: `
    cd ~/git/my-monorepo/services/${service}

    IMPORTANT: Batch session is active (auto-detected by walking up to the parent directory)
    Parent directory has: .stackshift-batch-session.json
    All configuration will be auto-applied. DO NOT ask configuration questions.

    Run StackShift Gear 1: Analyze
    - Will auto-detect route (batch session: ${ROUTE})
    - Will use spec output location: ${SPEC_OUTPUT}
    - Analyze service + shared packages
    - Generate analysis-report.md

    Then run Gear 2: Reverse Engineer
    - Extract business logic
    - Document all shared package dependencies
    - Create comprehensive documentation

    Then run Gear 3: Create Specifications
    - Generate .specify/ structure
    - Create constitution
    - Generate feature specs

    Save all results to:
    ${SPEC_OUTPUT}/${service}/

    When complete, create completion marker:
    ${SPEC_OUTPUT}/${service}/.complete
  `
}));

// Launch all 5 in parallel (pseudocode: spawnAgent stands in for one Task tool call each)
agents.forEach(agent => spawnAgent(agent));
```
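
The example above hard-codes one batch; a hedged bash sketch of slicing the full discovered list into batch files of `$BATCH_SIZE` lines (assumes GNU split and the /tmp list from Step 1):

```bash
# Slice the service list into numbered batch files of $BATCH_SIZE lines each
BATCH_SIZE=5
split -l "$BATCH_SIZE" -d /tmp/services-to-analyze.txt /tmp/batch-

for batch_file in /tmp/batch-*; do
  echo "=== $(basename "$batch_file") ==="
  # One agent per service in this batch (each would be a parallel Task tool call)
  while IFS= read -r service_dir; do
    echo "  -> spawn agent for $(basename "$service_dir")"
  done < "$batch_file"
  # ...wait for this batch's .complete markers before starting the next
done
```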

### Step 4: Progress Tracking

```bash
# Create tracking directory (assumes SPEC_OUTPUT=~/git/stackshift-batch-results)
mkdir -p ~/git/stackshift-batch-results

# Monitor progress
while true; do
  COMPLETE=$(find ~/git/stackshift-batch-results -name ".complete" | wc -l)
  echo "Completed: $COMPLETE / $SERVICE_COUNT"

  # Check if the current batch is done
  if [ "$COMPLETE" -ge "$BATCH_SIZE" ]; then
    echo "✅ Batch 1 complete"
    break
  fi

  sleep 30
done

# Start next batch...
```
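
The session file's `processedRepos` array (which the session viewers below report on) needs updating as repos finish; a minimal sketch, assuming jq is installed:

```bash
# Append a completed repo to processedRepos in the batch session (requires jq)
mark_repo_done() {
  local session=".stackshift-batch-session.json"
  jq --arg repo "$1" '.processedRepos += [$repo]' "$session" > "$session.tmp" \
    && mv "$session.tmp" "$session"
}

mark_repo_done "service-user-api"
```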

### Step 5: Result Aggregation

```bash
# After all batches complete
echo "=== Aggregating Results ==="

# Create master report
cat > ~/git/stackshift-batch-results/BATCH_SUMMARY.md <<EOF
# StackShift Batch Analysis Results

**Date:** $(date)
**Services Analyzed:** $SERVICE_COUNT
**Batches:** $(( (SERVICE_COUNT + 4) / 5 ))
**Total Time:** [calculated]

## Completion Status

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/"$service_name"/.complete ]; then
    echo "- ✅ $service_name - Complete"
  else
    echo "- ❌ $service_name - Failed or incomplete"
  fi
done)

## Results by Service

$(for service in $(cat /tmp/services-to-analyze.txt); do
  service_name=$(basename "$service")
  if [ -f ~/git/stackshift-batch-results/"$service_name"/.complete ]; then
    echo "### $service_name"
    echo ""
    echo "**Specs created:** $(find ~/git/stackshift-batch-results/"$service_name"/.specify/memory/specifications -name "*.md" 2>/dev/null | wc -l)"
    echo "**Modules analyzed:** $(jq -r '.metadata.modulesAnalyzed // 0' ~/git/stackshift-batch-results/"$service_name"/.stackshift-state.json 2>/dev/null)"
    echo ""
  fi
done)

## Next Steps

All specifications are ready for review:
- Review specs in each service's batch-results directory
- Merge specs to actual repos if satisfied
- Run Gears 4-6 as needed
EOF

cat ~/git/stackshift-batch-results/BATCH_SUMMARY.md
```
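
For roll-up numbers like the totals quoted in the example session, a hedged one-liner over the same results tree:

```bash
# Total specs extracted across all analyzed services
TOTAL_SPECS=$(find ~/git/stackshift-batch-results -path "*/.specify/memory/specifications/*.md" | wc -l)
echo "Total specs extracted: $TOTAL_SPECS"
```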

## Result Structure

```
~/git/stackshift-batch-results/
├── BATCH_SUMMARY.md                    # Master summary
├── batch-progress.json                 # Real-time tracking
│
├── service-user-api/
│   ├── .complete                       # Marker file
│   ├── .stackshift-state.json          # State
│   ├── analysis-report.md              # Gear 1 output
│   ├── docs/reverse-engineering/       # Gear 2 output
│   │   ├── functional-specification.md
│   │   ├── service-logic.md
│   │   ├── modules/
│   │   │   ├── shared-pricing-utils.md
│   │   │   └── shared-discount-utils.md
│   │   └── [7 more docs]
│   └── .specify/                       # Gear 3 output
│       └── memory/
│           ├── constitution.md
│           └── specifications/
│               ├── pricing-display.md
│               ├── incentive-logic.md
│               └── [more specs]
│
├── service-inventory/
│   └── [same structure]
│
└── [88 more services...]
```

## Monitoring Progress

Real-time status:

```bash
# I'll show you periodic updates
echo "=== Batch Progress ==="
echo "Batch 1 (5 services): 3/5 complete"
echo "  ✅ service-user-api - Complete (12 min)"
echo "  ✅ service-inventory - Complete (8 min)"
echo "  ✅ service-contact - Complete (15 min)"
echo "  🔄 service-search - Running (7 min elapsed)"
echo "  ⏳ service-pricing - Queued"
echo ""
echo "Estimated time remaining: 25 minutes"
```

## Error Handling

If a service fails:

```bash
# Retry failed services
failed_services=(service-search service-pricing)

for service in "${failed_services[@]}"; do
  echo "Retrying: $service"
  # Spawn a new agent for the retry
done
```
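
Rather than hand-listing failures, the failed set can be derived from the same completion markers; a sketch:

```bash
# Derive failed/incomplete services: discovered list minus .complete markers
failed_services=()
while IFS= read -r service_dir; do
  name=$(basename "$service_dir")
  if [ ! -f ~/git/stackshift-batch-results/"$name"/.complete ]; then
    failed_services+=("$name")
  fi
done < /tmp/services-to-analyze.txt

echo "Failed or incomplete: ${failed_services[*]:-none}"
```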

Common failures:

  • Missing package.json
  • Tests failing (can continue anyway)
  • Module source not found (prompt for location)

## Use Cases

1. Entire monorepo migration:

```
Analyze all 90+ service-* services for migration planning
↓
Result: Complete business logic extracted from entire platform
↓
Use specs to plan Next.js migration strategy
```

2. Selective analysis:

```
Analyze just the 10 high-priority services first
↓
Review results
↓
Then batch process remaining 80
```

3. Module analysis:

```
cd ~/git/my-monorepo/services
Analyze all shared packages (not services)
↓
Result: Shared module documentation
↓
Understand dependencies before service migration
```

## Configuration Options

I'll ask you to configure:

  • Repository list: All in folder, or custom list?
  • Batch size: How many parallel (3/5/10)?
  • Gears to run: 1-3 only or full 1-6?
  • Route: Auto-detect or force specific route?
  • Output location: Central results dir or per-repo?
  • Error handling: Stop on failure or continue?

## Comparison with thoth-cli

thoth-cli (Upgrades):

  • Orchestrates 90+ service upgrades
  • 3 phases: coverage → discovery → implementation
  • Tracks in .upgrade-state.json
  • Parallel processing (2-5 at a time)

StackShift Batch (Analysis):

  • Orchestrates 90+ service analyses
  • 6 gears: analyze → reverse-engineer → create-specs → gap → clarify → implement
  • Tracks in .stackshift-state.json
  • Parallel processing (3-10 at a time)
  • Can output to central location

## Example Session

You: "I want to analyze all Osiris services in ~/git/my-monorepo/services"

Me: "Found 92 services! Let me configure batch processing..."

[Asks questions via AskUserQuestion]
- Process all 92? ✅
- Batch size: 5
- Gears: 1-3 (just analyze and spec, no implementation)
- Output: Central results directory

Me: "Starting batch analysis..."

Batch 1 (5 services): service-user-api, service-inventory, service-contact, ws-inventory, service-pricing
[Spawns 5 parallel agents using Task tool]

[15 minutes later]
"Batch 1 complete! Starting batch 2..."

[3 hours later]
"✅ All 92 services analyzed!

Results: ~/git/stackshift-batch-results/
- 92 analysis reports
- 92 sets of specifications
- 890 total specs extracted
- Multiple shared packages documented

Next: Review specs and begin migration planning"

## Managing Batch Sessions

### View Current Batch Session

```bash
# Check if a batch session exists in the current directory and view it
if [ -f .stackshift-batch-session.json ]; then
  echo "📦 Active Batch Session in $(pwd)"
  jq '.' .stackshift-batch-session.json
else
  echo "No active batch session in current directory"
fi
```

### View All Batch Sessions

```bash
# Find all active batch sessions
echo "🔍 Finding all active batch sessions..."
find ~/git -name ".stackshift-batch-session.json" -type f 2>/dev/null | while read -r session; do
  echo ""
  echo "📦 $(dirname "$session")"
  jq -r '"  Route: \(.answers.route) | Repos: \(.processedRepos | length)/\(.totalRepos)"' "$session"
done
```

### Clear Batch Session

After batch completes:

```bash
# I'll ask you:
# "Batch processing complete! Clear batch session? (Y/n)"

# If yes:
rm .stackshift-batch-session.json
echo "✅ Batch session cleared"

# If no:
echo "✅ Batch session kept (will be used for the next batch run in this directory)"
```

Manual clear (current directory):

```bash
# Clear batch session in current directory
rm .stackshift-batch-session.json
```

Manual clear (specific directory):

```bash
# Clear batch session in a specific directory
rm ~/git/my-monorepo/services/.stackshift-batch-session.json
```

Why keep a batch session?

  • Run another batch with the same configuration
  • Process more repos later in the same directory
  • Continue an interrupted batch (see the sketch after this list)
  • Consistent settings for related batches
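
For resuming an interrupted batch, the remaining work can be computed from the session itself; a minimal sketch, assuming jq and that processedRepos stores repo basenames:

```bash
# Repos still to process = discovered list minus processedRepos (requires jq)
jq -r '.processedRepos[]' .stackshift-batch-session.json | sort > /tmp/done.txt
sed 's|.*/||' /tmp/services-to-analyze.txt | sort > /tmp/all.txt
comm -23 /tmp/all.txt /tmp/done.txt > /tmp/remaining.txt
echo "$(wc -l < /tmp/remaining.txt) repos remaining in this batch run"
```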

Why clear batch session?

  • Done with current migration
  • Want different configuration for next batch
  • Starting fresh analysis
  • Free up directory for different batch type

## Batch Session Benefits

Without batch session (old way):

```
Batch 1: Answer 10 questions ⏱️ 2 min
  ↓ Process 3 repos (15 min)

Batch 2: Answer 10 questions AGAIN ⏱️ 2 min
  ↓ Process 3 repos (15 min)

Batch 3: Answer 10 questions AGAIN ⏱️ 2 min
  ↓ Process 3 repos (15 min)

Total: 30 questions answered, 4 min wasted on repeated setup
```

With batch session (new way):

```
Setup: Answer 10 questions ONCE ⏱️ 2 min
  ↓ Batch 1: Process 3 repos (15 min)
  ↓ Batch 2: Process 3 repos (15 min)
  ↓ Batch 3: Process 3 repos (15 min)

Total: 10 questions answered, 0 min wasted
Saved: 4 minutes per 9 repos processed
```

For 90 repos in batches of 3:

  • Old way: 300 questions answered (60 min of clicking)
  • New way: 10 questions answered (2 min of clicking)
  • Time saved: 58 minutes!
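
The arithmetic generalizes; a quick sketch for estimating the savings at any repo count and batch size (assumes roughly 2 minutes of questions per setup):

```bash
# Estimate question-time saved by answering once instead of once per batch
REPOS=90; BATCH_SIZE=3; MIN_PER_SETUP=2
BATCHES=$(( (REPOS + BATCH_SIZE - 1) / BATCH_SIZE ))
OLD=$(( BATCHES * MIN_PER_SETUP ))  # questions before every batch
NEW=$MIN_PER_SETUP                  # questions once up front
echo "Saved: $(( OLD - NEW )) minutes across $BATCHES batches"
```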

This batch processing system is perfect for:

  • Monorepo migration (90+ services)
  • Multi-repo analysis across related projects
  • Department-wide code audits
  • Portfolio modernization projects