Initial commit

Zhongwei Li
2025-11-30 08:48:52 +08:00
commit 6ec3196ecc
434 changed files with 125248 additions and 0 deletions

skills/repomix/SKILL.md Normal file

@@ -0,0 +1,215 @@
---
name: repomix
description: Package entire code repositories into single AI-friendly files using Repomix. Capabilities include packing codebases with customizable include/exclude patterns, generating multiple output formats (XML, Markdown, JSON, plain text), preserving file structure and context, optimizing for AI consumption with token counting, filtering by file types and directories, and adding custom headers and summaries. Use when packaging codebases for AI analysis, creating repository snapshots for LLM context, analyzing third-party libraries, preparing for security audits, generating documentation context, or evaluating unfamiliar codebases.
---
# Repomix Skill
Repomix packs an entire repository into a single, AI-friendly file, which makes it well suited to feeding codebases to LLMs such as Claude, ChatGPT, and Gemini.
## When to Use
Use when:
- Packaging codebases for AI analysis
- Creating repository snapshots for LLM context
- Analyzing third-party libraries
- Preparing for security audits
- Generating documentation context
- Investigating bugs across large codebases
- Creating AI-friendly code representations
## Quick Start
### Check Installation
```bash
repomix --version
```
### Install
```bash
# npm
npm install -g repomix
# Homebrew (macOS/Linux)
brew install repomix
```
### Basic Usage
```bash
# Package current directory (generates repomix-output.xml)
repomix
# Specify output format
repomix --style markdown
repomix --style json
# Package remote repository
npx repomix --remote owner/repo
# Custom output with filters
repomix --include "src/**/*.ts" --remove-comments -o output.md
```
## Core Capabilities
### Repository Packaging
- AI-optimized formatting with clear separators
- Multiple output formats: XML, Markdown, JSON, Plain text
- Git-aware processing (respects .gitignore)
- Token counting for LLM context management
- Security checks for sensitive information
### Remote Repository Support
Process remote repositories without cloning:
```bash
# Shorthand
npx repomix --remote yamadashy/repomix
# Full URL
npx repomix --remote https://github.com/owner/repo
# Specific commit
npx repomix --remote https://github.com/owner/repo/commit/hash
```
### Comment Removal
Strip comments from supported languages (HTML, CSS, JavaScript, TypeScript, Vue, Svelte, Python, PHP, Ruby, C, C#, Java, Go, Rust, Swift, Kotlin, Dart, Shell, YAML):
```bash
repomix --remove-comments
```
## Common Use Cases
### Code Review Preparation
```bash
# Package feature branch for AI review
repomix --include "src/**/*.ts" --remove-comments -o review.md --style markdown
```
### Security Audit
```bash
# Package third-party library
npx repomix --remote vendor/library --style xml -o audit.xml
```
### Documentation Generation
```bash
# Package with docs and code
repomix --include "src/**,docs/**,*.md" --style markdown -o context.md
```
### Bug Investigation
```bash
# Package specific modules
repomix --include "src/auth/**,src/api/**" -o debug-context.xml
```
### Implementation Planning
```bash
# Full codebase context
repomix --remove-comments --copy
```
## Command Line Reference
### File Selection
```bash
# Include specific patterns
repomix --include "src/**/*.ts,*.md"
# Ignore additional patterns
repomix -i "tests/**,*.test.js"
# Disable .gitignore rules
repomix --no-gitignore
```
### Output Options
```bash
# Output format
repomix --style markdown # or xml, json, plain
# Output file path
repomix -o output.md
# Remove comments
repomix --remove-comments
# Copy to clipboard
repomix --copy
```
### Configuration
```bash
# Use custom config file
repomix -c custom-config.json
# Initialize new config
repomix --init # creates repomix.config.json
```
## Token Management
Repomix automatically counts tokens for individual files, for the repository as a whole, and for the generated output in the chosen format.
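Approximate LLM context limits (these change between model versions, so check the provider's current documentation):
- Claude Sonnet 4.5: ~200K tokens
- GPT-4 Turbo: ~128K tokens
- GPT-3.5 Turbo: ~16K tokens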
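As a rough pre-check, the size of a packed file can be compared against a target context window before it is sent to a model. The sketch below is only an estimate and does not use Repomix's tokenizer: it assumes the common rule of thumb of about four characters per token, and `CONTEXT_LIMITS`, `estimate_tokens`, and `fits_context` are hypothetical names that mirror the approximate limits listed above.

```python
# Rough heuristic only: real token counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4  # assumption: common rule of thumb for English text and code

# Hypothetical lookup table mirroring the approximate limits listed above.
CONTEXT_LIMITS = {
    "claude-sonnet-4.5": 200_000,
    "gpt-4": 128_000,
    "gpt-3.5": 16_000,
}


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_context(text: str, model: str, reserve: int = 4_000) -> bool:
    """Return True if the text fits while leaving `reserve` tokens for the reply."""
    return estimate_tokens(text) + reserve <= CONTEXT_LIMITS[model]


if __name__ == "__main__":
    packed = "x" * 100_000  # stand-in for the contents of repomix-output.xml
    print(estimate_tokens(packed))          # 25000
    print(fits_context(packed, "gpt-3.5"))  # False
    print(fits_context(packed, "gpt-4"))    # True
```

Treat the result as a first filter; the token counts that Repomix itself reports are the better guide when they are available.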
## Security Considerations
Repomix uses Secretlint to detect sensitive data (API keys, passwords, credentials, private keys, AWS secrets).
Best practices:
1. Always review output before sharing
2. Use `.repomixignore` for sensitive files
3. Keep security checks enabled (the default) for unknown codebases
4. Avoid packaging `.env` files
5. Check for hardcoded credentials
Disable security checks if needed:
```bash
repomix --no-security-check
```
## Implementation Workflow
When user requests repository packaging:
1. **Assess Requirements**
   - Identify target repository (local/remote)
   - Determine output format needed
   - Check for sensitive data concerns
2. **Configure Filters**
   - Set include patterns for relevant files
   - Add ignore patterns for unnecessary files
   - Enable/disable comment removal
3. **Execute Packaging**
   - Run repomix with appropriate options
   - Monitor token counts
   - Verify security checks
4. **Validate Output**
   - Review generated file
   - Confirm no sensitive data
   - Check token limits for target LLM
5. **Deliver Context**
   - Provide packaged file to user
   - Include token count summary
   - Note any warnings or issues
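The first three steps of the workflow above can be sketched as a small helper that turns the gathered requirements into a repomix argument list. This is an illustrative sketch, not part of the skill: `build_repomix_command` and its parameters are hypothetical names, and it uses only flags that already appear in this document.

```python
import shlex
from typing import List, Optional


def build_repomix_command(
    target: str,
    remote: bool = False,
    style: str = "xml",
    include: Optional[str] = None,
    ignore: Optional[str] = None,
    remove_comments: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Translate packaging requirements into a repomix argument list."""
    if style not in {"xml", "markdown", "json", "plain"}:
        raise ValueError(f"unsupported style: {style}")
    # Remote repositories go through `npx repomix --remote`, as in the examples above.
    cmd = ["npx", "repomix", "--remote", target] if remote else ["repomix", target]
    cmd += ["--style", style]
    if include:
        cmd += ["--include", include]
    if ignore:
        cmd += ["-i", ignore]
    if remove_comments:
        cmd.append("--remove-comments")
    if output:
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_repomix_command(
        ".",
        include="src/auth/**,src/api/**",
        remove_comments=True,
        output="debug-context.xml",
    )
    # Printed for review only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a shell string avoids quoting problems with glob patterns when it is later handed to `subprocess.run`.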
## Reference Documentation
For detailed information, see:
- [Configuration Reference](./references/configuration.md) - Config files, include/exclude patterns, output formats, advanced options
- [Usage Patterns](./references/usage-patterns.md) - AI analysis workflows, security audit preparation, documentation generation, library evaluation
## Additional Resources
- GitHub: https://github.com/yamadashy/repomix
- Documentation: https://repomix.com/guide/
- MCP Server: Available for AI assistant integration


@@ -0,0 +1,211 @@
# Configuration Reference
Detailed configuration options for Repomix.
## Configuration File
Create `repomix.config.json` in project root:
```json
{
  "output": {
    "filePath": "repomix-output.xml",
    "style": "xml",
    "removeComments": false,
    "showLineNumbers": true,
    "copyToClipboard": false
  },
  "include": ["**/*"],
  "ignore": {
    "useGitignore": true,
    "useDefaultPatterns": true,
    "customPatterns": ["additional-folder", "**/*.log", "**/tmp/**"]
  },
  "security": {
    "enableSecurityCheck": true
  }
}
```
### Output Options
- `filePath`: Output file path (default: `repomix-output.xml`)
- `style`: Format - `xml`, `markdown`, `json`, `plain` (default: `xml`)
- `removeComments`: Strip comments (default: `false`). Supports HTML, CSS, JS/TS, Vue, Svelte, Python, PHP, Ruby, C, C#, Java, Go, Rust, Swift, Kotlin, Dart, Shell, YAML
- `showLineNumbers`: Include line numbers (default: `true`)
- `copyToClipboard`: Auto-copy output (default: `false`)
### Include/Ignore
- `include`: Glob patterns for files to include (default: `["**/*"]`)
- `useGitignore`: Respect .gitignore (default: `true`)
- `useDefaultPatterns`: Use default ignore patterns (default: `true`)
- `customPatterns`: Additional ignore patterns (same format as .gitignore)
### Security
- `enableSecurityCheck`: Scan for sensitive data with Secretlint (default: `true`)
- Detects: API keys, passwords, credentials, private keys, AWS secrets, DB connections
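When several projects need similar settings, the config file can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: `make_config` and `EXTENSIONS` are hypothetical names, the extension chosen for each style is an assumption, and only option keys documented above are emitted.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a conventional file extension for each output style.
EXTENSIONS = {"xml": "xml", "markdown": "md", "json": "json", "plain": "txt"}


def make_config(style: str = "xml", remove_comments: bool = False, custom_ignores=()) -> dict:
    """Build a config dict using only the options documented above."""
    if style not in EXTENSIONS:
        raise ValueError(f"unsupported style: {style}")
    return {
        "output": {
            "filePath": f"repomix-output.{EXTENSIONS[style]}",
            "style": style,
            "removeComments": remove_comments,
        },
        "include": ["**/*"],
        "ignore": {
            "useGitignore": True,
            "useDefaultPatterns": True,
            "customPatterns": list(custom_ignores),
        },
        "security": {"enableSecurityCheck": True},
    }


if __name__ == "__main__":
    config = make_config("markdown", remove_comments=True, custom_ignores=["**/*.log"])
    # Written to a temporary directory so the demo never overwrites a real config.
    target = Path(tempfile.mkdtemp()) / "repomix.config.json"
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    print(target.read_text(encoding="utf-8"))
```

In real use the file would be written to the project root as `repomix.config.json`, or to another path passed with `-c`.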
## Glob Patterns
**Wildcards:**
- `*` - Any chars except `/`
- `**` - Any chars including `/`
- `?` - Single char
- `[abc]` - Char from set
- `{js,ts}` - Either extension
**Examples:**
- `**/*.ts` - All TypeScript
- `src/**` - Specific dir
- `**/*.{js,jsx,ts,tsx}` - Multiple extensions
- `!**/*.test.ts` - Exclude tests
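The difference between `*` and `**` is easiest to see by testing patterns against paths. The sketch below translates the wildcards listed above into a regular expression. It approximates the semantics for illustration only: `glob_to_regex` and `matches` are hypothetical helpers, negation with `!` is not handled, and Repomix's own matcher may differ in edge cases.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob using *, **, ?, [set] and {a,b} into an anchored regex."""
    i, out = 0, []
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # any characters, including '/'
            i += 2
        elif c == "*":
            out.append("[^/]*")  # any characters except '/'
            i += 1
        elif c == "?":
            out.append("[^/]")  # exactly one character except '/'
            i += 1
        elif c == "[":
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])  # character set passes through unchanged
            i = end + 1
        elif c == "{":
            end = pattern.index("}", i)
            alternatives = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("*.ts", "src/b.ts"),
        ("**/*.ts", "src/a/b.ts"),
        ("src/**", "src/a/b.ts"),
        ("**/*.{js,jsx,ts,tsx}", "x/y.jsx"),
    ]:
        print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```

The first pair does not match because `*` stops at a directory separator, while the remaining three do.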
### CLI Options
```bash
# Include patterns
repomix --include "src/**/*.ts,*.md"
# Ignore patterns
repomix -i "tests/**,*.test.js"
# Disable .gitignore
repomix --no-gitignore
# Disable defaults
repomix --no-default-patterns
```
### .repomixignore File
Create `.repomixignore` for Repomix-specific patterns (same format as .gitignore):
```
# Build artifacts
dist/
build/
*.min.js
out/
# Test files
**/*.test.ts
**/*.spec.ts
coverage/
__tests__/
# Dependencies
node_modules/
vendor/
packages/*/node_modules/
# Large files
*.mp4
*.zip
*.tar.gz
*.iso
# Sensitive files
.env*
secrets/
*.key
*.pem
# IDE files
.vscode/
.idea/
*.swp
# Logs
logs/
**/*.log
```
### Pattern Precedence
Order (highest to lowest priority):
1. CLI ignore patterns (`-i`)
2. `.repomixignore` file
3. Custom patterns in config
4. `.gitignore` (if enabled)
5. Default patterns (if enabled)
### Pattern Examples
**TypeScript:**
```json
{"include": ["**/*.ts", "**/*.tsx"], "ignore": {"customPatterns": ["**/*.test.ts", "dist/"]}}
```
**React:**
```json
{"include": ["src/**/*.{js,jsx,ts,tsx}", "*.md"], "ignore": {"customPatterns": ["build/"]}}
```
**Monorepo:**
```json
{"include": ["packages/*/src/**"], "ignore": {"customPatterns": ["packages/*/dist/"]}}
```
## Output Formats
### XML (Default)
```bash
repomix --style xml
```
Structured output for AI consumption. Features: tags, hierarchy, metadata, AI-optimized separators.
Use for: LLMs, structured analysis, programmatic parsing.
### Markdown
```bash
repomix --style markdown
```
Human-readable output. Features: syntax highlighting, headers, TOC.
Use for: documentation, code review, sharing.
### JSON
```bash
repomix --style json
```
Programmatic processing. Features: structured data, easy parsing, metadata.
Use for: API integration, custom tooling, data analysis.
### Plain Text
```bash
repomix --style plain
```
Simple concatenation. Features: no formatting, minimal overhead.
Use for: simple analysis, minimal processing.
## Advanced Options
```bash
# Verbose - show processing details
repomix --verbose
# Custom config file
repomix -c /path/to/custom-config.json
# Initialize config
repomix --init
# Disable line numbers - smaller output
repomix --no-line-numbers
```
### Performance
**Worker Threads:** Parallel processing handles large codebases efficiently (e.g., facebook/react: 29x faster, 123s → 4s)
**Optimization:**
```bash
# Exclude unnecessary files
repomix -i "node_modules/**,dist/**,*.min.js"
# Specific directories only
repomix --include "src/**/*.ts"
# Remove comments, disable line numbers
repomix --remove-comments --no-line-numbers
```


@@ -0,0 +1,232 @@
# Usage Patterns
Practical workflows and patterns for using Repomix in different scenarios.
## AI Analysis Workflows
### Full Repository
```bash
repomix --remove-comments --style markdown -o full-repo.md
```
**Use:** New codebase, architecture review, complete LLM context, planning
**Tips:** Remove comments, use markdown, check token limits, review before sharing
### Focused Module
```bash
repomix --include "src/auth/**,src/api/**" -o modules.xml
```
**Use:** Feature analysis, debugging specific areas, targeted refactoring
**Tips:** Include related files only, stay within token limits, use XML for AI
### Incremental Analysis
```bash
git checkout feature-branch && repomix --include "src/**" -o feature.xml
git checkout main && repomix --include "src/**" -o main.xml
```
**Use:** Feature branch review, change impact, before/after comparison, migration planning
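Once both snapshots exist, a line-level comparison gives a quick sense of how far the branches have diverged before either file is handed to an LLM. The sketch below uses Python's standard `difflib`; `summarize_diff` is a hypothetical helper, and the file names simply reuse those from the commands above.

```python
import difflib
from typing import List, Tuple


def summarize_diff(
    base: str,
    feature: str,
    base_name: str = "main.xml",
    feature_name: str = "feature.xml",
) -> Tuple[int, int, List[str]]:
    """Return (added, removed, unified diff lines) for two packed outputs."""
    diff = list(
        difflib.unified_diff(
            base.splitlines(),
            feature.splitlines(),
            fromfile=base_name,
            tofile=feature_name,
            lineterm="",
        )
    )
    body = diff[2:]  # skip the '---' and '+++' file header lines
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return added, removed, diff


if __name__ == "__main__":
    # Stand-ins for the contents of main.xml and feature.xml.
    main_pack = "a\nb\nc"
    feature_pack = "a\nB\nc\nd"
    added, removed, diff = summarize_diff(main_pack, feature_pack)
    print(f"+{added} / -{removed}")  # +2 / -1
    print("\n".join(diff))
```

For large snapshots the diff itself, rather than both full files, may be the more token-efficient context to share.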
### Cross-Repository
```bash
npx repomix --remote org/repo1 -o repo1.xml
npx repomix --remote org/repo2 -o repo2.xml
```
**Use:** Microservices, library comparisons, consistency checks, integration analysis
## Security Audit
### Third-Party Library
```bash
npx repomix --remote vendor/library --style xml -o audit.xml
```
**Workflow:** Package library → enable security checks → review vulnerabilities → check suspicious patterns → AI analysis
**Check for:** API keys, hardcoded credentials, network calls, obfuscation, malicious patterns
### Pre-Deployment
```bash
repomix --include "src/**,config/**" --style xml -o pre-deploy-audit.xml
```
**Checklist:** No sensitive data, no test credentials, env vars correct, security practices, no debug code
### Dependency Audit
```bash
repomix --include "**/package.json,**/package-lock.json" -o deps.md --style markdown
repomix --include "node_modules/suspicious-package/**" -o dep-audit.xml
```
**Use:** Suspicious dependency, security advisory, license compliance, vulnerability assessment
### Compliance
```bash
repomix --include "src/**,LICENSE,README.md,docs/**" --style markdown -o compliance.md
```
**Include:** Source, licenses, docs, configs. **Exclude:** Test data, dependencies
## Documentation
### Doc Context
```bash
repomix --include "src/**,docs/**,*.md" --style markdown -o doc-context.md
```
**Use:** API docs, architecture docs, user guides, onboarding
**Tips:** Include existing docs, include source, use markdown
### API Documentation
```bash
repomix --include "src/api/**,src/routes/**,src/controllers/**" --remove-comments -o api-context.xml
```
**Include:** Routes, controllers, schemas, middleware
**Workflow:** Package → AI → OpenAPI/Swagger → endpoint docs → examples
### Architecture
```bash
repomix --include "src/**/*.ts,*.md" -i "**/*.test.ts" --style markdown -o architecture.md
```
**Focus:** Module structure, dependencies, design patterns, data flow
### Examples
```bash
repomix --include "examples/**,demos/**,*.example.js" --style markdown -o examples.md
```
## Library Evaluation
### Quick Assessment
```bash
npx repomix --remote owner/library --style markdown -o library-eval.md
```
**Evaluate:** Code quality, architecture, dependencies, tests, docs, maintenance
### Feature Comparison
```bash
npx repomix --remote owner/lib-a --style xml -o lib-a.xml
npx repomix --remote owner/lib-b --style xml -o lib-b.xml
```
**Compare:** Features, API design, performance, bundle size, dependencies, maintenance
### Integration Feasibility
```bash
npx repomix --remote vendor/library --include "src/**,*.md" -o library.xml
repomix --include "src/integrations/**" -o our-integrations.xml
```
Analyze compatibility between target library and your integration points
### Migration Planning
```bash
repomix --include "node_modules/old-lib/**" -o old-lib.xml
npx repomix --remote owner/new-lib -o new-lib.xml
```
Compare current vs target library, analyze usage patterns
## Workflow Integration
### CI/CD
```yaml
# GitHub Actions
- name: Generate Snapshot
  run: |
    npm install -g repomix
    repomix --style markdown -o release-snapshot.md
- name: Upload Artifact
  # v3 of this action is deprecated; v4 is the supported release
  uses: actions/upload-artifact@v4
  with: {name: repo-snapshot, path: release-snapshot.md}
```
**Use:** Release docs, compliance archives, change tracking, audit trails
### Git Hooks
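```bash
#!/bin/bash
# .git/hooks/pre-commit
# Join staged (added/copied/modified) files with commas, without a trailing comma
staged=$(git diff --cached --name-only --diff-filter=ACM | paste -sd, -)
# Nothing staged: skip packaging
[ -z "$staged" ] && exit 0
mkdir -p .context
repomix --include "$staged" -o .context/latest.xml
```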
### IDE (VS Code)
```json
{"version": "2.0.0", "tasks": [{"label": "Package for AI", "type": "shell", "command": "repomix --include 'src/**' --remove-comments --copy"}]}
```
### Claude Code
```bash
repomix --style markdown --copy # Then paste into Claude
```
## Language-Specific Patterns
### TypeScript
```bash
repomix --include "**/*.ts,**/*.tsx" --remove-comments --no-line-numbers
```
**Exclude:** `**/*.test.ts`, `dist/`, `coverage/`
### React
```bash
repomix --include "src/**/*.{js,jsx,ts,tsx},public/**" -i "build/,*.test.*"
```
**Include:** Components, hooks, utils, public assets
### Node.js Backend
```bash
repomix --include "src/**/*.js,config/**" -i "node_modules/,logs/,tmp/"
```
**Focus:** Routes, controllers, models, middleware, configs
### Python
```bash
repomix --include "**/*.py,requirements.txt,*.md" -i "**/__pycache__/,venv/"
```
**Exclude:** `__pycache__/`, `*.pyc`, `venv/`, `.pytest_cache/`
### Monorepo
```bash
repomix --include "packages/*/src/**" -i "packages/*/node_modules/,packages/*/dist/"
```
**Consider:** Package-specific patterns, shared deps, cross-package refs, workspace structure
## Troubleshooting
### Output Too Large
**Problem:** Exceeds LLM token limits
**Fix:**
```bash
repomix -i "node_modules/**,dist/**,coverage/**" --include "src/core/**" --remove-comments --no-line-numbers
```
### Missing Files
**Problem:** Expected files not included
**Debug:**
```bash
cat .gitignore .repomixignore # Check ignore patterns
repomix --no-gitignore --no-default-patterns --verbose
```
### Sensitive Data Warnings
**Problem:** Security scanner flags secrets
**Actions:** Review files → add to `.repomixignore` → remove sensitive data → use env vars
```bash
repomix --no-security-check # Use carefully for false positives
```
### Performance Issues
**Problem:** Slow on large repo
**Optimize:**
```bash
repomix --include "src/**/*.ts" -i "node_modules/**,dist/**,vendor/**"
```
### Remote Access
**Problem:** Cannot access remote repo
**Fix:**
```bash
npx repomix --remote https://github.com/owner/repo # Full URL
npx repomix --remote https://github.com/owner/repo/commit/abc123 # Specific commit
# For private: clone first, run locally
```
## Best Practices
**Planning:** Define scope → identify files → check token limits → consider security
**Execution:** Start broad, refine narrow → use appropriate format → enable security checks → monitor tokens
**Review:** Verify no sensitive data → check completeness → validate format → test with LLM
**Iteration:** Refine patterns → adjust format → optimize tokens → document patterns


@@ -0,0 +1,179 @@
# Repomix Scripts
Utility scripts for batch processing repositories with Repomix.
## repomix_batch.py
Batch process multiple repositories (local or remote) using the repomix CLI tool.
### Features
- Process multiple repositories in one command
- Support local and remote repositories
- Configurable output formats (XML, Markdown, JSON, Plain)
- Environment variable loading from multiple .env file locations
- Comprehensive error handling
- Progress reporting
### Installation
Requires Python 3.10+ and repomix CLI:
```bash
# Install repomix
npm install -g repomix
# Install Python dependencies (if needed)
pip install pytest pytest-cov pytest-mock # For running tests
```
### Usage
**Process single repository:**
```bash
python repomix_batch.py /path/to/repo
```
**Process multiple repositories:**
```bash
python repomix_batch.py /repo1 /repo2 /repo3
```
**Process remote repositories:**
```bash
python repomix_batch.py owner/repo1 owner/repo2 --remote
```
**From JSON file:**
```bash
python repomix_batch.py -f repos.json
```
**With options:**
```bash
python repomix_batch.py /repo1 /repo2 \
--style markdown \
--output-dir output \
--remove-comments \
--include "src/**/*.ts" \
--ignore "tests/**" \
--verbose
```
### Configuration File Format
Create `repos.json` with repository configurations:
```json
[
  {
    "path": "/path/to/local/repo",
    "output": "custom-output.xml"
  },
  {
    "path": "owner/repo",
    "remote": true
  },
  {
    "path": "https://github.com/owner/repo",
    "remote": true,
    "output": "repo-output.md"
  }
]
```
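Because the batch script reports a missing `path` only while processing, it can help to check the file up front. The sketch below validates the structure shown above; `validate_repos` is a hypothetical helper and not part of `repomix_batch.py`.

```python
import json
from typing import Any, List


def validate_repos(data: Any) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    if not isinstance(data, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        path = entry.get("path")
        if not isinstance(path, str) or not path:
            problems.append(f"entry {i}: 'path' must be a non-empty string")
        if not isinstance(entry.get("remote", False), bool):
            problems.append(f"entry {i}: 'remote' must be true or false")
        if "output" in entry and not isinstance(entry["output"], str):
            problems.append(f"entry {i}: 'output' must be a string")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '[{"path": "/path/to/local/repo", "output": "custom-output.xml"},'
        ' {"path": "owner/repo", "remote": true},'
        ' {"remote": "yes"}]'
    )
    for problem in validate_repos(sample):
        print(problem)
```

Running it on the sample flags only the third entry, which lacks a `path` and has a non-boolean `remote`.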
### Environment Variables
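Environment variables are resolved in this order of precedence; a value from a higher-priority source overrides the same key from a lower one: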
1. Process environment (highest priority)
2. `./repomix/.env` (skill-specific)
3. `./skills/.env` (skills directory)
4. `./.claude/.env` (lowest priority)
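The precedence above amounts to merging dictionaries from lowest to highest priority, so that later layers overwrite earlier ones. A minimal sketch of that idea follows; `merge_env` is a hypothetical helper and the variable names and values are made up for illustration.

```python
from typing import Dict


def merge_env(*layers: Dict[str, str]) -> Dict[str, str]:
    """Merge layers given from lowest to highest priority; later layers win."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


if __name__ == "__main__":
    # Illustrative values only.
    claude_env = {"API_URL": "https://claude.example", "LOG_LEVEL": "info"}
    skills_env = {"LOG_LEVEL": "debug"}
    skill_env = {"API_URL": "https://skill.example"}
    process_env = {"LOG_LEVEL": "warning"}

    merged = merge_env(claude_env, skills_env, skill_env, process_env)
    print(merged)  # {'API_URL': 'https://skill.example', 'LOG_LEVEL': 'warning'}
```

Passing the process environment last is what gives it the highest priority.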
### Command Line Options
```
positional arguments:
  repos                 Repository paths or URLs to process

options:
  -h, --help            Show help message
  -f, --file FILE       JSON file containing repository configurations
  --style {xml,markdown,json,plain}
                        Output format (default: xml)
  -o, --output-dir DIR  Output directory (default: repomix-output)
  --remove-comments     Remove comments from source files
  --include PATTERN     Include pattern (glob)
  --ignore PATTERN      Ignore pattern (glob)
  --no-security-check   Disable security checks
  -v, --verbose         Verbose output
  --remote              Treat all repos as remote URLs
```
### Examples
**Process local repositories:**
```bash
python repomix_batch.py /path/to/repo1 /path/to/repo2 --style markdown
```
**Process remote repositories:**
```bash
python repomix_batch.py yamadashy/repomix facebook/react --remote
```
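**Mixed configuration:**
`--remote` is a flag that applies to every repository given on the command line and takes no value. To mix local and remote repositories, pass local paths as arguments and mark remote entries with `"remote": true` in the JSON file:
```bash
python repomix_batch.py \
  /local/repo \
  -f additional-repos.json \
  --style json \
  --remove-comments
```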
**TypeScript projects only:**
```bash
python repomix_batch.py /repo1 /repo2 \
--include "**/*.ts,**/*.tsx" \
--ignore "**/*.test.ts,dist/" \
--remove-comments \
--style markdown
```
### Testing
Run tests with coverage:
```bash
cd tests
pytest test_repomix_batch.py -v --cov=repomix_batch --cov-report=term-missing
```
Current coverage: 99%
### Exit Codes
- `0` - All repositories processed successfully
- `1` - One or more repositories failed or error occurred
### Troubleshooting
**repomix not found:**
```bash
npm install -g repomix
```
**Permission denied:**
```bash
chmod +x repomix_batch.py
```
**Timeout errors:**
- Default timeout: 5 minutes per repository
- Reduce scope with `--include` patterns
- Exclude large directories with `--ignore`
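The timeout behaviour can be reproduced in isolation with `subprocess.run`, which raises `subprocess.TimeoutExpired` when the limit is exceeded. The sketch below is a simplified stand-in for what the batch script does: `run_with_timeout` is a hypothetical helper, and a sleeping Python process takes the place of a slow repomix run.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout: float) -> Tuple[bool, str]:
    """Run a command and report (success, message), treating a timeout as failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    if result.returncode != 0:
        return False, result.stderr.strip() or "unknown error"
    return True, result.stdout.strip()


if __name__ == "__main__":
    # A deliberately slow child process stands in for a long repomix run.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout=0.5))

    fast = [sys.executable, "-c", "print('ok')"]
    print(run_with_timeout(fast, timeout=30))
```

The script's limit is fixed at 300 seconds, so narrowing the scope with `--include` and `--ignore` is the practical way to stay under it.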
**No repositories specified:**
- Provide repository paths as arguments
- Or use `-f` flag with JSON config file


@@ -0,0 +1,455 @@
#!/usr/bin/env python3
"""
Batch process multiple repositories using Repomix.

This script processes multiple repositories (local or remote) using the repomix CLI tool.
Supports configuration through environment variables loaded from multiple .env file locations.
"""

import os
import sys
import subprocess
import json
from pathlib import Path
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass
import argparse


@dataclass
class RepomixConfig:
    """Configuration for repomix execution."""
    style: str = "xml"
    output_dir: str = "repomix-output"
    remove_comments: bool = False
    include_pattern: Optional[str] = None
    ignore_pattern: Optional[str] = None
    no_security_check: bool = False
    verbose: bool = False
class EnvLoader:
    """Load environment variables from multiple .env file locations."""

    @staticmethod
    def load_env_files() -> Dict[str, str]:
        """
        Load environment variables from .env files in order of precedence.

        Order: process.env > skill/.env > skills/.env > .claude/.env

        Returns:
            Dictionary of environment variables
        """
        env_vars = {}
        script_dir = Path(__file__).parent.resolve()

        # Define search paths in reverse order (lowest to highest priority)
        search_paths = [
            script_dir.parent.parent.parent / ".env",  # .claude/.env
            script_dir.parent.parent / ".env",  # skills/.env
            script_dir.parent / ".env",  # skill/.env (repomix/.env)
        ]

        # Load from files (lower priority first)
        for env_path in search_paths:
            if env_path.exists():
                env_vars.update(EnvLoader._parse_env_file(env_path))

        # Override with process environment (highest priority)
        env_vars.update(os.environ)
        return env_vars

    @staticmethod
    def _parse_env_file(path: Path) -> Dict[str, str]:
        """
        Parse a .env file and return key-value pairs.

        Args:
            path: Path to .env file

        Returns:
            Dictionary of environment variables
        """
        env_vars = {}
        try:
            with open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    line = line.strip()
                    # Skip comments and empty lines
                    if not line or line.startswith('#'):
                        continue
                    # Parse KEY=VALUE
                    if '=' in line:
                        key, value = line.split('=', 1)
                        key = key.strip()
                        value = value.strip()
                        # Remove quotes if present
                        if value.startswith('"') and value.endswith('"'):
                            value = value[1:-1]
                        elif value.startswith("'") and value.endswith("'"):
                            value = value[1:-1]
                        env_vars[key] = value
        except Exception as e:
            print(f"Warning: Failed to parse {path}: {e}", file=sys.stderr)
        return env_vars
class RepomixBatchProcessor:
    """Process multiple repositories with repomix."""

    def __init__(self, config: RepomixConfig):
        """
        Initialize batch processor.

        Args:
            config: Repomix configuration
        """
        self.config = config
        self.env_vars = EnvLoader.load_env_files()

    def check_repomix_installed(self) -> bool:
        """
        Check if repomix is installed and accessible.

        Returns:
            True if repomix is installed, False otherwise
        """
        try:
            result = subprocess.run(
                ["repomix", "--version"],
                capture_output=True,
                text=True,
                timeout=5,
                env=self.env_vars
            )
            return result.returncode == 0
        except (subprocess.SubprocessError, FileNotFoundError):
            return False
    def process_repository(
        self,
        repo_path: str,
        output_name: Optional[str] = None,
        is_remote: bool = False
    ) -> Tuple[bool, str]:
        """
        Process a single repository with repomix.

        Args:
            repo_path: Path to local repository or remote repository URL
            output_name: Custom output filename (optional)
            is_remote: Whether repo_path is a remote URL

        Returns:
            Tuple of (success, message)
        """
        # Create output directory if it doesn't exist
        output_dir = Path(self.config.output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)

        # Determine output filename
        if output_name:
            output_file = output_dir / output_name
        else:
            if is_remote:
                # Extract repo name from URL
                repo_name = repo_path.rstrip('/').split('/')[-1]
            else:
                repo_name = Path(repo_path).name
            extension = self._get_extension(self.config.style)
            output_file = output_dir / f"{repo_name}-output.{extension}"

        # Build repomix command
        cmd = self._build_command(repo_path, output_file, is_remote)

        if self.config.verbose:
            print(f"Executing: {' '.join(cmd)}")

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=300,  # 5 minute timeout
                env=self.env_vars
            )
            if result.returncode == 0:
                return True, f"Successfully processed {repo_path} -> {output_file}"
            else:
                error_msg = result.stderr or result.stdout or "Unknown error"
                return False, f"Failed to process {repo_path}: {error_msg}"
        except subprocess.TimeoutExpired:
            return False, f"Timeout processing {repo_path} (exceeded 5 minutes)"
        except Exception as e:
            return False, f"Error processing {repo_path}: {str(e)}"
    def _build_command(
        self,
        repo_path: str,
        output_file: Path,
        is_remote: bool
    ) -> List[str]:
        """
        Build repomix command with configuration options.

        Args:
            repo_path: Path to repository
            output_file: Output file path
            is_remote: Whether this is a remote repository

        Returns:
            Command as list of strings
        """
        cmd = ["npx" if is_remote else "repomix"]
        if is_remote:
            cmd.extend(["repomix", "--remote", repo_path])
        else:
            cmd.append(repo_path)

        # Add configuration options
        cmd.extend(["--style", self.config.style])
        cmd.extend(["-o", str(output_file)])
        if self.config.remove_comments:
            cmd.append("--remove-comments")
        if self.config.include_pattern:
            cmd.extend(["--include", self.config.include_pattern])
        if self.config.ignore_pattern:
            cmd.extend(["-i", self.config.ignore_pattern])
        if self.config.no_security_check:
            cmd.append("--no-security-check")
        if self.config.verbose:
            cmd.append("--verbose")
        return cmd

    @staticmethod
    def _get_extension(style: str) -> str:
        """
        Get file extension for output style.

        Args:
            style: Output style (xml, markdown, json, plain)

        Returns:
            File extension
        """
        extensions = {
            "xml": "xml",
            "markdown": "md",
            "json": "json",
            "plain": "txt"
        }
        return extensions.get(style, "xml")
    def process_batch(
        self,
        repositories: List[Dict[str, str]]
    ) -> Dict[str, List[str]]:
        """
        Process multiple repositories.

        Args:
            repositories: List of repository configurations
                Each dict should contain:
                - 'path': Repository path or URL
                - 'output': Optional output filename
                - 'remote': Optional boolean for remote repos

        Returns:
            Dictionary with 'success' and 'failed' lists
        """
        results = {"success": [], "failed": []}

        for repo in repositories:
            repo_path = repo.get("path")
            if not repo_path:
                results["failed"].append("Missing 'path' in repository config")
                continue

            output_name = repo.get("output")
            is_remote = repo.get("remote", False)

            success, message = self.process_repository(
                repo_path,
                output_name,
                is_remote
            )

            if success:
                results["success"].append(message)
            else:
                results["failed"].append(message)

            print(message)

        return results
def load_repositories_from_file(file_path: str) -> List[Dict[str, str]]:
    """
    Load repository configurations from JSON file.

    Expected format:
    [
        {"path": "/path/to/repo", "output": "custom.xml"},
        {"path": "owner/repo", "remote": true},
        ...
    ]

    Args:
        file_path: Path to JSON file

    Returns:
        List of repository configurations
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        if isinstance(data, list):
            return data
        else:
            print(f"Error: Expected array in {file_path}", file=sys.stderr)
            return []
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON in {file_path}: {e}", file=sys.stderr)
        return []
    except Exception as e:
        print(f"Error: Failed to read {file_path}: {e}", file=sys.stderr)
        return []
def main():
    """Main entry point for the script."""
    parser = argparse.ArgumentParser(
        description="Batch process multiple repositories with repomix"
    )

    # Input options
    parser.add_argument(
        "repos",
        nargs="*",
        help="Repository paths or URLs to process"
    )
    parser.add_argument(
        "-f", "--file",
        help="JSON file containing repository configurations"
    )

    # Output options
    parser.add_argument(
        "--style",
        choices=["xml", "markdown", "json", "plain"],
        default="xml",
        help="Output format (default: xml)"
    )
    parser.add_argument(
        "-o", "--output-dir",
        default="repomix-output",
        help="Output directory (default: repomix-output)"
    )

    # Processing options
    parser.add_argument(
        "--remove-comments",
        action="store_true",
        help="Remove comments from source files"
    )
    parser.add_argument(
        "--include",
        help="Include pattern (glob)"
    )
    parser.add_argument(
        "--ignore",
        help="Ignore pattern (glob)"
    )
    parser.add_argument(
        "--no-security-check",
        action="store_true",
        help="Disable security checks"
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Verbose output"
    )
    parser.add_argument(
        "--remote",
        action="store_true",
        help="Treat all repos as remote URLs"
    )

    args = parser.parse_args()

    # Create configuration
    config = RepomixConfig(
        style=args.style,
        output_dir=args.output_dir,
        remove_comments=args.remove_comments,
        include_pattern=args.include,
        ignore_pattern=args.ignore,
        no_security_check=args.no_security_check,
        verbose=args.verbose
    )

    # Initialize processor
    processor = RepomixBatchProcessor(config)

    # Check if repomix is installed
    if not processor.check_repomix_installed():
        print("Error: repomix is not installed or not in PATH", file=sys.stderr)
        print("Install with: npm install -g repomix", file=sys.stderr)
        return 1

    # Collect repositories to process
    repositories = []

    # Load from file if specified
    if args.file:
        repositories.extend(load_repositories_from_file(args.file))

    # Add command line repositories
    if args.repos:
        for repo_path in args.repos:
            repositories.append({
                "path": repo_path,
                "remote": args.remote
            })

    # Validate we have repositories to process
    if not repositories:
        print("Error: No repositories specified", file=sys.stderr)
        print("Use: repomix_batch.py <repo1> <repo2> ...", file=sys.stderr)
        print("Or: repomix_batch.py -f repos.json", file=sys.stderr)
        return 1

    # Process batch
    print(f"Processing {len(repositories)} repositories...")
    results = processor.process_batch(repositories)

    # Print summary
    print("\n" + "=" * 50)
    print(f"Success: {len(results['success'])}")
    print(f"Failed: {len(results['failed'])}")

    if results['failed']:
        print("\nFailed repositories:")
        for failure in results['failed']:
            print(f"  - {failure}")

    return 0 if not results['failed'] else 1


if __name__ == "__main__":
    sys.exit(main())


@@ -0,0 +1,15 @@
[
  {
    "path": "/path/to/local/repo",
    "output": "local-repo-output.xml"
  },
  {
    "path": "owner/repo",
    "remote": true,
    "output": "remote-repo.xml"
  },
  {
    "path": "https://github.com/yamadashy/repomix",
    "remote": true
  }
]


@@ -0,0 +1,15 @@
# Repomix Skill Dependencies
# Python 3.10+ required
# No Python package dependencies - uses only standard library
# Testing dependencies (dev)
pytest>=8.0.0
pytest-cov>=4.1.0
pytest-mock>=3.12.0
# Note: This script requires the Repomix CLI tool
# Install Repomix globally:
# npm install -g repomix
# pnpm add -g repomix
# yarn global add repomix


@@ -0,0 +1,531 @@
"""
Tests for repomix_batch.py
Run with: pytest test_repomix_batch.py -v --cov=repomix_batch --cov-report=term-missing
"""
import os
import sys
import json
import subprocess
from pathlib import Path
from unittest.mock import Mock, patch, mock_open, MagicMock
import pytest
# Add parent directory to path to import the module
sys.path.insert(0, str(Path(__file__).parent.parent))
from repomix_batch import (
RepomixConfig,
EnvLoader,
RepomixBatchProcessor,
load_repositories_from_file,
main
)


class TestRepomixConfig:
    """Test RepomixConfig dataclass."""

    def test_default_values(self):
        """Test default configuration values."""
        config = RepomixConfig()
        assert config.style == "xml"
        assert config.output_dir == "repomix-output"
        assert config.remove_comments is False
        assert config.include_pattern is None
        assert config.ignore_pattern is None
        assert config.no_security_check is False
        assert config.verbose is False

    def test_custom_values(self):
        """Test custom configuration values."""
        config = RepomixConfig(
            style="markdown",
            output_dir="custom-output",
            remove_comments=True,
            include_pattern="src/**",
            ignore_pattern="tests/**",
            no_security_check=True,
            verbose=True
        )
        assert config.style == "markdown"
        assert config.output_dir == "custom-output"
        assert config.remove_comments is True
        assert config.include_pattern == "src/**"
        assert config.ignore_pattern == "tests/**"
        assert config.no_security_check is True
        assert config.verbose is True


class TestEnvLoader:
    """Test EnvLoader class."""

    def test_parse_env_file_basic(self, tmp_path):
        """Test parsing basic .env file."""
        env_file = tmp_path / ".env"
        env_file.write_text("KEY1=value1\nKEY2=value2\n")
        result = EnvLoader._parse_env_file(env_file)
        assert result == {"KEY1": "value1", "KEY2": "value2"}

    def test_parse_env_file_with_quotes(self, tmp_path):
        """Test parsing .env file with quoted values."""
        env_file = tmp_path / ".env"
        env_file.write_text('KEY1="value with spaces"\nKEY2=\'single quotes\'\n')
        result = EnvLoader._parse_env_file(env_file)
        assert result == {"KEY1": "value with spaces", "KEY2": "single quotes"}

    def test_parse_env_file_with_comments(self, tmp_path):
        """Test parsing .env file with comments."""
        env_file = tmp_path / ".env"
        env_file.write_text("# Comment\nKEY1=value1\n\n# Another comment\nKEY2=value2\n")
        result = EnvLoader._parse_env_file(env_file)
        assert result == {"KEY1": "value1", "KEY2": "value2"}

    def test_parse_env_file_with_empty_lines(self, tmp_path):
        """Test parsing .env file with empty lines."""
        env_file = tmp_path / ".env"
        env_file.write_text("KEY1=value1\n\n\nKEY2=value2\n")
        result = EnvLoader._parse_env_file(env_file)
        assert result == {"KEY1": "value1", "KEY2": "value2"}

    def test_parse_env_file_with_equals_in_value(self, tmp_path):
        """Test parsing .env file with equals sign in value."""
        env_file = tmp_path / ".env"
        env_file.write_text("KEY1=value=with=equals\n")
        result = EnvLoader._parse_env_file(env_file)
        assert result == {"KEY1": "value=with=equals"}

    def test_parse_env_file_invalid(self, tmp_path):
        """Test parsing invalid .env file."""
        env_file = tmp_path / ".env"
        env_file.write_text("INVALID LINE WITHOUT EQUALS\n")
        result = EnvLoader._parse_env_file(env_file)
        assert result == {}

    def test_parse_env_file_not_found(self, tmp_path):
        """Test parsing non-existent .env file."""
        env_file = tmp_path / "nonexistent.env"
        result = EnvLoader._parse_env_file(env_file)
        assert result == {}

    @patch.dict(os.environ, {"PROCESS_VAR": "from_process"}, clear=True)
    def test_load_env_files_process_env_priority(self):
        """Test that process environment has highest priority."""
        with patch.object(Path, 'exists', return_value=False):
            env_vars = EnvLoader.load_env_files()
        assert env_vars.get("PROCESS_VAR") == "from_process"


class TestRepomixBatchProcessor:
    """Test RepomixBatchProcessor class."""

    def test_init(self):
        """Test processor initialization."""
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        assert processor.config == config
        assert isinstance(processor.env_vars, dict)

    @patch("subprocess.run")
    def test_check_repomix_installed_success(self, mock_run):
        """Test checking if repomix is installed (success)."""
        mock_run.return_value = Mock(returncode=0)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        assert processor.check_repomix_installed() is True
        mock_run.assert_called_once()
        args = mock_run.call_args
        assert args[0][0] == ["repomix", "--version"]

    @patch("subprocess.run")
    def test_check_repomix_installed_failure(self, mock_run):
        """Test checking if repomix is installed (failure)."""
        mock_run.return_value = Mock(returncode=1)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        assert processor.check_repomix_installed() is False

    @patch("subprocess.run")
    def test_check_repomix_installed_not_found(self, mock_run):
        """Test checking if repomix is not found."""
        mock_run.side_effect = FileNotFoundError()
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        assert processor.check_repomix_installed() is False

    def test_get_extension(self):
        """Test getting file extension for style."""
        assert RepomixBatchProcessor._get_extension("xml") == "xml"
        assert RepomixBatchProcessor._get_extension("markdown") == "md"
        assert RepomixBatchProcessor._get_extension("json") == "json"
        assert RepomixBatchProcessor._get_extension("plain") == "txt"
        assert RepomixBatchProcessor._get_extension("unknown") == "xml"

    def test_build_command_local(self):
        """Test building command for local repository."""
        config = RepomixConfig(style="markdown", remove_comments=True)
        processor = RepomixBatchProcessor(config)
        output_file = Path("output.md")
        cmd = processor._build_command("/path/to/repo", output_file, is_remote=False)
        assert cmd[0] == "repomix"
        assert "/path/to/repo" in cmd
        assert "--style" in cmd
        assert "markdown" in cmd
        assert "--remove-comments" in cmd
        assert "-o" in cmd

    def test_build_command_remote(self):
        """Test building command for remote repository."""
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        output_file = Path("output.xml")
        cmd = processor._build_command("owner/repo", output_file, is_remote=True)
        assert cmd[0] == "npx"
        assert "repomix" in cmd
        assert "--remote" in cmd
        assert "owner/repo" in cmd

    def test_build_command_with_patterns(self):
        """Test building command with include/ignore patterns."""
        config = RepomixConfig(
            include_pattern="src/**/*.ts",
            ignore_pattern="tests/**"
        )
        processor = RepomixBatchProcessor(config)
        output_file = Path("output.xml")
        cmd = processor._build_command("/path/to/repo", output_file, is_remote=False)
        assert "--include" in cmd
        assert "src/**/*.ts" in cmd
        assert "-i" in cmd
        assert "tests/**" in cmd

    def test_build_command_verbose(self):
        """Test building command with verbose flag."""
        config = RepomixConfig(verbose=True)
        processor = RepomixBatchProcessor(config)
        output_file = Path("output.xml")
        cmd = processor._build_command("/path/to/repo", output_file, is_remote=False)
        assert "--verbose" in cmd

    def test_build_command_no_security_check(self):
        """Test building command with security check disabled."""
        config = RepomixConfig(no_security_check=True)
        processor = RepomixBatchProcessor(config)
        output_file = Path("output.xml")
        cmd = processor._build_command("/path/to/repo", output_file, is_remote=False)
        assert "--no-security-check" in cmd

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_success(self, mock_mkdir, mock_run):
        """Test processing repository successfully."""
        mock_run.return_value = Mock(returncode=0)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository("/path/to/repo")
        assert success is True
        assert "Successfully processed" in message
        mock_mkdir.assert_called_once()
        mock_run.assert_called_once()

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_failure(self, mock_mkdir, mock_run):
        """Test processing repository with failure."""
        mock_run.return_value = Mock(
            returncode=1,
            stderr="Error message",
            stdout=""
        )
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository("/path/to/repo")
        assert success is False
        assert "Failed to process" in message
        assert "Error message" in message

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_timeout(self, mock_mkdir, mock_run):
        """Test processing repository with timeout."""
        mock_run.side_effect = subprocess.TimeoutExpired(cmd=[], timeout=300)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository("/path/to/repo")
        assert success is False
        assert "Timeout" in message

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_exception(self, mock_mkdir, mock_run):
        """Test processing repository with exception."""
        mock_run.side_effect = Exception("Unexpected error")
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository("/path/to/repo")
        assert success is False
        assert "Error processing" in message
        assert "Unexpected error" in message

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_with_custom_output(self, mock_mkdir, mock_run):
        """Test processing repository with custom output name."""
        mock_run.return_value = Mock(returncode=0)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository(
            "/path/to/repo",
            output_name="custom-output.xml"
        )
        assert success is True
        assert "custom-output.xml" in message

    @patch("subprocess.run")
    @patch("pathlib.Path.mkdir")
    def test_process_repository_remote(self, mock_mkdir, mock_run):
        """Test processing remote repository."""
        mock_run.return_value = Mock(returncode=0)
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        success, message = processor.process_repository(
            "owner/repo",
            is_remote=True
        )
        assert success is True
        cmd = mock_run.call_args[0][0]
        assert "npx" in cmd
        assert "--remote" in cmd

    @patch.object(RepomixBatchProcessor, "process_repository")
    def test_process_batch_success(self, mock_process):
        """Test processing batch of repositories."""
        mock_process.return_value = (True, "Success")
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        repositories = [
            {"path": "/repo1"},
            {"path": "/repo2", "output": "custom.xml"},
            {"path": "owner/repo", "remote": True}
        ]
        results = processor.process_batch(repositories)
        assert len(results["success"]) == 3
        assert len(results["failed"]) == 0
        assert mock_process.call_count == 3

    @patch.object(RepomixBatchProcessor, "process_repository")
    def test_process_batch_with_failures(self, mock_process):
        """Test processing batch with some failures."""
        mock_process.side_effect = [
            (True, "Success 1"),
            (False, "Failed"),
            (True, "Success 2")
        ]
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        repositories = [
            {"path": "/repo1"},
            {"path": "/repo2"},
            {"path": "/repo3"}
        ]
        results = processor.process_batch(repositories)
        assert len(results["success"]) == 2
        assert len(results["failed"]) == 1

    def test_process_batch_missing_path(self):
        """Test processing batch with missing path."""
        config = RepomixConfig()
        processor = RepomixBatchProcessor(config)
        repositories = [
            {"output": "custom.xml"}  # Missing 'path'
        ]
        results = processor.process_batch(repositories)
        assert len(results["success"]) == 0
        assert len(results["failed"]) == 1
        assert "Missing 'path'" in results["failed"][0]


class TestLoadRepositoriesFromFile:
    """Test load_repositories_from_file function."""

    def test_load_valid_json(self, tmp_path):
        """Test loading valid JSON file."""
        json_file = tmp_path / "repos.json"
        repos = [
            {"path": "/repo1"},
            {"path": "owner/repo", "remote": True}
        ]
        json_file.write_text(json.dumps(repos))
        result = load_repositories_from_file(str(json_file))
        assert result == repos

    def test_load_invalid_json(self, tmp_path):
        """Test loading invalid JSON file."""
        json_file = tmp_path / "invalid.json"
        json_file.write_text("not valid json {")
        result = load_repositories_from_file(str(json_file))
        assert result == []

    def test_load_non_array_json(self, tmp_path):
        """Test loading JSON file with non-array content."""
        json_file = tmp_path / "object.json"
        json_file.write_text('{"path": "/repo"}')
        result = load_repositories_from_file(str(json_file))
        assert result == []

    def test_load_nonexistent_file(self):
        """Test loading non-existent file."""
        result = load_repositories_from_file("/nonexistent/file.json")
        assert result == []


class TestMain:
    """Test main function."""

    @patch("sys.argv", ["repomix_batch.py", "/repo1", "/repo2"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    @patch.object(RepomixBatchProcessor, "process_batch")
    def test_main_with_repos(self, mock_process_batch, mock_check):
        """Test main function with repository arguments."""
        mock_process_batch.return_value = {"success": ["msg1", "msg2"], "failed": []}
        result = main()
        assert result == 0
        mock_check.assert_called_once()
        mock_process_batch.assert_called_once()
        # Verify repositories passed
        call_args = mock_process_batch.call_args[0][0]
        assert len(call_args) == 2
        assert call_args[0]["path"] == "/repo1"
        assert call_args[1]["path"] == "/repo2"

    @patch("sys.argv", ["repomix_batch.py", "-f", "repos.json"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    @patch.object(RepomixBatchProcessor, "process_batch")
    @patch("repomix_batch.load_repositories_from_file")
    def test_main_with_file(self, mock_load, mock_process_batch, mock_check):
        """Test main function with file argument."""
        mock_load.return_value = [{"path": "/repo1"}]
        mock_process_batch.return_value = {"success": ["msg1"], "failed": []}
        result = main()
        assert result == 0
        mock_load.assert_called_once_with("repos.json")
        mock_process_batch.assert_called_once()

    @patch("sys.argv", ["repomix_batch.py"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    def test_main_no_repos(self, mock_check):
        """Test main function with no repositories."""
        result = main()
        assert result == 1

    @patch("sys.argv", ["repomix_batch.py", "/repo1"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=False)
    def test_main_repomix_not_installed(self, mock_check):
        """Test main function when repomix is not installed."""
        result = main()
        assert result == 1

    @patch("sys.argv", ["repomix_batch.py", "/repo1"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    @patch.object(RepomixBatchProcessor, "process_batch")
    def test_main_with_failures(self, mock_process_batch, mock_check):
        """Test main function with processing failures."""
        mock_process_batch.return_value = {
            "success": ["msg1"],
            "failed": ["error1"]
        }
        result = main()
        assert result == 1

    @patch("sys.argv", [
        "repomix_batch.py",
        "/repo1",
        "--style", "markdown",
        "--remove-comments",
        "--verbose"
    ])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    @patch.object(RepomixBatchProcessor, "process_batch")
    def test_main_with_options(self, mock_process_batch, mock_check):
        """Test main function with various options."""
        mock_process_batch.return_value = {"success": ["msg1"], "failed": []}
        result = main()
        assert result == 0
        # Verify config passed to processor
        # The processor is created inside main, so we check it was called
        mock_process_batch.assert_called_once()

    @patch("sys.argv", ["repomix_batch.py", "/repo1", "--remote"])
    @patch.object(RepomixBatchProcessor, "check_repomix_installed", return_value=True)
    @patch.object(RepomixBatchProcessor, "process_batch")
    def test_main_with_remote_flag(self, mock_process_batch, mock_check):
        """Test main function with --remote flag."""
        mock_process_batch.return_value = {"success": ["msg1"], "failed": []}
        result = main()
        assert result == 0
        # Verify remote flag is set
        call_args = mock_process_batch.call_args[0][0]
        assert call_args[0]["remote"] is True