Initial commit

Zhongwei Li
2025-11-29 18:26:59 +08:00
commit d61dbe6a6c
39 changed files with 3981 additions and 0 deletions

skills/codex-skill/SKILL.md Normal file

@@ -0,0 +1,429 @@
---
name: codex-skill
description: Use when user asks to leverage codex, gpt-5, or gpt-5.1 to implement something (usually implement a plan or feature designed by Claude). Provides non-interactive automation mode for hands-off task execution without approval prompts.
---
# Codex
You are operating in **codex exec** - a non-interactive automation mode for hands-off task execution.
## Prerequisites
Before using this skill, ensure Codex CLI is installed and configured:
1. **Installation verification**:
```bash
codex --version
```
2. **First-time setup**: If Codex CLI is not installed, guide the user to install it with `npm i -g @openai/codex` or `brew install codex`.
## Core Principles
### Autonomous Execution
- Execute tasks from start to finish without seeking approval for each action
- Make confident decisions based on best practices and task requirements
- Only ask questions if critical information is genuinely missing
- Prioritize completing the workflow over explaining every step
### Output Behavior
- Stream progress updates as you work
- Provide a clear, structured final summary upon completion
- Focus on actionable results and metrics over lengthy explanations
- Report what was done, not what could have been done
### Operating Modes
Codex uses sandbox policies to control what operations are permitted:
**Read-Only Mode (Default)**
- Analyze code, search files, read documentation
- Provide insights, recommendations, and execution plans
- No modifications to the codebase
- Safe for exploration and analysis tasks
- **This is the default mode when running `codex exec`**
**Workspace-Write Mode (Recommended for Programming)**
- Read and write files within the workspace
- Implement features, fix bugs, refactor code
- Create, modify, and delete files in the workspace
- Execute build commands and tests
- **Use `--full-auto` or `-s workspace-write` to enable file editing**
- **This is the recommended mode for most programming tasks**
**Danger-Full-Access Mode**
- All workspace-write capabilities
- Network access for fetching dependencies
- System-level operations outside workspace
- Access to all files on the system
- **Use only when explicitly requested and necessary**
- Use flag: `-s danger-full-access` or `--sandbox danger-full-access`
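The choice between the three modes can be reduced to a small lookup. The following is a minimal sketch, not part of Codex: the helper name `build_codex_command` and the task categories (`analyze`, `edit`, `system`) are hypothetical, and only the `-s` values come from the list above.

```python
# Hypothetical mapping from a coarse task category to a sandbox policy.
SANDBOX_BY_TASK = {
    "analyze": "read-only",
    "edit": "workspace-write",
    "system": "danger-full-access",
}


def build_codex_command(task_kind: str, prompt: str) -> list[str]:
    """Return an argv list for `codex exec` with an explicit sandbox flag."""
    try:
        sandbox = SANDBOX_BY_TASK[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None
    return ["codex", "exec", "-s", sandbox, prompt]


print(build_codex_command("edit", "fix the bug in login flow"))
```

Passing the sandbox explicitly, rather than relying on the default, keeps the permitted operations visible in the command itself.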
## Codex CLI Commands
**Note**: The following commands include both documented features from the Codex exec documentation and additional flags available in the CLI (verified via `codex exec --help`).
### Model Selection
Specify which model to use with `-m` or `--model` (possible values include gpt-5, gpt-5.1, gpt-5.1-codex, and gpt-5.1-codex-max):
```bash
codex exec -m gpt-5.1 "refactor the payment processing module"
codex exec -m gpt-5.1-codex "implement the user authentication feature"
codex exec -m gpt-5.1-codex-max "analyze the codebase architecture"
```
### Sandbox Modes
Control execution permissions with `-s` or `--sandbox` (possible values: read-only, workspace-write, danger-full-access):
#### Read-Only Mode
```bash
codex exec -s read-only "analyze the codebase structure and count lines of code"
codex exec --sandbox read-only "review code quality and suggest improvements"
```
Analyze code without making any modifications.
#### Workspace-Write Mode (Recommended for Programming)
```bash
codex exec -s workspace-write "implement the user authentication feature"
codex exec --sandbox workspace-write "fix the bug in login flow"
```
Read and write files within the workspace. **Must be explicitly enabled (not the default). Use this for most programming tasks.**
#### Danger-Full-Access Mode
```bash
codex exec -s danger-full-access "install dependencies and update the API integration"
codex exec --sandbox danger-full-access "setup development environment with npm packages"
```
Enables network access and system-level operations outside the workspace. Use only when necessary.
### Full-Auto Mode (Convenience Alias)
```bash
codex exec --full-auto "implement the user authentication feature"
```
**Convenience alias for**: `-s workspace-write` (enables file editing).
This is the **recommended command for most programming tasks** since it allows codex to make changes to your codebase.
### Configuration Profiles
Use saved profiles from `~/.codex/config.toml` with `-p` or `--profile` (if supported in your version):
```bash
codex exec -p production "deploy the latest changes"
codex exec --profile development "run integration tests"
```
Profiles can specify default model, sandbox mode, and other options.
*Verify availability with `codex exec --help`*
### Working Directory
Specify a different working directory with `-C` or `--cd` (if supported in your version):
```bash
codex exec -C /path/to/project "implement the feature"
codex exec --cd ~/projects/myapp "run tests and fix failures"
```
*Verify availability with `codex exec --help`*
### Additional Writable Directories
Allow writing to additional directories outside the main workspace with `--add-dir` (if supported in your version):
```bash
codex exec --add-dir /tmp/output --add-dir ~/shared "generate reports in multiple locations"
```
Useful when the task needs to write to specific external directories.
*Verify availability with `codex exec --help`*
### JSON Output
```bash
codex exec --json "run tests and report results"
codex exec --json -s read-only "analyze security vulnerabilities"
```
Outputs structured JSON Lines format with reasoning, commands, file changes, and metrics.
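A consumer of this output might read it one line at a time. The sketch below assumes only that each line is a standalone JSON object; the `type` and `n` fields in the sample are made up for illustration and are not the real Codex event schema.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON Lines record, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if isinstance(event, dict):
            yield event


# Made-up sample records; real Codex events may use different fields.
sample = ['{"type": "example", "n": 1}', "", "not json", '{"type": "example", "n": 2}']
events = list(iter_events(sample))
print(len(events))
```

In practice the same function could be fed `sys.stdin` when the output of `codex exec --json` is piped into the script.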
### Save Output to File
```bash
codex exec -o report.txt "generate a security audit report"
codex exec -o results.json --json "run performance benchmarks"
```
Writes the agent's final message to the given file (`-o` is short for `--output-last-message`; confirm with `codex exec --help`).
### Skip Git Repository Check
```bash
codex exec --skip-git-repo-check "analyze this non-git directory"
```
Bypasses the requirement for the directory to be a git repository.
### Resume Previous Session
```bash
codex exec resume --last "now implement the next feature"
```
Resumes the last session and continues with a new task.
### Bypass Approvals and Sandbox (If Available)
**⚠️ WARNING: Verify this flag exists before using ⚠️**
Some versions of Codex may support `--dangerously-bypass-approvals-and-sandbox`:
```bash
codex exec --dangerously-bypass-approvals-and-sandbox "perform the task"
```
**If this flag is available**:
- Skips ALL confirmation prompts
- Executes commands WITHOUT sandboxing
- Should ONLY be used in externally sandboxed environments (containers, VMs)
- **EXTREMELY DANGEROUS - NEVER use on your development machine**
**Verify availability first**: Run `codex exec --help` to check if this flag is supported in your version.
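The availability check can be automated for this flag and for the other "if supported" flags in this document. This is a rough sketch under the assumption that `codex exec --help` lists every supported option by its literal name; both helper names are hypothetical.

```python
import re
import subprocess


def flag_in_help(help_text: str, flag: str) -> bool:
    """True if `flag` appears in the help text as a whole option name."""
    # Reject matches that are only part of a longer option name.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def codex_exec_supports(flag: str) -> bool:
    """Run `codex exec --help` and look for `flag`; False if codex is missing."""
    try:
        result = subprocess.run(
            ["codex", "exec", "--help"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False
    return flag_in_help(result.stdout + result.stderr, flag)
```

For example, `codex_exec_supports("--add-dir")` could gate any command that relies on `--add-dir`.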
### Combined Examples
Combine multiple flags for complex scenarios:
```bash
# Use specific model with workspace write and JSON output
codex exec -m gpt-5.1-codex -s workspace-write --json "implement authentication and output results"
# Use profile with custom working directory
codex exec -p production -C /var/www/app "deploy updates"
# Full-auto with additional directories and output file
codex exec --full-auto --add-dir /tmp/logs -o summary.txt "refactor and log changes"
# Skip git check with specific model in different directory
codex exec -m gpt-5.1-codex -C ~/non-git-project --skip-git-repo-check "analyze and improve code"
```
## Execution Workflow
1. **Parse the Request**: Understand the complete objective and scope
2. **Plan Efficiently**: Create a minimal, focused execution plan
3. **Execute Autonomously**: Implement the solution with confidence
4. **Verify Results**: Run tests, checks, or validations as appropriate
5. **Report Clearly**: Provide a structured summary of accomplishments
## Best Practices
### Speed and Efficiency
- Make reasonable assumptions when minor details are ambiguous
- Use parallel operations whenever possible (read multiple files, run multiple commands)
- Avoid verbose explanations during execution - focus on doing
- Don't seek confirmation for standard operations
### Scope Management
- Focus strictly on the requested task
- Don't add unrequested features or improvements
- Avoid refactoring code that isn't part of the task
- Keep solutions minimal and direct
### Quality Standards
- Follow existing code patterns and conventions
- Run relevant tests after making changes
- Verify the solution actually works
- Report any errors or limitations encountered
## When to Interrupt Execution
Only pause for user input when encountering:
- **Destructive operations**: Deleting databases, force pushing to main, dropping tables
- **Security decisions**: Exposing credentials, changing authentication, opening ports
- **Ambiguous requirements**: Multiple valid approaches with significant trade-offs
- **Missing critical information**: Cannot proceed without user-specific data
For all other decisions, proceed autonomously using best judgment.
## Final Output Format
Always conclude with a structured summary:
```
✓ Task completed successfully
Changes made:
- [List of files modified/created]
- [Key code changes]
Results:
- [Metrics: lines changed, files affected, tests run]
- [What now works that didn't before]
Verification:
- [Tests run, checks performed]
Next steps (if applicable):
- [Suggestions for follow-up tasks]
```
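The template above can be produced mechanically. The following sketch uses a hypothetical `format_summary` helper that renders the same section titles and omits any section with no entries.

```python
def format_summary(changes, results, verification, next_steps=()):
    """Render the final summary in the section layout shown above."""
    lines = ["✓ Task completed successfully"]
    sections = [
        ("Changes made:", changes),
        ("Results:", results),
        ("Verification:", verification),
        ("Next steps (if applicable):", next_steps),
    ]
    for title, items in sections:
        if not items:
            continue  # omit empty sections such as "Next steps"
        lines.append(title)
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(format_summary(["src/login.py modified"], ["1 file changed"], ["unit tests run"]))
```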
## Example Usage Scenarios
### Code Analysis (Read-Only)
**User**: "Count the lines of code in this project by language"
**Mode**: Read-only
**Command**:
```bash
codex exec -s read-only "count the total number of lines of code in this project, broken down by language"
```
**Action**: Search all files, categorize by extension, count lines, report totals
### Bug Fixing (Workspace-Write)
**User**: "Use gpt-5 to fix the authentication bug in the login flow"
**Mode**: Workspace-write
**Command**:
```bash
codex exec -m gpt-5 --full-auto "fix the authentication bug in the login flow"
```
**Action**: Find the bug, implement fix, run tests, commit changes
### Feature Implementation (Workspace-Write)
**User**: "Let codex implement dark mode support for the UI"
**Mode**: Workspace-write
**Command**:
```bash
codex exec --full-auto "add dark mode support to the UI with theme context and style updates"
```
**Action**: Identify components, add theme context, update styles, test in both modes
### Batch Operations (Workspace-Write)
**User**: "Have gpt-5.1 update all imports from old-lib to new-lib"
**Mode**: Workspace-write
**Command**:
```bash
codex exec -m gpt-5.1 -s workspace-write "update all imports from old-lib to new-lib across the entire codebase"
```
**Action**: Find all imports, perform replacements, verify syntax, run tests
### Generate Report with JSON Output (Read-Only)
**User**: "Analyze security vulnerabilities and output as JSON"
**Mode**: Read-only
**Command**:
```bash
codex exec -s read-only --json "analyze the codebase for security vulnerabilities and provide a detailed report"
```
**Action**: Scan code, identify issues, output structured JSON with findings
### Install Dependencies and Integrate API (Danger-Full-Access)
**User**: "Install the new payment SDK and integrate it"
**Mode**: Danger-Full-Access
**Command**:
```bash
codex exec -s danger-full-access "install the payment SDK dependencies and integrate the API"
```
**Action**: Install packages, update code, add integration points, test functionality
### Multi-Project Work (Custom Directory)
**User**: "Use codex to implement the API in the backend project"
**Mode**: Workspace-write
**Command**:
```bash
codex exec -C ~/projects/backend --full-auto "implement the REST API endpoints for user management"
```
**Action**: Switch to backend directory, implement API endpoints, write tests
### Refactoring with Logging (Additional Directories)
**User**: "Refactor the database layer and log changes"
**Mode**: Workspace-write
**Command**:
```bash
codex exec --full-auto --add-dir /tmp/refactor-logs "refactor the database layer for better performance and log all changes"
```
**Action**: Refactor code, write logs to external directory, run tests
### Production Deployment (Using Profile)
**User**: "Deploy using the production profile"
**Mode**: Profile-based
**Command**:
```bash
codex exec -p production "deploy the latest changes to production environment"
```
**Action**: Use production config, deploy code, verify deployment
### Non-Git Project Analysis
**User**: "Analyze this legacy codebase that's not in git"
**Mode**: Read-only
**Command**:
```bash
codex exec -s read-only --skip-git-repo-check "analyze the architecture and suggest modernization approach"
```
**Action**: Analyze code structure, provide modernization recommendations
## Error Handling
When errors occur:
1. Attempt automatic recovery if possible
2. Log the error clearly in the output
3. Continue with remaining tasks if error is non-blocking
4. Report all errors in the final summary
5. Only stop if the error makes continuation impossible
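Steps 2 to 5 of this policy can be sketched as a small runner. The `BlockingError` class and `run_steps` function are hypothetical names introduced for illustration, and automatic recovery (step 1) is deliberately left out.

```python
from typing import Callable


class BlockingError(Exception):
    """Hypothetical marker for errors that make continuation impossible."""


def run_steps(steps: list[tuple[str, Callable[[], None]]]) -> tuple[list[str], list[str]]:
    """Run steps in order; return (completed step names, error messages)."""
    done, errors = [], []
    for name, step in steps:
        try:
            step()
        except BlockingError as exc:
            errors.append(f"{name}: {exc} (blocking, stopped)")
            break
        except Exception as exc:
            errors.append(f"{name}: {exc}")  # non-blocking: record and continue
            continue
        done.append(name)
    return done, errors
```

The returned error list is what would feed the error section of the final summary.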
## Resumable Execution
If execution is interrupted:
- Clearly state what was completed
- Provide exact commands/steps to resume
- List any state that needs to be preserved
- Explain what remains to be done


@@ -0,0 +1,136 @@
---
name: nanobanana-skill
description: Generate or edit images using Google Gemini API via nanobanana. Use when the user asks to create, generate, edit images with nanobanana, or mentions image generation/editing tasks.
allowed-tools: Bash
---
# Nanobanana Image Generation Skill
Generate or edit images using Google Gemini API through the nanobanana tool.
## Requirements
1. **GEMINI_API_KEY**: Set it in `~/.nanobanana.env` or export it with `export GEMINI_API_KEY=<your-api-key>`
2. **Python 3 with the dependent packages installed**: google-genai, Pillow, python-dotenv. If they are missing, install them with `python3 -m pip install -r ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/requirements.txt`.
3. **Executable**: `${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py`
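These requirements can be checked before the script is run. The sketch below is one possible preflight check, not part of the skill: the function names are hypothetical, and the import names `google.genai`, `PIL`, and `dotenv` are the modules that the three listed packages provide.

```python
import importlib.util
import os
from pathlib import Path

# Import names for the packages listed above (google-genai, Pillow, python-dotenv).
REQUIRED_MODULES = ("google.genai", "PIL", "dotenv")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located on this interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # parent package (e.g. `google`) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


def api_key_configured(env=os.environ, env_file=Path.home() / ".nanobanana.env"):
    """True if GEMINI_API_KEY is exported or the env file mentions it."""
    if env.get("GEMINI_API_KEY"):
        return True
    return env_file.is_file() and "GEMINI_API_KEY" in env_file.read_text()
```

The env-file test is only a substring check; it does not validate the key itself.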
## Instructions
### For image generation
1. Ask the user for:
   - What they want to create (the prompt)
   - Desired aspect ratio/size (optional, defaults to 9:16 portrait)
   - Output filename (optional, defaults to `nanobanana-<UUID>.png`)
   - Model preference (optional, defaults to gemini-3-pro-image-preview)
   - Resolution (optional, defaults to 1K)
2. Run the nanobanana script with appropriate parameters:
   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "description of image" --output "filename.png"
   ```
3. Show the user the saved image path when complete
### For image editing
1. Ask the user for:
   - Input image file(s) to edit
   - What changes they want (the prompt)
   - Output filename (optional)
2. Run with input images:
   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "editing instructions" --input image1.png image2.png --output "edited.png"
   ```
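When the command is assembled from the user's answers, building an argument list avoids shell-quoting problems with free-form prompts. This is a minimal sketch with a hypothetical `nanobanana_argv` helper; it uses only the options documented in this skill and falls back to `.` when `CLAUDE_PLUGIN_ROOT` is unset, which is an assumption made for the example.

```python
import os


def nanobanana_argv(prompt, output=None, inputs=(), size=None, model=None, resolution=None):
    """Build the argv list for nanobanana.py, adding only the options that were given."""
    # CLAUDE_PLUGIN_ROOT is read from the environment; "." is a fallback for this sketch.
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "nanobanana-skill", "nanobanana.py")
    argv = ["python3", script, "--prompt", prompt]
    if inputs:
        argv += ["--input", *inputs]
    for flag, value in (("--size", size), ("--model", model),
                        ("--resolution", resolution), ("--output", output)):
        if value is not None:
            argv += [flag, value]
    return argv


print(nanobanana_argv("Add a rainbow in the sky", output="edited.png", inputs=["photo.png"]))
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`.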
## Available Options
### Aspect Ratios (--size)
- `1024x1024` (1:1) - Square
- `832x1248` (2:3) - Portrait
- `1248x832` (3:2) - Landscape
- `864x1184` (3:4) - Portrait
- `1184x864` (4:3) - Landscape
- `896x1152` (4:5) - Portrait
- `1152x896` (5:4) - Landscape
- `768x1344` (9:16) - Portrait (default)
- `1344x768` (16:9) - Landscape
- `1536x672` (21:9) - Ultra-wide
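Several of these pixel sizes only approximate their labelled ratio (for example, `864x1184` reduces to 27:37, not exactly 3:4). The sketch below, with a hypothetical `nearest_ratio` helper, shows that each size is still closest to its labelled ratio among the ten listed.

```python
ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def nearest_ratio(size: str) -> str:
    """Return the listed ratio whose value is closest to width/height of `size`."""
    width, height = (int(part) for part in size.split("x"))
    actual = width / height

    def distance(ratio: str) -> float:
        w, h = (int(part) for part in ratio.split(":"))
        return abs(actual - w / h)

    return min(ASPECT_RATIOS, key=distance)


print(nearest_ratio("864x1184"))  # prints 3:4
```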
### Models (--model)
- `gemini-3-pro-image-preview` (default) - Higher quality
- `gemini-2.5-flash-image` - Faster generation
### Resolution (--resolution)
- `1K` (default)
- `2K`
- `4K`
## Examples
### Generate a simple image
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "A serene mountain landscape at sunset with a lake"
```
### Generate with specific size and output
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
--prompt "Modern minimalist logo for a tech startup" \
--size 1024x1024 \
--output "logo.png"
```
### Generate landscape image with high resolution
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
--prompt "Futuristic cityscape with flying cars" \
--size 1344x768 \
--resolution 2K \
--output "cityscape.png"
```
### Edit existing images
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
--prompt "Add a rainbow in the sky" \
--input photo.png \
--output "photo-with-rainbow.png"
```
### Use faster model
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
--prompt "Quick sketch of a cat" \
--model gemini-2.5-flash-image \
--output "cat-sketch.png"
```
## Error Handling
If the script fails:
- Check that `GEMINI_API_KEY` is exported or set in `~/.nanobanana.env`
- Verify input image files exist and are readable
- Ensure the output directory is writable
- If no image is generated, try making the prompt more specific about wanting an image
## Best Practices
1. Be descriptive in prompts - include style, mood, colors, composition
2. For logos/graphics, use square aspect ratio (1024x1024)
3. For social media posts, use 9:16 for stories or 1:1 for posts
4. For wallpapers, use 16:9 or 21:9
5. Start with 1K resolution for testing, upgrade to 2K/4K for final output
6. Use gemini-3-pro-image-preview for best quality, gemini-2.5-flash-image for speed


@@ -0,0 +1,147 @@
#!/usr/bin/env python3
# Generate or edit images using Google Gemini API
import os
import argparse
import uuid
from pathlib import Path
from dotenv import load_dotenv
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO
# Load environment variables from ~/.nanobanana.env (if the file exists)
load_dotenv(Path.home() / ".nanobanana.env")
# Google API configuration from environment variables
api_key = os.getenv("GEMINI_API_KEY") or ""
if not api_key:
    raise ValueError(
        "Missing GEMINI_API_KEY environment variable. "
        "Export it or set it in ~/.nanobanana.env."
    )
# Initialize Gemini client
client = genai.Client(api_key=api_key)
# Map each supported size (WIDTHxHEIGHT) to the aspect ratio sent to the API
ASPECT_RATIO_MAP = {
    "1024x1024": "1:1",
    "832x1248": "2:3",
    "1248x832": "3:2",
    "864x1184": "3:4",
    "1184x864": "4:3",
    "896x1152": "4:5",
    "1152x896": "5:4",
    "768x1344": "9:16",
    "1344x768": "16:9",
    "1536x672": "21:9",
}
def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(
        description="Generate or edit images using Google Gemini API"
    )
    parser.add_argument(
        "--prompt",
        type=str,
        required=True,
        help="Prompt for image generation or editing",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=f"nanobanana-{uuid.uuid4()}.png",
        help="Output image filename (default: nanobanana-<UUID>.png)",
    )
    parser.add_argument(
        "--input", type=str, nargs="*", help="Input image files for editing (optional)"
    )
    parser.add_argument(
        "--size",
        type=str,
        default="768x1344",
        choices=list(ASPECT_RATIO_MAP.keys()),
        help="Size/aspect ratio of the generated image (default: 768x1344 / 9:16)",
    )
    parser.add_argument(
        "--model",
        type=str,
        default="gemini-3-pro-image-preview",
        choices=["gemini-3-pro-image-preview", "gemini-2.5-flash-image"],
        help="Model to use for image generation (default: gemini-3-pro-image-preview)",
    )
    parser.add_argument(
        "--resolution",
        type=str,
        default="1K",
        choices=["1K", "2K", "4K"],
        help="Resolution of the generated image (default: 1K)",
    )
    args = parser.parse_args()
    # Get aspect ratio from size; argparse `choices` guarantees the key exists
    aspect_ratio = ASPECT_RATIO_MAP[args.size]
    # Build contents list for the API call
    contents = []
    # Check if input images are provided
    if args.input:
        # Editing: send the prompt followed by the input images
        print(f"Editing images with prompt: {args.prompt}")
        print(f"Input images: {args.input}")
        print(f"Aspect ratio: {aspect_ratio} ({args.size})")
        # Add prompt first
        contents.append(args.prompt)
        # Add all input images
        for img_path in args.input:
            image = Image.open(img_path)
            contents.append(image)
    else:
        print(f"Generating image (size: {args.size}) with prompt: {args.prompt}")
        contents.append(args.prompt)
    # Generate or edit image with config
    response = client.models.generate_content(
        model=args.model,
        contents=contents,
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE'],
            tools=[types.Tool(google_search=types.GoogleSearch())],
            image_config=types.ImageConfig(
                aspect_ratio=aspect_ratio,
                image_size=args.resolution,
            ),
        ),
    )
    if (response.candidates is None
            or len(response.candidates) == 0
            or response.candidates[0].content is None
            or response.candidates[0].content.parts is None):
        raise ValueError("No data received from the API.")
    # Print any text parts and save any image part to the output file
    image_saved = False
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text, end="")
        elif part.inline_data is not None and part.inline_data.data is not None:
            image = Image.open(BytesIO(part.inline_data.data))
            image.save(args.output)
            image_saved = True
            print(f"\n\nImage saved to: {args.output}")
    if not image_saved:
        print(
            "\n\nWarning: No image data found in the API response. "
            "This usually means the model returned only text. "
            "Try again with a prompt that asks for an image more explicitly."
        )
if __name__ == "__main__":
    main()


@@ -0,0 +1,4 @@
python-dotenv
httpx[socks]
google-genai
Pillow