Initial commit
429
skills/codex-skill/SKILL.md
Normal file
@@ -0,0 +1,429 @@
---
name: codex-skill
description: Use when user asks to leverage codex, gpt-5, or gpt-5.1 to implement something (usually implement a plan or feature designed by Claude). Provides non-interactive automation mode for hands-off task execution without approval prompts.
---

# Codex

You are operating in **codex exec** - a non-interactive automation mode for hands-off task execution.

## Prerequisites

Before using this skill, ensure Codex CLI is installed and configured:

1. **Installation verification**:

   ```bash
   codex --version
   ```

2. **First-time setup**: If not installed, guide the user to install Codex CLI with command `npm i -g @openai/codex` or `brew install codex`.
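
The installation check can also be scripted. The following is a minimal sketch, assuming only that the executable is named `codex` and accepts `--version` as shown above; the helper names `codex_available` and `codex_version` are illustrative and not part of the Codex CLI.

```python
import shutil
import subprocess
from typing import Optional


def codex_available(binary: str = "codex") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


def codex_version(binary: str = "codex") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if it is not installed."""
    if not codex_available(binary):
        return None
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None
```

If `codex_version()` returns `None`, fall back to the first-time setup step and suggest one of the install commands.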

## Core Principles

### Autonomous Execution

- Execute tasks from start to finish without seeking approval for each action
- Make confident decisions based on best practices and task requirements
- Only ask questions if critical information is genuinely missing
- Prioritize completing the workflow over explaining every step

### Output Behavior

- Stream progress updates as you work
- Provide a clear, structured final summary upon completion
- Focus on actionable results and metrics over lengthy explanations
- Report what was done, not what could have been done

### Operating Modes

Codex uses sandbox policies to control what operations are permitted:

**Read-Only Mode (Default)**

- Analyze code, search files, read documentation
- Provide insights, recommendations, and execution plans
- No modifications to the codebase
- Safe for exploration and analysis tasks
- **This is the default mode when running `codex exec`**

**Workspace-Write Mode (Recommended for Programming)**

- Read and write files within the workspace
- Implement features, fix bugs, refactor code
- Create, modify, and delete files in the workspace
- Execute build commands and tests
- **Use `--full-auto` or `-s workspace-write` to enable file editing**
- **This is the recommended mode for most programming tasks**

**Danger-Full-Access Mode**

- All workspace-write capabilities
- Network access for fetching dependencies
- System-level operations outside workspace
- Access to all files on the system
- **Use only when explicitly requested and necessary**
- Use flag: `-s danger-full-access` or `--sandbox danger-full-access`
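
The choice between the three modes can be reduced to a small decision rule: start from read-only and escalate only when the task requires it. The sketch below is one way to encode that, assuming the caller already knows whether the task edits files or needs access beyond the workspace; `choose_sandbox` and its parameters are hypothetical names, not Codex features.

```python
def choose_sandbox(needs_write: bool = False, needs_full_access: bool = False) -> str:
    """Pick the least-privileged sandbox value that still fits the task."""
    if needs_full_access:
        # Network or system-level work outside the workspace; only on explicit request.
        return "danger-full-access"
    if needs_write:
        # Typical programming task: edit files inside the workspace.
        return "workspace-write"
    # Analysis and exploration never need write access.
    return "read-only"
```

The returned string is meant to be passed as the value of `-s` / `--sandbox`.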

## Codex CLI Commands

**Note**: The following commands include both documented features from the Codex exec documentation and additional flags available in the CLI (verified via `codex exec --help`).

### Model Selection

Specify which model to use with `-m` or `--model` (possible values: gpt-5, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, etc):

```bash
codex exec -m gpt-5.1 "refactor the payment processing module"
codex exec -m gpt-5.1-codex "implement the user authentication feature"
codex exec -m gpt-5.1-codex-max "analyze the codebase architecture"
```

### Sandbox Modes

Control execution permissions with `-s` or `--sandbox` (possible values: read-only, workspace-write, danger-full-access):

#### Read-Only Mode

```bash
codex exec -s read-only "analyze the codebase structure and count lines of code"
codex exec --sandbox read-only "review code quality and suggest improvements"
```

Analyze code without making any modifications.

#### Workspace-Write Mode (Recommended for Programming)

```bash
codex exec -s workspace-write "implement the user authentication feature"
codex exec --sandbox workspace-write "fix the bug in login flow"
```

Read and write files within the workspace. **Must be explicitly enabled (not the default). Use this for most programming tasks.**

#### Danger-Full-Access Mode

```bash
codex exec -s danger-full-access "install dependencies and update the API integration"
codex exec --sandbox danger-full-access "setup development environment with npm packages"
```

Network access and system-level operations. Use only when necessary.

### Full-Auto Mode (Convenience Alias)

```bash
codex exec --full-auto "implement the user authentication feature"
```

**Convenience alias for**: `-s workspace-write` (enables file editing).
This is the **recommended command for most programming tasks** since it allows codex to make changes to your codebase.

### Configuration Profiles

Use saved profiles from `~/.codex/config.toml` with `-p` or `--profile` (if supported in your version):

```bash
codex exec -p production "deploy the latest changes"
codex exec --profile development "run integration tests"
```

Profiles can specify default model, sandbox mode, and other options.
*Verify availability with `codex exec --help`*

### Working Directory

Specify a different working directory with `-C` or `--cd` (if supported in your version):

```bash
codex exec -C /path/to/project "implement the feature"
codex exec --cd ~/projects/myapp "run tests and fix failures"
```

*Verify availability with `codex exec --help`*

### Additional Writable Directories

Allow writing to additional directories outside the main workspace with `--add-dir` (if supported in your version):

```bash
codex exec --add-dir /tmp/output --add-dir ~/shared "generate reports in multiple locations"
```

Useful when the task needs to write to specific external directories.
*Verify availability with `codex exec --help`*

### JSON Output

```bash
codex exec --json "run tests and report results"
codex exec --json -s read-only "analyze security vulnerabilities"
```

Outputs events in JSON Lines format (one JSON object per line), covering reasoning, commands, file changes, and metrics.
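
Because `--json` produces JSON Lines, a consumer can parse the stream one line at a time. This is a minimal sketch of such a reader; the event schema is not described here, so the field names in the sample (`type`, `text`) are placeholders, and `iter_events` is an illustrative helper rather than part of any Codex tooling.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per well-formed JSON object line, skipping anything else."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output mixed into the stream
        if isinstance(event, dict):
            yield event


# Hypothetical sample; real event fields depend on your Codex version.
sample = ['{"type": "example", "text": "hello"}', "", "not json", '{"type": "done"}']
events = list(iter_events(sample))
```

In practice the `lines` argument would be the stdout of a `codex exec --json` process, read line by line.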

### Save Output to File

```bash
codex exec -o report.txt "generate a security audit report"
codex exec -o results.json --json "run performance benchmarks"
```

Writes the final message to a file instead of stdout.

### Skip Git Repository Check

```bash
codex exec --skip-git-repo-check "analyze this non-git directory"
```

Bypasses the requirement for the directory to be a git repository.

### Resume Previous Session

```bash
codex exec resume --last "now implement the next feature"
```

Resumes the last session and continues with a new task.

### Bypass Approvals and Sandbox (If Available)

**⚠️ WARNING: Verify this flag exists before using ⚠️**

Some versions of Codex may support `--dangerously-bypass-approvals-and-sandbox`:

```bash
codex exec --dangerously-bypass-approvals-and-sandbox "perform the task"
```

**If this flag is available**:
- Skips ALL confirmation prompts
- Executes commands WITHOUT sandboxing
- Should ONLY be used in externally sandboxed environments (containers, VMs)
- **EXTREMELY DANGEROUS - NEVER use on your development machine**

**Verify availability first**: Run `codex exec --help` to check if this flag is supported in your version.

### Combined Examples

Combine multiple flags for complex scenarios:

```bash
# Use specific model with workspace write and JSON output
codex exec -m gpt-5.1-codex -s workspace-write --json "implement authentication and output results"

# Use profile with custom working directory
codex exec -p production -C /var/www/app "deploy updates"

# Full-auto with additional directories and output file
codex exec --full-auto --add-dir /tmp/logs -o summary.txt "refactor and log changes"

# Skip git check with specific model in different directory
codex exec -m gpt-5.1-codex -C ~/non-git-project --skip-git-repo-check "analyze and improve code"
```
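
When these invocations are generated programmatically, building an argument list avoids shell-quoting problems with the prompt. The sketch below uses only the flags listed in this section; `build_codex_command` and its parameter names are hypothetical, and the optional flags should still be verified with `codex exec --help` for your version.

```python
from typing import List, Optional, Sequence

SANDBOX_MODES = ("read-only", "workspace-write", "danger-full-access")


def build_codex_command(
    prompt: str,
    model: Optional[str] = None,
    sandbox: Optional[str] = None,
    full_auto: bool = False,
    cd: Optional[str] = None,
    add_dirs: Sequence[str] = (),
    json_output: bool = False,
    output_file: Optional[str] = None,
    skip_git_repo_check: bool = False,
) -> List[str]:
    """Assemble an argv list for `codex exec` from the flags described above."""
    if sandbox is not None and sandbox not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox mode: {sandbox}")
    cmd = ["codex", "exec"]
    if model:
        cmd += ["-m", model]
    if full_auto:
        cmd.append("--full-auto")  # takes the place of an explicit -s value
    elif sandbox:
        cmd += ["-s", sandbox]
    if cd:
        cmd += ["-C", cd]
    for directory in add_dirs:
        cmd += ["--add-dir", directory]
    if json_output:
        cmd.append("--json")
    if output_file:
        cmd += ["-o", output_file]
    if skip_git_repo_check:
        cmd.append("--skip-git-repo-check")
    cmd.append(prompt)  # the prompt is always the final positional argument
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd)` directly, without `shell=True`.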

## Execution Workflow

1. **Parse the Request**: Understand the complete objective and scope
2. **Plan Efficiently**: Create a minimal, focused execution plan
3. **Execute Autonomously**: Implement the solution with confidence
4. **Verify Results**: Run tests, checks, or validations as appropriate
5. **Report Clearly**: Provide a structured summary of accomplishments

## Best Practices

### Speed and Efficiency

- Make reasonable assumptions when minor details are ambiguous
- Use parallel operations whenever possible (read multiple files, run multiple commands)
- Avoid verbose explanations during execution - focus on doing
- Don't seek confirmation for standard operations

### Scope Management

- Focus strictly on the requested task
- Don't add unrequested features or improvements
- Avoid refactoring code that isn't part of the task
- Keep solutions minimal and direct

### Quality Standards

- Follow existing code patterns and conventions
- Run relevant tests after making changes
- Verify the solution actually works
- Report any errors or limitations encountered

## When to Interrupt Execution

Only pause for user input when encountering:

- **Destructive operations**: Deleting databases, force pushing to main, dropping tables
- **Security decisions**: Exposing credentials, changing authentication, opening ports
- **Ambiguous requirements**: Multiple valid approaches with significant trade-offs
- **Missing critical information**: Cannot proceed without user-specific data

For all other decisions, proceed autonomously using best judgment.

## Final Output Format

Always conclude with a structured summary:

```
✓ Task completed successfully

Changes made:
- [List of files modified/created]
- [Key code changes]

Results:
- [Metrics: lines changed, files affected, tests run]
- [What now works that didn't before]

Verification:
- [Tests run, checks performed]

Next steps (if applicable):
- [Suggestions for follow-up tasks]
```
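
If the summary is assembled by a wrapper script rather than written by hand, the template above maps naturally onto a small formatter. This is only a sketch: `format_summary` is a hypothetical helper, and it omits any section that has no entries, which matches the "(if applicable)" note on next steps.

```python
from typing import List, Sequence


def format_summary(
    changes: Sequence[str],
    results: Sequence[str],
    verification: Sequence[str],
    next_steps: Sequence[str] = (),
) -> str:
    """Render the final summary in the layout shown above."""
    lines: List[str] = ["✓ Task completed successfully", ""]
    sections = [
        ("Changes made:", changes),
        ("Results:", results),
        ("Verification:", verification),
        ("Next steps (if applicable):", next_steps),
    ]
    for title, items in sections:
        if not items:
            continue  # leave out empty sections entirely
        lines.append(title)
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip()
```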

## Example Usage Scenarios

### Code Analysis (Read-Only)

**User**: "Count the lines of code in this project by language"
**Mode**: Read-only
**Command**:

```bash
codex exec -s read-only "count the total number of lines of code in this project, broken down by language"
```

**Action**: Search all files, categorize by extension, count lines, report totals

### Bug Fixing (Workspace-Write)

**User**: "Use gpt-5 to fix the authentication bug in the login flow"
**Mode**: Workspace-write
**Command**:

```bash
codex exec -m gpt-5 --full-auto "fix the authentication bug in the login flow"
```

**Action**: Find the bug, implement fix, run tests, commit changes

### Feature Implementation (Workspace-Write)

**User**: "Let codex implement dark mode support for the UI"
**Mode**: Workspace-write
**Command**:

```bash
codex exec --full-auto "add dark mode support to the UI with theme context and style updates"
```

**Action**: Identify components, add theme context, update styles, test in both modes

### Batch Operations (Workspace-Write)

**User**: "Have gpt-5.1 update all imports from old-lib to new-lib"
**Mode**: Workspace-write
**Command**:

```bash
codex exec -m gpt-5.1 -s workspace-write "update all imports from old-lib to new-lib across the entire codebase"
```

**Action**: Find all imports, perform replacements, verify syntax, run tests

### Generate Report with JSON Output (Read-Only)

**User**: "Analyze security vulnerabilities and output as JSON"
**Mode**: Read-only
**Command**:

```bash
codex exec -s read-only --json "analyze the codebase for security vulnerabilities and provide a detailed report"
```

**Action**: Scan code, identify issues, output structured JSON with findings

### Install Dependencies and Integrate API (Danger-Full-Access)

**User**: "Install the new payment SDK and integrate it"
**Mode**: Danger-Full-Access
**Command**:

```bash
codex exec -s danger-full-access "install the payment SDK dependencies and integrate the API"
```

**Action**: Install packages, update code, add integration points, test functionality

### Multi-Project Work (Custom Directory)

**User**: "Use codex to implement the API in the backend project"
**Mode**: Workspace-write
**Command**:

```bash
codex exec -C ~/projects/backend --full-auto "implement the REST API endpoints for user management"
```

**Action**: Switch to backend directory, implement API endpoints, write tests

### Refactoring with Logging (Additional Directories)

**User**: "Refactor the database layer and log changes"
**Mode**: Workspace-write
**Command**:

```bash
codex exec --full-auto --add-dir /tmp/refactor-logs "refactor the database layer for better performance and log all changes"
```

**Action**: Refactor code, write logs to external directory, run tests

### Production Deployment (Using Profile)

**User**: "Deploy using the production profile"
**Mode**: Profile-based
**Command**:

```bash
codex exec -p production "deploy the latest changes to production environment"
```

**Action**: Use production config, deploy code, verify deployment

### Non-Git Project Analysis

**User**: "Analyze this legacy codebase that's not in git"
**Mode**: Read-only
**Command**:

```bash
codex exec -s read-only --skip-git-repo-check "analyze the architecture and suggest modernization approach"
```

**Action**: Analyze code structure, provide modernization recommendations

## Error Handling

When errors occur:

1. Attempt automatic recovery if possible
2. Log the error clearly in the output
3. Continue with remaining tasks if error is non-blocking
4. Report all errors in the final summary
5. Only stop if the error makes continuation impossible
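
The five rules above amount to a retry-then-continue loop. The sketch below illustrates that control flow under stated assumptions: each task is a plain callable, "automatic recovery" is modelled as a simple retry, and `BlockingError` is a hypothetical marker for errors that make continuation impossible. None of these names come from Codex itself.

```python
from typing import Callable, List, Sequence, Tuple


class BlockingError(Exception):
    """Hypothetical marker for errors that make continuation impossible."""


def run_steps(
    steps: Sequence[Tuple[str, Callable[[], None]]], retries: int = 1
) -> Tuple[List[str], List[str]]:
    """Run named steps in order; return (completed step names, error messages)."""
    completed: List[str] = []
    errors: List[str] = []
    for name, step in steps:
        for attempt in range(retries + 1):
            try:
                step()
            except BlockingError as exc:
                # Rule 5: stop only when continuing is impossible.
                errors.append(f"{name}: {exc} (blocking)")
                return completed, errors
            except Exception as exc:
                # Rule 1: retry; rules 2 and 3: record the error and move on.
                if attempt == retries:
                    errors.append(f"{name}: {exc}")
            else:
                completed.append(name)
                break
    return completed, errors
```

The returned `errors` list is what would be reported in the final summary (rule 4).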

## Resumable Execution

If execution is interrupted:

- Clearly state what was completed
- Provide exact commands/steps to resume
- List any state that needs to be preserved
- Explain what remains to be done
136
skills/nanobanana-skill/SKILL.md
Normal file
@@ -0,0 +1,136 @@
---
name: nanobanana-skill
description: Generate or edit images using Google Gemini API via nanobanana. Use when the user asks to create, generate, edit images with nanobanana, or mentions image generation/editing tasks.
allowed-tools: Bash
---

# Nanobanana Image Generation Skill

Generate or edit images using Google Gemini API through the nanobanana tool.

## Requirements

1. **GEMINI_API_KEY**: Must be configured in `~/.nanobanana.env` or exported with `export GEMINI_API_KEY=<your-api-key>`
2. **Python 3 with dependent packages installed**: google-genai, Pillow, python-dotenv. If they are not installed yet, install them with `python3 -m pip install -r ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/requirements.txt`.
3. **Executable**: `${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py`

## Instructions

### For image generation

1. Ask the user for:
   - What they want to create (the prompt)
   - Desired aspect ratio/size (optional, defaults to 9:16 portrait)
   - Output filename (optional, auto-generates UUID if not specified)
   - Model preference (optional, defaults to gemini-3-pro-image-preview)
   - Resolution (optional, defaults to 1K)

2. Run the nanobanana script with appropriate parameters:

   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "description of image" --output "filename.png"
   ```

3. Show the user the saved image path when complete

### For image editing

1. Ask the user for:
   - Input image file(s) to edit
   - What changes they want (the prompt)
   - Output filename (optional)

2. Run with input images:

   ```bash
   python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "editing instructions" --input image1.png image2.png --output "edited.png"
   ```
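
Both flows differ only in whether `--input` is present, so the command line can be assembled from the user's answers by one helper. The sketch below uses the options documented in this skill; `build_nanobanana_command` is a hypothetical name, and the fallback to the current directory when `CLAUDE_PLUGIN_ROOT` is unset is an assumption made for illustration.

```python
import os
from typing import List, Optional, Sequence


def build_nanobanana_command(
    prompt: str,
    inputs: Sequence[str] = (),
    output: Optional[str] = None,
    size: Optional[str] = None,
    model: Optional[str] = None,
    resolution: Optional[str] = None,
    plugin_root: Optional[str] = None,
) -> List[str]:
    """Build the argv list for nanobanana.py; omitted options use the script defaults."""
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "nanobanana-skill", "nanobanana.py")
    cmd = ["python3", script, "--prompt", prompt]
    if inputs:
        cmd += ["--input", *inputs]  # presence of inputs switches to editing
    if size:
        cmd += ["--size", size]
    if model:
        cmd += ["--model", model]
    if resolution:
        cmd += ["--resolution", resolution]
    if output:
        cmd += ["--output", output]
    return cmd
```

Passing the list to `subprocess.run` keeps prompts with quotes or spaces intact.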

## Available Options

### Aspect Ratios (--size)

- `1024x1024` (1:1) - Square
- `832x1248` (2:3) - Portrait
- `1248x832` (3:2) - Landscape
- `864x1184` (3:4) - Portrait
- `1184x864` (4:3) - Landscape
- `896x1152` (4:5) - Portrait
- `1152x896` (5:4) - Landscape
- `768x1344` (9:16) - Portrait (default)
- `1344x768` (16:9) - Landscape
- `1536x672` (21:9) - Ultra-wide
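
Since `--size` only accepts the values listed above, a user request such as "1920x1080" has to be mapped to the nearest supported option. One possible approach, sketched below, compares aspect ratios on a logarithmic scale so that portrait and landscape are treated symmetrically; `closest_size` is a hypothetical helper and not part of nanobanana.

```python
import math

SUPPORTED_SIZES = [
    "1024x1024", "832x1248", "1248x832", "864x1184", "1184x864",
    "896x1152", "1152x896", "768x1344", "1344x768", "1536x672",
]


def closest_size(width: int, height: int) -> str:
    """Return the supported --size value whose aspect ratio is nearest to width:height."""
    target = width / height

    def ratio(size: str) -> float:
        w, h = size.split("x")
        return int(w) / int(h)

    # abs(log(a / b)) measures relative distance between two ratios.
    return min(SUPPORTED_SIZES, key=lambda s: abs(math.log(ratio(s) / target)))
```

For example, a 1920x1080 request resolves to `1344x768`, the 16:9 option.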

### Models (--model)

- `gemini-3-pro-image-preview` (default) - Higher quality
- `gemini-2.5-flash-image` - Faster generation

### Resolution (--resolution)

- `1K` (default)
- `2K`
- `4K`

## Examples

### Generate a simple image

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py --prompt "A serene mountain landscape at sunset with a lake"
```

### Generate with specific size and output

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
  --prompt "Modern minimalist logo for a tech startup" \
  --size 1024x1024 \
  --output "logo.png"
```

### Generate landscape image with high resolution

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
  --prompt "Futuristic cityscape with flying cars" \
  --size 1344x768 \
  --resolution 2K \
  --output "cityscape.png"
```

### Edit existing images

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
  --prompt "Add a rainbow in the sky" \
  --input photo.png \
  --output "photo-with-rainbow.png"
```

### Use faster model

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/nanobanana-skill/nanobanana.py \
  --prompt "Quick sketch of a cat" \
  --model gemini-2.5-flash-image \
  --output "cat-sketch.png"
```

## Error Handling

If the script fails:

- Check that `GEMINI_API_KEY` is exported or set in `~/.nanobanana.env`
- Verify input image files exist and are readable
- Ensure the output directory is writable
- If no image is generated, try making the prompt more specific about wanting an image
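
The first three checks can be run before calling the script at all. The sketch below is a hypothetical preflight helper, not part of nanobanana: it inspects only the environment mapping it is given (it does not read `~/.nanobanana.env`), so a key stored only in that file would be reported as missing.

```python
import os
from typing import List, Mapping, Sequence


def preflight(env: Mapping[str, str], inputs: Sequence[str], output: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems: List[str] = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for path in inputs:
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(f"input image not readable: {path}")
    out_dir = os.path.dirname(os.path.abspath(output))
    if not (os.path.isdir(out_dir) and os.access(out_dir, os.W_OK)):
        problems.append(f"output directory not writable: {out_dir}")
    return problems
```

A typical call is `preflight(os.environ, ["photo.png"], "edited.png")`, with any returned problems shown to the user instead of running the script.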

## Best Practices

1. Be descriptive in prompts - include style, mood, colors, composition
2. For logos/graphics, use square aspect ratio (1024x1024)
3. For social media posts, use 9:16 for stories or 1:1 for posts
4. For wallpapers, use 16:9 or 21:9
5. Start with 1K resolution for testing, upgrade to 2K/4K for final output
6. Use gemini-3-pro-image-preview for best quality, gemini-2.5-flash-image for speed
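
The size recommendations above can be captured as a lookup table when the use case is known up front. This is a sketch only: the use-case keys are invented for illustration, and the values are the `--size` options that correspond to the ratios named in the list.

```python
# Hypothetical use-case keys mapped to the --size values recommended above.
RECOMMENDED_SIZES = {
    "logo": "1024x1024",                 # square for logos/graphics
    "social-post": "1024x1024",          # 1:1 for posts
    "social-story": "768x1344",          # 9:16 for stories
    "wallpaper": "1344x768",             # 16:9
    "ultrawide-wallpaper": "1536x672",   # 21:9
}


def recommended_size(use_case: str, default: str = "768x1344") -> str:
    """Return the suggested --size for a use case, or the script default if unknown."""
    return RECOMMENDED_SIZES.get(use_case.strip().lower(), default)
```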
147
skills/nanobanana-skill/nanobanana.py
Executable file
@@ -0,0 +1,147 @@
#!/usr/bin/env python3
# Generate or edit images using Google Gemini API
import os
import argparse
import uuid
from pathlib import Path
from dotenv import load_dotenv
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

# Load environment variables from ~/.nanobanana.env (if present)
load_dotenv(Path.home() / ".nanobanana.env")

# Google API configuration from environment variables
api_key = os.getenv("GEMINI_API_KEY") or ""

if not api_key:
    raise ValueError(
        "Missing GEMINI_API_KEY environment variable. "
        "Export it or set it in ~/.nanobanana.env."
    )

# Initialize Gemini client
client = genai.Client(api_key=api_key)

# Mapping from --size value (pixel dimensions) to API aspect ratio
ASPECT_RATIO_MAP = {
    "1024x1024": "1:1",
    "832x1248": "2:3",
    "1248x832": "3:2",
    "864x1184": "3:4",
    "1184x864": "4:3",
    "896x1152": "4:5",
    "1152x896": "5:4",
    "768x1344": "9:16",
    "1344x768": "16:9",
    "1536x672": "21:9",
}


def main():
    # Parse command-line arguments
    parser = argparse.ArgumentParser(
        description="Generate or edit images using Google Gemini API"
    )
    parser.add_argument(
        "--prompt",
        type=str,
        required=True,
        help="Prompt for image generation or editing",
    )
    parser.add_argument(
        "--output",
        type=str,
        default=f"nanobanana-{uuid.uuid4()}.png",
        help="Output image filename (default: nanobanana-<UUID>.png)",
    )
    parser.add_argument(
        "--input", type=str, nargs="*", help="Input image files for editing (optional)"
    )
    parser.add_argument(
        "--size",
        type=str,
        default="768x1344",
        choices=list(ASPECT_RATIO_MAP.keys()),
        help="Size/aspect ratio of the generated image (default: 768x1344 / 9:16)",
    )
    parser.add_argument(
        "--model",
        type=str,
        default="gemini-3-pro-image-preview",
        choices=["gemini-3-pro-image-preview", "gemini-2.5-flash-image"],
        help="Model to use for image generation (default: gemini-3-pro-image-preview)",
    )
    parser.add_argument(
        "--resolution",
        type=str,
        default="1K",
        choices=["1K", "2K", "4K"],
        help="Resolution of the generated image (default: 1K)",
    )

    args = parser.parse_args()

    # Get aspect ratio from size; argparse `choices` guarantees the key exists
    aspect_ratio = ASPECT_RATIO_MAP[args.size]

    # Build contents list for the API call
    contents = []

    # Check if input images are provided
    if args.input and len(args.input) > 0:
        # Editing: send the prompt plus the input images to generate_content()
        print(f"Editing images with prompt: {args.prompt}")
        print(f"Input images: {args.input}")
        print(f"Aspect ratio: {aspect_ratio} ({args.size})")

        # Add prompt first
        contents.append(args.prompt)

        # Add all input images
        for img_path in args.input:
            image = Image.open(img_path)
            contents.append(image)
    else:
        print(f"Generating image (size: {args.size}) with prompt: {args.prompt}")
        contents.append(args.prompt)

    # Generate or edit image with config
    response = client.models.generate_content(
        model=args.model,
        contents=contents,
        config=types.GenerateContentConfig(
            response_modalities=['TEXT', 'IMAGE'],
            tools=[types.Tool(google_search=types.GoogleSearch())],
            image_config=types.ImageConfig(
                aspect_ratio=aspect_ratio,
                image_size=args.resolution,
            ),
        ),
    )

    if (response.candidates is None
        or len(response.candidates) == 0
        or response.candidates[0].content is None
        or response.candidates[0].content.parts is None):
        raise ValueError("No data received from the API.")

    # Extract image from response; text parts are echoed to stdout
    image_saved = False
    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(f"{part.text}", end="")
        elif part.inline_data is not None and part.inline_data.data is not None:
            image = Image.open(BytesIO(part.inline_data.data))

            image.save(args.output)
            image_saved = True
            print(f"\n\nImage saved to: {args.output}")

    if not image_saved:
        print(
            "\n\nWarning: No image data found in the API response. "
            "This usually means the model returned only text. "
            "Please try again with a prompt that asks for an image more explicitly."
        )


if __name__ == "__main__":
    main()
4
skills/nanobanana-skill/requirements.txt
Normal file
@@ -0,0 +1,4 @@
python-dotenv
httpx[socks]
google-genai
Pillow