Initial commit

Zhongwei Li
2025-11-29 18:23:22 +08:00
commit befa40959b
10 changed files with 1318 additions and 0 deletions

137
commands/completion.md Normal file

@@ -0,0 +1,137 @@
# Completion Command
Generate shell completion scripts for bash, zsh, or fish shells.
## Usage
```bash
gemini-nano-banana-tool completion SHELL
```
## Arguments
- `SHELL` - Shell type (bash, zsh, or fish)
## Examples
### Bash Completion
```bash
# Generate completion script
gemini-nano-banana-tool completion bash
# Add to ~/.bashrc
eval "$(gemini-nano-banana-tool completion bash)"
```
### Zsh Completion
```bash
# Generate completion script
gemini-nano-banana-tool completion zsh
# Add to ~/.zshrc
eval "$(gemini-nano-banana-tool completion zsh)"
```
### Fish Completion
```bash
# Generate completion script
gemini-nano-banana-tool completion fish
# Save to completions directory
gemini-nano-banana-tool completion fish > \
~/.config/fish/completions/gemini-nano-banana-tool.fish
```
## What Shell Completion Does
Shell completion provides:
- **Command Completion**: Tab to complete command names
- **Option Completion**: Tab to complete option names
- **Value Completion**: Tab to complete option values (file paths, choices)
- **Help Integration**: Shows available options while typing
## Installation
### Bash
Add to `~/.bashrc`:
```bash
eval "$(gemini-nano-banana-tool completion bash)"
```
Then reload:
```bash
source ~/.bashrc
```
### Zsh
Add to `~/.zshrc`:
```bash
eval "$(gemini-nano-banana-tool completion zsh)"
```
Then reload:
```bash
source ~/.zshrc
```
### Fish
Save to completions directory:
```bash
gemini-nano-banana-tool completion fish > \
~/.config/fish/completions/gemini-nano-banana-tool.fish
```
Then start a new shell if needed (fish usually picks up new completion files automatically):
```bash
exec fish
```
## Verification
Test completion by typing:
```bash
gemini-nano-banana-tool <TAB>
```
You should see available commands:
- `completion`
- `generate`
- `generate-conversation`
- `list-aspect-ratios`
- `list-models`
- `promptgen`
## Troubleshooting
If completion doesn't work:
1. **Check Installation**: Verify completion script is loaded
2. **Reload Shell**: Close and reopen terminal or source config file
3. **Check Permissions**: Ensure completion file is readable
4. **Check Path**: Verify `gemini-nano-banana-tool` is in PATH
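The PATH and permission checks above can be scripted. The following is a minimal sketch, not part of the tool: `diagnose_completion` is a hypothetical helper name, and the file-based checks only apply to the fish setup, since bash and zsh load the script through `eval` rather than from a file.

```python
import os
import shutil
from pathlib import Path


def diagnose_completion(tool: str, completion_file: Path) -> dict:
    """Run the PATH and file checks from the troubleshooting list (hypothetical helper)."""
    exists = completion_file.is_file()
    return {
        # Step 4: is the executable discoverable on PATH?
        "tool_on_path": shutil.which(tool) is not None,
        # Step 1: was the completion script written where fish looks for it?
        "file_exists": exists,
        # Step 3: can the current user read it?
        "file_readable": exists and os.access(completion_file, os.R_OK),
    }


if __name__ == "__main__":
    fish_file = Path.home() / ".config/fish/completions/gemini-nano-banana-tool.fish"
    results = diagnose_completion("gemini-nano-banana-tool", fish_file)
    for check, ok in results.items():
        print(f"{check}: {'ok' if ok else 'FAILED'}")
```

Any check that prints `FAILED` points at the matching step in the list above.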
## Pattern
This follows the industry-standard pattern used by:
- `kubectl completion`
- `helm completion`
- `docker completion`
- `gh completion`
## Benefits
- **Faster Typing**: Tab completion reduces typing
- **Discover Options**: See available options without `--help`
- **Reduce Errors**: Autocomplete prevents typos
- **Better UX**: Professional CLI experience

@@ -0,0 +1,182 @@
# Generate Conversation Command
Multi-turn image generation with progressive refinement. Each turn builds on previous context, allowing iterative improvements without starting over.
## Usage
```bash
gemini-nano-banana-tool generate-conversation PROMPT -o OUTPUT [OPTIONS]
```
## Arguments
- `PROMPT` - Text prompt for this turn (required)
## Required Options
- `-o, --output PATH` - Output image file path (required)
## Conversation Options
- `-f, --file PATH` - Conversation file (created if it does not exist)
- `-a, --aspect-ratio TEXT` - Aspect ratio (default: 1:1, only for new conversations)
- `-m, --model TEXT` - Gemini model (default: gemini-2.5-flash-image, only for new conversations)
## Authentication Options
- `--api-key TEXT` - Override API key from environment
- `--use-vertex` - Use Vertex AI instead of Developer API
- `--project TEXT` - Google Cloud project (for Vertex AI)
- `--location TEXT` - Google Cloud location (for Vertex AI)
## Other Options
- `-v, --verbose` - Multi-level verbosity (-v INFO, -vv DEBUG, -vvv TRACE)
## How It Works
1. **First Turn**: Create initial image from prompt, save conversation state
2. **Subsequent Turns**: Previous output automatically becomes reference image
3. **Persistence**: All turns, prompts, and metadata saved to JSON file
4. **Resume**: Load conversation file to continue refinement
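The four steps above can be sketched in Python. This is an illustration of the bookkeeping only, not the tool's actual implementation: the function names are hypothetical, the image generation call itself is omitted, and the field names are borrowed from the conversation file format documented on this page.

```python
import json
from datetime import datetime
from pathlib import Path


def load_or_create(path: Path, model: str, aspect_ratio: str) -> dict:
    """Resume an existing conversation, or start a new one (steps 1 and 4)."""
    if path.exists():
        # Model and aspect ratio stay as stored: they are locked after creation.
        return json.loads(path.read_text())
    now = datetime.now().isoformat()
    return {"model": model, "aspect_ratio": aspect_ratio, "turns": [],
            "created_at": now, "updated_at": now}


def record_turn(conversation: dict, prompt: str, output_path: str) -> dict:
    """Append a turn whose reference image is the previous output (step 2)."""
    turns = conversation["turns"]
    references = [turns[-1]["output_path"]] if turns else []
    turn = {"prompt": prompt, "output_path": output_path,
            "reference_images": references,
            "timestamp": datetime.now().isoformat()}
    turns.append(turn)
    conversation["updated_at"] = turn["timestamp"]
    return turn


def save(conversation: dict, path: Path) -> None:
    """Persist every turn to the JSON file (step 3)."""
    path.write_text(json.dumps(conversation, indent=2))
```

Calling `load_or_create` with a different model on an existing file returns the stored model unchanged, which mirrors the "only for new conversations" behavior of `-m` and `-a`.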
## Examples
### Basic Multi-Turn Workflow
```bash
# Turn 1: Initial generation
gemini-nano-banana-tool generate-conversation "A sunset over mountains" \
-o sunset1.png --file conversation.json
# Turn 2: Refinement (loads conversation automatically)
gemini-nano-banana-tool generate-conversation "Make the sky more orange" \
-o sunset2.png --file conversation.json
# Turn 3: Further refinement
gemini-nano-banana-tool generate-conversation "Add a lake in the foreground" \
-o sunset3.png --file conversation.json
```
### Interior Design Example
```bash
# Turn 1: Initial room
gemini-nano-banana-tool generate-conversation \
"A modern minimalist living room with large windows" \
-o room-v1.png --file interior.json -a 16:9
# Turn 2: Add furniture
gemini-nano-banana-tool generate-conversation \
"Add a gray sofa and wooden coffee table" \
-o room-v2.png --file interior.json
# Turn 3: Adjust lighting
gemini-nano-banana-tool generate-conversation \
"Make the lighting warmer and add floor lamp" \
-o room-v3.png --file interior.json
# Turn 4: Final touches
gemini-nano-banana-tool generate-conversation \
"Add plants and artwork on the walls" \
-o room-final.png --file interior.json
```
### Product Photography Example
```bash
# Turn 1: Initial product shot
gemini-nano-banana-tool generate-conversation \
"Professional product photo of wireless headphones" \
-o headphones-v1.png --file product.json -a 1:1
# Turn 2: Adjust angle
gemini-nano-banana-tool generate-conversation \
"Rotate to show the left side" \
-o headphones-v2.png --file product.json
# Turn 3: Change background
gemini-nano-banana-tool generate-conversation \
"Change background to dark gradient" \
-o headphones-v3.png --file product.json
```
## Conversation File Format
The conversation file stores complete history in JSON:
```json
{
  "conversation_id": "20251120_181305",
  "model": "gemini-2.5-flash-image",
  "aspect_ratio": "16:9",
  "turns": [
    {
      "prompt": "A sunset over mountains",
      "output_path": "/path/to/sunset1.png",
      "reference_images": [],
      "metadata": {
        "token_count": 1295,
        "resolution": "1344x768",
        "finish_reason": "STOP"
      },
      "timestamp": "2025-11-20T18:13:11.428020"
    },
    {
      "prompt": "Make the sky more orange",
      "output_path": "/path/to/sunset2.png",
      "reference_images": ["/path/to/sunset1.png"],
      "metadata": {
        "token_count": 1554,
        "resolution": "1344x768",
        "finish_reason": "STOP"
      },
      "timestamp": "2025-11-20T18:13:27.318416"
    }
  ],
  "created_at": "2025-11-20T18:13:05.751502",
  "updated_at": "2025-11-20T18:13:27.318430"
}
```
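Because the history is plain JSON, it can be inspected with a few lines of Python. The sketch below assumes the field names shown in the example above; `summarize` and `chain_is_consistent` are hypothetical helpers, and `SAMPLE` is a trimmed copy of that example.

```python
import json
from pathlib import Path


def summarize(conversation: dict) -> dict:
    """Collect turn count, token total, and latest output from a conversation."""
    turns = conversation["turns"]
    return {
        "turn_count": len(turns),
        "total_tokens": sum(t["metadata"]["token_count"] for t in turns),
        "latest_output": turns[-1]["output_path"] if turns else None,
        "prompts": [t["prompt"] for t in turns],
    }


def chain_is_consistent(conversation: dict) -> bool:
    """Check that every turn after the first references the previous output."""
    turns = conversation["turns"]
    return all(
        prev["output_path"] in cur["reference_images"]
        for prev, cur in zip(turns, turns[1:])
    )


def summarize_file(path: Path) -> dict:
    """Load a conversation file from disk and summarize it."""
    return summarize(json.loads(path.read_text()))


# Trimmed copy of the documented example, used here as test data.
SAMPLE = {
    "conversation_id": "20251120_181305",
    "turns": [
        {"prompt": "A sunset over mountains",
         "output_path": "/path/to/sunset1.png",
         "reference_images": [],
         "metadata": {"token_count": 1295}},
        {"prompt": "Make the sky more orange",
         "output_path": "/path/to/sunset2.png",
         "reference_images": ["/path/to/sunset1.png"],
         "metadata": {"token_count": 1554}},
    ],
}

print(summarize(SAMPLE))
```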
## Use Cases
- **Product Photography**: Iteratively adjust lighting, angles, styling
- **Character Design**: Refine poses, clothing, expressions progressively
- **Interior Design**: Build rooms by adding furniture and decor step by step
- **Marketing Materials**: Test variations while maintaining consistency
- **Concept Art**: Explore different iterations of a design
- **Fashion E-commerce**: Try products on models or in different settings
## Important Notes
- **Model & Aspect Ratio**: Set only when a new conversation is created; both are locked for subsequent turns
- **Reference Images**: The previous output is used automatically (no manual `-i` needed)
- **Resume Anytime**: Load the conversation file to continue from the latest turn
- **No File Flag**: The command works without `--file`, but the conversation is not saved
## Why Use Conversation Mode?
- **Progressive Refinement**: Iteratively improve without losing context
- **Experiment Safely**: Try variations while maintaining consistency
- **Context Awareness**: Each turn references previous outputs automatically
- **Evolution Tracking**: Complete history of prompts and changes
- **Resume Anytime**: Continue conversations across sessions
## Cost Information
Each turn costs the same as a single generation:
- **Flash Model**: ~$0.039 per turn
- **Pro Model 1K/2K**: ~$0.134 per turn
- **Pro Model 4K**: ~$0.24 per turn
Cost is tracked per turn in conversation file metadata.
## Authentication
Set the environment variable:
```bash
export GEMINI_API_KEY='your-api-key'
```
Get an API key: https://aistudio.google.com/app/apikey

194
commands/generate.md Normal file

@@ -0,0 +1,194 @@
# Generate Command
Generate images from text prompts with optional reference images using Google Gemini models.
**Alias:** `generate-image` (can be used interchangeably with `generate`)
## Usage
```bash
gemini-nano-banana-tool generate [PROMPT] -o OUTPUT [OPTIONS]
gemini-nano-banana-tool generate-image [PROMPT] -o OUTPUT [OPTIONS] # Alias
```
## Arguments
- `PROMPT` - Text prompt (positional, mutually exclusive with `-f` and `-s`)
## Required Options
- `-o, --output PATH` - Output image file path (required)
## Prompt Input Options (mutually exclusive)
- `PROMPT` - Positional argument (default)
- `-f, --prompt-file PATH` - Read prompt from file
- `-s, --stdin` - Read prompt from stdin (for piping)
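The mutual exclusion between the three prompt sources can be pictured as follows. This is a sketch under assumptions, not the tool's code: `resolve_prompt` is a hypothetical name, and treating "no source at all" as an error is a guess about the intended behavior.

```python
import io
from pathlib import Path
from typing import Optional, TextIO


def resolve_prompt(positional: Optional[str], prompt_file: Optional[Path],
                   use_stdin: bool, stdin: TextIO) -> str:
    """Return the prompt from exactly one of PROMPT, --prompt-file, or --stdin."""
    sources = [positional is not None, prompt_file is not None, use_stdin]
    if sum(sources) != 1:
        raise ValueError("Provide exactly one of PROMPT, --prompt-file, or --stdin")
    if positional is not None:
        return positional.strip()
    if prompt_file is not None:
        return prompt_file.read_text().strip()
    # Piped input usually ends with a newline, so strip it.
    return stdin.read().strip()


# Simulate `echo "Beautiful sunset" | ... generate -s` with an in-memory stream.
print(resolve_prompt(None, None, True, io.StringIO("Beautiful sunset\n")))
```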
## Image Options
- `-i, --image PATH` - Reference image (repeatable: up to 3 times for Flash, up to 14 for Pro)
- `-a, --aspect-ratio TEXT` - Aspect ratio (default: 1:1)
- `-m, --model TEXT` - Gemini model (default: gemini-2.5-flash-image)
- `-r, --resolution TEXT` - Resolution quality for Pro model (1K/2K/4K)
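When the command is driven from a script, the reference-image limits can be checked before anything is sent. The sketch below builds an argument list from the options documented here; `build_generate_args` is a hypothetical wrapper, and the limits table simply restates the "3 for Flash, 14 for Pro" figures from the `-i` description.

```python
from typing import List, Optional, Sequence

# Limits restated from the -i option description; not read from the tool itself.
MAX_REFERENCE_IMAGES = {
    "gemini-2.5-flash-image": 3,
    "gemini-3-pro-image-preview": 14,
}


def build_generate_args(prompt: str, output: str,
                        images: Sequence[str] = (),
                        aspect_ratio: str = "1:1",
                        model: str = "gemini-2.5-flash-image",
                        resolution: Optional[str] = None) -> List[str]:
    """Build an argv list for the generate command, enforcing the -i limit."""
    limit = MAX_REFERENCE_IMAGES.get(model)
    if limit is not None and len(images) > limit:
        raise ValueError(f"{model} accepts at most {limit} reference images")
    args = ["gemini-nano-banana-tool", "generate", prompt,
            "-o", output, "-a", aspect_ratio, "-m", model]
    for image in images:
        args += ["-i", image]  # -i is repeated once per reference image
    if resolution is not None:
        args += ["-r", resolution]
    return args


print(build_generate_args("Add a birthday hat", "edited.png", images=["original.jpg"]))
```

The resulting list can be passed to `subprocess.run(args, check=True)`; using a list instead of a shell string avoids quoting problems in prompts.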
## Authentication Options
- `--api-key TEXT` - Override API key from environment
- `--use-vertex` - Use Vertex AI instead of Developer API
- `--project TEXT` - Google Cloud project (for Vertex AI)
- `--location TEXT` - Google Cloud location (for Vertex AI)
## Other Options
- `-v, --verbose` - Multi-level verbosity (-v INFO, -vv DEBUG, -vvv TRACE)
## Examples
### Basic Text-to-Image
```bash
# Simple generation (both commands work)
gemini-nano-banana-tool generate "A cat wearing a wizard hat" -o cat.png
gemini-nano-banana-tool generate-image "A cat wearing a wizard hat" -o cat.png
# With aspect ratio
gemini-nano-banana-tool generate "Mountain landscape" -o landscape.png -a 16:9
```
### Prompt from File or Stdin
```bash
# From file
gemini-nano-banana-tool generate -f prompt.txt -o output.png
# From stdin (piping)
echo "Beautiful sunset" | gemini-nano-banana-tool generate -o sunset.png -s
# With promptgen
gemini-nano-banana-tool promptgen "wizard cat" | \
gemini-nano-banana-tool generate -o cat.png -s -a 16:9
```
### Image Editing with Reference Images
```bash
# Single reference image
gemini-nano-banana-tool generate "Add a birthday hat" -o edited.png \
-i original.jpg
# Multiple reference images
gemini-nano-banana-tool generate "Put the dress on the model in garden" \
-o fashion.png -i dress.jpg -i model.jpg
# Up to 3 references (Flash) or 14 (Pro)
gemini-nano-banana-tool generate "Combine these styles" -o result.png \
-i ref1.jpg -i ref2.jpg -i ref3.jpg
```
### Different Aspect Ratios
```bash
# Square (Instagram post)
gemini-nano-banana-tool generate "Modern design" -o square.png -a 1:1
# Widescreen (YouTube thumbnail)
gemini-nano-banana-tool generate "Epic scene" -o wide.png -a 16:9
# Vertical (Instagram story)
gemini-nano-banana-tool generate "Portrait" -o vertical.png -a 9:16
# Cinematic (ultra-wide)
gemini-nano-banana-tool generate "Sci-fi panorama" -o cinema.png -a 21:9
```
### Model Selection and Resolution
```bash
# Default Flash model (fast, cost-effective)
gemini-nano-banana-tool generate "Your prompt" -o output.png
# Pro model (higher quality)
gemini-nano-banana-tool generate "Your prompt" -o output.png \
-m gemini-3-pro-image-preview
# Pro with 4K resolution (maximum quality)
gemini-nano-banana-tool generate "Your prompt" -o output.png \
-m gemini-3-pro-image-preview -r 4K
```
### Verbosity Levels
```bash
# Normal (warnings only)
gemini-nano-banana-tool generate "test" -o output.png
# Verbose (show operations)
gemini-nano-banana-tool generate "test" -o output.png -v
# Debug (detailed info)
gemini-nano-banana-tool generate "test" -o output.png -vv
# Trace (full HTTP logs)
gemini-nano-banana-tool generate "test" -o output.png -vvv
```
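A repeated `-v` flag is commonly mapped to a log level by counting occurrences. The sketch below shows one way to do that with Python's `logging` module; it is an assumption about the approach, not the tool's code. `TRACE` is not a standard `logging` level, so it is registered here as a custom level with the value 5.

```python
import logging

# Custom level below DEBUG (10); the standard library does not define TRACE.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def level_for(verbose_count: int) -> int:
    """Map the number of -v flags to a logging level, clamping out-of-range counts."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG, TRACE]
    index = min(max(verbose_count, 0), len(levels) - 1)
    return levels[index]


for count in range(4):
    flag = "-" + "v" * count if count else "(none)"
    print(flag, logging.getLevelName(level_for(count)))
```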
## Output Format
Returns JSON with generation details:
```json
{
  "output_path": "output.png",
  "model": "gemini-2.5-flash-image",
  "aspect_ratio": "16:9",
  "resolution": "1344x768",
  "resolution_quality": "1K",
  "reference_image_count": 0,
  "token_count": 1295,
  "estimated_cost_usd": 0.0389,
  "metadata": {
    "finish_reason": "STOP",
    "safety_ratings": null
  }
}
```
## Cost Information
Automatic cost tracking based on token usage:
- **Flash Model**: ~$0.039 per image (1,290 tokens × $0.00003)
- **Pro Model 1K/2K**: ~$0.134 per image (1,120 tokens × $0.00012)
- **Pro Model 4K**: ~$0.24 per image (2,000 tokens × $0.00012)
Cost is always included in output JSON as `estimated_cost_usd`.
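The arithmetic behind these figures is token count times the per-token rate. The sketch below reproduces it with `decimal.Decimal` so that rounding is exact; the function name and the rounding to four decimal places are assumptions chosen to match the `estimated_cost_usd` value in the sample output, not a description of the tool's internals.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-token rates as listed in this section.
USD_PER_TOKEN = {
    "gemini-2.5-flash-image": Decimal("0.00003"),
    "gemini-3-pro-image-preview": Decimal("0.00012"),
}


def estimate_cost_usd(token_count: int, model: str) -> Decimal:
    """Multiply tokens by the model's rate and round to four decimal places."""
    cost = USD_PER_TOKEN[model] * token_count
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


print(estimate_cost_usd(1295, "gemini-2.5-flash-image"))      # sample output above
print(estimate_cost_usd(1120, "gemini-3-pro-image-preview"))  # Pro 1K/2K
print(estimate_cost_usd(2000, "gemini-3-pro-image-preview"))  # Pro 4K
```

With 1,295 tokens on the Flash model this gives 0.0389, the same value as `estimated_cost_usd` in the sample JSON.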
## Supported Aspect Ratios
- `1:1` - Square (1024×1024)
- `16:9` - Widescreen (1344×768)
- `9:16` - Vertical (768×1344)
- `4:3` - Traditional (1184×864)
- `3:4` - Portrait (864×1184)
- `3:2` - DSLR (1248×832)
- `2:3` - Portrait photo (832×1248)
- `21:9` - Cinematic (1536×672)
- `4:5` - Instagram portrait (896×1152)
- `5:4` - Medium format (1152×896)
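To pick a value for `-a` that matches an existing canvas or screen size, the list above can be turned into a lookup table. This is a small sketch with a hypothetical helper, `closest_aspect_ratio`; it compares ratios on a log scale so that wide and tall shapes are treated symmetrically.

```python
import math

# Output resolutions copied from the list above (width, height).
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1184, 864),
    "3:4": (864, 1184),
    "3:2": (1248, 832),
    "2:3": (832, 1248),
    "21:9": (1536, 672),
    "4:5": (896, 1152),
    "5:4": (1152, 896),
}


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio label nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = (int(part) for part in label.split(":"))
        return abs(math.log(w / h) - target)

    return min(RESOLUTIONS, key=distance)


ratio = closest_aspect_ratio(1920, 1080)
print(ratio, RESOLUTIONS[ratio])
```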
## Authentication
Set environment variable:
```bash
export GEMINI_API_KEY='your-api-key'
```
Get API key: https://aistudio.google.com/app/apikey
For Vertex AI:
```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT='your-project-id'
export GOOGLE_CLOUD_LOCATION='us-central1'
```

@@ -0,0 +1,94 @@
# List Aspect Ratios Command
List supported aspect ratios with resolutions and use cases.
## Usage
```bash
gemini-nano-banana-tool list-aspect-ratios
```
## Output
```
Available Aspect Ratios:
1:1 (1024x1024) - Square (Instagram post, social media)
16:9 (1344x768) - Widescreen (YouTube thumbnail, desktop)
9:16 (768x1344) - Vertical (Instagram story, TikTok, mobile)
4:3 (1184x864) - Traditional (classic photography)
3:4 (864x1184) - Portrait orientation
3:2 (1248x832) - DSLR photography
2:3 (832x1248) - Portrait photography
21:9 (1536x672) - Cinematic (ultra-wide)
4:5 (896x1152) - Instagram portrait
5:4 (1152x896) - Medium format photography
```
## Usage with Generate Command
```bash
# Square (Instagram post)
gemini-nano-banana-tool generate "Modern design" -o square.png -a 1:1
# Widescreen (YouTube thumbnail)
gemini-nano-banana-tool generate "Epic scene" -o wide.png -a 16:9
# Vertical (Instagram story)
gemini-nano-banana-tool generate "Portrait" -o vertical.png -a 9:16
# Cinematic (ultra-wide)
gemini-nano-banana-tool generate "Sci-fi panorama" -o cinema.png -a 21:9
```
## Common Platform Aspect Ratios
### Social Media
- **Instagram Post**: 1:1 (square)
- **Instagram Story**: 9:16 (vertical)
- **Instagram Portrait**: 4:5
- **Twitter Post**: 16:9 or 1:1
- **Facebook Post**: 1:1 or 16:9
- **LinkedIn Post**: 1:1 or 16:9
### Video Platforms
- **YouTube Thumbnail**: 16:9
- **YouTube Banner**: 16:9 (wide)
- **TikTok**: 9:16 (vertical)
- **YouTube Shorts**: 9:16 (vertical)
### Photography
- **DSLR Standard**: 3:2
- **Medium Format**: 5:4
- **Classic Film**: 4:3
- **Portrait**: 2:3 or 3:4
### Displays
- **Desktop/Laptop**: 16:9
- **Ultrawide Monitor**: 21:9
- **Mobile Portrait**: 9:16
- **Tablet**: 4:3
## Choosing an Aspect Ratio
**For social media content:**
- Use 1:1 for maximum compatibility
- Use 9:16 for stories and vertical video
- Use 4:5 for Instagram portrait posts
**For professional photography:**
- Use 3:2 for DSLR standard
- Use 5:4 for medium format look
- Use 2:3 for traditional portrait
**For video/cinema:**
- Use 16:9 for standard widescreen
- Use 21:9 for cinematic ultra-wide
- Use 9:16 for vertical mobile video
**For presentations/displays:**
- Use 16:9 for modern displays
- Use 4:3 for traditional presentations

78
commands/list-models.md Normal file

@@ -0,0 +1,78 @@
# List Models Command
List available Gemini image generation models with descriptions.
## Usage
```bash
gemini-nano-banana-tool list-models
```
## Output
```
Available Gemini Image Generation Models:
• gemini-2.5-flash-image (default) - Fast, high-quality image generation
• gemini-3-pro-image-preview - Advanced model with higher quality and more features
```
## Model Details
### Flash Model (gemini-2.5-flash-image)
**Default Model**
- **Speed**: Fast (seconds per image)
- **Quality**: High quality
- **Resolution**: Fixed ~1024p
- **Cost**: $0.00003 per token (~$0.039 per image)
- **Reference Images**: Up to 3
- **Best For**: Quick iterations, cost-effective production, high-volume generation
### Pro Model (gemini-3-pro-image-preview)
**Advanced Model**
- **Speed**: Slower than Flash, in exchange for higher quality
- **Quality**: Maximum quality
- **Resolution**: Variable (1K/2K/4K)
- **Cost**: $0.00012 per token (~$0.134-0.24 per image)
- **Reference Images**: Up to 14
- **Best For**: Professional assets, high-quality requirements, complex scenes
## Usage with Generate Command
```bash
# Default Flash model
gemini-nano-banana-tool generate "Your prompt" -o output.png
# Pro model
gemini-nano-banana-tool generate "Your prompt" -o output.png \
-m gemini-3-pro-image-preview
# Pro model with 4K resolution
gemini-nano-banana-tool generate "Your prompt" -o output.png \
-m gemini-3-pro-image-preview -r 4K
```
## Cost Comparison
| Model | Typical Cost | Speed | Quality | Resolution |
|-------|-------------|-------|---------|------------|
| Flash | $0.039 | Fast | High | Fixed ~1024p |
| Pro 1K/2K | $0.134 | Medium | Higher | 1K-2K |
| Pro 4K | $0.24 | Slower | Maximum | Up to 4K |
## Choosing a Model
**Use Flash when:**
- Prototyping and testing
- High-volume generation
- Cost is a concern
- Speed matters
**Use Pro when:**
- Final production images
- Complex scenes with fine details
- Higher resolution needed
- Professional/commercial work

111
commands/promptgen.md Normal file

@@ -0,0 +1,111 @@
# Promptgen Command
Transform simple descriptions into detailed, optimized image generation prompts using Gemini 2.0 Flash.
## Usage
```bash
gemini-nano-banana-tool promptgen DESCRIPTION [OPTIONS]
```
## Arguments
- `DESCRIPTION` - Simple description to enhance (e.g., "wizard cat", "cyberpunk city")
## Options
- `-t, --template TEXT` - Apply specialized template (photography, character, scene, food, abstract, logo)
- `-s, --style TEXT` - Style hint (photorealistic, artistic, minimalist, etc.)
- `-o, --output PATH` - Save prompt to file instead of stdout
- `--json` - Output in JSON format
- `-v, --verbose` - Show analysis and reasoning
- `--list-templates` - Show all available templates
## Examples
### Basic Usage
```bash
# Generate detailed prompt from simple description
gemini-nano-banana-tool promptgen "wizard cat"
# With template for better results
gemini-nano-banana-tool promptgen "wizard cat" --template character
# Save to file for reuse
gemini-nano-banana-tool promptgen "sunset landscape" -o prompt.txt
```
### Pipeline: Generate Prompt → Create Image
```bash
# Single pipeline: description → optimized prompt → image
gemini-nano-banana-tool promptgen "wizard cat in magical library" | \
gemini-nano-banana-tool generate --stdin -o wizard-cat.png -a 16:9
```
### With Templates
```bash
# Photography template - technical camera details
gemini-nano-banana-tool promptgen "mountain landscape" --template photography
# Food template - plating and lighting
gemini-nano-banana-tool promptgen "pasta dish" --template food
# Scene template - foreground/midground/background
gemini-nano-banana-tool promptgen "cyberpunk city" --template scene
```
### JSON Output (for automation)
```bash
# Get structured output
gemini-nano-banana-tool promptgen "sunset" --json
# Output format:
# {
#   "original": "sunset",
#   "prompt": "A breathtaking golden hour sunset...",
#   "template": null,
#   "style": null,
#   "tokens_used": 156
# }
```
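In a script, the `--json` output can be parsed to pull out just the enhanced prompt. The sketch below assumes the field names shown in the output format above; `extract_prompt` is a hypothetical helper, and the sample string is the documented example with its truncated prompt left as-is.

```python
import json


def extract_prompt(raw_output: str) -> str:
    """Parse promptgen --json output and return the enhanced prompt text."""
    data = json.loads(raw_output)
    prompt = data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("promptgen output has no usable 'prompt' field")
    return prompt.strip()


# Sample taken from the documented output format (prompt is truncated there too).
SAMPLE_OUTPUT = """
{
  "original": "sunset",
  "prompt": "A breathtaking golden hour sunset...",
  "template": null,
  "style": null,
  "tokens_used": 156
}
"""

print(extract_prompt(SAMPLE_OUTPUT))
```

The returned string can then be passed to `generate` as the positional `PROMPT`.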
## Available Templates
- `photography` - Professional photography with technical details (aperture, focal length, lighting)
- `character` - Character design with pose, attire, expression
- `scene` - Scene composition with foreground/midground/background layers
- `food` - Food photography with plating, garnish, lighting
- `abstract` - Abstract art with shapes, colors, patterns
- `logo` - Logo design with typography, symbolism, brand identity
## Why Use Promptgen?
Creating effective image generation prompts requires knowledge of:
- Photography and composition terminology
- Lighting and technical details
- Artistic styles and techniques
- Color theory and palettes
Promptgen automates this expertise using AI, transforming simple ideas into detailed, effective prompts.
## Output Modes
- **Plain text** (default) - Ready for piping to generate command
- **JSON** (`--json`) - Structured output for scripts and automation
- **Verbose** (`-v`) - Shows AI analysis and reasoning process
## Cost
Promptgen uses Gemini 2.0 Flash for prompt generation:
- Cost: ~$0.001-0.003 per prompt optimization
- Much cheaper than trial-and-error generation
## Authentication
Requires `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable.
Get your API key: https://aistudio.google.com/app/apikey