---
name: image-gen
description: Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
triggers:
- "create image"
- "generate image"
- "make infographic"
- "create infographic"
- "generate diagram"
- "make diagram"
- "design visual"
- "create visual"
allowed-tools: Read, Write, Bash
version: 0.1.0
---
# Image Generation Skill
Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (gemini-3-pro-image-preview).
## Model Capabilities
**Nano Banana Pro** (released November 20, 2025):
- **Text rendering** - Accurate, legible text in images
- **Google Search grounding** - Real-time data (weather, stocks, etc.)
- **Multi-turn conversation** - Iterative refinement
- **Up to 14 reference images** - For composition and style transfer
- **Resolutions**: 1K, 2K, 4K
- **Aspect ratios**: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9
## Scripts
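All scripts are written in Python and run with `uv run`. Each script declares its dependencies inline (PEP 723 script metadata), so `uv` installs them on first run and no separate virtual environment is needed.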
### generate.py - Text to Image
```bash
uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]
```
**Examples:**
```bash
# Basic image
uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png
# Infographic at a specific aspect ratio and size
uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K
# 4K professional image
uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K
```
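For orientation, here is a minimal sketch of how a script like `generate.py` could be structured. It is an assumption, not the skill's actual source. The `# /// script` block is the PEP 723 metadata that `uv run` reads. The sketch assumes the `google-genai` SDK, in a release whose `types.ImageConfig` accepts `image_size`, and it assumes defaults of `1:1` and `1K` when the optional arguments are omitted. The SDK is imported lazily, so the argument handling also works offline.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["google-genai"]
# ///
"""Hypothetical sketch of scripts/generate.py (not the skill's actual source)."""
import sys
from pathlib import Path
from typing import List, Optional, Tuple

MODEL = "gemini-3-pro-image-preview"
# Values listed under "Model Capabilities"; the API may accept others.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "4:3", "16:9", "21:9"}
SIZES = {"1K", "2K", "4K"}


def parse_args(argv: List[str]) -> Tuple[str, Path, str, str]:
    """Map `prompt output.png [aspect_ratio] [size]` to values, with assumed defaults."""
    if not 2 <= len(argv) <= 4:
        raise SystemExit("usage: generate.py PROMPT OUTPUT [ASPECT_RATIO] [SIZE]")
    prompt, output = argv[0], Path(argv[1])
    aspect_ratio = argv[2] if len(argv) > 2 else "1:1"  # assumed default
    size = argv[3].upper() if len(argv) > 3 else "1K"  # assumed default
    if aspect_ratio not in ASPECT_RATIOS:
        raise SystemExit(f"unsupported aspect ratio: {aspect_ratio}")
    if size not in SIZES:
        raise SystemExit(f"unsupported size: {size}")
    return prompt, output, aspect_ratio, size


def generate(prompt: str, output: Path, aspect_ratio: str, size: str) -> None:
    """Call the model and write the first returned image to `output`."""
    from google import genai  # lazy import: only needed for the real API call
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"],
            image_config=types.ImageConfig(aspect_ratio=aspect_ratio, image_size=size),
        ),
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:  # image bytes arrive as inline data
            output.write_bytes(part.inline_data.data)
            return
    raise SystemExit("model returned no image")


def main(argv: Optional[List[str]] = None) -> None:
    generate(*parse_args(sys.argv[1:] if argv is None else argv))
```

Run as a real script, this file would end with an `if __name__ == "__main__": main()` guard. The guard is omitted here so the functions can be imported and checked without a network call.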
### edit.py - Image Editing
```bash
uv run scripts/edit.py input.png "edit instructions" output.png
```
**Examples:**
```bash
# Edit existing image
uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png
```
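Editing differs from generation mainly in the request payload: the input image is sent alongside the text instruction. The sketch below assumes the `google-genai` SDK, where `types.Part.from_bytes` builds an inline image part. The helper `image_part_args` and the `edit` function are hypothetical and are not taken from the skill's `edit.py`.

```python
import mimetypes
from pathlib import Path


def image_part_args(path: Path) -> dict:
    """Read an image file and return kwargs for types.Part.from_bytes (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    return {"data": path.read_bytes(), "mime_type": mime_type}


def edit(input_path: Path, instructions: str, output: Path) -> None:
    """Send the image plus edit instructions and save the first image part returned."""
    from google import genai  # lazy import: only needed for the real API call
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        # The image part and the instruction text travel in the same request.
        contents=[types.Part.from_bytes(**image_part_args(input_path)), instructions],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            output.write_bytes(part.inline_data.data)
            return
    raise RuntimeError("model returned no image")
```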
### compose.py - Multi-Image Composition
```bash
uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png
```
**Examples:**
```bash
# Combine styles from multiple images
uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png
```
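The 14-image cap listed under Limitations is cheap to enforce before any API call. The sketch below shows one way `compose.py` might parse and check `--refs` using `argparse`. The `build_parser` and `check_refs` helpers are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List

MAX_REFS = 14  # documented Nano Banana Pro limit on reference images


def build_parser() -> argparse.ArgumentParser:
    """CLI shaped like `compose.py "prompt" output.png --refs a.png b.png`."""
    parser = argparse.ArgumentParser(prog="compose.py")
    parser.add_argument("prompt")
    parser.add_argument("output", type=Path)
    parser.add_argument("--refs", nargs="+", type=Path, required=True)
    return parser


def check_refs(refs: List[Path]) -> List[Path]:
    """Reject too many references or missing files before spending an API call."""
    if len(refs) > MAX_REFS:
        raise ValueError(f"{len(refs)} reference images given; the limit is {MAX_REFS}")
    missing = [str(p) for p in refs if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing reference images: {', '.join(missing)}")
    return refs
```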
## Workflows
Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:
1. **Extract narrative** - Understand the complete story/concept
2. **Derive visual concept** - Single metaphor with 2-3 physical objects
3. **Apply aesthetic** - Define style, colors, mood
4. **Construct prompt** - Build detailed generation instructions
5. **Generate** - Execute via script
6. **Validate** - Check against criteria, regenerate if needed
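Steps 1 to 3 produce decisions that step 4 turns into a prompt. One way to keep those decisions explicit is a small record that assembles the final prompt. The `VisualBrief` class, its field names, and the prompt template are illustrative assumptions and are not part of the PAI process itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisualBrief:
    """Outputs of workflow steps 1-3, assembled into a prompt in step 4 (hypothetical)."""
    narrative: str        # step 1: the complete story or concept
    metaphor: str         # step 2: the single visual metaphor
    objects: List[str]    # step 2: the 2-3 physical objects that carry it
    aesthetic: str        # step 3: style, colors, mood
    extras: List[str] = field(default_factory=list)  # any further instructions

    def __post_init__(self) -> None:
        if not 2 <= len(self.objects) <= 3:
            raise ValueError("step 2 calls for 2-3 physical objects")

    def to_prompt(self) -> str:
        """Step 4: build the generation instructions, one line per decision."""
        lines = [
            f"Concept: {self.narrative}",
            f"Visual metaphor: {self.metaphor}, shown with {', '.join(self.objects)}",
            f"Style: {self.aesthetic}",
            *self.extras,
        ]
        return "\n".join(lines)
```

The resulting string is what step 5 passes to `scripts/generate.py` as its prompt argument.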
### Available Workflows
- **infographic.md** - Data visualization, statistics, explainers
- **diagram.md** - Technical diagrams, flowcharts, architecture
## Workflow Usage
When generating images, follow the appropriate workflow:
### For Infographics
```markdown
1. What data/concept needs visualization?
2. What's the key insight or takeaway?
3. Aspect ratio: 16:9 (landscape) recommended
4. Include: clear hierarchy, minimal text, supporting icons
5. Generate at 2K minimum for text clarity
```
### For Diagrams
```markdown
1. What system/process is being illustrated?
2. What are the key components and relationships?
3. Style: flat colors, clean lines, minimal detail
4. Generate at 2K for label clarity
```
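Both checklists pin down concrete parameters: 16:9 and at least 2K for infographics, and 2K for diagrams. The sketch below encodes these as defaults when building a `generate.py` command line. The `WORKFLOW_DEFAULTS` table and `generate_argv` helper are hypothetical. Because the diagram checklist names no aspect ratio and the script takes positional arguments, the sketch falls back to an assumed `1:1` in that case.

```python
from typing import List, Optional

SIZE_ORDER = ["1K", "2K", "4K"]

# Parameters from the checklists; None means the checklist sets no aspect ratio.
WORKFLOW_DEFAULTS = {
    "infographic": {"aspect_ratio": "16:9", "min_size": "2K"},
    "diagram": {"aspect_ratio": None, "min_size": "2K"},
}


def generate_argv(workflow: str, prompt: str, output: str,
                  aspect_ratio: Optional[str] = None,
                  size: Optional[str] = None) -> List[str]:
    """Build argv for scripts/generate.py, raising size to the workflow's minimum."""
    defaults = WORKFLOW_DEFAULTS[workflow]
    ratio = aspect_ratio or defaults["aspect_ratio"] or "1:1"  # "1:1" fallback is assumed
    min_size = defaults["min_size"]
    chosen = size or min_size
    if SIZE_ORDER.index(chosen) < SIZE_ORDER.index(min_size):
        chosen = min_size  # e.g. a 1K request is bumped to 2K for legible text
    return ["uv", "run", "scripts/generate.py", prompt, output, ratio, chosen]
```

The returned list can be passed directly to `subprocess.run`.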
## Environment Setup
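Requires the `GEMINI_API_KEY` environment variable, loaded from Geoffrey's secrets file. Wrapping `source` in `set -a`/`set +a` exports every variable the file defines, so child processes such as `uv run` can see the key even if the file omits `export`:
```bash
set -a  # auto-export variables defined while sourcing
source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
set +a
```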
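Without this check, a missing key only surfaces as an API error after the request is sent. A script can fail fast instead. The sketch below shows one way to do that. The `require_api_key` helper is hypothetical, and it accepts an optional mapping so that it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or exit with a clear message if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise SystemExit(
            "GEMINI_API_KEY is not set; export it (e.g. from Geoffrey's secrets .env) "
            "before running the script"
        )
    return key
```

Call it at the top of each script, before creating the API client.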
## Best Practices
### Infographics
- Use simple, direct prompts: "Infographic explaining how X works"
- The model adds relevant icons and logos on its own
- 16:9 aspect ratio works best
- Generate at 2K+ for readable text
### General
- Multi-turn refinement: generate, then ask for specific changes
- Reference images improve consistency
- Be specific about style, mood, lighting
- Every output automatically carries an invisible SynthID watermark (Google's provenance marker)
## Output Location
By default, save images to `/tmp/` or to a path the user specifies. For persistent storage, use:
```
~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/
```
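The rule above can be expressed in a single helper. Bare filenames go to `/tmp/`, explicit paths are respected, and persistent images go to the iCloud folder. The `resolve_output` function and its `persist` flag are hypothetical and are not options the scripts are known to accept.

```python
from pathlib import Path

TMP_DIR = Path("/tmp")
PERSISTENT_DIR = Path(
    "~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images"
).expanduser()


def resolve_output(name: str, persist: bool = False) -> Path:
    """Return where an image should be written; does not create any directories."""
    path = Path(name).expanduser()
    if persist:
        return PERSISTENT_DIR / path.name  # keep only the filename in persistent storage
    if Path(name).name == name:  # bare filename with no directory part
        return TMP_DIR / name
    return path  # the user gave an explicit location, including ./name
```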
## Limitations
- No photorealistic humans (safety filter)
- No copyrighted characters
- Maximum 14 reference images for composition
- 4K output is only available with Nano Banana Pro
## Pricing
| Size | Cost per Image |
|------|---------------|
| 1K | Free tier / $0.04 |
| 2K | $0.134 |
| 4K | $0.24 |
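For batch jobs, the table can be turned into a quick cost estimate. The sketch below copies the paid prices from the table and treats 1K as $0.04, which ignores the free tier. It uses `Decimal` so that sums of cents do not pick up floating-point error. The `estimate_cost` helper is hypothetical.

```python
from decimal import Decimal
from typing import Dict

# Per-image prices copied from the table above (paid tier; 1K taken as $0.04).
PRICE_PER_IMAGE = {"1K": Decimal("0.04"), "2K": Decimal("0.134"), "4K": Decimal("0.24")}


def estimate_cost(counts: Dict[str, int]) -> Decimal:
    """Total USD cost for a batch, given image counts per size."""
    unknown = set(counts) - set(PRICE_PER_IMAGE)
    if unknown:
        raise KeyError(f"no price for sizes: {sorted(unknown)}")
    return sum((PRICE_PER_IMAGE[size] * n for size, n in counts.items()), Decimal("0"))
```

For example, `estimate_cost({"2K": 10, "4K": 2})` returns `Decimal('1.820')`, i.e. $1.82 for ten 2K infographics plus two 4K images.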