---
name: image-gen
description: Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
triggers:
- "create image"
- "generate image"
- "make infographic"
- "create infographic"
- "generate diagram"
- "make diagram"
- "design visual"
- "create visual"
allowed-tools: Read, Write, Bash
version: 0.1.0
---

# Image Generation Skill

Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (`gemini-3-pro-image-preview`).

## Model Capabilities

**Nano Banana Pro** (released November 20, 2025):
- **Text rendering** - Accurate, legible text in images
- **Google Search grounding** - Real-time data (weather, stocks, etc.)
- **Multi-turn conversation** - Iterative refinement
- **Up to 14 reference images** - For composition and style transfer
- **Resolutions**: 1K, 2K, 4K
- **Aspect ratios**: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9

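
The limits listed above can be checked before any request is sent. The following is a minimal sketch, assuming the lists above are exhaustive; the constant and function names are hypothetical and are not part of the skill's scripts.

```python
# Values copied from the capability list above; all names are hypothetical.
SUPPORTED_SIZES = ("1K", "2K", "4K")
SUPPORTED_ASPECT_RATIOS = ("1:1", "2:3", "3:2", "4:3", "16:9", "21:9")
MAX_REFERENCE_IMAGES = 14


def check_options(aspect_ratio: str, size: str, num_refs: int = 0) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if size not in SUPPORTED_SIZES:
        problems.append(f"unsupported size: {size}")
    if not 0 <= num_refs <= MAX_REFERENCE_IMAGES:
        problems.append(f"reference image count out of range: {num_refs}")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every invalid option in one pass.
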
## Scripts

All scripts are written in Python and run via `uv run`, with their dependencies declared inline in each script.

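
"Inline dependencies" most likely refers to the inline script metadata format (PEP 723) that `uv run` reads from a comment block at the top of a script. The sketch below shows what such a header might look like and how its contents can be pulled out with plain string handling. The dependency name `google-genai` and the Python version bound are assumptions, since the actual script headers are not shown here, and `extract_inline_metadata` is a hypothetical helper.

```python
# The header is built from a list of strings so that this example file does
# not itself carry a live metadata block. In a real script these lines would
# appear verbatim at the top of the file.
SAMPLE_HEADER = "\n".join([
    "# /// script",
    '# requires-python = ">=3.10"',   # assumed version bound
    "# dependencies = [",
    '#     "google-genai",',           # assumed dependency
    "# ]",
    "# ///",
    'print("script body starts here")',
])


def extract_inline_metadata(source: str) -> str:
    """Return the TOML text between '# /// script' and '# ///', or ''."""
    inside = False
    collected = []
    for line in source.splitlines():
        if not inside:
            if line == "# /// script":
                inside = True
            continue
        if line == "# ///":
            return "\n".join(collected)
        # Strip the leading "# " (or a bare "#") comment prefix.
        collected.append(line[2:] if line.startswith("# ") else line[1:])
    return ""


print(extract_inline_metadata(SAMPLE_HEADER))
```
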
### generate.py - Text to Image
```bash
uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]
```

**Examples:**
```bash
# Basic image
uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png

# Infographic with specific aspect ratio
uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K

# 4K professional image
uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K
```

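
The usage line above implies two required positional arguments followed by two optional ones. A minimal sketch of how that command line could be parsed with `argparse` follows. It is not the actual `generate.py`, and the defaults (`1:1` and `1K`) are assumptions, since the script's real defaults are not stated.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring: generate.py "prompt" output.png [aspect_ratio] [size]."""
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("prompt", help="text description of the image")
    parser.add_argument("output", help="path of the image file to write")
    parser.add_argument(
        "aspect_ratio",
        nargs="?",
        default="1:1",  # assumed default
        choices=["1:1", "2:3", "3:2", "4:3", "16:9", "21:9"],
    )
    parser.add_argument(
        "size",
        nargs="?",
        default="1K",  # assumed default
        choices=["1K", "2K", "4K"],
    )
    return parser


args = build_parser().parse_args(["A cozy coffee shop in autumn", "coffee.png"])
print(args.aspect_ratio, args.size)
```
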
### edit.py - Image Editing
```bash
uv run scripts/edit.py input.png "edit instructions" output.png
```

**Examples:**
```bash
# Edit existing image
uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png
```

### compose.py - Multi-Image Composition
```bash
uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png
```

**Examples:**
```bash
# Combine styles from multiple images
uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png
```

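
The `--refs` flag takes a variable number of image paths, and the model accepts at most 14 reference images. The sketch below shows one way to parse and bound that list. It assumes `--refs` is required and is not the actual `compose.py`; `parse_compose_args` is a hypothetical name.

```python
import argparse

MAX_REFS = 14  # documented limit on reference images


def parse_compose_args(argv: list[str]) -> argparse.Namespace:
    """Parse: compose.py "prompt" output.png --refs image1.png image2.png ..."""
    parser = argparse.ArgumentParser(prog="compose.py")
    parser.add_argument("prompt")
    parser.add_argument("output")
    # nargs="+" collects one or more paths into a list.
    parser.add_argument("--refs", nargs="+", required=True, metavar="IMAGE")
    args = parser.parse_args(argv)
    if len(args.refs) > MAX_REFS:
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"at most {MAX_REFS} reference images are supported")
    return args


args = parse_compose_args(
    ["Combine these styles into a logo", "logo.png", "--refs", "style1.png", "style2.png"]
)
print(args.refs)
```
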
## Workflows

Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:

1. **Extract narrative** - Understand the complete story/concept
2. **Derive visual concept** - Single metaphor with 2-3 physical objects
3. **Apply aesthetic** - Define style, colors, mood
4. **Construct prompt** - Build detailed generation instructions
5. **Generate** - Execute via script
6. **Validate** - Check against criteria, regenerate if needed

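
Steps 5 and 6 above form a generate, validate, regenerate loop. A minimal sketch of that loop is shown here, with the generation and validation steps passed in as callables so the control flow can be read on its own. The function name, the attempt cap of 3, and the shape of the validator's return value are all assumptions.

```python
from typing import Callable


def generate_with_validation(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_attempts: int = 3,  # assumed cap; the workflows do not state one
) -> tuple[str, int]:
    """Regenerate until validate() reports no problems.

    generate(prompt) returns the path of the produced image;
    validate(path) returns a list of problems (empty means accepted).
    Returns the accepted path and the number of attempts used.
    """
    problems: list[str] = []
    for attempt in range(1, max_attempts + 1):
        image_path = generate(prompt)
        problems = validate(image_path)
        if not problems:
            return image_path, attempt
    raise RuntimeError(
        f"no acceptable image after {max_attempts} attempts: {problems}"
    )
```

In practice `generate` would wrap a call to `scripts/generate.py`, and `validate` would encode the criteria of the chosen workflow.
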
### Available Workflows

- **infographic.md** - Data visualization, statistics, explainers
- **diagram.md** - Technical diagrams, flowcharts, architecture

## Workflow Usage

When generating images, follow the appropriate workflow:

### For Infographics
```markdown
1. What data/concept needs visualization?
2. What's the key insight or takeaway?
3. Aspect ratio: 16:9 (landscape) recommended
4. Include: clear hierarchy, minimal text, supporting icons
5. Generate at 2K minimum for text clarity
```

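
The infographic checklist above can be captured as a small data structure that produces both the prompt and the command line. This is an illustrative sketch only: `InfographicBrief` and its methods are hypothetical, and the prompt wording is one possible phrasing, not a template the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class InfographicBrief:
    """Answers to the checklist, with the recommended defaults filled in."""

    topic: str                  # what data/concept needs visualization
    key_insight: str            # the main takeaway
    aspect_ratio: str = "16:9"  # landscape, as recommended
    size: str = "2K"            # minimum suggested for legible text

    def to_prompt(self) -> str:
        return (
            f"Infographic explaining {self.topic}. "
            f"Key takeaway: {self.key_insight}. "
            "Use a clear visual hierarchy, minimal text, and supporting icons."
        )

    def to_command(self, output: str) -> list[str]:
        """Argument list for the documented generate.py invocation."""
        return [
            "uv", "run", "scripts/generate.py",
            self.to_prompt(), output, self.aspect_ratio, self.size,
        ]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems with long prompts.
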
### For Diagrams
```markdown
1. What system/process is being illustrated?
2. What are the key components and relationships?
3. Style: flat colors, clean lines, minimal detail
4. Generate at 2K for label clarity
```

## Environment Setup

The scripts require the `GEMINI_API_KEY` environment variable. Load it from Geoffrey's secrets file:

```bash
source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
```

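
A script can fail early with a clear message when the key is missing, instead of surfacing an authentication error from the API later. The following is a minimal sketch; `require_api_key` is a hypothetical helper, and the optional `env` parameter exists only to make it testable.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise with a hint on how to set it."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; source the secrets .env file first"
        )
    return key
```
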
## Best Practices

### Infographics
- Use simple, direct prompts: "Infographic explaining how X works"
- The model adds relevant icons and logos automatically
- 16:9 aspect ratio works best
- Generate at 2K or higher for readable text

### General
- Multi-turn refinement: generate, then ask for specific changes
- Reference images improve consistency
- Be specific about style, mood, and lighting
- A SynthID watermark (Google's provenance marker) is applied automatically

## Output Location

By default, save images to `/tmp/` or to a user-specified path. For persistent storage, use:
```
~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/
```

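
The rule above can be expressed as a small path resolver. This is a sketch under one stated assumption: a bare filename goes to `/tmp/` (or the persistent folder on request), while anything that already contains a directory is treated as user-specified and left alone. `resolve_output_path` is a hypothetical helper.

```python
from pathlib import Path

TEMP_DIR = Path("/tmp")
PERSISTENT_DIR = (
    Path.home() / "Library" / "Mobile Documents"
    / "com~apple~CloudDocs" / "Geoffrey" / "images"
)


def resolve_output_path(filename: str, persistent: bool = False) -> Path:
    """Map an output argument to the location where it should be saved."""
    path = Path(filename).expanduser()
    # A path with its own directory component is user-specified: keep it.
    if path.is_absolute() or path.parent != Path("."):
        return path
    return (PERSISTENT_DIR if persistent else TEMP_DIR) / path.name
```
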
## Limitations

- No photorealistic humans (safety filter)
- No copyrighted characters
- Maximum 14 reference images for composition
- 4K only available with Nano Banana Pro

## Pricing

| Size | Cost per Image |
|------|---------------|
| 1K | Free tier / $0.04 |
| 2K | $0.134 |
| 4K | $0.24 |
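
The per-image prices in the table can be turned into a quick batch estimate. The sketch below uses `Decimal` to avoid floating-point rounding in currency sums. It assumes the paid rate for 1K images (the free tier is ignored), and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices in USD, copied from the table above (paid rate assumed for 1K).
PRICE_PER_IMAGE_USD = {
    "1K": Decimal("0.04"),
    "2K": Decimal("0.134"),
    "4K": Decimal("0.24"),
}


def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Total cost in USD for a mapping of size -> number of images."""
    return sum(
        (PRICE_PER_IMAGE_USD[size] * n for size, n in counts.items()),
        Decimal("0"),
    )


# Ten 2K infographics plus two 4K images: 1.34 + 0.48
print(estimate_cost({"2K": 10, "4K": 2}))
```
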