Initial commit
skills/image-gen/SKILL.md (new file, 153 lines)
@@ -0,0 +1,153 @@
---
name: image-gen
description: Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
triggers:
- "create image"
- "generate image"
- "make infographic"
- "create infographic"
- "generate diagram"
- "make diagram"
- "design visual"
- "create visual"
allowed-tools: Read, Write, Bash
version: 0.1.0
---

# Image Generation Skill

Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (`gemini-3-pro-image-preview`).

## Model Capabilities

**Nano Banana Pro** (released November 20, 2025):
- **Text rendering** - Accurate, legible text in images
- **Google Search grounding** - Real-time data (weather, stocks, etc.)
- **Multi-turn conversation** - Iterative refinement
- **Up to 14 reference images** - For composition and style transfer
- **Resolutions**: 1K, 2K, 4K
- **Aspect ratios**: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9

## Scripts

All scripts use Python via `uv run` with inline dependencies.
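The inline dependencies are PEP 723 script metadata: `uv run` reads a comment block at the top of each script and resolves the listed packages into an ephemeral environment before execution. A minimal sketch of the header each script carries (the parsing below is for illustration only, not something the skill does):

```python
import re

# The PEP 723 header used by all three scripts; `uv run` installs
# these packages into a throwaway environment before running the file.
HEADER = '''\
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
'''

# Illustration only: extract the dependency names from the header.
match = re.search(r'dependencies = \[(.*)\]', HEADER)
dependencies = [d.strip().strip('"') for d in match.group(1).split(",")]
print(dependencies)
```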

### generate.py - Text to Image

```bash
uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]
```

**Examples:**

```bash
# Basic image
uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png

# Infographic with a specific aspect ratio
uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K

# 4K professional image
uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K
```

### edit.py - Image Editing

```bash
uv run scripts/edit.py input.png "edit instructions" output.png
```

**Examples:**

```bash
# Edit an existing image
uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png
```

### compose.py - Multi-Image Composition

```bash
uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png
```

**Examples:**

```bash
# Combine styles from multiple images
uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png
```
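
Each script prints a JSON status object on its final stdout line for programmatic use. A sketch of how a caller might consume it (`parse_result` is a hypothetical helper, not part of the skill):

```python
import json

def parse_result(stdout: str) -> dict:
    """Parse the JSON status object a script prints on its last line."""
    return json.loads(stdout.strip().splitlines()[-1])

# Example stdout captured from a generate.py run (illustrative values)
captured = 'Generating image...\n\nImage saved: out.png\n\n{"success": true, "output": "out.png"}'
result = parse_result(captured)
print(result["output"])
```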

## Workflows

Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:

1. **Extract narrative** - Understand the complete story/concept
2. **Derive visual concept** - Single metaphor with 2-3 physical objects
3. **Apply aesthetic** - Define style, colors, mood
4. **Construct prompt** - Build detailed generation instructions
5. **Generate** - Execute via script
6. **Validate** - Check against criteria, regenerate if needed

### Available Workflows

- **infographic.md** - Data visualization, statistics, explainers
- **diagram.md** - Technical diagrams, flowcharts, architecture

## Workflow Usage

When generating images, follow the appropriate workflow:

### For Infographics

```markdown
1. What data/concept needs visualization?
2. What's the key insight or takeaway?
3. Aspect ratio: 16:9 (landscape) recommended
4. Include: clear hierarchy, minimal text, supporting icons
5. Generate at 2K minimum for text clarity
```

### For Diagrams

```markdown
1. What system/process is being illustrated?
2. What are the key components and relationships?
3. Style: flat colors, clean lines, minimal detail
4. Generate at 2K for label clarity
```

## Environment Setup

Requires the `GEMINI_API_KEY` environment variable. Set it from Geoffrey's secrets:

```bash
source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
```

## Best Practices

### Infographics
- Use simple, direct prompts: "Infographic explaining how X works"
- The model auto-includes relevant icons/logos
- 16:9 aspect ratio works best
- Generate at 2K+ for readable text

### General
- Multi-turn refinement: generate, then ask for specific changes
- Reference images improve consistency
- Be specific about style, mood, and lighting
- The SynthID watermark is automatic (Google provenance)

## Output Location

By default, save images to `/tmp/` or a user-specified path. For persistent storage, use:

```
~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/
```

## Limitations

- No photorealistic humans (safety filter)
- No copyrighted characters
- Maximum of 14 reference images for composition
- 4K is only available with Nano Banana Pro

## Pricing

| Size | Cost per Image    |
|------|-------------------|
| 1K   | Free tier / $0.04 |
| 2K   | $0.134            |
| 4K   | $0.24             |
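
For budgeting, the per-image prices above translate directly into batch costs. A small sketch (paid prices assumed; the 1K free tier is not modeled):

```python
# Paid per-image prices from the table above, in USD.
COST_PER_IMAGE = {"1K": 0.04, "2K": 0.134, "4K": 0.24}

def batch_cost(size: str, count: int) -> float:
    """Total cost of generating `count` images at a given size."""
    return round(COST_PER_IMAGE[size] * count, 3)

print(batch_cost("2K", 10))
```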
skills/image-gen/scripts/compose.py (new file, 139 lines)
@@ -0,0 +1,139 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Compose images using multiple reference images with Google's Nano Banana Pro.

Usage:
    uv run compose.py "prompt" output.png --refs image1.png image2.png [...]

Arguments:
    prompt - Text description of desired composition
    output - Output file path (PNG)
    --refs - Flag followed by 1-14 reference images

Examples:
    uv run compose.py "Combine these styles into a cohesive logo" logo.png --refs style1.png style2.png
    uv run compose.py "Create a collage with these photos" collage.png --refs photo1.png photo2.png photo3.png
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 4 or "--refs" not in sys.argv:
        print('Usage: uv run compose.py "prompt" output.png --refs image1.png image2.png [...]')
        print("\nSupports up to 14 reference images.")
        sys.exit(1)

    prompt = sys.argv[1]
    output_path = sys.argv[2]

    # Parse reference images after the --refs flag
    refs_index = sys.argv.index("--refs")
    ref_paths = sys.argv[refs_index + 1:]

    if not ref_paths:
        print("Error: No reference images provided after --refs")
        sys.exit(1)

    if len(ref_paths) > 14:
        print(f"Error: Maximum 14 reference images supported, got {len(ref_paths)}")
        sys.exit(1)

    # Validate that all reference images exist
    for path in ref_paths:
        if not os.path.exists(path):
            print(f"Error: Reference image not found: {path}")
            sys.exit(1)

    # Initialize client
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Load reference images
    print(f"Loading {len(ref_paths)} reference images...")
    ref_images = []
    for path in ref_paths:
        img = Image.open(path)
        ref_images.append(img)
        print(f"  Loaded: {path}")

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"]
    )

    print("\nComposing image...")
    print(f"  Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")

    try:
        # Build content with all reference images and the prompt.
        # The google-genai SDK accepts PIL images directly in `contents`.
        content_parts = [*ref_images, prompt]

        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=content_parts,
            config=config
        )

        # Extract and save the composed image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nComposed image saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No composed image was generated")
            sys.exit(1)

        result = {
            "success": True,
            "output": output_path,
            "reference_count": len(ref_paths),
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError composing image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/scripts/edit.py (new file, 120 lines)
@@ -0,0 +1,120 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Edit existing images using Google's Nano Banana Pro.

Usage:
    uv run edit.py input.png "edit instructions" output.png

Arguments:
    input - Input image file path
    instructions - Text description of edits to make
    output - Output file path (PNG)

Examples:
    uv run edit.py photo.png "Change background to sunset" edited.png
    uv run edit.py logo.png "Make the text larger and blue" logo_v2.png
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 4:
        print('Usage: uv run edit.py input.png "edit instructions" output.png')
        sys.exit(1)

    input_path = sys.argv[1]
    instructions = sys.argv[2]
    output_path = sys.argv[3]

    # Validate that the input exists
    if not os.path.exists(input_path):
        print(f"Error: Input file not found: {input_path}")
        sys.exit(1)

    # Initialize client
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Load the input image
    print(f"Loading input image: {input_path}")
    input_image = Image.open(input_path)

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"]
    )

    print("Editing image...")
    print(f"  Instructions: {instructions[:100]}{'...' if len(instructions) > 100 else ''}")

    try:
        # The SDK accepts a PIL image directly alongside the text instructions
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=[
                input_image,
                f"Edit this image: {instructions}"
            ],
            config=config
        )

        # Extract and save the edited image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nEdited image saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No edited image was generated")
            sys.exit(1)

        result = {
            "success": True,
            "input": input_path,
            "output": output_path,
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError editing image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/scripts/generate.py (new file, 135 lines)
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image).

Usage:
    uv run generate.py "prompt" output.png [aspect_ratio] [size]

Arguments:
    prompt - Text description of the image to generate
    output - Output file path (PNG)
    aspect_ratio - Optional: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9 (default: 1:1)
    size - Optional: 1K, 2K, 4K (default: 2K)

Examples:
    uv run generate.py "A cozy coffee shop" coffee.png
    uv run generate.py "Infographic about AI" ai.png 16:9 2K
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 3:
        print('Usage: uv run generate.py "prompt" output.png [aspect_ratio] [size]')
        print("\nAspect ratios: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9")
        print("Sizes: 1K, 2K, 4K")
        sys.exit(1)

    prompt = sys.argv[1]
    output_path = sys.argv[2]
    aspect_ratio = sys.argv[3] if len(sys.argv) > 3 else "1:1"
    image_size = sys.argv[4] if len(sys.argv) > 4 else "2K"

    # Validate aspect ratio
    valid_ratios = ["1:1", "2:3", "3:2", "4:3", "16:9", "21:9"]
    if aspect_ratio not in valid_ratios:
        print(f"Invalid aspect ratio: {aspect_ratio}")
        print(f"Valid options: {', '.join(valid_ratios)}")
        sys.exit(1)

    # Validate size
    valid_sizes = ["1K", "2K", "4K"]
    if image_size not in valid_sizes:
        print(f"Invalid size: {image_size}")
        print(f"Valid options: {', '.join(valid_sizes)}")
        sys.exit(1)

    # Initialize client (uses GEMINI_API_KEY env var)
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config={
            "aspect_ratio": aspect_ratio,
            "image_size": image_size
        }
    )

    print("Generating image...")
    print(f"  Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
    print(f"  Aspect ratio: {aspect_ratio}")
    print(f"  Size: {image_size}")

    try:
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=[prompt],
            config=config
        )

        # Extract and save image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                # Ensure the output directory exists
                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nImage saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No image was generated")
            print("The model may have declined due to content policy.")
            sys.exit(1)

        # Output JSON for programmatic use
        result = {
            "success": True,
            "output": output_path,
            "aspect_ratio": aspect_ratio,
            "size": image_size,
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError generating image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/workflows/diagram.md (new file, 214 lines)
@@ -0,0 +1,214 @@
# Diagram Workflow

Create technical diagrams, flowcharts, architecture diagrams, and process visualizations.

## When to Use

- System architecture documentation
- Process flows and workflows
- Technical explanations
- Decision trees
- Network topologies
- Component relationships

## 6-Step Process

### Step 1: Extract Narrative

**Goal:** Understand the system or process being illustrated.

Questions to answer:
- What system/process is being shown?
- What are the key components?
- What are the relationships between components?
- What is the flow direction (if any)?
- What level of detail is needed?

**Output:** Component list and relationship description.

### Step 2: Derive Visual Concept

**Goal:** Choose the right diagram type.

**Diagram types:**

| Type | Use When |
|------|----------|
| Flowchart | Sequential processes with decisions |
| Architecture | System components and connections |
| Sequence | Time-ordered interactions |
| Network | Interconnected nodes |
| Hierarchy | Parent-child relationships |
| Venn | Overlapping categories |

**Output:** Diagram type and layout direction.

### Step 3: Apply Aesthetic

**Goal:** Define a visual style for clarity.

Recommended for diagrams:
- **Colors:** Limited palette (3-5 colors max)
- **Style:** Flat, clean, no gradients
- **Lines:** Consistent weight, clear arrows
- **Shapes:** Simple geometric forms (rectangles, circles)
- **Labels:** Sans-serif, high contrast

**Color coding conventions:**
- Blue: Primary components
- Green: Success/positive flow
- Red: Error/warning
- Orange: External systems
- Gray: Supporting elements

**Output:** Color scheme and style notes.

### Step 4: Construct Prompt

**Goal:** Build the generation prompt.

**Template:**
```
Create a [diagram type] showing [system/process].

Components:
- [Component 1]: [description]
- [Component 2]: [description]
- [Component 3]: [description]

Relationships:
- [Component 1] connects to [Component 2] via [connection type]
- [Component 2] sends data to [Component 3]

Layout: [direction - left-to-right, top-to-bottom, etc.]

Style: [aesthetic from Step 3]

Labels to include:
- [Label 1]
- [Label 2]
```

**Output:** Complete prompt.

### Step 5: Generate

**Command:**
```bash
uv run scripts/generate.py "[prompt]" output.png [aspect_ratio] 2K
```

**Aspect ratio by diagram type:**
- Flowcharts: 3:2 or 16:9 (horizontal flow)
- Architecture: 4:3 or 1:1 (balanced)
- Sequence: 2:3 (vertical flow)
- Network: 1:1 (balanced)

**Settings:**
- Size: **2K minimum** for label clarity
- Model: gemini-3-pro-image-preview

### Step 6: Validate

**Validation criteria:**

| Criterion | Check |
|-----------|-------|
| Completeness | All components present |
| Accuracy | Relationships correctly shown |
| Readability | All labels legible |
| Flow clarity | Direction is obvious |
| Consistency | Shapes/colors used consistently |
| Simplicity | No unnecessary elements |

**If validation fails:**
- Identify missing or incorrect elements
- Adjust the prompt
- Regenerate (max 3 iterations)

## Example Workflow

**Request:** Create a diagram showing a CI/CD pipeline.

### Step 1: Extract Narrative
"CI/CD pipeline with: code commit, build, test, deploy to staging, deploy to production. Shows automated flow with manual approval gates."

Components:
- Git repository
- Build server
- Test suite
- Staging environment
- Production environment
- Approval gates

### Step 2: Visual Concept
Flowchart, left-to-right horizontal flow. Linear pipeline with branching for approval.

### Step 3: Aesthetic
- Blue: Pipeline stages
- Green: Success indicators
- Orange: Approval gates
- Gray: Arrows/connectors
- Style: Flat rectangles with rounded corners, clear directional arrows

### Step 4: Prompt
```
Create a flowchart showing a CI/CD pipeline.

Components:
- Git Repository: Code source
- Build Server: Compiles code
- Test Suite: Runs automated tests
- Staging: Pre-production environment
- Production: Live environment
- Approval Gate: Manual review step

Flow:
- Git Repository -> Build Server -> Test Suite -> Staging -> Approval Gate -> Production

Layout: Horizontal left-to-right flow

Style: Flat design with rounded rectangles. Blue for pipeline stages, green checkmarks for success, orange for approval gate, gray arrows between stages.

Labels: "Code", "Build", "Test", "Stage", "Approve", "Deploy"
```

### Step 5: Generate
```bash
uv run scripts/generate.py "Create a flowchart showing a CI/CD pipeline..." cicd.png 16:9 2K
```

### Step 6: Validate
- All 6 stages present
- Flow direction clear
- Labels readable
- Approval gate distinguished

## Tips for Better Results

1. **Keep it simple** - Fewer components = clearer diagram
2. **Be explicit about connections** - State what connects to what
3. **Specify layout direction** - Avoid ambiguous layouts
4. **Use consistent terminology** - Same names throughout the prompt
5. **Include all labels** - List the exact text for each component

## Common Issues

| Issue | Solution |
|-------|----------|
| Missing components | List every component explicitly |
| Unclear flow | State direction and connections |
| Overlapping elements | Reduce components or use a larger aspect ratio |
| Inconsistent styling | Be more explicit about shapes/colors |
| Wrong diagram type | Reconsider which type fits best |

## Alternative: Mermaid Diagrams

For simple diagrams, consider generating Mermaid code instead:
- More precise control
- Version-controllable
- Easily editable

Use image generation when:
- Visual appeal matters
- The output is for marketing or presentations
- Complex custom styling is needed
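
As a sketch of the Mermaid alternative, the CI/CD pipeline example could be written as (node names are illustrative):

```mermaid
flowchart LR
    repo[Git Repository] --> build[Build Server]
    build --> test[Test Suite]
    test --> staging[Staging]
    staging --> gate{Approval Gate}
    gate -->|approved| prod[Production]
```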
skills/image-gen/workflows/infographic.md (new file, 176 lines)
@@ -0,0 +1,176 @@
# Infographic Workflow

Create data visualizations, explainers, and statistical infographics using the 6-step editorial process.

## When to Use

- Explaining concepts or processes
- Visualizing data or statistics
- Creating how-to guides
- Summarizing reports or research
- Making comparisons

## 6-Step Process

### Step 1: Extract Narrative

**Goal:** Understand the complete story being told.

Questions to answer:
- What is the main concept or data being explained?
- What is the key insight or takeaway?
- Who is the target audience?
- What action should viewers take?

**Output:** 2-3 sentence summary of the narrative.

### Step 2: Derive Visual Concept

**Goal:** Translate the narrative into a single visual metaphor.

Guidelines:
- Choose 2-3 physical objects that represent the concept
- Prefer familiar, universal metaphors
- Avoid abstract shapes without meaning
- Consider spatial relationships (hierarchy, flow, comparison)

**Examples:**
- Data growth → Plant/tree growing
- Security → Shield/lock
- Process → Pipeline/conveyor belt
- Comparison → Balance scale

**Output:** Visual metaphor description.

### Step 3: Apply Aesthetic

**Goal:** Define the visual style.

Recommended for infographics:
- **Colors:** Muted palette with 1-2 accent colors
- **Style:** Flat design, clean lines
- **Typography:** Sans-serif, clear hierarchy
- **Layout:** Clear sections, visual flow
- **Icons:** Simple, consistent style

**Output:** Style description (2-3 sentences).

### Step 4: Construct Prompt

**Goal:** Build the generation prompt.

**Template:**
```
Create an infographic explaining [topic].

Visual concept: [metaphor from Step 2]

Key elements:
- [Main data point or concept]
- [Supporting element 1]
- [Supporting element 2]

Style: [aesthetic from Step 3]

Layout: [horizontal/vertical], [sections description]

Text to include:
- Title: "[title]"
- Key stat: "[number or fact]"
- [Other text elements]
```

**Output:** Complete prompt.
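
To make the template concrete, here is a minimal sketch of filling it programmatically. The field names and values are illustrative, and only a subset of the template is shown:

```python
# Hypothetical filler for the Step 4 template (subset of fields).
TEMPLATE = """Create an infographic explaining {topic}.

Visual concept: {concept}

Style: {style}

Text to include:
- Title: "{title}"
"""

prompt = TEMPLATE.format(
    topic="how neural networks learn",
    concept="network of nodes with signals flowing through",
    style="flat design, dark blue background, cyan and orange accents",
    title="How Neural Networks Learn",
)
print(prompt)
```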

### Step 5: Generate

**Command:**
```bash
uv run scripts/generate.py "[prompt]" output.png 16:9 2K
```

**Settings for infographics:**
- Aspect ratio: **16:9** (landscape) - best for infographics
- Size: **2K minimum** - ensures text readability
- Model: gemini-3-pro-image-preview (Nano Banana Pro)

### Step 6: Validate

**Validation criteria:**

| Criterion | Check |
|-----------|-------|
| Text legibility | All text is readable at 100% zoom |
| Data accuracy | Numbers/facts are displayed correctly |
| Visual hierarchy | Eye naturally flows through content |
| Color contrast | Sufficient contrast for accessibility |
| Completeness | All key elements are present |
| Brand alignment | Matches intended style |

**If validation fails:**
- Identify specific issues
- Modify the prompt to address them
- Regenerate
- Maximum 3 iterations

## Example Workflow

**Request:** Create an infographic about how neural networks learn.

### Step 1: Extract Narrative
"Neural networks learn by adjusting connection weights through forward propagation and backpropagation. Key insight: the process is iterative and improves over time. Audience: technical beginners."

### Step 2: Visual Concept
"A network of interconnected nodes with signals flowing through, showing adjustment dials on connections. Like a city's road network with traffic lights being adjusted."

### Step 3: Aesthetic
"Flat design with dark blue background, bright connection lines in cyan and orange. Minimal, clean style with clear node shapes."

### Step 4: Prompt
```
Create an infographic explaining how neural networks learn.

Visual concept: Network of connected nodes with adjustment dials on connections, signals flowing through like traffic.

Key elements:
- Input layer with data entering
- Hidden layers with connection weights
- Output layer with result
- Feedback loop showing backpropagation

Style: Dark blue background, cyan and orange accents, flat design, clean minimalist style.

Layout: Horizontal flow from left (input) to right (output), with backpropagation arrow below.

Text to include:
- Title: "How Neural Networks Learn"
- Labels: "Input", "Hidden Layers", "Output", "Backpropagation"
```

### Step 5: Generate
```bash
uv run scripts/generate.py "Create an infographic explaining how neural networks learn..." neural_network.png 16:9 2K
```

### Step 6: Validate
- Text readable
- Flow is clear left-to-right
- Colors have good contrast
- All labels present

## Tips for Better Results

1. **Simple prompts often work best** - "Infographic explaining X" can produce excellent results
2. **The model understands context** - It will add relevant icons/imagery automatically
3. **Be specific about text** - Include the exact wording for titles and labels
4. **Iterate with conversation** - Ask for specific changes after the initial generation
5. **Use reference images** - For style consistency across multiple infographics

## Common Issues

| Issue | Solution |
|-------|----------|
| Text too small | Increase size to 4K or reduce the amount of text |
| Cluttered layout | Simplify to fewer elements |
| Wrong style | Be more explicit about the aesthetic |
| Missing elements | List all required elements explicitly |