Initial commit
skills/image-gen/SKILL.md (new file, 153 lines)
@@ -0,0 +1,153 @@
---
name: image-gen
description: Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
triggers:
- "create image"
- "generate image"
- "make infographic"
- "create infographic"
- "generate diagram"
- "make diagram"
- "design visual"
- "create visual"
allowed-tools: Read, Write, Bash
version: 0.1.0
---

# Image Generation Skill

Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (`gemini-3-pro-image-preview`).

## Model Capabilities

**Nano Banana Pro** (released November 20, 2025):
- **Text rendering** - Accurate, legible text in images
- **Google Search grounding** - Real-time data (weather, stocks, etc.)
- **Multi-turn conversation** - Iterative refinement
- **Up to 14 reference images** - For composition and style transfer
- **Resolutions**: 1K, 2K, 4K
- **Aspect ratios**: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9

## Scripts

All scripts use Python via `uv run` with inline dependencies.
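The inline dependencies are PEP 723 script metadata: `uv run` reads a comment block at the top of each script and resolves the listed packages into an ephemeral environment before execution. A minimal sketch of the header each script carries (the parsing below is for illustration only, not something the skill does):

```python
import re

# The PEP 723 header used by all three scripts; `uv run` installs
# these packages into a throwaway environment before running the file.
HEADER = '''\
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
'''

# Illustration only: extract the dependency names from the header.
match = re.search(r'dependencies = \[(.*)\]', HEADER)
dependencies = [d.strip().strip('"') for d in match.group(1).split(",")]
print(dependencies)
```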

### generate.py - Text to Image

```bash
uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]
```

**Examples:**

```bash
# Basic image
uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png

# Infographic with a specific aspect ratio
uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K

# 4K professional image
uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K
```

### edit.py - Image Editing

```bash
uv run scripts/edit.py input.png "edit instructions" output.png
```

**Examples:**

```bash
# Edit an existing image
uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png
```

### compose.py - Multi-Image Composition

```bash
uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png
```

**Examples:**

```bash
# Combine styles from multiple images
uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png
```
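
Each script prints a JSON status object on its final stdout line for programmatic use. A sketch of how a caller might consume it (`parse_result` is a hypothetical helper, not part of the skill):

```python
import json

def parse_result(stdout: str) -> dict:
    """Parse the JSON status object a script prints on its last line."""
    return json.loads(stdout.strip().splitlines()[-1])

# Example stdout captured from a generate.py run (illustrative values)
captured = 'Generating image...\n\nImage saved: out.png\n\n{"success": true, "output": "out.png"}'
result = parse_result(captured)
print(result["output"])
```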

## Workflows

Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:

1. **Extract narrative** - Understand the complete story/concept
2. **Derive visual concept** - Single metaphor with 2-3 physical objects
3. **Apply aesthetic** - Define style, colors, mood
4. **Construct prompt** - Build detailed generation instructions
5. **Generate** - Execute via script
6. **Validate** - Check against criteria, regenerate if needed

### Available Workflows

- **infographic.md** - Data visualization, statistics, explainers
- **diagram.md** - Technical diagrams, flowcharts, architecture

## Workflow Usage

When generating images, follow the appropriate workflow:

### For Infographics

```markdown
1. What data/concept needs visualization?
2. What's the key insight or takeaway?
3. Aspect ratio: 16:9 (landscape) recommended
4. Include: clear hierarchy, minimal text, supporting icons
5. Generate at 2K minimum for text clarity
```

### For Diagrams

```markdown
1. What system/process is being illustrated?
2. What are the key components and relationships?
3. Style: flat colors, clean lines, minimal detail
4. Generate at 2K for label clarity
```

## Environment Setup

Requires the `GEMINI_API_KEY` environment variable. Set it from Geoffrey's secrets:

```bash
source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
```

## Best Practices

### Infographics
- Use simple, direct prompts: "Infographic explaining how X works"
- The model auto-includes relevant icons/logos
- 16:9 aspect ratio works best
- Generate at 2K+ for readable text

### General
- Multi-turn refinement: generate, then ask for specific changes
- Reference images improve consistency
- Be specific about style, mood, and lighting
- The SynthID watermark is automatic (Google provenance)

## Output Location

By default, save images to `/tmp/` or a user-specified path. For persistent storage, use:

```
~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/
```

## Limitations

- No photorealistic humans (safety filter)
- No copyrighted characters
- Maximum of 14 reference images for composition
- 4K is only available with Nano Banana Pro

## Pricing

| Size | Cost per Image    |
|------|-------------------|
| 1K   | Free tier / $0.04 |
| 2K   | $0.134            |
| 4K   | $0.24             |
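
For budgeting, the per-image prices above translate directly into batch costs. A small sketch (paid prices assumed; the 1K free tier is not modeled):

```python
# Paid per-image prices from the table above, in USD.
COST_PER_IMAGE = {"1K": 0.04, "2K": 0.134, "4K": 0.24}

def batch_cost(size: str, count: int) -> float:
    """Total cost of generating `count` images at a given size."""
    return round(COST_PER_IMAGE[size] * count, 3)

print(batch_cost("2K", 10))
```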
skills/image-gen/scripts/compose.py (new file, 139 lines)
@@ -0,0 +1,139 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Compose images using multiple reference images with Google's Nano Banana Pro.

Usage:
    uv run compose.py "prompt" output.png --refs image1.png image2.png [...]

Arguments:
    prompt - Text description of desired composition
    output - Output file path (PNG)
    --refs - Flag followed by 1-14 reference images

Examples:
    uv run compose.py "Combine these styles into a cohesive logo" logo.png --refs style1.png style2.png
    uv run compose.py "Create a collage with these photos" collage.png --refs photo1.png photo2.png photo3.png
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 4 or "--refs" not in sys.argv:
        print('Usage: uv run compose.py "prompt" output.png --refs image1.png image2.png [...]')
        print("\nSupports up to 14 reference images.")
        sys.exit(1)

    prompt = sys.argv[1]
    output_path = sys.argv[2]

    # Parse reference images after the --refs flag
    refs_index = sys.argv.index("--refs")
    ref_paths = sys.argv[refs_index + 1:]

    if not ref_paths:
        print("Error: No reference images provided after --refs")
        sys.exit(1)

    if len(ref_paths) > 14:
        print(f"Error: Maximum 14 reference images supported, got {len(ref_paths)}")
        sys.exit(1)

    # Validate that all reference images exist
    for path in ref_paths:
        if not os.path.exists(path):
            print(f"Error: Reference image not found: {path}")
            sys.exit(1)

    # Initialize client
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Load reference images
    print(f"Loading {len(ref_paths)} reference images...")
    ref_images = []
    for path in ref_paths:
        img = Image.open(path)
        ref_images.append(img)
        print(f"  Loaded: {path}")

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"]
    )

    print("\nComposing image...")
    print(f"  Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")

    try:
        # Build content with all reference images and the prompt.
        # The google-genai SDK accepts PIL images directly in `contents`.
        content_parts = [*ref_images, prompt]

        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=content_parts,
            config=config
        )

        # Extract and save the composed image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nComposed image saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No composed image was generated")
            sys.exit(1)

        result = {
            "success": True,
            "output": output_path,
            "reference_count": len(ref_paths),
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError composing image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/scripts/edit.py (new file, 120 lines)
@@ -0,0 +1,120 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Edit existing images using Google's Nano Banana Pro.

Usage:
    uv run edit.py input.png "edit instructions" output.png

Arguments:
    input - Input image file path
    instructions - Text description of edits to make
    output - Output file path (PNG)

Examples:
    uv run edit.py photo.png "Change background to sunset" edited.png
    uv run edit.py logo.png "Make the text larger and blue" logo_v2.png
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 4:
        print('Usage: uv run edit.py input.png "edit instructions" output.png')
        sys.exit(1)

    input_path = sys.argv[1]
    instructions = sys.argv[2]
    output_path = sys.argv[3]

    # Validate that the input exists
    if not os.path.exists(input_path):
        print(f"Error: Input file not found: {input_path}")
        sys.exit(1)

    # Initialize client
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Load the input image
    print(f"Loading input image: {input_path}")
    input_image = Image.open(input_path)

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"]
    )

    print("Editing image...")
    print(f"  Instructions: {instructions[:100]}{'...' if len(instructions) > 100 else ''}")

    try:
        # The SDK accepts a PIL image directly alongside the text instructions
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=[
                input_image,
                f"Edit this image: {instructions}"
            ],
            config=config
        )

        # Extract and save the edited image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nEdited image saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No edited image was generated")
            sys.exit(1)

        result = {
            "success": True,
            "input": input_path,
            "output": output_path,
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError editing image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/scripts/generate.py (new file, 135 lines)
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
# /// script
# dependencies = ["google-genai", "pillow", "python-dotenv"]
# ///
"""
Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image).

Usage:
    uv run generate.py "prompt" output.png [aspect_ratio] [size]

Arguments:
    prompt - Text description of the image to generate
    output - Output file path (PNG)
    aspect_ratio - Optional: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9 (default: 1:1)
    size - Optional: 1K, 2K, 4K (default: 2K)

Examples:
    uv run generate.py "A cozy coffee shop" coffee.png
    uv run generate.py "Infographic about AI" ai.png 16:9 2K
"""

import sys
import os
import json
from io import BytesIO
from pathlib import Path

from dotenv import load_dotenv
from google import genai
from google.genai.types import GenerateContentConfig
from PIL import Image

# Load API key from Geoffrey secrets
SECRETS_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"
if SECRETS_PATH.exists():
    load_dotenv(SECRETS_PATH)


def main():
    if len(sys.argv) < 3:
        print('Usage: uv run generate.py "prompt" output.png [aspect_ratio] [size]')
        print("\nAspect ratios: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9")
        print("Sizes: 1K, 2K, 4K")
        sys.exit(1)

    prompt = sys.argv[1]
    output_path = sys.argv[2]
    aspect_ratio = sys.argv[3] if len(sys.argv) > 3 else "1:1"
    image_size = sys.argv[4] if len(sys.argv) > 4 else "2K"

    # Validate aspect ratio
    valid_ratios = ["1:1", "2:3", "3:2", "4:3", "16:9", "21:9"]
    if aspect_ratio not in valid_ratios:
        print(f"Invalid aspect ratio: {aspect_ratio}")
        print(f"Valid options: {', '.join(valid_ratios)}")
        sys.exit(1)

    # Validate size
    valid_sizes = ["1K", "2K", "4K"]
    if image_size not in valid_sizes:
        print(f"Invalid size: {image_size}")
        print(f"Valid options: {', '.join(valid_sizes)}")
        sys.exit(1)

    # Initialize client (uses GEMINI_API_KEY env var)
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        print("Error: GEMINI_API_KEY environment variable not set")
        sys.exit(1)

    client = genai.Client(api_key=api_key)

    # Configure generation
    config = GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config={
            "aspect_ratio": aspect_ratio,
            "image_size": image_size
        }
    )

    print("Generating image...")
    print(f"  Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
    print(f"  Aspect ratio: {aspect_ratio}")
    print(f"  Size: {image_size}")

    try:
        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=[prompt],
            config=config
        )

        # Extract and save image
        saved = False
        text_response = ""

        for part in response.candidates[0].content.parts:
            if hasattr(part, 'inline_data') and part.inline_data:
                # Decode the returned image bytes with Pillow
                image = Image.open(BytesIO(part.inline_data.data))

                # Ensure the output directory exists
                output_file = Path(output_path)
                output_file.parent.mkdir(parents=True, exist_ok=True)

                image.save(output_path)
                saved = True
                print(f"\nImage saved: {output_path}")
            elif hasattr(part, 'text') and part.text:
                text_response = part.text

        if text_response:
            print(f"\nModel response: {text_response}")

        if not saved:
            print("\nError: No image was generated")
            print("The model may have declined due to content policy.")
            sys.exit(1)

        # Output JSON for programmatic use
        result = {
            "success": True,
            "output": output_path,
            "aspect_ratio": aspect_ratio,
            "size": image_size,
            "text_response": text_response
        }
        print(f"\n{json.dumps(result)}")

    except Exception as e:
        print(f"\nError generating image: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
skills/image-gen/workflows/diagram.md (new file, 214 lines)
@@ -0,0 +1,214 @@
# Diagram Workflow

Create technical diagrams, flowcharts, architecture diagrams, and process visualizations.

## When to Use

- System architecture documentation
- Process flows and workflows
- Technical explanations
- Decision trees
- Network topologies
- Component relationships

## 6-Step Process

### Step 1: Extract Narrative

**Goal:** Understand the system or process being illustrated.

Questions to answer:
- What system/process is being shown?
- What are the key components?
- What are the relationships between components?
- What is the flow direction (if any)?
- What level of detail is needed?

**Output:** Component list and relationship description.

### Step 2: Derive Visual Concept

**Goal:** Choose the right diagram type.

**Diagram types:**

| Type | Use When |
|------|----------|
| Flowchart | Sequential processes with decisions |
| Architecture | System components and connections |
| Sequence | Time-ordered interactions |
| Network | Interconnected nodes |
| Hierarchy | Parent-child relationships |
| Venn | Overlapping categories |

**Output:** Diagram type and layout direction.

### Step 3: Apply Aesthetic

**Goal:** Define a visual style for clarity.

Recommended for diagrams:
- **Colors:** Limited palette (3-5 colors max)
- **Style:** Flat, clean, no gradients
- **Lines:** Consistent weight, clear arrows
- **Shapes:** Simple geometric forms (rectangles, circles)
- **Labels:** Sans-serif, high contrast

**Color coding conventions:**
- Blue: Primary components
- Green: Success/positive flow
- Red: Error/warning
- Orange: External systems
- Gray: Supporting elements

**Output:** Color scheme and style notes.

### Step 4: Construct Prompt

**Goal:** Build the generation prompt.

**Template:**
```
Create a [diagram type] showing [system/process].

Components:
- [Component 1]: [description]
- [Component 2]: [description]
- [Component 3]: [description]

Relationships:
- [Component 1] connects to [Component 2] via [connection type]
- [Component 2] sends data to [Component 3]

Layout: [direction - left-to-right, top-to-bottom, etc.]

Style: [aesthetic from Step 3]

Labels to include:
- [Label 1]
- [Label 2]
```

**Output:** Complete prompt.

### Step 5: Generate

**Command:**
```bash
uv run scripts/generate.py "[prompt]" output.png [aspect_ratio] 2K
```

**Aspect ratio by diagram type:**
- Flowcharts: 3:2 or 16:9 (horizontal flow)
- Architecture: 4:3 or 1:1 (balanced)
- Sequence: 2:3 (vertical flow)
- Network: 1:1 (balanced)

**Settings:**
- Size: **2K minimum** for label clarity
- Model: gemini-3-pro-image-preview

### Step 6: Validate

**Validation criteria:**

| Criterion | Check |
|-----------|-------|
| Completeness | All components present |
| Accuracy | Relationships correctly shown |
| Readability | All labels legible |
| Flow clarity | Direction is obvious |
| Consistency | Shapes/colors used consistently |
| Simplicity | No unnecessary elements |

**If validation fails:**
- Identify missing or incorrect elements
- Adjust the prompt
- Regenerate (max 3 iterations)

## Example Workflow

**Request:** Create a diagram showing a CI/CD pipeline.

### Step 1: Extract Narrative
"CI/CD pipeline with: code commit, build, test, deploy to staging, deploy to production. Shows automated flow with manual approval gates."

Components:
- Git repository
- Build server
- Test suite
- Staging environment
- Production environment
- Approval gates

### Step 2: Visual Concept
Flowchart, left-to-right horizontal flow. Linear pipeline with branching for approval.

### Step 3: Aesthetic
- Blue: Pipeline stages
- Green: Success indicators
- Orange: Approval gates
- Gray: Arrows/connectors
- Style: Flat rectangles with rounded corners, clear directional arrows

### Step 4: Prompt
```
Create a flowchart showing a CI/CD pipeline.

Components:
- Git Repository: Code source
- Build Server: Compiles code
- Test Suite: Runs automated tests
- Staging: Pre-production environment
- Production: Live environment
- Approval Gate: Manual review step

Flow:
- Git Repository -> Build Server -> Test Suite -> Staging -> Approval Gate -> Production

Layout: Horizontal left-to-right flow

Style: Flat design with rounded rectangles. Blue for pipeline stages, green checkmarks for success, orange for approval gate, gray arrows between stages.

Labels: "Code", "Build", "Test", "Stage", "Approve", "Deploy"
```

### Step 5: Generate
```bash
uv run scripts/generate.py "Create a flowchart showing a CI/CD pipeline..." cicd.png 16:9 2K
```

### Step 6: Validate
- All 6 stages present
- Flow direction clear
- Labels readable
- Approval gate distinguished

## Tips for Better Results

1. **Keep it simple** - Fewer components = clearer diagram
2. **Be explicit about connections** - State what connects to what
3. **Specify layout direction** - Avoid ambiguous layouts
4. **Use consistent terminology** - Same names throughout the prompt
5. **Include all labels** - List the exact text for each component

## Common Issues

| Issue | Solution |
|-------|----------|
| Missing components | List every component explicitly |
| Unclear flow | State direction and connections |
| Overlapping elements | Reduce components or use a larger aspect ratio |
| Inconsistent styling | Be more explicit about shapes/colors |
| Wrong diagram type | Reconsider which type fits best |

## Alternative: Mermaid Diagrams

For simple diagrams, consider generating Mermaid code instead:
- More precise control
- Version-controllable
- Easily editable

Use image generation when:
- Visual appeal matters
- The output is for marketing or presentations
- Complex custom styling is needed
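
As a sketch of the Mermaid alternative, the CI/CD pipeline example could be written as (node names are illustrative):

```mermaid
flowchart LR
    repo[Git Repository] --> build[Build Server]
    build --> test[Test Suite]
    test --> staging[Staging]
    staging --> gate{Approval Gate}
    gate -->|approved| prod[Production]
```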
skills/image-gen/workflows/infographic.md (new file, 176 lines)
@@ -0,0 +1,176 @@
# Infographic Workflow

Create data visualizations, explainers, and statistical infographics using the 6-step editorial process.

## When to Use

- Explaining concepts or processes
- Visualizing data or statistics
- Creating how-to guides
- Summarizing reports or research
- Making comparisons

## 6-Step Process

### Step 1: Extract Narrative

**Goal:** Understand the complete story being told.

Questions to answer:
- What is the main concept or data being explained?
- What is the key insight or takeaway?
- Who is the target audience?
- What action should viewers take?

**Output:** 2-3 sentence summary of the narrative.

### Step 2: Derive Visual Concept

**Goal:** Translate the narrative into a single visual metaphor.

Guidelines:
- Choose 2-3 physical objects that represent the concept
- Prefer familiar, universal metaphors
- Avoid abstract shapes without meaning
- Consider spatial relationships (hierarchy, flow, comparison)

**Examples:**
- Data growth → Plant/tree growing
- Security → Shield/lock
- Process → Pipeline/conveyor belt
- Comparison → Balance scale

**Output:** Visual metaphor description.

### Step 3: Apply Aesthetic

**Goal:** Define the visual style.

Recommended for infographics:
- **Colors:** Muted palette with 1-2 accent colors
- **Style:** Flat design, clean lines
- **Typography:** Sans-serif, clear hierarchy
- **Layout:** Clear sections, visual flow
- **Icons:** Simple, consistent style

**Output:** Style description (2-3 sentences).

### Step 4: Construct Prompt

**Goal:** Build the generation prompt.

**Template:**
```
Create an infographic explaining [topic].

Visual concept: [metaphor from Step 2]

Key elements:
- [Main data point or concept]
- [Supporting element 1]
- [Supporting element 2]

Style: [aesthetic from Step 3]

Layout: [horizontal/vertical], [sections description]

Text to include:
- Title: "[title]"
- Key stat: "[number or fact]"
- [Other text elements]
```

**Output:** Complete prompt.
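
To make the template concrete, here is a minimal sketch of filling it programmatically. The field names and values are illustrative, and only a subset of the template is shown:

```python
# Hypothetical filler for the Step 4 template (subset of fields).
TEMPLATE = """Create an infographic explaining {topic}.

Visual concept: {concept}

Style: {style}

Text to include:
- Title: "{title}"
"""

prompt = TEMPLATE.format(
    topic="how neural networks learn",
    concept="network of nodes with signals flowing through",
    style="flat design, dark blue background, cyan and orange accents",
    title="How Neural Networks Learn",
)
print(prompt)
```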

### Step 5: Generate

**Command:**
```bash
uv run scripts/generate.py "[prompt]" output.png 16:9 2K
```

**Settings for infographics:**
- Aspect ratio: **16:9** (landscape) - best for infographics
- Size: **2K minimum** - ensures text readability
- Model: gemini-3-pro-image-preview (Nano Banana Pro)

### Step 6: Validate

**Validation criteria:**

| Criterion | Check |
|-----------|-------|
| Text legibility | All text is readable at 100% zoom |
| Data accuracy | Numbers/facts are displayed correctly |
| Visual hierarchy | Eye naturally flows through content |
| Color contrast | Sufficient contrast for accessibility |
| Completeness | All key elements are present |
| Brand alignment | Matches intended style |

**If validation fails:**
- Identify specific issues
- Modify the prompt to address them
- Regenerate
- Maximum 3 iterations

## Example Workflow

**Request:** Create an infographic about how neural networks learn.

### Step 1: Extract Narrative
"Neural networks learn by adjusting connection weights through forward propagation and backpropagation. Key insight: the process is iterative and improves over time. Audience: technical beginners."

### Step 2: Visual Concept
"A network of interconnected nodes with signals flowing through, showing adjustment dials on connections. Like a city's road network with traffic lights being adjusted."

### Step 3: Aesthetic
"Flat design with dark blue background, bright connection lines in cyan and orange. Minimal, clean style with clear node shapes."

### Step 4: Prompt
```
Create an infographic explaining how neural networks learn.

Visual concept: Network of connected nodes with adjustment dials on connections, signals flowing through like traffic.

Key elements:
- Input layer with data entering
- Hidden layers with connection weights
- Output layer with result
- Feedback loop showing backpropagation

Style: Dark blue background, cyan and orange accents, flat design, clean minimalist style.

Layout: Horizontal flow from left (input) to right (output), with backpropagation arrow below.

Text to include:
- Title: "How Neural Networks Learn"
- Labels: "Input", "Hidden Layers", "Output", "Backpropagation"
```

### Step 5: Generate
```bash
uv run scripts/generate.py "Create an infographic explaining how neural networks learn..." neural_network.png 16:9 2K
```

### Step 6: Validate
- Text readable
- Flow is clear left-to-right
- Colors have good contrast
- All labels present

## Tips for Better Results

1. **Simple prompts often work best** - "Infographic explaining X" can produce excellent results
2. **The model understands context** - It will add relevant icons/imagery automatically
3. **Be specific about text** - Include the exact wording for titles and labels
4. **Iterate with conversation** - Ask for specific changes after the initial generation
5. **Use reference images** - For style consistency across multiple infographics

## Common Issues

| Issue | Solution |
|-------|----------|
| Text too small | Increase size to 4K or reduce the amount of text |
| Cluttered layout | Simplify to fewer elements |
| Wrong style | Be more explicit about the aesthetic |
| Missing elements | List all required elements explicitly |