Initial commit

2025-11-30 08:35:59 +08:00
commit 90883a4d25
287 changed files with 75058 additions and 0 deletions
--- a/skills/image-gen/SKILL.md
+++ b/skills/image-gen/SKILL.md
@@ -0,0 +1,153 @@
+---
+name: image-gen
+description: Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) with workflow-based prompting
+triggers:
+  - "create image"
+  - "generate image"
+  - "make infographic"
+  - "create infographic"
+  - "generate diagram"
+  - "make diagram"
+  - "design visual"
+  - "create visual"
+allowed-tools: Read, Write, Bash
+version: 0.1.0
+---
+
+# Image Generation Skill
+
+Generate professional images, infographics, and diagrams using Google's Nano Banana Pro model (gemini-3-pro-image-preview).
+
+## Model Capabilities
+
+**Nano Banana Pro** (released November 20, 2025):
+- **Text rendering** - Accurate, legible text in images
+- **Google Search grounding** - Real-time data (weather, stocks, etc.)
+- **Multi-turn conversation** - Iterative refinement
+- **Up to 14 reference images** - For composition and style transfer
+- **Resolutions**: 1K, 2K, 4K
+- **Aspect ratios**: 1:1, 2:3, 3:2, 4:3, 16:9, 21:9
+
+## Scripts
+
+All scripts use Python via `uv run` with inline dependencies.
+
+### generate.py - Text to Image
+```bash
+uv run scripts/generate.py "prompt" output.png [aspect_ratio] [size]
+```
+
+**Examples:**
+```bash
+# Basic image
+uv run scripts/generate.py "A cozy coffee shop in autumn" coffee.png
+
+# Infographic with specific aspect ratio
+uv run scripts/generate.py "Infographic explaining how neural networks work" nn.png 16:9 2K
+
+# 4K professional image
+uv run scripts/generate.py "Professional headshot, studio lighting" headshot.png 3:2 4K
+```
+
+### edit.py - Image Editing
+```bash
+uv run scripts/edit.py input.png "edit instructions" output.png
+```
+
+**Examples:**
+```bash
+# Edit existing image
+uv run scripts/edit.py photo.png "Change the background to a beach sunset" edited.png
+```
+
+### compose.py - Multi-Image Composition
+```bash
+uv run scripts/compose.py "prompt" output.png --refs image1.png image2.png
+```
+
+**Examples:**
+```bash
+# Combine styles from multiple images
+uv run scripts/compose.py "Combine these styles into a logo" logo.png --refs style1.png style2.png
+```
+
+## Workflows
+
+Workflows provide structured approaches for specific visual types. Each workflow follows the PAI 6-step editorial process:
+
+1. **Extract narrative** - Understand the complete story/concept
+2. **Derive visual concept** - Single metaphor with 2-3 physical objects
+3. **Apply aesthetic** - Define style, colors, mood
+4. **Construct prompt** - Build detailed generation instructions
+5. **Generate** - Execute via script
+6. **Validate** - Check against criteria, regenerate if needed
+
+### Available Workflows
+
+- **infographic.md** - Data visualization, statistics, explainers
+- **diagram.md** - Technical diagrams, flowcharts, architecture
+
+## Workflow Usage
+
+When generating images, follow the appropriate workflow:
+
+### For Infographics
+```markdown
+1. What data/concept needs visualization?
+2. What's the key insight or takeaway?
+3. Aspect ratio: 16:9 (landscape) recommended
+4. Include: clear hierarchy, minimal text, supporting icons
+5. Generate at 2K minimum for text clarity
+```
+
+### For Diagrams
+```markdown
+1. What system/process is being illustrated?
+2. What are the key components and relationships?
+3. Style: flat colors, clean lines, minimal detail
+4. Generate at 2K for label clarity
+```
+
+## Environment Setup
+
+Requires `GEMINI_API_KEY` environment variable. This should be set from Geoffrey's secrets:
+
+```bash
+source ~/Library/Mobile\ Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
+```
+
+## Best Practices
+
+### Infographics
+- Use simple, direct prompts: "Infographic explaining how X works"
+- Model auto-includes relevant icons/logos
+- 16:9 aspect ratio works best
+- Generate at 2K+ for readable text
+
+### General
+- Multi-turn refinement: generate, then ask for specific changes
+- Reference images improve consistency
+- Be specific about style, mood, lighting
+- SynthID watermark is automatic (Google provenance)
+
+## Output Location
+
+By default, save images to `/tmp/` or user-specified paths. For persistent storage, use:
+```
+~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/images/
+```
+
+## Limitations
+
+- No photorealistic humans (safety filter)
+- No copyrighted characters
+- Maximum 14 reference images for composition
+- 4K only available with Nano Banana Pro
+
+## Pricing
+
+| Size | Cost per Image |
+|------|---------------|
+| 1K | Free tier / $0.04 |
+| 2K | $0.134 |
+| 4K | $0.24 |