Initial commit
15
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,15 @@
{
  "name": "explore-with-illustrations",
  "description": "Generate and edit images using Gemini API (Nano Banana Pro). Specialized for creating high-quality technical illustrations, architecture diagrams, code concept visualizations, and educational content from codebases.",
  "version": "1.0.0",
  "author": {
    "name": "Agney",
    "url": "https://github.com/agneym/agneym-claude-marketplace"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# explore-with-illustrations

Generate and edit images using Gemini API (Nano Banana Pro). Specialized for creating high-quality technical illustrations, architecture diagrams, code concept visualizations, and educational content from codebases.
185
commands/explore.md
Normal file
@@ -0,0 +1,185 @@
---
description: Analyze and visualize architecture of a codebase area
argument-hint: [directory-or-area]
allowed-tools: *
---

# Architecture Analysis

Analyze the architecture of: **$ARGUMENTS**

## Your Task

### Setup: Create Visualization Submodule

Before starting the analysis, create a git submodule for storing all visualization outputs:

1. Create a directory called `visualizations/` in the user's repository root
2. Initialize it as a git submodule (if it doesn't exist already)
3. All outputs (HTML files, images, assets) will be stored in this submodule
4. The HTML file will be the main entry point for viewing the visualization
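The setup steps above can be sketched in Python. This is a simplified sketch that only ensures the output directory exists; promoting it to a true git submodule requires a separate repository URL passed to `git submodule add`, which is left as a placeholder here.

```python
from pathlib import Path

viz = Path("visualizations")
viz.mkdir(exist_ok=True)  # step 1: create the output directory in the repo root

# Step 2 (sketch only): registering the directory as a real submodule
# requires a separate repository; a placeholder command would look like:
#   git submodule add <repo-url> visualizations
```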
### Workflow

1. **Explore** the codebase area using Glob and Read to understand:
   - Key components and their responsibilities
   - Data flow and interactions
   - Dependencies and relationships

2. **Explain** the architecture concisely:
   - Main components and their roles
   - How they interact
   - Key patterns or design decisions

3. **Visualize** - You have complete freedom to choose the best visualization approach:
   - **HTML + JavaScript**: Create interactive visualizations using any JavaScript libraries (D3.js, Chart.js, Plotly, Three.js, etc.)
   - **Generated Images**: Use gemini-imagen skill scripts to generate diagrams
   - **Hybrid**: Combine both - generate images and embed them in an interactive HTML page

**CRITICAL - Image Handling Rule:**
- If you generate images (using gemini-imagen scripts), save them in the `visualizations/` submodule
- Reference these images in an HTML file (also in the submodule)
- Images should NOT be standalone - always create an HTML file that displays them
- The HTML file serves as the entry point for viewing all visualizations

## Critical Visualization Guidelines

### For Image Generation (using gemini-imagen)

**NEVER be vague in image prompts. The image model cannot see the codebase.**

1. **Specify exact positions**: "Component A at top center, Component B at bottom left" (not "components arranged logically")
2. **Label every connection**: "Arrow from A to B labeled 'POST /api/login with JWT'" (not "A calls B")
3. **Include all details**: Methods, parameters, return types, HTTP verbs, data formats
4. **Use specific colors**: "Requests in blue, responses in green, errors in red" (not "color-coded")
5. **State cardinality**: "1 to N (many)" on relationship lines (not "has many")
6. **Complete flows**: List every step sequentially with explicit labels
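To make the contrast concrete, here is a sketch of a vague prompt versus one that follows the rules above. The component names and endpoints are hypothetical examples, not taken from any real codebase.

```python
# Vague: the image model has no codebase context to fill in the blanks.
vague_prompt = "Diagram of the authentication system, components arranged logically"

# Explicit: positions, labels, colors, and cardinality are all spelled out.
explicit_prompt = (
    "Technical architecture diagram: Client at top center, API Gateway at "
    "middle center, Auth Service at bottom left, User DB at bottom right. "
    "Arrow from Client to API Gateway labeled 'POST /api/login with credentials'. "
    "Arrow from API Gateway to Auth Service labeled 'validate credentials'. "
    "Arrow from Auth Service to User DB labeled 'SELECT user by email, 1 to N sessions'. "
    "Requests in blue, responses in green, errors in red. White background."
)
```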
### For HTML/Interactive Visualizations

You have complete freedom to create any type of interactive visualization. Consider:

**JavaScript Libraries** (load via CDN):
- **D3.js**: Complex data visualizations, force-directed graphs, hierarchies
- **Chart.js**: Simple charts (bar, line, pie, radar)
- **Plotly**: Interactive scientific/statistical charts
- **Three.js**: 3D visualizations
- **Mermaid.js**: Diagrams from text descriptions (flowcharts, sequence diagrams, etc.)
- **Cytoscape.js**: Network/graph visualizations
- **Vis.js**: Timeline, network, and graph visualizations
- Or any other library you find appropriate

**Visualization Types:**
- Interactive architecture diagrams with clickable components
- Animated data flow visualizations
- Filterable/searchable dependency graphs
- Timeline views of execution flows
- Interactive code maps with zoom/pan
- Combined visualizations (images + interactive overlays)

**File Structure:**
- Create `visualizations/index.html` as the main entry point
- Can use multiple HTML files if needed
- External CSS/JS files are allowed
- Reference any generated images with relative paths

**Best Practices:**
- Use multiple files with external references when appropriate
- Include clear navigation if creating multiple pages
- Add interactivity where it enhances understanding (hover tooltips, click to expand, etc.)
- Keep it simple - this is throwaway code, don't over-engineer

## Diagram Templates (for Image Generation)

Choose the appropriate template based on what you discovered:

### Architecture Diagram
```
"Technical architecture diagram: [COMPONENT_1] at top center, [COMPONENT_2] on left middle, [COMPONENT_3] on right middle.
Arrow from [COMPONENT_1] to [COMPONENT_2] labeled '[HTTP_METHOD] [PATH] [PURPOSE]'.
Arrow from [COMPONENT_2] to [COMPONENT_3] labeled '[PROTOCOL] [DATA_TYPE]'.
[Repeat for ALL connections with explicit labels].
Clean labeled boxes, directional arrows, white background."
```

### Data Flow Diagram
```
"Data flow diagram: Step 1: [ENTITY_A] at left. Step 2: Arrow to [ENTITY_B] labeled '[METHOD] [PATH] with [DATA]'.
Step 3: Arrow back labeled '[STATUS] [RESPONSE_TYPE]'. [Continue for all steps].
Number each step, color-code: [TYPE_1] in blue, [TYPE_2] in green. Technical style, 16:9."
```

### Component Relationships (UML)
```
"UML class diagram: [CLASS_1] box at top with attributes '[ATTRS]' and methods '[METHODS]'.
[CLASS_2] box at bottom with '[ATTRS/METHODS]'.
[CLASS_1] to [CLASS_2]: [RELATIONSHIP] shown with [ARROW_TYPE], labeled '1 to N'.
[Repeat for all relationships]. Clean UML style."
```

### Code Execution Flow
```
"Flowchart for [FUNCTION]: Start. Step 1: '[ACTION]' in blue rectangle.
Step 2: Diamond '[CONDITION]' with YES arrow to [NEXT] and NO arrow to [ALT].
[Continue all steps]. Errors in red rounded boxes, success in green. Label all arrows."
```

### Database Schema
```
"Database schema: [TABLE_1] with columns '[COL] [TYPE] [CONSTRAINTS]'.
[TABLE_2] with '[COLUMNS]'. Foreign key: [TABLE_2].[FK] → [TABLE_1].[PK]
shown with line labeled '1 to N'. [Repeat for all tables]. Show PK icons."
```

### API Endpoints
```
"REST API for [SERVICE]: Endpoint 1: [METHOD] [PATH] with body {[FIELDS]} returns {[RESPONSE]} [STATUS].
[Repeat for all endpoints]. Color-code: GET blue, POST green, PUT yellow, DELETE red.
Show full JSON examples."
```

## Educational Approach

**Your goal: Create the best visualization for understanding, not just documentation.**

Consider creative formats when appropriate:
- **Metaphors**: Database transaction as restaurant order system
- **Comics**: Function execution as sequential panels
- **Real-world scenarios**: Authentication as bouncer checking IDs
- **Analogies**: Cache as kitchen pantry with frequently-used items

**Example creative prompt:**
> "Comic strip showing JWT auth: Panel 1: User (detective) at API Gateway (security desk). Panel 2: Gateway calls Auth Service (background check). Panel 3: Auth returns golden badge (JWT). Panel 4: User shows badge to Resource Server (VIP room). Cartoon style."

## Output

### Creating Visualizations

**Option 1: HTML + Interactive JavaScript**
- Create `visualizations/index.html` with your interactive visualization
- Use any JavaScript libraries via CDN
- Can create additional HTML/CSS/JS files as needed
- Reference any generated images with relative paths

**Option 2: Generated Images (via gemini-imagen)**
- Generate diagrams using: `scripts/generate_image.py "YOUR_EXPLICIT_PROMPT" visualizations/architecture-diagram.png --size 4K --aspect 16:9`
- The script has a `uv run` shebang, so execute it directly rather than invoking it with `python`
- Save all images in the `visualizations/` submodule
- Create `visualizations/index.html` that displays the images
- If complex, create multiple focused diagrams rather than one overwhelming image
- Use descriptive filenames: `component-relationships.png`, `data-flow.png`, etc.
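The "create `visualizations/index.html` that displays the images" step can be sketched as a small helper that scans the directory for generated PNGs and writes a gallery page. The `build_index` function below is a hypothetical illustration, not a script shipped with the plugin.

```python
from pathlib import Path


def build_index(viz_dir: Path) -> Path:
    """Write a minimal index.html embedding every PNG found in viz_dir."""
    images = sorted(viz_dir.glob("*.png"))
    body = "\n".join(
        f'<figure><img src="{img.name}" alt="{img.stem}">'
        f"<figcaption>{img.stem}</figcaption></figure>"
        for img in images
    )
    html = (
        "<!DOCTYPE html>\n<html><body><h1>Visualizations</h1>\n"
        f"{body}\n</body></html>\n"
    )
    out = viz_dir / "index.html"
    out.write_text(html)
    return out
```

Run it after the generation step so the HTML entry point required by the image-handling rule always exists and uses relative paths.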
**Option 3: Hybrid Approach**
- Generate images using gemini-imagen scripts (save to `visualizations/`)
- Create interactive HTML that embeds/references the images
- Add JavaScript interactivity on top (zoom, annotations, navigation, etc.)

### Viewing the Visualization

After creating the visualization, start a local server:

```bash
python -m http.server --directory visualizations/
```

Then open `http://localhost:8000` in a browser to view the visualization.
73
plugin.lock.json
Normal file
@@ -0,0 +1,73 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:agneym/agneym-claude-marketplace:plugins/explore-with-illustrations",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "1b7690938521111ee8938f3ab8f78f8f61e9b351",
    "treeHash": "57bf45b2b23f3a3ff04f332342ca38658a28173c9025644ac8b5990299d1ac09",
    "generatedAt": "2025-11-28T10:13:02.101831Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "explore-with-illustrations",
    "description": "Generate and edit images using Gemini API (Nano Banana Pro). Specialized for creating high-quality technical illustrations, architecture diagrams, code concept visualizations, and educational content from codebases.",
    "version": "1.0.0"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "a71773d123857f7c100e09d11c90c6339138ddb6ee8e4a05964bf89838ca4c0f"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "ef6e992cf87ffe1741afee9898c450d7007d44c6b03bc47ee1c386690b10285d"
      },
      {
        "path": "commands/explore.md",
        "sha256": "7dbf290f0b822b7365d22fe83aa43c40c8c4d23e35acb672e341b339aff6b859"
      },
      {
        "path": "skills/gemini-imagen/mise.toml",
        "sha256": "dbb25cfa908fd44614ccebf8295635f2eee1e05ed950f63fd60faefef8889c56"
      },
      {
        "path": "skills/gemini-imagen/SKILL.md",
        "sha256": "e0f38d3d77c0b378c982f699200f849d36eaad54334f0e696d07c05d70a1d99c"
      },
      {
        "path": "skills/gemini-imagen/scripts/gemini_images.py",
        "sha256": "0f7f45c8ad0ab942ff05a4a2ae99900dfe7088235f754489106ddc1411938722"
      },
      {
        "path": "skills/gemini-imagen/scripts/compose_images.py",
        "sha256": "1dac8f1ba49d0f58a3d4dc1439c50ed1177b4c5c8335e622dc10dec78f4b42b1"
      },
      {
        "path": "skills/gemini-imagen/scripts/generate_image.py",
        "sha256": "66cd1b59e2b3be98eff8bc03c5f924ee34d7c445cba8981214f3194b037c009f"
      },
      {
        "path": "skills/gemini-imagen/scripts/multi_turn_chat.py",
        "sha256": "d12db12d52d6ebd35ed449ff1799acb40b7714a90e6ae118c0661152a8f2b2b6"
      },
      {
        "path": "skills/gemini-imagen/scripts/edit_image.py",
        "sha256": "c2d031289e65c64246daf4b296e86e9b8aa9af4af280ef97aa2531a1e5a12eb4"
      }
    ],
    "dirSha256": "57bf45b2b23f3a3ff04f332342ca38658a28173c9025644ac8b5990299d1ac09"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
193
skills/gemini-imagen/SKILL.md
Normal file
@@ -0,0 +1,193 @@
---
name: gemini-imagen
description: Generate and edit images using Gemini API (Nano Banana Pro). Supports text-to-image, image editing, multi-turn refinement, Google Search grounding for factual accuracy, and composition from multiple reference images.
---
# Gemini Image Generation (Nano Banana Pro)

Generate professional-quality images using Google's **Gemini 3 Pro Image** model (aka Nano Banana Pro). The environment variable `GEMINI_API_KEY` must be set.

## Model

**gemini-3-pro-image-preview** (Nano Banana Pro)
- Resolution: Up to 4K (1K, 2K, 4K)
- Built on Gemini 3 Pro with advanced reasoning and real-world knowledge
- Best for: Professional assets, illustrations, diagrams, text rendering, product mockups
- Features: Google Search grounding, automatic "Thinking" process for refined composition

## Quick Start Scripts

CRITICAL FOR AGENTS: These are executable scripts in your PATH. All scripts default to **gemini-3-pro-image-preview**.

### Text-to-Image
```bash
scripts/generate_image.py "A technical diagram showing microservices architecture" output.png
```

### Edit Existing Image
```bash
scripts/edit_image.py diagram.png "Add API gateway component with arrows showing data flow" output.png
```

### Multi-Turn Chat (Iterative Refinement)
```bash
scripts/multi_turn_chat.py
```

For high-resolution technical diagrams:
```bash
scripts/generate_image.py "Your prompt" output.png --size 4K --aspect 16:9
```

## Core API Pattern

All image generation uses the `generateContent` endpoint with `responseModalities: ["TEXT", "IMAGE"]`:

```python
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.png")
```

## Image Configuration Options

Control output with `image_config`:

```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
            image_size="4K"       # 1K, 2K, 4K (Nano Banana Pro supports up to 4K)
        ),
    )
)
```

## Editing Images

Pass existing images with text prompts:

```python
from PIL import Image

img = Image.open("input.png")
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Add a sunset to this scene", img],
)
```

## Multi-Turn Refinement

Use chat for iterative editing:

```python
from google.genai import types

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
```

## Prompting Best Practices

### Core Prompt Structure
Keep prompts concise and specific. Research shows prompts under 25 words achieve **30% higher accuracy**. Structure as:

**Subject + Adjectives + Action + Location/Context + Composition + Lighting + Style**
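The structure above can be expressed as a small helper. This `build_prompt` function is a hypothetical illustration, not part of the skill's scripts; it simply joins the slots in the documented order.

```python
# Hypothetical helper: assembles a prompt following the
# Subject + Adjectives + Action + Context + Composition + Lighting + Style
# structure described above. Empty slots are skipped.
def build_prompt(subject: str, adjectives: str = "", action: str = "",
                 context: str = "", composition: str = "",
                 lighting: str = "", style: str = "") -> str:
    parts = [f"{adjectives} {subject}".strip(), action, context,
             composition, lighting, style]
    return ", ".join(p for p in parts if p)


prompt = build_prompt(
    subject="red panda",
    adjectives="happy",
    action="waving",
    context="on a white background",
    composition="centered",
    lighting="soft light",
    style="kawaii sticker style",
)
```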
### Photorealistic Scenes
Include camera details: lens type, lighting, angle, mood.
> "Photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"

### Stylized Art
Specify style explicitly:
> "Kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"

### Text in Images
Be explicit about font style and placement:
> "Logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"

### Product Mockups
Describe lighting setup and surface:
> "Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"

### Technical Diagrams
Be explicit about positions, relationships, and labels:
> "Technical diagram: Component A at top, Component B at bottom. Arrow from A to B labeled 'HTTP GET'. Clean boxes, directional arrows, white background."

## Advanced Features

### Google Search Grounding
Generate images based on real-time data:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}]
    )
)
```

### Multiple Reference Images (Up to 14)
Combine elements from multiple sources:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)
```

## REST API (curl)

```bash
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Technical diagram showing RESTful API architecture"}]}]
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
```

## Notes

- All generated images include SynthID watermarks
- Image-only mode (`responseModalities: ["IMAGE"]`) won't work with Google Search grounding
- For editing, describe changes conversationally; the model understands semantic masking
- Be specific about positions, colors, labels, and relationships for best results
3
skills/gemini-imagen/mise.toml
Normal file
@@ -0,0 +1,3 @@
[tools]
python = "3.11"
162
skills/gemini-imagen/scripts/compose_images.py
Executable file
@@ -0,0 +1,162 @@
#!/usr/bin/env -S uv run --script
#
# /// script
# requires-python = ">=3.12"
# dependencies = ["google-genai", "pillow"]
# ///
"""
Compose multiple images into a new image using Gemini API.

Usage:
    python compose_images.py "instruction" output.png image1.png [image2.png ...]

Examples:
    python compose_images.py "Create a group photo of these people" group.png person1.png person2.png
    python compose_images.py "Put the cat from the first image on the couch from the second" result.png cat.png couch.png
    python compose_images.py "Apply the art style from the first image to the scene in the second" styled.png style.png photo.png

Note: Supports up to 14 reference images (Gemini 3 Pro only).

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys

from PIL import Image
from google import genai
from google.genai import types


def compose_images(
    instruction: str,
    output_path: str,
    image_paths: list[str],
    model: str = "gemini-3-pro-image-preview",
    aspect_ratio: str | None = None,
    image_size: str | None = None,
) -> str | None:
    """Compose multiple images based on instructions.

    Args:
        instruction: Text description of how to combine images
        output_path: Path to save the result
        image_paths: List of input image paths (up to 14)
        model: Gemini model to use (pro recommended)
        aspect_ratio: Output aspect ratio
        image_size: Output resolution

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    if len(image_paths) > 14:
        raise ValueError("Maximum 14 reference images supported")

    if len(image_paths) < 1:
        raise ValueError("At least one image is required")

    # Verify all images exist
    for path in image_paths:
        if not os.path.exists(path):
            raise FileNotFoundError(f"Image not found: {path}")

    client = genai.Client(api_key=api_key)

    # Load images
    images = [Image.open(path) for path in image_paths]

    # Build contents: instruction first, then images
    contents = [instruction] + images

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Compose multiple images using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("instruction", help="Composition instruction")
    parser.add_argument("output", help="Output file path")
    parser.add_argument("images", nargs="+", help="Input images (up to 14)")
    parser.add_argument(
        "--model", "-m",
        default="gemini-3-pro-image-preview",
        choices=["gemini-2.5-flash-image", "gemini-3-pro-image-preview"],
        help="Model to use (pro recommended for composition)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Output aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Output resolution"
    )

    args = parser.parse_args()

    try:
        text = compose_images(
            instruction=args.instruction,
            output_path=args.output,
            image_paths=args.images,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Composed image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
148
skills/gemini-imagen/scripts/edit_image.py
Executable file
@@ -0,0 +1,148 @@
|
|||||||
|
#!/usr/bin/env -S uv run --script
|
||||||
|
#
|
||||||
|
# /// script
|
||||||
|
# requires-python = ">=3.12"
|
||||||
|
# dependencies = ["google-genai", "pillow"]
|
||||||
|
# ///
|
||||||
|
"""
|
||||||
|
Edit existing images using Gemini API (Nano Banana Pro).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python edit_image.py input.png "edit instruction" output.png [options]
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
python edit_image.py diagram.png "Add API Gateway component between client and services" edited.png
|
||||||
|
python edit_image.py schema.png "Highlight the foreign key relationships in red" schema_edited.png
|
||||||
|
python edit_image.py flowchart.png "Add error handling branch with red arrows" flowchart_v2.png
|
||||||
|
|
||||||
|
Environment:
|
||||||
|
GEMINI_API_KEY - Required API key
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
from PIL import Image
|
||||||
|
from google import genai
|
||||||
|
from google.genai import types
|
||||||
|
|
||||||
|
|
||||||
|
def edit_image(
|
||||||
|
input_path: str,
|
||||||
|
instruction: str,
|
||||||
|
output_path: str,
|
||||||
|
model: str = "gemini-3-pro-image-preview",
|
||||||
|
aspect_ratio: str | None = None,
|
||||||
|
image_size: str | None = None,
|
||||||
|
) -> str | None:
|
||||||
|
"""Edit an existing image based on text instructions using Nano Banana Pro.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
input_path: Path to the input image
|
||||||
|
        instruction: Text description of edits to make
        output_path: Path to save the edited image
        model: Gemini model to use (defaults to Nano Banana Pro)
        aspect_ratio: Output aspect ratio
        image_size: Output resolution (up to 4K)

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input image not found: {input_path}")

    client = genai.Client(api_key=api_key)

    # Load input image
    input_image = Image.open(input_path)

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=[instruction, input_image],
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated. Check your instruction and try again.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Edit images using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("input", help="Input image path")
    parser.add_argument("instruction", help="Edit instruction")
    parser.add_argument("output", help="Output file path")
    parser.add_argument(
        "--model", "-m",
        default="gemini-3-pro-image-preview",
        help="Model to use (default: gemini-3-pro-image-preview / Nano Banana Pro)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Output aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Output resolution"
    )

    args = parser.parse_args()

    try:
        text = edit_image(
            input_path=args.input,
            instruction=args.instruction,
            output_path=args.output,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Edited image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
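The parts-scanning loop in `edit_image` (keep the last text part, save any inline image, flag whether one was saved) can be exercised without the SDK using stand-in part objects; a minimal sketch, assuming each part exposes `text` and `inline_data` attributes as in the real response, with `FakePart` and `scan_parts` being hypothetical names introduced here for illustration:

```python
class FakePart:
    """Stand-in for a response part: at most one of text/inline_data is set."""
    def __init__(self, text=None, inline_data=None):
        self.text = text
        self.inline_data = inline_data


def scan_parts(parts):
    """Mirror of the loop above: return (last text seen, whether image data appeared)."""
    text_response = None
    image_seen = False
    for part in parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image_seen = True
    return text_response, image_seen


print(scan_parts([FakePart(text="Done."), FakePart(inline_data=b"...")]))
# ('Done.', True)
```

The `image_seen` flag is what lets the script raise `RuntimeError` when the model replies with text only and no image.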
269
skills/gemini-imagen/scripts/gemini_images.py
Executable file
@@ -0,0 +1,269 @@
#!/usr/bin/env -S uv run --script
#
# /// script
# requires-python = ">=3.12"
# dependencies = ["google-genai", "pillow"]
# ///
"""
Gemini Image Generation Library

A simple Python library for generating and editing images with the Gemini API.

Usage:
    from gemini_images import GeminiImageGenerator

    gen = GeminiImageGenerator()
    gen.generate("A sunset over mountains", "sunset.png")
    gen.edit("input.png", "Add clouds", "output.png")

Environment:
    GEMINI_API_KEY - Required API key
"""

import os
from pathlib import Path
from typing import Literal

from PIL import Image
from google import genai
from google.genai import types


AspectRatio = Literal["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
ImageSize = Literal["1K", "2K", "4K"]
Model = Literal["gemini-2.5-flash-image", "gemini-3-pro-image-preview"]


class GeminiImageGenerator:
    """High-level interface for Gemini image generation."""

    FLASH = "gemini-2.5-flash-image"
    PRO = "gemini-3-pro-image-preview"

    def __init__(self, api_key: str | None = None, model: Model = FLASH):
        """Initialize the generator.

        Args:
            api_key: Gemini API key (defaults to GEMINI_API_KEY env var)
            model: Default model to use
        """
        self.api_key = api_key or os.environ.get("GEMINI_API_KEY")
        if not self.api_key:
            raise EnvironmentError("GEMINI_API_KEY not set")

        self.client = genai.Client(api_key=self.api_key)
        self.model = model

    def _build_config(
        self,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
        google_search: bool = False,
    ) -> types.GenerateContentConfig:
        """Build generation config."""
        kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

        img_config = {}
        if aspect_ratio:
            img_config["aspect_ratio"] = aspect_ratio
        if image_size:
            img_config["image_size"] = image_size

        if img_config:
            kwargs["image_config"] = types.ImageConfig(**img_config)

        if google_search:
            kwargs["tools"] = [{"google_search": {}}]

        return types.GenerateContentConfig(**kwargs)

    def generate(
        self,
        prompt: str,
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
        google_search: bool = False,
    ) -> tuple[Path, str | None]:
        """Generate an image from a text prompt.

        Args:
            prompt: Text description
            output: Output file path
            model: Override default model
            aspect_ratio: Output aspect ratio
            image_size: Output resolution
            google_search: Enable Google Search grounding (Pro only)

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)
        config = self._build_config(aspect_ratio, image_size, google_search)

        response = self.client.models.generate_content(
            model=model or self.model,
            contents=[prompt],
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def edit(
        self,
        input_image: str | Path | Image.Image,
        instruction: str,
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
    ) -> tuple[Path, str | None]:
        """Edit an existing image.

        Args:
            input_image: Input image (path or PIL Image)
            instruction: Edit instruction
            output: Output file path
            model: Override default model
            aspect_ratio: Output aspect ratio
            image_size: Output resolution

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)

        if isinstance(input_image, (str, Path)):
            input_image = Image.open(input_image)

        config = self._build_config(aspect_ratio, image_size)

        response = self.client.models.generate_content(
            model=model or self.model,
            contents=[instruction, input_image],
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def compose(
        self,
        instruction: str,
        images: list[str | Path | Image.Image],
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
    ) -> tuple[Path, str | None]:
        """Compose multiple images into one.

        Args:
            instruction: Composition instruction
            images: List of input images (up to 14)
            output: Output file path
            model: Override default model (Pro recommended)
            aspect_ratio: Output aspect ratio
            image_size: Output resolution

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)

        # Load images
        loaded = []
        for img in images:
            if isinstance(img, (str, Path)):
                loaded.append(Image.open(img))
            else:
                loaded.append(img)

        config = self._build_config(aspect_ratio, image_size)
        contents = [instruction] + loaded

        response = self.client.models.generate_content(
            model=model or self.PRO,  # Pro recommended for composition
            contents=contents,
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def chat(self) -> "ImageChat":
        """Start an interactive chat session for iterative refinement."""
        return ImageChat(self.client, self.model)


class ImageChat:
    """Multi-turn chat session for iterative image generation."""

    def __init__(self, client: genai.Client, model: Model):
        self.client = client
        self.model = model
        self._chat = client.chats.create(
            model=model,
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )
        self.current_image: Image.Image | None = None

    def send(
        self,
        message: str,
        image: Image.Image | str | Path | None = None,
    ) -> tuple[Image.Image | None, str | None]:
        """Send a message and optionally an image.

        Returns:
            Tuple of (generated image or None, text response or None)
        """
        contents = [message]
        if image:
            if isinstance(image, (str, Path)):
                image = Image.open(image)
            contents.append(image)

        response = self._chat.send_message(contents)

        text = None
        img = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                img = part.as_image()
                self.current_image = img

        return img, text

    def reset(self):
        """Reset the chat session."""
        self._chat = self.client.chats.create(
            model=self.model,
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )
        self.current_image = None
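The `_build_config` pattern in gemini_images.py — accumulating optional settings in a dict and only attaching `image_config` when something was actually set — can be tried out without the SDK; a minimal sketch where plain dicts stand in for `types.ImageConfig` and `types.GenerateContentConfig` (the function name `build_config_kwargs` is introduced here for illustration):

```python
def build_config_kwargs(aspect_ratio=None, image_size=None, google_search=False):
    """Mirror of _build_config's kwargs assembly, with plain dicts in place of
    the SDK's ImageConfig / GenerateContentConfig types (illustration only)."""
    kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    img_config = {}
    if aspect_ratio:
        img_config["aspect_ratio"] = aspect_ratio
    if image_size:
        img_config["image_size"] = image_size

    # Only attach image_config when at least one option was set
    if img_config:
        kwargs["image_config"] = img_config

    if google_search:
        kwargs["tools"] = [{"google_search": {}}]

    return kwargs


# No options: only the response modalities are present
print(build_config_kwargs())
# With options: image_config carries just the keys that were given
print(build_config_kwargs(aspect_ratio="16:9", image_size="2K"))
```

Omitting `image_config` entirely when no option is set (rather than passing empty values) lets the API fall back to its own defaults.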
137
skills/gemini-imagen/scripts/generate_image.py
Executable file
@@ -0,0 +1,137 @@
#!/usr/bin/env -S uv run --script
#
# /// script
# requires-python = ">=3.12"
# dependencies = ["google-genai", "pillow"]
# ///
"""
Generate images from text prompts using Gemini API (Nano Banana Pro).

Usage:
    python generate_image.py "prompt" output.png [--aspect RATIO] [--size SIZE]

Examples:
    python generate_image.py "Microservices architecture diagram with labeled components" diagram.png
    python generate_image.py "Logo for Acme Corp, clean sans-serif text" logo.png --aspect 1:1 --size 4K
    python generate_image.py "OAuth flow diagram with numbered steps" flow.png --aspect 16:9 --size 2K

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys

from google import genai
from google.genai import types


def generate_image(
    prompt: str,
    output_path: str,
    model: str = "gemini-3-pro-image-preview",
    aspect_ratio: str | None = None,
    image_size: str | None = None,
) -> str | None:
    """Generate an image from a text prompt using Nano Banana Pro.

    Args:
        prompt: Text description of the image to generate
        output_path: Path to save the generated image
        model: Gemini model to use (defaults to Nano Banana Pro)
        aspect_ratio: Aspect ratio (1:1, 16:9, 9:16, etc.)
        image_size: Resolution (1K, 2K, 4K)

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=[prompt],
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated. Check your prompt and try again.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Generate images from text prompts using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("prompt", help="Text prompt describing the image")
    parser.add_argument("output", help="Output file path (e.g., output.png)")
    parser.add_argument(
        "--model", "-m",
        default="gemini-3-pro-image-preview",
        help="Model to use (default: gemini-3-pro-image-preview / Nano Banana Pro)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Image resolution (up to 4K with Nano Banana Pro)"
    )

    args = parser.parse_args()

    try:
        text = generate_image(
            prompt=args.prompt,
            output_path=args.output,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
220
skills/gemini-imagen/scripts/multi_turn_chat.py
Executable file
@@ -0,0 +1,220 @@
#!/usr/bin/env -S uv run --script
#
# /// script
# requires-python = ">=3.12"
# dependencies = ["google-genai", "pillow"]
# ///
"""
Interactive multi-turn image generation and refinement using Gemini API (Nano Banana Pro).

Usage:
    python multi_turn_chat.py [--output-dir DIR]

This starts an interactive session where you can:
- Generate technical diagrams and illustrations from prompts
- Iteratively refine images through conversation
- Load existing images for editing
- Save images at any point

Commands:
    /save [filename] - Save current image
    /load <path> - Load an image into the conversation
    /clear - Start fresh conversation
    /quit - Exit

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys
from datetime import datetime
from pathlib import Path

from PIL import Image
from google import genai
from google.genai import types


class ImageChat:
    """Interactive chat session for image generation and refinement using Nano Banana Pro."""

    def __init__(
        self,
        model: str = "gemini-3-pro-image-preview",
        output_dir: str = ".",
    ):
        api_key = os.environ.get("GEMINI_API_KEY")
        if not api_key:
            raise EnvironmentError("GEMINI_API_KEY environment variable not set")

        self.client = genai.Client(api_key=api_key)
        self.model = model
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

        self.chat = None
        self.current_image = None
        self.image_count = 0

        self._init_chat()

    def _init_chat(self):
        """Initialize or reset the chat session."""
        config = types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"]
        )
        self.chat = self.client.chats.create(
            model=self.model,
            config=config,
        )
        self.current_image = None

    def send_message(self, message: str, image: Image.Image | None = None) -> tuple[str | None, Image.Image | None]:
        """Send a message and optionally an image, return response text and image."""
        contents = []
        if message:
            contents.append(message)
        if image:
            contents.append(image)

        if not contents:
            return None, None

        response = self.chat.send_message(contents)

        text_response = None
        image_response = None

        for part in response.parts:
            if part.text is not None:
                text_response = part.text
            elif part.inline_data is not None:
                image_response = part.as_image()
                self.current_image = image_response

        return text_response, image_response

    def save_image(self, filename: str | None = None) -> str | None:
        """Save the current image to a file."""
        if self.current_image is None:
            return None

        if filename is None:
            self.image_count += 1
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"image_{timestamp}_{self.image_count}.png"

        filepath = self.output_dir / filename
        self.current_image.save(filepath)
        return str(filepath)

    def load_image(self, path: str) -> Image.Image:
        """Load an image from disk."""
        img = Image.open(path)
        self.current_image = img
        return img


def main():
    parser = argparse.ArgumentParser(
        description="Interactive multi-turn image generation using Nano Banana Pro",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument(
        "--model", "-m",
        default="gemini-3-pro-image-preview",
        help="Model to use (default: gemini-3-pro-image-preview / Nano Banana Pro)"
    )
    parser.add_argument(
        "--output-dir", "-o",
        default=".",
        help="Directory to save images"
    )

    args = parser.parse_args()

    try:
        chat = ImageChat(model=args.model, output_dir=args.output_dir)
    except Exception as e:
        print(f"Error initializing: {e}", file=sys.stderr)
        sys.exit(1)

    print(f"Gemini Image Chat ({args.model})")
    print("Commands: /save [name], /load <path>, /clear, /quit")
    print("-" * 50)

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nGoodbye!")
            break

        if not user_input:
            continue

        # Handle commands
        if user_input.startswith("/"):
            parts = user_input.split(maxsplit=1)
            cmd = parts[0].lower()
            arg = parts[1] if len(parts) > 1 else None

            if cmd == "/quit":
                print("Goodbye!")
                break

            elif cmd == "/clear":
                chat._init_chat()
                print("Conversation cleared.")
                continue

            elif cmd == "/save":
                path = chat.save_image(arg)
                if path:
                    print(f"Image saved to: {path}")
                else:
                    print("No image to save.")
                continue

            elif cmd == "/load":
                if not arg:
                    print("Usage: /load <path>")
                    continue
                try:
                    chat.load_image(arg)
                    print(f"Loaded: {arg}")
                    print("You can now describe edits to make.")
                except Exception as e:
                    print(f"Error loading image: {e}")
                continue

            else:
                print(f"Unknown command: {cmd}")
                continue

        # Send message to model
        try:
            # If we have a loaded image and this is first message, include it
            image_to_send = None
            if chat.current_image and not chat.chat.history:
                image_to_send = chat.current_image

            text, image = chat.send_message(user_input, image_to_send)

            if text:
                print(f"\nGemini: {text}")

            if image:
                # Auto-save
                path = chat.save_image()
                print(f"\n[Image generated: {path}]")

        except Exception as e:
            print(f"\nError: {e}")


if __name__ == "__main__":
    main()
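The REPL's command handling in multi_turn_chat.py (split on the first whitespace, lowercase the command, treat the remainder as an argument) can be factored into a small standalone helper; a sketch, with `parse_command` being a hypothetical name introduced here for illustration:

```python
def parse_command(user_input):
    """Split a slash-command into (cmd, arg), mirroring the REPL loop above.
    Returns (None, None) for plain messages that should go to the model."""
    if not user_input.startswith("/"):
        return None, None
    parts = user_input.split(maxsplit=1)
    cmd = parts[0].lower()
    arg = parts[1] if len(parts) > 1 else None
    return cmd, arg


print(parse_command("/save my diagram.png"))  # ('/save', 'my diagram.png')
print(parse_command("/clear"))                # ('/clear', None)
print(parse_command("Add a legend"))          # (None, None)
```

Using `split(maxsplit=1)` keeps spaces in the argument intact, which matters for `/save` filenames and `/load` paths containing spaces.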