Initial commit
190
skills/gemini-imagegen/SKILL.md
Normal file
@@ -0,0 +1,190 @@
---
name: gemini-imagegen
description: Generate and edit images using the Gemini API (Nano Banana). Use this skill when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.
---

# Gemini Image Generation (Nano Banana)

Generate and edit images using Google's Gemini API. The environment variable `GEMINI_API_KEY` must be set.

## Available Models

| Model | Alias | Resolution | Best For |
|-------|-------|------------|----------|
| `gemini-2.5-flash-image` | Nano Banana | 1024px | Speed, high-volume tasks |
| `gemini-3-pro-image-preview` | Nano Banana Pro | Up to 4K | Professional assets, complex instructions, text rendering |

## Quick Start Scripts

### Text-to-Image
```bash
python scripts/generate_image.py "A cat wearing a wizard hat" output.png
```

### Edit Existing Image
```bash
python scripts/edit_image.py input.png "Add a rainbow in the background" output.png
```

### Multi-Turn Chat (Iterative Refinement)
```bash
python scripts/multi_turn_chat.py
```

## Core API Pattern

All image generation uses the `generateContent` endpoint with `responseModalities: ["TEXT", "IMAGE"]`:

```python
import os
import base64
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["Your prompt here"],
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.png")
```

## Image Configuration Options

Control output with `image_config`:

```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
            image_size="2K"  # 1K, 2K, 4K (Pro only for 4K)
        ),
    )
)
```
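
The constraints listed in the comments above can be checked locally before a request is sent. The following is a minimal sketch that assumes only the values named in this document; `validate_image_options` is a hypothetical helper and not part of the `google-genai` SDK.

```python
# Hypothetical helper (not part of google-genai): validates options against
# the values listed in this document before an ImageConfig is built.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}
PRO_MODEL = "gemini-3-pro-image-preview"


def validate_image_options(model, aspect_ratio=None, image_size=None):
    """Return keyword arguments suitable for types.ImageConfig(**kwargs)."""
    kwargs = {}
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
        kwargs["aspect_ratio"] = aspect_ratio
    if image_size is not None:
        if image_size not in IMAGE_SIZES:
            raise ValueError(f"Unsupported image size: {image_size}")
        # This document lists 4K as available on the Pro model only.
        if image_size == "4K" and model != PRO_MODEL:
            raise ValueError("4K output requires the Pro model")
        kwargs["image_size"] = image_size
    return kwargs
```

The returned dictionary is empty when neither option is given, so a caller can skip `image_config` entirely in that case.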

## Editing Images

Pass existing images with text prompts:

```python
from PIL import Image

img = Image.open("input.png")
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["Add a sunset to this scene", img],
)
```

## Multi-Turn Refinement

Use chat for iterative editing:

```python
from google.genai import types

chat = client.chats.create(
    model="gemini-2.5-flash-image",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
```
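
The `# Save ... image...` placeholders above can be filled in with a small helper that writes one file per turn. This is a minimal sketch, assuming each response exposes `parts` whose image parts provide `inline_data` and `as_image()` as in the Core API Pattern; `save_turn_image` and the `turn_<n>.png` naming are hypothetical choices, not part of the SDK.

```python
from pathlib import Path


def save_turn_image(parts, output_dir, turn):
    """Save the last image found in `parts` as turn_<n>.png and return its path.

    `parts` is any iterable of objects with `inline_data` and `as_image()`,
    such as `response.parts` in the examples above. Hypothetical helper.
    """
    image = None
    for part in parts:
        if getattr(part, "inline_data", None) is not None:
            image = part.as_image()
    if image is None:
        # A turn may legitimately return text only; surface that explicitly.
        raise RuntimeError(f"No image returned for turn {turn}")
    path = Path(output_dir) / f"turn_{turn}.png"
    image.save(path)
    return path
```

A call such as `save_turn_image(response.parts, ".", 1)` after each `chat.send_message(...)` keeps every intermediate version on disk, which makes it easy to go back to an earlier refinement.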

## Prompting Best Practices

### Photorealistic Scenes
Include camera details: lens type, lighting, angle, mood.
> "A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"

### Stylized Art
Specify style explicitly:
> "A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"

### Text in Images
Be explicit about font style and placement. Use `gemini-3-pro-image-preview` for best results:
> "Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"

### Product Mockups
Describe lighting setup and surface:
> "Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"

### Landing Pages
Specify layout structure, color scheme, and target audience:
> "Modern landing page hero section, gradient background from deep purple to blue, centered headline with CTA button, clean minimalist design, SaaS product"

> "Landing page for fitness app, energetic layout with workout photos, bright orange and black color scheme, mobile-first design, prominent download buttons"

### Website Design Ideas
Describe overall aesthetic, navigation style, and content hierarchy:
> "E-commerce homepage wireframe, grid layout for products, sticky navigation bar, warm earth tones, plenty of whitespace, professional photography style"

> "Portfolio website for photographer, full-screen image galleries, dark mode interface, elegant serif typography, minimal UI elements to highlight work"

> "Tech startup homepage, glassmorphism design trend, floating cards, neon accent colors on dark background, modern illustrations, hero section with product demo"

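
The prompting guidance above comes down to stating the subject, the style, and the technical details explicitly. One way to keep prompts consistent across a project is to assemble them from named fields. This is only a sketch; `build_prompt` and its parameters are hypothetical and not part of any Gemini tooling.

```python
def build_prompt(subject, *, style=None, details=()):
    """Join a subject, an optional style, and extra details into one prompt."""
    # "<style> of <subject>" when a style is given, otherwise the bare subject.
    pieces = [f"{style} of {subject}" if style else subject]
    # Skip empty or whitespace-only details so the result has no stray commas.
    pieces.extend(d.strip() for d in details if d and d.strip())
    return ", ".join(pieces)


prompt = build_prompt(
    "a happy red panda",
    style="A kawaii-style sticker",
    details=["bold outlines", "cel-shading", "white background"],
)
print(prompt)
```

With these inputs the helper reproduces the Stylized Art example prompt from this section.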
## Advanced Features (Pro Model Only)

### Google Search Grounding
Generate images based on real-time data:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}]
    )
)
```

### Multiple Reference Images (Up to 14)
Combine elements from multiple sources:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)
```

## REST API (curl)

```bash
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "A serene mountain landscape"}]}]
  }' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
```
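
The `jq` and `base64` steps in the pipeline above can also be done in Python with only the standard library. This is a minimal sketch, assuming the response JSON has the `candidates[0].content.parts[].inlineData.data` shape that the `jq` filter relies on; `extract_images` is a hypothetical helper name.

```python
import base64
import json


def extract_images(response_json):
    """Return decoded image bytes for every inlineData part of the first candidate."""
    body = json.loads(response_json)
    parts = body["candidates"][0]["content"]["parts"]
    return [
        base64.b64decode(part["inlineData"]["data"])
        for part in parts
        if "inlineData" in part  # text parts carry no image payload
    ]
```

Writing the first element of the returned list to `output.png` in binary mode gives the same result as the shell pipeline, and the list form also covers responses that contain more than one image.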

## Notes

- All generated images include SynthID watermarks
- Image-only mode (`responseModalities: ["IMAGE"]`) won't work with Google Search grounding
- For editing, describe changes conversationally; the model understands semantic masking
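
The grounding restriction in the notes above can be caught before a request is sent. This is a minimal sketch; `check_modalities` is a hypothetical helper, and the only rule it encodes is the one stated in the note.

```python
def check_modalities(response_modalities, google_search=False):
    """Raise if image-only output is combined with Google Search grounding."""
    modalities = [m.upper() for m in response_modalities]
    if google_search and modalities == ["IMAGE"]:
        raise ValueError(
            "Image-only mode cannot be combined with Google Search grounding"
        )
    return modalities
```

Calling it with `["TEXT", "IMAGE"]` and `google_search=True` returns the list unchanged, while `["IMAGE"]` with grounding enabled fails fast with a clear message.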
157
skills/gemini-imagegen/scripts/compose_images.py
Executable file
@@ -0,0 +1,157 @@
#!/usr/bin/env python3
"""
Compose multiple images into a new image using Gemini API.

Usage:
    python compose_images.py "instruction" output.png image1.png [image2.png ...]

Examples:
    python compose_images.py "Create a group photo of these people" group.png person1.png person2.png
    python compose_images.py "Put the cat from the first image on the couch from the second" result.png cat.png couch.png
    python compose_images.py "Apply the art style from the first image to the scene in the second" styled.png style.png photo.png

Note: Supports up to 14 reference images (Gemini 3 Pro only).

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys

from PIL import Image
from google import genai
from google.genai import types


def compose_images(
    instruction: str,
    output_path: str,
    image_paths: list[str],
    model: str = "gemini-3-pro-image-preview",
    aspect_ratio: str | None = None,
    image_size: str | None = None,
) -> str | None:
    """Compose multiple images based on instructions.

    Args:
        instruction: Text description of how to combine images
        output_path: Path to save the result
        image_paths: List of input image paths (up to 14)
        model: Gemini model to use (pro recommended)
        aspect_ratio: Output aspect ratio
        image_size: Output resolution

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    if len(image_paths) > 14:
        raise ValueError("Maximum 14 reference images supported")

    if len(image_paths) < 1:
        raise ValueError("At least one image is required")

    # Verify all images exist
    for path in image_paths:
        if not os.path.exists(path):
            raise FileNotFoundError(f"Image not found: {path}")

    client = genai.Client(api_key=api_key)

    # Load images
    images = [Image.open(path) for path in image_paths]

    # Build contents: instruction first, then images
    contents = [instruction] + images

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Compose multiple images using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("instruction", help="Composition instruction")
    parser.add_argument("output", help="Output file path")
    parser.add_argument("images", nargs="+", help="Input images (up to 14)")
    parser.add_argument(
        "--model", "-m",
        default="gemini-3-pro-image-preview",
        choices=["gemini-2.5-flash-image", "gemini-3-pro-image-preview"],
        help="Model to use (pro recommended for composition)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Output aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Output resolution"
    )

    args = parser.parse_args()

    try:
        text = compose_images(
            instruction=args.instruction,
            output_path=args.output,
            image_paths=args.images,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Composed image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
144
skills/gemini-imagegen/scripts/edit_image.py
Executable file
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Edit existing images using Gemini API.

Usage:
    python edit_image.py input.png "edit instruction" output.png [options]

Examples:
    python edit_image.py photo.png "Add a rainbow in the sky" edited.png
    python edit_image.py room.jpg "Change the sofa to red leather" room_edited.jpg
    python edit_image.py portrait.png "Make it look like a Van Gogh painting" artistic.png --model gemini-3-pro-image-preview

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys

from PIL import Image
from google import genai
from google.genai import types


def edit_image(
    input_path: str,
    instruction: str,
    output_path: str,
    model: str = "gemini-2.5-flash-image",
    aspect_ratio: str | None = None,
    image_size: str | None = None,
) -> str | None:
    """Edit an existing image based on text instructions.

    Args:
        input_path: Path to the input image
        instruction: Text description of edits to make
        output_path: Path to save the edited image
        model: Gemini model to use
        aspect_ratio: Output aspect ratio
        image_size: Output resolution

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input image not found: {input_path}")

    client = genai.Client(api_key=api_key)

    # Load input image
    input_image = Image.open(input_path)

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=[instruction, input_image],
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated. Check your instruction and try again.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Edit images using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("input", help="Input image path")
    parser.add_argument("instruction", help="Edit instruction")
    parser.add_argument("output", help="Output file path")
    parser.add_argument(
        "--model", "-m",
        default="gemini-2.5-flash-image",
        choices=["gemini-2.5-flash-image", "gemini-3-pro-image-preview"],
        help="Model to use (default: gemini-2.5-flash-image)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Output aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Output resolution"
    )

    args = parser.parse_args()

    try:
        text = edit_image(
            input_path=args.input,
            instruction=args.instruction,
            output_path=args.output,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Edited image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
263
skills/gemini-imagegen/scripts/gemini_images.py
Executable file
@@ -0,0 +1,263 @@
"""
Gemini Image Generation Library

A simple Python library for generating and editing images with the Gemini API.

Usage:
    from gemini_images import GeminiImageGenerator

    gen = GeminiImageGenerator()
    gen.generate("A sunset over mountains", "sunset.png")
    gen.edit("input.png", "Add clouds", "output.png")

Environment:
    GEMINI_API_KEY - Required API key
"""

import os
from pathlib import Path
from typing import Literal

from PIL import Image
from google import genai
from google.genai import types


AspectRatio = Literal["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
ImageSize = Literal["1K", "2K", "4K"]
Model = Literal["gemini-2.5-flash-image", "gemini-3-pro-image-preview"]


class GeminiImageGenerator:
    """High-level interface for Gemini image generation."""

    FLASH = "gemini-2.5-flash-image"
    PRO = "gemini-3-pro-image-preview"

    def __init__(self, api_key: str | None = None, model: Model = FLASH):
        """Initialize the generator.

        Args:
            api_key: Gemini API key (defaults to GEMINI_API_KEY env var)
            model: Default model to use
        """
        self.api_key = api_key or os.environ.get("GEMINI_API_KEY")
        if not self.api_key:
            raise EnvironmentError("GEMINI_API_KEY not set")

        self.client = genai.Client(api_key=self.api_key)
        self.model = model

    def _build_config(
        self,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
        google_search: bool = False,
    ) -> types.GenerateContentConfig:
        """Build generation config."""
        kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

        img_config = {}
        if aspect_ratio:
            img_config["aspect_ratio"] = aspect_ratio
        if image_size:
            img_config["image_size"] = image_size

        if img_config:
            kwargs["image_config"] = types.ImageConfig(**img_config)

        if google_search:
            kwargs["tools"] = [{"google_search": {}}]

        return types.GenerateContentConfig(**kwargs)

    def generate(
        self,
        prompt: str,
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
        google_search: bool = False,
    ) -> tuple[Path, str | None]:
        """Generate an image from a text prompt.

        Args:
            prompt: Text description
            output: Output file path
            model: Override default model
            aspect_ratio: Output aspect ratio
            image_size: Output resolution
            google_search: Enable Google Search grounding (Pro only)

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)
        config = self._build_config(aspect_ratio, image_size, google_search)

        response = self.client.models.generate_content(
            model=model or self.model,
            contents=[prompt],
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def edit(
        self,
        input_image: str | Path | Image.Image,
        instruction: str,
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
    ) -> tuple[Path, str | None]:
        """Edit an existing image.

        Args:
            input_image: Input image (path or PIL Image)
            instruction: Edit instruction
            output: Output file path
            model: Override default model
            aspect_ratio: Output aspect ratio
            image_size: Output resolution

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)

        if isinstance(input_image, (str, Path)):
            input_image = Image.open(input_image)

        config = self._build_config(aspect_ratio, image_size)

        response = self.client.models.generate_content(
            model=model or self.model,
            contents=[instruction, input_image],
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def compose(
        self,
        instruction: str,
        images: list[str | Path | Image.Image],
        output: str | Path,
        *,
        model: Model | None = None,
        aspect_ratio: AspectRatio | None = None,
        image_size: ImageSize | None = None,
    ) -> tuple[Path, str | None]:
        """Compose multiple images into one.

        Args:
            instruction: Composition instruction
            images: List of input images (up to 14)
            output: Output file path
            model: Override default model (Pro recommended)
            aspect_ratio: Output aspect ratio
            image_size: Output resolution

        Returns:
            Tuple of (output path, optional text response)
        """
        output = Path(output)

        # Load images
        loaded = []
        for img in images:
            if isinstance(img, (str, Path)):
                loaded.append(Image.open(img))
            else:
                loaded.append(img)

        config = self._build_config(aspect_ratio, image_size)
        contents = [instruction] + loaded

        response = self.client.models.generate_content(
            model=model or self.PRO,  # Pro recommended for composition
            contents=contents,
            config=config,
        )

        text = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                part.as_image().save(output)

        return output, text

    def chat(self) -> "ImageChat":
        """Start an interactive chat session for iterative refinement."""
        return ImageChat(self.client, self.model)


class ImageChat:
    """Multi-turn chat session for iterative image generation."""

    def __init__(self, client: genai.Client, model: Model):
        self.client = client
        self.model = model
        self._chat = client.chats.create(
            model=model,
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )
        self.current_image: Image.Image | None = None

    def send(
        self,
        message: str,
        image: Image.Image | str | Path | None = None,
    ) -> tuple[Image.Image | None, str | None]:
        """Send a message and optionally an image.

        Returns:
            Tuple of (generated image or None, text response or None)
        """
        contents = [message]
        if image:
            if isinstance(image, (str, Path)):
                image = Image.open(image)
            contents.append(image)

        response = self._chat.send_message(contents)

        text = None
        img = None
        for part in response.parts:
            if part.text:
                text = part.text
            elif part.inline_data:
                img = part.as_image()
                self.current_image = img

        return img, text

    def reset(self):
        """Reset the chat session."""
        self._chat = self.client.chats.create(
            model=self.model,
            config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
        )
        self.current_image = None
133
skills/gemini-imagegen/scripts/generate_image.py
Executable file
@@ -0,0 +1,133 @@
#!/usr/bin/env python3
"""
Generate images from text prompts using Gemini API.

Usage:
    python generate_image.py "prompt" output.png [--model MODEL] [--aspect RATIO] [--size SIZE]

Examples:
    python generate_image.py "A cat in space" cat.png
    python generate_image.py "A logo for Acme Corp" logo.png --model gemini-3-pro-image-preview --aspect 1:1
    python generate_image.py "Epic landscape" landscape.png --aspect 16:9 --size 2K

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys

from google import genai
from google.genai import types


def generate_image(
    prompt: str,
    output_path: str,
    model: str = "gemini-2.5-flash-image",
    aspect_ratio: str | None = None,
    image_size: str | None = None,
) -> str | None:
    """Generate an image from a text prompt.

    Args:
        prompt: Text description of the image to generate
        output_path: Path to save the generated image
        model: Gemini model to use
        aspect_ratio: Aspect ratio (1:1, 16:9, 9:16, etc.)
        image_size: Resolution (1K, 2K, 4K - 4K only for pro model)

    Returns:
        Any text response from the model, or None
    """
    api_key = os.environ.get("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("GEMINI_API_KEY environment variable not set")

    client = genai.Client(api_key=api_key)

    # Build config
    config_kwargs = {"response_modalities": ["TEXT", "IMAGE"]}

    image_config_kwargs = {}
    if aspect_ratio:
        image_config_kwargs["aspect_ratio"] = aspect_ratio
    if image_size:
        image_config_kwargs["image_size"] = image_size

    if image_config_kwargs:
        config_kwargs["image_config"] = types.ImageConfig(**image_config_kwargs)

    config = types.GenerateContentConfig(**config_kwargs)

    response = client.models.generate_content(
        model=model,
        contents=[prompt],
        config=config,
    )

    text_response = None
    image_saved = False

    for part in response.parts:
        if part.text is not None:
            text_response = part.text
        elif part.inline_data is not None:
            image = part.as_image()
            image.save(output_path)
            image_saved = True

    if not image_saved:
        raise RuntimeError("No image was generated. Check your prompt and try again.")

    return text_response


def main():
    parser = argparse.ArgumentParser(
        description="Generate images from text prompts using Gemini API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=__doc__
    )
    parser.add_argument("prompt", help="Text prompt describing the image")
    parser.add_argument("output", help="Output file path (e.g., output.png)")
    parser.add_argument(
        "--model", "-m",
        default="gemini-2.5-flash-image",
        choices=["gemini-2.5-flash-image", "gemini-3-pro-image-preview"],
        help="Model to use (default: gemini-2.5-flash-image)"
    )
    parser.add_argument(
        "--aspect", "-a",
        choices=["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"],
        help="Aspect ratio"
    )
    parser.add_argument(
        "--size", "-s",
        choices=["1K", "2K", "4K"],
        help="Image resolution (4K only available with pro model)"
    )

    args = parser.parse_args()

    try:
        text = generate_image(
            prompt=args.prompt,
            output_path=args.output,
            model=args.model,
            aspect_ratio=args.aspect,
            image_size=args.size,
        )

        print(f"Image saved to: {args.output}")
        if text:
            print(f"Model response: {text}")

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
216
skills/gemini-imagegen/scripts/multi_turn_chat.py
Executable file
@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""
Interactive multi-turn image generation and refinement using Gemini API.

Usage:
    python multi_turn_chat.py [--model MODEL] [--output-dir DIR]

This starts an interactive session where you can:
- Generate images from prompts
- Iteratively refine images through conversation
- Load existing images for editing
- Save images at any point

Commands:
    /save [filename] - Save current image
    /load <path> - Load an image into the conversation
    /clear - Start fresh conversation
    /quit - Exit

Environment:
    GEMINI_API_KEY - Required API key
"""

import argparse
import os
import sys
from datetime import datetime
from pathlib import Path

from PIL import Image
from google import genai
from google.genai import types


class ImageChat:
    """Interactive chat session for image generation and refinement."""

    def __init__(
        self,
        model: str = "gemini-2.5-flash-image",
        output_dir: str = ".",
    ):
        api_key = os.environ.get("GEMINI_API_KEY")
        if not api_key:
            raise EnvironmentError("GEMINI_API_KEY environment variable not set")

        self.client = genai.Client(api_key=api_key)
        self.model = model
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

        self.chat = None
        self.current_image = None
        self.image_count = 0

        self._init_chat()

    def _init_chat(self):
        """Initialize or reset the chat session."""
        config = types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"]
        )
        self.chat = self.client.chats.create(
            model=self.model,
            config=config,
        )
        self.current_image = None

    def send_message(self, message: str, image: Image.Image | None = None) -> tuple[str | None, Image.Image | None]:
        """Send a message and optionally an image, return response text and image."""
        contents = []
        if message:
            contents.append(message)
        if image:
            contents.append(image)

        if not contents:
            return None, None

        response = self.chat.send_message(contents)

        text_response = None
        image_response = None

        for part in response.parts:
            if part.text is not None:
                text_response = part.text
            elif part.inline_data is not None:
                image_response = part.as_image()
                self.current_image = image_response

        return text_response, image_response

    def save_image(self, filename: str | None = None) -> str | None:
        """Save the current image to a file."""
        if self.current_image is None:
            return None

        if filename is None:
            self.image_count += 1
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"image_{timestamp}_{self.image_count}.png"

        filepath = self.output_dir / filename
        self.current_image.save(filepath)
        return str(filepath)

    def load_image(self, path: str) -> Image.Image:
        """Load an image from disk."""
        img = Image.open(path)
        self.current_image = img
        return img


def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Interactive multi-turn image generation",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog=__doc__
|
||||
)
|
||||
parser.add_argument(
|
||||
"--model", "-m",
|
||||
default="gemini-2.5-flash-image",
|
||||
choices=["gemini-2.5-flash-image", "gemini-3-pro-image-preview"],
|
||||
help="Model to use"
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output-dir", "-o",
|
||||
default=".",
|
||||
help="Directory to save images"
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
try:
|
||||
chat = ImageChat(model=args.model, output_dir=args.output_dir)
|
||||
except Exception as e:
|
||||
print(f"Error initializing: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
print(f"Gemini Image Chat ({args.model})")
|
||||
print("Commands: /save [name], /load <path>, /clear, /quit")
|
||||
print("-" * 50)
|
||||
|
||||
while True:
|
||||
try:
|
||||
user_input = input("\nYou: ").strip()
|
||||
except (EOFError, KeyboardInterrupt):
|
||||
print("\nGoodbye!")
|
||||
break
|
||||
|
||||
if not user_input:
|
||||
continue
|
||||
|
||||
# Handle commands
|
||||
if user_input.startswith("/"):
|
||||
parts = user_input.split(maxsplit=1)
|
||||
cmd = parts[0].lower()
|
||||
arg = parts[1] if len(parts) > 1 else None
|
||||
|
||||
if cmd == "/quit":
|
||||
print("Goodbye!")
|
||||
break
|
||||
|
||||
elif cmd == "/clear":
|
||||
chat._init_chat()
|
||||
print("Conversation cleared.")
|
||||
continue
|
||||
|
||||
elif cmd == "/save":
|
||||
path = chat.save_image(arg)
|
||||
if path:
|
||||
print(f"Image saved to: {path}")
|
||||
else:
|
||||
print("No image to save.")
|
||||
continue
|
||||
|
||||
elif cmd == "/load":
|
||||
if not arg:
|
||||
print("Usage: /load <path>")
|
||||
continue
|
||||
try:
|
||||
chat.load_image(arg)
|
||||
print(f"Loaded: {arg}")
|
||||
print("You can now describe edits to make.")
|
||||
except Exception as e:
|
||||
print(f"Error loading image: {e}")
|
||||
continue
|
||||
|
||||
else:
|
||||
print(f"Unknown command: {cmd}")
|
||||
continue
|
||||
|
||||
# Send message to model
|
||||
try:
|
||||
# If we have a loaded image and this is first message, include it
|
||||
image_to_send = None
|
||||
if chat.current_image and not chat.chat.history:
|
||||
image_to_send = chat.current_image
|
||||
|
||||
text, image = chat.send_message(user_input, image_to_send)
|
||||
|
||||
if text:
|
||||
print(f"\nGemini: {text}")
|
||||
|
||||
if image:
|
||||
# Auto-save
|
||||
path = chat.save_image()
|
||||
print(f"\n[Image generated: {path}]")
|
||||
|
||||
except Exception as e:
|
||||
print(f"\nError: {e}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
202
skills/webapp-testing/LICENSE.txt
Normal file
@@ -0,0 +1,202 @@
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
96
skills/webapp-testing/SKILL.md
Normal file
@@ -0,0 +1,96 @@
|
||||
---
|
||||
name: webapp-testing
|
||||
description: Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
# Web Application Testing
|
||||
|
||||
To test local web applications, write native Python Playwright scripts.
|
||||
|
||||
**Helper Scripts Available**:
|
||||
- `scripts/with_server.py` - Manages server lifecycle (supports multiple servers)
|
||||
|
||||
**Always run scripts with `--help` first** to see usage. DO NOT read the source until you have tried running the script and found that a customized solution is absolutely necessary. These scripts can be very large and can pollute your context window. They exist to be called directly as black-box scripts rather than ingested into your context window.
|
||||
|
||||
## Decision Tree: Choosing Your Approach
|
||||
|
||||
```
|
||||
User task → Is it static HTML?
    ├─ Yes → Read HTML file directly to identify selectors
    │         ├─ Success → Write Playwright script using selectors
    │         └─ Fails/Incomplete → Treat as dynamic (below)
    │
    └─ No (dynamic webapp) → Is the server already running?
        ├─ No → Run: python scripts/with_server.py --help
        │        Then use the helper + write simplified Playwright script
        │
        └─ Yes → Reconnaissance-then-action:
            1. Navigate and wait for networkidle
            2. Take screenshot or inspect DOM
            3. Identify selectors from rendered state
            4. Execute actions with discovered selectors
|
||||
```
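
For the static-HTML branch, "read the HTML file directly to identify selectors" can be done with the standard library alone. The following is a minimal sketch, not one of the skill's bundled scripts: `SelectorCollector` and `candidate_selectors` are hypothetical names, and the set of tags and the `#id` / `tag[name="..."]` selector forms are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class SelectorCollector(HTMLParser):
    """Collect simple CSS selectors for interactive elements (hypothetical helper)."""

    INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.INTERACTIVE_TAGS:
            return
        attributes = dict(attrs)
        element_id = attributes.get("id")
        name = attributes.get("name")
        if element_id:
            # An id is the most specific handle available.
            self.selectors.append(f"#{element_id}")
        elif name:
            self.selectors.append(f'{tag}[name="{name}"]')
        else:
            # Fall back to the bare tag; refine by hand if it is ambiguous.
            self.selectors.append(tag)


def candidate_selectors(html_text):
    """Return candidate selectors in document order."""
    collector = SelectorCollector()
    collector.feed(html_text)
    return collector.selectors
```

If the list comes back empty or incomplete, that is a hint the page is rendered by JavaScript and should be treated as dynamic.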
|
||||
|
||||
## Example: Using with_server.py
|
||||
|
||||
To start a server, run `--help` first, then use the helper:
|
||||
|
||||
**Single server:**
|
||||
```bash
|
||||
python scripts/with_server.py --server "npm run dev" --port 5173 -- python your_automation.py
|
||||
```
|
||||
|
||||
**Multiple servers (e.g., backend + frontend):**
|
||||
```bash
|
||||
python scripts/with_server.py \
  --server "cd backend && python server.py" --port 3000 \
  --server "cd frontend && npm run dev" --port 5173 \
  -- python your_automation.py
|
||||
```
|
||||
|
||||
To create an automation script, include only Playwright logic (servers are managed automatically):
|
||||
```python
|
||||
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # Always launch chromium in headless mode
    page = browser.new_page()
    page.goto('http://localhost:5173')  # Server already running and ready
    page.wait_for_load_state('networkidle')  # CRITICAL: Wait for JS to execute
    # ... your automation logic
    browser.close()
|
||||
```
|
||||
|
||||
## Reconnaissance-Then-Action Pattern
|
||||
|
||||
1. **Inspect rendered DOM**:
   ```python
   page.screenshot(path='/tmp/inspect.png', full_page=True)
   content = page.content()
   page.locator('button').all()
   ```
|
||||
|
||||
2. **Identify selectors** from inspection results
|
||||
|
||||
3. **Execute actions** using discovered selectors
|
||||
|
||||
## Common Pitfall
|
||||
|
||||
❌ **Don't** inspect the DOM before waiting for `networkidle` on dynamic apps
|
||||
✅ **Do** wait for `page.wait_for_load_state('networkidle')` before inspection
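
One way to make the wait hard to forget is to wrap navigation and the wait in a single helper. This is a sketch under assumptions: `goto_and_settle` is a hypothetical name, and `page` is assumed to be a Playwright sync-API `Page`. Only `page.goto()`, `page.wait_for_load_state()`, and `page.content()` are called.

```python
def goto_and_settle(page, url):
    """Navigate, wait for network activity to stop, then return the rendered HTML.

    `page` is assumed to be a Playwright sync-API Page object.
    """
    page.goto(url)
    # Wait first: inspecting earlier may capture the DOM before JS has rendered it.
    page.wait_for_load_state("networkidle")
    return page.content()
```

Inspection code then works from the returned HTML, or from locators created after the call, rather than from a page that is still loading.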
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **Use bundled scripts as black boxes** - To accomplish a task, consider whether one of the scripts available in `scripts/` can help. These scripts handle common, complex workflows reliably without cluttering the context window. Use `--help` to see usage, then invoke directly.
|
||||
- Use `sync_playwright()` for synchronous scripts
|
||||
- Always close the browser when done
|
||||
- Use descriptive selectors: `text=`, `role=`, CSS selectors, or IDs
|
||||
- Add appropriate waits: `page.wait_for_selector()` or `page.wait_for_timeout()`
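
The "always close the browser" practice is easiest to honor with `try`/`finally`, so the browser is closed even when the automation raises. A minimal sketch, assuming `playwright` is the object yielded by `sync_playwright()`; `run_with_browser` and `action` are hypothetical names introduced here for illustration.

```python
def run_with_browser(playwright, action):
    """Launch headless Chromium, run action(page), and always close the browser.

    `playwright` is assumed to be the object yielded by sync_playwright().
    """
    browser = playwright.chromium.launch(headless=True)
    try:
        page = browser.new_page()
        return action(page)
    finally:
        # Runs even if action() raises, so no browser process is left behind.
        browser.close()
```

With Playwright installed, this could be used as `with sync_playwright() as p: run_with_browser(p, my_action)`, where `my_action` is any function that takes a page.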
|
||||
|
||||
## Reference Files
|
||||
|
||||
- **examples/** - Examples showing common patterns:
|
||||
- `element_discovery.py` - Discovering buttons, links, and inputs on a page
|
||||
- `static_html_automation.py` - Using file:// URLs for local HTML
|
||||
- `console_logging.py` - Capturing console logs during automation
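
For the `file://` case, the URL can also be built with `pathlib`, which percent-encodes characters such as spaces. A minimal sketch: `to_file_url` is a hypothetical helper name, not something defined in `static_html_automation.py`.

```python
from pathlib import Path


def to_file_url(path):
    """Return an absolute file:// URL for a local path."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()
```

The result can be passed to `page.goto()` in the same way as a hand-built `file://` string.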
|
||||
35
skills/webapp-testing/examples/console_logging.py
Normal file
@@ -0,0 +1,35 @@
|
||||
from playwright.sync_api import sync_playwright

# Example: Capturing console logs during browser automation

url = 'http://localhost:5173'  # Replace with your URL

console_logs = []

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})

    # Set up console log capture
    def handle_console_message(msg):
        console_logs.append(f"[{msg.type}] {msg.text}")
        print(f"Console: [{msg.type}] {msg.text}")

    page.on("console", handle_console_message)

    # Navigate to page
    page.goto(url)
    page.wait_for_load_state('networkidle')

    # Interact with the page (triggers console logs)
    page.click('text=Dashboard')
    page.wait_for_timeout(1000)

    browser.close()

# Save console logs to file
with open('/mnt/user-data/outputs/console.log', 'w') as f:
    f.write('\n'.join(console_logs))

print(f"\nCaptured {len(console_logs)} console messages")
print(f"Logs saved to: /mnt/user-data/outputs/console.log")
|
||||
40
skills/webapp-testing/examples/element_discovery.py
Normal file
@@ -0,0 +1,40 @@
|
||||
from playwright.sync_api import sync_playwright

# Example: Discovering buttons and other elements on a page

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Navigate to page and wait for it to fully load
    page.goto('http://localhost:5173')
    page.wait_for_load_state('networkidle')

    # Discover all buttons on the page
    buttons = page.locator('button').all()
    print(f"Found {len(buttons)} buttons:")
    for i, button in enumerate(buttons):
        text = button.inner_text() if button.is_visible() else "[hidden]"
        print(f" [{i}] {text}")

    # Discover links
    links = page.locator('a[href]').all()
    print(f"\nFound {len(links)} links:")
    for link in links[:5]:  # Show first 5
        text = link.inner_text().strip()
        href = link.get_attribute('href')
        print(f" - {text} -> {href}")

    # Discover input fields
    inputs = page.locator('input, textarea, select').all()
    print(f"\nFound {len(inputs)} input fields:")
    for input_elem in inputs:
        name = input_elem.get_attribute('name') or input_elem.get_attribute('id') or "[unnamed]"
        input_type = input_elem.get_attribute('type') or 'text'
        print(f" - {name} ({input_type})")

    # Take screenshot for visual reference
    page.screenshot(path='/tmp/page_discovery.png', full_page=True)
    print("\nScreenshot saved to /tmp/page_discovery.png")

    browser.close()
|
||||
33
skills/webapp-testing/examples/static_html_automation.py
Normal file
@@ -0,0 +1,33 @@
|
||||
from playwright.sync_api import sync_playwright
import os

# Example: Automating interaction with static HTML files using file:// URLs

html_file_path = os.path.abspath('path/to/your/file.html')
file_url = f'file://{html_file_path}'

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={'width': 1920, 'height': 1080})

    # Navigate to local HTML file
    page.goto(file_url)

    # Take screenshot
    page.screenshot(path='/mnt/user-data/outputs/static_page.png', full_page=True)

    # Interact with elements
    page.click('text=Click Me')
    page.fill('#name', 'John Doe')
    page.fill('#email', 'john@example.com')

    # Submit form
    page.click('button[type="submit"]')
    page.wait_for_timeout(500)

    # Take final screenshot
    page.screenshot(path='/mnt/user-data/outputs/after_submit.png', full_page=True)

    browser.close()

print("Static HTML automation completed!")
|
||||
106
skills/webapp-testing/scripts/with_server.py
Executable file
@@ -0,0 +1,106 @@
|
||||
#!/usr/bin/env python3
"""
Start one or more servers, wait for them to be ready, run a command, then clean up.

Usage:
    # Single server
    python scripts/with_server.py --server "npm run dev" --port 5173 -- python automation.py
    python scripts/with_server.py --server "npm start" --port 3000 -- python test.py

    # Multiple servers
    python scripts/with_server.py \
      --server "cd backend && python server.py" --port 3000 \
      --server "cd frontend && npm run dev" --port 5173 \
      -- python test.py
"""

import subprocess
import socket
import time
import sys
import argparse

def is_server_ready(port, timeout=30):
    """Wait for server to be ready by polling the port."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        try:
            with socket.create_connection(('localhost', port), timeout=1):
                return True
        except (socket.error, ConnectionRefusedError):
            time.sleep(0.5)
    return False


def main():
    parser = argparse.ArgumentParser(description='Run command with one or more servers')
    parser.add_argument('--server', action='append', dest='servers', required=True, help='Server command (can be repeated)')
    parser.add_argument('--port', action='append', dest='ports', type=int, required=True, help='Port for each server (must match --server count)')
    parser.add_argument('--timeout', type=int, default=30, help='Timeout in seconds per server (default: 30)')
    parser.add_argument('command', nargs=argparse.REMAINDER, help='Command to run after server(s) ready')

    args = parser.parse_args()

    # Remove the '--' separator if present
    if args.command and args.command[0] == '--':
        args.command = args.command[1:]

    if not args.command:
        print("Error: No command specified to run")
        sys.exit(1)

    # Parse server configurations
    if len(args.servers) != len(args.ports):
        print("Error: Number of --server and --port arguments must match")
        sys.exit(1)

    servers = []
    for cmd, port in zip(args.servers, args.ports):
        servers.append({'cmd': cmd, 'port': port})

    server_processes = []

    try:
        # Start all servers
        for i, server in enumerate(servers):
            print(f"Starting server {i+1}/{len(servers)}: {server['cmd']}")

            # Use shell=True to support commands with cd and &&
            process = subprocess.Popen(
                server['cmd'],
                shell=True,
                # Discard output: a PIPE that is never read can fill up and block the server
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL
            )
            server_processes.append(process)

            # Wait for this server to be ready
            print(f"Waiting for server on port {server['port']}...")
            if not is_server_ready(server['port'], timeout=args.timeout):
                raise RuntimeError(f"Server failed to start on port {server['port']} within {args.timeout}s")

            print(f"Server ready on port {server['port']}")

        print(f"\nAll {len(servers)} server(s) ready")

        # Run the command
        print(f"Running: {' '.join(args.command)}\n")
        result = subprocess.run(args.command)
        sys.exit(result.returncode)

    finally:
        # Clean up all servers
        print(f"\nStopping {len(server_processes)} server(s)...")
        for i, process in enumerate(server_processes):
            try:
                process.terminate()
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()
                process.wait()
            print(f"Server {i+1} stopped")
        print("All servers stopped")


if __name__ == '__main__':
    main()
|
||||