Image Generation Reference

Comprehensive guide to image creation, editing, and composition using the Gemini API.

Core Capabilities

Text-to-Image: Generate images from text prompts
Image Editing: Modify existing images with text instructions
Multi-Image Composition: Combine up to 3 images
Iterative Refinement: Refine images conversationally
Aspect Ratios: Multiple formats (1:1, 16:9, 9:16, 4:3, 3:4)
Style Control: Control artistic style and quality
Text in Images: Limited text rendering (max 25 chars)

Model

gemini-2.5-flash-image - Specialized for image generation

Input tokens: 65,536
Output tokens: 32,768
Knowledge cutoff: June 2025
Supports: Text and image inputs, image outputs

Quick Start

Basic Generation

from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        # aspect_ratio must be nested inside image_config
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save image
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Aspect Ratios

Ratio	Resolution	Use Case	Token Cost
1:1	1024×1024	Social media, avatars	1290
16:9	1344×768	Landscapes, banners	1290
9:16	768×1344	Mobile, portraits	1290
4:3	1152×896	Traditional media	1290
3:4	896×1152	Vertical posters	1290

All ratios cost the same: 1,290 tokens per image.
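When the target dimensions are known but the ratio is not, the table above can be used as a lookup. The following is a minimal sketch, not part of the API: the `RESOLUTIONS` dictionary and the `closest_aspect_ratio` helper are hypothetical names introduced here, and the values are copied from the table.

```python
# Supported ratios and resolutions, copied from the table above.
RESOLUTIONS = {
    '1:1': (1024, 1024),
    '16:9': (1344, 768),
    '9:16': (768, 1344),
    '4:3': (1152, 896),
    '3:4': (896, 1152),
}


def closest_aspect_ratio(width, height):
    """Return the supported ratio string nearest to width/height."""
    target = width / height

    def distance(ratio):
        w, h = ratio.split(':')
        return abs(int(w) / int(h) - target)

    return min(RESOLUTIONS, key=distance)


print(closest_aspect_ratio(1920, 1080))  # 16:9
print(closest_aspect_ratio(1080, 1350))  # 3:4
```

The returned string can then be passed as the aspect ratio in the request config.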

Response Modalities

Image Only

config = types.GenerateContentConfig(
    response_modalities=['Image'],
    image_config=types.ImageConfig(aspect_ratio='1:1')
)

Text Only (No Image)

config = types.GenerateContentConfig(
    response_modalities=['Text']
)
# Returns text description instead of generating image

Both Image and Text

config = types.GenerateContentConfig(
    response_modalities=['Image', 'Text'],
    image_config=types.ImageConfig(aspect_ratio='16:9')
)
# Returns both generated image and description
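When both modalities are requested, the response parts may mix text and image data in any order. The sketch below separates them. `split_parts` is a hypothetical helper, and it assumes each part exposes `text` and `inline_data.data` attributes (only `inline_data.data` appears elsewhere in this guide). Stand-in objects are used so the example runs without an API call.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate response parts into text strings and raw image bytes."""
    texts, images = [], []
    for part in parts:
        inline = getattr(part, 'inline_data', None)
        if inline is not None and inline.data:
            images.append(inline.data)
        elif getattr(part, 'text', None):
            texts.append(part.text)
    return texts, images


# Stand-in parts for illustration; real ones would come from
# response.candidates[0].content.parts
fake_parts = [
    SimpleNamespace(text='A sunset over hills', inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b'\x89PNG')),
]
texts, images = split_parts(fake_parts)
print(texts)        # ['A sunset over hills']
print(len(images))  # 1
```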

Image Editing

Modify Existing Image

import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

Style Transfer

img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)

Object Addition/Removal

# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)

Multi-Image Composition

Combine Multiple Images

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

Note: For best results, pass at most 3 input images.
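The 3-image limit can be enforced before a request is sent. This is a minimal sketch: `build_composition_contents` and `MAX_INPUT_IMAGES` are hypothetical names, and plain strings stand in for the `PIL.Image` objects used above.

```python
MAX_INPUT_IMAGES = 3  # limit stated in this guide


def build_composition_contents(instruction, images):
    """Return a contents list: the instruction followed by the images."""
    if not images:
        raise ValueError('at least one input image is required')
    if len(images) > MAX_INPUT_IMAGES:
        raise ValueError(
            f'got {len(images)} images; at most {MAX_INPUT_IMAGES} are supported'
        )
    return [instruction, *images]


# Placeholder strings stand in for loaded images in this illustration
contents = build_composition_contents(
    'Combine these images into a cohesive scene',
    ['background', 'foreground', 'overlay'],
)
print(len(contents))  # 4
```

The resulting list has the same shape as the `contents` argument in the composition example.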

Prompt Engineering

Effective Prompt Structure

Three key elements:

Subject: What to generate
Context: Environmental setting
Style: Artistic treatment

Example: "A robot [subject] in a futuristic city [context], cyberpunk style with neon lighting [style]"
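The subject/context/style structure can be captured in a small helper so prompts stay consistent. The sketch below is one possible way to assemble them, and `build_prompt` is a hypothetical name.

```python
def build_prompt(subject, context, style):
    """Join the three prompt elements in subject, context, style order."""
    return f'{subject.strip()} {context.strip()}, {style.strip()}'


prompt = build_prompt(
    subject='A robot',
    context='in a futuristic city',
    style='cyberpunk style with neon lighting',
)
print(prompt)
# A robot in a futuristic city, cyberpunk style with neon lighting
```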

Quality Modifiers

Technical terms:

"4K", "8K", "high resolution"
"HDR", "high dynamic range"
"professional photography"
"studio lighting"
"ultra detailed"

Camera settings:

"35mm lens", "50mm lens"
"shallow depth of field"
"wide angle shot"
"macro photography"
"golden hour lighting"

Style Keywords

Art styles:

"oil painting", "watercolor", "sketch"
"digital art", "concept art"
"photorealistic", "hyperrealistic"
"minimalist", "abstract"
"cyberpunk", "steampunk", "fantasy"

Mood and atmosphere:

"dramatic lighting", "soft lighting"
"moody", "bright and cheerful"
"mysterious", "whimsical"
"dark and gritty", "pastel colors"

Subject Description

Be specific:

❌ "A cat"
✅ "A fluffy orange tabby cat with green eyes"

Add context:

❌ "A building"
✅ "A modern glass skyscraper reflecting sunset clouds"

Include details:

❌ "A person"
✅ "A young woman in a red dress holding an umbrella"

Composition and Framing

Camera angles:

"bird's eye view", "aerial shot"
"low angle", "high angle"
"close-up", "wide shot"
"centered composition"
"rule of thirds"

Perspective:

"first person view"
"third person perspective"
"isometric view"
"forced perspective"

Text in Images

Limitations:

Maximum 25 characters total
Up to 3 distinct text phrases
Works best with simple text

Best practices:

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A vintage poster with bold text "EXPLORE" at the top, mountain landscape, retro 1950s style'
)

Font control:

"bold sans-serif title"
"handwritten script"
"vintage letterpress"
"modern minimalist font"
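The text limits above can be checked before sending a prompt. This sketch assumes the text to render is wrapped in double quotes, as in the poster example, and that every character inside the quotes counts toward the 25-character budget. Neither assumption is confirmed by the API, and `check_text_limits` is a hypothetical helper.

```python
import re

MAX_TEXT_CHARS = 25  # total characters, per the limits above
MAX_PHRASES = 3      # distinct phrases, per the limits above


def check_text_limits(prompt):
    """Return a list of problems with the quoted text in a prompt."""
    phrases = re.findall(r'"([^"]*)"', prompt)
    total_chars = sum(len(p) for p in phrases)
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f'{len(phrases)} phrases (max {MAX_PHRASES})')
    if total_chars > MAX_TEXT_CHARS:
        problems.append(f'{total_chars} characters (max {MAX_TEXT_CHARS})')
    return problems


ok = check_text_limits('A vintage poster with bold text "EXPLORE" at the top')
too_long = check_text_limits('A banner reading "THE QUICK BROWN FOX JUMPS OVER"')
print(ok)        # []
print(too_long)  # ['30 characters (max 25)']
```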

Advanced Techniques

Iterative Refinement

# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)

# Save first version (the first part may be text, so look for image data)
for part in response1.candidates[0].content.parts:
    if part.inline_data:
        with open('v1.png', 'wb') as f:
            f.write(part.inline_data.data)
        break

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)

Negative Prompts (Indirect)

# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)

Consistent Style Across Images

base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"

prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]

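for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character's image
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open(f'character-{i}.png', 'wb') as f:
                f.write(part.inline_data.data)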

Safety Settings

Configure Safety Filters

config = types.GenerateContentConfig(
    response_modalities=['Image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

Available Categories

HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_SEXUALLY_EXPLICIT

Thresholds

BLOCK_NONE: No blocking
BLOCK_LOW_AND_ABOVE: Block low probability and above
BLOCK_MEDIUM_AND_ABOVE: Block medium and above (default)
BLOCK_ONLY_HIGH: Block only high probability
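The thresholds read as "block at this probability level and higher". The sketch below only models that ordering locally; it does not call the API. The `NEGLIGIBLE` level is an assumption added for completeness, and `is_blocked` is a hypothetical helper.

```python
# Probability levels, lowest to highest (NEGLIGIBLE is an assumed floor)
LEVELS = ['NEGLIGIBLE', 'LOW', 'MEDIUM', 'HIGH']

# Lowest level that each threshold blocks; None means nothing is blocked
MIN_BLOCKED = {
    'BLOCK_NONE': None,
    'BLOCK_LOW_AND_ABOVE': 'LOW',
    'BLOCK_MEDIUM_AND_ABOVE': 'MEDIUM',
    'BLOCK_ONLY_HIGH': 'HIGH',
}


def is_blocked(probability, threshold):
    """Return True if content at this probability is blocked by the threshold."""
    floor = MIN_BLOCKED[threshold]
    if floor is None:
        return False
    return LEVELS.index(probability) >= LEVELS.index(floor)


print(is_blocked('MEDIUM', 'BLOCK_MEDIUM_AND_ABOVE'))  # True
print(is_blocked('MEDIUM', 'BLOCK_ONLY_HIGH'))         # False
print(is_blocked('HIGH', 'BLOCK_NONE'))                # False
```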

Common Use Cases

1. Marketing Assets

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
    - Sleek smartphone on minimalist white surface
    - Dramatic side lighting creating subtle shadows
    - Shallow depth of field, crisp focus
    - Clean, modern aesthetic
    - 4K quality
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)

2. Concept Art

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
    - Ancient floating islands connected by chains
    - Waterfalls cascading into clouds below
    - Magical crystals glowing on the islands
    - Epic scale, dramatic lighting
    - Detailed digital painting style
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

3. Social Media Graphics

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
    - Pastel gradient background (pink to blue)
    - Motivational quote layout
    - Modern minimalist style
    - Clean typography
    - Mobile-friendly composition
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='1:1')
    )
)

4. Illustration

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
    - Friendly cartoon dragon reading a book
    - Bright, cheerful colors
    - Soft, rounded shapes
    - Whimsical forest background
    - Warm, inviting atmosphere
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)

5. UI/UX Mockups

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
    - Clean dashboard design
    - Card-based layout
    - Soft shadows and gradients
    - Contemporary color scheme (blue and white)
    - Professional fintech aesthetic
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='9:16')
    )
)

Best Practices

Prompt Quality

Be specific: More detail = better results
Order matters: Most important elements first
Use examples: Reference known styles or artists
Avoid contradictions: Don't ask for opposing styles
Test and iterate: Refine prompts based on results

File Management

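import time

# Save with descriptive names
aspect_ratio = '16:9'
timestamp = int(time.time())
# ':' is not allowed in Windows filenames, so write the ratio as 16x9
filename = f"generated_{timestamp}_{aspect_ratio.replace(':', 'x')}.png"

with open(filename, 'wb') as f:
    f.write(image_data)  # image_data: bytes from part.inline_data.data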

Cost Optimization

Token costs:

1 image: 1,290 tokens ≈ $0.039 (Flash Image output at $30 per 1M tokens; check current pricing)
10 images: 12,900 tokens ≈ $0.39
100 images: 129,000 tokens ≈ $3.87
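The arithmetic behind these figures is a flat per-image token count multiplied by a per-token rate. In the sketch below the rate is passed in by the caller instead of being hard-coded, since published pricing can change. `image_tokens` and `estimate_cost_usd` are hypothetical helpers.

```python
TOKENS_PER_IMAGE = 1290  # flat cost per generated image, per this guide


def image_tokens(n_images):
    """Total output tokens for n generated images."""
    return n_images * TOKENS_PER_IMAGE


def estimate_cost_usd(n_images, usd_per_million_tokens):
    """Estimated cost in USD at the given rate per 1M output tokens."""
    return image_tokens(n_images) * usd_per_million_tokens / 1_000_000


print(image_tokens(1))    # 1290
print(image_tokens(100))  # 129000
```

Call `estimate_cost_usd(n, rate)` with the rate currently published for the model to turn the token count into a dollar figure.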

Strategies:

Generate fewer iterations
Use text modality first to validate concept
Batch similar requests
Cache prompts for consistent style

Error Handling

Safety Filter Blocking

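try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except Exception as e:
    # API-level failures (auth, quota, invalid request) surface as exceptions
    print(f"Request failed: {e}")
else:
    # A blocked prompt typically comes back as a response with
    # prompt_feedback set and no candidates, rather than as an exception
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry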

Token Limit Exceeded

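# Keep prompts concise. Note that len() counts characters, not tokens,
# so 1000 is only a rough guard, not the model's actual token limit.
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]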
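A hard slice can cut a prompt in the middle of a word. The sketch below is one alternative that backs up to the previous space. `truncate_prompt` is a hypothetical helper, and the limit is still measured in characters, not tokens.

```python
def truncate_prompt(prompt, max_chars=1000):
    """Shorten a prompt to max_chars without splitting a word."""
    if len(prompt) <= max_chars:
        return prompt
    cut = prompt[:max_chars]
    if not prompt[max_chars].isspace():
        # The cut landed inside a word; back up to the previous space
        head, sep, _ = cut.rpartition(' ')
        if sep:
            cut = head
    # Drop trailing spaces or punctuation left at the cut point
    return cut.rstrip(' ,;:')


print(truncate_prompt('a red balloon floating in the sky', 12))  # a red
print(truncate_prompt('a red balloon floating in the sky', 13))  # a red balloon
```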

Limitations

Maximum 3 input images for composition
Text rendering limited (25 chars max)
No video or animation generation
Regional restrictions (uploading images of children is not supported in the EEA, Switzerland, and the UK)
Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
No real-time generation
Cannot perfectly replicate specific people or copyrighted characters

Troubleshooting

aspect_ratio Parameter Error

Error: Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]

Cause: The aspect_ratio parameter must be nested inside an image_config object, not passed directly to GenerateContentConfig.

Incorrect Usage:

# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['image'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)

Correct Usage:

# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['Image'],  # Note: Capital 'I'
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)

Response Modality Case Sensitivity

The response_modalities parameter expects capitalized values (first letter uppercase):

✅ Correct: ['Image'], ['Text'], ['Image', 'Text']
❌ Wrong: ['image'], ['text']
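To avoid this class of error, modality names can be normalized before the config is built. A minimal sketch follows; `normalize_modalities` is a hypothetical helper, not part of the SDK.

```python
def normalize_modalities(modalities):
    """Map modality names of any case to the capitalized form."""
    allowed = {'image': 'Image', 'text': 'Text'}
    normalized = []
    for name in modalities:
        key = name.strip().lower()
        if key not in allowed:
            raise ValueError(f'unsupported modality: {name!r}')
        normalized.append(allowed[key])
    return normalized


print(normalize_modalities(['image', 'TEXT']))  # ['Image', 'Text']
```

The result can be passed directly as the `response_modalities` value.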
