Image Generation Reference
Comprehensive guide to image creation, editing, and composition using the Gemini API.
Core Capabilities
- Text-to-Image: Generate images from text prompts
- Image Editing: Modify existing images with text instructions
- Multi-Image Composition: Combine up to 3 images
- Iterative Refinement: Refine images conversationally
- Aspect Ratios: Multiple formats (1:1, 16:9, 9:16, 4:3, 3:4)
- Style Control: Control artistic style and quality
- Text in Images: Limited text rendering (max 25 chars)
Model
gemini-2.5-flash-image - Specialized for image generation
- Input tokens: 65,536
- Output tokens: 32,768
- Knowledge cutoff: June 2025
- Supports: Text and image inputs, image outputs
Quick Start
Basic Generation
from google import genai
from google.genai import types
import os
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        # aspect_ratio must be nested inside image_config (see Troubleshooting)
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
# Save each image part returned by the model
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
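The save loop above can be factored into a small reusable helper. The sketch below is not part of the SDK: `save_image_parts` is a hypothetical name, and it assumes each part exposes `inline_data` with `data` and `mime_type` attributes, as the Quick Start response does. It picks a file extension from the MIME type instead of always writing `.png`.

```python
from pathlib import Path

# Extension lookup for the MIME types we expect; anything else falls back to .bin
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(parts, out_dir, prefix="output"):
    """Write every part carrying inline image bytes to out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, part in enumerate(parts):
        blob = getattr(part, "inline_data", None)
        if not blob or not blob.data:
            continue  # text parts carry no inline_data
        ext = _EXTENSIONS.get(getattr(blob, "mime_type", None), ".bin")
        path = out_dir / f"{prefix}-{i}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

With a real response this would be called as `save_image_parts(response.candidates[0].content.parts, 'out')`.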
Aspect Ratios
| Ratio | Resolution | Use Case | Token Cost |
|---|---|---|---|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1152×896 | Traditional media | 1290 |
| 3:4 | 896×1152 | Vertical posters | 1290 |
All ratios cost the same: 1,290 tokens per image.
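When the target canvas size is known in advance, the table can be turned into a lookup. This is a minimal sketch that copies the values from the table above; `resolution_for` and `pick_ratio` are hypothetical helper names, not SDK functions.

```python
# Values copied from the aspect ratio table
RESOLUTIONS = {
    "1:1": (1024, 1024),
    "16:9": (1344, 768),
    "9:16": (768, 1344),
    "4:3": (1152, 896),
    "3:4": (896, 1152),
}


def resolution_for(aspect_ratio):
    """Return (width, height) for a listed ratio, or raise ValueError."""
    try:
        return RESOLUTIONS[aspect_ratio]
    except KeyError:
        supported = ", ".join(RESOLUTIONS)
        raise ValueError(
            f"Unsupported aspect ratio {aspect_ratio!r}; expected one of: {supported}"
        ) from None


def pick_ratio(target_width, target_height):
    """Choose the listed ratio whose width/height is closest to the target's."""
    target = target_width / target_height
    return min(
        RESOLUTIONS,
        key=lambda r: abs(RESOLUTIONS[r][0] / RESOLUTIONS[r][1] - target),
    )
```

For example, `pick_ratio(1920, 1080)` selects `'16:9'`, which can then be passed as the aspect ratio in the request config.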
Response Modalities
Image Only
config = types.GenerateContentConfig(
    response_modalities=['Image'],
    image_config=types.ImageConfig(aspect_ratio='1:1')
)
Text Only (No Image)
config = types.GenerateContentConfig(
    response_modalities=['Text']
)
# Returns a text description instead of generating an image
Both Image and Text
config = types.GenerateContentConfig(
    response_modalities=['Image', 'Text'],
    image_config=types.ImageConfig(aspect_ratio='16:9')
)
# Returns both the generated image and a description
Image Editing
Modify Existing Image
import PIL.Image
# Load original
img = PIL.Image.open('original.png')
# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
Style Transfer
img = PIL.Image.open('photo.jpg')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)
Object Addition/Removal
# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)
# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)
Multi-Image Composition
Combine Multiple Images
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
Note: A maximum of 3 input images is recommended for best results.
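The input-image ceiling is easy to enforce before a request is sent. The sketch below builds the `contents` list in the same order as the example above (instruction first, then images); `build_composition_contents` is a hypothetical helper, and the limit of 3 is taken from the note rather than from any SDK constant.

```python
MAX_INPUT_IMAGES = 3  # recommended ceiling from the note above


def build_composition_contents(instruction, images, max_images=MAX_INPUT_IMAGES):
    """Return [instruction, *images], refusing empty input or too many images."""
    images = list(images)
    if not images:
        raise ValueError("At least one input image is required")
    if len(images) > max_images:
        raise ValueError(
            f"Got {len(images)} images; recommended maximum is {max_images}"
        )
    return [instruction, *images]
```

The result can be passed directly as `contents=` with `PIL.Image` objects in place of the images.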
Prompt Engineering
Effective Prompt Structure
Three key elements:
- Subject: What to generate
- Context: Environmental setting
- Style: Artistic treatment
Example: "A robot [subject] in a futuristic city [context], cyberpunk style with neon lighting [style]"
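The subject/context/style structure can be captured in a tiny prompt builder. This is only an illustrative sketch; `build_prompt` is a hypothetical helper, and the joining format (subject and context separated by a space, style after a comma) simply mirrors the example above.

```python
def build_prompt(subject, context="", style=""):
    """Join the non-empty elements as 'subject context, style'."""
    if not subject.strip():
        raise ValueError("subject is required")
    head = " ".join(p.strip() for p in (subject, context) if p.strip())
    return f"{head}, {style.strip()}" if style.strip() else head


prompt = build_prompt(
    "A robot",
    "in a futuristic city",
    "cyberpunk style with neon lighting",
)
```

Keeping the three elements separate makes it easier to vary one (for instance the style) while holding the others fixed.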
Quality Modifiers
Technical terms:
- "4K", "8K", "high resolution"
- "HDR", "high dynamic range"
- "professional photography"
- "studio lighting"
- "ultra detailed"
Camera settings:
- "35mm lens", "50mm lens"
- "shallow depth of field"
- "wide angle shot"
- "macro photography"
- "golden hour lighting"
Style Keywords
Art styles:
- "oil painting", "watercolor", "sketch"
- "digital art", "concept art"
- "photorealistic", "hyperrealistic"
- "minimalist", "abstract"
- "cyberpunk", "steampunk", "fantasy"
Mood and atmosphere:
- "dramatic lighting", "soft lighting"
- "moody", "bright and cheerful"
- "mysterious", "whimsical"
- "dark and gritty", "pastel colors"
Subject Description
Be specific:
- ❌ "A cat"
- ✅ "A fluffy orange tabby cat with green eyes"
Add context:
- ❌ "A building"
- ✅ "A modern glass skyscraper reflecting sunset clouds"
Include details:
- ❌ "A person"
- ✅ "A young woman in a red dress holding an umbrella"
Composition and Framing
Camera angles:
- "bird's eye view", "aerial shot"
- "low angle", "high angle"
- "close-up", "wide shot"
- "centered composition"
- "rule of thirds"
Perspective:
- "first person view"
- "third person perspective"
- "isometric view"
- "forced perspective"
Text in Images
Limitations:
- Maximum 25 characters total
- Up to 3 distinct text phrases
- Works best with simple text
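These limits can be checked before sending a prompt. The sketch below assumes that text to be rendered is marked with double quotes in the prompt (as in `"EXPLORE"`); that convention and the helper names are assumptions for illustration, not API behavior.

```python
import re

MAX_TOTAL_CHARS = 25  # from the limitations list above
MAX_PHRASES = 3


def quoted_phrases(prompt):
    """Return the double-quoted phrases in a prompt, in order."""
    return re.findall(r'"([^"]+)"', prompt)


def check_text_limits(prompt):
    """Return a list of problems; an empty list means the prompt is within limits."""
    phrases = quoted_phrases(prompt)
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f"{len(phrases)} phrases (max {MAX_PHRASES})")
    total = sum(len(p) for p in phrases)
    if total > MAX_TOTAL_CHARS:
        problems.append(f"{total} characters of text (max {MAX_TOTAL_CHARS})")
    return problems
```

A non-empty result is a signal to shorten or drop some of the rendered text before generating.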
Best practices:
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A vintage poster with bold text "EXPLORE" at the top, mountain landscape, retro 1950s style'
)
Font control:
- "bold sans-serif title"
- "handwritten script"
- "vintage letterpress"
- "modern minimalist font"
Advanced Techniques
Iterative Refinement
# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)
# Save first version (the image is not guaranteed to be the first part)
for part in response1.candidates[0].content.parts:
    if part.inline_data:
        with open('v1.png', 'wb') as f:
            f.write(part.inline_data.data)
        break
# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)
Negative Prompts (Indirect)
# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)
Consistent Style Across Images
base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"
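prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]
for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open(f'character-{i}.png', 'wb') as f:
                f.write(part.inline_data.data)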
Safety Settings
Configure Safety Filters
config = types.GenerateContentConfig(
    response_modalities=['Image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
Available Categories
- HARM_CATEGORY_HATE_SPEECH
- HARM_CATEGORY_DANGEROUS_CONTENT
- HARM_CATEGORY_HARASSMENT
- HARM_CATEGORY_SEXUALLY_EXPLICIT
Thresholds
- BLOCK_NONE: No blocking
- BLOCK_LOW_AND_ABOVE: Block low probability and above
- BLOCK_MEDIUM_AND_ABOVE: Block medium and above (default)
- BLOCK_ONLY_HIGH: Block only high probability
Common Use Cases
1. Marketing Assets
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
- Sleek smartphone on minimalist white surface
- Dramatic side lighting creating subtle shadows
- Shallow depth of field, crisp focus
- Clean, modern aesthetic
- 4K quality
''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
2. Concept Art
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
- Ancient floating islands connected by chains
- Waterfalls cascading into clouds below
- Magical crystals glowing on the islands
- Epic scale, dramatic lighting
- Detailed digital painting style
''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
3. Social Media Graphics
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
- Pastel gradient background (pink to blue)
- Motivational quote layout
- Modern minimalist style
- Clean typography
- Mobile-friendly composition
''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='1:1')
    )
)
4. Illustration
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
- Friendly cartoon dragon reading a book
- Bright, cheerful colors
- Soft, rounded shapes
- Whimsical forest background
- Warm, inviting atmosphere
''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
5. UI/UX Mockups
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
- Clean dashboard design
- Card-based layout
- Soft shadows and gradients
- Contemporary color scheme (blue and white)
- Professional fintech aesthetic
''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='9:16')
    )
)
Best Practices
Prompt Quality
- Be specific: More detail = better results
- Order matters: Most important elements first
- Use examples: Reference known styles or artists
- Avoid contradictions: Don't ask for opposing styles
- Test and iterate: Refine prompts based on results
File Management
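import time
# Save with descriptive names
# aspect_ratio (e.g. '16:9') and image_data (bytes from part.inline_data.data)
# are assumed to be defined by the surrounding code
timestamp = int(time.time())
# ':' is not allowed in filenames on some platforms, so write 16:9 as 16x9
filename = f'generated_{timestamp}_{aspect_ratio.replace(":", "x")}.png'
with open(filename, 'wb') as f:
    f.write(image_data)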
Cost Optimization
Token costs:
- 1 image: 1,290 output tokens ≈ $0.039 (Flash Image output at $30/1M tokens)
- 10 images: 12,900 tokens ≈ $0.39
- 100 images: 129,000 tokens ≈ $3.87
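The arithmetic behind these figures is tokens per image times image count, scaled by the per-million-token rate. The sketch below keeps the rate as a required parameter rather than hard-coding it, since pricing can change; `estimate_cost` is a hypothetical helper and the 1,290 figure comes from the aspect ratio table.

```python
TOKENS_PER_IMAGE = 1290  # from the aspect ratio table


def estimate_cost(num_images, usd_per_million_tokens):
    """Return (total_tokens, cost_in_usd) for num_images generated images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    tokens = num_images * TOKENS_PER_IMAGE
    return tokens, tokens * usd_per_million_tokens / 1_000_000
```

Pass whatever output-token rate currently applies to the model; the function only does the multiplication.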
Strategies:
- Generate fewer iterations
- Use text modality first to validate concept
- Batch similar requests
- Cache prompts for consistent style
Error Handling
Safety Filter Blocking
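try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except Exception as e:
    # Request-level failures (network, auth, invalid arguments) raise here
    print(f"Request failed: {e}")
else:
    # A blocked prompt is reported on the response, not on the exception
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry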
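The "modify prompt and retry" step can be organized as a loop over progressively safer prompt variants. This is a sketch only: `first_unblocked` is a hypothetical helper, and `generate` and `is_blocked` are caller-supplied callables (for instance a wrapper around `client.models.generate_content` and a check on the response's block reason), so no SDK behavior is assumed here.

```python
def first_unblocked(prompts, generate, is_blocked):
    """Try prompts in order; return (prompt, response) for the first one that
    is not blocked, or (None, None) if every variant is blocked."""
    for prompt in prompts:
        response = generate(prompt)
        if not is_blocked(response):
            return prompt, response
    return None, None
```

Listing the variants up front keeps the number of retries bounded, which also bounds the token cost of a blocked request.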
Token Limit Exceeded
# Keep prompts concise. Character count is only a rough proxy for token count.
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]
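A plain slice can cut a prompt in the middle of a word. The variant below is a small sketch that cuts at the last space before the limit when one exists; `truncate_prompt` is a hypothetical helper, and the 1000-character default simply mirrors the snippet above rather than any documented limit.

```python
def truncate_prompt(prompt, max_chars=1000):
    """Shorten prompt to at most max_chars, cutting at the last space when possible."""
    if len(prompt) <= max_chars:
        return prompt
    cut = prompt[:max_chars]
    if prompt[max_chars].isspace():
        return cut.rstrip()  # the cut already falls on a word boundary
    head, sep, _ = cut.rpartition(" ")
    # Fall back to a hard cut when there is no earlier space to break on
    return head.rstrip() if sep and head.strip() else cut
```

Simplifying the prompt by hand usually preserves intent better than truncation, so this is best treated as a last-resort guard.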
Limitations
- Maximum 3 input images for composition
- Text rendering limited (25 chars max)
- No video or animation generation
- Regional restrictions (child images in EEA, CH, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
- No real-time generation
- Cannot perfectly replicate specific people or copyrighted characters
Troubleshooting
aspect_ratio Parameter Error
Error: Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]
Cause: The aspect_ratio parameter must be nested inside an image_config object, not passed directly to GenerateContentConfig.
Incorrect Usage:
# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['image'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)
Correct Usage:
# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['Image'],  # Note: Capital 'I'
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)
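To avoid repeating the nesting mistake, the config arguments can be built in one place. The sketch below returns a plain dict shaped like the correct usage above; `image_config_kwargs` is a hypothetical helper, and passing the result as `types.GenerateContentConfig(**image_config_kwargs('16:9'))` relies on the assumption that the SDK accepts a nested dict for `image_config`, which should be verified against the installed version.

```python
SUPPORTED_RATIOS = ("1:1", "16:9", "9:16", "4:3", "3:4")


def image_config_kwargs(aspect_ratio, modalities=("Image",)):
    """Return config keyword arguments with aspect_ratio nested under image_config."""
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")
    return {
        "response_modalities": list(modalities),
        # aspect_ratio lives under image_config, never at the top level
        "image_config": {"aspect_ratio": aspect_ratio},
    }
```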
Response Modality Case Sensitivity
The response_modalities parameter expects capitalized values:
- ✅ Correct: ['Image'], ['Text'], ['Image', 'Text']
- ❌ Wrong: ['image'], ['text']
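When modality names come from user input or configuration files, normalizing them up front sidesteps the casing problem. A minimal sketch follows; `normalize_modalities` is a hypothetical helper that only knows the two values this guide uses.

```python
_CANONICAL = {"image": "Image", "text": "Text"}


def normalize_modalities(modalities):
    """Map any casing of 'image'/'text' to the capitalized form; reject anything else."""
    result = []
    for m in modalities:
        key = m.strip().lower()
        if key not in _CANONICAL:
            raise ValueError(f"Unknown modality: {m!r}")
        result.append(_CANONICAL[key])
    return result
```

The returned list can be passed straight to `response_modalities`.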