# Image Generation Reference

Comprehensive guide for image creation, editing, and composition using the Gemini API.

## Core Capabilities

- **Text-to-Image**: Generate images from text prompts
- **Image Editing**: Modify existing images with text instructions
- **Multi-Image Composition**: Combine up to 3 images
- **Iterative Refinement**: Refine images conversationally
- **Aspect Ratios**: Multiple formats (1:1, 16:9, 9:16, 4:3, 3:4)
- **Style Control**: Control artistic style and quality
- **Text in Images**: Limited text rendering (max 25 chars)

## Model

**gemini-2.5-flash-image** - Specialized for image generation
- Input tokens: 65,536
- Output tokens: 32,768
- Knowledge cutoff: June 2025
- Supports: Text and image inputs, image outputs

## Quick Start

### Basic Generation

```python
from google import genai
from google.genai import types
import os

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        # aspect_ratio must be nested inside image_config (see Troubleshooting)
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save every image part returned by the model
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'output-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
```

## Aspect Ratios

| Ratio | Resolution | Use Case | Token Cost |
|-------|------------|----------|------------|
| 1:1 | 1024×1024 | Social media, avatars | 1,290 |
| 16:9 | 1344×768 | Landscapes, banners | 1,290 |
| 9:16 | 768×1344 | Mobile, portraits | 1,290 |
| 4:3 | 1152×896 | Traditional media | 1,290 |
| 3:4 | 896×1152 | Vertical posters | 1,290 |

All ratios cost the same: 1,290 tokens per image.

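
The table above can be turned into a small lookup so that code fails early on an unsupported ratio. The following is a minimal sketch: the names `ASPECT_RATIO_RESOLUTIONS`, `resolution_for`, and `closest_aspect_ratio` are illustrative helpers, not part of the Gemini SDK, and the resolutions are copied from the table.

```python
# Resolutions copied from the table above; all names here are illustrative.
ASPECT_RATIO_RESOLUTIONS = {
    '1:1': (1024, 1024),
    '16:9': (1344, 768),
    '9:16': (768, 1344),
    '4:3': (1152, 896),
    '3:4': (896, 1152),
}


def resolution_for(aspect_ratio: str) -> tuple[int, int]:
    """Return (width, height) for a supported ratio, or raise ValueError."""
    try:
        return ASPECT_RATIO_RESOLUTIONS[aspect_ratio]
    except KeyError:
        supported = ', '.join(ASPECT_RATIO_RESOLUTIONS)
        raise ValueError(
            f'Unsupported aspect ratio {aspect_ratio!r}; expected one of: {supported}'
        ) from None


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio whose numeric value is nearest to width/height."""
    target = width / height

    def distance(ratio: str) -> float:
        w, h = ratio.split(':')
        return abs(int(w) / int(h) - target)

    return min(ASPECT_RATIO_RESOLUTIONS, key=distance)
```

For example, `closest_aspect_ratio(1920, 1080)` selects `'16:9'`, which can then be passed as the aspect ratio for a request that should match an existing canvas size.
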
## Response Modalities

### Image Only

```python
config = types.GenerateContentConfig(
    response_modalities=['Image'],
    image_config=types.ImageConfig(aspect_ratio='1:1')
)
```

### Text Only (No Image)

```python
config = types.GenerateContentConfig(
    response_modalities=['Text']
)
# Returns a text description instead of generating an image
```

### Both Image and Text

```python
config = types.GenerateContentConfig(
    response_modalities=['Image', 'Text'],
    image_config=types.ImageConfig(aspect_ratio='16:9')
)
# Returns both the generated image and a description
```

## Image Editing

### Modify Existing Image

```python
import PIL.Image

# Load original
img = PIL.Image.open('original.png')

# Edit with instructions
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

### Style Transfer

```python
img = PIL.Image.open('photo.jpg')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Transform this into an oil painting style',
        img
    ]
)
```

### Object Addition/Removal

```python
# Add object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a vintage car parked on the street',
        img
    ]
)

# Remove object
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Remove the person on the left side',
        img
    ]
)
```

## Multi-Image Composition

### Combine Multiple Images

```python
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
img3 = PIL.Image.open('overlay.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2,
        img3
    ],
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

**Note**: For best results, use at most 3 input images.

## Prompt Engineering

### Effective Prompt Structure

**Three key elements**:
1. **Subject**: What to generate
2. **Context**: Environmental setting
3. **Style**: Artistic treatment

**Example**: "A robot [subject] in a futuristic city [context], cyberpunk style with neon lighting [style]"

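
The subject/context/style structure can be assembled programmatically so that prompts stay consistently ordered. This is a minimal sketch; `build_prompt` and its `extras` parameter are hypothetical helpers introduced for illustration, not something the API requires.

```python
def build_prompt(subject: str, context: str, style: str, extras: tuple[str, ...] = ()) -> str:
    """Join subject, context, and style into one prompt string.

    Subject and context form the opening phrase; style and any extra
    keywords follow as comma-separated modifiers. Empty pieces are skipped.
    """
    opening = ' '.join(piece.strip() for piece in (subject, context) if piece.strip())
    modifiers = [piece.strip() for piece in (style, *extras) if piece.strip()]
    return ', '.join([opening, *modifiers]) if opening else ', '.join(modifiers)


prompt = build_prompt(
    subject='A robot',
    context='in a futuristic city',
    style='cyberpunk style with neon lighting',
)
```

Keeping the subject first follows the ordering shown in the example prompt above; the resulting string can be passed directly as `contents`.
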
### Quality Modifiers

**Technical terms**:
- "4K", "8K", "high resolution"
- "HDR", "high dynamic range"
- "professional photography"
- "studio lighting"
- "ultra detailed"

**Camera settings**:
- "35mm lens", "50mm lens"
- "shallow depth of field"
- "wide angle shot"
- "macro photography"
- "golden hour lighting"

### Style Keywords

**Art styles**:
- "oil painting", "watercolor", "sketch"
- "digital art", "concept art"
- "photorealistic", "hyperrealistic"
- "minimalist", "abstract"
- "cyberpunk", "steampunk", "fantasy"

**Mood and atmosphere**:
- "dramatic lighting", "soft lighting"
- "moody", "bright and cheerful"
- "mysterious", "whimsical"
- "dark and gritty", "pastel colors"

### Subject Description

**Be specific**:
- ❌ "A cat"
- ✅ "A fluffy orange tabby cat with green eyes"

**Add context**:
- ❌ "A building"
- ✅ "A modern glass skyscraper reflecting sunset clouds"

**Include details**:
- ❌ "A person"
- ✅ "A young woman in a red dress holding an umbrella"

### Composition and Framing

**Camera angles**:
- "bird's eye view", "aerial shot"
- "low angle", "high angle"
- "close-up", "wide shot"
- "centered composition"
- "rule of thirds"

**Perspective**:
- "first person view"
- "third person perspective"
- "isometric view"
- "forced perspective"

### Text in Images

**Limitations**:
- Maximum 25 characters total
- Up to 3 distinct text phrases
- Works best with simple text

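
These limits can be checked before a request is sent. The sketch below assumes that the text to be rendered is written inside double quotes in the prompt; that convention, and the helper name `check_text_limits`, are assumptions made for illustration rather than API behavior.

```python
import re

MAX_TEXT_CHARS = 25   # total characters across all rendered text
MAX_PHRASES = 3       # distinct text phrases per image


def check_text_limits(prompt: str) -> list[str]:
    """Return a list of problems found; an empty list means the prompt fits."""
    # Assumption: rendered text appears in double quotes inside the prompt.
    phrases = re.findall(r'"([^"]*)"', prompt)
    problems = []
    total_chars = sum(len(phrase) for phrase in phrases)
    if total_chars > MAX_TEXT_CHARS:
        problems.append(f'{total_chars} characters of text exceeds the limit of {MAX_TEXT_CHARS}')
    if len(phrases) > MAX_PHRASES:
        problems.append(f'{len(phrases)} phrases exceeds the limit of {MAX_PHRASES}')
    return problems
```

A caller might log the returned problems and shorten the quoted text before generating, rather than spending tokens on an image with garbled lettering.
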
**Best practices**:
```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A vintage poster with bold text "EXPLORE" at the top, mountain landscape, retro 1950s style'
)
```

**Font control**:
- "bold sans-serif title"
- "handwritten script"
- "vintage letterpress"
- "modern minimalist font"

## Advanced Techniques

### Iterative Refinement

```python
# Initial generation
response1 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A futuristic city skyline'
)

# Save first version (the image is not guaranteed to be the first part)
image_part = next(
    p for p in response1.candidates[0].content.parts if p.inline_data
)
with open('v1.png', 'wb') as f:
    f.write(image_part.inline_data.data)

# Refine
img = PIL.Image.open('v1.png')
response2 = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add flying vehicles and neon signs',
        img
    ]
)
```

### Negative Prompts (Indirect)

```python
# Instead of "no blur", be specific about what you want
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A crystal clear, sharp photograph of a diamond ring with perfect focus and high detail'
)
```

### Consistent Style Across Images

```python
base_prompt = "Digital art, vibrant colors, cel-shaded style, clean lines"

prompts = [
    f"{base_prompt}, a warrior character",
    f"{base_prompt}, a mage character",
    f"{base_prompt}, a rogue character"
]

for i, prompt in enumerate(prompts):
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
    # Save each character
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open(f'character-{i}.png', 'wb') as f:
                f.write(part.inline_data.data)
```

## Safety Settings

### Configure Safety Filters

```python
config = types.GenerateContentConfig(
    response_modalities=['Image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
```

### Available Categories

- `HARM_CATEGORY_HATE_SPEECH`
- `HARM_CATEGORY_DANGEROUS_CONTENT`
- `HARM_CATEGORY_HARASSMENT`
- `HARM_CATEGORY_SEXUALLY_EXPLICIT`

### Thresholds

- `BLOCK_NONE`: No blocking
- `BLOCK_LOW_AND_ABOVE`: Block low probability and above
- `BLOCK_MEDIUM_AND_ABOVE`: Block medium and above (default)
- `BLOCK_ONLY_HIGH`: Block only high probability

## Common Use Cases

### 1. Marketing Assets

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Professional product photography:
    - Sleek smartphone on minimalist white surface
    - Dramatic side lighting creating subtle shadows
    - Shallow depth of field, crisp focus
    - Clean, modern aesthetic
    - 4K quality
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
```

### 2. Concept Art

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Fantasy concept art:
    - Ancient floating islands connected by chains
    - Waterfalls cascading into clouds below
    - Magical crystals glowing on the islands
    - Epic scale, dramatic lighting
    - Detailed digital painting style
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)
```

### 3. Social Media Graphics

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Instagram post design:
    - Pastel gradient background (pink to blue)
    - Motivational quote layout
    - Modern minimalist style
    - Clean typography
    - Mobile-friendly composition
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='1:1')
    )
)
```

### 4. Illustration

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Children's book illustration:
    - Friendly cartoon dragon reading a book
    - Bright, cheerful colors
    - Soft, rounded shapes
    - Whimsical forest background
    - Warm, inviting atmosphere
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='4:3')
    )
)
```

### 5. UI/UX Mockups

```python
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='''Modern mobile app interface:
    - Clean dashboard design
    - Card-based layout
    - Soft shadows and gradients
    - Contemporary color scheme (blue and white)
    - Professional fintech aesthetic
    ''',
    config=types.GenerateContentConfig(
        response_modalities=['Image'],
        image_config=types.ImageConfig(aspect_ratio='9:16')
    )
)
```

## Best Practices

### Prompt Quality

1. **Be specific**: More detail = better results
2. **Order matters**: Most important elements first
3. **Use examples**: Reference known styles or artists
4. **Avoid contradictions**: Don't ask for opposing styles
5. **Test and iterate**: Refine prompts based on results

### File Management

```python
import time

# Save with descriptive names (aspect_ratio and image_data come from your own request/response)
timestamp = int(time.time())
# ':' is not allowed in filenames on some systems (e.g. Windows), so '16:9' becomes '16x9'
ratio_label = aspect_ratio.replace(':', 'x')
filename = f'generated_{timestamp}_{ratio_label}.png'

with open(filename, 'wb') as f:
    f.write(image_data)
```

### Cost Optimization

**Token costs**:
- 1 image: 1,290 tokens = $0.00129 (Flash Image at $1 per 1M tokens)
- 10 images: 12,900 tokens = $0.0129
- 100 images: 129,000 tokens = $0.129

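
The arithmetic behind these figures is tokens per image times image count times the per-token rate. The sketch below makes that explicit; `estimate_cost_usd` is an illustrative helper, and the default rate is simply the $1 per 1M tokens figure quoted in this guide, so it should be replaced with the current published price before budgeting.

```python
TOKENS_PER_IMAGE = 1290                 # flat cost per generated image, per this guide
PRICE_PER_MILLION_TOKENS_USD = 1.00     # rate quoted in this guide; an assumption to verify


def estimate_tokens(num_images: int) -> int:
    """Total output tokens for a batch of generated images."""
    if num_images < 0:
        raise ValueError('num_images must be non-negative')
    return num_images * TOKENS_PER_IMAGE


def estimate_cost_usd(num_images: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Estimated cost in USD for generating num_images images."""
    return estimate_tokens(num_images) * price_per_million / 1_000_000
```

Passing the rate as a parameter keeps the estimate usable when pricing changes, for example `estimate_cost_usd(100, price_per_million=2.5)`.
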
**Strategies**:
- Generate fewer iterations
- Use text modality first to validate concept
- Batch similar requests
- Cache prompts for consistent style

## Error Handling

### Safety Filter Blocking

```python
from google.genai import errors

try:
    response = client.models.generate_content(
        model='gemini-2.5-flash-image',
        contents=prompt
    )
except errors.APIError as e:
    # Request-level failures (invalid arguments, quota, server errors)
    print(f"API error: {e}")
else:
    # A blocked prompt is reported on the response rather than raised
    feedback = response.prompt_feedback
    if feedback and feedback.block_reason:
        print(f"Blocked: {feedback.block_reason}")
        # Modify prompt and retry
```

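
A blocked or text-only response may contain no image parts at all, so indexing `candidates[0].content.parts` directly can fail. The sketch below collects image bytes defensively using only the attribute names already used in this guide (`candidates`, `content`, `parts`, `inline_data.data`). The helper `extract_image_bytes` and the stand-in objects built with `SimpleNamespace` are illustrative and are not SDK types.

```python
from types import SimpleNamespace


def extract_image_bytes(response) -> list[bytes]:
    """Return the raw bytes of every image part, or an empty list if none exist."""
    images = []
    for candidate in getattr(response, 'candidates', None) or []:
        content = getattr(candidate, 'content', None)
        for part in getattr(content, 'parts', None) or []:
            inline = getattr(part, 'inline_data', None)
            if inline is not None and inline.data:
                images.append(inline.data)
    return images


# Stand-in objects that mimic the response shape, for demonstration only.
ok_response = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(inline_data=None),                          # a text-only part
    SimpleNamespace(inline_data=SimpleNamespace(data=b'PNG')),  # an image part
]))])
blocked_response = SimpleNamespace(candidates=None)

ok_images = extract_image_bytes(ok_response)
blocked_images = extract_image_bytes(blocked_response)
```

An empty result is a signal to inspect the block reason or adjust the prompt instead of attempting to write a file.
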
### Token Limit Exceeded

```python
# Keep prompts concise; len() counts characters, which is only a rough proxy for tokens
if len(prompt) > 1000:
    # Truncate or simplify
    prompt = prompt[:1000]
```

## Limitations

- Maximum 3 input images for composition
- Text rendering limited (25 chars max)
- No video or animation generation
- Regional restrictions (images of children in EEA, CH, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
- No real-time generation
- Cannot perfectly replicate specific people or copyrighted characters

## Troubleshooting

### aspect_ratio Parameter Error

**Error**: `Extra inputs are not permitted [type=extra_forbidden, input_value='1:1', input_type=str]`

**Cause**: The `aspect_ratio` parameter must be nested inside an `image_config` object, not passed directly to `GenerateContentConfig`.

**Incorrect Usage**:
```python
# ❌ This will fail
config = types.GenerateContentConfig(
    response_modalities=['image'],
    aspect_ratio='16:9'  # Wrong - not a direct parameter
)
```

**Correct Usage**:
```python
# ✅ Correct implementation
config = types.GenerateContentConfig(
    response_modalities=['Image'],  # Note: Capital 'I'
    image_config=types.ImageConfig(
        aspect_ratio='16:9'
    )
)
```

### Response Modality Case Sensitivity

The `response_modalities` parameter expects capitalized values:
- ✅ Correct: `['Image']`, `['Text']`, `['Image', 'Text']`
- ❌ Wrong: `['image']`, `['text']`
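
To avoid casing mistakes when modality names come from user input or configuration files, the values can be normalized before building the config. This is a minimal sketch; `normalize_modalities` is a hypothetical helper, and it only knows the two modalities covered in this guide.

```python
# Canonical spellings as listed above; the mapping itself is illustrative.
_CANONICAL_MODALITIES = {'image': 'Image', 'text': 'Text'}


def normalize_modalities(modalities: list[str]) -> list[str]:
    """Map values such as 'image' or 'TEXT' to the capitalized form."""
    normalized = []
    for modality in modalities:
        key = modality.strip().lower()
        if key not in _CANONICAL_MODALITIES:
            raise ValueError(f'Unknown response modality: {modality!r}')
        normalized.append(_CANONICAL_MODALITIES[key])
    return normalized
```

The result can be passed as `response_modalities`, for example `normalize_modalities(['image', 'text'])`.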