# Multimodal Guide

Complete guide to using images, video, audio, and PDFs with the Gemini API.

---
## Supported Formats

### Images

- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 20MB

### Video

- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
- Max size: 2GB
- Max length (inline): 2 minutes

### Audio

- MP3, WAV, FLAC, AAC, OGG, OPUS
- Max size: 20MB

### PDFs

- Max size: 30MB
- Text-based PDFs work best
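To choose the `mimeType` for a request, the format list above can be encoded as a lookup table with a size check. This is a minimal sketch under stated assumptions: `FORMATS`, `MAX_BYTES`, and `resolveMimeType` are hypothetical names, the MIME strings follow the ones listed in the Gemini docs at the time of writing (verify them against the official list), "MB" is read as 2^20 bytes, and OPUS is omitted because its expected MIME string is not given in this guide.

```typescript
// Hypothetical helper: maps a file extension to the MIME type used in
// `inlineData.mimeType` and enforces the per-type size caps listed above.
type Modality = 'image' | 'video' | 'audio' | 'pdf';

const MB = 1024 * 1024; // assumption: "MB" means 2^20 bytes

const MAX_BYTES: Record<Modality, number> = {
  image: 20 * MB,
  video: 2048 * MB, // 2GB
  audio: 20 * MB,
  pdf: 30 * MB,
};

// Assumed MIME strings, taken from the Gemini docs' format lists.
const FORMATS: Record<string, { mimeType: string; modality: Modality }> = {
  jpg: { mimeType: 'image/jpeg', modality: 'image' },
  jpeg: { mimeType: 'image/jpeg', modality: 'image' },
  png: { mimeType: 'image/png', modality: 'image' },
  webp: { mimeType: 'image/webp', modality: 'image' },
  heic: { mimeType: 'image/heic', modality: 'image' },
  heif: { mimeType: 'image/heif', modality: 'image' },
  mp4: { mimeType: 'video/mp4', modality: 'video' },
  mpeg: { mimeType: 'video/mpeg', modality: 'video' },
  mov: { mimeType: 'video/mov', modality: 'video' },
  avi: { mimeType: 'video/avi', modality: 'video' },
  flv: { mimeType: 'video/x-flv', modality: 'video' },
  mpg: { mimeType: 'video/mpg', modality: 'video' },
  webm: { mimeType: 'video/webm', modality: 'video' },
  wmv: { mimeType: 'video/wmv', modality: 'video' },
  mp3: { mimeType: 'audio/mp3', modality: 'audio' },
  wav: { mimeType: 'audio/wav', modality: 'audio' },
  flac: { mimeType: 'audio/flac', modality: 'audio' },
  aac: { mimeType: 'audio/aac', modality: 'audio' },
  ogg: { mimeType: 'audio/ogg', modality: 'audio' },
  pdf: { mimeType: 'application/pdf', modality: 'pdf' },
};

function resolveMimeType(fileName: string, sizeBytes: number): string {
  // Take the text after the last dot; a name without a dot yields no match.
  const ext = fileName.slice(fileName.lastIndexOf('.') + 1).toLowerCase();
  const format = FORMATS[ext];
  if (!format) {
    throw new Error(`Unsupported file type: .${ext}`);
  }
  if (sizeBytes > MAX_BYTES[format.modality]) {
    throw new Error(`${fileName} exceeds the ${format.modality} size limit`);
  }
  return format.mimeType;
}
```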
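---

## Usage Pattern

Send the prompt and the media as separate `parts` of a single content entry, with the media base64-encoded in `inlineData`:

```typescript
import { readFileSync } from 'node:fs';

// Base64-encode the media file ('photo.jpg' is a placeholder path).
const base64EncodedData = readFileSync('photo.jpg').toString('base64');

// One content entry holding the text prompt and the inline media as separate parts.
const request = {
  contents: [
    {
      parts: [
        { text: 'Your question' },
        {
          inlineData: {
            data: base64EncodedData,
            mimeType: 'image/jpeg', // or video/mp4, audio/mp3, application/pdf
          },
        },
      ],
    },
  ],
};
```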
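The same pattern extends to several files: keep the prompt as the first part and append one `inlineData` part per file. This is a minimal sketch; `MediaInput`, `Part`, and `buildMultimodalRequest` are hypothetical names whose fields simply mirror the snippet above, and the base64 strings are placeholders. Assuming the project uses the `@google/genai` SDK, an object like this supplies the `contents` field passed to `ai.models.generateContent`.

```typescript
// Hypothetical types mirroring the request shape shown above.
interface MediaInput {
  data: string; // base64-encoded file contents
  mimeType: string; // e.g. 'image/jpeg', 'audio/mp3'
}

interface Part {
  text?: string;
  inlineData?: { data: string; mimeType: string };
}

// Builds one content entry: the text prompt first, then one inline part per file.
function buildMultimodalRequest(
  prompt: string,
  media: MediaInput[],
): { contents: { parts: Part[] }[] } {
  const parts: Part[] = [{ text: prompt }];
  for (const item of media) {
    parts.push({ inlineData: { data: item.data, mimeType: item.mimeType } });
  }
  return { contents: [{ parts }] };
}

// Usage: ask about an image and an audio clip in a single request.
const combined = buildMultimodalRequest('Does the audio describe this photo?', [
  { data: 'aGVsbG8=', mimeType: 'image/jpeg' }, // placeholder base64
  { data: 'd29ybGQ=', mimeType: 'audio/mp3' }, // placeholder base64
]);
```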
---

## Best Practices

- Use specific, detailed prompts
- Combine multiple modalities in one request
- Inline requests are capped at 20MB in total; for larger files (up to 2GB each), use the File API (Phase 2)
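Because inline media travels as base64, the encoded size is what counts toward the request limit: base64 emits 4 characters for every 3 input bytes, so a 15MB file already takes up about 20MB once encoded. The sketch below estimates that before choosing a route. It assumes the 20MB cap applies to the whole encoded request and ignores JSON overhead; `INLINE_REQUEST_LIMIT`, `base64Length`, and `chooseUploadStrategy` are hypothetical names.

```typescript
const MB = 1024 * 1024; // assumption: "MB" means 2^20 bytes
const INLINE_REQUEST_LIMIT = 20 * MB; // assumed to cover the full encoded request

// Base64 maps every 3 input bytes to 4 output characters, padding the last group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

type UploadStrategy = 'inline' | 'fileApi';

// Estimates the request size (encoded file plus prompt bytes) and picks a route.
function chooseUploadStrategy(fileBytes: number, promptBytes = 0): UploadStrategy {
  const estimatedRequestBytes = base64Length(fileBytes) + promptBytes;
  return estimatedRequestBytes <= INLINE_REQUEST_LIMIT ? 'inline' : 'fileApi';
}
```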
---

## Official Docs

https://ai.google.dev/gemini-api/docs/vision