Initial commit
references/multimodal-guide.md
# Multimodal Guide

Complete guide to using images, video, audio, and PDFs with the Gemini API.

---

## Supported Formats

### Images
- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 20MB

### Video
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
- Max size: 2GB
- Max length (inline): 2 minutes

### Audio
- MP3, WAV, FLAC, AAC, OGG, OPUS
- Max size: 20MB

### PDFs
- Max size: 30MB
- Text-based PDFs work best

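The size limits above can be checked before a request is built. The following is a minimal sketch, not part of the Gemini SDK: `modalityOf`, `checkFile`, and `MAX_BYTES` are hypothetical names, the limits are copied from the lists above, and binary units (1MB = 1024 × 1024 bytes) are an assumption. It only looks at the top-level MIME type, so it does not verify that a specific format (for example a particular image codec) is on the supported list.

```typescript
// Size limits copied from the lists above.
// Assumption: binary units (1 MB = 1024 * 1024 bytes).
const MB = 1024 * 1024;
const GB = 1024 * MB;

type Modality = 'image' | 'video' | 'audio' | 'pdf';

const MAX_BYTES: Record<Modality, number> = {
  image: 20 * MB,
  video: 2 * GB,
  audio: 20 * MB,
  pdf: 30 * MB,
};

// Map a MIME type to one of the four modalities, or undefined if unrecognized.
function modalityOf(mimeType: string): Modality | undefined {
  if (mimeType === 'application/pdf') return 'pdf';
  const topLevel = mimeType.split('/')[0];
  if (topLevel === 'image' || topLevel === 'video' || topLevel === 'audio') {
    return topLevel;
  }
  return undefined;
}

// Returns null when the file passes, otherwise a human-readable reason.
function checkFile(mimeType: string, sizeBytes: number): string | null {
  const modality = modalityOf(mimeType);
  if (modality === undefined) {
    return `Unsupported MIME type: ${mimeType}`;
  }
  const limit = MAX_BYTES[modality];
  if (sizeBytes > limit) {
    return `${modality} file is ${sizeBytes} bytes, over the ${limit}-byte limit`;
  }
  return null;
}
```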
---

## Usage Pattern

```typescript
// Placeholder: replace with the base64-encoded bytes of your file
const base64EncodedData = '<base64-encoded file bytes>';

// One text part plus one inline media part in the same content entry
const request = {
  contents: [
    {
      parts: [
        { text: 'Your question' },
        {
          inlineData: {
            data: base64EncodedData,
            mimeType: 'image/jpeg' // or video/mp4, audio/mp3, application/pdf
          }
        }
      ]
    }
  ]
};
```
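The pattern above expects the file as a base64 string. As a rough sketch of that step, the helpers below turn raw bytes into an `inlineData` part. `toBase64`, `toInlinePart`, and the `InlineDataPart` interface are hypothetical names introduced for illustration; the sketch assumes a runtime where the global `btoa` is available (browsers and recent Node.js versions), and it leaves reading the file into a `Uint8Array` to the caller.

```typescript
// Shape of one inline part, mirroring the usage pattern above.
interface InlineDataPart {
  inlineData: { data: string; mimeType: string };
}

// Base64-encode raw bytes.
// Assumption: the global btoa exists in the target runtime.
function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    // btoa expects a "binary string" with one character per byte
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Wrap encoded bytes and their MIME type in the inlineData structure.
function toInlinePart(bytes: Uint8Array, mimeType: string): InlineDataPart {
  return { inlineData: { data: toBase64(bytes), mimeType } };
}

// Example: the two bytes of "Hi" stand in for real file contents.
const part = toInlinePart(new Uint8Array([72, 105]), 'image/jpeg');
console.log(part.inlineData.data); // prints "SGk="
```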

---

## Best Practices

- Use specific, detailed prompts
- Combine multiple modalities in one request
- For files too large to send inline, use the File API (Phase 2)

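To illustrate combining modalities in one request, here is a small sketch that places one text prompt and several inline attachments in a single `parts` array, following the structure shown under Usage Pattern. `buildContents`, `Attachment`, and `Part` are hypothetical names, and the placeholder strings stand in for real base64 data.

```typescript
// A part is either text or inline media, as in the usage pattern.
type Part =
  | { text: string }
  | { inlineData: { data: string; mimeType: string } };

// Hypothetical input type: data is already base64-encoded.
interface Attachment {
  data: string;
  mimeType: string;
}

// Build a single-entry contents array: the question first, then every attachment.
function buildContents(question: string, attachments: Attachment[]): { parts: Part[] }[] {
  const parts: Part[] = [{ text: question }];
  for (const attachment of attachments) {
    parts.push({
      inlineData: { data: attachment.data, mimeType: attachment.mimeType },
    });
  }
  return [{ parts }];
}

// Example: one image and one audio clip sent with the same prompt.
const contents = buildContents('Does the narration match the chart?', [
  { data: '<base64 image bytes>', mimeType: 'image/png' },
  { data: '<base64 audio bytes>', mimeType: 'audio/mp3' },
]);
```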
---

## Official Docs

https://ai.google.dev/gemini-api/docs/vision