Initial commit

Zhongwei Li
2025-11-30 08:24:51 +08:00
commit 8aebb293cd
31 changed files with 7386 additions and 0 deletions

@@ -0,0 +1,58 @@
# Multimodal Guide
A guide to using images, video, audio, and PDFs with the Gemini API.
---
## Supported Formats
### Images
- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 20MB
### Video
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
- Max size: 2GB
- Max length (inline): 2 minutes
### Audio
- MP3, WAV, FLAC, AAC, OGG, OPUS
- Max size: 20MB
### PDFs
- Max size: 30MB
- Text-based PDFs work best
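The format and size lists above can be checked before a request is built. The following is a minimal sketch, not part of the Gemini SDK: the names `modalityOf`, `checkFile`, `MAX_BYTES`, and `EXTENSIONS` are hypothetical, the `jpg` extension is an assumed alias for JPEG, and 1MB is assumed to mean 1024 * 1024 bytes.

```typescript
type Modality = 'image' | 'video' | 'audio' | 'pdf';

const MB = 1024 * 1024; // assumption: binary megabytes

// Size limits as listed in this guide, in bytes.
const MAX_BYTES: Record<Modality, number> = {
  image: 20 * MB,
  video: 2048 * MB, // 2GB
  audio: 20 * MB,
  pdf: 30 * MB,
};

// File extensions derived from the format lists in this guide
// ('jpg' is an assumed alias, not listed in the guide).
const EXTENSIONS: Record<Modality, string[]> = {
  image: ['jpeg', 'jpg', 'png', 'webp', 'heic', 'heif'],
  video: ['mp4', 'mpeg', 'mov', 'avi', 'flv', 'mpg', 'webm', 'wmv'],
  audio: ['mp3', 'wav', 'flac', 'aac', 'ogg', 'opus'],
  pdf: ['pdf'],
};

// Returns the modality for a file name, or undefined if the extension is not listed.
function modalityOf(fileName: string): Modality | undefined {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  for (const modality of Object.keys(EXTENSIONS) as Modality[]) {
    if (EXTENSIONS[modality].indexOf(ext) !== -1) return modality;
  }
  return undefined;
}

// Returns an error message, or null when the file passes both checks.
function checkFile(fileName: string, sizeBytes: number): string | null {
  const modality = modalityOf(fileName);
  if (!modality) return `Unsupported format: ${fileName}`;
  if (sizeBytes > MAX_BYTES[modality]) {
    return `${fileName} exceeds the ${modality} size limit`;
  }
  return null;
}
```

For example, `checkFile('scan.pdf', 31 * MB)` returns an error message because the file is over the 30MB PDF limit, while `checkFile('clip.mp4', 10 * MB)` returns `null`.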
---
## Usage Pattern
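```typescript
// Placeholder: replace with the base64-encoded bytes of your file.
const base64EncodedData = '<base64 string>';

// Request payload: one text part followed by one inline media part.
const request = {
  contents: [
    {
      parts: [
        { text: 'Your question' },
        {
          inlineData: {
            data: base64EncodedData,
            mimeType: 'image/jpeg', // or video/mp4, audio/mp3, application/pdf
          },
        },
      ],
    },
  ],
};
```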
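The pattern above expects the media as a base64 string. As a rough sketch of how raw bytes could be turned into that shape, assuming a Node.js runtime (for the global `Buffer`), the hypothetical helpers `toInlinePart` and `buildContents` below produce the same `contents` structure. The type names are illustrative and are not taken from the Gemini SDK.

```typescript
// Illustrative types mirroring the shape used in this guide.
interface TextPart {
  text: string;
}
interface InlinePart {
  inlineData: { data: string; mimeType: string };
}
type Part = TextPart | InlinePart;

// Base64-encodes raw bytes and wraps them as an inlineData part.
// Assumes Node.js, where Buffer is available globally.
function toInlinePart(bytes: Uint8Array, mimeType: string): InlinePart {
  return {
    inlineData: {
      data: Buffer.from(bytes).toString('base64'),
      mimeType,
    },
  };
}

// Builds a contents array with one text part and one media part.
function buildContents(
  question: string,
  bytes: Uint8Array,
  mimeType: string,
): { parts: Part[] }[] {
  return [{ parts: [{ text: question }, toInlinePart(bytes, mimeType)] }];
}
```

The bytes would typically come from reading a file (for instance with `fs.readFileSync` in Node.js); that step is left to the caller so the helpers stay independent of where the data lives.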
---
## Best Practices
- Use specific, detailed prompts
- Combine multiple modalities in one request
- For files too large to send inline (>20MB), use the File API (Phase 2), which accepts files up to 2GB
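Combining modalities amounts to placing several media parts after the text part. The sketch below illustrates that under stated assumptions: `buildMultimodalContents`, `decodedSize`, and `MediaInput` are hypothetical names, and the 20MB inline threshold is an assumption taken from the image and audio limits in this guide. When the combined media is over the threshold, the function throws so the caller can switch to the File API.

```typescript
// Hypothetical input shape: data is already base64-encoded.
interface MediaInput {
  data: string;
  mimeType: string;
}

// Assumed inline threshold, based on the 20MB limits listed in this guide.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

// Size in bytes of the data a padded base64 string decodes to.
function decodedSize(base64: string): number {
  let padding = 0;
  if (base64.slice(-2) === '==') padding = 2;
  else if (base64.slice(-1) === '=') padding = 1;
  return (base64.length / 4) * 3 - padding;
}

// Builds one contents entry: the prompt followed by every media item.
// Throws when the combined media size is over the inline threshold.
function buildMultimodalContents(
  prompt: string,
  media: MediaInput[],
  limitBytes: number = INLINE_LIMIT_BYTES,
) {
  const total = media.reduce((sum, m) => sum + decodedSize(m.data), 0);
  if (total > limitBytes) {
    throw new Error('Inline media is over the limit; upload it with the File API instead');
  }
  const mediaParts = media.map((m) => ({
    inlineData: { data: m.data, mimeType: m.mimeType },
  }));
  return [{ parts: [{ text: prompt }, ...mediaParts] }];
}
```

A call such as `buildMultimodalContents('Compare the chart with the narration', [image, audio])` yields a single entry with three parts, so the model sees both files alongside the same prompt.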
---
## Official Docs
https://ai.google.dev/gemini-api/docs/vision