Files
gh-jezweb-claude-skills-ski…/references/multimodal-guide.md
2025-11-30 08:24:51 +08:00

894 B

Multimodal Guide

Complete guide to using images, video, audio, and PDFs with Gemini API.


Supported Formats

Images

  • JPEG, PNG, WebP, HEIC, HEIF
  • Max size: 20MB

Video

  • MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
  • Max size: 2GB
  • Max length (inline): 2 minutes

Audio

  • MP3, WAV, FLAC, AAC, OGG, OPUS
  • Max size: 20MB

PDFs

  • Max size: 30MB
  • Text-based PDFs work best

Usage Pattern

contents: [
  {
    parts: [
      { text: 'Your question' },
      {
        inlineData: {
          data: base64EncodedData,
          mimeType: 'image/jpeg' // or video/mp4, audio/mp3, application/pdf
        }
      }
    ]
  }
]

Best Practices

  • Use specific, detailed prompts
  • Combine multiple modalities in one request
  • For large files (>2GB), use File API (Phase 2)

Official Docs

https://ai.google.dev/gemini-api/docs/vision