# Video Analysis Reference

Comprehensive guide for video understanding, temporal analysis, and YouTube processing with the Gemini API.

## Core Capabilities

- **Video Summarization**: Create concise summaries
- **Question Answering**: Answer specific questions about content
- **Transcription**: Audio transcription with visual descriptions
- **Timestamp References**: Query specific moments (MM:SS format)
- **Video Clipping**: Process specific segments
- **Scene Detection**: Identify scene changes and transitions
- **Multiple Videos**: Compare up to 10 videos per request (Gemini 2.5+)
- **YouTube Support**: Analyze YouTube videos directly by URL
- **Custom Frame Rate**: Adjust FPS sampling

## Supported Formats

- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
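
Inline uploads and `Part` objects require an explicit MIME type, so it can help to map file extensions to the corresponding strings before building a request. A minimal sketch; the `guess_video_mime_type` helper is illustrative, and the exact MIME strings should be verified against the official format list:

```python
# Hypothetical helper: map a file extension to the MIME type string sent to
# the API. The mapping mirrors the formats listed above; verify the exact
# MIME strings against the official documentation.
from pathlib import Path

VIDEO_MIME_TYPES = {
    '.mp4': 'video/mp4',
    '.mpeg': 'video/mpeg',
    '.mov': 'video/mov',
    '.avi': 'video/avi',
    '.flv': 'video/x-flv',
    '.mpg': 'video/mpg',
    '.webm': 'video/webm',
    '.wmv': 'video/wmv',
    '.3gp': 'video/3gpp',
}

def guess_video_mime_type(path: str) -> str:
    """Return the MIME type for a supported video file, or raise."""
    ext = Path(path).suffix.lower()
    if ext not in VIDEO_MIME_TYPES:
        raise ValueError(f'Unsupported video format: {ext}')
    return VIDEO_MIME_TYPES[ext]
```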

## Model Selection

### Gemini 2.5 Series

- **gemini-2.5-pro**: Best quality, 1M-2M context
- **gemini-2.5-flash**: Balanced speed and quality, 1M-2M context
- **gemini-2.5-flash-preview-09-2025**: Preview features, 1M context

### Gemini 2.0 Series

- **gemini-2.0-flash**: Fast processing
- **gemini-2.0-flash-lite**: Lightweight option

### Context Windows

- **2M token models**: ~2 hours of video (default resolution) or ~6 hours (low resolution)
- **1M token models**: ~1 hour of video (default resolution) or ~3 hours (low resolution)
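
These figures make it possible to sanity-check whether a video will fit before uploading. A rough sketch using the approximate per-second rates from the Token Calculation section below (~300 tokens/s default, ~100 tokens/s low resolution); the helper name and headroom factor are illustrative, not part of the SDK:

```python
# Rough feasibility check: will a video of this length fit in the model's
# context window? Token rates are the approximations used in this guide.
TOKENS_PER_SECOND = {'default': 300, 'low': 100}

def fits_in_context(duration_seconds: float,
                    context_window_tokens: int = 1_000_000,
                    resolution: str = 'default') -> bool:
    estimated_tokens = duration_seconds * TOKENS_PER_SECOND[resolution]
    # Leave headroom for the prompt and the model's response.
    return estimated_tokens < context_window_tokens * 0.95

# A 90-minute video at default resolution does not fit in 1M tokens,
# but does fit when sampled in low-resolution mode.
print(fits_in_context(90 * 60))                    # False
print(fits_in_context(90 * 60, resolution='low'))  # True
```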

## Basic Video Analysis

### Local Video

```python
import os
import time

from google import genai

client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

# Upload the video (use the File API for files >20MB)
myfile = client.files.upload(file='video.mp4')

# Wait for server-side processing to finish
while myfile.state.name == 'PROCESSING':
    time.sleep(1)
    myfile = client.files.get(name=myfile.name)

if myfile.state.name == 'FAILED':
    raise ValueError('Video processing failed')

# Analyze
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=['Summarize this video in 3 key points', myfile]
)
print(response.text)
```

### YouTube Video

```python
from google.genai import types

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Summarize the main topics discussed',
        types.Part.from_uri(
            file_uri='https://www.youtube.com/watch?v=VIDEO_ID',
            mime_type='video/mp4'
        )
    ]
)
```

### Inline Video (<20MB)

```python
with open('short-clip.mp4', 'rb') as f:
    video_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'What happens in this video?',
        types.Part.from_bytes(data=video_bytes, mime_type='video/mp4')
    ]
)
```

## Advanced Features

### Video Clipping

```python
# Analyze a specific time range (40s to 80s)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Summarize this segment',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(
                start_offset='40s',
                end_offset='80s'
            )
        )
    ]
)
```

### Custom Frame Rate

```python
# Lower FPS for mostly static content (saves tokens)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Analyze this presentation',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=0.5)  # Sample one frame every 2 seconds
        )
    ]
)

# Higher FPS for fast-moving content
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Analyze rapid movements in this sports video',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=5)  # Sample 5 frames per second
        )
    ]
)
```

### Multiple Videos (Gemini 2.5+)

```python
video1 = client.files.upload(file='demo1.mp4')
video2 = client.files.upload(file='demo2.mp4')

# Wait for processing, keeping the refreshed file objects
videos = []
for video in (video1, video2):
    while video.state.name == 'PROCESSING':
        time.sleep(1)
        video = client.files.get(name=video.name)
    videos.append(video)
video1, video2 = videos

response = client.models.generate_content(
    model='gemini-2.5-pro',
    contents=[
        'Compare these two product demos. Which explains features better?',
        video1,
        video2
    ]
)
```

## Temporal Understanding

### Timestamp-Based Questions

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'What happens at 01:15 and how does it relate to 02:30?',
        myfile
    ]
)
```

### Timeline Creation

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Create a timeline with timestamps:
        - Key events
        - Scene changes
        - Important moments
        Format: MM:SS - Description
        ''',
        myfile
    ]
)
```

### Scene Detection

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Identify all scene changes with timestamps and describe each scene',
        myfile
    ]
)
```

## Transcription

### Basic Transcription

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Transcribe the audio from this video',
        myfile
    ]
)
```

### With Visual Descriptions

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Transcribe with visual context:
        - Audio transcription
        - Visual descriptions of important moments
        - Timestamps for salient events
        ''',
        myfile
    ]
)
```

### Speaker Identification

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Transcribe with speaker labels and timestamps',
        myfile
    ]
)
```

## Common Use Cases

### 1. Video Summarization

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Summarize this video:
        1. Main topic and purpose
        2. Key points with timestamps
        3. Conclusion or call-to-action
        ''',
        myfile
    ]
)
```

### 2. Educational Content

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Create educational materials:
        1. List key concepts taught
        2. Create 5 quiz questions with answers
        3. Provide a timestamp for each concept
        ''',
        myfile
    ]
)
```

### 3. Action Detection

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'List all actions performed in this tutorial with timestamps',
        myfile
    ]
)
```

### 4. Content Moderation

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Review video content:
        1. Identify any problematic content
        2. Note timestamps of concerns
        3. Provide a content rating recommendation
        ''',
        myfile
    ]
)
```

### 5. Interview Analysis

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Analyze interview:
        1. Questions asked (with timestamps)
        2. Key responses
        3. Candidate body language and demeanor
        4. Overall assessment
        ''',
        myfile
    ]
)
```

### 6. Sports Analysis

```python
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Analyze sports video:
        1. Key plays with timestamps
        2. Player movements and positioning
        3. Game strategy observations
        ''',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=5)  # Higher FPS for fast action
        )
    ]
)
```

## YouTube Specific Features

### Public Video Requirements

- Video must be public (not private or unlisted)
- No age-restricted content
- Valid video ID required (see the validation sketch below)
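
Requests with malformed URLs fail, so it is worth extracting and checking the 11-character video ID before calling the API. A minimal sketch, assuming the standard `youtube.com/watch?v=` and `youtu.be/` URL shapes; the helper is illustrative and not part of the SDK:

```python
import re

# Matches the 11-character video ID in common YouTube URL forms.
_YOUTUBE_ID_RE = re.compile(
    r'(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})'
)

def extract_youtube_id(url: str) -> str:
    """Return the video ID, or raise if the URL does not look valid."""
    match = _YOUTUBE_ID_RE.search(url)
    if not match:
        raise ValueError(f'Could not find a YouTube video ID in: {url}')
    return match.group(1)

print(extract_youtube_id('https://www.youtube.com/watch?v=dQw4w9WgXcQ'))  # dQw4w9WgXcQ
```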

### Usage Example

```python
# YouTube URL
youtube_uri = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Create chapter markers with timestamps',
        types.Part.from_uri(file_uri=youtube_uri, mime_type='video/mp4')
    ]
)
```

### Rate Limits

- **Free tier**: 8 hours of YouTube video per day
- **Paid tier**: No length-based limits
- Public videos only

## Token Calculation

Video tokens depend on resolution and FPS:

**Default resolution** (~300 tokens/second):

- 1 minute = 18,000 tokens
- 10 minutes = 180,000 tokens
- 1 hour = 1,080,000 tokens

**Low resolution** (~100 tokens/second):

- 1 minute = 6,000 tokens
- 10 minutes = 60,000 tokens
- 1 hour = 360,000 tokens

**Context windows**:

- 2M tokens ≈ 2 hours (default) or 6 hours (low-res)
- 1M tokens ≈ 1 hour (default) or 3 hours (low-res)
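
The per-second rates above translate directly into an estimate before uploading. A small sketch using the approximate figures from this section and the illustrative $1 per 1M-token price quoted under Cost Optimization below (the helper itself is not part of the SDK):

```python
# Estimate tokens and cost for a video, using the approximate rates above
# (~300 tokens/s default, ~100 tokens/s low resolution).
def estimate_video_tokens(duration_seconds: float, low_resolution: bool = False) -> int:
    rate = 100 if low_resolution else 300
    return int(duration_seconds * rate)

def estimate_cost_usd(duration_seconds: float, low_resolution: bool = False,
                      price_per_million: float = 1.0) -> float:
    tokens = estimate_video_tokens(duration_seconds, low_resolution)
    return tokens / 1_000_000 * price_per_million

# 10-minute video: 180,000 tokens ≈ $0.18 at default resolution,
# 60,000 tokens ≈ $0.06 in low-resolution mode.
print(estimate_video_tokens(600), estimate_cost_usd(600))
print(estimate_video_tokens(600, low_resolution=True), estimate_cost_usd(600, low_resolution=True))
```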

## Best Practices

### File Management

1. Use the File API for videos >20MB (most videos)
2. Wait for the ACTIVE state before analysis
3. Files auto-delete after 48 hours
4. Clean up manually when done:

```python
client.files.delete(name=myfile.name)
```
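
To free storage before the 48-hour expiry, you can also enumerate everything currently uploaded and delete in bulk. A minimal sketch, assuming `client.files.list()` yields the stored `File` objects (check the SDK reference for exact iteration behavior):

```python
# Delete all files currently stored under this API key.
# Assumes client.files.list() yields File objects with a .name field.
for f in client.files.list():
    print(f'Deleting {f.name}')
    client.files.delete(name=f.name)
```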

### Optimization Strategies

**Reduce token usage**:

- Process specific segments using start/end offsets
- Use a lower FPS for static content
- Use low-resolution mode for long videos (see the sketch below)
- Split very long videos into chunks
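
Low-resolution sampling is requested through the generation config rather than the prompt. A sketch of how this might look, assuming the `media_resolution` option on `GenerateContentConfig` (verify the enum name and model support against the SDK version you have installed):

```python
# Request low-resolution video sampling (~100 tokens/second instead of ~300),
# combined with a clipped segment to keep token usage down.
# media_resolution support is an assumption to verify for your model/SDK version.
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Summarize the first two minutes of this lecture',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(start_offset='0s', end_offset='120s')
        )
    ],
    config=types.GenerateContentConfig(
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW
    )
)
```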

**Improve accuracy**:

- Provide context in prompts
- Use a higher FPS for fast-moving content
- Use the Pro model for complex analysis
- Be specific about what to extract

### Prompt Engineering

**Effective prompts**:

- "Summarize key points with timestamps in MM:SS format"
- "Identify all scene changes and describe each scene"
- "Extract action items mentioned with timestamps"
- "Compare these two videos on: X, Y, Z criteria"

**Structured output**:

```python
from typing import List

from pydantic import BaseModel

class VideoEvent(BaseModel):
    timestamp: str  # MM:SS format
    description: str
    category: str

class VideoAnalysis(BaseModel):
    summary: str
    events: List[VideoEvent]
    duration: str

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=['Analyze this video', myfile],
    config=types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=VideoAnalysis
    )
)
```

### Error Handling

```python
import time

def upload_and_process_video(file_path, max_wait=300):
    """Upload a video and wait for processing to complete."""
    myfile = client.files.upload(file=file_path)

    elapsed = 0
    while myfile.state.name == 'PROCESSING' and elapsed < max_wait:
        time.sleep(5)
        myfile = client.files.get(name=myfile.name)
        elapsed += 5

    if myfile.state.name == 'FAILED':
        raise ValueError(f'Video processing failed for {file_path}')

    if myfile.state.name == 'PROCESSING':
        raise TimeoutError(f'Processing timed out after {max_wait}s')

    return myfile
```

## Cost Optimization

**Token costs** (Gemini 2.5 Flash at $1/1M tokens):

- 1 minute video (default resolution): 18,000 tokens = $0.018
- 10 minute video: 180,000 tokens = $0.18
- 1 hour video: 1,080,000 tokens = $1.08

**Strategies**:

- Use video clipping for specific segments
- Lower the FPS for static content
- Use low-resolution mode for long videos
- Batch related queries on the same video
- Use context caching for repeated queries (see the sketch below)
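
Context caching lets you pay for the video tokens once and reuse them across several prompts. A sketch of how this might look with the `client.caches` API, assuming explicit caching is available for the chosen model and the video meets the minimum cached-token threshold (check the caching documentation for exact model names and limits):

```python
# Cache the processed video once, then run several prompts against it.
# Model availability and minimum token requirements for explicit caching
# are assumptions to verify in the caching documentation.
cache = client.caches.create(
    model='gemini-2.5-flash',
    config=types.CreateCachedContentConfig(
        display_name='lecture-video-cache',
        contents=[myfile],
        ttl='3600s',  # keep the cache for one hour
    )
)

for prompt in ['Summarize the video', 'List the key moments with timestamps']:
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=prompt,
        config=types.GenerateContentConfig(cached_content=cache.name)
    )
    print(response.text)
```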

## Limitations

- Maximum ~2 hours per video at default resolution, or ~6 hours at low resolution (2M-token models)
- YouTube videos must be public
- No live streaming analysis
- Files expire after 48 hours
- Processing time varies with video length
- No real-time processing
- Limited to 10 videos per request (Gemini 2.5+)