Video Analysis Reference
Comprehensive guide for video understanding, temporal analysis, and YouTube processing using Gemini API.
Core Capabilities
- Video Summarization: Create concise summaries
- Question Answering: Answer specific questions about content
- Transcription: Audio transcription with visual descriptions
- Timestamp References: Query specific moments (MM:SS format)
- Video Clipping: Process specific segments
- Scene Detection: Identify scene changes and transitions
- Multiple Videos: Compare up to 10 videos (2.5+)
- YouTube Support: Analyze YouTube videos directly
- Custom Frame Rate: Adjust FPS sampling
Supported Formats
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
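When uploading or inlining a file, the request needs a matching MIME type. A small hypothetical helper (the mapping below is an assumption derived from the format list above; verify against current API docs) can pick one from the file extension:

```python
import os

# Assumed extension-to-MIME mapping for the formats listed above.
VIDEO_MIME_TYPES = {
    '.mp4': 'video/mp4',
    '.mpeg': 'video/mpeg',
    '.mpg': 'video/mpg',
    '.mov': 'video/mov',
    '.avi': 'video/avi',
    '.flv': 'video/x-flv',
    '.webm': 'video/webm',
    '.wmv': 'video/wmv',
    '.3gp': 'video/3gpp',
}

def video_mime_type(path: str) -> str:
    """Return the MIME type for a video file, or raise for unsupported formats."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return VIDEO_MIME_TYPES[ext]
    except KeyError:
        raise ValueError(f'Unsupported video format: {ext}')
```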
Model Selection
Gemini 2.5 Series
- gemini-2.5-pro: Best quality, 1M-2M context
- gemini-2.5-flash: Balanced, 1M-2M context
- gemini-2.5-flash-preview-09-2025: Preview features, 1M context
Gemini 2.0 Series
- gemini-2.0-flash: Fast processing
- gemini-2.0-flash-lite: Lightweight option
Context Windows
- 2M token models: ~2 hours (default) or ~6 hours (low-res)
- 1M token models: ~1 hour (default) or ~3 hours (low-res)
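As a sanity check, the durations above follow from the per-second token rates documented later (~300 tokens/s default, ~100 tokens/s low-res). A rough sketch (actual token counts vary per video):

```python
# Rough capacity estimate: how many seconds of video fit in a context
# window, at ~300 tokens/s (default) or ~100 tokens/s (low resolution).
def max_video_seconds(context_tokens: int, low_res: bool = False) -> int:
    tokens_per_second = 100 if low_res else 300
    return context_tokens // tokens_per_second
```

For a 1M-token model this gives ~3,333 s (just under an hour) at default resolution, matching the figures above.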
Basic Video Analysis
Local Video
from google import genai
import os
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))
# Upload video (File API for >20MB)
myfile = client.files.upload(file='video.mp4')
# Wait for processing
import time
while myfile.state.name == 'PROCESSING':
    time.sleep(1)
    myfile = client.files.get(name=myfile.name)

if myfile.state.name == 'FAILED':
    raise ValueError('Video processing failed')

# Analyze
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=['Summarize this video in 3 key points', myfile]
)
print(response.text)
YouTube Video
from google.genai import types

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Summarize the main topics discussed',
        types.Part.from_uri(
            file_uri='https://www.youtube.com/watch?v=VIDEO_ID',
            mime_type='video/mp4'
        )
    ]
)
Inline Video (<20MB)
with open('short-clip.mp4', 'rb') as f:
    video_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'What happens in this video?',
        types.Part.from_bytes(data=video_bytes, mime_type='video/mp4')
    ]
)
Advanced Features
Video Clipping
# Analyze a specific time range (40s-80s)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Summarize this segment',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(
                start_offset='40s',
                end_offset='80s'
            )
        )
    ]
)
Custom Frame Rate
# Lower FPS for static content (saves tokens)
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Analyze this presentation',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=0.5)  # Sample every 2 seconds
        )
    ]
)

# Higher FPS for fast-moving content
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Analyze rapid movements in this sports video',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=5)  # Sample 5 times per second
        )
    ]
)
Multiple Videos (2.5+)
video1 = client.files.upload(file='demo1.mp4')
video2 = client.files.upload(file='demo2.mp4')
# Wait for processing, keeping the refreshed file handles
videos = []
for video in [video1, video2]:
    while video.state.name == 'PROCESSING':
        time.sleep(1)
        video = client.files.get(name=video.name)
    videos.append(video)
video1, video2 = videos

response = client.models.generate_content(
    model='gemini-2.5-pro',
    contents=[
        'Compare these two product demos. Which explains features better?',
        video1,
        video2
    ]
)
Temporal Understanding
Timestamp-Based Questions
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'What happens at 01:15 and how does it relate to 02:30?',
        myfile
    ]
)
Timeline Creation
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Create a timeline with timestamps:
- Key events
- Scene changes
- Important moments
Format: MM:SS - Description
''',
        myfile
    ]
)
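The "MM:SS - Description" output requested above is easy to post-process. A hypothetical parser (the regex and tuple shape are illustrative choices, not part of the API):

```python
import re

# Matches lines like "01:15 - Intro ends" or "- 02:30 - Demo starts".
TIMELINE_RE = re.compile(r'^\s*-?\s*(\d{1,2}):(\d{2})\s*-\s*(.+)$')

def parse_timeline(text: str) -> list[tuple[int, str]]:
    """Parse 'MM:SS - Description' lines into (seconds, description) pairs."""
    events = []
    for line in text.splitlines():
        m = TIMELINE_RE.match(line)
        if m:
            minutes, seconds, desc = m.groups()
            events.append((int(minutes) * 60 + int(seconds), desc.strip()))
    return events
```

Non-matching lines (headers, blank lines) are simply skipped, which keeps the parser tolerant of extra prose in the model's response.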
Scene Detection
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Identify all scene changes with timestamps and describe each scene',
        myfile
    ]
)
Transcription
Basic Transcription
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Transcribe the audio from this video',
        myfile
    ]
)
With Visual Descriptions
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Transcribe with visual context:
- Audio transcription
- Visual descriptions of important moments
- Timestamps for salient events
''',
        myfile
    ]
)
Speaker Identification
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Transcribe with speaker labels and timestamps',
        myfile
    ]
)
Common Use Cases
1. Video Summarization
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Summarize this video:
1. Main topic and purpose
2. Key points with timestamps
3. Conclusion or call-to-action
''',
        myfile
    ]
)
2. Educational Content
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Create educational materials:
1. List key concepts taught
2. Create 5 quiz questions with answers
3. Provide timestamp for each concept
''',
        myfile
    ]
)
3. Action Detection
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'List all actions performed in this tutorial with timestamps',
        myfile
    ]
)
4. Content Moderation
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Review video content:
1. Identify any problematic content
2. Note timestamps of concerns
3. Provide content rating recommendation
''',
        myfile
    ]
)
5. Interview Analysis
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Analyze interview:
1. Questions asked (timestamps)
2. Key responses
3. Candidate body language and demeanor
4. Overall assessment
''',
        myfile
    ]
)
6. Sports Analysis
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        '''Analyze sports video:
1. Key plays with timestamps
2. Player movements and positioning
3. Game strategy observations
''',
        types.Part(
            file_data=types.FileData(file_uri=myfile.uri, mime_type='video/mp4'),
            video_metadata=types.VideoMetadata(fps=5)  # Higher FPS for fast action
        )
    ]
)
YouTube Specific Features
Public Video Requirements
- Video must be public (not private or unlisted)
- No age-restricted content
- Valid video ID required
Usage Example
# YouTube URL
youtube_uri = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=[
        'Create chapter markers with timestamps',
        types.Part.from_uri(file_uri=youtube_uri, mime_type='video/mp4')
    ]
)
Rate Limits
- Free tier: 8 hours of YouTube video per day
- Paid tier: No length-based limits
- Public videos only
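The free tier's 8-hour daily allowance can be tracked with a trivial bookkeeping helper; this is purely client-side accounting, not part of the API:

```python
# Hypothetical quota tracker for the free tier's 8 h/day YouTube limit.
FREE_TIER_DAILY_SECONDS = 8 * 60 * 60  # 28,800 seconds

def remaining_quota_seconds(processed_durations: list[int]) -> int:
    """Seconds of YouTube video still available today (never negative)."""
    return max(0, FREE_TIER_DAILY_SECONDS - sum(processed_durations))
```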
Token Calculation
Video tokens depend on resolution and FPS:
Default resolution (~300 tokens/second):
- 1 minute = 18,000 tokens
- 10 minutes = 180,000 tokens
- 1 hour = 1,080,000 tokens
Low resolution (~100 tokens/second):
- 1 minute = 6,000 tokens
- 10 minutes = 60,000 tokens
- 1 hour = 360,000 tokens
Context windows:
- 2M tokens ≈ 2 hours (default) or 6 hours (low-res)
- 1M tokens ≈ 1 hour (default) or 3 hours (low-res)
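The figures above can be wrapped in a small estimator. The per-second rates are the approximations listed here, not exact API accounting, and the default $1/1M rate is illustrative:

```python
# Estimate video token usage from the approximate per-second rates above.
def estimate_video_tokens(duration_seconds: float, low_res: bool = False) -> int:
    tokens_per_second = 100 if low_res else 300
    return int(duration_seconds * tokens_per_second)

def estimate_cost_usd(tokens: int, usd_per_million: float = 1.0) -> float:
    """Convert a token count to dollars at a given per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million
```

For example, a 1-minute video at default resolution estimates to 18,000 tokens, matching the table above.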
Best Practices
File Management
- Use File API for videos >20MB (most videos)
- Wait for ACTIVE state before analysis
- Files auto-delete after 48 hours
- Clean up manually:
client.files.delete(name=myfile.name)
Optimization Strategies
Reduce token usage:
- Process specific segments using start/end offsets
- Use lower FPS for static content
- Use low-resolution mode for long videos
- Split very long videos into chunks
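The "split into chunks" strategy can be sketched as a generator that yields the start/end offset strings used for video clipping; the 10-minute default chunk length is an arbitrary example:

```python
# Produce ('<start>s', '<end>s') offset pairs covering the whole video,
# suitable for per-segment clipping requests.
def chunk_offsets(duration_seconds: int, chunk_seconds: int = 600):
    """Yield (start_offset, end_offset) strings covering the full duration."""
    for start in range(0, duration_seconds, chunk_seconds):
        end = min(start + chunk_seconds, duration_seconds)
        yield (f'{start}s', f'{end}s')
```

The final chunk is clipped to the video's actual duration, so a 25-minute video yields chunks of 10, 10, and 5 minutes.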
Improve accuracy:
- Provide context in prompts
- Use higher FPS for fast-moving content
- Use Pro model for complex analysis
- Be specific about what to extract
Prompt Engineering
Effective prompts:
- "Summarize key points with timestamps in MM:SS format"
- "Identify all scene changes and describe each scene"
- "Extract action items mentioned with timestamps"
- "Compare these two videos on: X, Y, Z criteria"
Structured output:
from pydantic import BaseModel
from typing import List

class VideoEvent(BaseModel):
    timestamp: str  # MM:SS format
    description: str
    category: str

class VideoAnalysis(BaseModel):
    summary: str
    events: List[VideoEvent]
    duration: str

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=['Analyze this video', myfile],
    config=types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=VideoAnalysis
    )
)
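With `response_schema` set, `response.text` is JSON matching the schema and can be validated back into the pydantic model. A sketch using a hand-written sample payload (no API call is made here):

```python
import json
from pydantic import BaseModel
from typing import List

class VideoEvent(BaseModel):
    timestamp: str  # MM:SS format
    description: str
    category: str

class VideoAnalysis(BaseModel):
    summary: str
    events: List[VideoEvent]
    duration: str

# Sample payload standing in for response.text from a structured-output call.
sample = json.dumps({
    'summary': 'Product demo',
    'events': [{'timestamp': '01:15',
                'description': 'Feature walkthrough',
                'category': 'demo'}],
    'duration': '05:30',
})

analysis = VideoAnalysis(**json.loads(sample))
```

Validating in this way catches missing fields or type mismatches immediately instead of failing later in downstream code.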
Error Handling
import time

def upload_and_process_video(file_path, max_wait=300):
    """Upload a video and poll until processing completes or times out."""
    myfile = client.files.upload(file=file_path)
    elapsed = 0
    while myfile.state.name == 'PROCESSING' and elapsed < max_wait:
        time.sleep(5)
        myfile = client.files.get(name=myfile.name)
        elapsed += 5
    if myfile.state.name == 'FAILED':
        raise ValueError(f'Video processing failed: {myfile.state.name}')
    if myfile.state.name == 'PROCESSING':
        raise TimeoutError(f'Processing timeout after {max_wait}s')
    return myfile
Cost Optimization
Example token costs (assuming an illustrative rate of $1 per 1M input tokens; check current Gemini pricing):
- 1 minute video (default): 18,000 tokens = $0.018
- 10 minute video: 180,000 tokens = $0.18
- 1 hour video: 1,080,000 tokens = $1.08
Strategies:
- Use video clipping for specific segments
- Lower FPS for static content
- Use low-resolution mode for long videos
- Batch related queries on same video
- Use context caching for repeated queries
Limitations
- Maximum 6 hours (low-res) or 2 hours (default)
- YouTube videos must be public
- No live streaming analysis
- Files expire after 48 hours
- Processing time varies by video length
- No real-time processing
- Limited to 10 videos per request (2.5+)