---
name: Vertex AI Media Master
description: |
  Automatic activation for ALL Google Vertex AI multimodal operations - video processing, audio generation, image creation, and marketing campaigns.
  **TRIGGER PHRASES:**
  - "vertex ai", "gemini multimodal", "process video", "generate audio", "create images", "marketing campaign"
  - "imagen", "video understanding", "multimodal", "content generation", "media assets"
  **AUTO-INVOKES FOR:**
  - Video processing and understanding (up to 6 hours)
  - Audio generation and transcription
  - Image generation with Imagen 4
  - Marketing campaign automation
  - Social media content creation
  - Ad creative generation
  - Multimodal content workflows
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---

# Vertex AI Media Master - Comprehensive Multimodal AI Operations

This Agent Skill covers Google Vertex AI multimodal capabilities for video, audio, image, and text processing, with a focus on marketing applications.

## Core Capabilities

### 🎥 Video Processing (Gemini 2.0/2.5)
- **Video Understanding**: Process videos up to 2 hours at default resolution or 6 hours at low resolution on models with a 2M-token context window; models with a 1M-token window handle roughly half of that
- **Long Context Window**: Gemini 2.5 Pro accepts roughly 1M input tokens per request, which is what bounds how much video fits in one call
- **Audio Track Processing**: Automatic audio transcription from video
- **Multi-video Analysis**: Process multiple videos in a single request
- **Video Summarization**: Extract key moments, scenes, and insights
- **Marketing Use Cases**:
  - Analyze competitor video ads
  - Extract highlights from long-form content
  - Generate video summaries for social media
  - Transcribe and caption video content
  - Identify brand mentions and product placements

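The relationship between video length, media resolution, and context window can be sketched with a small estimator. The per-second token figures below (about 300 at default resolution and about 100 at low resolution, audio included) are approximations assumed for illustration, and the helper names are hypothetical; confirm the real figures against the video understanding documentation before relying on them.

```python
# Approximate tokens consumed per second of video (assumed values, not guarantees).
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_seconds: float, resolution: str = "default") -> int:
    """Estimate the input tokens consumed by a video of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return int(duration_seconds * TOKENS_PER_SECOND[resolution])


def max_video_hours(context_tokens: int, resolution: str = "default") -> float:
    """Longest video, in hours, that fits in a context window (prompt text ignored)."""
    return context_tokens / TOKENS_PER_SECOND[resolution] / 3600


if __name__ == "__main__":
    for window in (1_000_000, 2_000_000):
        for res in ("default", "low"):
            hours = max_video_hours(window, res)
            print(f"{window:>9,} tokens, {res:<7}: ~{hours:.1f} h")
```

Under these assumptions a 2M-token window works out to a little under 2 hours at default resolution and a little under 6 hours at low resolution, which matches the limits quoted above.
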
### 🎵 Audio Generation & Processing
- **Lyria Model (2025)**: Native audio and music generation
- **Speech-to-Text**: Transcribe audio with speaker diarization
- **Text-to-Speech**: Generate natural voiceovers
- **Music Composition**: Background music for campaigns
- **Audio Enhancement**: Noise reduction and quality improvement
- **Marketing Use Cases**:
  - Generate podcast scripts and voiceovers
  - Create audio ads and radio spots
  - Produce background music for video campaigns
  - Transcribe customer interviews
  - Generate multilingual voiceovers

### 🖼️ Image Generation (Imagen 4 & Gemini 2.5 Flash Image)
- **Imagen 4**: Highest quality text-to-image generation
- **Gemini 2.5 Flash Image**: Interleaved image generation with text
- **Style Transfer**: Apply brand styles to generated images
- **Product Visualization**: Generate product mockups
- **Campaign Assets**: Create ad creatives and social media graphics
- **Marketing Use Cases**:
  - Generate personalized ad images (Adios solution)
  - Create social media graphics at scale
  - Produce product lifestyle images
  - Generate A/B test variations
  - Create branded campaign visuals

### 📢 Marketing Campaign Automation
- **ViGenAiR**: Convert long-form video ads to short formats automatically
- **Adios**: Generate personalized ad images tailored to audience context
- **Campaign Asset Generation**: Photos, soundtracks, voiceovers from prompts
- **Content Pipeline**: Email copy, blog posts, social media, PMax assets
- **Catalog Enrichment**: Multi-agent workflow for product onboarding
- **Marketing Use Cases**:
  - Automated campaign asset production
  - Personalized content at scale
  - Multi-channel content distribution
  - Product catalog enhancement
  - Visual merchandising automation

### 🔧 Technical Implementation

**API Integration:**
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part
from vertexai.preview.vision_models import ImageGenerationModel

# Initialize Vertex AI
vertexai.init(project="your-project", location="us-central1")

# Gemini 2.5 Pro for video
model = GenerativeModel("gemini-2.5-pro")

# Reference the video in Cloud Storage (placeholder URI)
video_file = Part.from_uri("gs://your-bucket/video.mp4", mime_type="video/mp4")

# Process video with audio
response = model.generate_content([
    "Analyze this video and extract key marketing insights",
    video_file,  # Maximum length depends on the model and media resolution
])
print(response.text)

# Imagen 4 for image generation
# The model ID is an assumption; confirm the current Imagen 4 ID in Model Garden
imagen = ImageGenerationModel.from_pretrained("imagen-4.0-generate-001")
images = imagen.generate_images(
    prompt="Professional product photo, studio lighting, white background",
    number_of_images=4
)
```

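**Gemini 2.5 Flash Image (Interleaved Generation):**
```python
from vertexai.generative_models import GenerativeModel

# Generate images within text responses
# Assumes vertexai.init(...) has already been called
model = GenerativeModel("gemini-2.5-flash-image")
response = model.generate_content([
    "Create a 5-step recipe with images for each step"
])
# Intended to return text and images interleaved; depending on the SDK version,
# image output may need to be requested explicitly through response modalities
```
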
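**Audio Generation (Lyria):**
```python
# Illustrative pseudocode: `vertexai.preview.audio_models` and
# `AudioGenerationModel` are not confirmed SDK APIs. Check the Lyria
# documentation on Vertex AI for the supported way to call the model.
from vertexai.preview.audio_models import AudioGenerationModel
lyria = AudioGenerationModel.from_pretrained("lyria")
audio = lyria.generate_audio(
    prompt="Upbeat background music for product launch video, 30 seconds",
    duration=30
)
```
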
### 📊 Marketing Workflow Automation

**1. Multi-Channel Campaign Creation:**
```python
# A single prompt drafts the copy and a brief for every asset;
# the images and audio themselves are rendered afterwards with Imagen and Lyria
campaign = model.generate_content([
    """Create a product launch campaign for [product]:
    - Hero image (1920x1080)
    - 3 social media graphics (1080x1080)
    - 30-second video script
    - Background music description
    - Email marketing copy
    - Instagram caption"""
])
```

**2. Video Repurposing Pipeline:**
```python
from vertexai.generative_models import Part

# Long-form to short-form conversion (ViGenAiR approach)
long_video = Part.from_uri("gs://bucket/original-ad-60s.mp4", mime_type="video/mp4")
response = model.generate_content([
    "Extract 3 engaging 15-second clips from this video for TikTok/Reels. "
    "Return the start and end timestamp of each clip.",
    long_video
])
# The response describes the clips; cutting and reformatting the video
# happens in a separate step
```

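The model only describes which segments to keep, so something still has to cut the file. A minimal sketch of that step follows, assuming the prompt asked for a JSON list of `{"start", "end"}` objects with `MM:SS` timestamps; that response format, the local file name, and the helper names are all hypothetical. It builds `ffmpeg` argument lists rather than running them.

```python
import json


def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_ffmpeg_commands(model_json: str, source: str, max_len: int = 15) -> list:
    """Turn a JSON list of {"start", "end"} clips into ffmpeg argument lists."""
    commands = []
    for i, clip in enumerate(json.loads(model_json), start=1):
        start, end = to_seconds(clip["start"]), to_seconds(clip["end"])
        if not 0 <= start < end or end - start > max_len:
            continue  # skip malformed or over-length clips
        commands.append([
            "ffmpeg", "-i", source,
            "-ss", str(start), "-t", str(end - start),  # seek, then keep N seconds
            "-c", "copy", f"clip_{i}.mp4",
        ])
    return commands


# Hypothetical model output: the second clip is 20 seconds long and gets skipped
sample = '[{"start": "00:05", "end": "00:20"}, {"start": "00:30", "end": "00:50"}]'
commands = build_ffmpeg_commands(sample, "original-ad-60s.mp4")
```

Each list can be passed to `subprocess.run`. Stream copy (`-c copy`) cuts on keyframes, so re-encode instead if frame-accurate boundaries matter.
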
**3. Personalized Ad Generation:**
```python
# Context-aware image generation (Adios approach)
# `audiences` and `product` are supplied by the caller
for audience in audiences:
    ad_image = imagen.generate_images(
        prompt=f"Product ad for {product}, targeting {audience.demographics}, {audience.style_preference}",
        aspect_ratio="16:9"
    )
```

### 🎯 Best Practices for Jeremy

**1. Project Setup:**
```bash
# Set environment variables
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"

# Install the Vertex AI SDK (includes the vertexai package)
pip install --upgrade google-cloud-aiplatform
```

**2. Rate Limits & Quotas:**
- Gemini 2.5 Pro: 2M tokens/min (video processing); treat this as an example figure, since quotas vary by project and region
- Imagen 4: 100 images/min (example figure)
- Monitor actual quotas and usage in Cloud Console

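Staying under a per-minute quota like the ones listed above is easier with a client-side throttle than with retries after a quota error. The sketch below is one possible approach, a sliding-window limiter built on the standard library; the class name and the injectable `clock` parameter are illustrative choices, not part of any Google SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls: int, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a slot is free, then record the call."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.calls.append(self.clock())
```

Create one limiter per quota, for example `SlidingWindowLimiter(max_calls=100)` for image generation, and call `acquire()` immediately before each request.
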
**3. Cost Optimization:**
- Use Gemini 2.5 Flash for faster, cheaper operations
- Batch image generation requests
- Use context caching when the same video is analyzed repeatedly
- Use the low-resolution video setting when appropriate

**4. Security & Compliance:**
- Keep API keys in Secret Manager, never in code
- Use service accounts with minimal permissions
- Enable VPC Service Controls to restrict data exfiltration, and use regional endpoints for data residency
- Log all API calls for audit trails

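Application-level audit logging can be as small as a decorator around each call site. The following is a minimal sketch using only the standard library; the logger name, the `audited` decorator, and the stand-in `fake_generate` function are hypothetical, and it complements rather than replaces platform-side audit logs.

```python
import functools
import logging
import time

audit_log = logging.getLogger("vertex_audit")


def audited(operation: str):
    """Log the operation name, outcome, and duration of each wrapped call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.monotonic()
            status = "error"  # overwritten only if the call returns normally
            try:
                result = func(*args, **kwargs)
                status = "ok"
                return result
            finally:
                elapsed = time.monotonic() - started
                audit_log.info("op=%s status=%s seconds=%.3f", operation, status, elapsed)
        return wrapper
    return decorator


@audited("generate_images")
def fake_generate(prompt: str) -> str:
    # Stand-in for a real SDK call
    return f"image for: {prompt}"
```

The decorator deliberately logs no prompt text, since prompts may contain customer data; add fields such as a request ID or caller identity as the audit policy requires.
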
### 🚀 Advanced Marketing Use Cases

**1. Campaign Performance Analysis:**
```python
from vertexai.generative_models import Part

# Analyze competitor campaigns
competitor_uris = ["gs://bucket/competitor1.mp4", "gs://bucket/competitor2.mp4"]
competitor_videos = [Part.from_uri(uri, mime_type="video/mp4") for uri in competitor_uris]
analysis = model.generate_content([
    "Compare these competitor videos: themes, messaging, CTAs, production quality",
    *competitor_videos
])
```

**2. Content Localization:**
```python
# Generate multilingual campaigns
# `campaign_brief` (text) and `hero_image` (an image Part) are supplied by the caller
localized_content = {}
for lang in ["en", "es", "fr", "de", "ja"]:
    localized_content[lang] = model.generate_content([
        f"Translate and culturally adapt this campaign for the {lang} market:",
        campaign_brief,
        hero_image
    ])
```

**3. A/B Test Generation:**
```python
# Generate variations automatically
# `brand_guidelines` is a caller-supplied string
variations = []
for style in ["minimalist", "bold", "luxury", "playful"]:
    variation = imagen.generate_images(
        prompt=f"Product ad, {style} style, {brand_guidelines}",
        number_of_images=1
    )
    variations.append(variation)
```

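When variations span more than one dimension, it helps to build the prompt list up front and give each variant a stable ID so results can be tied back to it. A minimal sketch, assuming two dimensions (style and audience); the function name, ID scheme, and sample values are illustrative only.

```python
from itertools import product


def build_prompt_matrix(product_name, styles, audiences, brand_guidelines):
    """Return one labelled prompt per (style, audience) combination."""
    return [
        {
            "variant_id": f"{style}-{audience}".replace(" ", "_"),
            "prompt": (
                f"Product ad for {product_name}, {style} style, "
                f"targeting {audience}, {brand_guidelines}"
            ),
        }
        for style, audience in product(styles, audiences)
    ]


variants = build_prompt_matrix(
    "trail running shoes",
    ["minimalist", "bold", "luxury", "playful"],
    ["urban commuters", "weekend hikers"],
    "white background, brand blue accents",
)
```

Four styles and two audiences give eight variants; each `prompt` can be passed to the image model and the `variant_id` stored alongside the result for later comparison.
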
### 📚 Reference Documentation

**Official Documentation:**
- Vertex AI Multimodal: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
- Gemini 2.5 Pro: https://cloud.google.com/vertex-ai/generative-ai/docs/models
- Imagen 4: https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
- Video Understanding: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding

**Marketing Solutions:**
- GenAI for Marketing: https://github.com/GoogleCloudPlatform/genai-for-marketing
- ViGenAiR (video repurposing)
- Adios (personalized ad images)

**Pricing:**
- Gemini 2.5 Pro: $1.25/1M input tokens and $10/1M output tokens for prompts up to 200K tokens; $2.50 and $15 for longer prompts (list prices, confirm on the Vertex AI pricing page)
- Imagen 4: $0.04/image
- Video processing: Included in Gemini token pricing

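Per-token and per-image prices combine into a per-campaign estimate with simple arithmetic. The sketch below takes the rates as arguments so it does not go stale when prices change; the function name and the sample token counts are hypothetical, and the token rates in the example call are placeholders rather than quoted prices.

```python
def estimate_campaign_cost(input_tokens, output_tokens, images,
                           input_rate_per_m, output_rate_per_m, image_rate):
    """Return the estimated spend in dollars for one campaign run."""
    token_cost = (
        (input_tokens / 1_000_000) * input_rate_per_m
        + (output_tokens / 1_000_000) * output_rate_per_m
    )
    return round(token_cost + images * image_rate, 4)


# Placeholder token rates for illustration only
cost = estimate_campaign_cost(
    input_tokens=400_000, output_tokens=20_000, images=50,
    input_rate_per_m=2.00, output_rate_per_m=10.00, image_rate=0.04,
)
```

With these placeholder inputs the estimate is $0.80 for input tokens, $0.20 for output tokens, and $2.00 for images.
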
## When This Skill Activates

This skill automatically activates when you mention:
- Video processing, analysis, or understanding
- Audio generation, music composition, or voiceovers
- Image generation, ad creatives, or visual content
- Marketing campaigns, content automation, or asset production
- Gemini multimodal capabilities
- Vertex AI media operations
- Social media content, email marketing, or PMax campaigns

## Integration with Other Tools

**Google Cloud Services:**
- Cloud Storage for media asset management
- BigQuery for campaign analytics
- Cloud Functions for automation triggers
- Vertex AI Pipelines for content workflows

**Third-Party Integrations:**
- Social media APIs (LinkedIn, Twitter, Instagram)
- Marketing automation platforms (HubSpot, Marketo)
- CMS integrations (WordPress, Contentful)
- DAM systems (Bynder, Cloudinary)

## Success Metrics

**Track These KPIs:**
- Asset generation speed (baseline: 5 images/min)
- Content approval rate (target: >80%)
- Campaign personalization scale (target: 1000+ variants)
- Cost per asset (target: <$0.10/image)
- Time saved vs manual production (target: 90% reduction)

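The targets above can be checked mechanically once a campaign has run. A minimal sketch, assuming the raw counts and hours are already collected somewhere; the `CampaignStats` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    assets_generated: int
    assets_approved: int
    total_cost: float       # dollars
    manual_hours: float     # estimated time for manual production
    automated_hours: float  # time actually spent


def kpi_report(stats: CampaignStats) -> dict:
    """Map each KPI to a (value, meets_target) pair."""
    approval_rate = stats.assets_approved / stats.assets_generated
    cost_per_asset = stats.total_cost / stats.assets_generated
    time_saved = 1 - stats.automated_hours / stats.manual_hours
    return {
        "approval_rate": (approval_rate, approval_rate > 0.80),
        "variants": (stats.assets_generated, stats.assets_generated >= 1000),
        "cost_per_asset": (cost_per_asset, cost_per_asset < 0.10),
        "time_saved": (time_saved, time_saved >= 0.90),
    }


report = kpi_report(CampaignStats(1200, 1020, 48.0, 400.0, 30.0))
```

Each entry pairs the measured value with a pass/fail flag, so a failing KPI can be surfaced without re-deriving the thresholds elsewhere.
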
---

**This skill makes Jeremy a Vertex AI multimodal expert with instant access to video processing, audio generation, image creation, and marketing automation capabilities.**