Gemini ImageGen Skill

AI-powered image generation, editing, and composition using Google's Gemini API.

Quick Start

  1. Install dependencies:

    npm install
    
  2. Set your API key:

    export GEMINI_API_KEY="your-api-key-here"
    

    Get your key from Google AI Studio: https://makersuite.google.com/app/apikey

  3. Generate an image:

    npm run generate "a sunset over mountains" output.png
    

Features

  • Generate: Create images from text descriptions
  • Edit: Modify existing images with natural language prompts
  • Compose: Combine multiple images with flexible layouts

Usage Examples

Generate Images

# Basic generation
npm run generate "futuristic city skyline" city.png

# Custom size
npm run generate "modern office" office.png -- --width 1920 --height 1080

Edit Images

# Style transformation
npm run edit photo.jpg "make it look like a watercolor painting" artistic.png

# Object modification
npm run edit landscape.png "add a rainbow in the sky" enhanced.png

Compose Images

# Grid layout (default)
npm run compose collage.png img1.jpg img2.jpg img3.jpg img4.jpg

# Horizontal banner
npm run compose banner.png left.png right.png -- --layout horizontal

# Custom composition
npm run compose result.png a.jpg b.jpg -- --prompt "blend seamlessly"

Scripts

  • npm run generate <prompt> <output> - Generate image from text
  • npm run edit <source> <prompt> <output> - Edit existing image
  • npm run compose <output> <images...> - Compose multiple images
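As a rough illustration of how a script like compose might separate positional arguments from trailing options, here is a minimal sketch. The function name parseComposeArgs, the ComposeArgs shape, and the error messages are hypothetical and are not taken from the skill's source. It assumes npm has already consumed the `--` separator, so the script receives the flags inline with the file names.

```typescript
// Hypothetical sketch: names and error text are assumptions, not the skill's real code.
interface ComposeArgs {
  output: string;
  images: string[];
  options: Record<string, string>;
}

function parseComposeArgs(argv: string[]): ComposeArgs {
  const rest = [...argv];
  const positional: string[] = [];
  const options: Record<string, string> = {};

  while (rest.length > 0) {
    const arg = rest.shift();
    if (arg === undefined) break;

    if (arg.startsWith("--")) {
      // Every option is treated as "--name value"; bare flags are rejected.
      const name = arg.slice(2);
      const value = rest.shift();
      if (name === "" || value === undefined || value.startsWith("--")) {
        throw new Error(`Option ${arg} expects a value`);
      }
      options[name] = value;
    } else {
      positional.push(arg);
    }
  }

  // First positional is the output file, the remainder are input images.
  const [output, ...images] = positional;
  if (output === undefined || images.length === 0) {
    throw new Error("Usage: compose <output> <images...> [--layout <name>] [--prompt <text>]");
  }
  return { output, images, options };
}
```

With this sketch, the arguments from the horizontal banner example would yield banner.png as the output, left.png and right.png as the images, and a layout option set to horizontal.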

Configuration

Environment Variables

  • GEMINI_API_KEY (required) - Your Google Gemini API key
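
Because the key is required, a script can fail fast with a readable message when it is missing. The helper below is a minimal sketch of that check. The name requireApiKey and the error text are assumptions for illustration, not the skill's actual code.

```typescript
// Hypothetical helper: the function name and error message are assumptions.
// Taking the environment as a parameter keeps the check easy to test.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set. Export it before running the scripts.");
  }
  return key;
}

// Typical call site in a Node.js script:
//   const apiKey = requireApiKey(process.env);
```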

Options

See SKILL.md for detailed documentation on all available options and parameters.

Development Notes

This is a local development skill that runs on your machine, not on Cloudflare Workers. It's designed for:

  • Design workflows and asset creation
  • Visual content generation
  • Image manipulation and prototyping
  • Creating test images for development

Implementation Status

Note: The current implementation includes:

  • Complete TypeScript structure
  • Argument parsing and validation
  • Gemini API integration for image analysis
  • Comprehensive error handling

Actual image generation and editing are not fully implemented yet. For production use, you'll need to:

  1. Use the Imagen model (imagen-3.0-generate-001)
  2. Implement proper image data handling
  3. Add output file writing with actual image data

Refer to the Gemini Imagen documentation for implementation details.
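
Steps 2 and 3 above could look roughly like the following sketch, which assumes the API response carries the image as a base64-encoded string and that the output is a PNG, as in the usage examples. The function names decodePngBase64 and saveBase64Png are hypothetical, and the exact response field holding the data depends on the API, so it is left to the caller.

```typescript
import { writeFileSync } from "node:fs";

// The fixed 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: decodes base64 text and checks that it looks like a PNG.
function decodePngBase64(base64Data: string): Buffer {
  const bytes = Buffer.from(base64Data, "base64");
  const header = bytes.subarray(0, PNG_SIGNATURE.length);
  if (!header.equals(PNG_SIGNATURE)) {
    throw new Error("Decoded data does not start with a PNG signature");
  }
  return bytes;
}

// Hypothetical helper: writes the decoded image and returns the byte count.
function saveBase64Png(base64Data: string, outputPath: string): number {
  const bytes = decodePngBase64(base64Data);
  writeFileSync(outputPath, bytes);
  return bytes.length;
}
```

Checking the signature before writing avoids silently saving an error payload or an unexpected format under a .png name. A JPEG or WebP response would need a different check.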

License

MIT