2025-11-29 18:23:13 +08:00


name: skill-elevenlabs-tts-tool
description: ElevenLabs text-to-speech CLI tool guide

When to use

  • Converting text to speech with ElevenLabs API
  • Exploring available voices and models
  • Managing TTS subscriptions and usage
  • Integrating TTS into workflows and pipelines

ElevenLabs TTS Tool Skill

Purpose

Comprehensive guide for the elevenlabs-tts-tool CLI - a professional command-line interface for ElevenLabs text-to-speech synthesis. Provides both direct audio playback and file output with support for 42+ premium voices and multiple models.

When to Use This Skill

Use this skill when:

  • Converting text to speech for notifications, audiobooks, or content creation
  • Exploring and comparing different voice characteristics
  • Managing ElevenLabs subscription quotas and usage
  • Building voice-enabled workflows and automation
  • Integrating TTS into Claude Code hooks or other tools

Do NOT use this skill for:

  • Direct ElevenLabs API programming (use SDK docs instead)
  • Custom voice cloning (requires ElevenLabs web interface)
  • Real-time streaming TTS (tool focuses on file/playback generation)

CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

Installation

# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version

Prerequisites

  • Python 3.13+ and uv
  • An ElevenLabs API key, exported as ELEVENLABS_API_KEY

Quick Start

# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

Progressive Disclosure

📖 Core Commands

synthesize - Convert Text to Speech

Convert text to speech using ElevenLabs API. Supports direct playback or file output.

Usage:

elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]

Arguments:

  • TEXT: Text to synthesize (optional if --stdin used)
  • --stdin, -s: Read text from stdin instead of argument
  • --voice, -v NAME: Voice name or ID (default: rachel)
  • --model, -m ID: Model ID (default: eleven_turbo_v2_5)
  • --output, -o PATH: Save to audio file instead of playing
  • --format, -f FORMAT: Output format (default: mp3_44100_128)

Examples:

# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3

Output: Plays audio through default speakers or saves to specified file format.
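When driving the tool from a Python script, a small wrapper can assemble the argument list from the synthesize options documented above and pass the text on stdin via --stdin, which avoids shell quoting problems with the double quotes inside SSML break tags. This is a minimal sketch: build_synthesize_cmd and synthesize are hypothetical helper names, not part of the tool, and only the documented flags are used.

```python
import subprocess
from typing import List, Optional


def build_synthesize_cmd(
    voice: Optional[str] = None,
    model: Optional[str] = None,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Build argv for 'elevenlabs-tts-tool synthesize --stdin' (hypothetical helper).

    Omitted options are left out, so the tool's own defaults apply
    (rachel, eleven_turbo_v2_5, mp3_44100_128).
    """
    cmd = ["elevenlabs-tts-tool", "synthesize", "--stdin"]
    for flag, value in (("--voice", voice), ("--model", model),
                        ("--output", output), ("--format", fmt)):
        if value is not None:
            cmd += [flag, value]
    return cmd


def synthesize(text: str, **options: Optional[str]) -> None:
    """Run the CLI with text on stdin; a non-zero exit raises CalledProcessError."""
    subprocess.run(build_synthesize_cmd(**options), input=text, text=True, check=True)
```

For example, synthesize('Point one <break time="0.5s" /> Point two.', output="points.mp3") writes points.mp3 without any escaping, and check=True turns the tool's exit code 1 into an exception.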

Available Formats:

  • mp3_44100_128 (default): MP3, 44.1kHz, 128kbps
  • mp3_44100_64: MP3, 44.1kHz, 64kbps
  • mp3_22050_32: MP3, 22.05kHz, 32kbps
  • pcm_44100: PCM WAV, 44.1kHz (requires Pro tier)
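The format names encode codec, sample rate and (for MP3) bitrate, so output size can be estimated before generating. A rough sketch: MP3 size follows from the bitrate alone, while the PCM figure assumes 16-bit mono samples and ignores any file header, neither of which this guide states.

```python
def bytes_per_minute(fmt: str) -> int:
    """Estimate audio bytes per minute for an ElevenLabs format string.

    mp3_<rate>_<kbps>: size depends only on the bitrate (kbit/s).
    pcm_<rate>: ASSUMES 16-bit (2-byte) mono samples; headers ignored.
    """
    parts = fmt.split("_")
    codec = parts[0]
    if codec == "mp3" and len(parts) == 3:
        kbps = int(parts[2])
        return kbps * 1000 // 8 * 60  # kbit/s -> bytes/s -> bytes/min
    if codec == "pcm" and len(parts) == 2:
        sample_rate = int(parts[1])
        return sample_rate * 2 * 60  # samples/s * 2 bytes * 60 s
    raise ValueError(f"unrecognized format: {fmt}")


for f in ("mp3_44100_128", "mp3_44100_64", "mp3_22050_32", "pcm_44100"):
    print(f, round(bytes_per_minute(f) / 1_000_000, 2), "MB/min")
```

Under these assumptions a minute of speech is about 0.96 MB at 128 kbps, 0.48 MB at 64 kbps, 0.24 MB at 32 kbps, and about 5.29 MB as 44.1kHz PCM.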

list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

Usage:

elevenlabs-tts-tool list-voices

Examples:

# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male   # -w matches whole words, so "female" rows are excluded

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"

Output:

Voice           Gender     Age          Accent          Description
====================================================================================================
rachel          female     young        American        Calm and friendly American voice...
adam            male       middle_aged  American        Deep, authoritative American male...
charlotte       female     middle_aged  British         Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
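Plain grep matches anywhere on the line, including the free-text description, so filtering on a specific column is more precise. A sketch that parses the table layout shown above; it assumes the first four columns are single whitespace-free tokens and that everything after them is the description. parse_voice_table and filter_voices are hypothetical names.

```python
from typing import Dict, Iterable, List

COLUMNS = ("voice", "gender", "age", "accent")


def parse_voice_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn list-voices output lines into dicts, skipping header, rules and totals.

    ASSUMPTION: four single-token columns followed by a free-text description.
    """
    rows = []
    for line in lines:
        parts = line.split(None, 4)  # 4 fixed columns + description
        if len(parts) < 4 or parts[0] in ("Voice", "Total:") or line.startswith("="):
            continue
        row = dict(zip(COLUMNS, parts[:4]))
        row["description"] = parts[4] if len(parts) == 5 else ""
        rows.append(row)
    return rows


def filter_voices(rows: List[Dict[str, str]], **criteria: str) -> List[str]:
    """Return voice names whose columns exactly equal every given criterion."""
    return [r["voice"] for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]
```

Feed it subprocess.run(["elevenlabs-tts-tool", "list-voices"], capture_output=True, text=True, check=True).stdout.splitlines(), then call, for example, filter_voices(rows, gender="female", accent="British").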

Popular Voices:

  • rachel: Calm, friendly American female (default)
  • adam: Deep, authoritative American male
  • charlotte: Professional British female
  • josh: Young, casual American male
  • bella: Expressive Italian female

list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

Usage:

elevenlabs-tts-tool list-models

Examples:

# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"

Output: Comprehensive model information including:

  • Model ID and version
  • Quality and latency characteristics
  • Language support (mono vs multilingual)
  • Character limits
  • Best use cases
  • Special features (emotions, etc.)

Key Models:

  • eleven_turbo_v2_5: Fast, high-quality (default, best value)
  • eleven_flash_v2_5: Ultra-low latency (real-time applications)
  • eleven_multilingual_v2: 29 languages, production quality
  • eleven_v3: Most expressive with emotion tags (alpha, 2x cost)

Cost Multipliers:

  • Turbo/Flash models: 1x cost
  • Multilingual v2: 1x cost
  • v3 models: 2x cost (each character uses twice the quota, so a plan covers half as much text)
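The multiplier scales how much quota each input character consumes. A worked sketch using the multipliers listed above; the dictionary mirrors this guide's figures and is not a live lookup against ElevenLabs.

```python
# Cost multipliers as listed in this guide (not fetched from ElevenLabs).
MULTIPLIERS = {
    "eleven_turbo_v2_5": 1,
    "eleven_flash_v2_5": 1,
    "eleven_multilingual_v2": 1,
    "eleven_v3": 2,
}


def quota_cost(text: str, model: str) -> int:
    """Characters of quota consumed by synthesizing text with model."""
    return len(text) * MULTIPLIERS[model]


def characters_affordable(remaining_quota: int, model: str) -> int:
    """How many input characters a remaining quota covers for model."""
    return remaining_quota // MULTIPLIERS[model]
```

With 10,000 characters of quota left, characters_affordable(10_000, "eleven_v3") gives 5,000, while any of the 1x models covers the full 10,000.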

info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.

Usage:

elevenlabs-tts-tool info [--days N]

Arguments:

  • --days, -d N: Number of days of historical usage to display (default: 7)

Examples:

# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

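# Review today's usage before a long generation (&& only requires that info
# succeeds; it does not stop synthesis when the remaining quota is low)
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."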

Output Information:

  • Subscription tier and status
  • Character usage (used/limit/remaining)
  • Quota reset date
  • Historical usage breakdown by day
  • Average daily usage
  • Projected monthly usage
  • Warnings when approaching quota limits

Use Cases:

  • Monitor character quota consumption
  • Track usage patterns over time
  • Plan when to upgrade subscription tier
  • Avoid hitting quota limits unexpectedly
  • Identify high-usage periods
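This guide does not spell out how info computes its average and projection; one plausible reading is a simple mean of the daily history scaled to a full month. A sketch under that assumption, where the 30-day month and the 80% warning threshold are illustrative choices rather than the tool's documented behavior, and project_usage is a hypothetical name.

```python
from typing import List, Tuple


def project_usage(daily_chars: List[int], limit: int,
                  days_in_month: int = 30,
                  warn_ratio: float = 0.8) -> Tuple[float, float, bool]:
    """Return (average per day, projected monthly total, warn?) from daily usage.

    ASSUMPTIONS: simple mean, fixed-length month, warning at warn_ratio * limit.
    """
    if not daily_chars:
        return 0.0, 0.0, False
    average = sum(daily_chars) / len(daily_chars)
    projected = average * days_in_month
    return average, projected, projected >= warn_ratio * limit
```

For example, 300, 500 and 400 characters over three days average 400 per day, which projects to 12,000 for the month and would exceed a 10,000-character free-tier quota.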

update-voices - Update Voice Table

Fetch latest voices from ElevenLabs API and update local lookup table.

Usage:

elevenlabs-tts-tool update-voices [--output PATH]

Arguments:

  • --output, -o PATH: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

Examples:

# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices

Behavior:

  • Fetches all premade voices from ElevenLabs API
  • Saves to user config directory by default (~/.config/elevenlabs-tts-tool/)
  • Creates config directory if it doesn't exist
  • The updated table in the user config directory takes precedence over the default table bundled with the package
  • Persists across package reinstalls
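The precedence rule amounts to checking the user config file first and falling back to the table shipped with the package. A sketch of that lookup; load_voice_table and the bundled path parameter are hypothetical, and the file is only assumed to be a JSON object because the guide does not document its schema.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Default user location named in this guide.
USER_TABLE = Path.home() / ".config" / "elevenlabs-tts-tool" / "voices_lookup.json"


def load_voice_table(bundled: Path, user: Path = USER_TABLE) -> Dict[str, Any]:
    """Prefer the table written by update-voices; else use the bundled default.

    ASSUMPTION: both files hold a JSON object; the real schema is undocumented.
    """
    source = user if user.is_file() else bundled
    with source.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Because the user file lives outside the package directory, reinstalling the package replaces only the bundled table, which is why updates survive reinstalls.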

pricing - Show Pricing Information

Display ElevenLabs pricing tiers and feature comparison.

Usage:

elevenlabs-tts-tool pricing

Examples:

# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"

Output Information:

  • Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
  • Minutes included per tier
  • Additional minute costs
  • Audio quality options
  • Concurrency limits
  • Priority levels
  • API formats by tier
  • Model cost multipliers

Key Insights:

  • Free tier: 10,000-20,000 characters/month
  • v3 models cost 2x (half the minutes/tokens)
  • Use Flash v2.5 for high-volume integrations
  • Reserve v3 for content requiring emotional expression
  • PCM 44.1kHz requires Pro tier

completion - Shell Completion

Generate shell completion scripts for bash, zsh, or fish.

Usage:

elevenlabs-tts-tool completion [bash|zsh|fish]

Installation:

# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file; create the directory first if it does not exist)
mkdir -p ~/.config/fish/completions
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish

Features:

  • Tab-complete commands and subcommands
  • Tab-complete options and flags
  • Context-aware completion for file paths

⚙️ Advanced Features

Emotion Control (v3 Models)

ElevenLabs v3 model (eleven_v3) supports Audio Tags for emotional expression.

Available Emotion Tags:

  • Basic emotions: [happy], [excited], [sad], [angry], [nervous], [curious]
  • Delivery styles: [cheerfully], [playfully], [mischievously], [resigned tone], [flatly], [deadpan]
  • Speech characteristics: [whispers], [laughs], [gasps], [sighs], [pauses], [hesitates], [stammers], [gulps]

Usage Examples:

# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3

Best Practices:

  • Place tags at the beginning of phrases
  • Align text content with emotional intent
  • Test with different voices for best results
  • Use sparingly - let AI infer emotion from context when possible
  • Remember: v3 models cost 2x as much (half the minutes/tokens)
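Because audio tags are only interpreted by eleven_v3 (other models read them aloud, as described under Troubleshooting), a script can choose the model from the text itself and avoid paying the 2x rate when no tags are present. A sketch; the regex treats any lowercase bracketed word or phrase such as [happy] or [resigned tone] as a tag, which is an approximation and not ElevenLabs' own parsing rule. pick_model is a hypothetical name.

```python
import re

# APPROXIMATION of an audio tag: lowercase words in square brackets, e.g. [resigned tone].
AUDIO_TAG = re.compile(r"\[[a-z]+(?: [a-z]+)*\]")


def has_audio_tags(text: str) -> bool:
    return AUDIO_TAG.search(text) is not None


def pick_model(text: str, default: str = "eleven_turbo_v2_5") -> str:
    """Use eleven_v3 only when tags are present, since v3 costs 2x."""
    return "eleven_v3" if has_audio_tags(text) else default
```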

Pause Control (SSML)

Add natural pauses using SSML <break> tags.

Syntax:

<break time="X.Xs" />

Examples:

# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."

# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3

Limitations:

  • Maximum pause duration: 3 seconds
  • Keep to about 2-4 breaks per generation
  • Too many breaks can cause:
    • Speech that speeds up unnaturally
    • Audio artifacts
    • Background noise
    • Generation instability

Alternative Methods:

  • Dashes (a hyphen or an em dash) for shorter pauses (less consistent)
  • Ellipses (...) for hesitation (may add nervous tone)
  • SSML <break> is most reliable
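The limits above can be checked before spending quota. A sketch that counts break tags and flags any pause over 3 seconds; treating more than 4 breaks as a warning follows this guide's 2-4 recommendation, and the regex only handles the time="X.Xs" form shown under Syntax. check_breaks is a hypothetical name.

```python
import re
from typing import List

# Matches only the <break time="X.Xs" /> form used in this guide.
BREAK = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)s"\s*/>')


def check_breaks(text: str, max_breaks: int = 4,
                 max_seconds: float = 3.0) -> List[str]:
    """Return readable warnings; an empty list means the text looks safe."""
    durations = [float(s) for s in BREAK.findall(text)]
    warnings = []
    if len(durations) > max_breaks:
        warnings.append(f"{len(durations)} breaks (recommended: at most {max_breaks})")
    for d in durations:
        if d > max_seconds:
            warnings.append(f"pause of {d}s exceeds {max_seconds}s")
    return warnings
```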

Verbosity Control

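Multi-level verbosity for progressive detail control. The verbosity flags are global options and go before the subcommand, as in the examples below; after synthesize, -v is the short form of --voice.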

Verbosity Levels:

  • No flag (default): WARNING level - only critical issues
  • -v: INFO level - high-level operations, important events
  • -vv: DEBUG level - detailed operations, API calls, validation steps
  • -vvv: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals

Usage:

# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"

Dependent Library Logging: At trace level (-vvv), the following libraries enable DEBUG logging:

  • elevenlabs - ElevenLabs SDK internals
  • httpx / httpcore - HTTP request/response details
  • urllib3 - Low-level HTTP operations
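The verbosity flag works as a counter: each extra v raises the level. A sketch of that mapping with the standard library, assuming -vv and -vvv both set the tool's own logging to DEBUG (Python has no built-in TRACE level) and that -vvv additionally opens up the dependency loggers listed above. It illustrates the pattern; it is not the tool's actual code.

```python
import argparse
import logging
from typing import List, Optional

# Dependency loggers this guide says are set to DEBUG at -vvv.
DEPENDENCY_LOGGERS = ("elevenlabs", "httpx", "httpcore", "urllib3")


def parse_verbosity(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="tts-sketch")
    # action="count" turns -v / -vv / -vvv into 1 / 2 / 3.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser.parse_args(argv).verbose


def configure_logging(verbose: int) -> int:
    """Set root and dependency log levels from a -v count; return the root level."""
    level = {0: logging.WARNING, 1: logging.INFO}.get(verbose, logging.DEBUG)
    logging.basicConfig(level=level, force=True)
    # Third-party libraries stay quiet unless -vvv (ASSUMPTION: WARNING otherwise).
    dep_level = logging.DEBUG if verbose >= 3 else logging.WARNING
    for name in DEPENDENCY_LOGGERS:
        logging.getLogger(name).setLevel(dep_level)
    return level
```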

Pipeline Integration

The tool is designed for composition with other CLI tools.

Design Principles:

  • Data output to stdout, logs and errors to stderr
  • Stdin support for text input
  • Exit codes for success/failure detection
  • Shell completion for productivity

Examples:

# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

# Conditional execution
if make build; then
    elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi

# Process multiple texts
for text in "First" "Second" "Third"; do
    elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
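For long inputs such as the document.txt pipeline, the free tier's 2,500-character limit per generation (see Free Tier Limitations) means the text has to be split first. A sketch that splits on sentence boundaries and hard-splits any single sentence that is still too long; chunk_text is a hypothetical name, and the sentence regex is deliberately simple.

```python
import re
from typing import List

MAX_CHARS = 2500  # Free-tier per-generation limit quoted in this guide.


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be piped to elevenlabs-tts-tool synthesize --stdin with its own --output file (for example part1.mp3, part2.mp3); joining the parts afterwards is left to an audio tool of your choice.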

Claude Code Integration

Use elevenlabs-tts-tool as a notification system for Claude Code hooks.

Use Cases:

  1. Task Completion Alerts
# After long-running task (audio tags require eleven_v3)
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
  2. Error Notifications
# On build failure
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
  3. Custom Workflows
# Shell script integration
if make build; then
    elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi
  4. Multi-Tool Integration
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

Hook Configuration:

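Register hooks in the Claude Code settings file (~/.claude/settings.json for all projects, or .claude/settings.json inside a project). The Stop event fires when Claude finishes responding; Notification fires when Claude Code sends a notification, for example when it needs permission to use a tool:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
          }
        ]
      }
    ],
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "elevenlabs-tts-tool synthesize \"[nervous] Claude Code needs your attention.\" --voice adam --model eleven_v3"
          }
        ]
      }
    ]
  }
}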

Benefits:

  • Audio alerts for completed tasks without monitoring terminal
  • Error notifications while away from screen
  • Multi-step automation with voice feedback
  • Voice-enabled AI agent pipelines

🔧 Troubleshooting

Common Issues

Issue: "API key not found" error

# Symptom
Error: ELEVENLABS_API_KEY environment variable not set

Solution:

  1. Get API key from https://elevenlabs.io/app/settings/api-keys
  2. Export as environment variable:
    export ELEVENLABS_API_KEY='your-api-key'
    
  3. Add to shell profile for persistence:
    echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
    source ~/.bashrc
    

Issue: "Voice not found" error

# Symptom
ValueError: Voice 'unknown' not found in lookup table

Solution:

  1. List available voices:
    elevenlabs-tts-tool list-voices
    
  2. Update voice table if needed:
    elevenlabs-tts-tool update-voices
    
  3. Use correct voice name (case-insensitive):
    elevenlabs-tts-tool synthesize "Hello" --voice rachel
    

Issue: Character quota exceeded

# Symptom
Error: Character quota exceeded for this month

Solution:

  1. Check current usage:
    elevenlabs-tts-tool info
    
  2. Wait until quota reset date
  3. Consider upgrading subscription tier:
    elevenlabs-tts-tool pricing
    
  4. Use more efficient models (Flash/Turbo vs v3)

Issue: Audio quality issues

Symptom: Poor audio quality or artifacts

Solution:

  1. Confirm you are using the highest-bitrate MP3 format (the default, so this only matters if you changed --format):
    elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
    
  2. Use higher-quality model:
    elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
    
  3. For professional content, use PCM format (requires Pro tier):
    elevenlabs-tts-tool synthesize "Text" --format pcm_44100
    

Issue: Emotional tags not working

Symptom: Emotion tags like [happy] are spoken literally

Solution:

  1. Ensure using v3 model:
    elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
    
  2. Place tags at beginning of phrases
  3. Test with different voices (some work better than others)

Issue: Too many SSML breaks causing issues

Symptom: Audio artifacts, speedup, or noise with multiple <break> tags

Solution:

  1. Limit to 2-4 breaks per generation
  2. Use maximum 3 seconds per break
  3. Consider splitting into multiple generations:
    elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
    elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
    

Getting Help

# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version

Additional Resources:

Free Tier Limitations

ElevenLabs Free Tier (2024-2025):

  • 10,000-20,000 characters per month
  • All 42 premade voices
  • Create up to 3 custom voices
  • MP3 formats (all bitrates)
  • Basic SSML support (<break>, phonemes)
  • Emotional tags (v3 models)
  • Full API access

Not included:

  • Commercial license (personal/experimentation use only)
  • PCM 44.1kHz format (requires Pro tier)

Limits:

  • ⚠️ Max 2,500 characters per single generation

Upgrade Tiers:

  • Starter ($5/month): 30,000 characters, commercial license
  • Creator ($22/month): 100,000 characters, PCM formats
  • Pro ($99/month): 500,000 characters, PCM 44.1kHz, highest priority
  • Scale ($330/month): 2,000,000 characters
  • Business (custom): Custom limits and features

Rate Limits: Not publicly documented - expect reasonable use restrictions on free tier

Exit Codes

  • 0: Success
  • 1: General error (validation, API error, etc.)

Output Formats

Audio Formats:

  • mp3_44100_128: MP3, 44.1kHz, 128kbps (default, highest MP3 quality)
  • mp3_44100_64: MP3, 44.1kHz, 64kbps (good quality, smaller)
  • mp3_22050_32: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
  • pcm_44100: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)

Text Formats:

  • Human-readable tables for list commands
  • Structured output with clear sections
  • Errors to stderr, audio/data to stdout

Best Practices

  1. Use Turbo v2.5 for High Volume: Default model offers best value (1x cost, fast, high quality)
  2. Reserve v3 for Emotional Content: Use v3 only when emotion tags needed (costs 2x)
  3. Monitor Quota Regularly: Check info command before large generations
  4. Update Voices Periodically: Run update-voices monthly to get latest voices
  5. Test Voices for Your Use Case: Different voices work better for different content types
  6. Use SSML Breaks Sparingly: Limit to 2-4 breaks per generation for stability
  7. Pipeline for Efficiency: Combine with other tools for automated workflows
  8. Set Verbosity Appropriately: Use -vv or -vvv for debugging, default for production

Resources