| name | description |
|---|---|
| skill-elevenlabs-tts-tool | ElevenLabs text-to-speech CLI tool guide |
When to use
- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines
ElevenLabs TTS Tool Skill
Purpose
Comprehensive guide for the elevenlabs-tts-tool CLI - a professional command-line interface for ElevenLabs text-to-speech synthesis. Provides both direct audio playback and file output with support for 42+ premium voices and multiple models.
When to Use This Skill
Use this skill when:
- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools
Do NOT use this skill for:
- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)
CLI Tool: elevenlabs-tts-tool
Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.
Installation
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool
# Install globally with uv
uv tool install .
# Verify installation
elevenlabs-tts-tool --version
Prerequisites
- Python: 3.13 or higher
- API Key: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- Environment Variable:
export ELEVENLABS_API_KEY='your-api-key'
Quick Start
# Set API key
export ELEVENLABS_API_KEY='your-api-key'
# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"
# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam
# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
Progressive Disclosure
📖 Core Commands (Click to expand)
synthesize - Convert Text to Speech
Convert text to speech using ElevenLabs API. Supports direct playback or file output.
Usage:
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
Arguments:
- TEXT: Text to synthesize (optional if --stdin used)
- --stdin, -s: Read text from stdin instead of argument
- --voice, -v NAME: Voice name or ID (default: rachel)
- --model, -m ID: Model ID (default: eleven_turbo_v2_5)
- --output, -o PATH: Save to audio file instead of playing
- --format, -f FORMAT: Output format (default: mp3_44100_128)
Examples:
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"
# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam
# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2
# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3
# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3
# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."
# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin
# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
Output: Plays audio through default speakers or saves to specified file format.
Available Formats:
- mp3_44100_128 (default): MP3, 44.1kHz, 128kbps
- mp3_44100_64: MP3, 44.1kHz, 64kbps
- mp3_22050_32: MP3, 22.05kHz, 32kbps
- pcm_44100: PCM WAV, 44.1kHz (requires Pro tier)
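For short notification clips where file size matters more than fidelity, the lower-bitrate formats above can be requested explicitly. A minimal example (output file name is illustrative):
# Smallest documented MP3 variant - fine for brief status messages
elevenlabs-tts-tool synthesize "Build finished" --format mp3_22050_32 --output notify.mp3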
list-voices - Show Available Voices
List all available ElevenLabs voices with characteristics.
Usage:
elevenlabs-tts-tool list-voices
Examples:
# List all voices
elevenlabs-tts-tool list-voices
# Filter by gender
elevenlabs-tts-tool list-voices | grep female
elevenlabs-tts-tool list-voices | grep male
# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American
# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged
# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
Output:
Voice Gender Age Accent Description
====================================================================================================
rachel female young American Calm and friendly American voice...
adam male middle_aged American Deep, authoritative American male...
charlotte female middle_aged British Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
Popular Voices:
- rachel: Calm, friendly American female (default)
- adam: Deep, authoritative American male
- charlotte: Professional British female
- josh: Young, casual American male
- bella: Expressive Italian female
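Because the listing is a plain text table, it composes with standard Unix text tools. A minimal sketch for picking a voice programmatically (the awk column position assumes the first column is the voice name, as in the layout above):
# Select the first young British female voice and use it immediately
voice=$(elevenlabs-tts-tool list-voices | grep "female.*young.*British" | awk '{print $1}' | head -n 1)
elevenlabs-tts-tool synthesize "Hello" --voice "$voice"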
list-models - Show TTS Models
List all available ElevenLabs TTS models with characteristics and use cases.
Usage:
elevenlabs-tts-tool list-models
Examples:
# List all models
elevenlabs-tts-tool list-models
# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated
# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"
# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
Output: Comprehensive model information including:
- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)
Key Models:
- eleven_turbo_v2_5: Fast, high-quality (default, best value)
- eleven_flash_v2_5: Ultra-low latency (real-time applications)
- eleven_multilingual_v2: 29 languages, production quality
- eleven_v3: Most expressive with emotion tags (alpha, 2x cost)
Cost Multipliers:
- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)
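Worked example: on the Free tier's 10,000-character month, spending the whole quota on eleven_v3 at 2x cost yields roughly 5,000 characters of speech, while the same quota on Turbo or Flash models yields the full 10,000.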
info - Show Subscription Info
Display subscription tier, character usage, quota limits, and historical usage.
Usage:
elevenlabs-tts-tool info [--days N]
Arguments:
--days, -d N: Number of days of historical usage to display (default: 7)
Examples:
# View subscription with last 7 days of usage
elevenlabs-tts-tool info
# View last 30 days of usage
elevenlabs-tts-tool info --days 30
# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1
# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
Output Information:
- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits
Use Cases:
- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods
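A small pre-flight sketch that snapshots usage before and after a large generation so per-run consumption can be compared (file names are illustrative):
# Snapshot quota, generate, then snapshot again and diff
elevenlabs-tts-tool info --days 1 > usage_before.txt
elevenlabs-tts-tool synthesize --stdin --output chapter1.mp3 < chapter1.txt
elevenlabs-tts-tool info --days 1 > usage_after.txt
diff usage_before.txt usage_after.txt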
update-voices - Update Voice Table
Fetch latest voices from ElevenLabs API and update local lookup table.
Usage:
elevenlabs-tts-tool update-voices [--output PATH]
Arguments:
--output, -o PATH: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)
Examples:
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices
# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json
# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
Behavior:
- Fetches all premade voices from ElevenLabs API
- Saves to user config directory by default (~/.config/elevenlabs-tts-tool/)
- Creates config directory if it doesn't exist
- Updates take precedence over package default
- Persists across package reinstalls
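To automate the monthly refresh suggested under Best Practices, a cron entry along these lines works (the schedule is illustrative; cron does not read your shell profile, so the API key and the full binary path must be given explicitly):
# crontab -e: run at 09:00 on the 1st of every month
# adjust the path to wherever uv installed the executable (often ~/.local/bin)
0 9 1 * * ELEVENLABS_API_KEY='your-api-key' $HOME/.local/bin/elevenlabs-tts-tool update-voices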
pricing - Show Pricing Information
Display ElevenLabs pricing tiers and feature comparison.
Usage:
elevenlabs-tts-tool pricing
Examples:
# View full pricing table
elevenlabs-tts-tool pricing
# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
Output Information:
- Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- Minutes included per tier
- Additional minute costs
- Audio quality options
- Concurrency limits
- Priority levels
- API formats by tier
- Model cost multipliers
Key Insights:
- Free tier: 10,000-20,000 characters/month
- v3 models cost 2x (half the minutes/tokens)
- Use Flash v2.5 for high-volume integrations
- Reserve v3 for content requiring emotional expression
- PCM 44.1kHz requires Pro tier
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
elevenlabs-tts-tool completion [bash|zsh|fish]
Installation:
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"
# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"
# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
Features:
- Tab-complete commands and subcommands
- Tab-complete options and flags
- Context-aware completion for file paths
⚙️ Advanced Features (Click to expand)
Emotion Control (v3 Models)
ElevenLabs v3 model (eleven_v3) supports Audio Tags for emotional expression.
Available Emotion Tags:
- Basic emotions: [happy], [excited], [sad], [angry], [nervous], [curious]
- Delivery styles: [cheerfully], [playfully], [mischievously], [resigned tone], [flatly], [deadpan]
- Speech characteristics: [whispers], [laughs], [gasps], [sighs], [pauses], [hesitates], [stammers], [gulps]
Usage Examples:
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3
# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3
# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3
# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3
# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
Best Practices:
- Place tags at the beginning of phrases
- Align text content with emotional intent
- Test with different voices for best results
- Use sparingly - let AI infer emotion from context when possible
- Remember: v3 models cost 2x as much (half the minutes/tokens)
Pause Control (SSML)
Add natural pauses using SSML <break> tags.
Syntax:
<break time="X.Xs" />
Examples:
# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."
# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."
# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."
# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3
Limitations:
- Maximum pause duration: 3 seconds
- Recommended: 2-4 breaks per generation
- Too many breaks can cause:
  - AI speedup
  - Audio artifacts
  - Background noise
  - Generation instability
Alternative Methods:
- Dashes (- or —) for shorter pauses (less consistent)
- Ellipses (...) for hesitation (may add nervous tone)
- SSML <break> is most reliable
Verbosity Control
Multi-level verbosity for progressive detail control.
Verbosity Levels:
- No flag (default): WARNING level - only critical issues
- -v: INFO level - high-level operations, important events
- -vv: DEBUG level - detailed operations, API calls, validation steps
- -vvv: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals
Usage:
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"
# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"
# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"
# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
Dependent Library Logging:
At trace level (-vvv), the following libraries enable DEBUG logging:
- elevenlabs - ElevenLabs SDK internals
- httpx/httpcore - HTTP request/response details
- urllib3 - Low-level HTTP operations
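Because logs go to stderr and audio/data to stdout (see Pipeline Integration below), trace output can be captured to a file without disturbing the generated audio. A minimal example:
# Keep the MP3 clean while writing full TRACE logs to a file
elevenlabs-tts-tool -vvv synthesize "Hello world" --output hello.mp3 2> trace.log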
Pipeline Integration
The tool is designed for composition with other CLI tools.
Design Principles:
- JSON output to stdout, logs/errors to stderr
- Stdin support for text input
- Exit codes for success/failure detection
- Shell completion for productivity
Examples:
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3
# Combine with other tools
gemini-google-search-tool query "AI news" | \
elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
# Conditional execution
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" || \
elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs."
# Process multiple texts
for text in "First" "Second" "Third"; do
elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
Claude Code Integration
Use elevenlabs-tts-tool as a notification system for Claude Code hooks.
Use Cases:
- Task Completion Alerts
# After long-running task
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!"
- Error Notifications
# On build failure
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output."
- Custom Workflows
# Shell script integration
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" || \
elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs."
- Multi-Tool Integration
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
Hook Configuration:
Create hooks in ~/.config/claude-code/hooks.json:
{
"hooks": {
"after_command": {
"type": "bash",
"command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel"
},
"on_error": {
"type": "bash",
"command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam"
}
}
}
Benefits:
- Audio alerts for completed tasks without monitoring terminal
- Error notifications while away from screen
- Multi-step automation with voice feedback
- Voice-enabled AI agent pipelines
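A hook can also call a small wrapper script instead of inlining the command. A minimal sketch (script name and message convention are illustrative):
#!/usr/bin/env bash
# notify.sh - speak a short status message, but never fail the calling hook
msg="${1:-Task completed}"
elevenlabs-tts-tool synthesize "${msg}" --voice rachel || echo "TTS notification failed" >&2
exit 0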
🔧 Troubleshooting (Click to expand)
Common Issues
Issue: "API key not found" error
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
Solution:
- Get API key from https://elevenlabs.io/app/settings/api-keys
- Export as environment variable:
export ELEVENLABS_API_KEY='your-api-key'
- Add to shell profile for persistence:
echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Issue: "Voice not found" error
# Symptom
ValueError: Voice 'unknown' not found in lookup table
Solution:
- List available voices:
elevenlabs-tts-tool list-voices
- Update voice table if needed:
elevenlabs-tts-tool update-voices
- Use correct voice name (case-insensitive):
elevenlabs-tts-tool synthesize "Hello" --voice rachel
Issue: Character quota exceeded
# Symptom
Error: Character quota exceeded for this month
Solution:
- Check current usage:
elevenlabs-tts-tool info
- Wait until quota reset date
- Consider upgrading subscription tier:
elevenlabs-tts-tool pricing
- Use more efficient models (Flash/Turbo vs v3)
Issue: Audio quality issues
Symptom: Poor audio quality or artifacts
Solution:
- Try different output format:
elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128 - Use higher-quality model:
elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2 - For professional content, use PCM format (requires Pro tier):
elevenlabs-tts-tool synthesize "Text" --format pcm_44100
Issue: Emotional tags not working
Symptom: Emotion tags like [happy] are spoken literally
Solution:
- Ensure you are using a v3 model:
elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
- Place tags at beginning of phrases
- Test with different voices (some work better than others)
Issue: Too many SSML breaks causing issues
Symptom: Audio artifacts, speedup, or noise with multiple <break> tags
Solution:
- Limit to 2-4 breaks per generation
- Use maximum 3 seconds per break
- Consider splitting into multiple generations:
elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3 elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
Getting Help
# Main help
elevenlabs-tts-tool --help
# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help
# Version information
elevenlabs-tts-tool --version
Additional Resources:
- GitHub Issues: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- ElevenLabs Docs: https://elevenlabs.io/docs
- API Reference: https://elevenlabs.io/docs/api-reference
Free Tier Limitations
ElevenLabs Free Tier (2024-2025):
- ✅ 10,000-20,000 characters per month
- ✅ All 42 premade voices
- ✅ Create up to 3 custom voices
- ✅ MP3 formats (all bitrates)
- ✅ Basic SSML support (<break>, phonemes)
- ✅ Emotional tags (v3 models)
- ✅ Full API access
- ❌ No commercial license (personal/experimentation only)
- ❌ PCM 44.1kHz format (requires Pro tier)
- ⚠️ Max 2,500 characters per single generation
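To keep long inputs under the per-generation cap above, text can be chunked before synthesis. A minimal sketch assuming GNU split's -C option (chunk size and file names are illustrative):
# Split at line boundaries into ~2,400-byte pieces, then synthesize each piece
split -C 2400 long_text.txt chunk_
i=1
for f in chunk_*; do
  elevenlabs-tts-tool synthesize --stdin --output "part_${i}.mp3" < "$f"
  i=$((i + 1))
done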
Upgrade Tiers:
- Starter ($5/month): 30,000 characters, commercial license
- Creator ($22/month): 100,000 characters, PCM formats
- Pro ($99/month): 500,000 characters, PCM 44.1kHz, highest priority
- Scale ($330/month): 2,000,000 characters
- Business (custom): Custom limits and features
Rate Limits: Not publicly documented - expect reasonable use restrictions on free tier
Exit Codes
- 0: Success
- 1: General error (validation, API error, etc.)
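The non-zero exit code makes failures easy to detect in scripts, for example:
# Fall back to a plain terminal message if synthesis fails
if ! elevenlabs-tts-tool synthesize "Deployment finished" --voice rachel; then
  echo "TTS failed; see stderr output above" >&2
fi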
Output Formats
Audio Formats:
- mp3_44100_128: MP3, 44.1kHz, 128kbps (default, best quality)
- mp3_44100_64: MP3, 44.1kHz, 64kbps (good quality, smaller)
- mp3_22050_32: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- pcm_44100: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)
Text Formats:
- Human-readable tables for list commands
- Structured output with clear sections
- Errors to stderr, audio/data to stdout
Best Practices
- Use Turbo v2.5 for High Volume: Default model offers best value (1x cost, fast, high quality)
- Reserve v3 for Emotional Content: Use v3 only when emotion tags needed (costs 2x)
- Monitor Quota Regularly: Check the info command before large generations
- Update Voices Periodically: Run update-voices monthly to get latest voices
- Test Voices for Your Use Case: Different voices work better for different content types
- Use SSML Breaks Sparingly: Limit to 2-4 breaks per generation for stability
- Pipeline for Efficiency: Combine with other tools for automated workflows
- Set Verbosity Appropriately: Use -vv or -vvv for debugging, default for production
Resources
- GitHub Repository: https://github.com/dnvriend/elevenlabs-tts-tool
- ElevenLabs Documentation: https://elevenlabs.io/docs
- API Reference: https://elevenlabs.io/docs/api-reference
- Voice Library: https://elevenlabs.io/voice-library
- Python SDK: https://github.com/elevenlabs/elevenlabs-python
- Claude Code: https://docs.anthropic.com/claude-code