---
name: skill-elevenlabs-tts-tool
description: ElevenLabs text-to-speech CLI tool guide
---

# When to use
- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines

# ElevenLabs TTS Tool Skill

## Purpose

Comprehensive guide for the `elevenlabs-tts-tool` CLI - a professional command-line interface for ElevenLabs text-to-speech synthesis. Provides both direct audio playback and file output with support for 42+ premium voices and multiple models.

## When to Use This Skill

**Use this skill when:**
- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools

**Do NOT use this skill for:**
- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)

## CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

### Installation

```bash
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version
```

### Prerequisites

- **Python**: 3.13 or higher
- **API Key**: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- **Environment Variable**: `export ELEVENLABS_API_KEY='your-api-key'`

### Quick Start

```bash
# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
```

## Progressive Disclosure
**📖 Core Commands**

### synthesize - Convert Text to Speech

Convert text to speech using ElevenLabs API. Supports direct playback or file output.

**Usage:**
```bash
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
```

**Arguments:**
- `TEXT`: Text to synthesize (optional if --stdin used)
- `--stdin, -s`: Read text from stdin instead of argument
- `--voice, -v NAME`: Voice name or ID (default: rachel)
- `--model, -m ID`: Model ID (default: eleven_turbo_v2_5)
- `--output, -o PATH`: Save to audio file instead of playing
- `--format, -f FORMAT`: Output format (default: mp3_44100_128)

**Examples:**
```bash
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML (single quotes keep the tag's double quotes intact)
elevenlabs-tts-tool synthesize 'Point one <break time="0.5s" /> Point two <break time="0.5s" /> Point three.'

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
```

**Output:**
Plays audio through default speakers or saves to specified file format.

**Available Formats:**
- `mp3_44100_128` (default): MP3, 44.1kHz, 128kbps
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps
- `pcm_44100`: PCM WAV, 44.1kHz (requires Pro tier)

---

### list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

**Usage:**
```bash
elevenlabs-tts-tool list-voices
```

**Examples:**
```bash
# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender (-w matches whole words, so "male" does not also match "female")
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
```

**Output:**
```
Voice           Gender     Age          Accent       Description
====================================================================================================
rachel          female     young        American     Calm and friendly American voice...
adam            male       middle_aged  American     Deep, authoritative American male...
charlotte       female     middle_aged  British      Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
```

**Popular Voices:**
- **rachel**: Calm, friendly American female (default)
- **adam**: Deep, authoritative American male
- **charlotte**: Professional British female
- **josh**: Young, casual American male
- **bella**: Expressive Italian female

---

### list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

**Usage:**
```bash
elevenlabs-tts-tool list-models
```

**Examples:**
```bash
# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
```

**Output:**
Comprehensive model information including:
- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)

**Key Models:**
- **eleven_turbo_v2_5**: Fast, high-quality (default, best value)
- **eleven_flash_v2_5**: Ultra-low latency (real-time applications)
- **eleven_multilingual_v2**: 29 languages, production quality
- **eleven_v3**: Most expressive with emotion tags (alpha, 2x cost)

**Cost Multipliers:**
- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)

---

### info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.

**Usage:**
```bash
elevenlabs-tts-tool info [--days N]
```

**Arguments:**
- `--days, -d N`: Number of days of historical usage to display (default: 7)

**Examples:**
```bash
# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
```

**Output Information:**
- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits

**Use Cases:**
- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods

---

### update-voices - Update Voice Table

Fetch latest voices from ElevenLabs API and update local lookup table.

**Usage:**
```bash
elevenlabs-tts-tool update-voices [--output PATH]
```

**Arguments:**
- `--output, -o PATH`: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

**Examples:**
```bash
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
```

**Behavior:**
- Fetches all premade voices from ElevenLabs API
- Saves to user config directory by default (`~/.config/elevenlabs-tts-tool/`)
- Creates config directory if it doesn't exist
- Updates take precedence over package default
- Persists across package reinstalls

---

### pricing - Show Pricing Information

Display ElevenLabs pricing tiers and feature comparison.

**Usage:**
```bash
elevenlabs-tts-tool pricing
```

**Examples:**
```bash
# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
```

**Output Information:**
- Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- Minutes included per tier
- Additional minute costs
- Audio quality options
- Concurrency limits
- Priority levels
- API formats by tier
- Model cost multipliers

**Key Insights:**
- Free tier: 10,000-20,000 characters/month
- v3 models cost 2x (half the minutes/tokens)
- Use Flash v2.5 for high-volume integrations
- Reserve v3 for content requiring emotional expression
- PCM 44.1kHz requires Pro tier

---

### completion - Shell Completion

Generate shell completion scripts for bash, zsh, or fish.

**Usage:**
```bash
elevenlabs-tts-tool completion [bash|zsh|fish]
```

**Installation:**
```bash
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
```

**Features:**
- Tab-complete commands and subcommands
- Tab-complete options and flags
- Context-aware completion for file paths
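
When the `eval` line lives in a shell profile that is shared across machines, it can help to guard it so the profile still loads cleanly where the tool is not installed. The following is a minimal sketch for bash or zsh, not part of the tool itself; `load_tts_completion` is a hypothetical helper name, and the optional second argument exists only so the CLI name can be swapped out.

```bash
# Hypothetical helper: enable completion only when the CLI is on PATH.
# $1 = shell name (bash or zsh), $2 = CLI name (defaults to elevenlabs-tts-tool).
load_tts_completion() {
  shell_name="${1:-bash}"
  cli="${2:-elevenlabs-tts-tool}"

  # command -v exits non-zero when the executable cannot be found.
  if ! command -v "$cli" >/dev/null 2>&1; then
    return 1
  fi

  # Same eval form as the installation snippet above.
  eval "$("$cli" completion "$shell_name")"
}
```

In a profile this would be called as `load_tts_completion bash` (or `zsh`); the non-zero return on a missing CLI can simply be ignored.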
**⚙️ Advanced Features**

### Emotion Control (v3 Models)

ElevenLabs v3 model (`eleven_v3`) supports **Audio Tags** for emotional expression.

**Available Emotion Tags:**
- Basic emotions: `[happy]`, `[excited]`, `[sad]`, `[angry]`, `[nervous]`, `[curious]`
- Delivery styles: `[cheerfully]`, `[playfully]`, `[mischievously]`, `[resigned tone]`, `[flatly]`, `[deadpan]`
- Speech characteristics: `[whispers]`, `[laughs]`, `[gasps]`, `[sighs]`, `[pauses]`, `[hesitates]`, `[stammers]`, `[gulps]`

**Usage Examples:**
```bash
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize '[happy] Hello! <break time="0.5s" /> [curious] How are you today?' --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
```

**Best Practices:**
- Place tags at the beginning of phrases
- Align text content with emotional intent
- Test with different voices for best results
- Use sparingly - let AI infer emotion from context when possible
- Remember: v3 models cost 2x as much (half the minutes/tokens)

---

### Pause Control (SSML)

Add natural pauses using SSML `<break>` tags.

**Syntax:**
```xml
<break time="1.5s" />
```

**Examples:**
```bash
# 1-second pause
elevenlabs-tts-tool synthesize 'Welcome <break time="1.0s" /> to our service.'

# Multiple pauses
elevenlabs-tts-tool synthesize 'Point one <break time="0.5s" /> Point two <break time="0.5s" /> Point three.'

# Short pause for emphasis
elevenlabs-tts-tool synthesize 'Think about this <break time="0.3s" /> carefully.'

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize '[happy] Hello! <break time="0.5s" /> [cheerfully] How are you?' --model eleven_v3
```

**Limitations:**
- Maximum pause duration: 3 seconds
- Recommended: 2-4 breaks per generation
- Too many breaks can cause:
  - AI speedup
  - Audio artifacts
  - Background noise
  - Generation instability

**Alternative Methods:**
- Dashes (`-` or `—`) for shorter pauses (less consistent)
- Ellipses (`...`) for hesitation (may add nervous tone)
- SSML `<break>` is most reliable

---

### Verbosity Control

Multi-level verbosity for progressive detail control.

**Verbosity Levels:**
- **No flag** (default): WARNING level - only critical issues
- **`-v`**: INFO level - high-level operations, important events
- **`-vv`**: DEBUG level - detailed operations, API calls, validation steps
- **`-vvv`**: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals

**Usage:**
```bash
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
```

**Dependent Library Logging:**

At trace level (`-vvv`), the following libraries enable DEBUG logging:
- `elevenlabs` - ElevenLabs SDK internals
- `httpx` / `httpcore` - HTTP request/response details
- `urllib3` - Low-level HTTP operations

---

### Pipeline Integration

The tool is designed for composition with other CLI tools.

**Design Principles:**
- JSON output to stdout, logs/errors to stderr
- Stdin support for text input
- Exit codes for success/failure detection
- Shell completion for productivity

**Examples:**
```bash
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
  elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

# Conditional execution (emotion tags require eleven_v3)
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
  elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3

# Process multiple texts
for text in "First" "Second" "Third"; do
  elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
```

---

### Claude Code Integration

Use `elevenlabs-tts-tool` as a notification system for Claude Code hooks. The emotion tags in these examples are only interpreted by `eleven_v3`, so each command selects that model explicitly.

**Use Cases:**

1. **Task Completion Alerts**
   ```bash
   # After long-running task
   elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
   ```

2. **Error Notifications**
   ```bash
   # On build failure
   elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
   ```

3. **Custom Workflows**
   ```bash
   # Shell script integration
   make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
     elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
   ```

4. **Multi-Tool Integration**
   ```bash
   # Combine with other CLI tools
   gemini-google-search-tool query "AI news" | \
     elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
   ```

**Hook Configuration:**

Create hooks in `~/.config/claude-code/hooks.json`:
```json
{
  "hooks": {
    "after_command": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
    },
    "on_error": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam --model eleven_v3"
    }
  }
}
```

The file path and event names above are illustrative. Claude Code reads hooks from its settings files (for example `~/.claude/settings.json`) and uses event names such as `Stop` and `Notification`, so adapt the snippet to the schema in the Claude Code hooks documentation.

**Benefits:**
- Audio alerts for completed tasks without monitoring terminal
- Error notifications while away from screen
- Multi-step automation with voice feedback
- Voice-enabled AI agent pipelines
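
One caveat with the `build && announce_ok || announce_fail` pattern used above: the failure branch also runs when the success announcement itself fails, for example when the API key is missing. A minimal sketch of an explicit if/else wrapper follows. `run_with_voice` and the `TTS_CMD` variable are hypothetical names introduced for illustration, and the emotion tags assume the `eleven_v3` model as described in the Emotion Control section.

```bash
# Hypothetical wrapper: run a command, then announce the outcome by voice.
# TTS_CMD is an assumption of this sketch so the speaker can be replaced (e.g. in tests).
TTS_CMD="${TTS_CMD:-elevenlabs-tts-tool}"

run_with_voice() {
  # Run the wrapped command and remember its exit status.
  if "$@"; then
    rc=0
  else
    rc=$?
  fi

  # Announce the result; a failed announcement is reported but never changes rc.
  if [ "$rc" -eq 0 ]; then
    "$TTS_CMD" synthesize "[cheerfully] Task completed successfully!" --model eleven_v3 ||
      echo "voice notification failed" >&2
  else
    "$TTS_CMD" synthesize "[sad] Task failed. Check logs." --model eleven_v3 ||
      echo "voice notification failed" >&2
  fi

  # Propagate the wrapped command's status, not the announcement's.
  return "$rc"
}
```

Typical use would be `run_with_voice make build`, which keeps the exit code of `make` intact for any surrounding script or hook.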
**🔧 Troubleshooting**

### Common Issues

**Issue: "API key not found" error**

```bash
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
```

**Solution:**
1. Get API key from https://elevenlabs.io/app/settings/api-keys
2. Export as environment variable:
   ```bash
   export ELEVENLABS_API_KEY='your-api-key'
   ```
3. Add to shell profile for persistence:
   ```bash
   echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
   source ~/.bashrc
   ```

---

**Issue: "Voice not found" error**

```bash
# Symptom
ValueError: Voice 'unknown' not found in lookup table
```

**Solution:**
1. List available voices:
   ```bash
   elevenlabs-tts-tool list-voices
   ```
2. Update voice table if needed:
   ```bash
   elevenlabs-tts-tool update-voices
   ```
3. Use correct voice name (case-insensitive):
   ```bash
   elevenlabs-tts-tool synthesize "Hello" --voice rachel
   ```

---

**Issue: Character quota exceeded**

```bash
# Symptom
Error: Character quota exceeded for this month
```

**Solution:**
1. Check current usage:
   ```bash
   elevenlabs-tts-tool info
   ```
2. Wait until quota reset date
3. Consider upgrading subscription tier:
   ```bash
   elevenlabs-tts-tool pricing
   ```
4. Use more efficient models (Flash/Turbo vs v3)

---

**Issue: Audio quality issues**

**Symptom:** Poor audio quality or artifacts

**Solution:**
1. Try different output format:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
   ```
2. Use higher-quality model:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
   ```
3. For professional content, use PCM format (requires Pro tier):
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format pcm_44100
   ```

---

**Issue: Emotional tags not working**

**Symptom:** Emotion tags like `[happy]` are spoken literally

**Solution:**
1. Ensure using v3 model:
   ```bash
   elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
   ```
2. Place tags at beginning of phrases
3. Test with different voices (some work better than others)

---

**Issue: Too many SSML breaks causing issues**

**Symptom:** Audio artifacts, speedup, or noise with multiple `<break>` tags

**Solution:**
1. Limit to 2-4 breaks per generation
2. Use maximum 3 seconds per break
3. Consider splitting into multiple generations:
   ```bash
   elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
   elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
   ```

---

### Getting Help

```bash
# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version
```

**Additional Resources:**
- **GitHub Issues**: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- **ElevenLabs Docs**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference
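
The troubleshooting advice above to split long input into multiple generations can be scripted. The sketch below is one possible approach, not something the tool provides: `split_for_tts` and `synthesize_parts` are hypothetical helper names, and the default of 2,500 characters is taken from the free-tier per-generation limit listed in this guide. It relies on the standard `fold` utility, which may count bytes rather than characters in some implementations, so a lower limit is safer for non-ASCII text.

```bash
# Hypothetical helper: split stdin into numbered part files of at most max_chars each.
# $1 = output file prefix, $2 = maximum characters per part (default 2500).
split_for_tts() {
  prefix="$1"
  max_chars="${2:-2500}"
  n=0
  # Join all input lines with spaces, then re-wrap at word boundaries.
  # The "|| [ -n ... ]" keeps the final chunk, which has no trailing newline.
  tr '\n' ' ' | fold -s -w "$max_chars" | while IFS= read -r chunk || [ -n "$chunk" ]; do
    [ -n "$chunk" ] || continue
    n=$((n + 1))
    printf '%s\n' "$chunk" > "${prefix}$(printf '%03d' "$n").txt"
  done
}

# Hypothetical helper: synthesize every part file to an MP3 with the same base name.
# Uses only the --stdin and --output options documented above.
synthesize_parts() {
  prefix="$1"
  for part in "${prefix}"*.txt; do
    [ -e "$part" ] || continue
    elevenlabs-tts-tool synthesize --stdin --output "${part%.txt}.mp3" < "$part"
  done
}
```

For example, `split_for_tts part_ < document.txt` followed by `synthesize_parts part_` would produce `part_001.mp3`, `part_002.mp3`, and so on. Splitting happens at spaces rather than sentence ends, so the boundaries may not sound natural without manual adjustment.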
## Free Tier Limitations

**ElevenLabs Free Tier (2024-2025):**
- ✅ 10,000-20,000 characters per month
- ✅ All 42 premade voices
- ✅ Create up to 3 custom voices
- ✅ MP3 formats (all bitrates)
- ✅ Basic SSML support (`<break>`, phonemes)
- ✅ Emotional tags (v3 models)
- ✅ Full API access
- ❌ No commercial license (personal/experimentation only)
- ❌ PCM 44.1kHz format (requires Pro tier)
- ⚠️ Max 2,500 characters per single generation

**Upgrade Tiers:**
- **Starter ($5/month)**: 30,000 characters, commercial license
- **Creator ($22/month)**: 100,000 characters, PCM formats
- **Pro ($99/month)**: 500,000 characters, PCM 44.1kHz, highest priority
- **Scale ($330/month)**: 2,000,000 characters
- **Business (custom)**: Custom limits and features

**Rate Limits:** Not publicly documented - expect reasonable use restrictions on free tier

## Exit Codes

- `0`: Success
- `1`: General error (validation, API error, etc.)

## Output Formats

**Audio Formats:**
- `mp3_44100_128`: MP3, 44.1kHz, 128kbps (default, best quality)
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps (good quality, smaller)
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- `pcm_44100`: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)

**Text Formats:**
- Human-readable tables for list commands
- Structured output with clear sections
- Errors to stderr, audio/data to stdout

## Best Practices

1. **Use Turbo v2.5 for High Volume**: Default model offers best value (1x cost, fast, high quality)
2. **Reserve v3 for Emotional Content**: Use v3 only when emotion tags needed (costs 2x)
3. **Monitor Quota Regularly**: Check `info` command before large generations
4. **Update Voices Periodically**: Run `update-voices` monthly to get latest voices
5. **Test Voices for Your Use Case**: Different voices work better for different content types
6. **Use SSML Breaks Sparingly**: Limit to 2-4 breaks per generation for stability
7. **Pipeline for Efficiency**: Combine with other tools for automated workflows
8. **Set Verbosity Appropriately**: Use `-vv` or `-vvv` for debugging, default for production

## Resources

- **GitHub Repository**: https://github.com/dnvriend/elevenlabs-tts-tool
- **ElevenLabs Documentation**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference
- **Voice Library**: https://elevenlabs.io/voice-library
- **Python SDK**: https://github.com/elevenlabs/elevenlabs-python
- **Claude Code**: https://docs.anthropic.com/claude-code
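
As a rough illustration of best practices 2 and 3, the quota impact of a text can be estimated before synthesis by applying the cost multipliers from this guide (2x for v3 models, 1x otherwise). This is a sketch under stated assumptions: `estimate_quota_cost` is a hypothetical helper, and it assumes every character of the input, including emotion tags and SSML markup, counts toward the quota, which this guide does not confirm.

```bash
# Hypothetical helper: estimate how many quota characters a text will consume.
# $1 = text, $2 = model ID (defaults to the tool's default model).
estimate_quota_cost() {
  text="$1"
  model="${2:-eleven_turbo_v2_5}"

  # Multipliers as listed in this guide: v3 models 2x, all other models 1x.
  case "$model" in
    eleven_v3*) multiplier=2 ;;
    *)          multiplier=1 ;;
  esac

  # ${#text} is the length of the text in characters.
  echo $(( ${#text} * multiplier ))
}
```

The result can be compared against the remaining characters reported by `elevenlabs-tts-tool info` before starting a large generation.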