---
name: skill-elevenlabs-tts-tool
description: ElevenLabs text-to-speech CLI tool guide
---

# When to use
- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines

# ElevenLabs TTS Tool Skill

## Purpose

Comprehensive guide to the `elevenlabs-tts-tool` CLI, a professional command-line interface for ElevenLabs text-to-speech synthesis. It supports both direct audio playback and file output, with 42+ premium voices and multiple models.

## When to Use This Skill

**Use this skill when:**
- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools

**Do NOT use this skill for:**
- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)

## CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

### Installation

```bash
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version
```

### Prerequisites

- **Python**: 3.13 or higher
- **API Key**: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- **Environment Variable**: `export ELEVENLABS_API_KEY='your-api-key'`

### Quick Start

```bash
# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
```

## Progressive Disclosure

<details>
<summary><strong>📖 Core Commands (Click to expand)</strong></summary>

### synthesize - Convert Text to Speech

Convert text to speech using the ElevenLabs API. Supports direct playback or file output.

**Usage:**
```bash
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
```

**Arguments:**
- `TEXT`: Text to synthesize (optional if `--stdin` is used)
- `--stdin, -s`: Read text from stdin instead of argument
- `--voice, -v NAME`: Voice name or ID (default: rachel)
- `--model, -m ID`: Model ID (default: eleven_turbo_v2_5)
- `--output, -o PATH`: Save to audio file instead of playing
- `--format, -f FORMAT`: Output format (default: mp3_44100_128)

**Examples:**
```bash
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
```

**Output:**
Plays audio through default speakers or saves to specified file format.

**Available Formats:**
- `mp3_44100_128` (default): MP3, 44.1kHz, 128kbps
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps
- `pcm_44100`: PCM WAV, 44.1kHz (requires Pro tier)

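
The format IDs listed above follow a `codec_samplerate[_bitrate]` pattern. As a minimal sketch, a script that accepts a `--format` value could split it like this; `describe_format` is a hypothetical helper, not part of the tool, and it assumes every ID follows the pattern of the four listed here.

```bash
# Hypothetical helper: split a format ID such as mp3_44100_128 into its parts.
describe_format() {
  fmt=$1
  codec=${fmt%%_*}   # text before the first underscore, e.g. mp3
  rest=${fmt#*_}     # text after the first underscore, e.g. 44100_128
  rate=${rest%%_*}   # sample rate in Hz
  case $rest in
    *_*) bitrate="${rest#*_}kbps" ;;  # MP3 IDs carry a bitrate
    *)   bitrate="uncompressed" ;;    # PCM IDs end after the sample rate
  esac
  echo "$codec ${rate}Hz $bitrate"
}

# Usage: describe_format mp3_44100_128
```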
---

### list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

**Usage:**
```bash
elevenlabs-tts-tool list-voices
```

**Examples:**
```bash
# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender (-w matches whole words, so "male" does not also match "female")
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
```

**Output:**
```
Voice        Gender   Age           Accent     Description
====================================================================================================
rachel       female   young         American   Calm and friendly American voice...
adam         male     middle_aged   American   Deep, authoritative American male...
charlotte    female   middle_aged   British    Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
```
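
A plain substring match on `male` also matches `female`, so filtering on a specific column is more precise than matching anywhere in the row. The sketch below assumes the whitespace-separated layout shown in the sample output, with gender in the second column; `filter_gender` is a hypothetical helper.

```bash
# Sketch: keep only rows whose second column equals the requested gender.
# Assumes the column layout of the sample output (Voice, Gender, Age, ...).
filter_gender() {
  awk -v g="$1" '$2 == g'
}

# Usage: elevenlabs-tts-tool list-voices | filter_gender male
```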

**Popular Voices:**
- **rachel**: Calm, friendly American female (default)
- **adam**: Deep, authoritative American male
- **charlotte**: Professional British female
- **josh**: Young, casual American male
- **bella**: Expressive Italian female

---

### list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

**Usage:**
```bash
elevenlabs-tts-tool list-models
```

**Examples:**
```bash
# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
```

**Output:**
Comprehensive model information including:
- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)

**Key Models:**
- **eleven_turbo_v2_5**: Fast, high-quality (default, best value)
- **eleven_flash_v2_5**: Ultra-low latency (real-time applications)
- **eleven_multilingual_v2**: 29 languages, production quality
- **eleven_v3**: Most expressive with emotion tags (alpha, 2x cost)

**Cost Multipliers:**
- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)

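
To make the multipliers concrete, here is a rough sketch of how a script might estimate the quota a request consumes before sending it. `billed_chars` is a hypothetical helper; it assumes the 2x multiplier applies to any model ID starting with `eleven_v3` and that all other models count 1x, as in the list above.

```bash
# Hypothetical helper: characters charged against quota for a given model.
# v3 counts double; every other model is treated as 1x.
billed_chars() {
  chars=$1
  model=$2
  case $model in
    eleven_v3*) echo $((chars * 2)) ;;
    *)          echo "$chars" ;;
  esac
}

# Usage: billed_chars 1200 eleven_v3
```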

---

### info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.

**Usage:**
```bash
elevenlabs-tts-tool info [--days N]
```

**Arguments:**
- `--days, -d N`: Number of days of historical usage to display (default: 7)

**Examples:**
```bash
# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
```

**Output Information:**
- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits

**Use Cases:**
- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods

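
The average and projected figures that `info` reports can be approximated by hand. The sketch below is an assumption about the arithmetic (total divided by days, then scaled to a 30-day month), not the tool's documented formula; `project_usage` is a hypothetical helper.

```bash
# Hypothetical helper: average daily usage and a 30-day projection.
# Uses integer arithmetic, so the tool's own rounding may differ.
project_usage() {
  used=$1   # characters used over the window
  days=$2   # window length in days
  [ "$days" -gt 0 ] || return 1
  avg=$((used / days))
  echo "$avg $((avg * 30))"
}

# Usage: project_usage 7000 7
```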

---

### update-voices - Update Voice Table

Fetch the latest voices from the ElevenLabs API and update the local lookup table.

**Usage:**
```bash
elevenlabs-tts-tool update-voices [--output PATH]
```

**Arguments:**
- `--output, -o PATH`: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

**Examples:**
```bash
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
```

**Behavior:**
- Fetches all premade voices from ElevenLabs API
- Saves to user config directory by default (`~/.config/elevenlabs-tts-tool/`)
- Creates config directory if it doesn't exist
- Updates take precedence over package default
- Persists across package reinstalls

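
The precedence rule above (user config first, packaged default second) can be sketched as follows. `resolve_voices_file` is a hypothetical helper, and the packaged default's location is passed in as an argument because it is not documented here.

```bash
# Sketch of the lookup order: a user-level file wins over the packaged default.
# $1 is the path of the packaged default (an assumption supplied by the caller).
resolve_voices_file() {
  packaged_default=$1
  user_file="$HOME/.config/elevenlabs-tts-tool/voices_lookup.json"
  if [ -f "$user_file" ]; then
    echo "$user_file"
  else
    echo "$packaged_default"
  fi
}

# Usage: resolve_voices_file /path/to/packaged/voices_lookup.json
```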

---

### pricing - Show Pricing Information

Display ElevenLabs pricing tiers and feature comparison.

**Usage:**
```bash
elevenlabs-tts-tool pricing
```

**Examples:**
```bash
# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
```

**Output Information:**
- Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- Minutes included per tier
- Additional minute costs
- Audio quality options
- Concurrency limits
- Priority levels
- API formats by tier
- Model cost multipliers

**Key Insights:**
- Free tier: 10,000-20,000 characters/month
- v3 models cost 2x (half the minutes/tokens)
- Use Flash v2.5 for high-volume integrations
- Reserve v3 for content requiring emotional expression
- PCM 44.1kHz requires Pro tier

---

### completion - Shell Completion

Generate shell completion scripts for bash, zsh, or fish.

**Usage:**
```bash
elevenlabs-tts-tool completion [bash|zsh|fish]
```

**Installation:**
```bash
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
```

**Features:**
- Tab-complete commands and subcommands
- Tab-complete options and flags
- Context-aware completion for file paths

</details>

<details>
<summary><strong>⚙️ Advanced Features (Click to expand)</strong></summary>

### Emotion Control (v3 Models)

The ElevenLabs v3 model (`eleven_v3`) supports **Audio Tags** for emotional expression.

**Available Emotion Tags:**
- Basic emotions: `[happy]`, `[excited]`, `[sad]`, `[angry]`, `[nervous]`, `[curious]`
- Delivery styles: `[cheerfully]`, `[playfully]`, `[mischievously]`, `[resigned tone]`, `[flatly]`, `[deadpan]`
- Speech characteristics: `[whispers]`, `[laughs]`, `[gasps]`, `[sighs]`, `[pauses]`, `[hesitates]`, `[stammers]`, `[gulps]`

**Usage Examples:**
```bash
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
```

**Best Practices:**
- Place tags at the beginning of phrases
- Align text content with emotional intent
- Test with different voices for best results
- Use tags sparingly; let the model infer emotion from context when possible
- Remember: v3 models cost 2x as much (half the minutes/tokens)

---

### Pause Control (SSML)

Add natural pauses using SSML `<break>` tags.

**Syntax:**
```xml
<break time="X.Xs" />
```

**Examples:**
```bash
# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."

# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3
```

**Limitations:**
- Maximum pause duration: 3 seconds
- Recommended: 2-4 breaks per generation
- Too many breaks can cause:
  - Speech that speeds up unexpectedly
  - Audio artifacts
  - Background noise
  - Generation instability

**Alternative Methods:**
- Dashes (`-` or `—`) for shorter pauses (less consistent)
- Ellipses (`...`) for hesitation (may add nervous tone)
- SSML `<break>` is most reliable

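
The limits listed under Limitations can be checked before a request is sent. The following is a minimal pre-flight sketch; `check_breaks` is a hypothetical helper that treats more than 4 tags, or any pause longer than 3 seconds, as an error, and it only recognises tags written as `<break time="…s"`.

```bash
# Hypothetical pre-flight check: at most 4 <break> tags, each 3 seconds or less.
check_breaks() {
  text=$1
  # Count the tags by printing each match on its own line.
  count=$(printf '%s\n' "$text" | grep -o '<break time="[0-9.]*s"' | wc -l | tr -d ' ')
  if [ "$count" -gt 4 ]; then
    echo "too many breaks: $count" >&2
    return 1
  fi
  # Strip everything except the number, then reject values above 3.
  if ! printf '%s\n' "$text" | grep -o '<break time="[0-9.]*s"' \
      | sed 's/[^0-9.]//g' | awk '$1 > 3 { bad = 1 } END { exit bad }'; then
    echo "break longer than 3s" >&2
    return 1
  fi
}

# Usage: check_breaks "$text" && elevenlabs-tts-tool synthesize "$text"
```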

---

### Verbosity Control

Multi-level verbosity for progressive detail control.

**Verbosity Levels:**
- **No flag** (default): WARNING level - only critical issues
- **`-v`**: INFO level - high-level operations, important events
- **`-vv`**: DEBUG level - detailed operations, API calls, validation steps
- **`-vvv`**: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals

**Usage:**
```bash
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
```
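
The flag-to-level mapping above can be summarised in a few lines, which may help when wrapping the tool in a script with its own verbosity setting. `level_for` is a hypothetical helper; treating more than three flags as TRACE is an assumption, not documented behavior.

```bash
# Sketch: map the number of -v flags to the level names listed above.
level_for() {
  case $1 in
    0) echo WARNING ;;
    1) echo INFO ;;
    2) echo DEBUG ;;
    *) echo TRACE ;;   # three or more flags (assumed)
  esac
}

# Usage: level_for 2
```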

**Dependent Library Logging:**
At trace level (`-vvv`), the following libraries enable DEBUG logging:
- `elevenlabs` - ElevenLabs SDK internals
- `httpx` / `httpcore` - HTTP request/response details
- `urllib3` - Low-level HTTP operations

---

### Pipeline Integration

The tool is designed for composition with other CLI tools.

**Design Principles:**
- JSON output to stdout, logs/errors to stderr
- Stdin support for text input
- Exit codes for success/failure detection
- Shell completion for productivity

**Examples:**
```bash
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
  elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

# Conditional execution (emotion tags require the eleven_v3 model)
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
  elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3

# Process multiple texts
for text in "First" "Second" "Third"; do
  elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
```
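
One caveat with the `A && B || C` pattern: if the build succeeds but the success announcement itself fails, the failure message runs as well. A minimal sketch of an `if`/`else` variant that announces exactly one outcome and preserves the build's exit status follows; `speak` and `announce` are hypothetical wrapper names.

```bash
# Thin wrapper so the notification command lives in one place.
speak() {
  elevenlabs-tts-tool synthesize "$@"
}

# Run the given command, announce exactly one outcome, return the command's status.
announce() {
  if "$@"; then
    speak "Build successful!"
    return 0
  else
    status=$?
    speak "Build failed. Check logs."
    return "$status"
  fi
}

# Usage: announce make build
```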

---

### Claude Code Integration

Use `elevenlabs-tts-tool` as a notification system for Claude Code hooks. The examples below use emotion tags, so they pass `--model eleven_v3`.

**Use Cases:**

1. **Task Completion Alerts**
   ```bash
   # After long-running task
   elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
   ```

2. **Error Notifications**
   ```bash
   # On build failure
   elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
   ```

3. **Custom Workflows**
   ```bash
   # Shell script integration
   make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3 || \
     elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
   ```

4. **Multi-Tool Integration**
   ```bash
   # Combine with other CLI tools
   gemini-google-search-tool query "AI news" | \
     elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
   ```

**Hook Configuration:**

Create hooks in `~/.config/claude-code/hooks.json`. The file location and event names shown here are illustrative; check the Claude Code hooks documentation for the schema your version expects:

```json
{
  "hooks": {
    "after_command": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
    },
    "on_error": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam --model eleven_v3"
    }
  }
}
```

**Benefits:**
- Audio alerts for completed tasks without monitoring terminal
- Error notifications while away from screen
- Multi-step automation with voice feedback
- Voice-enabled AI agent pipelines

</details>

<details>
<summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>

### Common Issues

**Issue: "API key not found" error**
```bash
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
```

**Solution:**
1. Get an API key from https://elevenlabs.io/app/settings/api-keys
2. Export it as an environment variable:
   ```bash
   export ELEVENLABS_API_KEY='your-api-key'
   ```
3. Add it to your shell profile for persistence:
   ```bash
   echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
   source ~/.bashrc
   ```
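
In scripts, it can be useful to fail early with a clear message rather than wait for the tool to report the missing key. A minimal sketch, where `require_api_key` is a hypothetical helper:

```bash
# Hypothetical guard: stop early if the API key is missing or empty.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
}

# Usage: require_api_key && elevenlabs-tts-tool synthesize "Hello world"
```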

---

**Issue: "Voice not found" error**
```bash
# Symptom
ValueError: Voice 'unknown' not found in lookup table
```

**Solution:**
1. List available voices:
   ```bash
   elevenlabs-tts-tool list-voices
   ```
2. Update voice table if needed:
   ```bash
   elevenlabs-tts-tool update-voices
   ```
3. Use correct voice name (case-insensitive):
   ```bash
   elevenlabs-tts-tool synthesize "Hello" --voice rachel
   ```

---

**Issue: Character quota exceeded**
```bash
# Symptom
Error: Character quota exceeded for this month
```

**Solution:**
1. Check current usage:
   ```bash
   elevenlabs-tts-tool info
   ```
2. Wait until quota reset date
3. Consider upgrading subscription tier:
   ```bash
   elevenlabs-tts-tool pricing
   ```
4. Use more efficient models (Flash/Turbo vs v3)

---

**Issue: Audio quality issues**

**Symptom:** Poor audio quality or artifacts

**Solution:**
1. Try different output format:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
   ```
2. Use higher-quality model:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
   ```
3. For professional content, use PCM format (requires Pro tier):
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format pcm_44100
   ```

---

**Issue: Emotional tags not working**

**Symptom:** Emotion tags like `[happy]` are spoken literally

**Solution:**
1. Make sure you are using the v3 model:
   ```bash
   elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
   ```
2. Place tags at the beginning of phrases
3. Test with different voices (some work better than others)

---

**Issue: Too many SSML breaks causing issues**

**Symptom:** Audio artifacts, speedup, or noise with multiple `<break>` tags

**Solution:**
1. Limit to 2-4 breaks per generation
2. Keep each break to 3 seconds or less
3. Consider splitting into multiple generations:
   ```bash
   elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
   elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
   ```

---

### Getting Help

```bash
# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version
```

**Additional Resources:**
- **GitHub Issues**: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- **ElevenLabs Docs**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference

</details>

## Free Tier Limitations

**ElevenLabs Free Tier (2024-2025):**
- ✅ 10,000-20,000 characters per month
- ✅ All 42 premade voices
- ✅ Create up to 3 custom voices
- ✅ MP3 formats (all bitrates)
- ✅ Basic SSML support (`<break>`, phonemes)
- ✅ Emotional tags (v3 models)
- ✅ Full API access
- ❌ No commercial license (personal/experimentation only)
- ❌ PCM 44.1kHz format (requires Pro tier)
- ⚠️ Max 2,500 characters per single generation

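
Text longer than the 2,500-character cap has to be split across several generations. The sketch below uses the standard `fold` utility to break input at spaces; `chunk_text` is a hypothetical helper, and because `fold` measures width in columns or bytes rather than characters, the pieces may come out shorter than the cap for non-ASCII text.

```bash
# Sketch: split stdin into pieces of at most N columns (default 2500),
# breaking at spaces. Newlines are flattened to spaces first, and a final
# newline is appended so that the last piece is a complete line.
chunk_text() {
  { tr '\n' ' '; echo; } | fold -s -w "${1:-2500}"
}

# Usage (writes part1.mp3, part2.mp3, ...):
#   n=0
#   chunk_text < document.txt | while IFS= read -r piece; do
#     n=$((n + 1))
#     elevenlabs-tts-tool synthesize "$piece" --output "part$n.mp3"
#   done
```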

**Upgrade Tiers:**
- **Starter ($5/month)**: 30,000 characters, commercial license
- **Creator ($22/month)**: 100,000 characters, PCM formats
- **Pro ($99/month)**: 500,000 characters, PCM 44.1kHz, highest priority
- **Scale ($330/month)**: 2,000,000 characters
- **Business (custom)**: Custom limits and features

**Rate Limits:** Not publicly documented - expect reasonable use restrictions on free tier

## Exit Codes

- `0`: Success
- `1`: General error (validation, API error, etc.)

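
Because only `0` and `1` are documented, a script is safer testing for any non-zero status than for a specific code. A minimal sketch, where `run_checked` is a hypothetical wrapper:

```bash
# Sketch: run a command, report any non-zero status on stderr, and pass it on.
run_checked() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit code $status: $*" >&2
  fi
  return "$status"
}

# Usage: run_checked elevenlabs-tts-tool synthesize "Hello world" --output hello.mp3
```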

## Output Formats

**Audio Formats:**
- `mp3_44100_128`: MP3, 44.1kHz, 128kbps (default, best quality)
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps (good quality, smaller)
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- `pcm_44100`: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)

**Text Formats:**
- Human-readable tables for list commands
- Structured output with clear sections
- Errors to stderr, audio/data to stdout

## Best Practices

1. **Use Turbo v2.5 for High Volume**: Default model offers best value (1x cost, fast, high quality)
2. **Reserve v3 for Emotional Content**: Use v3 only when emotion tags needed (costs 2x)
3. **Monitor Quota Regularly**: Check `info` command before large generations
4. **Update Voices Periodically**: Run `update-voices` monthly to get latest voices
5. **Test Voices for Your Use Case**: Different voices work better for different content types
6. **Use SSML Breaks Sparingly**: Limit to 2-4 breaks per generation for stability
7. **Pipeline for Efficiency**: Combine with other tools for automated workflows
8. **Set Verbosity Appropriately**: Use `-vv` or `-vvv` for debugging, default for production

## Resources

- **GitHub Repository**: https://github.com/dnvriend/elevenlabs-tts-tool
- **ElevenLabs Documentation**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference
- **Voice Library**: https://elevenlabs.io/voice-library
- **Python SDK**: https://github.com/elevenlabs/elevenlabs-python
- **Claude Code**: https://docs.anthropic.com/claude-code