---
name: skill-elevenlabs-tts-tool
description: ElevenLabs text-to-speech CLI tool guide
---

# When to use
- Converting text to speech with ElevenLabs API
- Exploring available voices and models
- Managing TTS subscriptions and usage
- Integrating TTS into workflows and pipelines

# ElevenLabs TTS Tool Skill

## Purpose

Comprehensive guide for the `elevenlabs-tts-tool` CLI, a command-line interface for ElevenLabs text-to-speech synthesis. The tool supports both direct audio playback and file output, with 42+ premade voices and multiple models.

## When to Use This Skill

**Use this skill when:**
- Converting text to speech for notifications, audiobooks, or content creation
- Exploring and comparing different voice characteristics
- Managing ElevenLabs subscription quotas and usage
- Building voice-enabled workflows and automation
- Integrating TTS into Claude Code hooks or other tools

**Do NOT use this skill for:**
- Direct ElevenLabs API programming (use SDK docs instead)
- Custom voice cloning (requires ElevenLabs web interface)
- Real-time streaming TTS (tool focuses on file/playback generation)

## CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

### Installation

```bash
# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version
```

### Prerequisites

- **Python**: 3.13 or higher
- **API Key**: ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- **Environment Variable**: `export ELEVENLABS_API_KEY='your-api-key'`

### Quick Start

```bash
# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3
```

## Progressive Disclosure

<details>
<summary><strong>📖 Core Commands (Click to expand)</strong></summary>

### synthesize - Convert Text to Speech

Convert text to speech using ElevenLabs API. Supports direct playback or file output.

**Usage:**
```bash
elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]
```

**Arguments:**
- `TEXT`: Text to synthesize (optional if --stdin used)
- `--stdin, -s`: Read text from stdin instead of argument
- `--voice, -v NAME`: Voice name or ID (default: rachel)
- `--model, -m ID`: Model ID (default: eleven_turbo_v2_5)
- `--output, -o PATH`: Save to audio file instead of playing
- `--format, -f FORMAT`: Output format (default: mp3_44100_128)

**Examples:**
```bash
# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3
```

**Output:**
Plays audio through the default speakers, or saves it to the specified file in the selected format.

**Available Formats:**
- `mp3_44100_128` (default): MP3, 44.1kHz, 128kbps
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps
- `pcm_44100`: PCM WAV, 44.1kHz (requires Pro tier)
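
The bitrate in each MP3 format ID gives a rough handle on file size. The sketch below is not part of the tool; `estimate_mp3_kb` is a hypothetical helper that assumes constant-bitrate output and that the last field of the ID is the bitrate in kbps, as the list above suggests.

```bash
# Estimate MP3 size in kilobytes (1 kB = 1000 bytes) for a given duration.
# size_kB = bitrate_kbps * seconds / 8   (8 bits per byte)
estimate_mp3_kb() (
    format_id=$1
    seconds=$2
    case $format_id in
        mp3_*_*) bitrate_kbps=${format_id##*_} ;;   # last field, e.g. 128
        *) echo "not an MP3 format ID: $format_id" >&2; exit 1 ;;
    esac
    echo $(( bitrate_kbps * seconds / 8 ))
)
```

For one minute of audio, `estimate_mp3_kb mp3_44100_128 60` gives 960 and `estimate_mp3_kb mp3_22050_32 60` gives 240, which is the trade-off to weigh when choosing a format for long recordings.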

---

### list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

**Usage:**
```bash
elevenlabs-tts-tool list-voices
```

**Examples:**
```bash
# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender (-w matches whole words; plain "grep male" would also match "female")
elevenlabs-tts-tool list-voices | grep -w female
elevenlabs-tts-tool list-voices | grep -w male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"
```

**Output:**
```
Voice           Gender     Age          Accent          Description
====================================================================================================
rachel          female     young        American        Calm and friendly American voice...
adam            male       middle_aged  American        Deep, authoritative American male...
charlotte       female     middle_aged  British         Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available
```
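
Column-based filtering avoids the substring pitfalls of `grep`. The following is a minimal sketch that assumes the real output keeps the whitespace-separated column order of the sample above (Voice, Gender, Age, Accent, Description) and that accents are single words; `filter_voices` is a hypothetical helper, not a tool command.

```bash
# Print the names of voices whose Gender and Accent columns match exactly.
# Reads the list-voices table on stdin; header and separator rows never match.
filter_voices() {
    awk -v gender="$1" -v accent="$2" '$2 == gender && $4 == accent { print $1 }'
}
```

Usage would look like `elevenlabs-tts-tool list-voices | filter_voices male American`, which for the sample rows above prints `adam`.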

**Popular Voices:**
- **rachel**: Calm, friendly American female (default)
- **adam**: Deep, authoritative American male
- **charlotte**: Professional British female
- **josh**: Young, casual American male
- **bella**: Expressive Italian female

---

### list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

**Usage:**
```bash
elevenlabs-tts-tool list-models
```

**Examples:**
```bash
# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"
```

**Output:**
Comprehensive model information including:
- Model ID and version
- Quality and latency characteristics
- Language support (mono vs multilingual)
- Character limits
- Best use cases
- Special features (emotions, etc.)

**Key Models:**
- **eleven_turbo_v2_5**: Fast, high-quality (default, best value)
- **eleven_flash_v2_5**: Ultra-low latency (real-time applications)
- **eleven_multilingual_v2**: 29 languages, production quality
- **eleven_v3**: Most expressive with emotion tags (alpha, 2x cost)

**Cost Multipliers:**
- Turbo/Flash models: 1x cost
- Multilingual v2: 1x cost
- v3 models: 2x cost (half the minutes/tokens)
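
The multipliers above translate into a simple quota estimate. This sketch assumes quota is charged per character of input text times the model multiplier; how ElevenLabs counts tags or whitespace is not covered here, so treat the result as approximate. `billed_chars` is a hypothetical helper.

```bash
# Estimate the quota consumed by a text for a given model ID.
billed_chars() (
    text=$1
    model=$2
    case $model in
        eleven_v3*) multiplier=2 ;;   # v3 models: 2x cost
        *)          multiplier=1 ;;   # Turbo, Flash, Multilingual v2: 1x cost
    esac
    # ${#text} is the string length (characters or bytes, depending on shell and locale)
    echo $(( ${#text} * multiplier ))
)
```

For example, `billed_chars "Hello world" eleven_v3` yields 22, while the same text with `eleven_turbo_v2_5` yields 11.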

---

### info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.

**Usage:**
```bash
elevenlabs-tts-tool info [--days N]
```

**Arguments:**
- `--days, -d N`: Number of days of historical usage to display (default: 7)

**Examples:**
```bash
# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."
```

**Output Information:**
- Subscription tier and status
- Character usage (used/limit/remaining)
- Quota reset date
- Historical usage breakdown by day
- Average daily usage
- Projected monthly usage
- Warnings when approaching quota limits

**Use Cases:**
- Monitor character quota consumption
- Track usage patterns over time
- Plan when to upgrade subscription tier
- Avoid hitting quota limits unexpectedly
- Identify high-usage periods

---

### update-voices - Update Voice Table

Fetch latest voices from ElevenLabs API and update local lookup table.

**Usage:**
```bash
elevenlabs-tts-tool update-voices [--output PATH]
```

**Arguments:**
- `--output, -o PATH`: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

**Examples:**
```bash
# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices
```

**Behavior:**
- Fetches all premade voices from ElevenLabs API
- Saves to user config directory by default (`~/.config/elevenlabs-tts-tool/`)
- Creates config directory if it doesn't exist
- Updates take precedence over package default
- Persists across package reinstalls
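
Because the user-level table persists, a script can decide when a refresh is due. The sketch below uses the default path documented above; the 30-day threshold and the `voices_table_is_stale` name are assumptions made for illustration.

```bash
# Succeed (exit 0) when the voice table is missing or older than N days.
voices_table_is_stale() (
    table=${1:-"$HOME/.config/elevenlabs-tts-tool/voices_lookup.json"}
    max_age_days=${2:-30}
    [ -f "$table" ] || exit 0                            # missing counts as stale
    [ -n "$(find "$table" -mtime +"$max_age_days")" ]    # non-empty output means too old
)
```

A refresh-on-demand call is then `voices_table_is_stale && elevenlabs-tts-tool update-voices`.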

---

### pricing - Show Pricing Information

Display ElevenLabs pricing tiers and feature comparison.

**Usage:**
```bash
elevenlabs-tts-tool pricing
```

**Examples:**
```bash
# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"
```

**Output Information:**
- Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
- Minutes included per tier
- Additional minute costs
- Audio quality options
- Concurrency limits
- Priority levels
- API formats by tier
- Model cost multipliers

**Key Insights:**
- Free tier: 10,000-20,000 characters/month
- v3 models cost 2x (half the minutes/tokens)
- Use Flash v2.5 for high-volume integrations
- Reserve v3 for content requiring emotional expression
- PCM 44.1kHz requires Pro tier

---

### completion - Shell Completion

Generate shell completion scripts for bash, zsh, or fish.

**Usage:**
```bash
elevenlabs-tts-tool completion [bash|zsh|fish]
```

**Installation:**
```bash
# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish
```

**Features:**
- Tab-complete commands and subcommands
- Tab-complete options and flags
- Context-aware completion for file paths

</details>

<details>
<summary><strong>⚙️  Advanced Features (Click to expand)</strong></summary>

### Emotion Control (v3 Models)

The ElevenLabs v3 model (`eleven_v3`) supports **Audio Tags** for emotional expression.

**Available Emotion Tags:**
- Basic emotions: `[happy]`, `[excited]`, `[sad]`, `[angry]`, `[nervous]`, `[curious]`
- Delivery styles: `[cheerfully]`, `[playfully]`, `[mischievously]`, `[resigned tone]`, `[flatly]`, `[deadpan]`
- Speech characteristics: `[whispers]`, `[laughs]`, `[gasps]`, `[sighs]`, `[pauses]`, `[hesitates]`, `[stammers]`, `[gulps]`

**Usage Examples:**
```bash
# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3
```

**Best Practices:**
- Place tags at the beginning of phrases
- Align text content with emotional intent
- Test with different voices for best results
- Use sparingly - let AI infer emotion from context when possible
- Remember: v3 models cost 2x as much (half the minutes/tokens)
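
Since tags only work on v3 and v3 costs twice as much, a wrapper can select the model per message. This is a rough sketch: `pick_model` is a hypothetical helper, and it treats any bracketed text as a tag, so ordinary brackets such as `[1]` would also select v3.

```bash
# Print eleven_v3 when the text contains a [bracketed] tag, else the default model.
pick_model() {
    case $1 in
        *\[*\]*) echo "eleven_v3" ;;
        *)       echo "eleven_turbo_v2_5" ;;
    esac
}
```

A call such as `elevenlabs-tts-tool synthesize "$msg" --model "$(pick_model "$msg")"` then pays the v3 rate only for messages that carry tags.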

---

### Pause Control (SSML)

Add natural pauses using SSML `<break>` tags.

**Syntax:**
```xml
<break time="X.Xs" />
```

**Examples:**
```bash
# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."

# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3
```

**Limitations:**
- Maximum pause duration: 3 seconds
- Recommended: 2-4 breaks per generation
- Too many breaks can cause:
  - Unintended speed-up of the speech
  - Audio artifacts
  - Background noise
  - Generation instability
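
A script can check the break count before sending text. The sketch below only counts `<break ...>` tags; it does not validate the `time` values, and `count_breaks` is a hypothetical helper.

```bash
# Count SSML <break ...> tags in a text.
count_breaks() {
    printf '%s' "$1" | grep -o '<break[^>]*>' | wc -l | tr -d ' '
}
```

For instance, `[ "$(count_breaks "$text")" -le 4 ] || echo "too many breaks" >&2` flags text that exceeds the recommended range before any quota is spent.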

**Alternative Methods:**
- Dashes (`-` or `—`) for shorter pauses (less consistent)
- Ellipses (`...`) for hesitation (may add nervous tone)
- SSML `<break>` is most reliable

---

### Verbosity Control

Multi-level verbosity for progressive detail control.

**Verbosity Levels:**
- **No flag** (default): WARNING level - only critical issues
- **`-v`**: INFO level - high-level operations, important events
- **`-vv`**: DEBUG level - detailed operations, API calls, validation steps
- **`-vvv`**: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals

**Usage:**
```bash
# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"
```

**Dependent Library Logging:**
At trace level (`-vvv`), the following libraries enable DEBUG logging:
- `elevenlabs` - ElevenLabs SDK internals
- `httpx` / `httpcore` - HTTP request/response details
- `urllib3` - Low-level HTTP operations

---

### Pipeline Integration

The tool is designed for composition with other CLI tools.

**Design Principles:**
- Command output to stdout, logs/errors to stderr
- Stdin support for text input
- Exit codes for success/failure detection
- Shell completion for productivity

**Examples:**
```bash
# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

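# Conditional execution (emotion tags require eleven_v3; if/else keeps a
# failed success announcement from being reported as a build failure)
if make build; then
    elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi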

# Process multiple texts
for text in "First" "Second" "Third"; do
    elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done
```
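
When the text doubles as a file name, as in the loop above, spaces and punctuation end up in the path. One way to avoid that is a small slug function; `slugify` is a hypothetical helper, and non-ASCII characters are replaced by dashes in this sketch.

```bash
# Turn arbitrary text into a lowercase, dash-separated file-name stem.
slugify() {
    printf '%s' "$1" |
        tr '[:upper:]' '[:lower:]' |      # lowercase
        tr -cs 'a-z0-9' '-' |             # runs of other characters become one dash
        sed 's/^-*//; s/-*$//'            # trim leading and trailing dashes
}
```

In the loop, `--output "$(slugify "$text").mp3"` would write `build-successful.mp3` for the text `Build successful!`.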

---

### Claude Code Integration

Use `elevenlabs-tts-tool` as a notification system for Claude Code hooks.

**Use Cases:**

1. **Task Completion Alerts**
```bash
# After long-running task (emotion tags require eleven_v3)
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!" --model eleven_v3
```

2. **Error Notifications**
```bash
# On build failure (emotion tags require eleven_v3)
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output." --model eleven_v3
```

3. **Custom Workflows**
```bash
# Shell script integration (emotion tags require eleven_v3)
if make build; then
    elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" --model eleven_v3
else
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs." --model eleven_v3
fi
```

4. **Multi-Tool Integration**
```bash
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3
```

**Hook Configuration:**

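Configure hooks in the Claude Code settings file (`~/.claude/settings.json` for user-wide hooks). Claude Code has no dedicated error event, so this example uses `Stop` (Claude finished responding) and `Notification` (Claude is waiting for permission or input):

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel --model eleven_v3"
          }
        ]
      }
    ],
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "elevenlabs-tts-tool synthesize \"[curious] Claude needs your attention.\" --voice adam --model eleven_v3"
          }
        ]
      }
    ]
  }
}
```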

**Benefits:**
- Audio alerts for completed tasks without monitoring terminal
- Error notifications while away from screen
- Multi-step automation with voice feedback
- Voice-enabled AI agent pipelines

</details>

<details>
<summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>

### Common Issues

**Issue: "API key not found" error**
```bash
# Symptom
Error: ELEVENLABS_API_KEY environment variable not set
```

**Solution:**
1. Get API key from https://elevenlabs.io/app/settings/api-keys
2. Export as environment variable:
   ```bash
   export ELEVENLABS_API_KEY='your-api-key'
   ```
3. Add to shell profile for persistence:
   ```bash
   echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
   source ~/.bashrc
   ```
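
In scripts, checking for the key up front gives a clearer failure than a mid-pipeline error. A minimal sketch, with `require_api_key` as a hypothetical helper name:

```bash
# Return non-zero with a hint on stderr when the API key is not exported.
require_api_key() {
    if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
        echo "ELEVENLABS_API_KEY is not set; export it before calling elevenlabs-tts-tool" >&2
        return 1
    fi
}
```

A script can then start with `require_api_key || exit 1`.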

---

**Issue: "Voice not found" error**
```bash
# Symptom
ValueError: Voice 'unknown' not found in lookup table
```

**Solution:**
1. List available voices:
   ```bash
   elevenlabs-tts-tool list-voices
   ```
2. Update voice table if needed:
   ```bash
   elevenlabs-tts-tool update-voices
   ```
3. Use correct voice name (case-insensitive):
   ```bash
   elevenlabs-tts-tool synthesize "Hello" --voice rachel
   ```

---

**Issue: Character quota exceeded**
```bash
# Symptom
Error: Character quota exceeded for this month
```

**Solution:**
1. Check current usage:
   ```bash
   elevenlabs-tts-tool info
   ```
2. Wait until quota reset date
3. Consider upgrading subscription tier:
   ```bash
   elevenlabs-tts-tool pricing
   ```
4. Use more efficient models (Flash/Turbo vs v3)

---

**Issue: Audio quality issues**

**Symptom:** Poor audio quality or artifacts

**Solution:**
1. Use the highest-bitrate MP3 format (`mp3_44100_128`, the default) rather than a lower-bitrate one:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
   ```
2. Use higher-quality model:
   ```bash
   elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
   ```
3. For professional content, use PCM format (requires Pro tier):
   ```bash
   elevenlabs-tts-tool synthesize "Text" --format pcm_44100
   ```

---

**Issue: Emotional tags not working**

**Symptom:** Emotion tags like `[happy]` are spoken literally

**Solution:**
1. Make sure the v3 model is selected:
   ```bash
   elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
   ```
2. Place tags at beginning of phrases
3. Test with different voices (some work better than others)

---

**Issue: Too many SSML breaks causing issues**

**Symptom:** Audio artifacts, speedup, or noise with multiple `<break>` tags

**Solution:**
1. Limit to 2-4 breaks per generation
2. Use maximum 3 seconds per break
3. Consider splitting into multiple generations:
   ```bash
   elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
   elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
   ```

---

### Getting Help

```bash
# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version
```

**Additional Resources:**
- **GitHub Issues**: https://github.com/dnvriend/elevenlabs-tts-tool/issues
- **ElevenLabs Docs**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference

</details>

## Free Tier Limitations

**ElevenLabs Free Tier (2024-2025):**
- ✅ 10,000-20,000 characters per month
- ✅ All 42 premade voices
- ✅ Create up to 3 custom voices
- ✅ MP3 formats (all bitrates)
- ✅ Basic SSML support (`<break>`, phonemes)
- ✅ Emotional tags (v3 models)
- ✅ Full API access
- ❌ No commercial license (personal/experimentation only)
- ❌ PCM 44.1kHz format (requires Pro tier)
- ⚠️ Max 2,500 characters per single generation
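
The per-generation limit means longer documents have to be split before synthesis. The sketch below splits text at line boundaries into numbered files; `chunk_text` is a hypothetical helper, a single line longer than the limit is kept whole, and `length` may count bytes rather than characters in some awk implementations.

```bash
# Split stdin into <prefix>001.txt, <prefix>002.txt, ... of at most `limit`
# characters each (default 2500), breaking only between lines.
chunk_text() {
    awk -v prefix="$1" -v limit="${2:-2500}" '
        BEGIN { n = 1; used = 0 }
        {
            need = length($0) + 1                     # +1 for the newline
            if (used > 0 && used + need > limit) {    # current chunk is full
                close(out); n++; used = 0
            }
            out = sprintf("%s%03d.txt", prefix, n)
            print > (out)
            used += need
        }'
}
```

After `chunk_text part < document.txt`, each chunk can be fed to the tool in turn, for example with `elevenlabs-tts-tool synthesize --stdin --output part001.mp3 < part001.txt`.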

**Upgrade Tiers:**
- **Starter ($5/month)**: 30,000 characters, commercial license
- **Creator ($22/month)**: 100,000 characters, PCM formats
- **Pro ($99/month)**: 500,000 characters, PCM 44.1kHz, highest priority
- **Scale ($330/month)**: 2,000,000 characters
- **Business (custom)**: Custom limits and features

**Rate Limits:** Not publicly documented - expect reasonable use restrictions on free tier

## Exit Codes

- `0`: Success
- `1`: General error (validation, API error, etc.)
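
These exit codes make it straightforward to add a fallback when speech fails. In the sketch below, `notify` is a hypothetical wrapper and `TTS_CMD` is an override variable introduced only so the function can be exercised with a stub; neither is part of the tool.

```bash
# Speak a message; if the CLI is missing or exits non-zero, print it to stderr instead.
notify() (
    tts_cmd=${TTS_CMD:-elevenlabs-tts-tool}
    if command -v "$tts_cmd" >/dev/null 2>&1 && "$tts_cmd" synthesize "$1"; then
        exit 0
    fi
    printf 'notify (TTS failed or unavailable): %s\n' "$1" >&2
    exit 1
)
```

The wrapper's own exit status mirrors the outcome, so callers can still branch on it, as in `notify "Deploy finished" || true`.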

## Output Formats

**Audio Formats:**
- `mp3_44100_128`: MP3, 44.1kHz, 128kbps (default, best quality)
- `mp3_44100_64`: MP3, 44.1kHz, 64kbps (good quality, smaller)
- `mp3_22050_32`: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
- `pcm_44100`: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)

**Text Formats:**
- Human-readable tables for list commands
- Structured output with clear sections
- Errors to stderr, audio/data to stdout

## Best Practices

1. **Use Turbo v2.5 for High Volume**: Default model offers best value (1x cost, fast, high quality)
2. **Reserve v3 for Emotional Content**: Use v3 only when emotion tags needed (costs 2x)
3. **Monitor Quota Regularly**: Check `info` command before large generations
4. **Update Voices Periodically**: Run `update-voices` monthly to get latest voices
5. **Test Voices for Your Use Case**: Different voices work better for different content types
6. **Use SSML Breaks Sparingly**: Limit to 2-4 breaks per generation for stability
7. **Pipeline for Efficiency**: Combine with other tools for automated workflows
8. **Set Verbosity Appropriately**: Use `-vv` or `-vvv` for debugging, default for production

## Resources

- **GitHub Repository**: https://github.com/dnvriend/elevenlabs-tts-tool
- **ElevenLabs Documentation**: https://elevenlabs.io/docs
- **API Reference**: https://elevenlabs.io/docs/api-reference
- **Voice Library**: https://elevenlabs.io/voice-library
- **Python SDK**: https://github.com/elevenlabs/elevenlabs-python
- **Claude Code**: https://docs.anthropic.com/claude-code