zhongwei/gh-jezweb-claude-skills-skills-elevenlabs-agents


Zhongwei Li 49178918d7 Initial commit

2025-11-30 08:24:46 +08:00


name: elevenlabs-agents

description: Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication

license: MIT

metadata:

version: 1.2.0
last_updated: 2025-11-25
production_tested: true
errors_prevented: 17+
token_savings: ~73%

packages (name	version):
@elevenlabs/elevenlabs-js	2.25.0
@elevenlabs/agents-cli	0.6.1
@elevenlabs/react	0.11.3
@elevenlabs/client	0.11.3
@elevenlabs/react-native	0.5.4

documentation:
https://elevenlabs.io/docs/agents-platform/overview
https://elevenlabs.io/docs/api-reference
https://github.com/elevenlabs/elevenlabs-examples

ElevenLabs Agents Platform

Overview

ElevenLabs Agents Platform is a comprehensive solution for building production-ready conversational AI voice agents. The platform coordinates four core components:

ASR (Automatic Speech Recognition) - Converts speech to text (32+ languages, sub-second latency)
LLM (Large Language Model) - Reasoning and response generation (GPT, Claude, Gemini, custom models)
TTS (Text-to-Speech) - Converts text to speech (5000+ voices, 31 languages, low latency)
Turn-Taking Model - Proprietary model that handles conversation timing and interruptions

🚨 Package Updates (November 2025)

ElevenLabs migrated to new scoped packages in August 2025. Current packages:

npm install @elevenlabs/react@0.11.3           # React SDK
npm install @elevenlabs/client@0.11.3          # JavaScript SDK
npm install @elevenlabs/react-native@0.5.4     # React Native SDK
npm install @elevenlabs/elevenlabs-js@2.25.0   # Base SDK (Python: elevenlabs@1.59.0)
npm install -g @elevenlabs/agents-cli@0.6.1    # CLI

DEPRECATED: @11labs/react, @11labs/client (uninstall if present)

⚠️ CRITICAL: v1 TTS models will be removed 2025-12-15. Migrate to Turbo v2/v2.5.

1. Quick Start

React SDK

npm install @elevenlabs/react zod

import { useConversation } from '@elevenlabs/react';

const { startConversation, stopConversation, status } = useConversation({
  agentId: 'your-agent-id',
  signedUrl: '/api/elevenlabs/auth', // Recommended (secure)
  // OR apiKey: process.env.NEXT_PUBLIC_ELEVENLABS_API_KEY (exposes the key to the browser; avoid in production)

  clientTools: { /* browser-side tools */ },
  onEvent: (event) => { /* transcript, agent_response, tool_call */ },
  serverLocation: 'us' // 'eu-residency' | 'in-residency' | 'global'
});
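The `signedUrl` option implies a server route that holds the API key and hands the browser a short-lived URL instead. A minimal sketch of what such a route might call is shown here. The endpoint path, the `xi-api-key` header, and the `signed_url` response field are assumptions to confirm against the API reference, and `getSignedUrl` and `FetchLike` are hypothetical names introduced for illustration.

```typescript
// Minimal shape of fetch that this helper needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed endpoint; verify the exact path in the API reference.
const SIGNED_URL_ENDPOINT =
  "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url";

// Hypothetical server-side helper: exchanges the secret API key for a
// signed URL so that the key itself never reaches the browser.
async function getSignedUrl(
  agentId: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const url = `${SIGNED_URL_ENDPOINT}?agent_id=${encodeURIComponent(agentId)}`;
  const res = await fetchImpl(url, { headers: { "xi-api-key": apiKey } });
  if (!res.ok) {
    throw new Error(`Signed URL request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { signed_url?: string };
  if (!body.signed_url) {
    throw new Error("Response did not contain signed_url");
  }
  return body.signed_url;
}
```

On a server, `fetchImpl` would typically be the global `fetch`, with the key read from `process.env.ELEVENLABS_API_KEY`. Injecting it keeps the helper testable without network access.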

CLI ("Agents as Code")

npm install -g @elevenlabs/agents-cli
elevenlabs auth login
elevenlabs agents init                              # Creates agents.json, tools.json, tests.json
elevenlabs agents add "Bot" --template customer-service
elevenlabs agents push --env dev                    # Deploy
elevenlabs agents test "Bot"                        # Test

API (Programmatic)

import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });

const agent = await client.agents.create({
  name: 'Support Bot',
  conversation_config: {
    agent: { prompt: { prompt: "...", llm: "gpt-4o" }, language: "en" },
    tts: { model_id: "eleven_turbo_v2_5", voice_id: "your-voice-id" }
  }
});

2. Agent Configuration

System Prompt Architecture (6 Components)

1. Personality - Identity, role, character traits
2. Environment - Communication context (phone, web, video)
3. Tone - Formality, speech patterns, verbosity
4. Goal - Objectives and success criteria
5. Guardrails - Boundaries, prohibited topics, ethical constraints
6. Tools - Available capabilities and when to use them

Template:

{
  "agent": {
    "prompt": {
      "prompt": "Personality:\n[Agent identity and role]\n\nEnvironment:\n[Communication context]\n\nTone:\n[Speech style]\n\nGoal:\n[Primary objectives]\n\nGuardrails:\n[Boundaries and constraints]\n\nTools:\n[Available tools and usage]",
      "llm": "gpt-4o",
      "temperature": 0.7
    }
  }
}

Other "llm" values: gpt-5.1, claude-sonnet-4-5, gemini-3-pro-preview. (JSON does not allow inline comments, so keep notes like this outside the config.)
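The six-section layout in the template can be assembled programmatically so that no section is forgotten. The following is a sketch only: `PromptSections` and `buildSystemPrompt` are hypothetical names, and the "Title:\nbody" layout simply mirrors the template string.

```typescript
// The six sections named in the prompt architecture, in template order.
interface PromptSections {
  personality: string;
  environment: string;
  tone: string;
  goal: string;
  guardrails: string;
  tools: string;
}

const SECTION_TITLES: Array<[keyof PromptSections, string]> = [
  ["personality", "Personality"],
  ["environment", "Environment"],
  ["tone", "Tone"],
  ["goal", "Goal"],
  ["guardrails", "Guardrails"],
  ["tools", "Tools"],
];

// Hypothetical helper: joins the sections into the "Title:\nbody" layout
// used by the template and rejects empty sections early.
function buildSystemPrompt(sections: PromptSections): string {
  return SECTION_TITLES.map(([key, title]) => {
    const body = sections[key].trim();
    if (body.length === 0) {
      throw new Error(`Prompt section "${title}" is empty`);
    }
    return `${title}:\n${body}`;
  }).join("\n\n");
}
```

The returned string can be placed in the "prompt" field of the agent config.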

2025 LLM Models:

gpt-5.1, gpt-5.1-2025-11-13 (Nov 2025)
claude-sonnet-4-5, claude-sonnet-4-5@20250929 (Oct 2025)
gemini-3-pro-preview (2025)
gemini-2.5-flash-preview-09-2025 (Oct 2025)

Turn-Taking Modes

Mode	Behavior	Best For
Eager	Responds quickly	Fast-paced support, quick orders
Normal	Balanced (default)	General customer service
Patient	Waits longer	Information collection, therapy

{ "conversation_config": { "turn": { "mode": "patient" } } }

Workflows & Agent Management (2025)

Workflow Features:

Subagent Nodes - Override prompt, voice, turn-taking per node
Tool Nodes - Guarantee tool execution
Edges - Conditional routing with edge_order (determinism, Oct 2025)

{
  "workflow": {
    "nodes": [
      { "id": "node_1", "type": "subagent", "config": { "system_prompt": "...", "turn_eagerness": "patient" } },
      { "id": "node_2", "type": "tool", "tool_name": "transfer_to_human" }
    ],
    "edges": [{ "from": "node_1", "to": "node_2", "condition": "escalation", "edge_order": 1 }]
  }
}
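A workflow like the one above can be sanity-checked before it is pushed. The sketch below assumes the node and edge shapes shown in the JSON, and it assumes that lower `edge_order` values are evaluated first, which the source does not state. `validateWorkflow` and `outgoingEdges` are hypothetical helpers.

```typescript
interface WorkflowNode { id: string; type: string }
interface WorkflowEdge { from: string; to: string; condition: string; edge_order: number }
interface Workflow { nodes: WorkflowNode[]; edges: WorkflowEdge[] }

// Reports duplicate node ids and edges that reference unknown nodes.
function validateWorkflow(wf: Workflow): string[] {
  const problems: string[] = [];
  const ids = new Set<string>();
  for (const node of wf.nodes) {
    if (ids.has(node.id)) problems.push(`Duplicate node id: ${node.id}`);
    ids.add(node.id);
  }
  for (const edge of wf.edges) {
    if (!ids.has(edge.from)) problems.push(`Edge starts at unknown node: ${edge.from}`);
    if (!ids.has(edge.to)) problems.push(`Edge points to unknown node: ${edge.to}`);
  }
  return problems;
}

// Returns a node's outgoing edges sorted by edge_order (ascending, assumed).
// filter() copies the array, so the workflow itself is not reordered.
function outgoingEdges(wf: Workflow, nodeId: string): WorkflowEdge[] {
  return wf.edges
    .filter((edge) => edge.from === nodeId)
    .sort((a, b) => a.edge_order - b.edge_order);
}
```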

Agent Management (2025):

Agent Archiving - archived: true field (Oct 2025)
Agent Duplication - Clone existing agents
Service Account API Keys - Management endpoints (Jul 2025)

Dynamic Variables

Use {{var_name}} syntax in prompts, messages, and tool parameters.

System Variables:

{{system__agent_id}}, {{system__conversation_id}}
{{system__caller_id}}, {{system__called_number}} (telephony)
{{system__call_duration_secs}}, {{system__time_utc}}
{{system__call_sid}} (Twilio only)

Custom Variables:

await client.conversations.create({
  agent_id: "agent_123",
  dynamic_variables: { user_name: "John", account_tier: "premium" }
});

Secret Variables: {{secret__api_key}} (headers only, never sent to LLM)

⚠️ Error: If a referenced variable is not supplied, the conversation fails with "Missing required dynamic variables". Always provide every variable that a prompt, message, or tool parameter references.
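One way to catch this error before starting a conversation is to scan the prompt for `{{var_name}}` placeholders and compare them with the variables about to be sent. This is a sketch under assumptions: `findMissingVariables` is a hypothetical helper, and it assumes that `system__` variables are filled in by the platform and `secret__` variables are configured separately, so neither needs to appear in `dynamic_variables`.

```typescript
// Returns the placeholder names that are referenced in the template but
// absent from the provided variables, sorted alphabetically.
function findMissingVariables(
  template: string,
  provided: Record<string, unknown>,
): string[] {
  const referenced = new Set<string>();
  const pattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    referenced.add(match[1]);
  }
  return Array.from(referenced)
    // Assumed: system and secret variables are not supplied by the caller.
    .filter((key) => !key.startsWith("system__") && !key.startsWith("secret__"))
    .filter((key) => !Object.prototype.hasOwnProperty.call(provided, key))
    .sort();
}
```

Running the check on every prompt, first message, and tool parameter string gives an early failure in your own code rather than at conversation start.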

3. Voice & Language Features

Multi-Voice, Pronunciation & Speed

Multi-Voice - Switch voices dynamically (adds ~200ms latency per switch):

{ "prompt": "When speaking as customer, use voice_id 'voice_abc'. As agent, use 'voice_def'." }

Pronunciation Dictionary - IPA, CMU, word substitutions (Turbo v2/v2.5 only):

{
  "pronunciation_dictionary": [
    { "word": "API", "pronunciation": "EY1 P IY1 AY1", "format": "cmu" },
    { "word": "AI", "substitution": "artificial intelligence" }
  ]
}

PATCH Support (Aug 2025) - Update dictionaries without replacement

Speed Control - 0.7x-1.2x (use 0.9x-1.1x for natural sound):

{ "voice_settings": { "speed": 1.0 } }

Voice Cloning Best Practices:

Clean audio (no noise, music, pops)
Consistent microphone distance
1-2 minutes of audio
Use language-matched voices (English voices fail on non-English)

Language Configuration

32+ Languages with automatic detection and in-conversation switching.

Multi-Language Presets:

{
  "language_presets": [
    { "language": "en", "voice_id": "en_voice", "first_message": "Hello!" },
    { "language": "es", "voice_id": "es_voice", "first_message": "¡Hola!" }
  ]
}

4. Knowledge Base & RAG

Enable agents to access large knowledge bases without loading entire documents into context.

Workflow:

Upload documents (PDF, TXT, DOCX)
Compute RAG index (vector embeddings)
Agent retrieves relevant chunks during conversation

Configuration:

{
  "agent": { "prompt": { "knowledge_base": ["doc_id_1", "doc_id_2"] } },
  "knowledge_base_config": {
    "max_chunks": 5,
    "vector_distance_threshold": 0.8
  }
}
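To build intuition for the two settings, the sketch below shows one plausible reading of them: chunks farther away than `vector_distance_threshold` are dropped, and at most `max_chunks` of the closest remaining chunks are kept. This interpretation is an assumption, not a description of the platform's actual retrieval code, and `selectChunks` is a hypothetical function.

```typescript
interface ScoredChunk { text: string; distance: number }
interface KnowledgeBaseConfig { max_chunks: number; vector_distance_threshold: number }

// Keeps the closest chunks within the distance threshold, capped at max_chunks.
// Assumed semantics: a smaller distance means a more relevant chunk.
function selectChunks(
  candidates: ScoredChunk[],
  config: KnowledgeBaseConfig,
): ScoredChunk[] {
  return candidates
    .filter((chunk) => chunk.distance <= config.vector_distance_threshold)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, config.max_chunks);
}
```

Under this reading, lowering the threshold trades recall for precision, while lowering `max_chunks` reduces the context sent to the LLM.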

API Upload:

import fs from 'node:fs';
const doc = await client.knowledgeBase.upload({ file: fs.createReadStream('docs.pdf'), name: 'Docs' });
await client.knowledgeBase.computeRagIndex({ document_id: doc.id, embedding_model: 'e5_mistral_7b' });

⚠️ Gotchas: RAG adds ~500ms latency. Check index status before use - indexing can take minutes.
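Because indexing can take minutes, a polling loop is a common way to wait for readiness. The sketch below takes the status lookup as a parameter rather than naming an SDK call, since the source does not show one. `waitForRagIndex` is hypothetical, and the "failed" status value is an assumption (the source only confirms "ready").

```typescript
// Polls getStatus until it reports "ready" and returns the number of checks made.
// Throws if the status becomes "failed" (assumed value) or attempts run out.
async function waitForRagIndex(
  getStatus: () => Promise<string>,
  maxAttempts = 60,
  intervalMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "ready") return attempt;
    if (status === "failed") throw new Error("RAG indexing failed");
    // Wait between checks, but not after the final one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`RAG index not ready after ${maxAttempts} checks`);
}
```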

5. Tools (4 Types)

A. Client Tools (Browser/Mobile)

Execute in the browser or mobile app. Tool names are case-sensitive.

clientTools: {
  updateCart: {
    description: "Update shopping cart",
    parameters: z.object({ item: z.string(), quantity: z.number() }),
    handler: async ({ item, quantity }) => {
      // Client-side logic
      return { success: true };
    }
  }
}

B. Server Tools (Webhooks)

HTTP requests to external APIs. PUT support added Apr 2025.

{
  "name": "get_weather",
  "url": "https://api.weather.com/{{user_id}}",
  "method": "GET",
  "headers": { "Authorization": "Bearer {{secret__api_key}}" },
  "parameters": { "type": "object", "properties": { "city": { "type": "string" } } }
}

⚠️ Secret variables only in headers (not URL/body)
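The headers-only rule can be checked mechanically before a tool config is pushed. This is a minimal sketch: `ServerToolConfig` mirrors the fields in the JSON example above, and `findMisplacedSecrets` is a hypothetical lint helper, not part of the CLI.

```typescript
interface ServerToolConfig {
  name: string;
  url: string;
  method: string;
  headers?: Record<string, string>;
  parameters?: unknown;
}

// Returns the config fields (other than headers) that contain a
// {{secret__...}} placeholder. An empty result means the rule is satisfied.
function findMisplacedSecrets(tool: ServerToolConfig): string[] {
  // No "g" flag, so test() keeps no state between calls.
  const secretPattern = /\{\{\s*secret__[A-Za-z0-9_]+\s*\}\}/;
  const problems: string[] = [];
  if (secretPattern.test(tool.url)) problems.push("url");
  if (
    tool.parameters !== undefined &&
    secretPattern.test(JSON.stringify(tool.parameters))
  ) {
    problems.push("parameters");
  }
  return problems;
}
```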

2025 Features:

transfer-to-human system tool (Apr 2025)
tool_latency_secs tracking (Apr 2025)

C. MCP Tools (Model Context Protocol)

Connect to MCP servers for databases, IDEs, data sources.

Configuration: Dashboard → Add Custom MCP Server → Configure SSE/HTTP endpoint

Approval Modes: Always Ask | Fine-Grained | No Approval

2025 Updates:

disable_interruptions flag (Oct 2025) - Prevents interruption during tool execution
Tools Management Interface (Jun 2025)

⚠️ Limitations: SSE/HTTP only. Not available for Zero Retention or HIPAA.

D. System Tools

Built-in conversation control (no external APIs):

end_call, detect_language, transfer_agent
transfer_to_number (telephony)
dtmf_playpad, voicemail_detection (telephony)

2025: use_out_of_band_dtmf flag for telephony integration

6. SDK Integration

useConversation Hook (React/React Native)

const { startConversation, stopConversation, status, isSpeaking } = useConversation({
  agentId: 'your-agent-id',
  signedUrl: '/api/auth', // OR apiKey: process.env.NEXT_PUBLIC_ELEVENLABS_API_KEY (exposes the key to the browser)
  clientTools: { /* ... */ },
  onEvent: (event) => { /* transcript, agent_response, tool_call, agent_tool_request (Oct 2025) */ },
  // Also accepts onConnect, onDisconnect, and onError callbacks
  serverLocation: 'us' // 'eu-residency' | 'in-residency' | 'global'
});

2025 Events:

agent_chat_response_part - Streaming responses (Oct 2025)
agent_tool_request - Tool interaction tracking (Oct 2025)

Connection Types: WebRTC vs WebSocket

Feature	WebSocket	WebRTC (Jul 2025 rollout)
Auth	`signedUrl`	`conversationToken`
Audio	Configurable (16k/24k/48k)	PCM_48000 (hardcoded)
Latency	Standard	Lower
Best For	Flexibility	Low-latency

⚠️ WebRTC: Hardcoded PCM_48000, limited device switching

Platforms

React: @elevenlabs/react@0.11.3
JavaScript: @elevenlabs/client@0.11.3 - new Conversation({...})
React Native: @elevenlabs/react-native@0.5.4 - Expo SDK 47+, iOS/macOS (custom build required, no Expo Go)
Swift: iOS 14.0+, macOS 11.0+, Swift 5.9+
Embeddable Widget: <script src="https://elevenlabs.io/convai-widget/index.js"></script>

Scribe (Real-Time Speech-to-Text - Beta 2025)

Real-time transcription with word-level timestamps. Single-use tokens, not API keys.

const { connect, startRecording, stopRecording, transcript, partialTranscript } = useScribe({
  token: async () => (await fetch('/api/scribe/token')).json().then(d => d.token),
  commitStrategy: 'vad', // 'vad' (auto on silence) | 'manual' (explicit .commit())
  sampleRate: 16000, // 16000 or 24000
  // Also accepts onPartialTranscript, onFinalTranscript, and onError callbacks
});

Events: PARTIAL_TRANSCRIPT, FINAL_TRANSCRIPT_WITH_TIMESTAMPS, SESSION_STARTED, ERROR

⚠️ Closed Beta - requires sales contact. For agents, use Agents Platform instead (LLM + TTS + two-way interaction).

7. Testing & Evaluation

🆕 Agent Testing Framework (Aug 2025)

Comprehensive automated testing with 9 new API endpoints for creating, managing, and executing tests.

Test Types:

Scenario Testing - LLM-based evaluation against success criteria
Tool Call Testing - Verify correct tool usage and parameters
Load Testing - High-concurrency capacity testing

CLI Workflow:

# Create test
elevenlabs tests add "Refund Test" --template basic-llm

# Configure in test_configs/refund-test.json
{
  "name": "Refund Test",
  "scenario": "Customer requests refund",
  "success_criteria": ["Agent acknowledges empathetically", "Verifies order details"],
  "expected_tool_call": { "tool_name": "lookup_order", "parameters": { "order_id": "..." } }
}

# Deploy and execute
elevenlabs tests push
elevenlabs agents test "Support Agent"

9 New API Endpoints (Aug 2025):

POST /v1/convai/tests - Create test
GET /v1/convai/tests/:id - Retrieve test
PATCH /v1/convai/tests/:id - Update test
DELETE /v1/convai/tests/:id - Delete test
POST /v1/convai/tests/:id/execute - Execute test
GET /v1/convai/test-invocations - List invocations (pagination, agent filtering)
POST /v1/convai/test-invocations/:id/resubmit - Resubmit failed test
GET /v1/convai/test-results/:id - Get results
GET /v1/convai/test-results/:id/debug - Detailed debugging info

Test Invocation Listing (Oct 2025):

const invocations = await client.convai.testInvocations.list({
  agent_id: 'agent_123',      // Filter by agent
  page_size: 30,              // Default 30, max 100
  cursor: 'next_page_cursor'  // Pagination
});
// Returns: test run counts, pass/fail stats, titles
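Cursor pagination like this is usually consumed in a loop until no further cursor is returned. The sketch below is generic and takes the page fetcher as a parameter. The `items` and `next_cursor` field names are assumptions, since the source does not show the response shape, and `collectAllPages` is a hypothetical helper.

```typescript
// Assumed response shape; adjust the field names to the real API response.
interface Page<T> { items: T[]; next_cursor?: string | null }

// Follows cursors until the API stops returning one, with a page cap
// as a guard against a cursor that never terminates.
async function collectAllPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 100,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let pageCount = 0; pageCount < maxPages; pageCount++) {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    if (!page.next_cursor) return all;
    cursor = page.next_cursor;
  }
  throw new Error(`Stopped after ${maxPages} pages without reaching the end`);
}
```

With the call shown above, `fetchPage` might wrap `client.convai.testInvocations.list` and map its result into the assumed `Page` shape.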

Programmatic Testing:

const simulation = await client.agents.simulate({
  agent_id: 'agent_123',
  scenario: 'Refund request',
  user_messages: ["I want a refund", "Order #12345"],
  success_criteria: ["Acknowledges request", "Verifies order"]
});
console.log('Passed:', simulation.passed);

Agent Tracking (Oct 2025): Tests now include agent_id association for better organization

8. Analytics & Monitoring

2025 Features:

Custom Dashboard Charts (Apr 2025) - Display evaluation criteria metrics over time
Call History Filtering (Apr 2025) - call_start_before_unix parameter
Multi-Voice History - Separate conversation history by voice
LLM Cost Tracking - Per agent/conversation costs with aggregation_interval (hour/day/week/month)
Tool Latency (Apr 2025) - tool_latency_secs tracking
Usage Metrics - minutes_used, request_count, ttfb_avg, ttfb_p95

Conversation Analysis: Success evaluation (LLM-based), data collection fields, post-call webhooks

Access: Dashboard → Analytics | Post-call Webhooks | API

9. Privacy & Compliance

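Data Retention: 2 years by default (GDPR). Retention is configured per data type, for example 730 days (2 years) for transcripts and 2190 days (6 years) for audio: { "transcripts": { "retention_days": 730 }, "audio": { "retention_days": 2190 } }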

Encryption: TLS 1.3 (transit), AES-256 (rest)

Regional: serverLocation: 'eu-residency' | 'us' | 'global' | 'in-residency'

Zero Retention Mode: Immediate deletion (no history, analytics, webhooks, or MCP)

Compliance: GDPR (1-2 years), HIPAA (6 years), SOC 2 (automatic encryption)

10. Cost Optimization

LLM Caching: Up to 90% savings on repeated inputs. { "caching": { "enabled": true, "ttl_seconds": 3600 } }

Model Swapping: GPT-5.1, GPT-4o/mini, Claude Sonnet 4.5, Gemini 3 Pro/2.5 Flash (2025 models)

Burst Pricing: Allows up to 3x the concurrency limit, with burst calls billed at 2x cost. { "call_limits": { "burst_pricing_enabled": true } }

11. Advanced Features

2025 Platform Updates:

Azure OpenAI (Jul 2025) - Custom LLM with Azure-hosted models (requires API version field)
Genesys Output Variables (Jul 2025) - Enhanced call analytics
LLMReasoningEffort "none" (Oct 2025) - Control model reasoning behavior
Streaming Voice Previews (Jul 2025) - Real-time voice generation
pcm_48000 audio format (Apr 2025) - New output format support

Events: audio, transcript, agent_response, tool_call, agent_chat_response_part (streaming, Oct 2025), agent_tool_request (Oct 2025), conversation_state

Custom Models: Bring your own LLM (OpenAI-compatible endpoints). { "llm_config": { "custom": { "endpoint": "...", "api_key": "{{secret__key}}" } } }

Post-Call Webhooks: HMAC verification required. Respond with HTTP 200; the webhook is auto-disabled after 10 failures. Payload includes conversation_id, transcript, analysis.

Chat Mode: Text-only (no ASR/TTS). { "chat_mode": true }. Saves ~200ms of latency and avoids ASR/TTS costs.

Telephony: SIP (sip-static.rtc.elevenlabs.io), Twilio native, Vonage, RingCentral. 2025: Twilio keypad fix (Jul), SIP TLS remote_domains validation (Oct)

12. CLI & DevOps ("Agents as Code")

Installation & Auth:

npm install -g @elevenlabs/agents-cli@0.6.1
elevenlabs auth login
elevenlabs auth residency eu-residency  # 'in-residency' | 'global'
export ELEVENLABS_API_KEY=your-api-key  # For CI/CD

Project Structure: agents.json, tools.json, tests.json + agent_configs/, tool_configs/, test_configs/

Key Commands:

elevenlabs agents init
elevenlabs agents add "Bot" --template customer-service
elevenlabs agents push --env prod --dry-run  # Preview
elevenlabs agents push --env prod            # Deploy
elevenlabs agents pull                       # Import existing
elevenlabs agents test "Bot"                 # 2025: Enhanced testing

elevenlabs tools add-webhook "Weather" --config-path tool_configs/weather.json
elevenlabs tools push

elevenlabs tests add "Test" --template basic-llm
elevenlabs tests push

Multi-Environment: Create agent.dev.json, agent.staging.json, agent.prod.json for overrides
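To reason about what an override file changes, it can help to model overrides as a deep merge over the base config. The sketch below is an illustration only: how the CLI actually combines these files is not stated in the source, and `applyOverrides` is a hypothetical function in which objects merge key by key while arrays and scalars are replaced wholesale.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Deep-merges override into base without mutating either input.
// Assumed semantics: nested objects merge; everything else is replaced.
function applyOverrides(base: Json, override: Json): Json {
  if (!isPlainObject(base) || !isPlainObject(override)) return override;
  const merged: { [key: string]: Json } = { ...base };
  for (const key of Object.keys(override)) {
    merged[key] = Object.prototype.hasOwnProperty.call(base, key)
      ? applyOverrides(base[key], override[key])
      : override[key];
  }
  return merged;
}
```

Under this model, an agent.prod.json that contains only a lower temperature would leave the model, voice, and every other base setting untouched.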

CI/CD: GitHub Actions with --dry-run validation before deploy

.gitignore: .env, .elevenlabs/, *.secret.json

13. Common Errors & Solutions (17 Documented)

Error 1: Missing Required Dynamic Variables

Cause: Variables referenced in prompts not provided at conversation start
Solution: Provide all variables in dynamic_variables: { user_name: "John", ... }

Error 2: Case-Sensitive Tool Names

Cause: Tool name mismatch (case-sensitive)
Solution: Ensure tool_ids: ["orderLookup"] matches name: "orderLookup" exactly
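A case-only mismatch is easy to miss by eye, so a small check can flag referenced names that differ from a defined tool only in letter case. This is a sketch with hypothetical names (`findCaseMismatches`, `NameMismatch`) rather than a feature of the SDK or CLI.

```typescript
interface NameMismatch { referenced: string; defined: string }

// Finds referenced tool names that have no exact match but do match a
// defined tool when letter case is ignored.
function findCaseMismatches(
  referencedNames: string[],
  definedNames: string[],
): NameMismatch[] {
  const exact = new Set(definedNames);
  const byLowerCase = new Map<string, string>();
  for (const defined of definedNames) {
    byLowerCase.set(defined.toLowerCase(), defined);
  }
  const mismatches: NameMismatch[] = [];
  for (const referenced of referencedNames) {
    if (exact.has(referenced)) continue;
    const candidate = byLowerCase.get(referenced.toLowerCase());
    if (candidate !== undefined) {
      mismatches.push({ referenced, defined: candidate });
    }
  }
  return mismatches;
}
```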

Error 3: Webhook Authentication Failures

Cause: Incorrect HMAC signature, not returning 200, or 10+ failures
Solution: Verify hmac = crypto.createHmac('sha256', SECRET).update(payload).digest('hex') and return 200
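The comparison itself is worth doing in constant time. The sketch below follows the scheme in the solution line (HMAC-SHA256 over the raw payload, hex encoded) and uses Node's `timingSafeEqual`. How the signature arrives, including the header name and any timestamp prefix, is not covered by the source and is left out here. `isValidSignature` is a hypothetical helper.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a hex-encoded HMAC-SHA256 signature over the raw request body.
// Use the raw body bytes as received, not a re-serialized JSON object.
function isValidSignature(
  rawBody: string,
  receivedHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A handler would typically respond with 401 when this returns false and with 200 once the payload has been accepted.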

Error 4: Voice Consistency Issues

Cause: Background noise, inconsistent mic distance, extreme volumes in training
Solution: Use clean audio, consistent distance, avoid extremes

Error 5: Wrong Language Voice

Cause: English-trained voice for non-English language
Solution: Use language-matched voices: { "language": "es", "voice_id": "spanish_voice" }

Error 6: Restricted API Keys Not Supported (CLI)

Cause: CLI doesn't support restricted API keys
Solution: Use unrestricted API key for CLI

Error 7: Agent Configuration Push Conflicts

Cause: Hash-based change detection missed modification
Solution: elevenlabs agents init --override + elevenlabs agents pull + push

Error 8: Tool Parameter Schema Mismatch

Cause: Schema doesn't match usage
Solution: Add clear descriptions: "description": "Order ID (format: ORD-12345)"

Error 9: RAG Index Not Ready

Cause: Index still computing (takes minutes)
Solution: Check index.status === 'ready' before using

Error 10: WebSocket Protocol Error (1002)

Cause: Network instability or incompatible browser
Solution: Use WebRTC instead, implement reconnection logic
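Reconnection logic is commonly built on exponential backoff so that a flaky network is not hammered with retries. The sketch below is SDK-agnostic: `connect` stands in for whatever call starts the session, and `backoffDelayMs` and `connectWithRetry` are hypothetical helpers with illustrative default timings.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retries connect() until it succeeds or maxAttempts is reached,
// then rethrows the last error.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Adding random jitter to the delay is a common refinement when many clients may reconnect at once.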

Error 11: 401 Unauthorized in Production

Cause: Agent visibility or API key config
Solution: Check visibility (public/private), verify API key in prod, check allowlist

Error 12: Allowlist Connection Errors

Cause: Allowlist enabled but using shared link
Solution: Configure allowlist domains or disable for testing

Error 13: Workflow Infinite Loops

Cause: Edge conditions creating loops
Solution: Add max iteration limits, test all paths, explicit exit conditions
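Testing all paths is easier when loops are located first. The sketch below runs a depth-first search over workflow edges and returns the first cycle it finds. It assumes edges have the `from`/`to` fields used in the workflow JSON earlier in this document, and `findCycle` is a hypothetical helper. A cycle is not necessarily a bug (a retry loop can be intentional), so a result here marks a path that needs an explicit exit condition.

```typescript
// Returns one cycle as a list of node ids (first id repeated at the end),
// or null if the graph has no cycle.
function findCycle(edges: Array<{ from: string; to: string }>): string[] | null {
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    const targets = adjacency.get(edge.from) ?? [];
    targets.push(edge.to);
    adjacency.set(edge.from, targets);
  }

  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    state.set(node, "visiting");
    stack.push(node);
    for (const next of adjacency.get(node) ?? []) {
      if (state.get(next) === "visiting") {
        // Back edge: the cycle is the stack from `next` to here.
        return stack.slice(stack.indexOf(next)).concat(next);
      }
      if (state.get(next) === undefined) {
        const found = visit(next);
        if (found) return found;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const start of Array.from(adjacency.keys())) {
    if (state.get(start) === undefined) {
      const found = visit(start);
      if (found) return found;
    }
  }
  return null;
}
```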

Error 14: Burst Pricing Not Enabled

Cause: Burst not enabled in settings
Solution: { "call_limits": { "burst_pricing_enabled": true } }

Error 15: MCP Server Timeout

Cause: MCP server slow/unreachable
Solution: Check URL accessible, verify transport (SSE/HTTP), check auth, monitor logs

Error 16: First Message Cutoff on Android

Cause: Android needs time to switch audio mode
Solution: connectionDelay: { android: 3_000, ios: 0 } (3s for audio routing)

Error 17: CSP (Content Security Policy) Violations

Cause: Strict CSP blocks blob: URLs. The SDK uses Audio Worklets loaded as blobs.
Solution: Self-host the worklets:

cp node_modules/@elevenlabs/client/dist/worklets/*.js public/elevenlabs/

Configure: workletPaths: { 'rawAudioProcessor': '/elevenlabs/rawAudioProcessor.worklet.js', 'audioConcatProcessor': '/elevenlabs/audioConcatProcessor.worklet.js' }
Update CSP: script-src 'self' https://elevenlabs.io; worker-src 'self';
Gotcha: Update the self-hosted worklets whenever you upgrade @elevenlabs/client.

Integration with Existing Skills

This skill composes well with:

cloudflare-worker-base → Deploy agents on Cloudflare Workers edge network
cloudflare-workers-ai → Use Cloudflare LLMs as custom models in agents
cloudflare-durable-objects → Persistent conversation state and session management
cloudflare-kv → Cache agent configurations and user preferences
nextjs → React SDK integration in Next.js applications
ai-sdk-core → Vercel AI SDK provider for unified AI interface
clerk-auth → Authenticated voice sessions with user identity
hono-routing → API routes for webhooks and server tools

Additional Resources

Official Documentation:

Platform Overview: https://elevenlabs.io/docs/agents-platform/overview
API Reference: https://elevenlabs.io/docs/api-reference
CLI GitHub: https://github.com/elevenlabs/cli

Examples:

Official Examples: https://github.com/elevenlabs/elevenlabs-examples
MCP Server: https://github.com/elevenlabs/elevenlabs-mcp

Community:

Discord: https://discord.com/invite/elevenlabs
Twitter: @elevenlabsio

Production Tested: WordPress Auditor, Customer Support Agents
Last Updated: 2025-11-25
Package Versions: elevenlabs@1.59.0, @elevenlabs/elevenlabs-js@2.25.0, @elevenlabs/agents-cli@0.6.1, @elevenlabs/react@0.11.3, @elevenlabs/client@0.11.3, @elevenlabs/react-native@0.5.4
