Testing Guide

1. Scenario Testing (LLM-Based)

Create Test

elevenlabs tests add "Refund Request" --template basic-llm

Test Configuration

{
  "name": "Refund Request Test",
  "scenario": "Customer requests refund for defective product",
  "user_input": "I want a refund for order #12345. The product arrived broken.",
  "success_criteria": [
    "Agent acknowledges the issue empathetically",
    "Agent asks for order number or uses provided number",
    "Agent verifies order details",
    "Agent provides clear next steps or refund timeline"
  ],
  "evaluation_type": "llm"
}

Run Test

elevenlabs agents test "Support Agent"

2. Tool Call Testing

Test Configuration

{
  "name": "Order Lookup Test",
  "scenario": "Customer asks about order status",
  "user_input": "What's the status of order ORD-12345?",
  "expected_tool_call": {
    "tool_name": "lookup_order",
    "parameters": {
      "order_id": "ORD-12345"
    }
  }
}
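
The expected tool call can also be asserted programmatically via the Simulation API (section 4). A minimal sketch, assuming the simulation result exposes the agent's tool calls in a tool_calls array with tool_name and parameters fields (the exact response shape is an assumption; check your SDK's types):

// Sketch: run the scenario, then assert the expected tool call was made.
// Assumes `simulation.tool_calls` exists with `tool_name` / `parameters`.
const simulation = await client.agents.simulate({
  agent_id: 'agent_123',
  scenario: 'Customer asks about order status',
  user_messages: ["What's the status of order ORD-12345?"]
});

const call = (simulation.tool_calls ?? []).find(c => c.tool_name === 'lookup_order');
if (!call) throw new Error('Expected lookup_order to be called');
if (call.parameters.order_id !== 'ORD-12345') {
  throw new Error(`Wrong order_id: ${call.parameters.order_id}`);
}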

3. Load Testing

Basic Load Test

# 100 concurrent users, spawn 10/second, run for 5 minutes
elevenlabs test load \
  --users 100 \
  --spawn-rate 10 \
  --duration 300

With Burst Pricing

Burst pricing lets calls temporarily exceed your plan's concurrency limit during traffic spikes, billed at a higher rate:

{
  "call_limits": {
    "burst_pricing_enabled": true
  }
}

4. Simulation API

Programmatic Testing

const simulation = await client.agents.simulate({
  agent_id: 'agent_123',
  scenario: 'Customer requests refund',
  user_messages: [
    "I want a refund for order #12345",
    "It arrived broken",
    "Yes, process the refund"
  ],
  success_criteria: [
    "Agent shows empathy",
    "Agent verifies order",
    "Agent provides timeline"
  ]
});

console.log('Passed:', simulation.passed);
console.log('Criteria met:', simulation.evaluation.criteria_met, '/', simulation.evaluation.criteria_total);

5. Convert Real Conversations to Tests

From Dashboard

  1. Navigate to Conversations
  2. Select conversation
  3. Click "Convert to Test"
  4. Add success criteria
  5. Save

From API

const test = await client.tests.createFromConversation({
  conversation_id: 'conv_123',
  success_criteria: [
    "Issue was resolved",
    "Customer satisfaction >= 4/5"
  ]
});

6. CI/CD Integration

GitHub Actions

name: Test Agent
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install CLI
        run: npm install -g @elevenlabs/cli

      - name: Push Tests
        run: elevenlabs tests push
        env:
          ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}

      - name: Run Tests
        run: elevenlabs agents test "Support Agent"
        env:
          ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}

7. Test Organization

Directory Structure

test_configs/
├── refund-tests/
│   ├── basic-refund.json
│   ├── duplicate-refund.json
│   └── expired-refund.json
├── order-lookup-tests/
│   ├── valid-order.json
│   └── invalid-order.json
└── escalation-tests/
    ├── angry-customer.json
    └── complex-issue.json

8. Best Practices

Do's:

  • Test all conversation paths
  • Include edge cases
  • Test tool calls thoroughly
  • Run tests before deployment
  • Convert failed conversations to tests
  • Monitor test trends over time

Don'ts:

  • Only test happy paths
  • Ignore failing tests
  • Skip load testing
  • Test only in production
  • Write vague success criteria (see the example below)
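
Vague criteria like "Agent is helpful" give the LLM evaluator little to verify. Criteria phrased as observable behaviors, as in this illustrative config, are easier to judge consistently:

{
  "success_criteria": [
    "Agent confirms the order number back to the customer",
    "Agent states the refund timeline in business days",
    "Agent does not promise a refund before verifying the order"
  ]
}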

9. Metrics to Track

  • Pass Rate: % of tests passing
  • Tool Accuracy: % of correct tool calls
  • Response Time: Average time to resolution
  • Load Capacity: Max concurrent users before degradation
  • Error Rate: % of conversations with errors
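
A rough sketch of aggregating these metrics from individual test results; the results array shape here is hypothetical and would come from your own test runner or exported reports:

// Hypothetical per-run records; adapt the fields to your own test output.
const results = [
  { passed: true,  toolCallCorrect: true,  durationMs: 4200, hadError: false },
  { passed: false, toolCallCorrect: false, durationMs: 6100, hadError: true }
];

const pct = (n, d) => `${((100 * n) / d).toFixed(1)}%`;

console.log({
  passRate: pct(results.filter(r => r.passed).length, results.length),
  toolAccuracy: pct(results.filter(r => r.toolCallCorrect).length, results.length),
  errorRate: pct(results.filter(r => r.hadError).length, results.length),
  avgResponseMs: results.reduce((s, r) => s + r.durationMs, 0) / results.length
});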

10. Debugging Failed Tests

  1. Review the conversation transcript (see the sketch after this list)
  2. Check tool calls and their parameters
  3. Verify that the expected dynamic variables were provided
  4. Check the prompt for clarity and ambiguity
  5. Check knowledge base content
  6. Review guardrails and constraints
  7. Iterate and retest
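
For steps 1 and 2, the transcript and tool calls can be pulled programmatically. A minimal sketch, assuming the conversation detail endpoint at /v1/convai/conversations/{id} and a transcript array in the response (verify the path and field names against the current API reference):

// Fetch a conversation and print each turn plus any tool calls it made.
// Endpoint path and response fields are assumptions; check the API reference.
const conversationId = 'conv_123';
const res = await fetch(
  `https://api.elevenlabs.io/v1/convai/conversations/${conversationId}`,
  { headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY } }
);
const conversation = await res.json();

for (const turn of conversation.transcript ?? []) {
  console.log(`${turn.role}: ${turn.message}`);
  for (const call of turn.tool_calls ?? []) {
    console.log('  tool call:', call.tool_name, JSON.stringify(call.parameters ?? {}));
  }
}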