
LLM Provider Configuration

Comprehensive guide for configuring different LLM providers with biomni.

Overview

Biomni supports multiple LLM providers for flexible deployment across different infrastructure and cost requirements. The framework abstracts provider differences through a unified interface.
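
In practice, the same agent code runs against any provider by changing a single model string. A minimal sketch using model names documented in the sections below:

from biomni.agent import A1

# The same task runs unchanged against different providers;
# only the llm string selects the backend.
for llm in ['claude-sonnet-4-20250514', 'gpt-4-turbo', 'gemini-2.0-flash-exp']:
    agent = A1(path='./data', llm=llm)
    result = agent.go('Summarize the function of the TP53 gene')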

Supported Providers

  1. Anthropic Claude (Recommended)
  2. OpenAI
  3. Azure OpenAI
  4. Google Gemini
  5. Groq
  6. AWS Bedrock
  7. Custom Endpoints

Anthropic Claude

Recommended for: General use; the best balance of reasoning quality, speed, and biomedical knowledge.

Setup

# Set API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Or in .env file
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env

Available Models

from biomni.agent import A1

# Sonnet 4 - Balanced performance (recommended)
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Opus 4 - Maximum capability
agent = A1(path='./data', llm='claude-opus-4-20250514')

# Haiku 4 - Fast and economical
agent = A1(path='./data', llm='claude-haiku-4-20250514')

Configuration Options

from biomni.config import default_config

default_config.llm = "claude-sonnet-4-20250514"
default_config.llm_temperature = 0.7
default_config.max_tokens = 4096
default_config.anthropic_api_key = "sk-ant-..."  # Or use env var

Model Characteristics:

| Model    | Best For                        | Speed   | Cost   | Reasoning Quality |
|----------|---------------------------------|---------|--------|-------------------|
| Opus 4   | Complex multi-step analyses     | Slower  | High   | Highest           |
| Sonnet 4 | General biomedical tasks        | Fast    | Medium | High              |
| Haiku 4  | Simple queries, bulk processing | Fastest | Low    | Good              |

OpenAI

Recommended for: Established infrastructure, GPT-4 optimization.

Setup

export OPENAI_API_KEY="sk-..."

Available Models

# GPT-4 Turbo
agent = A1(path='./data', llm='gpt-4-turbo')

# GPT-4
agent = A1(path='./data', llm='gpt-4')

# GPT-4o
agent = A1(path='./data', llm='gpt-4o')

Configuration

from biomni.config import default_config

default_config.llm = "gpt-4-turbo"
default_config.openai_api_key = "sk-..."
default_config.openai_organization = "org-..."  # Optional
default_config.llm_temperature = 0.7

Considerations:

  • GPT-4 Turbo recommended for cost-effectiveness
  • May require additional biomedical context for specialized tasks (see the sketch after this list)
  • Rate limits vary by account tier
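
One lightweight way to supply that biomedical context, assuming agent.go accepts a free-text task string as shown elsewhere in this guide, is to prepend a short domain preamble to the query (a sketch, not a biomni-specific feature):

from biomni.agent import A1

agent = A1(path='./data', llm='gpt-4-turbo')

# Hypothetical preamble; adjust the wording to your domain.
preamble = (
    'You are assisting with a biomedical analysis. '
    'Use standard HGNC gene symbols and name assays precisely. '
)
result = agent.go(preamble + 'Identify known inhibitors of EGFR.')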

Azure OpenAI

Recommended for: Enterprise deployments, data residency requirements.

Setup

export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"
export AZURE_OPENAI_API_VERSION="2024-02-01"

Configuration

from biomni.config import default_config

default_config.llm = "azure-gpt-4"
default_config.azure_openai_api_key = "..."
default_config.azure_openai_endpoint = "https://your-resource.openai.azure.com/"
default_config.azure_openai_deployment_name = "gpt-4"
default_config.azure_openai_api_version = "2024-02-01"

Usage

agent = A1(path='./data', llm='azure-gpt-4')

Deployment Notes:

  • Requires Azure OpenAI Service provisioning
  • Deployment names set during Azure resource creation
  • API versions periodically updated by Microsoft
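
To keep secrets out of source files, the same settings can be read from the environment variables set above. A minimal sketch using only the config attributes already shown:

import os
from biomni.config import default_config

# Mirror the AZURE_OPENAI_* environment variables into biomni's config.
default_config.llm = 'azure-gpt-4'
default_config.azure_openai_api_key = os.environ['AZURE_OPENAI_API_KEY']
default_config.azure_openai_endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
default_config.azure_openai_deployment_name = os.environ['AZURE_OPENAI_DEPLOYMENT_NAME']
default_config.azure_openai_api_version = os.environ['AZURE_OPENAI_API_VERSION']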

Google Gemini

Recommended for: Google Cloud integration, multimodal tasks.

Setup

export GOOGLE_API_KEY="..."

Available Models

# Gemini 2.0 Flash (recommended)
agent = A1(path='./data', llm='gemini-2.0-flash-exp')

# Gemini Pro
agent = A1(path='./data', llm='gemini-pro')

Configuration

from biomni.config import default_config

default_config.llm = "gemini-2.0-flash-exp"
default_config.google_api_key = "..."
default_config.llm_temperature = 0.7

Features:

  • Native multimodal support (text, images, code)
  • Fast inference
  • Competitive pricing

Groq

Recommended for: Ultra-fast inference, cost-sensitive applications.

Setup

export GROQ_API_KEY="gsk_..."

Available Models

# Llama 3.3 70B
agent = A1(path='./data', llm='llama-3.3-70b-versatile')

# Mixtral 8x7B
agent = A1(path='./data', llm='mixtral-8x7b-32768')

Configuration

from biomni.config import default_config

default_config.llm = "llama-3.3-70b-versatile"
default_config.groq_api_key = "gsk_..."

Characteristics:

  • Extremely fast inference via custom hardware
  • Open-source model options
  • Limited context windows for some models (see the guard sketch below)
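
Because of those tighter context windows, a cheap pre-flight check on prompt size can avoid hard failures. A rough sketch using a characters-per-token heuristic (the 4-characters-per-token ratio is an approximation, not a Groq guarantee):

def fits_context(task_query: str, context_tokens: int = 32768) -> bool:
    """Rough guard: assume ~4 characters per token for English text."""
    estimated_tokens = len(task_query) / 4
    return estimated_tokens < context_tokens * 0.8  # leave headroom for the reply

task_query = 'Screen the following compound library for EGFR activity: ...'
if not fits_context(task_query):
    raise ValueError('Task exceeds the model context window; split it into chunks')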

AWS Bedrock

Recommended for: AWS infrastructure, compliance requirements.

Setup

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"

Available Models

# Claude via Bedrock
agent = A1(path='./data', llm='bedrock-claude-sonnet-4')

# Llama via Bedrock
agent = A1(path='./data', llm='bedrock-llama-3-70b')

Configuration

from biomni.config import default_config

default_config.llm = "bedrock-claude-sonnet-4"
default_config.aws_access_key_id = "..."
default_config.aws_secret_access_key = "..."
default_config.aws_region = "us-east-1"

Requirements:

  • AWS account with Bedrock access enabled
  • Model access requested through AWS console
  • IAM permissions configured for Bedrock APIs
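
As a quick check that these requirements are met, you can list the foundation models visible to your credentials with boto3 (assumes boto3 is installed and the AWS variables above are set):

import boto3

# Lists models visible to your account in the region; access to a specific
# model must still be granted in the Bedrock console.
bedrock = boto3.client('bedrock', region_name='us-east-1')
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(model['modelId'])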

Custom Endpoints

Recommended for: Self-hosted models, custom infrastructure.

Configuration

from biomni.config import default_config

default_config.llm = "custom"
default_config.custom_llm_endpoint = "http://localhost:8000/v1/chat/completions"
default_config.custom_llm_api_key = "..."  # If required
default_config.custom_llm_model_name = "llama-3-70b"

Usage

agent = A1(path='./data', llm='custom')

Endpoint Requirements:

  • Must implement OpenAI-compatible chat completions API
  • Support for function/tool calling recommended
  • Responses returned as JSON following the OpenAI schema
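
Before pointing biomni at a custom endpoint, a raw OpenAI-style request is a useful smoke test of the requirements above (assumes the requests package; the model name must match whatever your server hosts):

import requests

resp = requests.post(
    'http://localhost:8000/v1/chat/completions',
    json={
        'model': 'llama-3-70b',  # must match the model your server serves
        'messages': [{'role': 'user', 'content': 'Reply with OK'}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()['choices'][0]['message']['content'])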

Example with vLLM:

# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --port 8000

# Configure biomni
export CUSTOM_LLM_ENDPOINT="http://localhost:8000/v1/chat/completions"

Model Selection Guidelines

By Task Complexity

Simple queries (gene lookup, basic calculations):

  • Claude Haiku 4
  • Gemini 2.0 Flash
  • Groq Llama 3.3 70B

Moderate tasks (data analysis, literature search):

  • Claude Sonnet 4 (recommended)
  • GPT-4 Turbo
  • Gemini 2.0 Flash

Complex analyses (multi-step reasoning, novel insights):

  • Claude Opus 4 (recommended)
  • GPT-4
  • Claude Sonnet 4

By Cost Sensitivity

Budget-conscious:

  1. Groq (fastest, cheapest)
  2. Claude Haiku 4
  3. Gemini 2.0 Flash

Balanced:

  1. Claude Sonnet 4 (recommended)
  2. GPT-4 Turbo
  3. Gemini Pro

Quality-first:

  1. Claude Opus 4
  2. GPT-4
  3. Claude Sonnet 4

By Infrastructure

Cloud-agnostic:

  • Anthropic Claude (direct API)
  • OpenAI (direct API)

AWS ecosystem:

  • AWS Bedrock (Claude, Llama)

Azure ecosystem:

  • Azure OpenAI Service

Google Cloud:

  • Google Gemini

On-premises:

  • Custom endpoints with self-hosted models

Performance Comparison

Based on Biomni-Eval1 benchmark:

| Provider  | Model            | Avg Score | Avg Time (s) | Cost / 1K tasks |
|-----------|------------------|-----------|--------------|-----------------|
| Anthropic | Opus 4           | 0.89      | 45           | $120            |
| Anthropic | Sonnet 4         | 0.85      | 28           | $45             |
| OpenAI    | GPT-4 Turbo      | 0.82      | 35           | $55             |
| Google    | Gemini 2.0 Flash | 0.78      | 22           | $25             |
| Groq      | Llama 3.3 70B    | 0.73      | 12           | $8              |
| Anthropic | Haiku 4          | 0.75      | 15           | $15             |

Note: Costs are approximate and vary by usage patterns.

Troubleshooting

API Key Issues

# Verify key is set
import os
print(os.getenv('ANTHROPIC_API_KEY'))

# Or check in Python
from biomni.config import default_config
print(default_config.anthropic_api_key)

Rate Limiting

from biomni.config import default_config

# Add retry logic
default_config.max_retries = 5
default_config.retry_delay = 10  # seconds

# Reduce concurrency
default_config.max_concurrent_requests = 1
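
If rate-limit errors still surface, a small client-side backoff around agent.go works regardless of provider. A sketch assuming LLM failures raise the LLMError exception used in the fallback example later in this guide:

import time
from biomni.agent import A1
from biomni.exceptions import LLMError

def go_with_backoff(agent: A1, task_query: str, attempts: int = 5):
    """Retry agent.go with exponential backoff on LLM errors."""
    for attempt in range(attempts):
        try:
            return agent.go(task_query)
        except LLMError:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...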

Timeout Errors

# Increase timeout for slow providers
default_config.llm_timeout = 120  # seconds

# Or switch to faster model
default_config.llm = "claude-sonnet-4-20250514"  # Fast and capable

Model Not Available

# For Bedrock: Enable model access in AWS console
aws bedrock list-foundation-models --region us-east-1

# For Azure: Check deployment name
az cognitiveservices account deployment list \
    --name your-resource-name \
    --resource-group your-rg

Best Practices

Cost Optimization

  1. Use appropriate models - Don't use Opus 4 for simple queries
  2. Enable caching - Reuse data lake access across tasks
  3. Batch processing - Group similar tasks together
  4. Monitor usage - Track API costs per task type

For example, enable response caching:

from biomni.config import default_config

# Enable response caching
default_config.enable_caching = True
default_config.cache_ttl = 3600  # 1 hour
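
To confirm caching is taking effect, time a repeated call; the second run should return faster if the response was cached (timings are illustrative, not guaranteed):

import time
from biomni.agent import A1

agent = A1(path='./data', llm='claude-sonnet-4-20250514')
for run in range(2):
    start = time.perf_counter()
    agent.go('List the exons of BRCA1')
    print(f'Run {run + 1}: {time.perf_counter() - start:.1f}s')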

Multi-Provider Strategy

from biomni.agent import A1

def get_agent_for_task(task_complexity):
    """Select provider based on task requirements"""
    if task_complexity == 'simple':
        return A1(path='./data', llm='claude-haiku-4-20250514')
    elif task_complexity == 'moderate':
        return A1(path='./data', llm='claude-sonnet-4-20250514')
    else:
        return A1(path='./data', llm='claude-opus-4-20250514')

# Use appropriate model
agent = get_agent_for_task('moderate')
result = agent.go(task_query)

Fallback Configuration

from biomni.agent import A1
from biomni.exceptions import LLMError

def execute_with_fallback(task_query):
    """Try multiple providers if primary fails"""
    providers = [
        'claude-sonnet-4-20250514',
        'gpt-4-turbo',
        'gemini-2.0-flash-exp'
    ]

    for llm in providers:
        try:
            agent = A1(path='./data', llm=llm)
            return agent.go(task_query)
        except LLMError as e:
            print(f"{llm} failed: {e}")
            continue

    raise RuntimeError("All LLM providers failed")

Provider-Specific Tips

Anthropic Claude

  • Best for complex biomedical reasoning
  • Use Sonnet 4 for most tasks
  • Reserve Opus 4 for novel research questions

OpenAI

  • Add system prompts with biomedical context for better results
  • Use JSON mode for structured outputs
  • Monitor token usage to stay within context window limits

Azure OpenAI

  • Provision deployments in regions close to data
  • Use managed identity for secure authentication
  • Monitor quota consumption in Azure portal

Google Gemini

  • Leverage multimodal capabilities for image-based tasks
  • Use streaming for long-running analyses
  • Consider Gemini Pro for production workloads

Groq

  • Ideal for high-throughput screening tasks
  • Limited reasoning depth vs. Claude/GPT-4
  • Best for well-defined, structured problems

AWS Bedrock

  • Use IAM roles instead of access keys when possible
  • Enable CloudWatch logging for debugging
  • Monitor cross-region latency