11 KiB
LLM Provider Configuration
Comprehensive guide for configuring different LLM providers with biomni.
Overview
Biomni supports multiple LLM providers for flexible deployment across different infrastructure and cost requirements. The framework abstracts provider differences through a unified interface.
Supported Providers
- Anthropic Claude (Recommended)
- OpenAI
- Azure OpenAI
- Google Gemini
- Groq
- AWS Bedrock
- Custom Endpoints
Anthropic Claude
Recommended for: Best balance of reasoning quality, speed, and biomedical knowledge.
Setup
# Set API key
export ANTHROPIC_API_KEY="sk-ant-..."
# Or in .env file
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
Available Models
from biomni.agent import A1
# Sonnet 4 - Balanced performance (recommended)
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Opus 4 - Maximum capability
agent = A1(path='./data', llm='claude-opus-4-20250514')
# Haiku 4 - Fast and economical
agent = A1(path='./data', llm='claude-haiku-4-20250514')
Configuration Options
from biomni.config import default_config
default_config.llm = "claude-sonnet-4-20250514"
default_config.llm_temperature = 0.7
default_config.max_tokens = 4096
default_config.anthropic_api_key = "sk-ant-..." # Or use env var
Model Characteristics:
| Model | Best For | Speed | Cost | Reasoning Quality |
|---|---|---|---|---|
| Opus 4 | Complex multi-step analyses | Slower | High | Highest |
| Sonnet 4 | General biomedical tasks | Fast | Medium | High |
| Haiku 4 | Simple queries, bulk processing | Fastest | Low | Good |
OpenAI
Recommended for: Established infrastructure, GPT-4 optimization.
Setup
export OPENAI_API_KEY="sk-..."
Available Models
# GPT-4 Turbo
agent = A1(path='./data', llm='gpt-4-turbo')
# GPT-4
agent = A1(path='./data', llm='gpt-4')
# GPT-4o
agent = A1(path='./data', llm='gpt-4o')
Configuration
from biomni.config import default_config
default_config.llm = "gpt-4-turbo"
default_config.openai_api_key = "sk-..."
default_config.openai_organization = "org-..." # Optional
default_config.llm_temperature = 0.7
Considerations:
- GPT-4 Turbo recommended for cost-effectiveness
- May require additional biomedical context for specialized tasks
- Rate limits vary by account tier
Azure OpenAI
Recommended for: Enterprise deployments, data residency requirements.
Setup
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"
export AZURE_OPENAI_API_VERSION="2024-02-01"
Configuration
from biomni.config import default_config
default_config.llm = "azure-gpt-4"
default_config.azure_openai_api_key = "..."
default_config.azure_openai_endpoint = "https://your-resource.openai.azure.com/"
default_config.azure_openai_deployment_name = "gpt-4"
default_config.azure_openai_api_version = "2024-02-01"
Usage
agent = A1(path='./data', llm='azure-gpt-4')
Deployment Notes:
- Requires Azure OpenAI Service provisioning
- Deployment names set during Azure resource creation
- API versions periodically updated by Microsoft
Google Gemini
Recommended for: Google Cloud integration, multimodal tasks.
Setup
export GOOGLE_API_KEY="..."
Available Models
# Gemini 2.0 Flash (recommended)
agent = A1(path='./data', llm='gemini-2.0-flash-exp')
# Gemini Pro
agent = A1(path='./data', llm='gemini-pro')
Configuration
from biomni.config import default_config
default_config.llm = "gemini-2.0-flash-exp"
default_config.google_api_key = "..."
default_config.llm_temperature = 0.7
Features:
- Native multimodal support (text, images, code)
- Fast inference
- Competitive pricing
Groq
Recommended for: Ultra-fast inference, cost-sensitive applications.
Setup
export GROQ_API_KEY="gsk_..."
Available Models
# Llama 3.3 70B
agent = A1(path='./data', llm='llama-3.3-70b-versatile')
# Mixtral 8x7B
agent = A1(path='./data', llm='mixtral-8x7b-32768')
Configuration
from biomni.config import default_config
default_config.llm = "llama-3.3-70b-versatile"
default_config.groq_api_key = "gsk_..."
Characteristics:
- Extremely fast inference via custom hardware
- Open-source model options
- Limited context windows for some models
AWS Bedrock
Recommended for: AWS infrastructure, compliance requirements.
Setup
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
Available Models
# Claude via Bedrock
agent = A1(path='./data', llm='bedrock-claude-sonnet-4')
# Llama via Bedrock
agent = A1(path='./data', llm='bedrock-llama-3-70b')
Configuration
from biomni.config import default_config
default_config.llm = "bedrock-claude-sonnet-4"
default_config.aws_access_key_id = "..."
default_config.aws_secret_access_key = "..."
default_config.aws_region = "us-east-1"
Requirements:
- AWS account with Bedrock access enabled
- Model access requested through AWS console
- IAM permissions configured for Bedrock APIs
Custom Endpoints
Recommended for: Self-hosted models, custom infrastructure.
Configuration
from biomni.config import default_config
default_config.llm = "custom"
default_config.custom_llm_endpoint = "http://localhost:8000/v1/chat/completions"
default_config.custom_llm_api_key = "..." # If required
default_config.custom_llm_model_name = "llama-3-70b"
Usage
agent = A1(path='./data', llm='custom')
Endpoint Requirements:
- Must implement OpenAI-compatible chat completions API
- Support for function/tool calling recommended
- JSON response format
Example with vLLM:
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3-70b-chat \
--port 8000
# Configure biomni
export CUSTOM_LLM_ENDPOINT="http://localhost:8000/v1/chat/completions"
Model Selection Guidelines
By Task Complexity
Simple queries (gene lookup, basic calculations):
- Claude Haiku 4
- Gemini 2.0 Flash
- Groq Llama 3.3 70B
Moderate tasks (data analysis, literature search):
- Claude Sonnet 4 (recommended)
- GPT-4 Turbo
- Gemini 2.0 Flash
Complex analyses (multi-step reasoning, novel insights):
- Claude Opus 4 (recommended)
- GPT-4
- Claude Sonnet 4
By Cost Sensitivity
Budget-conscious:
- Groq (fastest, cheapest)
- Claude Haiku 4
- Gemini 2.0 Flash
Balanced:
- Claude Sonnet 4 (recommended)
- GPT-4 Turbo
- Gemini Pro
Quality-first:
- Claude Opus 4
- GPT-4
- Claude Sonnet 4
By Infrastructure
Cloud-agnostic:
- Anthropic Claude (direct API)
- OpenAI (direct API)
AWS ecosystem:
- AWS Bedrock (Claude, Llama)
Azure ecosystem:
- Azure OpenAI Service
Google Cloud:
- Google Gemini
On-premises:
- Custom endpoints with self-hosted models
Performance Comparison
Based on Biomni-Eval1 benchmark:
| Provider | Model | Avg Score | Avg Time (s) | Cost/1K tasks |
|---|---|---|---|---|
| Anthropic | Opus 4 | 0.89 | 45 | $120 |
| Anthropic | Sonnet 4 | 0.85 | 28 | $45 |
| OpenAI | GPT-4 Turbo | 0.82 | 35 | $55 |
| Gemini 2.0 Flash | 0.78 | 22 | $25 | |
| Groq | Llama 3.3 70B | 0.73 | 12 | $8 |
| Anthropic | Haiku 4 | 0.75 | 15 | $15 |
Note: Costs are approximate and vary by usage patterns.
Troubleshooting
API Key Issues
# Verify key is set
import os
print(os.getenv('ANTHROPIC_API_KEY'))
# Or check in Python
from biomni.config import default_config
print(default_config.anthropic_api_key)
Rate Limiting
from biomni.config import default_config
# Add retry logic
default_config.max_retries = 5
default_config.retry_delay = 10 # seconds
# Reduce concurrency
default_config.max_concurrent_requests = 1
Timeout Errors
# Increase timeout for slow providers
default_config.llm_timeout = 120 # seconds
# Or switch to faster model
default_config.llm = "claude-sonnet-4-20250514" # Fast and capable
Model Not Available
# For Bedrock: Enable model access in AWS console
aws bedrock list-foundation-models --region us-east-1
# For Azure: Check deployment name
az cognitiveservices account deployment list \
--name your-resource-name \
--resource-group your-rg
Best Practices
Cost Optimization
- Use appropriate models - Don't use Opus 4 for simple queries
- Enable caching - Reuse data lake access across tasks
- Batch processing - Group similar tasks together
- Monitor usage - Track API costs per task type
from biomni.config import default_config
# Enable response caching
default_config.enable_caching = True
default_config.cache_ttl = 3600 # 1 hour
Multi-Provider Strategy
def get_agent_for_task(task_complexity):
"""Select provider based on task requirements"""
if task_complexity == 'simple':
return A1(path='./data', llm='claude-haiku-4-20250514')
elif task_complexity == 'moderate':
return A1(path='./data', llm='claude-sonnet-4-20250514')
else:
return A1(path='./data', llm='claude-opus-4-20250514')
# Use appropriate model
agent = get_agent_for_task('moderate')
result = agent.go(task_query)
Fallback Configuration
from biomni.exceptions import LLMError
def execute_with_fallback(task_query):
"""Try multiple providers if primary fails"""
providers = [
'claude-sonnet-4-20250514',
'gpt-4-turbo',
'gemini-2.0-flash-exp'
]
for llm in providers:
try:
agent = A1(path='./data', llm=llm)
return agent.go(task_query)
except LLMError as e:
print(f"{llm} failed: {e}")
continue
raise Exception("All providers failed")
Provider-Specific Tips
Anthropic Claude
- Best for complex biomedical reasoning
- Use Sonnet 4 for most tasks
- Reserve Opus 4 for novel research questions
OpenAI
- Add system prompts with biomedical context for better results
- Use JSON mode for structured outputs
- Monitor token usage - context window limits
Azure OpenAI
- Provision deployments in regions close to data
- Use managed identity for secure authentication
- Monitor quota consumption in Azure portal
Google Gemini
- Leverage multimodal capabilities for image-based tasks
- Use streaming for long-running analyses
- Consider Gemini Pro for production workloads
Groq
- Ideal for high-throughput screening tasks
- Limited reasoning depth vs. Claude/GPT-4
- Best for well-defined, structured problems
AWS Bedrock
- Use IAM roles instead of access keys when possible
- Enable CloudWatch logging for debugging
- Monitor cross-region latency