# LLM Provider Configuration

Comprehensive guide for configuring different LLM providers with biomni.

## Overview

Biomni supports multiple LLM providers for flexible deployment across different infrastructure and cost requirements. The framework abstracts provider differences through a unified interface.

## Supported Providers

1. **Anthropic Claude** (Recommended)
2. **OpenAI**
3. **Azure OpenAI**
4. **Google Gemini**
5. **Groq**
6. **AWS Bedrock**
7. **Custom Endpoints**

## Anthropic Claude

**Recommended for:** Best balance of reasoning quality, speed, and biomedical knowledge.

### Setup

```bash
# Set API key
export ANTHROPIC_API_KEY="sk-ant-..."

# Or in .env file
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
```

### Available Models

```python
from biomni.agent import A1

# Sonnet 4 - Balanced performance (recommended)
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Opus 4 - Maximum capability
agent = A1(path='./data', llm='claude-opus-4-20250514')

# Haiku 4 - Fast and economical
agent = A1(path='./data', llm='claude-haiku-4-20250514')
```

### Configuration Options

```python
from biomni.config import default_config

default_config.llm = "claude-sonnet-4-20250514"
default_config.llm_temperature = 0.7
default_config.max_tokens = 4096
default_config.anthropic_api_key = "sk-ant-..."  # Or use env var
```

**Model Characteristics:**

| Model | Best For | Speed | Cost | Reasoning Quality |
|-------|----------|-------|------|-------------------|
| Opus 4 | Complex multi-step analyses | Slower | High | Highest |
| Sonnet 4 | General biomedical tasks | Fast | Medium | High |
| Haiku 4 | Simple queries, bulk processing | Fastest | Low | Good |

## OpenAI

**Recommended for:** Teams with established OpenAI infrastructure or prompts already tuned for GPT-4.

### Setup

```bash
export OPENAI_API_KEY="sk-..."
```

### Available Models

```python
# GPT-4 Turbo
agent = A1(path='./data', llm='gpt-4-turbo')

# GPT-4
agent = A1(path='./data', llm='gpt-4')

# GPT-4o
agent = A1(path='./data', llm='gpt-4o')
```

### Configuration

```python
from biomni.config import default_config

default_config.llm = "gpt-4-turbo"
default_config.openai_api_key = "sk-..."
default_config.openai_organization = "org-..."  # Optional
default_config.llm_temperature = 0.7
```

**Considerations:**

- GPT-4 Turbo recommended for cost-effectiveness
- May require additional biomedical context for specialized tasks
- Rate limits vary by account tier

## Azure OpenAI

**Recommended for:** Enterprise deployments, data residency requirements.

### Setup

```bash
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4"
export AZURE_OPENAI_API_VERSION="2024-02-01"
```

### Configuration

```python
from biomni.config import default_config

default_config.llm = "azure-gpt-4"
default_config.azure_openai_api_key = "..."
default_config.azure_openai_endpoint = "https://your-resource.openai.azure.com/"
default_config.azure_openai_deployment_name = "gpt-4"
default_config.azure_openai_api_version = "2024-02-01"
```

### Usage

```python
agent = A1(path='./data', llm='azure-gpt-4')
```

**Deployment Notes:**

- Requires Azure OpenAI Service provisioning
- Deployment names are set during Azure resource creation
- API versions are periodically updated by Microsoft
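Because Azure requires four separate settings, a missing environment variable is a common source of startup failures. The following is a minimal fail-fast sketch, assuming only the variable names from the setup above; `make_azure_agent` is a hypothetical helper for illustration, not a biomni API.

```python
import os

from biomni.agent import A1

# The four settings required by the Azure setup above.
REQUIRED_AZURE_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
]


def make_azure_agent(path: str = "./data") -> A1:
    """Fail fast with a clear message if any Azure setting is missing."""
    missing = [var for var in REQUIRED_AZURE_VARS if not os.getenv(var)]
    if missing:
        raise RuntimeError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    return A1(path=path, llm="azure-gpt-4")
```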
## Google Gemini

**Recommended for:** Google Cloud integration, multimodal tasks.

### Setup

```bash
export GOOGLE_API_KEY="..."
```

### Available Models

```python
# Gemini 2.0 Flash (recommended)
agent = A1(path='./data', llm='gemini-2.0-flash-exp')

# Gemini Pro
agent = A1(path='./data', llm='gemini-pro')
```

### Configuration

```python
from biomni.config import default_config

default_config.llm = "gemini-2.0-flash-exp"
default_config.google_api_key = "..."
default_config.llm_temperature = 0.7
```

**Features:**

- Native multimodal support (text, images, code)
- Fast inference
- Competitive pricing

## Groq

**Recommended for:** Ultra-fast inference, cost-sensitive applications.

### Setup

```bash
export GROQ_API_KEY="gsk_..."
```

### Available Models

```python
# Llama 3.3 70B
agent = A1(path='./data', llm='llama-3.3-70b-versatile')

# Mixtral 8x7B
agent = A1(path='./data', llm='mixtral-8x7b-32768')
```

### Configuration

```python
from biomni.config import default_config

default_config.llm = "llama-3.3-70b-versatile"
default_config.groq_api_key = "gsk_..."
```

**Characteristics:**

- Extremely fast inference via custom hardware
- Open-source model options
- Limited context windows for some models

## AWS Bedrock

**Recommended for:** AWS infrastructure, compliance requirements.

### Setup

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"
```

### Available Models

```python
# Claude via Bedrock
agent = A1(path='./data', llm='bedrock-claude-sonnet-4')

# Llama via Bedrock
agent = A1(path='./data', llm='bedrock-llama-3-70b')
```

### Configuration

```python
from biomni.config import default_config

default_config.llm = "bedrock-claude-sonnet-4"
default_config.aws_access_key_id = "..."
default_config.aws_secret_access_key = "..."
default_config.aws_region = "us-east-1"
```

**Requirements:**

- AWS account with Bedrock access enabled
- Model access requested through the AWS console
- IAM permissions configured for Bedrock APIs

## Custom Endpoints

**Recommended for:** Self-hosted models, custom infrastructure.

### Configuration

```python
from biomni.config import default_config

default_config.llm = "custom"
default_config.custom_llm_endpoint = "http://localhost:8000/v1/chat/completions"
default_config.custom_llm_api_key = "..."  # If required
default_config.custom_llm_model_name = "llama-3-70b"
```

### Usage

```python
agent = A1(path='./data', llm='custom')
```

**Endpoint Requirements:**

- Must implement an OpenAI-compatible chat completions API
- Support for function/tool calling recommended
- JSON response format

**Example with vLLM:**

```bash
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --port 8000

# Configure biomni
export CUSTOM_LLM_ENDPOINT="http://localhost:8000/v1/chat/completions"
```
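Before pointing biomni at a custom endpoint, it is worth confirming that the server actually speaks the OpenAI chat completions protocol. A minimal smoke test, assuming the vLLM server above is running locally and that the `requests` library is installed; the model name must match whatever the server was started with.

```python
import requests

# Send a trivial chat completion to verify OpenAI compatibility.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-70B-Instruct",
        "messages": [{"role": "user", "content": "Reply with 'ok'."}],
        "max_tokens": 5,
    },
    timeout=30,
)
resp.raise_for_status()

# An OpenAI-compatible server returns choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```

If this request fails or the response lacks the `choices` structure, biomni's custom provider will not work against the endpoint either.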
## Model Selection Guidelines

### By Task Complexity

**Simple queries** (gene lookup, basic calculations):

- Claude Haiku 4
- Gemini 2.0 Flash
- Groq Llama 3.3 70B

**Moderate tasks** (data analysis, literature search):

- Claude Sonnet 4 (recommended)
- GPT-4 Turbo
- Gemini 2.0 Flash

**Complex analyses** (multi-step reasoning, novel insights):

- Claude Opus 4 (recommended)
- GPT-4
- Claude Sonnet 4

### By Cost Sensitivity

**Budget-conscious:**

1. Groq (fastest, cheapest)
2. Claude Haiku 4
3. Gemini 2.0 Flash

**Balanced:**

1. Claude Sonnet 4 (recommended)
2. GPT-4 Turbo
3. Gemini Pro

**Quality-first:**

1. Claude Opus 4
2. GPT-4
3. Claude Sonnet 4

### By Infrastructure

**Cloud-agnostic:**

- Anthropic Claude (direct API)
- OpenAI (direct API)

**AWS ecosystem:**

- AWS Bedrock (Claude, Llama)

**Azure ecosystem:**

- Azure OpenAI Service

**Google Cloud:**

- Google Gemini

**On-premises:**

- Custom endpoints with self-hosted models

## Performance Comparison

Based on the Biomni-Eval1 benchmark:

| Provider | Model | Avg Score | Avg Time (s) | Cost/1K tasks |
|----------|-------|-----------|--------------|---------------|
| Anthropic | Opus 4 | 0.89 | 45 | $120 |
| Anthropic | Sonnet 4 | 0.85 | 28 | $45 |
| OpenAI | GPT-4 Turbo | 0.82 | 35 | $55 |
| Google | Gemini 2.0 Flash | 0.78 | 22 | $25 |
| Anthropic | Haiku 4 | 0.75 | 15 | $15 |
| Groq | Llama 3.3 70B | 0.73 | 12 | $8 |

*Note: Costs are approximate and vary by usage patterns.*

## Troubleshooting

### API Key Issues

```python
# Verify the key is set in the environment
import os
print(os.getenv('ANTHROPIC_API_KEY'))

# Or check the config object directly
from biomni.config import default_config
print(default_config.anthropic_api_key)
```

### Rate Limiting

```python
from biomni.config import default_config

# Add retry logic
default_config.max_retries = 5
default_config.retry_delay = 10  # seconds

# Reduce concurrency
default_config.max_concurrent_requests = 1
```

For application-level retries, see the backoff sketch after this section.

### Timeout Errors

```python
# Increase timeout for slow providers
default_config.llm_timeout = 120  # seconds

# Or switch to a faster model
default_config.llm = "claude-sonnet-4-20250514"  # Fast and capable
```

### Model Not Available

```bash
# For Bedrock: Enable model access in the AWS console
aws bedrock list-foundation-models --region us-east-1

# For Azure: Check the deployment name
az cognitiveservices account deployment list \
    --name your-resource-name \
    --resource-group your-rg
```
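When config-level retries are not enough (for example, when batching many tasks against a rate-limited account), an application-level backoff loop around `agent.go()` can smooth over transient failures. A sketch only: `go_with_backoff` is a hypothetical helper, and it assumes that `LLMError` (the exception used in the fallback example below) covers rate-limit and timeout failures, which may not match biomni's actual exception hierarchy.

```python
import time

from biomni.agent import A1
from biomni.exceptions import LLMError


def go_with_backoff(agent: A1, task_query: str, max_retries: int = 5):
    """Retry agent.go() with exponential backoff on transient LLM errors."""
    for attempt in range(max_retries):
        try:
            return agent.go(task_query)
        except LLMError:
            if attempt == max_retries - 1:
                raise  # Out of retries; surface the original error.
            delay = 2 ** attempt  # 1s, 2s, 4s, 8s, ...
            print(f"Retrying in {delay}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)
```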
## Best Practices

### Cost Optimization

1. **Use appropriate models** - Don't use Opus 4 for simple queries
2. **Enable caching** - Reuse data lake access across tasks
3. **Batch processing** - Group similar tasks together
4. **Monitor usage** - Track API costs per task type

```python
from biomni.config import default_config

# Enable response caching
default_config.enable_caching = True
default_config.cache_ttl = 3600  # 1 hour
```

### Multi-Provider Strategy

```python
from biomni.agent import A1


def get_agent_for_task(task_complexity):
    """Select a provider based on task requirements."""
    if task_complexity == 'simple':
        return A1(path='./data', llm='claude-haiku-4-20250514')
    elif task_complexity == 'moderate':
        return A1(path='./data', llm='claude-sonnet-4-20250514')
    else:
        return A1(path='./data', llm='claude-opus-4-20250514')


# Use the appropriate model for the task at hand
agent = get_agent_for_task('moderate')
result = agent.go(task_query)
```

### Fallback Configuration

```python
from biomni.agent import A1
from biomni.exceptions import LLMError


def execute_with_fallback(task_query):
    """Try multiple providers if the primary fails."""
    providers = [
        'claude-sonnet-4-20250514',
        'gpt-4-turbo',
        'gemini-2.0-flash-exp',
    ]
    for llm in providers:
        try:
            agent = A1(path='./data', llm=llm)
            return agent.go(task_query)
        except LLMError as e:
            print(f"{llm} failed: {e}")
            continue
    raise RuntimeError("All providers failed")
```

## Provider-Specific Tips

### Anthropic Claude

- Best for complex biomedical reasoning
- Use Sonnet 4 for most tasks
- Reserve Opus 4 for novel research questions

### OpenAI

- Add system prompts with biomedical context for better results
- Use JSON mode for structured outputs
- Monitor token usage against context window limits

### Azure OpenAI

- Provision deployments in regions close to your data
- Use managed identity for secure authentication
- Monitor quota consumption in the Azure portal

### Google Gemini

- Leverage multimodal capabilities for image-based tasks
- Use streaming for long-running analyses
- Consider Gemini Pro for production workloads

### Groq

- Ideal for high-throughput screening tasks
- Limited reasoning depth vs. Claude/GPT-4
- Best for well-defined, structured problems

### AWS Bedrock

- Use IAM roles instead of access keys when possible (see the sketch below)
- Enable CloudWatch logging for debugging
- Monitor cross-region latency
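To apply the IAM-role tip above, let boto3 resolve credentials through its default chain (instance profile, ECS task role, or SSO session) instead of hard-coding access keys. The sketch below uses the standard `boto3` Bedrock API to confirm which foundation models the current role can see; the region is an assumption and should match your deployment.

```python
import boto3

# No explicit keys: boto3's default credential chain picks up the
# IAM role attached to the instance, task, or SSO session.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List foundation models visible to this role. If a model biomni
# needs is missing, request access in the AWS console first.
for summary in bedrock.list_foundation_models()["modelSummaries"]:
    print(summary["modelId"])
```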