Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions


@@ -0,0 +1,308 @@
# Adaptyv API Reference
## Base URL
```
https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws
```
## Authentication
All API requests require bearer token authentication via the `Authorization` header:
```
Authorization: Bearer YOUR_API_KEY
```
To obtain API access:
1. Contact support@adaptyvbio.com
2. Request API access during alpha/beta period
3. Receive your personal access token
Store your API key securely:
- Use environment variables: `ADAPTYV_API_KEY`
- Never commit API keys to version control
- Use `.env` files with `.gitignore` for local development
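A minimal authenticated request, using the credits endpoint described under Organization below, looks like this:
```python
import os
import requests

BASE_URL = "https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws"
API_KEY = os.environ["ADAPTYV_API_KEY"]

response = requests.get(
    f"{BASE_URL}/organization/credits",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json()["balance"])
```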
## Endpoints
### Experiments
#### Create Experiment
Submit protein sequences for experimental testing.
**Endpoint:** `POST /experiments`
**Request Body:**
```json
{
"sequences": ">protein1\nMKVLWALLGLLGAA...\n>protein2\nMATGVLWALLG...",
"experiment_type": "binding|expression|thermostability|enzyme_activity",
"target_id": "optional_target_identifier",
"webhook_url": "https://your-webhook.com/callback",
"metadata": {
"project": "optional_project_name",
"notes": "optional_notes"
}
}
```
**Sequence Format:**
- FASTA format with headers
- Multiple sequences supported
- Standard amino acid codes
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"status": "submitted",
"created_at": "2025-11-24T10:00:00Z",
"estimated_completion": "2025-12-15T10:00:00Z"
}
```
#### Get Experiment Status
Check the current status of an experiment.
**Endpoint:** `GET /experiments/{experiment_id}`
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"status": "submitted|processing|completed|failed",
"created_at": "2025-11-24T10:00:00Z",
"updated_at": "2025-11-25T14:30:00Z",
"progress": {
"stage": "sequencing|expression|assay|analysis",
"percentage": 45
}
}
```
**Status Values:**
- `submitted` - Experiment received and queued
- `processing` - Active testing in progress
- `completed` - Results available for download
- `failed` - Experiment encountered an error
#### List Experiments
Retrieve all experiments for your organization.
**Endpoint:** `GET /experiments`
**Query Parameters:**
- `status` - Filter by status (optional)
- `limit` - Number of results per page (default: 50)
- `offset` - Pagination offset (default: 0)
**Response:**
```json
{
"experiments": [
{
"experiment_id": "exp_abc123xyz",
"status": "completed",
"experiment_type": "binding",
"created_at": "2025-11-24T10:00:00Z"
}
],
"total": 150,
"limit": 50,
"offset": 0
}
```
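The response is paginated; to collect every experiment, walk `limit`/`offset` until `total` is exhausted. A minimal sketch:
```python
import requests

def list_all_experiments(base_url, headers, page_size=50):
    """Fetch every experiment by walking limit/offset pages."""
    experiments, offset = [], 0
    while True:
        resp = requests.get(
            f"{base_url}/experiments",
            headers=headers,
            params={"limit": page_size, "offset": offset},
        )
        resp.raise_for_status()
        page = resp.json()
        experiments.extend(page["experiments"])
        offset += page_size
        if offset >= page["total"]:
            return experiments
```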
### Results
#### Get Experiment Results
Download results from a completed experiment.
**Endpoint:** `GET /experiments/{experiment_id}/results`
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"results": [
{
"sequence_id": "protein1",
"measurements": {
"kd": 1.2e-9,
"kon": 1.5e5,
"koff": 1.8e-4
},
"quality_metrics": {
"confidence": "high",
"r_squared": 0.98
}
}
],
"download_urls": {
"raw_data": "https://...",
"analysis_package": "https://...",
"report": "https://..."
}
}
```
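The `download_urls` entries are plain HTTPS links. A small helper can stream each file to disk (this sketch assumes the URLs are pre-signed and need no extra auth header):
```python
import os
import requests

def fetch_result_files(results, out_dir="results_files"):
    """Stream each file in download_urls to a local file."""
    os.makedirs(out_dir, exist_ok=True)
    for data_type, url in results["download_urls"].items():
        r = requests.get(url, stream=True)
        r.raise_for_status()
        with open(os.path.join(out_dir, data_type), "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
```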
### Targets
#### Search Target Catalog
Search the ACROBiosystems antigen catalog.
**Endpoint:** `GET /targets`
**Query Parameters:**
- `search` - Search term (protein name, UniProt ID, etc.)
- `species` - Filter by species
- `category` - Filter by category
**Response:**
```json
{
"targets": [
{
"target_id": "tgt_12345",
"name": "Human PD-L1",
"species": "Homo sapiens",
"uniprot_id": "Q9NZQ7",
"availability": "in_stock|custom_order",
"price_usd": 450
}
]
}
```
#### Request Custom Target
Request an antigen not in the standard catalog.
**Endpoint:** `POST /targets/request`
**Request Body:**
```json
{
"target_name": "Custom target name",
"uniprot_id": "optional_uniprot_id",
"species": "species_name",
"notes": "Additional requirements"
}
```
### Organization
#### Get Credits Balance
Check your organization's credit balance and usage.
**Endpoint:** `GET /organization/credits`
**Response:**
```json
{
"balance": 10000,
"currency": "USD",
"usage_this_month": 2500,
"experiments_remaining": 22
}
```
## Webhooks
Configure webhook URLs to receive notifications when experiments complete.
**Webhook Payload:**
```json
{
"event": "experiment.completed",
"experiment_id": "exp_abc123xyz",
"status": "completed",
"timestamp": "2025-12-15T10:00:00Z",
"results_url": "/experiments/exp_abc123xyz/results"
}
```
**Webhook Events:**
- `experiment.submitted` - Experiment received
- `experiment.started` - Processing began
- `experiment.completed` - Results available
- `experiment.failed` - Error occurred
**Security:**
- Verify webhook signatures (details provided during onboarding)
- Use HTTPS endpoints only
- Respond with 200 OK to acknowledge receipt
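A minimal receiver sketch (Flask is assumed here; the signature-verification scheme is provided during onboarding, so it is only marked as a TODO):
```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/adaptyv-webhook", methods=["POST"])
def adaptyv_webhook():
    event = request.get_json(force=True)
    # TODO: verify the webhook signature (scheme provided during onboarding)
    if event.get("event") == "experiment.completed":
        print(f"Experiment {event['experiment_id']} completed")
        # Fetch results from event["results_url"] asynchronously
    return "", 200  # acknowledge receipt promptly

if __name__ == "__main__":
    app.run(port=8000)
```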
## Error Handling
**Error Response Format:**
```json
{
"error": {
"code": "invalid_sequence",
"message": "Sequence contains invalid amino acid codes",
"details": {
"sequence_id": "protein1",
"position": 45,
"character": "X"
}
}
}
```
**Common Error Codes:**
- `authentication_failed` - Invalid or missing API key
- `invalid_sequence` - Malformed FASTA or invalid amino acids
- `insufficient_credits` - Not enough credits for experiment
- `target_not_found` - Specified target ID doesn't exist
- `rate_limit_exceeded` - Too many requests
- `experiment_not_found` - Invalid experiment ID
- `internal_error` - Server-side error
## Rate Limits
- 100 requests per minute per API key
- 1000 experiments per day per organization
- Batch submissions are encouraged for large-scale testing
When rate limited, the response includes:
```
HTTP 429 Too Many Requests
Retry-After: 60
```
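Clients should honor `Retry-After` instead of retrying immediately, for example:
```python
import time
import requests

def get_with_rate_limit(url, headers):
    """GET that waits out HTTP 429 responses using Retry-After."""
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        wait = int(response.headers.get("Retry-After", "60"))
        time.sleep(wait)
```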
## Best Practices
1. **Use webhooks** for long-running experiments instead of polling
2. **Batch sequences** when submitting multiple variants
3. **Cache results** to avoid redundant API calls
4. **Implement retry logic** with exponential backoff
5. **Monitor credits** to avoid experiment failures
6. **Validate sequences** locally before submission
7. **Use descriptive metadata** for better experiment tracking
## API Versioning
The API is currently in alpha/beta. Breaking changes may occur but will be:
- Announced via email to registered users
- Documented in the changelog
- Supported with migration guides
Current version is reflected in response headers:
```
X-API-Version: alpha-2025-11
```
## Support
For API issues or questions:
- Email: support@adaptyvbio.com
- Documentation updates: https://docs.adaptyvbio.com
- Report bugs with experiment IDs and request details


@@ -0,0 +1,913 @@
# Code Examples
## Setup and Authentication
### Basic Setup
```python
import os
import requests
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Configuration
API_KEY = os.getenv("ADAPTYV_API_KEY")
BASE_URL = "https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws"
# Standard headers
HEADERS = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json"
}
def check_api_connection():
"""Verify API connection and credentials"""
try:
response = requests.get(f"{BASE_URL}/organization/credits", headers=HEADERS)
response.raise_for_status()
print("✓ API connection successful")
print(f" Credits remaining: {response.json()['balance']}")
return True
except requests.exceptions.HTTPError as e:
print(f"✗ API authentication failed: {e}")
return False
```
### Environment Setup
Create a `.env` file:
```bash
ADAPTYV_API_KEY=your_api_key_here
```
Install dependencies:
```bash
uv pip install requests python-dotenv
```
## Experiment Submission
### Submit Single Sequence
```python
def submit_single_experiment(sequence, experiment_type="binding", target_id=None):
"""
Submit a single protein sequence for testing
Args:
sequence: Amino acid sequence string
experiment_type: Type of experiment (binding, expression, thermostability, enzyme_activity)
target_id: Optional target identifier for binding assays
Returns:
Experiment ID and status
"""
# Format as FASTA
fasta_content = f">protein_sequence\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type
}
if target_id:
payload["target_id"] = target_id
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Experiment submitted")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Status: {result['status']}")
print(f" Estimated completion: {result['estimated_completion']}")
return result
# Example usage
sequence = "MKVLWAALLGLLGAAAAFPAVTSAVKPYKAAVSAAVSKPYKAAVSAAVSKPYK"
experiment = submit_single_experiment(sequence, experiment_type="expression")
```
### Submit Multiple Sequences (Batch)
```python
def submit_batch_experiment(sequences_dict, experiment_type="binding", metadata=None):
"""
Submit multiple protein sequences in a single batch
Args:
sequences_dict: Dictionary of {name: sequence}
experiment_type: Type of experiment
metadata: Optional dictionary of additional information
Returns:
Experiment details
"""
# Format all sequences as FASTA
fasta_content = ""
for name, sequence in sequences_dict.items():
fasta_content += f">{name}\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type
}
if metadata:
payload["metadata"] = metadata
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Batch experiment submitted")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Sequences: {len(sequences_dict)}")
print(f" Status: {result['status']}")
return result
# Example usage
sequences = {
"variant_1": "MKVLWAALLGLLGAAA...",
"variant_2": "MKVLSAALLGLLGAAA...",
"variant_3": "MKVLAAALLGLLGAAA...",
"wildtype": "MKVLWAALLGLLGAAA..."
}
metadata = {
"project": "antibody_optimization",
"round": 3,
"notes": "Testing solubility-optimized variants"
}
experiment = submit_batch_experiment(sequences, "expression", metadata)
```
### Submit with Webhook Notification
```python
def submit_with_webhook(sequences_dict, experiment_type, webhook_url):
"""
Submit experiment with webhook for completion notification
Args:
sequences_dict: Dictionary of {name: sequence}
experiment_type: Type of experiment
webhook_url: URL to receive notification when complete
"""
fasta_content = ""
for name, sequence in sequences_dict.items():
fasta_content += f">{name}\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type,
"webhook_url": webhook_url
}
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Experiment submitted with webhook")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Webhook: {webhook_url}")
return result
# Example
webhook_url = "https://your-server.com/adaptyv-webhook"
experiment = submit_with_webhook(sequences, "binding", webhook_url)
```
## Tracking Experiments
### Check Experiment Status
```python
def check_experiment_status(experiment_id):
"""
Get current status of an experiment
Args:
experiment_id: Experiment identifier
Returns:
Status information
"""
response = requests.get(
f"{BASE_URL}/experiments/{experiment_id}",
headers=HEADERS
)
response.raise_for_status()
status = response.json()
print(f"Experiment: {experiment_id}")
print(f" Status: {status['status']}")
print(f" Created: {status['created_at']}")
print(f" Updated: {status['updated_at']}")
if 'progress' in status:
print(f" Progress: {status['progress']['percentage']}%")
print(f" Current stage: {status['progress']['stage']}")
return status
# Example
status = check_experiment_status("exp_abc123xyz")
```
### List All Experiments
```python
def list_experiments(status_filter=None, limit=50):
"""
List experiments with optional status filtering
Args:
status_filter: Filter by status (submitted, processing, completed, failed)
limit: Maximum number of results
Returns:
List of experiments
"""
params = {"limit": limit}
if status_filter:
params["status"] = status_filter
response = requests.get(
f"{BASE_URL}/experiments",
headers=HEADERS,
params=params
)
response.raise_for_status()
result = response.json()
print(f"Found {result['total']} experiments")
for exp in result['experiments']:
print(f" {exp['experiment_id']}: {exp['status']} ({exp['experiment_type']})")
return result['experiments']
# Example - list all completed experiments
completed_experiments = list_experiments(status_filter="completed")
```
### Poll Until Complete
```python
import time
def wait_for_completion(experiment_id, check_interval=3600):
"""
Poll experiment status until completion
Args:
experiment_id: Experiment identifier
check_interval: Seconds between status checks (default: 1 hour)
Returns:
Final status
"""
print(f"Monitoring experiment {experiment_id}...")
while True:
status = check_experiment_status(experiment_id)
if status['status'] == 'completed':
print("✓ Experiment completed!")
return status
elif status['status'] == 'failed':
print("✗ Experiment failed")
return status
print(f" Status: {status['status']} - checking again in {check_interval}s")
time.sleep(check_interval)
# Example (not recommended - use webhooks instead!)
# status = wait_for_completion("exp_abc123xyz", check_interval=3600)
```
## Retrieving Results
### Download Experiment Results
```python
import json
def download_results(experiment_id, output_dir="results"):
"""
Download and parse experiment results
Args:
experiment_id: Experiment identifier
output_dir: Directory to save results
Returns:
Parsed results data
"""
# Get results
response = requests.get(
f"{BASE_URL}/experiments/{experiment_id}/results",
headers=HEADERS
)
response.raise_for_status()
results = response.json()
# Save results JSON
os.makedirs(output_dir, exist_ok=True)
output_file = f"{output_dir}/{experiment_id}_results.json"
with open(output_file, 'w') as f:
json.dump(results, f, indent=2)
print(f"✓ Results downloaded: {output_file}")
print(f" Sequences tested: {len(results['results'])}")
# Download raw data if available
if 'download_urls' in results:
for data_type, url in results['download_urls'].items():
print(f" {data_type} available at: {url}")
return results
# Example
results = download_results("exp_abc123xyz")
```
### Parse Binding Results
```python
import pandas as pd
def parse_binding_results(results):
"""
Parse binding assay results into DataFrame
Args:
results: Results dictionary from API
Returns:
pandas DataFrame with organized results
"""
data = []
for result in results['results']:
row = {
'sequence_id': result['sequence_id'],
'kd': result['measurements']['kd'],
'kd_error': result['measurements']['kd_error'],
'kon': result['measurements']['kon'],
'koff': result['measurements']['koff'],
'confidence': result['quality_metrics']['confidence'],
'r_squared': result['quality_metrics']['r_squared']
}
data.append(row)
df = pd.DataFrame(data)
# Sort by affinity (lower KD = stronger binding)
df = df.sort_values('kd')
print("Top 5 binders:")
print(df.head())
return df
# Example
experiment_id = "exp_abc123xyz"
results = download_results(experiment_id)
binding_df = parse_binding_results(results)
# Export to CSV
binding_df.to_csv(f"{experiment_id}_binding_results.csv", index=False)
```
### Parse Expression Results
```python
def parse_expression_results(results):
"""
Parse expression testing results into DataFrame
Args:
results: Results dictionary from API
Returns:
pandas DataFrame with organized results
"""
data = []
for result in results['results']:
row = {
'sequence_id': result['sequence_id'],
'yield_mg_per_l': result['measurements']['total_yield_mg_per_l'],
'soluble_fraction': result['measurements']['soluble_fraction_percent'],
'purity': result['measurements']['purity_percent'],
'percentile': result['ranking']['percentile']
}
data.append(row)
df = pd.DataFrame(data)
# Sort by yield
df = df.sort_values('yield_mg_per_l', ascending=False)
print(f"Mean yield: {df['yield_mg_per_l'].mean():.2f} mg/L")
print(f"Top performer: {df.iloc[0]['sequence_id']} ({df.iloc[0]['yield_mg_per_l']:.2f} mg/L)")
return df
# Example
results = download_results("exp_expression123")
expression_df = parse_expression_results(results)
```
## Target Catalog
### Search for Targets
```python
def search_targets(query, species=None, category=None):
"""
Search the antigen catalog
Args:
query: Search term (protein name, UniProt ID, etc.)
species: Optional species filter
category: Optional category filter
Returns:
List of matching targets
"""
params = {"search": query}
if species:
params["species"] = species
if category:
params["category"] = category
response = requests.get(
f"{BASE_URL}/targets",
headers=HEADERS,
params=params
)
response.raise_for_status()
targets = response.json()['targets']
print(f"Found {len(targets)} targets matching '{query}':")
for target in targets:
print(f" {target['target_id']}: {target['name']}")
print(f" Species: {target['species']}")
print(f" Availability: {target['availability']}")
print(f" Price: ${target['price_usd']}")
return targets
# Example
targets = search_targets("PD-L1", species="Homo sapiens")
```
### Request Custom Target
```python
def request_custom_target(target_name, uniprot_id=None, species=None, notes=None):
"""
Request a custom antigen not in the standard catalog
Args:
target_name: Name of the target protein
uniprot_id: Optional UniProt identifier
species: Species name
notes: Additional requirements or notes
Returns:
Request confirmation
"""
payload = {
"target_name": target_name,
"species": species
}
if uniprot_id:
payload["uniprot_id"] = uniprot_id
if notes:
payload["notes"] = notes
response = requests.post(
f"{BASE_URL}/targets/request",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Custom target request submitted")
print(f" Request ID: {result['request_id']}")
print(f" Status: {result['status']}")
return result
# Example
request = request_custom_target(
target_name="Novel receptor XYZ",
uniprot_id="P12345",
species="Mus musculus",
notes="Need high purity for structural studies"
)
```
## Complete Workflows
### End-to-End Binding Assay
```python
def complete_binding_workflow(sequences_dict, target_id, project_name):
"""
Complete workflow: submit sequences, track, and retrieve binding results
Args:
sequences_dict: Dictionary of {name: sequence}
target_id: Target identifier from catalog
project_name: Project name for metadata
Returns:
DataFrame with binding results
"""
print("=== Starting Binding Assay Workflow ===")
# Step 1: Submit experiment
print("\n1. Submitting experiment...")
metadata = {
"project": project_name,
"target": target_id
}
experiment = submit_batch_experiment(
sequences_dict,
experiment_type="binding",
metadata=metadata
)
experiment_id = experiment['experiment_id']
# Step 2: Save experiment info
print("\n2. Saving experiment details...")
with open(f"{experiment_id}_info.json", 'w') as f:
json.dump(experiment, f, indent=2)
print(f"✓ Experiment {experiment_id} submitted")
print(" Results will be available in ~21 days")
print(" Use webhook or poll status for updates")
# Note: In practice, wait for completion before this step
# print("\n3. Waiting for completion...")
# status = wait_for_completion(experiment_id)
# print("\n4. Downloading results...")
# results = download_results(experiment_id)
# print("\n5. Parsing results...")
# df = parse_binding_results(results)
# return df
return experiment_id
# Example
antibody_variants = {
"variant_1": "EVQLVESGGGLVQPGG...",
"variant_2": "EVQLVESGGGLVQPGS...",
"variant_3": "EVQLVESGGGLVQPGA...",
"wildtype": "EVQLVESGGGLVQPGG..."
}
experiment_id = complete_binding_workflow(
antibody_variants,
target_id="tgt_pdl1_human",
project_name="antibody_affinity_maturation"
)
```
### Optimization + Testing Pipeline
```python
# Combine computational optimization with experimental testing
def optimization_and_testing_pipeline(initial_sequences, experiment_type="expression"):
"""
Complete pipeline: optimize sequences computationally, then submit for testing
Args:
initial_sequences: Dictionary of {name: sequence}
experiment_type: Type of experiment
Returns:
Experiment ID for tracking
"""
print("=== Optimization and Testing Pipeline ===")
# Step 1: Computational optimization
print("\n1. Computational optimization...")
from protein_optimization import complete_optimization_pipeline
optimized = complete_optimization_pipeline(initial_sequences)
print(f"✓ Optimization complete")
print(f" Started with: {len(initial_sequences)} sequences")
print(f" Optimized to: {len(optimized)} sequences")
# Step 2: Select top candidates
print("\n2. Selecting top candidates for testing...")
top_candidates = optimized[:50] # Top 50
sequences_to_test = {
seq_data['name']: seq_data['sequence']
for seq_data in top_candidates
}
# Step 3: Submit for experimental validation
print("\n3. Submitting to Adaptyv...")
metadata = {
"optimization_method": "computational_pipeline",
"initial_library_size": len(initial_sequences),
"computational_scores": [s['combined'] for s in top_candidates]
}
experiment = submit_batch_experiment(
sequences_to_test,
experiment_type=experiment_type,
metadata=metadata
)
print(f"✓ Pipeline complete")
print(f" Experiment ID: {experiment['experiment_id']}")
return experiment['experiment_id']
# Example (generate_random_sequence is a user-supplied placeholder)
initial_library = {
f"variant_{i}": generate_random_sequence()
for i in range(1000)
}
experiment_id = optimization_and_testing_pipeline(
initial_library,
experiment_type="expression"
)
```
### Batch Result Analysis
```python
def analyze_multiple_experiments(experiment_ids):
"""
Download and analyze results from multiple experiments
Args:
experiment_ids: List of experiment identifiers
Returns:
Combined DataFrame with all results
"""
all_results = []
for exp_id in experiment_ids:
print(f"Processing {exp_id}...")
# Download results
results = download_results(exp_id, output_dir=f"results/{exp_id}")
# Parse based on experiment type
exp_type = results.get('experiment_type', 'unknown')
if exp_type == 'binding':
df = parse_binding_results(results)
df['experiment_id'] = exp_id
all_results.append(df)
elif exp_type == 'expression':
df = parse_expression_results(results)
df['experiment_id'] = exp_id
all_results.append(df)
# Combine all results
combined_df = pd.concat(all_results, ignore_index=True)
print(f"\n✓ Analysis complete")
print(f" Total experiments: {len(experiment_ids)}")
print(f" Total sequences: {len(combined_df)}")
return combined_df
# Example
experiment_ids = [
"exp_round1_abc",
"exp_round2_def",
"exp_round3_ghi"
]
all_data = analyze_multiple_experiments(experiment_ids)
all_data.to_csv("combined_results.csv", index=False)
```
## Error Handling
### Robust API Wrapper
```python
import time
from requests.exceptions import RequestException, HTTPError
def api_request_with_retry(method, url, max_retries=3, backoff_factor=2, **kwargs):
"""
Make API request with retry logic and error handling
Args:
method: HTTP method (GET, POST, etc.)
url: Request URL
max_retries: Maximum number of retry attempts
backoff_factor: Exponential backoff multiplier
**kwargs: Additional arguments for requests
Returns:
Response object
Raises:
RequestException: If all retries fail
"""
for attempt in range(max_retries):
try:
response = requests.request(method, url, **kwargs)
response.raise_for_status()
return response
except HTTPError as e:
            if e.response.status_code == 429:  # Rate limit
                # Honor Retry-After when present, else fall back to backoff
                wait_time = int(e.response.headers.get("Retry-After", backoff_factor ** attempt))
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
continue
elif e.response.status_code >= 500: # Server error
if attempt < max_retries - 1:
wait_time = backoff_factor ** attempt
print(f"Server error. Retrying in {wait_time}s...")
time.sleep(wait_time)
continue
else:
raise
else: # Client error (4xx) - don't retry
error_data = e.response.json() if e.response.content else {}
print(f"API Error: {error_data.get('error', {}).get('message', str(e))}")
raise
except RequestException as e:
if attempt < max_retries - 1:
wait_time = backoff_factor ** attempt
print(f"Request failed. Retrying in {wait_time}s...")
time.sleep(wait_time)
continue
else:
raise
raise RequestException(f"Failed after {max_retries} attempts")
# Example usage
response = api_request_with_retry(
"POST",
f"{BASE_URL}/experiments",
headers=HEADERS,
json={"sequences": fasta_content, "experiment_type": "binding"}
)
```
## Utility Functions
### Validate FASTA Format
```python
def validate_fasta(fasta_string):
"""
Validate FASTA format and sequences
Args:
fasta_string: FASTA-formatted string
Returns:
Tuple of (is_valid, error_message)
"""
lines = fasta_string.strip().split('\n')
if not lines:
return False, "Empty FASTA content"
if not lines[0].startswith('>'):
return False, "FASTA must start with header line (>)"
valid_amino_acids = set("ACDEFGHIKLMNPQRSTVWY")
current_header = None
for i, line in enumerate(lines):
if line.startswith('>'):
if not line[1:].strip():
return False, f"Line {i+1}: Empty header"
current_header = line[1:].strip()
else:
if current_header is None:
return False, f"Line {i+1}: Sequence before header"
sequence = line.strip().upper()
invalid = set(sequence) - valid_amino_acids
if invalid:
return False, f"Line {i+1}: Invalid amino acids: {invalid}"
return True, None
# Example
fasta = ">protein1\nMKVLWAALLG\n>protein2\nMATGVLWALG"
is_valid, error = validate_fasta(fasta)
if is_valid:
print("✓ FASTA format valid")
else:
print(f"✗ FASTA validation failed: {error}")
```
### Format Sequences to FASTA
```python
def sequences_to_fasta(sequences_dict):
"""
Convert dictionary of sequences to FASTA format
Args:
sequences_dict: Dictionary of {name: sequence}
Returns:
FASTA-formatted string
"""
fasta_content = ""
for name, sequence in sequences_dict.items():
# Clean sequence (remove whitespace, ensure uppercase)
clean_seq = ''.join(sequence.split()).upper()
# Validate
is_valid, error = validate_fasta(f">{name}\n{clean_seq}")
if not is_valid:
raise ValueError(f"Invalid sequence '{name}': {error}")
fasta_content += f">{name}\n{clean_seq}\n"
return fasta_content
# Example
sequences = {
"var1": "MKVLWAALLG",
"var2": "MATGVLWALG"
}
fasta = sequences_to_fasta(sequences)
print(fasta)
```


@@ -0,0 +1,360 @@
# Experiment Types and Workflows
## Overview
Adaptyv provides multiple experimental assay types for comprehensive protein characterization. Each experiment type has specific applications, workflows, and data outputs.
## Binding Assays
### Description
Measure protein-target interactions using biolayer interferometry (BLI), a label-free technique that monitors biomolecular binding in real-time.
### Use Cases
- Antibody-antigen binding characterization
- Receptor-ligand interaction analysis
- Protein-protein interaction studies
- Affinity maturation screening
- Epitope binning experiments
### Technology: Biolayer Interferometry (BLI)
BLI measures the interference pattern of reflected light from two surfaces:
- **Reference layer** - Biosensor tip surface
- **Biological layer** - Accumulated bound molecules
As molecules bind, the optical thickness increases, causing a wavelength shift proportional to the amount of bound material.
**Advantages:**
- Label-free detection
- Real-time kinetics
- High-throughput compatible
- Works in crude samples
- Minimal sample consumption
### Measured Parameters
**Kinetic constants:**
- **KD** - Equilibrium dissociation constant (binding affinity)
- **kon** - Association rate constant (binding speed)
- **koff** - Dissociation rate constant (unbinding speed)
**Typical ranges:**
- Strong binders: KD < 1 nM
- Moderate binders: KD = 1-100 nM
- Weak binders: KD > 100 nM
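These thresholds translate directly into a small helper (a convenience sketch; KD in molar units):
```python
def classify_binder(kd_molar):
    """Bucket a KD value using the ranges above."""
    kd_nm = kd_molar * 1e9
    if kd_nm < 1:
        return "strong"
    if kd_nm <= 100:
        return "moderate"
    return "weak"

print(classify_binder(2.5e-9))  # moderate (2.5 nM)
```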
### Workflow
1. **Sequence submission** - Provide protein sequences in FASTA format
2. **Expression** - Proteins expressed in appropriate host system
3. **Purification** - Automated purification protocols
4. **BLI assay** - Real-time binding measurements against specified targets
5. **Analysis** - Kinetic curve fitting and quality assessment
6. **Results delivery** - Binding parameters with confidence metrics
### Sample Requirements
- Protein sequence (standard amino acid codes)
- Target specification (from catalog or custom request)
- Buffer conditions (standard or custom)
- Expected concentration range (optional, improves assay design)
### Results Format
```json
{
"sequence_id": "antibody_variant_1",
"target": "Human PD-L1",
"measurements": {
"kd": 2.5e-9,
"kd_error": 0.3e-9,
"kon": 1.8e5,
"kon_error": 0.2e5,
"koff": 4.5e-4,
"koff_error": 0.5e-4
},
"quality_metrics": {
"confidence": "high|medium|low",
"r_squared": 0.97,
"chi_squared": 0.02,
"flags": []
},
"raw_data_url": "https://..."
}
```
## Expression Testing
### Description
Quantify protein expression levels in various host systems to assess manufacturability and optimize sequences for production.
### Use Cases
- Screening variants for high expression
- Optimizing codon usage
- Identifying expression bottlenecks
- Selecting candidates for scale-up
- Comparing expression systems
### Host Systems
Available expression platforms:
- **E. coli** - Rapid, cost-effective, prokaryotic system
- **Mammalian cells** - Native post-translational modifications
- **Yeast** - Eukaryotic system with simpler growth requirements
- **Insect cells** - Alternative eukaryotic platform
### Measured Parameters
- **Total protein yield** (mg/L culture)
- **Soluble fraction** (percentage)
- **Purity** (after initial purification)
- **Expression time course** (optional)
### Workflow
1. **Sequence submission** - Provide protein sequences
2. **Construct generation** - Cloning into expression vectors
3. **Expression** - Culture in specified host system
4. **Quantification** - Protein measurement via multiple methods
5. **Analysis** - Expression level comparison and ranking
6. **Results delivery** - Yield data and recommendations
### Results Format
```json
{
"sequence_id": "variant_1",
"host_system": "E. coli",
"measurements": {
"total_yield_mg_per_l": 25.5,
"soluble_fraction_percent": 78,
"purity_percent": 92
},
"ranking": {
"percentile": 85,
"notes": "High expression, good solubility"
}
}
```
## Thermostability Testing
### Description
Measure protein thermal stability to assess structural integrity, predict shelf-life, and identify stabilizing mutations.
### Use Cases
- Selecting thermally stable variants
- Formulation development
- Shelf-life prediction
- Stability-driven protein engineering
- Quality control screening
### Measurement Techniques
**Differential Scanning Fluorimetry (DSF):**
- Monitors protein unfolding via fluorescent dye binding
- Determines melting temperature (Tm)
- High-throughput capable
**Circular Dichroism (CD):**
- Secondary structure analysis
- Thermal unfolding curves
- Reversibility assessment
### Measured Parameters
- **Tm** - Melting temperature (midpoint of unfolding)
- **ΔH** - Enthalpy of unfolding
- **Aggregation temperature** (Tagg)
- **Reversibility** - Refolding after heating
### Workflow
1. **Sequence submission** - Provide protein sequences
2. **Expression and purification** - Standard protocols
3. **Thermostability assay** - Temperature gradient analysis
4. **Data analysis** - Curve fitting and parameter extraction
5. **Results delivery** - Stability metrics with ranking
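The curve fitting in step 4 typically assumes a two-state unfolding transition; a minimal sketch on synthetic melt-curve data (`scipy` assumed, not part of the Adaptyv deliverable):
```python
import numpy as np
from scipy.optimize import curve_fit

def boltzmann(t, tm, slope):
    """Two-state unfolding: fraction unfolded vs. temperature."""
    return 1.0 / (1.0 + np.exp((tm - t) / slope))

temps = np.linspace(40, 90, 26)                   # °C
signal = boltzmann(temps, 68.5, 2.0)              # synthetic melt curve
signal += np.random.normal(0, 0.02, temps.size)   # add measurement noise

(tm_fit, slope_fit), _ = curve_fit(boltzmann, temps, signal, p0=[65, 2])
print(f"Fitted Tm: {tm_fit:.1f} °C")
```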
### Results Format
```json
{
"sequence_id": "variant_1",
"measurements": {
"tm_celsius": 68.5,
"tm_error": 0.5,
"tagg_celsius": 72.0,
"reversibility_percent": 85
},
"quality_metrics": {
"curve_quality": "excellent",
"cooperativity": "two-state"
}
}
```
## Enzyme Activity Assays
### Description
Measure enzymatic function including substrate turnover, catalytic efficiency, and inhibitor sensitivity.
### Use Cases
- Screening enzyme variants for improved activity
- Substrate specificity profiling
- Inhibitor testing
- pH and temperature optimization
- Mechanistic studies
### Assay Types
**Continuous assays:**
- Chromogenic substrates
- Fluorogenic substrates
- Real-time monitoring
**Endpoint assays:**
- HPLC quantification
- Mass spectrometry
- Colorimetric detection
### Measured Parameters
**Kinetic parameters:**
- **kcat** - Turnover number (catalytic rate constant)
- **KM** - Michaelis constant (substrate affinity)
- **kcat/KM** - Catalytic efficiency
- **IC50** - Inhibitor concentration for 50% inhibition
**Activity metrics:**
- Specific activity (units/mg protein)
- Relative activity vs. reference
- Substrate specificity profile
### Workflow
1. **Sequence submission** - Provide enzyme sequences
2. **Expression and purification** - Optimized for activity retention
3. **Activity assay** - Substrate turnover measurements
4. **Kinetic analysis** - Michaelis-Menten fitting
5. **Results delivery** - Kinetic parameters and rankings
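Step 4 fits the Michaelis-Menten model to initial-rate data; a minimal sketch with synthetic data (`scipy` assumed):
```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    """Initial rate v as a function of substrate concentration [S]."""
    return vmax * s / (km + s)

substrate = np.array([5, 10, 25, 50, 100, 250, 500], dtype=float)  # µM
rates = michaelis_menten(substrate, vmax=12.0, km=45.0)            # synthetic data
rates *= 1 + np.random.normal(0, 0.03, substrate.size)             # add noise

(vmax_fit, km_fit), _ = curve_fit(michaelis_menten, substrate, rates, p0=[10, 50])
print(f"Vmax = {vmax_fit:.1f}, KM = {km_fit:.1f} µM")
```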
### Results Format
```json
{
"sequence_id": "enzyme_variant_1",
"substrate": "substrate_name",
"measurements": {
"kcat_per_second": 125,
"km_micromolar": 45,
"kcat_km": 2.8,
"specific_activity": 180
},
"quality_metrics": {
"confidence": "high",
"r_squared": 0.99
},
"ranking": {
"relative_activity": 1.8,
"improvement_vs_wildtype": "80%"
}
}
```
## Experiment Design Best Practices
### Sequence Submission
1. **Use clear identifiers** - Name sequences descriptively
2. **Include controls** - Submit wild-type or reference sequences
3. **Batch similar variants** - Group related sequences in single submission
4. **Validate sequences** - Check for errors before submission
### Sample Size
- **Pilot studies** - 5-10 sequences to test feasibility
- **Library screening** - 50-500 sequences for variant exploration
- **Focused optimization** - 10-50 sequences for fine-tuning
- **Large-scale campaigns** - 500+ sequences for ML-driven design
### Quality Control
Adaptyv includes automated QC steps:
- Expression verification before assay
- Replicate measurements for reliability
- Positive/negative controls in each batch
- Statistical validation of results
### Timeline Expectations
**Standard turnaround:** ~21 days from submission to results
**Timeline breakdown:**
- Construct generation: 3-5 days
- Expression: 5-7 days
- Purification: 2-3 days
- Assay execution: 3-5 days
- Analysis and QC: 2-3 days
**Factors affecting timeline:**
- Custom targets (add 1-2 weeks)
- Novel assay development (add 2-4 weeks)
- Large batch sizes (may add 1 week)
### Cost Optimization
1. **Batch submissions** - Lower per-sequence cost
2. **Standard targets** - Catalog antigens are faster/cheaper
3. **Standard conditions** - Custom buffers add cost
4. **Computational pre-filtering** - Submit only promising candidates
## Combining Experiment Types
For comprehensive protein characterization, combine multiple assays:
**Therapeutic antibody development:**
1. Binding assay → Identify high-affinity binders
2. Expression testing → Select manufacturable candidates
3. Thermostability → Ensure formulation stability
**Enzyme engineering:**
1. Activity assay → Screen for improved catalysis
2. Expression testing → Ensure producibility
3. Thermostability → Validate industrial robustness
**Sequential vs. Parallel:**
- **Sequential** - Use results from early assays to filter candidates
- **Parallel** - Run all assays simultaneously for faster results
## Data Integration
Results integrate with computational workflows:
1. **Download raw data** via API
2. **Parse results** into standardized format
3. **Feed into ML models** for next-round design
4. **Track experiments** with metadata tags
5. **Visualize trends** across design iterations
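For example, step 3 might join computational scores with measured yields before training the next-round model (column names here are hypothetical):
```python
import pandas as pd

# Hypothetical inputs: predictions from the optimization pipeline and
# parsed experimental results (see parse_expression_results in the code examples)
predicted = pd.DataFrame({
    "sequence_id": ["variant_1", "variant_2"],
    "combined_score": [0.82, 0.74],
})
measured = pd.DataFrame({
    "sequence_id": ["variant_1", "variant_2"],
    "yield_mg_per_l": [25.5, 14.2],
})

training_data = predicted.merge(measured, on="sequence_id")
print(training_data.corr(numeric_only=True))
```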
## Support and Troubleshooting
**Common issues:**
- Low expression → Consider sequence optimization (see protein_optimization.md)
- Poor binding → Verify target specification and expected range
- Variable results → Check sequence quality and controls
- Incomplete data → Contact support with experiment ID
**Getting help:**
- Email: support@adaptyvbio.com
- Include experiment ID and specific question
- Provide context (design goals, expected results)
- Response time: <24 hours for active experiments


@@ -0,0 +1,637 @@
# Protein Sequence Optimization
## Overview
Before submitting protein sequences for experimental testing, use computational tools to optimize sequences for improved expression, solubility, and stability. This pre-screening reduces experimental costs and increases success rates.
## Common Protein Expression Problems
### 1. Unpaired Cysteines
**Problem:**
- Unpaired cysteines form unwanted intermolecular disulfide bonds
- These cross-links cause aggregation and misfolding
- Expression yield and stability are reduced
**Solution:**
- Remove unpaired cysteines unless functionally necessary
- Pair cysteines appropriately for structural disulfides
- Replace with serine or alanine in non-critical positions
**Example:**
```python
# Check for an odd number of cysteines (a sign of unpaired residues)
def check_cysteines(sequence):
    cys_count = sequence.count('C')
    if cys_count % 2 != 0:
        print(f"Warning: odd number of cysteines ({cys_count}) - at least one is unpaired")
    return cys_count
```
### 2. Excessive Hydrophobicity
**Problem:**
- Long hydrophobic patches promote aggregation
- Exposed hydrophobic residues drive protein clumping
- Poor solubility in aqueous buffers
**Solution:**
- Maintain balanced hydropathy profiles
- Use short, flexible linkers between domains
- Reduce surface-exposed hydrophobic residues
**Metrics:**
- Kyte-Doolittle hydropathy plots
- GRAVY score (Grand Average of Hydropathy)
- pSAE (percent Solvent-Accessible hydrophobic residues)
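For a quick GRAVY check, Biopython's ProtParam module can be used directly (positive scores indicate net hydrophobicity):
```python
# Install: uv pip install biopython
from Bio.SeqUtils.ProtParam import ProteinAnalysis

sequence = "MKVLWAALLGLLGAAAD"
gravy_score = ProteinAnalysis(sequence).gravy()
print(f"GRAVY: {gravy_score:.3f}")  # > 0 means hydrophobic on average
```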
### 3. Low Solubility
**Problem:**
- Proteins precipitate during expression or purification
- Inclusion body formation
- Difficult downstream processing
**Solution:**
- Use solubility prediction tools for pre-screening
- Apply sequence optimization algorithms
- Add solubilizing tags if needed
## Computational Tools for Optimization
### NetSolP - Initial Solubility Screening
**Purpose:** Fast solubility prediction for filtering sequences.
**Method:** Machine learning model trained on E. coli expression data.
**Usage:**
```python
# Install: uv pip install requests
import requests
def predict_solubility_netsolp(sequence):
"""Predict protein solubility using NetSolP web service"""
url = "https://services.healthtech.dtu.dk/services/NetSolP-1.0/api/predict"
data = {
"sequence": sequence,
"format": "fasta"
}
response = requests.post(url, data=data)
return response.json()
# Example
sequence = "MKVLWAALLGLLGAAA..."
result = predict_solubility_netsolp(sequence)
print(f"Solubility score: {result['score']}")
```
**Interpretation:**
- Score > 0.5: Likely soluble
- Score < 0.5: Likely insoluble
- Use for initial filtering before more expensive predictions
**When to use:**
- First-pass filtering of large libraries
- Quick validation of designed sequences
- Prioritizing sequences for experimental testing
### SoluProt - Comprehensive Solubility Prediction
**Purpose:** Advanced solubility prediction with higher accuracy.
**Method:** Deep learning model incorporating sequence and structural features.
**Usage:**
```python
# Install: uv pip install soluprot
from soluprot import predict_solubility
def screen_variants_soluprot(sequences):
"""Screen multiple sequences for solubility"""
results = []
for name, seq in sequences.items():
score = predict_solubility(seq)
results.append({
'name': name,
'sequence': seq,
'solubility_score': score,
'predicted_soluble': score > 0.6
})
return results
# Example
sequences = {
'variant_1': 'MKVLW...',
'variant_2': 'MATGV...'
}
results = screen_variants_soluprot(sequences)
soluble_variants = [r for r in results if r['predicted_soluble']]
```
**Interpretation:**
- Score > 0.6: High solubility confidence
- Score 0.4-0.6: Uncertain, may need optimization
- Score < 0.4: Likely problematic
**When to use:**
- After initial NetSolP filtering
- When higher prediction accuracy is needed
- Before committing to expensive synthesis/testing
### SolubleMPNN - Sequence Redesign
**Purpose:** Redesign protein sequences to improve solubility while maintaining function.
**Method:** Graph neural network that suggests mutations to increase solubility.
**Usage:**
```python
# Install: uv pip install soluble-mpnn
from soluble_mpnn import optimize_sequence
def optimize_for_solubility(sequence, structure_pdb=None):
"""
Redesign sequence for improved solubility
Args:
sequence: Original amino acid sequence
structure_pdb: Optional PDB file for structure-aware design
Returns:
Optimized sequence variants ranked by predicted solubility
"""
variants = optimize_sequence(
sequence=sequence,
structure=structure_pdb,
num_variants=10,
temperature=0.1 # Lower = more conservative mutations
)
return variants
# Example
original_seq = "MKVLWAALLGLLGAAA..."
optimized_variants = optimize_for_solubility(original_seq)
for i, variant in enumerate(optimized_variants):
print(f"Variant {i+1}:")
print(f" Sequence: {variant['sequence']}")
print(f" Solubility score: {variant['solubility_score']}")
print(f" Mutations: {variant['mutations']}")
```
**Design strategy:**
- **Conservative** (temperature=0.1): Minimal changes, safer
- **Moderate** (temperature=0.3): Balance between change and safety
- **Aggressive** (temperature=0.5): More mutations, higher risk
**When to use:**
- Primary tool for sequence optimization
- Default starting point for improving problematic sequences
- Generating diverse soluble variants
**Best practices:**
- Generate 10-50 variants per sequence
- Use structure information when available (improves accuracy)
- Validate key functional residues are preserved
- Test multiple temperature settings
### ESM (Evolutionary Scale Modeling) - Sequence Likelihood
**Purpose:** Assess how "natural" a protein sequence appears based on evolutionary patterns.
**Method:** Protein language model trained on millions of natural sequences.
**Usage:**
```python
# Install: uv pip install fair-esm
import torch
from esm import pretrained
def score_sequence_esm(sequence):
"""
Calculate ESM likelihood score for sequence
Higher scores indicate more natural/stable sequences
"""
    model, alphabet = pretrained.esm2_t33_650M_UR50D()  # in practice, load once and reuse
    batch_converter = alphabet.get_batch_converter()
    data = [("protein", sequence)]
    _, _, batch_tokens = batch_converter(data)
    with torch.no_grad():
        results = model(batch_tokens, repr_layers=[33])
    token_logprobs = results["logits"].log_softmax(dim=-1)
    # Mean log-probability of the actual residues (skip BOS/EOS tokens)
    per_token = token_logprobs.gather(-1, batch_tokens.unsqueeze(-1)).squeeze(-1)
    sequence_score = per_token[0, 1:-1].mean().item()
    return sequence_score
# Example - Compare variants
sequences = {
'original': 'MKVLW...',
'optimized_1': 'MKVLS...',
'optimized_2': 'MKVLA...'
}
for name, seq in sequences.items():
score = score_sequence_esm(seq)
print(f"{name}: ESM score = {score:.3f}")
```
**Interpretation:**
- Higher scores → More "natural" sequence
- Use to avoid unlikely mutations
- Balance with functional requirements
**When to use:**
- Filtering synthetic designs
- Comparing SolubleMPNN variants
- Ensuring sequences aren't too artificial
- Avoiding expression bottlenecks
**Integration with design:**
```python
def rank_variants_by_esm(variants):
"""Rank protein variants by ESM likelihood"""
scored = []
for v in variants:
esm_score = score_sequence_esm(v['sequence'])
v['esm_score'] = esm_score
scored.append(v)
    # Sort by a weighted combination (ESM log-probabilities are negative,
    # so multiplying the two scores would invert the ranking)
    scored.sort(
        key=lambda x: 0.7 * x['solubility_score'] + 0.3 * x['esm_score'],
        reverse=True
    )
    return scored
```
### ipTM - Interface Stability (AlphaFold-Multimer)
**Purpose:** Assess protein-protein interface stability and binding confidence.
**Method:** Interface predicted TM-score from AlphaFold-Multimer predictions.
**Usage:**
```python
# Requires AlphaFold-Multimer installation
# Or use ColabFold for easier access
def predict_interface_stability(protein_a_seq, protein_b_seq):
"""
Predict interface stability using AlphaFold-Multimer
Returns ipTM score: higher = more stable interface
"""
from colabfold import run_alphafold_multimer
sequences = {
'chainA': protein_a_seq,
'chainB': protein_b_seq
}
result = run_alphafold_multimer(sequences)
return {
'ipTM': result['iptm'],
'pTM': result['ptm'],
'pLDDT': result['plddt']
}
# Example for antibody-antigen binding
antibody_seq = "EVQLVESGGGLVQPGG..."
antigen_seq = "MKVLWAALLGLLGAAA..."
stability = predict_interface_stability(antibody_seq, antigen_seq)
print(f"Interface ipTM: {stability['ipTM']:.3f}")
# Interpretation
if stability['ipTM'] > 0.7:
print("High confidence interface")
elif stability['ipTM'] > 0.5:
print("Moderate confidence interface")
else:
print("Low confidence interface - may need redesign")
```
**Interpretation:**
- ipTM > 0.7: Strong predicted interface
- ipTM 0.5-0.7: Moderate interface confidence
- ipTM < 0.5: Weak interface, consider redesign
**When to use:**
- Antibody-antigen design
- Protein-protein interaction engineering
- Validating binding interfaces
- Comparing interface variants
### pSAE - Solvent-Accessible Hydrophobic Residues
**Purpose:** Quantify exposed hydrophobic residues that promote aggregation.
**Method:** Calculates percentage of solvent-accessible surface area (SASA) occupied by hydrophobic residues.
**Usage:**
```python
# Requires structure (PDB file or AlphaFold prediction)
# Install: uv pip install biopython
from Bio.PDB import PDBParser, DSSP
import numpy as np
def calculate_psae(pdb_file):
"""
Calculate percent Solvent-Accessible hydrophobic residues (pSAE)
Lower pSAE = better solubility
"""
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', pdb_file)
# Run DSSP to get solvent accessibility
model = structure[0]
dssp = DSSP(model, pdb_file, acc_array='Wilke')
    # DSSP returns one-letter amino acid codes and relative accessibility
    hydrophobic = set("AVILMFWP")
    total_sasa = 0.0
    hydrophobic_sasa = 0.0
    for key in dssp.keys():
        aa = dssp[key][1]                 # one-letter amino acid code
        rel_accessibility = dssp[key][3]  # relative solvent accessibility
        if rel_accessibility == 'NA':     # DSSP could not compute it
            continue
        total_sasa += rel_accessibility
        if aa in hydrophobic:
            hydrophobic_sasa += rel_accessibility
    psae = (hydrophobic_sasa / total_sasa) * 100
    return psae
# Example
pdb_file = "protein_structure.pdb"
psae_score = calculate_psae(pdb_file)
print(f"pSAE: {psae_score:.2f}%")
# Interpretation
if psae_score < 25:
print("Good solubility expected")
elif psae_score < 35:
print("Moderate solubility")
else:
print("High aggregation risk")
```
**Interpretation:**
- pSAE < 25%: Low aggregation risk
- pSAE 25-35%: Moderate risk
- pSAE > 35%: High aggregation risk
**When to use:**
- Analyzing designed structures
- Post-AlphaFold validation
- Identifying aggregation hotspots
- Guiding surface mutations
## Recommended Optimization Workflow
### Step 1: Initial Screening (Fast)
```python
def initial_screening(sequences):
"""
Quick first-pass filtering using NetSolP
Filters out obviously problematic sequences
"""
passed = []
for name, seq in sequences.items():
netsolp_score = predict_solubility_netsolp(seq)
if netsolp_score > 0.5:
passed.append((name, seq))
return passed
```
### Step 2: Detailed Assessment (Moderate)
```python
def detailed_assessment(filtered_sequences):
"""
More thorough analysis with SoluProt and ESM
Ranks sequences by multiple criteria
"""
results = []
for name, seq in filtered_sequences:
soluprot_score = predict_solubility(seq)
esm_score = score_sequence_esm(seq)
combined_score = soluprot_score * 0.7 + esm_score * 0.3
results.append({
'name': name,
'sequence': seq,
'soluprot': soluprot_score,
'esm': esm_score,
'combined': combined_score
})
results.sort(key=lambda x: x['combined'], reverse=True)
return results
```
### Step 3: Sequence Optimization (If needed)
```python
def optimize_problematic_sequences(sequences_needing_optimization):
"""
Use SolubleMPNN to redesign problematic sequences
Returns improved variants
"""
optimized = []
for name, seq in sequences_needing_optimization:
# Generate multiple variants
variants = optimize_sequence(
sequence=seq,
num_variants=10,
temperature=0.2
)
# Score variants with ESM
for variant in variants:
variant['esm_score'] = score_sequence_esm(variant['sequence'])
        # Keep best variants (weighted sum; ESM log-probabilities are negative)
        variants.sort(
            key=lambda x: 0.7 * x['solubility_score'] + 0.3 * x['esm_score'],
            reverse=True
        )
optimized.extend(variants[:3]) # Top 3 variants per sequence
return optimized
```
### Step 4: Structure-Based Validation (For critical sequences)
```python
def structure_validation(top_candidates):
"""
Predict structures and calculate pSAE for top candidates
Final validation before experimental testing
"""
validated = []
for candidate in top_candidates:
        # Predict structure with AlphaFold (predict_structure_alphafold is a user-supplied helper)
        structure_pdb = predict_structure_alphafold(candidate['sequence'])
# Calculate pSAE
psae = calculate_psae(structure_pdb)
candidate['psae'] = psae
candidate['pass_structure_check'] = psae < 30
validated.append(candidate)
return validated
```
### Complete Workflow Example
```python
def complete_optimization_pipeline(initial_sequences):
"""
End-to-end optimization pipeline
Input: Dictionary of {name: sequence}
Output: Ranked list of optimized, validated sequences
"""
print("Step 1: Initial screening with NetSolP...")
filtered = initial_screening(initial_sequences)
print(f" Passed: {len(filtered)}/{len(initial_sequences)}")
print("Step 2: Detailed assessment with SoluProt and ESM...")
assessed = detailed_assessment(filtered)
# Split into good and needs-optimization
good_sequences = [s for s in assessed if s['soluprot'] > 0.6]
needs_optimization = [s for s in assessed if s['soluprot'] <= 0.6]
print(f" Good sequences: {len(good_sequences)}")
print(f" Need optimization: {len(needs_optimization)}")
if needs_optimization:
print("Step 3: Optimizing problematic sequences with SolubleMPNN...")
optimized = optimize_problematic_sequences(needs_optimization)
all_sequences = good_sequences + optimized
else:
all_sequences = good_sequences
print("Step 4: Structure-based validation for top candidates...")
top_20 = all_sequences[:20]
final_validated = structure_validation(top_20)
# Final ranking
final_validated.sort(
key=lambda x: (
x['pass_structure_check'],
x['combined'],
-x['psae']
),
reverse=True
)
return final_validated
# Usage
initial_library = {
'variant_1': 'MKVLWAALLGLLGAAA...',
'variant_2': 'MATGVLWAALLGLLGA...',
# ... more sequences
}
optimized_library = complete_optimization_pipeline(initial_library)
# Submit top sequences to Adaptyv
top_sequences_for_testing = optimized_library[:50]
```
## Best Practices Summary
1. **Always pre-screen** before experimental testing
2. **Use NetSolP first** for fast filtering of large libraries
3. **Apply SolubleMPNN** as default optimization tool
4. **Validate with ESM** to avoid unnatural sequences
5. **Calculate pSAE** for structure-based validation
6. **Test multiple variants** per design to account for prediction uncertainty
7. **Keep controls** - include wild-type or known-good sequences
8. **Iterate** - use experimental results to refine predictions
## Integration with Adaptyv
After computational optimization, submit sequences to Adaptyv:
```python
# After optimization pipeline
optimized_sequences = complete_optimization_pipeline(initial_library)
# Prepare FASTA format
fasta_content = ""
for seq_data in optimized_sequences[:50]: # Top 50
fasta_content += f">{seq_data['name']}\n{seq_data['sequence']}\n"
# Submit to Adaptyv
import requests
response = requests.post(
"https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws/experiments",
headers={"Authorization": f"Bearer {api_key}"},
json={
"sequences": fasta_content,
"experiment_type": "expression",
"metadata": {
"optimization_method": "SolubleMPNN_ESM_pipeline",
"computational_scores": [s['combined'] for s in optimized_sequences[:50]]
}
}
)
```
## Troubleshooting
**Issue: All sequences score poorly on solubility predictions**
- Check if sequences contain unusual amino acids
- Verify FASTA format is correct
- Consider if protein family is naturally low-solubility
- May need experimental validation despite predictions
**Issue: SolubleMPNN changes functionally important residues**
- Provide structure file to preserve spatial constraints
- Mask critical residues from mutation
- Lower temperature parameter for conservative changes
- Manually revert problematic mutations
**Issue: ESM scores are low after optimization**
- Optimization may be too aggressive
- Try lower temperature in SolubleMPNN
- Balance between solubility and naturalness
- Consider that some optimization may require non-natural mutations
**Issue: Predictions don't match experimental results**
- Predictions are probabilistic, not deterministic
- Host system and conditions affect expression
- Some proteins may need experimental validation
- Use predictions as enrichment, not absolute filters