Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions


@@ -0,0 +1,308 @@
# Adaptyv API Reference
## Base URL
```
https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws
```
## Authentication
All API requests require bearer token authentication via the `Authorization` header:
```
Authorization: Bearer YOUR_API_KEY
```
To obtain API access:
1. Contact support@adaptyvbio.com
2. Request API access during alpha/beta period
3. Receive your personal access token
Store your API key securely:
- Use environment variables: `ADAPTYV_API_KEY`
- Never commit API keys to version control
- Use `.env` files with `.gitignore` for local development
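A minimal authenticated request, using the credits endpoint described under Organization below, looks like this:
```python
import os
import requests

BASE_URL = "https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws"
API_KEY = os.environ["ADAPTYV_API_KEY"]

response = requests.get(
    f"{BASE_URL}/organization/credits",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json()["balance"])
```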
## Endpoints
### Experiments
#### Create Experiment
Submit protein sequences for experimental testing.
**Endpoint:** `POST /experiments`
**Request Body:**
```json
{
"sequences": ">protein1\nMKVLWALLGLLGAA...\n>protein2\nMATGVLWALLG...",
"experiment_type": "binding|expression|thermostability|enzyme_activity",
"target_id": "optional_target_identifier",
"webhook_url": "https://your-webhook.com/callback",
"metadata": {
"project": "optional_project_name",
"notes": "optional_notes"
}
}
```
**Sequence Format:**
- FASTA format with headers
- Multiple sequences supported
- Standard amino acid codes
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"status": "submitted",
"created_at": "2025-11-24T10:00:00Z",
"estimated_completion": "2025-12-15T10:00:00Z"
}
```
#### Get Experiment Status
Check the current status of an experiment.
**Endpoint:** `GET /experiments/{experiment_id}`
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"status": "submitted|processing|completed|failed",
"created_at": "2025-11-24T10:00:00Z",
"updated_at": "2025-11-25T14:30:00Z",
"progress": {
"stage": "sequencing|expression|assay|analysis",
"percentage": 45
}
}
```
**Status Values:**
- `submitted` - Experiment received and queued
- `processing` - Active testing in progress
- `completed` - Results available for download
- `failed` - Experiment encountered an error
#### List Experiments
Retrieve all experiments for your organization.
**Endpoint:** `GET /experiments`
**Query Parameters:**
- `status` - Filter by status (optional)
- `limit` - Number of results per page (default: 50)
- `offset` - Pagination offset (default: 0)
**Response:**
```json
{
"experiments": [
{
"experiment_id": "exp_abc123xyz",
"status": "completed",
"experiment_type": "binding",
"created_at": "2025-11-24T10:00:00Z"
}
],
"total": 150,
"limit": 50,
"offset": 0
}
```
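The response is paginated; to collect every experiment, walk `limit`/`offset` until `total` is exhausted. A minimal sketch:
```python
import requests

def list_all_experiments(base_url, headers, page_size=50):
    """Fetch every experiment by walking limit/offset pages."""
    experiments, offset = [], 0
    while True:
        resp = requests.get(
            f"{base_url}/experiments",
            headers=headers,
            params={"limit": page_size, "offset": offset},
        )
        resp.raise_for_status()
        page = resp.json()
        experiments.extend(page["experiments"])
        offset += page_size
        if offset >= page["total"]:
            return experiments
```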
### Results
#### Get Experiment Results
Download results from a completed experiment.
**Endpoint:** `GET /experiments/{experiment_id}/results`
**Response:**
```json
{
"experiment_id": "exp_abc123xyz",
"results": [
{
"sequence_id": "protein1",
"measurements": {
"kd": 1.2e-9,
"kon": 1.5e5,
"koff": 1.8e-4
},
"quality_metrics": {
"confidence": "high",
"r_squared": 0.98
}
}
],
"download_urls": {
"raw_data": "https://...",
"analysis_package": "https://...",
"report": "https://..."
}
}
```
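The `download_urls` entries are plain HTTPS links. A small helper can stream each file to disk (this sketch assumes the URLs are pre-signed and need no extra auth header):
```python
import os
import requests

def fetch_result_files(results, out_dir="results_files"):
    """Stream each file in download_urls to a local file."""
    os.makedirs(out_dir, exist_ok=True)
    for data_type, url in results["download_urls"].items():
        r = requests.get(url, stream=True)
        r.raise_for_status()
        with open(os.path.join(out_dir, data_type), "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
```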
### Targets
#### Search Target Catalog
Search the ACROBiosystems antigen catalog.
**Endpoint:** `GET /targets`
**Query Parameters:**
- `search` - Search term (protein name, UniProt ID, etc.)
- `species` - Filter by species
- `category` - Filter by category
**Response:**
```json
{
"targets": [
{
"target_id": "tgt_12345",
"name": "Human PD-L1",
"species": "Homo sapiens",
"uniprot_id": "Q9NZQ7",
"availability": "in_stock|custom_order",
"price_usd": 450
}
]
}
```
#### Request Custom Target
Request an antigen not in the standard catalog.
**Endpoint:** `POST /targets/request`
**Request Body:**
```json
{
"target_name": "Custom target name",
"uniprot_id": "optional_uniprot_id",
"species": "species_name",
"notes": "Additional requirements"
}
```
### Organization
#### Get Credits Balance
Check your organization's credit balance and usage.
**Endpoint:** `GET /organization/credits`
**Response:**
```json
{
"balance": 10000,
"currency": "USD",
"usage_this_month": 2500,
"experiments_remaining": 22
}
```
## Webhooks
Configure webhook URLs to receive notifications when experiments complete.
**Webhook Payload:**
```json
{
"event": "experiment.completed",
"experiment_id": "exp_abc123xyz",
"status": "completed",
"timestamp": "2025-12-15T10:00:00Z",
"results_url": "/experiments/exp_abc123xyz/results"
}
```
**Webhook Events:**
- `experiment.submitted` - Experiment received
- `experiment.started` - Processing began
- `experiment.completed` - Results available
- `experiment.failed` - Error occurred
**Security:**
- Verify webhook signatures (details provided during onboarding)
- Use HTTPS endpoints only
- Respond with 200 OK to acknowledge receipt
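A minimal receiver sketch (Flask is assumed here; the signature-verification scheme is provided during onboarding, so it is only marked as a TODO):
```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/adaptyv-webhook", methods=["POST"])
def adaptyv_webhook():
    event = request.get_json(force=True)
    # TODO: verify the webhook signature (scheme provided during onboarding)
    if event.get("event") == "experiment.completed":
        print(f"Experiment {event['experiment_id']} completed")
        # Fetch results from event["results_url"] asynchronously
    return "", 200  # acknowledge receipt promptly

if __name__ == "__main__":
    app.run(port=8000)
```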
## Error Handling
**Error Response Format:**
```json
{
"error": {
"code": "invalid_sequence",
"message": "Sequence contains invalid amino acid codes",
"details": {
"sequence_id": "protein1",
"position": 45,
"character": "X"
}
}
}
```
**Common Error Codes:**
- `authentication_failed` - Invalid or missing API key
- `invalid_sequence` - Malformed FASTA or invalid amino acids
- `insufficient_credits` - Not enough credits for experiment
- `target_not_found` - Specified target ID doesn't exist
- `rate_limit_exceeded` - Too many requests
- `experiment_not_found` - Invalid experiment ID
- `internal_error` - Server-side error
## Rate Limits
- 100 requests per minute per API key
- 1000 experiments per day per organization
- Batch submissions are encouraged for large-scale testing
When rate limited, the response includes:
```
HTTP 429 Too Many Requests
Retry-After: 60
```
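Clients should honor `Retry-After` instead of retrying immediately, for example:
```python
import time
import requests

def get_with_rate_limit(url, headers):
    """GET that waits out HTTP 429 responses using Retry-After."""
    while True:
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        wait = int(response.headers.get("Retry-After", "60"))
        time.sleep(wait)
```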
## Best Practices
1. **Use webhooks** for long-running experiments instead of polling
2. **Batch sequences** when submitting multiple variants
3. **Cache results** to avoid redundant API calls
4. **Implement retry logic** with exponential backoff
5. **Monitor credits** to avoid experiment failures
6. **Validate sequences** locally before submission
7. **Use descriptive metadata** for better experiment tracking
## API Versioning
The API is currently in alpha/beta. Breaking changes may occur but will be:
- Announced via email to registered users
- Documented in the changelog
- Supported with migration guides
Current version is reflected in response headers:
```
X-API-Version: alpha-2025-11
```
## Support
For API issues or questions:
- Email: support@adaptyvbio.com
- Documentation updates: https://docs.adaptyvbio.com
- Report bugs with experiment IDs and request details


@@ -0,0 +1,913 @@
# Code Examples
## Setup and Authentication
### Basic Setup
```python
import os
import requests
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Configuration
API_KEY = os.getenv("ADAPTYV_API_KEY")
BASE_URL = "https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws"
# Standard headers
HEADERS = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json"
}
def check_api_connection():
"""Verify API connection and credentials"""
try:
response = requests.get(f"{BASE_URL}/organization/credits", headers=HEADERS)
response.raise_for_status()
print("✓ API connection successful")
print(f" Credits remaining: {response.json()['balance']}")
return True
except requests.exceptions.HTTPError as e:
print(f"✗ API authentication failed: {e}")
return False
```
### Environment Setup
Create a `.env` file:
```bash
ADAPTYV_API_KEY=your_api_key_here
```
Install dependencies:
```bash
uv pip install requests python-dotenv
```
## Experiment Submission
### Submit Single Sequence
```python
def submit_single_experiment(sequence, experiment_type="binding", target_id=None):
"""
Submit a single protein sequence for testing
Args:
sequence: Amino acid sequence string
experiment_type: Type of experiment (binding, expression, thermostability, enzyme_activity)
target_id: Optional target identifier for binding assays
Returns:
Experiment ID and status
"""
# Format as FASTA
fasta_content = f">protein_sequence\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type
}
if target_id:
payload["target_id"] = target_id
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Experiment submitted")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Status: {result['status']}")
print(f" Estimated completion: {result['estimated_completion']}")
return result
# Example usage
sequence = "MKVLWAALLGLLGAAAAFPAVTSAVKPYKAAVSAAVSKPYKAAVSAAVSKPYK"
experiment = submit_single_experiment(sequence, experiment_type="expression")
```
### Submit Multiple Sequences (Batch)
```python
def submit_batch_experiment(sequences_dict, experiment_type="binding", metadata=None):
"""
Submit multiple protein sequences in a single batch
Args:
sequences_dict: Dictionary of {name: sequence}
experiment_type: Type of experiment
metadata: Optional dictionary of additional information
Returns:
Experiment details
"""
# Format all sequences as FASTA
fasta_content = ""
for name, sequence in sequences_dict.items():
fasta_content += f">{name}\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type
}
if metadata:
payload["metadata"] = metadata
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Batch experiment submitted")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Sequences: {len(sequences_dict)}")
print(f" Status: {result['status']}")
return result
# Example usage
sequences = {
"variant_1": "MKVLWAALLGLLGAAA...",
"variant_2": "MKVLSAALLGLLGAAA...",
"variant_3": "MKVLAAALLGLLGAAA...",
"wildtype": "MKVLWAALLGLLGAAA..."
}
metadata = {
"project": "antibody_optimization",
"round": 3,
"notes": "Testing solubility-optimized variants"
}
experiment = submit_batch_experiment(sequences, "expression", metadata)
```
### Submit with Webhook Notification
```python
def submit_with_webhook(sequences_dict, experiment_type, webhook_url):
"""
Submit experiment with webhook for completion notification
Args:
sequences_dict: Dictionary of {name: sequence}
experiment_type: Type of experiment
webhook_url: URL to receive notification when complete
"""
fasta_content = ""
for name, sequence in sequences_dict.items():
fasta_content += f">{name}\n{sequence}\n"
payload = {
"sequences": fasta_content,
"experiment_type": experiment_type,
"webhook_url": webhook_url
}
response = requests.post(
f"{BASE_URL}/experiments",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Experiment submitted with webhook")
print(f" Experiment ID: {result['experiment_id']}")
print(f" Webhook: {webhook_url}")
return result
# Example
webhook_url = "https://your-server.com/adaptyv-webhook"
experiment = submit_with_webhook(sequences, "binding", webhook_url)
```
## Tracking Experiments
### Check Experiment Status
```python
def check_experiment_status(experiment_id):
"""
Get current status of an experiment
Args:
experiment_id: Experiment identifier
Returns:
Status information
"""
response = requests.get(
f"{BASE_URL}/experiments/{experiment_id}",
headers=HEADERS
)
response.raise_for_status()
status = response.json()
print(f"Experiment: {experiment_id}")
print(f" Status: {status['status']}")
print(f" Created: {status['created_at']}")
print(f" Updated: {status['updated_at']}")
if 'progress' in status:
print(f" Progress: {status['progress']['percentage']}%")
print(f" Current stage: {status['progress']['stage']}")
return status
# Example
status = check_experiment_status("exp_abc123xyz")
```
### List All Experiments
```python
def list_experiments(status_filter=None, limit=50):
"""
List experiments with optional status filtering
Args:
status_filter: Filter by status (submitted, processing, completed, failed)
limit: Maximum number of results
Returns:
List of experiments
"""
params = {"limit": limit}
if status_filter:
params["status"] = status_filter
response = requests.get(
f"{BASE_URL}/experiments",
headers=HEADERS,
params=params
)
response.raise_for_status()
result = response.json()
print(f"Found {result['total']} experiments")
for exp in result['experiments']:
print(f" {exp['experiment_id']}: {exp['status']} ({exp['experiment_type']})")
return result['experiments']
# Example - list all completed experiments
completed_experiments = list_experiments(status_filter="completed")
```
### Poll Until Complete
```python
import time
def wait_for_completion(experiment_id, check_interval=3600):
"""
Poll experiment status until completion
Args:
experiment_id: Experiment identifier
check_interval: Seconds between status checks (default: 1 hour)
Returns:
Final status
"""
print(f"Monitoring experiment {experiment_id}...")
while True:
status = check_experiment_status(experiment_id)
if status['status'] == 'completed':
print("✓ Experiment completed!")
return status
elif status['status'] == 'failed':
print("✗ Experiment failed")
return status
print(f" Status: {status['status']} - checking again in {check_interval}s")
time.sleep(check_interval)
# Example (not recommended - use webhooks instead!)
# status = wait_for_completion("exp_abc123xyz", check_interval=3600)
```
## Retrieving Results
### Download Experiment Results
```python
import json
def download_results(experiment_id, output_dir="results"):
"""
Download and parse experiment results
Args:
experiment_id: Experiment identifier
output_dir: Directory to save results
Returns:
Parsed results data
"""
# Get results
response = requests.get(
f"{BASE_URL}/experiments/{experiment_id}/results",
headers=HEADERS
)
response.raise_for_status()
results = response.json()
# Save results JSON
os.makedirs(output_dir, exist_ok=True)
output_file = f"{output_dir}/{experiment_id}_results.json"
with open(output_file, 'w') as f:
json.dump(results, f, indent=2)
print(f"✓ Results downloaded: {output_file}")
print(f" Sequences tested: {len(results['results'])}")
# Download raw data if available
if 'download_urls' in results:
for data_type, url in results['download_urls'].items():
print(f" {data_type} available at: {url}")
return results
# Example
results = download_results("exp_abc123xyz")
```
### Parse Binding Results
```python
import pandas as pd
def parse_binding_results(results):
"""
Parse binding assay results into DataFrame
Args:
results: Results dictionary from API
Returns:
pandas DataFrame with organized results
"""
data = []
for result in results['results']:
row = {
'sequence_id': result['sequence_id'],
'kd': result['measurements']['kd'],
'kd_error': result['measurements']['kd_error'],
'kon': result['measurements']['kon'],
'koff': result['measurements']['koff'],
'confidence': result['quality_metrics']['confidence'],
'r_squared': result['quality_metrics']['r_squared']
}
data.append(row)
df = pd.DataFrame(data)
# Sort by affinity (lower KD = stronger binding)
df = df.sort_values('kd')
print("Top 5 binders:")
print(df.head())
return df
# Example
experiment_id = "exp_abc123xyz"
results = download_results(experiment_id)
binding_df = parse_binding_results(results)
# Export to CSV
binding_df.to_csv(f"{experiment_id}_binding_results.csv", index=False)
```
### Parse Expression Results
```python
def parse_expression_results(results):
"""
Parse expression testing results into DataFrame
Args:
results: Results dictionary from API
Returns:
pandas DataFrame with organized results
"""
data = []
for result in results['results']:
row = {
'sequence_id': result['sequence_id'],
'yield_mg_per_l': result['measurements']['total_yield_mg_per_l'],
'soluble_fraction': result['measurements']['soluble_fraction_percent'],
'purity': result['measurements']['purity_percent'],
'percentile': result['ranking']['percentile']
}
data.append(row)
df = pd.DataFrame(data)
# Sort by yield
df = df.sort_values('yield_mg_per_l', ascending=False)
print(f"Mean yield: {df['yield_mg_per_l'].mean():.2f} mg/L")
print(f"Top performer: {df.iloc[0]['sequence_id']} ({df.iloc[0]['yield_mg_per_l']:.2f} mg/L)")
return df
# Example
results = download_results("exp_expression123")
expression_df = parse_expression_results(results)
```
## Target Catalog
### Search for Targets
```python
def search_targets(query, species=None, category=None):
"""
Search the antigen catalog
Args:
query: Search term (protein name, UniProt ID, etc.)
species: Optional species filter
category: Optional category filter
Returns:
List of matching targets
"""
params = {"search": query}
if species:
params["species"] = species
if category:
params["category"] = category
response = requests.get(
f"{BASE_URL}/targets",
headers=HEADERS,
params=params
)
response.raise_for_status()
targets = response.json()['targets']
print(f"Found {len(targets)} targets matching '{query}':")
for target in targets:
print(f" {target['target_id']}: {target['name']}")
print(f" Species: {target['species']}")
print(f" Availability: {target['availability']}")
print(f" Price: ${target['price_usd']}")
return targets
# Example
targets = search_targets("PD-L1", species="Homo sapiens")
```
### Request Custom Target
```python
def request_custom_target(target_name, uniprot_id=None, species=None, notes=None):
"""
Request a custom antigen not in the standard catalog
Args:
target_name: Name of the target protein
uniprot_id: Optional UniProt identifier
species: Species name
notes: Additional requirements or notes
Returns:
Request confirmation
"""
payload = {
"target_name": target_name,
"species": species
}
if uniprot_id:
payload["uniprot_id"] = uniprot_id
if notes:
payload["notes"] = notes
response = requests.post(
f"{BASE_URL}/targets/request",
headers=HEADERS,
json=payload
)
response.raise_for_status()
result = response.json()
print(f"✓ Custom target request submitted")
print(f" Request ID: {result['request_id']}")
print(f" Status: {result['status']}")
return result
# Example
request = request_custom_target(
target_name="Novel receptor XYZ",
uniprot_id="P12345",
species="Mus musculus",
notes="Need high purity for structural studies"
)
```
## Complete Workflows
### End-to-End Binding Assay
```python
def complete_binding_workflow(sequences_dict, target_id, project_name):
"""
Complete workflow: submit sequences, track, and retrieve binding results
Args:
sequences_dict: Dictionary of {name: sequence}
target_id: Target identifier from catalog
project_name: Project name for metadata
Returns:
DataFrame with binding results
"""
print("=== Starting Binding Assay Workflow ===")
# Step 1: Submit experiment
print("\n1. Submitting experiment...")
metadata = {
"project": project_name,
"target": target_id
}
experiment = submit_batch_experiment(
sequences_dict,
experiment_type="binding",
metadata=metadata
)
experiment_id = experiment['experiment_id']
# Step 2: Save experiment info
print("\n2. Saving experiment details...")
with open(f"{experiment_id}_info.json", 'w') as f:
json.dump(experiment, f, indent=2)
print(f"✓ Experiment {experiment_id} submitted")
print(" Results will be available in ~21 days")
print(" Use webhook or poll status for updates")
# Note: In practice, wait for completion before this step
# print("\n3. Waiting for completion...")
# status = wait_for_completion(experiment_id)
# print("\n4. Downloading results...")
# results = download_results(experiment_id)
# print("\n5. Parsing results...")
# df = parse_binding_results(results)
# return df
return experiment_id
# Example
antibody_variants = {
"variant_1": "EVQLVESGGGLVQPGG...",
"variant_2": "EVQLVESGGGLVQPGS...",
"variant_3": "EVQLVESGGGLVQPGA...",
"wildtype": "EVQLVESGGGLVQPGG..."
}
experiment_id = complete_binding_workflow(
antibody_variants,
target_id="tgt_pdl1_human",
project_name="antibody_affinity_maturation"
)
```
### Optimization + Testing Pipeline
```python
# Combine computational optimization with experimental testing
def optimization_and_testing_pipeline(initial_sequences, experiment_type="expression"):
"""
Complete pipeline: optimize sequences computationally, then submit for testing
Args:
initial_sequences: Dictionary of {name: sequence}
experiment_type: Type of experiment
Returns:
Experiment ID for tracking
"""
print("=== Optimization and Testing Pipeline ===")
# Step 1: Computational optimization
print("\n1. Computational optimization...")
from protein_optimization import complete_optimization_pipeline
optimized = complete_optimization_pipeline(initial_sequences)
print(f"✓ Optimization complete")
print(f" Started with: {len(initial_sequences)} sequences")
print(f" Optimized to: {len(optimized)} sequences")
# Step 2: Select top candidates
print("\n2. Selecting top candidates for testing...")
top_candidates = optimized[:50] # Top 50
sequences_to_test = {
seq_data['name']: seq_data['sequence']
for seq_data in top_candidates
}
# Step 3: Submit for experimental validation
print("\n3. Submitting to Adaptyv...")
metadata = {
"optimization_method": "computational_pipeline",
"initial_library_size": len(initial_sequences),
"computational_scores": [s['combined'] for s in top_candidates]
}
experiment = submit_batch_experiment(
sequences_to_test,
experiment_type=experiment_type,
metadata=metadata
)
print(f"✓ Pipeline complete")
print(f" Experiment ID: {experiment['experiment_id']}")
return experiment['experiment_id']
# Example (generate_random_sequence is a user-supplied placeholder)
initial_library = {
f"variant_{i}": generate_random_sequence()
for i in range(1000)
}
experiment_id = optimization_and_testing_pipeline(
initial_library,
experiment_type="expression"
)
```
### Batch Result Analysis
```python
def analyze_multiple_experiments(experiment_ids):
"""
Download and analyze results from multiple experiments
Args:
experiment_ids: List of experiment identifiers
Returns:
Combined DataFrame with all results
"""
all_results = []
for exp_id in experiment_ids:
print(f"Processing {exp_id}...")
# Download results
results = download_results(exp_id, output_dir=f"results/{exp_id}")
# Parse based on experiment type
exp_type = results.get('experiment_type', 'unknown')
if exp_type == 'binding':
df = parse_binding_results(results)
df['experiment_id'] = exp_id
all_results.append(df)
elif exp_type == 'expression':
df = parse_expression_results(results)
df['experiment_id'] = exp_id
all_results.append(df)
# Combine all results
combined_df = pd.concat(all_results, ignore_index=True)
print(f"\n✓ Analysis complete")
print(f" Total experiments: {len(experiment_ids)}")
print(f" Total sequences: {len(combined_df)}")
return combined_df
# Example
experiment_ids = [
"exp_round1_abc",
"exp_round2_def",
"exp_round3_ghi"
]
all_data = analyze_multiple_experiments(experiment_ids)
all_data.to_csv("combined_results.csv", index=False)
```
## Error Handling
### Robust API Wrapper
```python
import time
from requests.exceptions import RequestException, HTTPError
def api_request_with_retry(method, url, max_retries=3, backoff_factor=2, **kwargs):
"""
Make API request with retry logic and error handling
Args:
method: HTTP method (GET, POST, etc.)
url: Request URL
max_retries: Maximum number of retry attempts
backoff_factor: Exponential backoff multiplier
**kwargs: Additional arguments for requests
Returns:
Response object
Raises:
RequestException: If all retries fail
"""
for attempt in range(max_retries):
try:
response = requests.request(method, url, **kwargs)
response.raise_for_status()
return response
except HTTPError as e:
            if e.response.status_code == 429:  # Rate limit
                # Honor Retry-After when present, else fall back to backoff
                wait_time = int(e.response.headers.get("Retry-After", backoff_factor ** attempt))
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
continue
elif e.response.status_code >= 500: # Server error
if attempt < max_retries - 1:
wait_time = backoff_factor ** attempt
print(f"Server error. Retrying in {wait_time}s...")
time.sleep(wait_time)
continue
else:
raise
else: # Client error (4xx) - don't retry
error_data = e.response.json() if e.response.content else {}
print(f"API Error: {error_data.get('error', {}).get('message', str(e))}")
raise
except RequestException as e:
if attempt < max_retries - 1:
wait_time = backoff_factor ** attempt
print(f"Request failed. Retrying in {wait_time}s...")
time.sleep(wait_time)
continue
else:
raise
raise RequestException(f"Failed after {max_retries} attempts")
# Example usage
response = api_request_with_retry(
"POST",
f"{BASE_URL}/experiments",
headers=HEADERS,
json={"sequences": fasta_content, "experiment_type": "binding"}
)
```
## Utility Functions
### Validate FASTA Format
```python
def validate_fasta(fasta_string):
"""
Validate FASTA format and sequences
Args:
fasta_string: FASTA-formatted string
Returns:
Tuple of (is_valid, error_message)
"""
lines = fasta_string.strip().split('\n')
if not lines:
return False, "Empty FASTA content"
if not lines[0].startswith('>'):
return False, "FASTA must start with header line (>)"
valid_amino_acids = set("ACDEFGHIKLMNPQRSTVWY")
current_header = None
for i, line in enumerate(lines):
if line.startswith('>'):
if not line[1:].strip():
return False, f"Line {i+1}: Empty header"
current_header = line[1:].strip()
else:
if current_header is None:
return False, f"Line {i+1}: Sequence before header"
sequence = line.strip().upper()
invalid = set(sequence) - valid_amino_acids
if invalid:
return False, f"Line {i+1}: Invalid amino acids: {invalid}"
return True, None
# Example
fasta = ">protein1\nMKVLWAALLG\n>protein2\nMATGVLWALG"
is_valid, error = validate_fasta(fasta)
if is_valid:
print("✓ FASTA format valid")
else:
print(f"✗ FASTA validation failed: {error}")
```
### Format Sequences to FASTA
```python
def sequences_to_fasta(sequences_dict):
"""
Convert dictionary of sequences to FASTA format
Args:
sequences_dict: Dictionary of {name: sequence}
Returns:
FASTA-formatted string
"""
fasta_content = ""
for name, sequence in sequences_dict.items():
# Clean sequence (remove whitespace, ensure uppercase)
clean_seq = ''.join(sequence.split()).upper()
# Validate
is_valid, error = validate_fasta(f">{name}\n{clean_seq}")
if not is_valid:
raise ValueError(f"Invalid sequence '{name}': {error}")
fasta_content += f">{name}\n{clean_seq}\n"
return fasta_content
# Example
sequences = {
"var1": "MKVLWAALLG",
"var2": "MATGVLWALG"
}
fasta = sequences_to_fasta(sequences)
print(fasta)
```


@@ -0,0 +1,360 @@
# Experiment Types and Workflows
## Overview
Adaptyv provides multiple experimental assay types for comprehensive protein characterization. Each experiment type has specific applications, workflows, and data outputs.
## Binding Assays
### Description
Measure protein-target interactions using biolayer interferometry (BLI), a label-free technique that monitors biomolecular binding in real-time.
### Use Cases
- Antibody-antigen binding characterization
- Receptor-ligand interaction analysis
- Protein-protein interaction studies
- Affinity maturation screening
- Epitope binning experiments
### Technology: Biolayer Interferometry (BLI)
BLI measures the interference pattern of reflected light from two surfaces:
- **Reference layer** - Biosensor tip surface
- **Biological layer** - Accumulated bound molecules
As molecules bind, the optical thickness increases, causing a wavelength shift proportional to the amount of bound material.
**Advantages:**
- Label-free detection
- Real-time kinetics
- High-throughput compatible
- Works in crude samples
- Minimal sample consumption
### Measured Parameters
**Kinetic constants:**
- **KD** - Equilibrium dissociation constant (binding affinity)
- **kon** - Association rate constant (binding speed)
- **koff** - Dissociation rate constant (unbinding speed)
**Typical ranges:**
- Strong binders: KD < 1 nM
- Moderate binders: KD = 1-100 nM
- Weak binders: KD > 100 nM
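These thresholds translate directly into a small helper (a convenience sketch; KD in molar units):
```python
def classify_binder(kd_molar):
    """Bucket a KD value using the ranges above."""
    kd_nm = kd_molar * 1e9
    if kd_nm < 1:
        return "strong"
    if kd_nm <= 100:
        return "moderate"
    return "weak"

print(classify_binder(2.5e-9))  # moderate (2.5 nM)
```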
### Workflow
1. **Sequence submission** - Provide protein sequences in FASTA format
2. **Expression** - Proteins expressed in appropriate host system
3. **Purification** - Automated purification protocols
4. **BLI assay** - Real-time binding measurements against specified targets
5. **Analysis** - Kinetic curve fitting and quality assessment
6. **Results delivery** - Binding parameters with confidence metrics
### Sample Requirements
- Protein sequence (standard amino acid codes)
- Target specification (from catalog or custom request)
- Buffer conditions (standard or custom)
- Expected concentration range (optional, improves assay design)
### Results Format
```json
{
"sequence_id": "antibody_variant_1",
"target": "Human PD-L1",
"measurements": {
"kd": 2.5e-9,
"kd_error": 0.3e-9,
"kon": 1.8e5,
"kon_error": 0.2e5,
"koff": 4.5e-4,
"koff_error": 0.5e-4
},
"quality_metrics": {
"confidence": "high|medium|low",
"r_squared": 0.97,
"chi_squared": 0.02,
"flags": []
},
"raw_data_url": "https://..."
}
```
## Expression Testing
### Description
Quantify protein expression levels in various host systems to assess manufacturability and optimize sequences for production.
### Use Cases
- Screening variants for high expression
- Optimizing codon usage
- Identifying expression bottlenecks
- Selecting candidates for scale-up
- Comparing expression systems
### Host Systems
Available expression platforms:
- **E. coli** - Rapid, cost-effective, prokaryotic system
- **Mammalian cells** - Native post-translational modifications
- **Yeast** - Eukaryotic system with simpler growth requirements
- **Insect cells** - Alternative eukaryotic platform
### Measured Parameters
- **Total protein yield** (mg/L culture)
- **Soluble fraction** (percentage)
- **Purity** (after initial purification)
- **Expression time course** (optional)
### Workflow
1. **Sequence submission** - Provide protein sequences
2. **Construct generation** - Cloning into expression vectors
3. **Expression** - Culture in specified host system
4. **Quantification** - Protein measurement via multiple methods
5. **Analysis** - Expression level comparison and ranking
6. **Results delivery** - Yield data and recommendations
### Results Format
```json
{
"sequence_id": "variant_1",
"host_system": "E. coli",
"measurements": {
"total_yield_mg_per_l": 25.5,
"soluble_fraction_percent": 78,
"purity_percent": 92
},
"ranking": {
"percentile": 85,
"notes": "High expression, good solubility"
}
}
```
## Thermostability Testing
### Description
Measure protein thermal stability to assess structural integrity, predict shelf-life, and identify stabilizing mutations.
### Use Cases
- Selecting thermally stable variants
- Formulation development
- Shelf-life prediction
- Stability-driven protein engineering
- Quality control screening
### Measurement Techniques
**Differential Scanning Fluorimetry (DSF):**
- Monitors protein unfolding via fluorescent dye binding
- Determines melting temperature (Tm)
- High-throughput capable
**Circular Dichroism (CD):**
- Secondary structure analysis
- Thermal unfolding curves
- Reversibility assessment
### Measured Parameters
- **Tm** - Melting temperature (midpoint of unfolding)
- **ΔH** - Enthalpy of unfolding
- **Aggregation temperature** (Tagg)
- **Reversibility** - Refolding after heating
### Workflow
1. **Sequence submission** - Provide protein sequences
2. **Expression and purification** - Standard protocols
3. **Thermostability assay** - Temperature gradient analysis
4. **Data analysis** - Curve fitting and parameter extraction
5. **Results delivery** - Stability metrics with ranking
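The curve fitting in step 4 typically assumes a two-state unfolding transition; a minimal sketch on synthetic melt-curve data (`scipy` assumed, not part of the Adaptyv deliverable):
```python
import numpy as np
from scipy.optimize import curve_fit

def boltzmann(t, tm, slope):
    """Two-state unfolding: fraction unfolded vs. temperature."""
    return 1.0 / (1.0 + np.exp((tm - t) / slope))

temps = np.linspace(40, 90, 26)                   # °C
signal = boltzmann(temps, 68.5, 2.0)              # synthetic melt curve
signal += np.random.normal(0, 0.02, temps.size)   # add measurement noise

(tm_fit, slope_fit), _ = curve_fit(boltzmann, temps, signal, p0=[65, 2])
print(f"Fitted Tm: {tm_fit:.1f} °C")
```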
### Results Format
```json
{
"sequence_id": "variant_1",
"measurements": {
"tm_celsius": 68.5,
"tm_error": 0.5,
"tagg_celsius": 72.0,
"reversibility_percent": 85
},
"quality_metrics": {
"curve_quality": "excellent",
"cooperativity": "two-state"
}
}
```
## Enzyme Activity Assays
### Description
Measure enzymatic function including substrate turnover, catalytic efficiency, and inhibitor sensitivity.
### Use Cases
- Screening enzyme variants for improved activity
- Substrate specificity profiling
- Inhibitor testing
- pH and temperature optimization
- Mechanistic studies
### Assay Types
**Continuous assays:**
- Chromogenic substrates
- Fluorogenic substrates
- Real-time monitoring
**Endpoint assays:**
- HPLC quantification
- Mass spectrometry
- Colorimetric detection
### Measured Parameters
**Kinetic parameters:**
- **kcat** - Turnover number (catalytic rate constant)
- **KM** - Michaelis constant (substrate affinity)
- **kcat/KM** - Catalytic efficiency
- **IC50** - Inhibitor concentration for 50% inhibition
**Activity metrics:**
- Specific activity (units/mg protein)
- Relative activity vs. reference
- Substrate specificity profile
### Workflow
1. **Sequence submission** - Provide enzyme sequences
2. **Expression and purification** - Optimized for activity retention
3. **Activity assay** - Substrate turnover measurements
4. **Kinetic analysis** - Michaelis-Menten fitting
5. **Results delivery** - Kinetic parameters and rankings
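Step 4 fits the Michaelis-Menten model to initial-rate data; a minimal sketch with synthetic data (`scipy` assumed):
```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    """Initial rate v as a function of substrate concentration [S]."""
    return vmax * s / (km + s)

substrate = np.array([5, 10, 25, 50, 100, 250, 500], dtype=float)  # µM
rates = michaelis_menten(substrate, vmax=12.0, km=45.0)            # synthetic data
rates *= 1 + np.random.normal(0, 0.03, substrate.size)             # add noise

(vmax_fit, km_fit), _ = curve_fit(michaelis_menten, substrate, rates, p0=[10, 50])
print(f"Vmax = {vmax_fit:.1f}, KM = {km_fit:.1f} µM")
```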
### Results Format
```json
{
"sequence_id": "enzyme_variant_1",
"substrate": "substrate_name",
"measurements": {
"kcat_per_second": 125,
"km_micromolar": 45,
"kcat_km": 2.8,
"specific_activity": 180
},
"quality_metrics": {
"confidence": "high",
"r_squared": 0.99
},
"ranking": {
"relative_activity": 1.8,
"improvement_vs_wildtype": "80%"
}
}
```
## Experiment Design Best Practices
### Sequence Submission
1. **Use clear identifiers** - Name sequences descriptively
2. **Include controls** - Submit wild-type or reference sequences
3. **Batch similar variants** - Group related sequences in single submission
4. **Validate sequences** - Check for errors before submission
### Sample Size
- **Pilot studies** - 5-10 sequences to test feasibility
- **Library screening** - 50-500 sequences for variant exploration
- **Focused optimization** - 10-50 sequences for fine-tuning
- **Large-scale campaigns** - 500+ sequences for ML-driven design
### Quality Control
Adaptyv includes automated QC steps:
- Expression verification before assay
- Replicate measurements for reliability
- Positive/negative controls in each batch
- Statistical validation of results
### Timeline Expectations
**Standard turnaround:** ~21 days from submission to results
**Timeline breakdown:**
- Construct generation: 3-5 days
- Expression: 5-7 days
- Purification: 2-3 days
- Assay execution: 3-5 days
- Analysis and QC: 2-3 days
**Factors affecting timeline:**
- Custom targets (add 1-2 weeks)
- Novel assay development (add 2-4 weeks)
- Large batch sizes (may add 1 week)
### Cost Optimization
1. **Batch submissions** - Lower per-sequence cost
2. **Standard targets** - Catalog antigens are faster/cheaper
3. **Standard conditions** - Custom buffers add cost
4. **Computational pre-filtering** - Submit only promising candidates
## Combining Experiment Types
For comprehensive protein characterization, combine multiple assays:
**Therapeutic antibody development:**
1. Binding assay → Identify high-affinity binders
2. Expression testing → Select manufacturable candidates
3. Thermostability → Ensure formulation stability
**Enzyme engineering:**
1. Activity assay → Screen for improved catalysis
2. Expression testing → Ensure producibility
3. Thermostability → Validate industrial robustness
**Sequential vs. Parallel:**
- **Sequential** - Use results from early assays to filter candidates
- **Parallel** - Run all assays simultaneously for faster results
## Data Integration
Results integrate with computational workflows:
1. **Download raw data** via API
2. **Parse results** into standardized format
3. **Feed into ML models** for next-round design
4. **Track experiments** with metadata tags
5. **Visualize trends** across design iterations
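For example, step 3 might join computational scores with measured yields before training the next-round model (column names here are hypothetical):
```python
import pandas as pd

# Hypothetical inputs: predictions from the optimization pipeline and
# parsed experimental results (see parse_expression_results in the code examples)
predicted = pd.DataFrame({
    "sequence_id": ["variant_1", "variant_2"],
    "combined_score": [0.82, 0.74],
})
measured = pd.DataFrame({
    "sequence_id": ["variant_1", "variant_2"],
    "yield_mg_per_l": [25.5, 14.2],
})

training_data = predicted.merge(measured, on="sequence_id")
print(training_data.corr(numeric_only=True))
```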
## Support and Troubleshooting
**Common issues:**
- Low expression → Consider sequence optimization (see protein_optimization.md)
- Poor binding → Verify target specification and expected range
- Variable results → Check sequence quality and controls
- Incomplete data → Contact support with experiment ID
**Getting help:**
- Email: support@adaptyvbio.com
- Include experiment ID and specific question
- Provide context (design goals, expected results)
- Response time: <24 hours for active experiments


@@ -0,0 +1,637 @@
# Protein Sequence Optimization
## Overview
Before submitting protein sequences for experimental testing, use computational tools to optimize sequences for improved expression, solubility, and stability. This pre-screening reduces experimental costs and increases success rates.
## Common Protein Expression Problems
### 1. Unpaired Cysteines
**Problem:**
- Unpaired cysteines form unwanted intermolecular disulfide bonds
- These cross-links cause aggregation and misfolding
- Expression yield and stability are reduced
**Solution:**
- Remove unpaired cysteines unless functionally necessary
- Pair cysteines appropriately for structural disulfides
- Replace with serine or alanine in non-critical positions
**Example:**
```python
# Check for an odd number of cysteines (a sign of unpaired residues)
def check_cysteines(sequence):
    cys_count = sequence.count('C')
    if cys_count % 2 != 0:
        print(f"Warning: odd number of cysteines ({cys_count}) - at least one is unpaired")
    return cys_count
```
### 2. Excessive Hydrophobicity
**Problem:**
- Long hydrophobic patches promote aggregation
- Exposed hydrophobic residues drive protein clumping
- Poor solubility in aqueous buffers
**Solution:**
- Maintain balanced hydropathy profiles
- Use short, flexible linkers between domains
- Reduce surface-exposed hydrophobic residues
**Metrics:**
- Kyte-Doolittle hydropathy plots
- GRAVY score (Grand Average of Hydropathy)
- pSAE (percent Solvent-Accessible hydrophobic residues)
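For a quick GRAVY check, Biopython's ProtParam module can be used directly (positive scores indicate net hydrophobicity):
```python
# Install: uv pip install biopython
from Bio.SeqUtils.ProtParam import ProteinAnalysis

sequence = "MKVLWAALLGLLGAAAD"
gravy_score = ProteinAnalysis(sequence).gravy()
print(f"GRAVY: {gravy_score:.3f}")  # > 0 means hydrophobic on average
```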
### 3. Low Solubility
**Problem:**
- Proteins precipitate during expression or purification
- Inclusion body formation
- Difficult downstream processing
**Solution:**
- Use solubility prediction tools for pre-screening
- Apply sequence optimization algorithms
- Add solubilizing tags if needed
## Computational Tools for Optimization
### NetSolP - Initial Solubility Screening
**Purpose:** Fast solubility prediction for filtering sequences.
**Method:** Machine learning model trained on E. coli expression data.
**Usage:**
```python
# Install: uv pip install requests
import requests
def predict_solubility_netsolp(sequence):
"""Predict protein solubility using NetSolP web service"""
url = "https://services.healthtech.dtu.dk/services/NetSolP-1.0/api/predict"
data = {
"sequence": sequence,
"format": "fasta"
}
response = requests.post(url, data=data)
return response.json()
# Example
sequence = "MKVLWAALLGLLGAAA..."
result = predict_solubility_netsolp(sequence)
print(f"Solubility score: {result['score']}")
```
**Interpretation:**
- Score > 0.5: Likely soluble
- Score < 0.5: Likely insoluble
- Use for initial filtering before more expensive predictions
**When to use:**
- First-pass filtering of large libraries
- Quick validation of designed sequences
- Prioritizing sequences for experimental testing
### SoluProt - Comprehensive Solubility Prediction
**Purpose:** Advanced solubility prediction with higher accuracy.
**Method:** Deep learning model incorporating sequence and structural features.
**Usage:**
```python
# Install: uv pip install soluprot
from soluprot import predict_solubility
def screen_variants_soluprot(sequences):
"""Screen multiple sequences for solubility"""
results = []
for name, seq in sequences.items():
score = predict_solubility(seq)
results.append({
'name': name,
'sequence': seq,
'solubility_score': score,
'predicted_soluble': score > 0.6
})
return results
# Example
sequences = {
'variant_1': 'MKVLW...',
'variant_2': 'MATGV...'
}
results = screen_variants_soluprot(sequences)
soluble_variants = [r for r in results if r['predicted_soluble']]
```
**Interpretation:**
- Score > 0.6: High solubility confidence
- Score 0.4-0.6: Uncertain, may need optimization
- Score < 0.4: Likely problematic
**When to use:**
- After initial NetSolP filtering
- When higher prediction accuracy is needed
- Before committing to expensive synthesis/testing
### SolubleMPNN - Sequence Redesign
**Purpose:** Redesign protein sequences to improve solubility while maintaining function.
**Method:** Graph neural network that suggests mutations to increase solubility.
**Usage:**
```python
# Install: uv pip install soluble-mpnn
from soluble_mpnn import optimize_sequence
def optimize_for_solubility(sequence, structure_pdb=None):
"""
Redesign sequence for improved solubility
Args:
sequence: Original amino acid sequence
structure_pdb: Optional PDB file for structure-aware design
Returns:
Optimized sequence variants ranked by predicted solubility
"""
variants = optimize_sequence(
sequence=sequence,
structure=structure_pdb,
num_variants=10,
temperature=0.1 # Lower = more conservative mutations
)
return variants
# Example
original_seq = "MKVLWAALLGLLGAAA..."
optimized_variants = optimize_for_solubility(original_seq)
for i, variant in enumerate(optimized_variants):
print(f"Variant {i+1}:")
print(f" Sequence: {variant['sequence']}")
print(f" Solubility score: {variant['solubility_score']}")
print(f" Mutations: {variant['mutations']}")
```
**Design strategy:**
- **Conservative** (temperature=0.1): Minimal changes, safer
- **Moderate** (temperature=0.3): Balance between change and safety
- **Aggressive** (temperature=0.5): More mutations, higher risk
**When to use:**
- Primary tool for sequence optimization
- Default starting point for improving problematic sequences
- Generating diverse soluble variants
**Best practices:**
- Generate 10-50 variants per sequence
- Use structure information when available (improves accuracy)
- Validate key functional residues are preserved
- Test multiple temperature settings
### ESM (Evolutionary Scale Modeling) - Sequence Likelihood
**Purpose:** Assess how "natural" a protein sequence appears based on evolutionary patterns.
**Method:** Protein language model trained on millions of natural sequences.
**Usage:**
```python
# Install: uv pip install fair-esm
import torch
from esm import pretrained
def score_sequence_esm(sequence):
"""
Calculate ESM likelihood score for sequence
Higher scores indicate more natural/stable sequences
"""
    model, alphabet = pretrained.esm2_t33_650M_UR50D()  # in practice, load once and reuse
    batch_converter = alphabet.get_batch_converter()
    data = [("protein", sequence)]
    _, _, batch_tokens = batch_converter(data)
    with torch.no_grad():
        results = model(batch_tokens, repr_layers=[33])
    token_logprobs = results["logits"].log_softmax(dim=-1)
    # Mean log-probability of the actual residues (skip BOS/EOS tokens)
    per_token = token_logprobs.gather(-1, batch_tokens.unsqueeze(-1)).squeeze(-1)
    sequence_score = per_token[0, 1:-1].mean().item()
    return sequence_score
# Example - Compare variants
sequences = {
'original': 'MKVLW...',
'optimized_1': 'MKVLS...',
'optimized_2': 'MKVLA...'
}
for name, seq in sequences.items():
score = score_sequence_esm(seq)
print(f"{name}: ESM score = {score:.3f}")
```
**Interpretation:**
- Higher scores → More "natural" sequence
- Use to avoid unlikely mutations
- Balance with functional requirements
**When to use:**
- Filtering synthetic designs
- Comparing SolubleMPNN variants
- Ensuring sequences aren't too artificial
- Avoiding expression bottlenecks
**Integration with design:**
```python
def rank_variants_by_esm(variants):
"""Rank protein variants by ESM likelihood"""
scored = []
for v in variants:
esm_score = score_sequence_esm(v['sequence'])
v['esm_score'] = esm_score
scored.append(v)
    # Sort by a weighted combination (ESM log-probabilities are negative,
    # so multiplying the two scores would invert the ranking)
    scored.sort(
        key=lambda x: 0.7 * x['solubility_score'] + 0.3 * x['esm_score'],
        reverse=True
    )
    return scored
```
### ipTM - Interface Stability (AlphaFold-Multimer)
**Purpose:** Assess protein-protein interface stability and binding confidence.
**Method:** Interface predicted TM-score from AlphaFold-Multimer predictions.
**Usage:**
```python
# Requires AlphaFold-Multimer installation
# Or use ColabFold for easier access
def predict_interface_stability(protein_a_seq, protein_b_seq):
"""
Predict interface stability using AlphaFold-Multimer
Returns ipTM score: higher = more stable interface
"""
from colabfold import run_alphafold_multimer
sequences = {
'chainA': protein_a_seq,
'chainB': protein_b_seq
}
result = run_alphafold_multimer(sequences)
return {
'ipTM': result['iptm'],
'pTM': result['ptm'],
'pLDDT': result['plddt']
}
# Example for antibody-antigen binding
antibody_seq = "EVQLVESGGGLVQPGG..."
antigen_seq = "MKVLWAALLGLLGAAA..."
stability = predict_interface_stability(antibody_seq, antigen_seq)
print(f"Interface ipTM: {stability['ipTM']:.3f}")
# Interpretation
if stability['ipTM'] > 0.7:
print("High confidence interface")
elif stability['ipTM'] > 0.5:
print("Moderate confidence interface")
else:
print("Low confidence interface - may need redesign")
```
**Interpretation:**
- ipTM > 0.7: Strong predicted interface
- ipTM 0.5-0.7: Moderate interface confidence
- ipTM < 0.5: Weak interface, consider redesign
**When to use:**
- Antibody-antigen design
- Protein-protein interaction engineering
- Validating binding interfaces
- Comparing interface variants
### pSAE - Solvent-Accessible Hydrophobic Residues
**Purpose:** Quantify exposed hydrophobic residues that promote aggregation.
**Method:** Calculates percentage of solvent-accessible surface area (SASA) occupied by hydrophobic residues.
**Usage:**
```python
# Requires structure (PDB file or AlphaFold prediction)
# Install: uv pip install biopython
from Bio.PDB import PDBParser, DSSP
import numpy as np
def calculate_psae(pdb_file):
"""
Calculate percent Solvent-Accessible hydrophobic residues (pSAE)
Lower pSAE = better solubility
"""
parser = PDBParser(QUIET=True)
structure = parser.get_structure('protein', pdb_file)
# Run DSSP to get solvent accessibility
model = structure[0]
dssp = DSSP(model, pdb_file, acc_array='Wilke')
    # DSSP returns one-letter amino acid codes and relative accessibility
    hydrophobic = set("AVILMFWP")
    total_sasa = 0.0
    hydrophobic_sasa = 0.0
    for key in dssp.keys():
        aa = dssp[key][1]                 # one-letter amino acid code
        rel_accessibility = dssp[key][3]  # relative solvent accessibility
        if rel_accessibility == 'NA':     # DSSP could not compute it
            continue
        total_sasa += rel_accessibility
        if aa in hydrophobic:
            hydrophobic_sasa += rel_accessibility
    psae = (hydrophobic_sasa / total_sasa) * 100
    return psae
# Example
pdb_file = "protein_structure.pdb"
psae_score = calculate_psae(pdb_file)
print(f"pSAE: {psae_score:.2f}%")
# Interpretation
if psae_score < 25:
print("Good solubility expected")
elif psae_score < 35:
print("Moderate solubility")
else:
print("High aggregation risk")
```
**Interpretation:**
- pSAE < 25%: Low aggregation risk
- pSAE 25-35%: Moderate risk
- pSAE > 35%: High aggregation risk
**When to use:**
- Analyzing designed structures
- Post-AlphaFold validation
- Identifying aggregation hotspots
- Guiding surface mutations
## Recommended Optimization Workflow
### Step 1: Initial Screening (Fast)
```python
def initial_screening(sequences):
"""
Quick first-pass filtering using NetSolP
Filters out obviously problematic sequences
"""
passed = []
for name, seq in sequences.items():
netsolp_score = predict_solubility_netsolp(seq)
if netsolp_score > 0.5:
passed.append((name, seq))
return passed
```
### Step 2: Detailed Assessment (Moderate)
```python
def detailed_assessment(filtered_sequences):
"""
More thorough analysis with SoluProt and ESM
Ranks sequences by multiple criteria
"""
results = []
for name, seq in filtered_sequences:
soluprot_score = predict_solubility(seq)
esm_score = score_sequence_esm(seq)
combined_score = soluprot_score * 0.7 + esm_score * 0.3
results.append({
'name': name,
'sequence': seq,
'soluprot': soluprot_score,
'esm': esm_score,
'combined': combined_score
})
results.sort(key=lambda x: x['combined'], reverse=True)
return results
```
### Step 3: Sequence Optimization (If needed)
```python
def optimize_problematic_sequences(sequences_needing_optimization):
"""
Use SolubleMPNN to redesign problematic sequences
Returns improved variants
"""
optimized = []
for name, seq in sequences_needing_optimization:
# Generate multiple variants
variants = optimize_sequence(
sequence=seq,
num_variants=10,
temperature=0.2
)
# Score variants with ESM
for variant in variants:
variant['esm_score'] = score_sequence_esm(variant['sequence'])
        # Keep best variants (weighted sum; ESM log-probabilities are negative)
        variants.sort(
            key=lambda x: 0.7 * x['solubility_score'] + 0.3 * x['esm_score'],
            reverse=True
        )
optimized.extend(variants[:3]) # Top 3 variants per sequence
return optimized
```
### Step 4: Structure-Based Validation (For critical sequences)
```python
def structure_validation(top_candidates):
"""
Predict structures and calculate pSAE for top candidates
Final validation before experimental testing
"""
validated = []
for candidate in top_candidates:
        # Predict structure with AlphaFold (predict_structure_alphafold is a user-supplied helper)
        structure_pdb = predict_structure_alphafold(candidate['sequence'])
# Calculate pSAE
psae = calculate_psae(structure_pdb)
candidate['psae'] = psae
candidate['pass_structure_check'] = psae < 30
validated.append(candidate)
return validated
```
### Complete Workflow Example
```python
def complete_optimization_pipeline(initial_sequences):
"""
End-to-end optimization pipeline
Input: Dictionary of {name: sequence}
Output: Ranked list of optimized, validated sequences
"""
print("Step 1: Initial screening with NetSolP...")
filtered = initial_screening(initial_sequences)
print(f" Passed: {len(filtered)}/{len(initial_sequences)}")
print("Step 2: Detailed assessment with SoluProt and ESM...")
assessed = detailed_assessment(filtered)
# Split into good and needs-optimization
good_sequences = [s for s in assessed if s['soluprot'] > 0.6]
needs_optimization = [s for s in assessed if s['soluprot'] <= 0.6]
print(f" Good sequences: {len(good_sequences)}")
print(f" Need optimization: {len(needs_optimization)}")
if needs_optimization:
print("Step 3: Optimizing problematic sequences with SolubleMPNN...")
optimized = optimize_problematic_sequences(needs_optimization)
all_sequences = good_sequences + optimized
else:
all_sequences = good_sequences
print("Step 4: Structure-based validation for top candidates...")
top_20 = all_sequences[:20]
final_validated = structure_validation(top_20)
# Final ranking
final_validated.sort(
key=lambda x: (
x['pass_structure_check'],
x['combined'],
-x['psae']
),
reverse=True
)
return final_validated
# Usage
initial_library = {
'variant_1': 'MKVLWAALLGLLGAAA...',
'variant_2': 'MATGVLWAALLGLLGA...',
# ... more sequences
}
optimized_library = complete_optimization_pipeline(initial_library)
# Submit top sequences to Adaptyv
top_sequences_for_testing = optimized_library[:50]
```
## Best Practices Summary
1. **Always pre-screen** before experimental testing
2. **Use NetSolP first** for fast filtering of large libraries
3. **Apply SolubleMPNN** as default optimization tool
4. **Validate with ESM** to avoid unnatural sequences
5. **Calculate pSAE** for structure-based validation
6. **Test multiple variants** per design to account for prediction uncertainty
7. **Keep controls** - include wild-type or known-good sequences
8. **Iterate** - use experimental results to refine predictions
## Integration with Adaptyv
After computational optimization, submit sequences to Adaptyv:
```python
# After optimization pipeline
optimized_sequences = complete_optimization_pipeline(initial_library)
# Prepare FASTA format
fasta_content = ""
for seq_data in optimized_sequences[:50]: # Top 50
fasta_content += f">{seq_data['name']}\n{seq_data['sequence']}\n"
# Submit to Adaptyv
import requests
response = requests.post(
"https://kq5jp7qj7wdqklhsxmovkzn4l40obksv.lambda-url.eu-central-1.on.aws/experiments",
headers={"Authorization": f"Bearer {api_key}"},
json={
"sequences": fasta_content,
"experiment_type": "expression",
"metadata": {
"optimization_method": "SolubleMPNN_ESM_pipeline",
"computational_scores": [s['combined'] for s in optimized_sequences[:50]]
}
}
)
```
## Troubleshooting
**Issue: All sequences score poorly on solubility predictions**
- Check if sequences contain unusual amino acids
- Verify FASTA format is correct
- Consider if protein family is naturally low-solubility
- May need experimental validation despite predictions
**Issue: SolubleMPNN changes functionally important residues**
- Provide structure file to preserve spatial constraints
- Mask critical residues from mutation
- Lower temperature parameter for conservative changes
- Manually revert problematic mutations
**Issue: ESM scores are low after optimization**
- Optimization may be too aggressive
- Try lower temperature in SolubleMPNN
- Balance between solubility and naturalness
- Consider that some optimization may require non-natural mutations
**Issue: Predictions don't match experimental results**
- Predictions are probabilistic, not deterministic
- Host system and conditions affect expression
- Some proteins may need experimental validation
- Use predictions as enrichment, not absolute filters