Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/cobrapy/SKILL.md
+++ b/skills/cobrapy/SKILL.md
@@ -0,0 +1,457 @@
+---
+name: cobrapy
+description: "Constraint-based metabolic modeling (COBRA). FBA, FVA, gene knockouts, flux sampling, SBML models, for systems biology and metabolic engineering analysis."
+---
+
+# COBRApy - Constraint-Based Reconstruction and Analysis
+
+## Overview
+
+COBRApy is a Python library for constraint-based reconstruction and analysis (COBRA) of metabolic models, essential for systems biology research. Work with genome-scale metabolic models, perform computational simulations of cellular metabolism, conduct metabolic engineering analyses, and predict phenotypic behaviors.
+
+## Core Capabilities
+
+COBRApy provides comprehensive tools organized into several key areas:
+
+### 1. Model Management
+
+Load existing models from repositories or files:
+```python
+from cobra.io import load_model
+
+# Load bundled test models
+model = load_model("textbook")  # E. coli core model
+model = load_model("ecoli")     # Full E. coli model
+model = load_model("salmonella")
+
+# Load from files
+from cobra.io import read_sbml_model, load_json_model, load_yaml_model
+model = read_sbml_model("path/to/model.xml")
+model = load_json_model("path/to/model.json")
+model = load_yaml_model("path/to/model.yml")
+```
+
+Save models in various formats:
+```python
+from cobra.io import write_sbml_model, save_json_model, save_yaml_model
+write_sbml_model(model, "output.xml")  # Preferred format
+save_json_model(model, "output.json")  # For Escher compatibility
+save_yaml_model(model, "output.yml")   # Human-readable
+```
+
+### 2. Model Structure and Components
+
+Access and inspect model components:
+```python
+# Access components
+model.reactions      # DictList of all reactions
+model.metabolites    # DictList of all metabolites
+model.genes          # DictList of all genes
+
+# Get specific items by ID or index
+reaction = model.reactions.get_by_id("PFK")
+metabolite = model.metabolites[0]
+
+# Inspect properties
+print(reaction.reaction)        # Stoichiometric equation
+print(reaction.bounds)          # Flux constraints
+print(reaction.gene_reaction_rule)  # GPR logic
+print(metabolite.formula)       # Chemical formula
+print(metabolite.compartment)   # Cellular location
+```
+
+### 3. Flux Balance Analysis (FBA)
+
+Perform standard FBA simulation:
+```python
+# Basic optimization
+solution = model.optimize()
+print(f"Objective value: {solution.objective_value}")
+print(f"Status: {solution.status}")
+
+# Access fluxes
+print(solution.fluxes["PFK"])
+print(solution.fluxes.head())
+
+# Fast optimization (objective value only)
+objective_value = model.slim_optimize()
+
+# Change objective
+model.objective = "ATPM"
+solution = model.optimize()
+```
+
+Parsimonious FBA (minimize total flux):
+```python
+from cobra.flux_analysis import pfba
+solution = pfba(model)
+```
+
+Geometric FBA (find central solution):
+```python
+from cobra.flux_analysis import geometric_fba
+solution = geometric_fba(model)
+```
+
+### 4. Flux Variability Analysis (FVA)
+
+Determine flux ranges for all reactions:
+```python
+from cobra.flux_analysis import flux_variability_analysis
+
+# Standard FVA
+fva_result = flux_variability_analysis(model)
+
+# FVA at 90% optimality
+fva_result = flux_variability_analysis(model, fraction_of_optimum=0.9)
+
+# Loopless FVA (eliminates thermodynamically infeasible loops)
+fva_result = flux_variability_analysis(model, loopless=True)
+
+# FVA for specific reactions
+fva_result = flux_variability_analysis(
+    model,
+    reaction_list=["PFK", "FBA", "PGI"]
+)
+```
+
+### 5. Gene and Reaction Deletion Studies
+
+Perform knockout analyses:
+```python
+from cobra.flux_analysis import (
+    single_gene_deletion,
+    single_reaction_deletion,
+    double_gene_deletion,
+    double_reaction_deletion
+)
+
+# Single deletions
+gene_results = single_gene_deletion(model)
+reaction_results = single_reaction_deletion(model)
+
+# Double deletions (uses multiprocessing)
+double_gene_results = double_gene_deletion(
+    model,
+    processes=4  # Number of CPU cores
+)
+
+# Manual knockout using context manager
+with model:
+    model.genes.get_by_id("b0008").knock_out()
+    solution = model.optimize()
+    print(f"Growth after knockout: {solution.objective_value}")
+# Model automatically reverts after context exit
+```
+
+### 6. Growth Media and Minimal Media
+
+Manage growth medium:
+```python
+# View current medium
+print(model.medium)
+
+# Modify medium (must reassign entire dict)
+medium = model.medium
+medium["EX_glc__D_e"] = 10.0  # Set glucose uptake
+medium["EX_o2_e"] = 0.0       # Anaerobic conditions
+model.medium = medium
+
+# Calculate minimal media
+from cobra.medium import minimal_medium
+
+# Minimize total import flux
+min_medium = minimal_medium(model, minimize_components=False)
+
+# Minimize number of components (uses MILP, slower)
+min_medium = minimal_medium(
+    model,
+    minimize_components=True,
+    open_exchanges=True
+)
+```
+
+### 7. Flux Sampling
+
+Sample the feasible flux space:
+```python
+from cobra.sampling import sample
+
+# Sample using OptGP (default, supports parallel processing)
+samples = sample(model, n=1000, method="optgp", processes=4)
+
+# Sample using ACHR
+samples = sample(model, n=1000, method="achr")
+
+# Validate samples
+from cobra.sampling import OptGPSampler
+sampler = OptGPSampler(model, processes=4)
+sampler.sample(1000)
+validation = sampler.validate(sampler.samples)
+print(validation.value_counts())  # Should be all 'v' for valid
+```
+
+### 8. Production Envelopes
+
+Calculate phenotype phase planes:
+```python
+from cobra.flux_analysis import production_envelope
+
+# Standard production envelope
+envelope = production_envelope(
+    model,
+    reactions=["EX_glc__D_e", "EX_o2_e"],
+    objective="EX_ac_e"  # Acetate production
+)
+
+# With carbon yield
+envelope = production_envelope(
+    model,
+    reactions=["EX_glc__D_e", "EX_o2_e"],
+    carbon_sources="EX_glc__D_e"
+)
+
+# Visualize (use matplotlib or pandas plotting)
+import matplotlib.pyplot as plt
+envelope.plot(x="EX_glc__D_e", y="EX_o2_e", kind="scatter")
+plt.show()
+```
+
+### 9. Gapfilling
+
+Add reactions to make models feasible:
+```python
+from cobra.flux_analysis import gapfill
+
+# Prepare universal model with candidate reactions
+universal = load_model("universal")
+
+# Perform gapfilling
+with model:
+    # Remove reactions to create gaps for demonstration
+    model.remove_reactions([model.reactions.PGI])
+
+    # Find reactions needed
+    solution = gapfill(model, universal)
+    print(f"Reactions to add: {solution}")
+```
+
+### 10. Model Building
+
+Build models from scratch:
+```python
+from cobra import Model, Reaction, Metabolite
+
+# Create model
+model = Model("my_model")
+
+# Create metabolites
+atp_c = Metabolite("atp_c", formula="C10H12N5O13P3",
+                   name="ATP", compartment="c")
+adp_c = Metabolite("adp_c", formula="C10H12N5O10P2",
+                   name="ADP", compartment="c")
+pi_c = Metabolite("pi_c", formula="HO4P",
+                  name="Phosphate", compartment="c")
+
+# Create reaction
+reaction = Reaction("ATPASE")
+reaction.name = "ATP hydrolysis"
+reaction.subsystem = "Energy"
+reaction.lower_bound = 0.0
+reaction.upper_bound = 1000.0
+
+# Add metabolites with stoichiometry
+reaction.add_metabolites({
+    atp_c: -1.0,
+    adp_c: 1.0,
+    pi_c: 1.0
+})
+
+# Add gene-reaction rule
+reaction.gene_reaction_rule = "(gene1 and gene2) or gene3"
+
+# Add to model
+model.add_reactions([reaction])
+
+# Add boundary reactions
+model.add_boundary(atp_c, type="exchange")
+model.add_boundary(adp_c, type="demand")
+
+# Set objective
+model.objective = "ATPASE"
+```
+
+## Common Workflows
+
+### Workflow 1: Load Model and Predict Growth
+
+```python
+from cobra.io import load_model
+
+# Load model
+model = load_model("ecoli")
+
+# Run FBA
+solution = model.optimize()
+print(f"Growth rate: {solution.objective_value:.3f} /h")
+
+# Show active pathways
+print(solution.fluxes[solution.fluxes.abs() > 1e-6])
+```
+
+### Workflow 2: Gene Knockout Screen
+
+```python
+from cobra.io import load_model
+from cobra.flux_analysis import single_gene_deletion
+
+# Load model
+model = load_model("ecoli")
+
+# Perform single gene deletions
+results = single_gene_deletion(model)
+
+# Find essential genes (growth < threshold)
+essential_genes = results[results["growth"] < 0.01]
+print(f"Found {len(essential_genes)} essential genes")
+
+# Find genes with minimal impact
+neutral_genes = results[results["growth"] > 0.9 * solution.objective_value]
+```
+
+### Workflow 3: Media Optimization
+
+```python
+from cobra.io import load_model
+from cobra.medium import minimal_medium
+
+# Load model
+model = load_model("ecoli")
+
+# Calculate minimal medium for 50% of max growth
+target_growth = model.slim_optimize() * 0.5
+min_medium = minimal_medium(
+    model,
+    target_growth,
+    minimize_components=True
+)
+
+print(f"Minimal medium components: {len(min_medium)}")
+print(min_medium)
+```
+
+### Workflow 4: Flux Uncertainty Analysis
+
+```python
+from cobra.io import load_model
+from cobra.flux_analysis import flux_variability_analysis
+from cobra.sampling import sample
+
+# Load model
+model = load_model("ecoli")
+
+# First check flux ranges at optimality
+fva = flux_variability_analysis(model, fraction_of_optimum=1.0)
+
+# For reactions with large ranges, sample to understand distribution
+samples = sample(model, n=1000)
+
+# Analyze specific reaction
+reaction_id = "PFK"
+import matplotlib.pyplot as plt
+samples[reaction_id].hist(bins=50)
+plt.xlabel(f"Flux through {reaction_id}")
+plt.ylabel("Frequency")
+plt.show()
+```
+
+### Workflow 5: Context Manager for Temporary Changes
+
+Use context managers to make temporary modifications:
+```python
+# Model remains unchanged outside context
+with model:
+    # Temporarily change objective
+    model.objective = "ATPM"
+
+    # Temporarily modify bounds
+    model.reactions.EX_glc__D_e.lower_bound = -5.0
+
+    # Temporarily knock out genes
+    model.genes.b0008.knock_out()
+
+    # Optimize with changes
+    solution = model.optimize()
+    print(f"Modified growth: {solution.objective_value}")
+
+# All changes automatically reverted
+solution = model.optimize()
+print(f"Original growth: {solution.objective_value}")
+```
+
+## Key Concepts
+
+### DictList Objects
+Models use `DictList` objects for reactions, metabolites, and genes - behaving like both lists and dictionaries:
+```python
+# Access by index
+first_reaction = model.reactions[0]
+
+# Access by ID
+pfk = model.reactions.get_by_id("PFK")
+
+# Query methods
+atp_reactions = model.reactions.query("atp")
+```
+
+### Flux Constraints
+Reaction bounds define feasible flux ranges:
+- **Irreversible**: `lower_bound = 0, upper_bound > 0`
+- **Reversible**: `lower_bound < 0, upper_bound > 0`
+- Set both bounds simultaneously with `.bounds` to avoid inconsistencies
+
+### Gene-Reaction Rules (GPR)
+Boolean logic linking genes to reactions:
+```python
+# AND logic (both required)
+reaction.gene_reaction_rule = "gene1 and gene2"
+
+# OR logic (either sufficient)
+reaction.gene_reaction_rule = "gene1 or gene2"
+
+# Complex logic
+reaction.gene_reaction_rule = "(gene1 and gene2) or (gene3 and gene4)"
+```
+
+### Exchange Reactions
+Special reactions representing metabolite import/export:
+- Named with prefix `EX_` by convention
+- Positive flux = secretion, negative flux = uptake
+- Managed through `model.medium` dictionary
+
+## Best Practices
+
+1. **Use context managers** for temporary modifications to avoid state management issues
+2. **Validate models** before analysis using `model.slim_optimize()` to ensure feasibility
+3. **Check solution status** after optimization - `optimal` indicates successful solve
+4. **Use loopless FVA** when thermodynamic feasibility matters
+5. **Set fraction_of_optimum** appropriately in FVA to explore suboptimal space
+6. **Parallelize** computationally expensive operations (sampling, double deletions)
+7. **Prefer SBML format** for model exchange and long-term storage
+8. **Use slim_optimize()** when only objective value needed for performance
+9. **Validate flux samples** to ensure numerical stability
+
+## Troubleshooting
+
+**Infeasible solutions**: Check medium constraints, reaction bounds, and model consistency
+**Slow optimization**: Try different solvers (GLPK, CPLEX, Gurobi) via `model.solver`
+**Unbounded solutions**: Verify exchange reactions have appropriate upper bounds
+**Import errors**: Ensure correct file format and valid SBML identifiers
+
+## References
+
+For detailed workflows and API patterns, refer to:
+- `references/workflows.md` - Comprehensive step-by-step workflow examples
+- `references/api_quick_reference.md` - Common function signatures and patterns
+
+Official documentation: https://cobrapy.readthedocs.io/en/latest/