Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/medchem/SKILL.md
+++ b/skills/medchem/SKILL.md
@@ -0,0 +1,400 @@
+---
+name: medchem
+description: "Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, and complexity metrics for compound prioritization and library filtering."
+---
+
+# Medchem
+
+## Overview
+
+Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of established and novel molecular filters, structural alerts, and medicinal chemistry rules for triaging and prioritizing compound libraries at scale. Rules and filters are context-specific: treat them as guidelines and combine them with domain expertise.
+
+## When to Use This Skill
+
+This skill should be used when:
+- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
+- Filtering molecules by structural alerts or PAINS patterns
+- Prioritizing compounds for lead optimization
+- Assessing compound quality and medicinal chemistry properties
+- Detecting reactive or problematic functional groups
+- Calculating molecular complexity metrics
+
+## Installation
+
+```bash
+uv pip install medchem
+```
+
+## Core Capabilities
+
+### 1. Medicinal Chemistry Rules
+
+Apply established drug-likeness rules to molecules using the `medchem.rules` module.
+
+**Available Rules:**
+- Rule of Five (Lipinski)
+- Rule of Oprea
+- Rule of CNS
+- Rule of leadlike (soft and strict)
+- Rule of three
+- Rule of Reos
+- Rule of drug
+- Rule of Veber
+- Golden triangle
+- PAINS filters
+
+**Single Rule Application:**
+
+```python
+import medchem as mc
+
+# Apply Rule of Five to a SMILES string
+smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
+passes = mc.rules.basic_rules.rule_of_five(smiles)
+# Returns: True
+
+# Check specific rules
+passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
+passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
+```
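+
+To make the checks concrete, the following is a minimal sketch of what the Rule of Five and the Veber rule test, using the commonly cited thresholds and descriptors supplied by hand. The `Descriptors` container and the function names are illustrative and are not part of medchem; the library computes descriptors from the molecule itself, and some implementations tolerate one violation.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Descriptors:
+    """Precomputed descriptors for one molecule (illustrative container)."""
+    mw: float          # molecular weight, Da
+    logp: float        # octanol-water partition coefficient
+    n_hbd: int         # hydrogen-bond donors
+    n_hba: int         # hydrogen-bond acceptors
+    tpsa: float        # topological polar surface area, A^2
+    n_rotatable: int   # rotatable bonds
+
+
+def passes_rule_of_five(d: Descriptors) -> bool:
+    # Lipinski: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10
+    return d.mw <= 500 and d.logp <= 5 and d.n_hbd <= 5 and d.n_hba <= 10
+
+
+def passes_veber(d: Descriptors) -> bool:
+    # Veber: rotatable bonds <= 10 and TPSA <= 140 A^2
+    return d.n_rotatable <= 10 and d.tpsa <= 140
+
+
+# Approximate published values for aspirin, entered by hand
+aspirin = Descriptors(mw=180.16, logp=1.2, n_hbd=1, n_hba=4, tpsa=63.6, n_rotatable=3)
+print(passes_rule_of_five(aspirin), passes_veber(aspirin))
+```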
+
+**Multiple Rules with RuleFilters:**
+
+```python
+import datamol as dm
+import medchem as mc
+
+# Load molecules
+mols = [dm.to_mol(smiles) for smiles in smiles_list]
+
+# Create filter with multiple rules
+rfilter = mc.rules.RuleFilters(
+    rule_list=[
+        "rule_of_five",
+        "rule_of_oprea",
+        "rule_of_cns",
+        "rule_of_leadlike_soft"
+    ]
+)
+
+# Apply filters with parallelization
+results = rfilter(
+    mols=mols,
+    n_jobs=-1,  # Use all CPU cores
+    progress=True
+)
+```
+
+**Result Format:**
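+Results are returned as a pandas DataFrame with one row per molecule and one boolean column per rule. Recent medchem releases also add `pass_all` and `pass_any` summary columns; inspect `results.columns` to confirm the names in the installed version.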
+
+### 2. Structural Alert Filters
+
+Detect potentially problematic structural patterns using the `medchem.structural` module.
+
+**Available Filters:**
+
+1. **Common Alerts** - General structural alerts derived from ChEMBL curation and literature
+2. **NIBR Filters** - Novartis Institutes for BioMedical Research filter set
+3. **Lilly Demerits** - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)
+
+**Common Alerts:**
+
+```python
+import datamol as dm
+import medchem as mc
+
+# Create filter
+alert_filter = mc.structural.CommonAlertsFilters()
+
+# Check single molecule
+mol = dm.to_mol("c1ccccc1")
+has_alerts, details = alert_filter.check_mol(mol)
+
+# Batch filtering with parallelization
+results = alert_filter(
+    mols=mol_list,
+    n_jobs=-1,
+    progress=True
+)
+```
+
+**NIBR Filters:**
+
+```python
+import medchem as mc
+
+# Apply NIBR filters
+nibr_filter = mc.structural.NIBRFilters()
+results = nibr_filter(mols=mol_list, n_jobs=-1)
+```
+
+**Lilly Demerits:**
+
+```python
+import medchem as mc
+
+# Calculate Lilly demerits
+lilly = mc.structural.LillyDemeritsFilters()
+results = lilly(mols=mol_list, n_jobs=-1)
+
+# Each result includes demerit score and whether it passes (≤100 demerits)
+```
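+
+The demerit logic described above can be sketched in plain Python: each matched alert contributes demerits, and a molecule is rejected once the total exceeds 100. The alert names and demerit values below are hypothetical placeholders, not entries from the Lilly rule set, and `score_demerits` is not a medchem function.
+
+```python
+from typing import Mapping, Tuple
+
+REJECT_ABOVE = 100  # molecules are rejected at >100 demerits
+
+
+def score_demerits(hits: Mapping[str, int]) -> Tuple[int, bool]:
+    """Sum demerits from matched alerts and report whether the molecule passes."""
+    total = sum(hits.values())
+    return total, total <= REJECT_ABOVE
+
+
+# Hypothetical alert hits for one molecule
+total, passes = score_demerits({"hypothetical_alert_a": 60, "hypothetical_alert_b": 50})
+print(total, passes)
+```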
+
+### 3. Functional API for High-Level Operations
+
+The `medchem.functional` module provides convenient functions for common workflows.
+
+**Quick Filtering:**
+
+```python
+import medchem as mc
+
+# Apply NIBR filters to a list
+filter_ok = mc.functional.nibr_filter(
+    mols=mol_list,
+    n_jobs=-1
+)
+
+# Apply common alerts
+alert_results = mc.functional.common_alerts_filter(
+    mols=mol_list,
+    n_jobs=-1
+)
+```
+
+### 4. Chemical Groups Detection
+
+Identify specific chemical groups and functional groups using `medchem.groups`.
+
+**Available Groups:**
+- Hinge binders
+- Phosphate binders
+- Michael acceptors
+- Reactive groups
+- Custom SMARTS patterns
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Create group detector
+group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
+
+# Check for matches (has_match is assumed to take one molecule at a time)
+has_matches = [group.has_match(mol) for mol in mol_list]
+
+# Get detailed match information
+matches = group.get_matches(mol)
+```
+
+### 5. Named Catalogs
+
+Access curated collections of chemical structures through `medchem.catalogs`.
+
+**Available Catalogs:**
+- Functional groups
+- Protecting groups
+- Common reagents
+- Standard fragments
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Access named catalogs
+catalogs = mc.catalogs.NamedCatalogs
+
+# Use catalog for matching
+catalog = catalogs.get("functional_groups")
+matches = catalog.get_matches(mol)
+```
+
+### 6. Molecular Complexity
+
+Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.
+
+**Common Metrics:**
+- Bertz complexity
+- Whitlock complexity
+- Barone complexity
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Calculate complexity
+complexity_score = mc.complexity.calculate_complexity(mol)
+
+# Filter by complexity threshold
+complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
+results = complex_filter(mols=mol_list)
+```
+
+### 7. Constraints Filtering
+
+Apply custom property-based constraints using `medchem.constraints`.
+
+**Example Constraints:**
+- Molecular weight ranges
+- LogP bounds
+- TPSA limits
+- Rotatable bond counts
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Define constraints
+constraints = mc.constraints.Constraints(
+    mw_range=(200, 500),
+    logp_range=(-2, 5),
+    tpsa_max=140,
+    rotatable_bonds_max=10
+)
+
+# Apply constraints
+results = constraints(mols=mol_list, n_jobs=-1)
+```
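+
+As a library-independent illustration of range constraints, the sketch below checks precomputed properties against the same bounds used in the example above and reports which ones are violated. The `BOUNDS` table, the property keys, and the `violations` helper are assumptions made for illustration and are not part of medchem.
+
+```python
+from typing import Dict, List, Optional, Tuple
+
+# (lower, upper) bounds; None means unbounded on that side
+BOUNDS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
+    "mw": (200, 500),
+    "logp": (-2, 5),
+    "tpsa": (None, 140),
+    "n_rotatable": (None, 10),
+}
+
+
+def violations(props: Dict[str, float]) -> List[str]:
+    """Return the names of properties that fall outside their bounds."""
+    failed = []
+    for name, (low, high) in BOUNDS.items():
+        value = props[name]
+        if (low is not None and value < low) or (high is not None and value > high):
+            failed.append(name)
+    return failed
+
+
+# Hand-entered example: molecular weight below the lower bound
+print(violations({"mw": 180.16, "logp": 1.2, "tpsa": 63.6, "n_rotatable": 3}))
+```
+
+Returning the violated property names, instead of a single boolean, makes it easier to record why a molecule was filtered out.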
+
+### 8. Medchem Query Language
+
+Use a specialized query language for complex filtering criteria.
+
+**Query Examples:**
+```
+# Molecules passing Ro5 AND not having common alerts
+"rule_of_five AND NOT common_alerts"
+
+# CNS-like molecules with low complexity
+"rule_of_cns AND complexity < 400"
+
+# Leadlike molecules without Lilly demerits
+"rule_of_leadlike AND lilly_demerits == 0"
+```
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Parse and apply query
+query = mc.query.parse("rule_of_five AND NOT common_alerts")
+results = query.apply(mols=mol_list, n_jobs=-1)
+```
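+
+Assuming the query operators act per molecule, a query such as `rule_of_five AND NOT common_alerts` amounts to an element-wise boolean combination of two pass/fail masks. The sketch below shows that semantics with plain lists; the mask values are made up and `and_not` is an illustrative helper, not a medchem function.
+
+```python
+from typing import List
+
+
+def and_not(left: List[bool], right: List[bool]) -> List[bool]:
+    """Element-wise `left AND NOT right` over two equal-length masks."""
+    if len(left) != len(right):
+        raise ValueError("masks must have the same length")
+    return [a and not b for a, b in zip(left, right)]
+
+
+# Made-up masks for three molecules
+rule_of_five = [True, True, False]
+common_alerts = [False, True, False]
+selected = and_not(rule_of_five, common_alerts)
+print(selected)
+```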
+
+## Workflow Patterns
+
+### Pattern 1: Initial Triage of Compound Library
+
+Filter a large compound collection to identify drug-like candidates.
+
+```python
+import datamol as dm
+import medchem as mc
+import pandas as pd
+
+# Load compound library
+df = pd.read_csv("compounds.csv")
+mols = [dm.to_mol(smi) for smi in df["smiles"]]
+
+# Apply primary filters
+rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
+rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)
+
+# Apply structural alerts
+alert_filter = mc.structural.CommonAlertsFilters()
+alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)
+
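+# Combine results (column names assume a recent medchem release;
+# check rule_results.columns and alert_results.columns if they differ)
+df["passes_rules"] = rule_results["pass_all"].to_numpy()
+df["has_alerts"] = ~alert_results["pass_filter"].to_numpy()
+df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]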
+
+# Save filtered compounds
+filtered_df = df[df["drug_like"]]
+filtered_df.to_csv("filtered_compounds.csv", index=False)
+```
+
+### Pattern 2: Lead Optimization Filtering
+
+Apply stricter criteria during lead optimization.
+
+```python
+import medchem as mc
+
+# Create comprehensive filter
+filters = {
+    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
+    "alerts": mc.structural.NIBRFilters(),
+    "lilly": mc.structural.LillyDemeritsFilters(),
+    "complexity": mc.complexity.ComplexityFilter(max_complexity=400)
+}
+
+# Apply all filters
+results = {}
+for name, filt in filters.items():
+    results[name] = filt(mols=candidate_mols, n_jobs=-1)
+
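+# Identify compounds passing all filters: element-wise AND across filters,
+# giving one boolean per compound rather than a single value
+passes_all = [all(flags) for flags in zip(*(r["pass"] for r in results.values()))]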
+```
+
+### Pattern 3: Identify Specific Chemical Groups
+
+Find molecules containing specific functional groups or scaffolds.
+
+```python
+import medchem as mc
+
+# Create group detector for multiple groups
+group_detector = mc.groups.ChemicalGroup(
+    groups=["hinge_binders", "phosphate_binders"]
+)
+
+# Screen library
+matches = group_detector.get_all_matches(mol_list)
+
+# Filter molecules with desired groups
+mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
+```
+
+## Best Practices
+
+1. **Context Matters**: Don't blindly apply filters. Understand the biological target and chemical space.
+
+2. **Combine Multiple Filters**: Use rules, structural alerts, and domain knowledge together for better decisions.
+
+3. **Use Parallelization**: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.
+
+4. **Iterative Refinement**: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.
+
+5. **Document Filtering Decisions**: Track which molecules were filtered out and why for reproducibility.
+
+6. **Validate Results**: Remember that marketed drugs often fail standard filters—use these as guidelines, not absolute rules.
+
+7. **Consider Prodrugs**: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.
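+
+Practice 5 can be sketched with the standard library alone: write one row per molecule recording whether it passed and why it was rejected. The record layout, the field names, and the `write_filter_log` helper are assumptions for illustration; in a real workflow the reasons would come from the filter outputs.
+
+```python
+import csv
+import io
+
+
+def write_filter_log(records, handle):
+    """Write one CSV row per molecule with its pass status and rejection reasons."""
+    writer = csv.DictWriter(handle, fieldnames=["smiles", "passed", "reasons"])
+    writer.writeheader()
+    for rec in records:
+        writer.writerow({
+            "smiles": rec["smiles"],
+            "passed": rec["passed"],
+            "reasons": ";".join(rec["reasons"]),
+        })
+
+
+# Hand-written example records; the alert name is a placeholder
+records = [
+    {"smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "passed": True, "reasons": []},
+    {"smiles": "c1ccccc1", "passed": False, "reasons": ["hypothetical_alert"]},
+]
+
+buffer = io.StringIO()  # swap for open("filter_log.csv", "w", newline="") to write a file
+write_filter_log(records, buffer)
+log_text = buffer.getvalue()
+print(log_text)
+```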
+
+## Resources
+
+### references/api_guide.md
+Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.
+
+### references/rules_catalog.md
+Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.
+
+### scripts/filter_molecules.py
+Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.
+
+**Usage:**
+```bash
+python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
+```
+
+## Documentation
+
+- Official documentation: https://medchem-docs.datamol.io/
+- GitHub repository: https://github.com/datamol-io/medchem