Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/medchem/SKILL.md
+++ b/skills/medchem/SKILL.md
@@ -0,0 +1,400 @@
+---
+name: medchem
+description: "Medicinal chemistry filters. Apply drug-likeness rules (Lipinski, Veber), PAINS filters, structural alerts, complexity metrics, for compound prioritization and library filtering."
+---
+
+# Medchem
+
+## Overview
+
+Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. Apply hundreds of well-established and novel molecular filters, structural alerts, and medicinal chemistry rules to efficiently triage and prioritize compound libraries at scale. Rules and filters are context-specific—use as guidelines combined with domain expertise.
+
+## When to Use This Skill
+
+This skill should be used when:
+- Applying drug-likeness rules (Lipinski, Veber, etc.) to compound libraries
+- Filtering molecules by structural alerts or PAINS patterns
+- Prioritizing compounds for lead optimization
+- Assessing compound quality and medicinal chemistry properties
+- Detecting reactive or problematic functional groups
+- Calculating molecular complexity metrics
+
+## Installation
+
+```bash
+uv pip install medchem
+```
+
+## Core Capabilities
+
+### 1. Medicinal Chemistry Rules
+
+Apply established drug-likeness rules to molecules using the `medchem.rules` module.
+
+**Available Rules:**
+- Rule of Five (Lipinski)
+- Rule of Oprea
+- Rule of CNS
+- Rule of leadlike (soft and strict)
+- Rule of three
+- Rule of Reos
+- Rule of drug
+- Rule of Veber
+- Golden triangle
+- PAINS filters
+
+**Single Rule Application:**
+
+```python
+import medchem as mc
+
+# Apply Rule of Five to a SMILES string
+smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
+passes = mc.rules.basic_rules.rule_of_five(smiles)
+# Returns: True
+
+# Check specific rules
+passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
+passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
+```
+
+**Multiple Rules with RuleFilters:**
+
+```python
+import datamol as dm
+import medchem as mc
+
+# Load molecules
+mols = [dm.to_mol(smiles) for smiles in smiles_list]
+
+# Create filter with multiple rules
+rfilter = mc.rules.RuleFilters(
+    rule_list=[
+        "rule_of_five",
+        "rule_of_oprea",
+        "rule_of_cns",
+        "rule_of_leadlike_soft"
+    ]
+)
+
+# Apply filters with parallelization
+results = rfilter(
+    mols=mols,
+    n_jobs=-1,  # Use all CPU cores
+    progress=True
+)
+```
+
+**Result Format:**
+Results are returned as dictionaries with pass/fail status and detailed information for each rule.
+
+### 2. Structural Alert Filters
+
+Detect potentially problematic structural patterns using the `medchem.structural` module.
+
+**Available Filters:**
+
+1. **Common Alerts** - General structural alerts derived from ChEMBL curation and literature
+2. **NIBR Filters** - Novartis Institutes for BioMedical Research filter set
+3. **Lilly Demerits** - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)
+
+**Common Alerts:**
+
+```python
+import medchem as mc
+
+# Create filter
+alert_filter = mc.structural.CommonAlertsFilters()
+
+# Check single molecule
+mol = dm.to_mol("c1ccccc1")
+has_alerts, details = alert_filter.check_mol(mol)
+
+# Batch filtering with parallelization
+results = alert_filter(
+    mols=mol_list,
+    n_jobs=-1,
+    progress=True
+)
+```
+
+**NIBR Filters:**
+
+```python
+import medchem as mc
+
+# Apply NIBR filters
+nibr_filter = mc.structural.NIBRFilters()
+results = nibr_filter(mols=mol_list, n_jobs=-1)
+```
+
+**Lilly Demerits:**
+
+```python
+import medchem as mc
+
+# Calculate Lilly demerits
+lilly = mc.structural.LillyDemeritsFilters()
+results = lilly(mols=mol_list, n_jobs=-1)
+
+# Each result includes demerit score and whether it passes (≤100 demerits)
+```
+
+### 3. Functional API for High-Level Operations
+
+The `medchem.functional` module provides convenient functions for common workflows.
+
+**Quick Filtering:**
+
+```python
+import medchem as mc
+
+# Apply NIBR filters to a list
+filter_ok = mc.functional.nibr_filter(
+    mols=mol_list,
+    n_jobs=-1
+)
+
+# Apply common alerts
+alert_results = mc.functional.common_alerts_filter(
+    mols=mol_list,
+    n_jobs=-1
+)
+```
+
+### 4. Chemical Groups Detection
+
+Identify specific chemical groups and functional groups using `medchem.groups`.
+
+**Available Groups:**
+- Hinge binders
+- Phosphate binders
+- Michael acceptors
+- Reactive groups
+- Custom SMARTS patterns
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Create group detector
+group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
+
+# Check for matches
+has_matches = group.has_match(mol_list)
+
+# Get detailed match information
+matches = group.get_matches(mol)
+```
+
+### 5. Named Catalogs
+
+Access curated collections of chemical structures through `medchem.catalogs`.
+
+**Available Catalogs:**
+- Functional groups
+- Protecting groups
+- Common reagents
+- Standard fragments
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Access named catalogs
+catalogs = mc.catalogs.NamedCatalogs
+
+# Use catalog for matching
+catalog = catalogs.get("functional_groups")
+matches = catalog.get_matches(mol)
+```
+
+### 6. Molecular Complexity
+
+Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.
+
+**Common Metrics:**
+- Bertz complexity
+- Whitlock complexity
+- Barone complexity
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Calculate complexity
+complexity_score = mc.complexity.calculate_complexity(mol)
+
+# Filter by complexity threshold
+complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
+results = complex_filter(mols=mol_list)
+```
+
+### 7. Constraints Filtering
+
+Apply custom property-based constraints using `medchem.constraints`.
+
+**Example Constraints:**
+- Molecular weight ranges
+- LogP bounds
+- TPSA limits
+- Rotatable bond counts
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Define constraints
+constraints = mc.constraints.Constraints(
+    mw_range=(200, 500),
+    logp_range=(-2, 5),
+    tpsa_max=140,
+    rotatable_bonds_max=10
+)
+
+# Apply constraints
+results = constraints(mols=mol_list, n_jobs=-1)
+```
+
+### 8. Medchem Query Language
+
+Use a specialized query language for complex filtering criteria.
+
+**Query Examples:**
+```
+# Molecules passing Ro5 AND not having common alerts
+"rule_of_five AND NOT common_alerts"
+
+# CNS-like molecules with low complexity
+"rule_of_cns AND complexity < 400"
+
+# Leadlike molecules without Lilly demerits
+"rule_of_leadlike AND lilly_demerits == 0"
+```
+
+**Usage:**
+
+```python
+import medchem as mc
+
+# Parse and apply query
+query = mc.query.parse("rule_of_five AND NOT common_alerts")
+results = query.apply(mols=mol_list, n_jobs=-1)
+```
+
+## Workflow Patterns
+
+### Pattern 1: Initial Triage of Compound Library
+
+Filter a large compound collection to identify drug-like candidates.
+
+```python
+import datamol as dm
+import medchem as mc
+import pandas as pd
+
+# Load compound library
+df = pd.read_csv("compounds.csv")
+mols = [dm.to_mol(smi) for smi in df["smiles"]]
+
+# Apply primary filters
+rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
+rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)
+
+# Apply structural alerts
+alert_filter = mc.structural.CommonAlertsFilters()
+alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)
+
+# Combine results
+df["passes_rules"] = rule_results["pass"]
+df["has_alerts"] = alert_results["has_alerts"]
+df["drug_like"] = df["passes_rules"] & ~df["has_alerts"]
+
+# Save filtered compounds
+filtered_df = df[df["drug_like"]]
+filtered_df.to_csv("filtered_compounds.csv", index=False)
+```
+
+### Pattern 2: Lead Optimization Filtering
+
+Apply stricter criteria during lead optimization.
+
+```python
+import medchem as mc
+
+# Create comprehensive filter
+filters = {
+    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
+    "alerts": mc.structural.NIBRFilters(),
+    "lilly": mc.structural.LillyDemeritsFilters(),
+    "complexity": mc.complexity.ComplexityFilter(max_complexity=400)
+}
+
+# Apply all filters
+results = {}
+for name, filt in filters.items():
+    results[name] = filt(mols=candidate_mols, n_jobs=-1)
+
+# Identify compounds passing all filters
+passes_all = all(r["pass"] for r in results.values())
+```
+
+### Pattern 3: Identify Specific Chemical Groups
+
+Find molecules containing specific functional groups or scaffolds.
+
+```python
+import medchem as mc
+
+# Create group detector for multiple groups
+group_detector = mc.groups.ChemicalGroup(
+    groups=["hinge_binders", "phosphate_binders"]
+)
+
+# Screen library
+matches = group_detector.get_all_matches(mol_list)
+
+# Filter molecules with desired groups
+mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
+```
+
+## Best Practices
+
+1. **Context Matters**: Don't blindly apply filters. Understand the biological target and chemical space.
+
+2. **Combine Multiple Filters**: Use rules, structural alerts, and domain knowledge together for better decisions.
+
+3. **Use Parallelization**: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.
+
+4. **Iterative Refinement**: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.
+
+5. **Document Filtering Decisions**: Track which molecules were filtered out and why for reproducibility.
+
+6. **Validate Results**: Remember that marketed drugs often fail standard filters—use these as guidelines, not absolute rules.
+
+7. **Consider Prodrugs**: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.
+
+## Resources
+
+### references/api_guide.md
+Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.
+
+### references/rules_catalog.md
+Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.
+
+### scripts/filter_molecules.py
+Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.
+
+**Usage:**
+```bash
+python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
+```
+
+## Documentation
+
+Official documentation: https://medchem-docs.datamol.io/
+GitHub repository: https://github.com/datamol-io/medchem
--- a/skills/medchem/references/api_guide.md
+++ b/skills/medchem/references/api_guide.md
@@ -0,0 +1,600 @@
+# Medchem API Reference
+
+Comprehensive reference for all medchem modules and functions.
+
+## Module: medchem.rules
+
+### Class: RuleFilters
+
+Filter molecules based on multiple medicinal chemistry rules.
+
+**Constructor:**
+```python
+RuleFilters(rule_list: List[str])
+```
+
+**Parameters:**
+- `rule_list`: List of rule names to apply. See available rules below.
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> Dict
+```
+- `mols`: List of RDKit molecule objects
+- `n_jobs`: Number of parallel jobs (-1 uses all cores)
+- `progress`: Show progress bar
+- **Returns**: Dictionary with results for each rule
+
+**Example:**
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_cns"])
+results = rfilter(mols=mol_list, n_jobs=-1, progress=True)
+```
+
+### Module: medchem.rules.basic_rules
+
+Individual rule functions that can be applied to single molecules.
+
+#### rule_of_five()
+
+```python
+rule_of_five(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Lipinski's Rule of Five for oral bioavailability.
+
+**Criteria:**
+- Molecular weight ≤ 500 Da
+- LogP ≤ 5
+- H-bond donors ≤ 5
+- H-bond acceptors ≤ 10
+
+**Parameters:**
+- `mol`: SMILES string or RDKit molecule object
+
+**Returns:** True if molecule passes all criteria
+
+#### rule_of_three()
+
+```python
+rule_of_three(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Rule of Three for fragment screening libraries.
+
+**Criteria:**
+- Molecular weight ≤ 300 Da
+- LogP ≤ 3
+- H-bond donors ≤ 3
+- H-bond acceptors ≤ 3
+- Rotatable bonds ≤ 3
+- Polar surface area ≤ 60 Ų
+
+#### rule_of_oprea()
+
+```python
+rule_of_oprea(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Oprea's lead-like criteria for hit-to-lead optimization.
+
+**Criteria:**
+- Molecular weight: 200-350 Da
+- LogP: -2 to 4
+- Rotatable bonds ≤ 7
+- Rings ≤ 4
+
+#### rule_of_cns()
+
+```python
+rule_of_cns(mol: Union[str, Chem.Mol]) -> bool
+```
+
+CNS drug-likeness rules.
+
+**Criteria:**
+- Molecular weight ≤ 450 Da
+- LogP: -1 to 5
+- H-bond donors ≤ 2
+- TPSA ≤ 90 Ų
+
+#### rule_of_leadlike_soft()
+
+```python
+rule_of_leadlike_soft(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Soft lead-like criteria (more permissive).
+
+**Criteria:**
+- Molecular weight: 250-450 Da
+- LogP: -3 to 4
+- Rotatable bonds ≤ 10
+
+#### rule_of_leadlike_strict()
+
+```python
+rule_of_leadlike_strict(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Strict lead-like criteria (more restrictive).
+
+**Criteria:**
+- Molecular weight: 200-350 Da
+- LogP: -2 to 3.5
+- Rotatable bonds ≤ 7
+- Rings: 1-3
+
+#### rule_of_veber()
+
+```python
+rule_of_veber(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Veber's rules for oral bioavailability.
+
+**Criteria:**
+- Rotatable bonds ≤ 10
+- TPSA ≤ 140 Ų
+
+#### rule_of_reos()
+
+```python
+rule_of_reos(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Rapid Elimination Of Swill (REOS) filter.
+
+**Criteria:**
+- Molecular weight: 200-500 Da
+- LogP: -5 to 5
+- H-bond donors: 0-5
+- H-bond acceptors: 0-10
+
+#### rule_of_drug()
+
+```python
+rule_of_drug(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Combined drug-likeness criteria.
+
+**Criteria:**
+- Passes Rule of Five
+- Passes Veber rules
+- No PAINS substructures
+
+#### golden_triangle()
+
+```python
+golden_triangle(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Golden Triangle for drug-likeness balance.
+
+**Criteria:**
+- 200 ≤ MW ≤ 50×LogP + 400
+- LogP: -2 to 5
+
+#### pains_filter()
+
+```python
+pains_filter(mol: Union[str, Chem.Mol]) -> bool
+```
+
+Pan Assay INterference compoundS (PAINS) filter.
+
+**Returns:** True if molecule does NOT contain PAINS substructures
+
+---
+
+## Module: medchem.structural
+
+### Class: CommonAlertsFilters
+
+Filter for common structural alerts derived from ChEMBL and literature.
+
+**Constructor:**
+```python
+CommonAlertsFilters()
+```
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> List[Dict]
+```
+
+Apply common alerts filter to a list of molecules.
+
+**Returns:** List of dictionaries with keys:
+- `has_alerts`: Boolean indicating if molecule has alerts
+- `alert_details`: List of matched alert patterns
+- `num_alerts`: Number of alerts found
+
+```python
+check_mol(mol: Chem.Mol) -> Tuple[bool, List[str]]
+```
+
+Check a single molecule for structural alerts.
+
+**Returns:** Tuple of (has_alerts, list_of_alert_names)
+
+### Class: NIBRFilters
+
+Novartis NIBR medicinal chemistry filters.
+
+**Constructor:**
+```python
+NIBRFilters()
+```
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> List[bool]
+```
+
+Apply NIBR filters to molecules.
+
+**Returns:** List of booleans (True if molecule passes)
+
+### Class: LillyDemeritsFilters
+
+Eli Lilly's demerit-based structural alert system (275 rules).
+
+**Constructor:**
+```python
+LillyDemeritsFilters()
+```
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1, progress: bool = False) -> List[Dict]
+```
+
+Calculate Lilly demerits for molecules.
+
+**Returns:** List of dictionaries with keys:
+- `demerits`: Total demerit score
+- `passes`: Boolean (True if demerits ≤ 100)
+- `matched_patterns`: List of matched patterns with scores
+
+---
+
+## Module: medchem.functional
+
+High-level functional API for common operations.
+
+### nibr_filter()
+
+```python
+nibr_filter(mols: List[Chem.Mol], n_jobs: int = 1) -> List[bool]
+```
+
+Apply NIBR filters using functional API.
+
+**Parameters:**
+- `mols`: List of molecules
+- `n_jobs`: Parallelization level
+
+**Returns:** List of pass/fail booleans
+
+### common_alerts_filter()
+
+```python
+common_alerts_filter(mols: List[Chem.Mol], n_jobs: int = 1) -> List[Dict]
+```
+
+Apply common alerts filter using functional API.
+
+**Returns:** List of results dictionaries
+
+### lilly_demerits_filter()
+
+```python
+lilly_demerits_filter(mols: List[Chem.Mol], n_jobs: int = 1) -> List[Dict]
+```
+
+Calculate Lilly demerits using functional API.
+
+---
+
+## Module: medchem.groups
+
+### Class: ChemicalGroup
+
+Detect specific chemical groups in molecules.
+
+**Constructor:**
+```python
+ChemicalGroup(groups: List[str], custom_smarts: Optional[Dict[str, str]] = None)
+```
+
+**Parameters:**
+- `groups`: List of predefined group names
+- `custom_smarts`: Dictionary mapping custom group names to SMARTS patterns
+
+**Predefined Groups:**
+- `"hinge_binders"`: Kinase hinge binding motifs
+- `"phosphate_binders"`: Phosphate binding groups
+- `"michael_acceptors"`: Michael acceptor electrophiles
+- `"reactive_groups"`: General reactive functionalities
+
+**Methods:**
+
+```python
+has_match(mols: List[Chem.Mol]) -> List[bool]
+```
+
+Check if molecules contain any of the specified groups.
+
+```python
+get_matches(mol: Chem.Mol) -> Dict[str, List[Tuple]]
+```
+
+Get detailed match information for a single molecule.
+
+**Returns:** Dictionary mapping group names to lists of atom indices
+
+```python
+get_all_matches(mols: List[Chem.Mol]) -> List[Dict]
+```
+
+Get match information for all molecules.
+
+**Example:**
+```python
+group = mc.groups.ChemicalGroup(groups=["hinge_binders", "phosphate_binders"])
+matches = group.get_all_matches(mol_list)
+```
+
+---
+
+## Module: medchem.catalogs
+
+### Class: NamedCatalogs
+
+Access to curated chemical catalogs.
+
+**Available Catalogs:**
+- `"functional_groups"`: Common functional groups
+- `"protecting_groups"`: Protecting group structures
+- `"reagents"`: Common reagents
+- `"fragments"`: Standard fragments
+
+**Usage:**
+```python
+catalog = mc.catalogs.NamedCatalogs.get("functional_groups")
+matches = catalog.get_matches(mol)
+```
+
+---
+
+## Module: medchem.complexity
+
+Calculate molecular complexity metrics.
+
+### calculate_complexity()
+
+```python
+calculate_complexity(mol: Chem.Mol, method: str = "bertz") -> float
+```
+
+Calculate complexity score for a molecule.
+
+**Parameters:**
+- `mol`: RDKit molecule
+- `method`: Complexity metric ("bertz", "whitlock", "barone")
+
+**Returns:** Complexity score (higher = more complex)
+
+### Class: ComplexityFilter
+
+Filter molecules by complexity threshold.
+
+**Constructor:**
+```python
+ComplexityFilter(max_complexity: float, method: str = "bertz")
+```
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1) -> List[bool]
+```
+
+Filter molecules exceeding complexity threshold.
+
+---
+
+## Module: medchem.constraints
+
+### Class: Constraints
+
+Apply custom property-based constraints.
+
+**Constructor:**
+```python
+Constraints(
+    mw_range: Optional[Tuple[float, float]] = None,
+    logp_range: Optional[Tuple[float, float]] = None,
+    tpsa_max: Optional[float] = None,
+    tpsa_range: Optional[Tuple[float, float]] = None,
+    hbd_max: Optional[int] = None,
+    hba_max: Optional[int] = None,
+    rotatable_bonds_max: Optional[int] = None,
+    rings_range: Optional[Tuple[int, int]] = None,
+    aromatic_rings_max: Optional[int] = None,
+)
+```
+
+**Parameters:** All parameters are optional. Specify only the constraints needed.
+
+**Methods:**
+
+```python
+__call__(mols: List[Chem.Mol], n_jobs: int = 1) -> List[Dict]
+```
+
+Apply constraints to molecules.
+
+**Returns:** List of dictionaries with keys:
+- `passes`: Boolean indicating if all constraints pass
+- `violations`: List of constraint names that failed
+
+**Example:**
+```python
+constraints = mc.constraints.Constraints(
+    mw_range=(200, 500),
+    logp_range=(-2, 5),
+    tpsa_max=140
+)
+results = constraints(mols=mol_list, n_jobs=-1)
+```
+
+---
+
+## Module: medchem.query
+
+Query language for complex filtering.
+
+### parse()
+
+```python
+parse(query: str) -> Query
+```
+
+Parse a medchem query string into a Query object.
+
+**Query Syntax:**
+- Operators: `AND`, `OR`, `NOT`
+- Comparisons: `<`, `>`, `<=`, `>=`, `==`, `!=`
+- Properties: `complexity`, `lilly_demerits`, `mw`, `logp`, `tpsa`
+- Rules: `rule_of_five`, `rule_of_cns`, etc.
+- Filters: `common_alerts`, `nibr_filter`, `pains_filter`
+
+**Example Queries:**
+```python
+"rule_of_five AND NOT common_alerts"
+"rule_of_cns AND complexity < 400"
+"mw > 200 AND mw < 500 AND logp < 5"
+"(rule_of_five OR rule_of_oprea) AND NOT pains_filter"
+```
+
+### Class: Query
+
+**Methods:**
+
+```python
+apply(mols: List[Chem.Mol], n_jobs: int = 1) -> List[bool]
+```
+
+Apply parsed query to molecules.
+
+**Example:**
+```python
+query = mc.query.parse("rule_of_five AND NOT common_alerts")
+results = query.apply(mols=mol_list, n_jobs=-1)
+passing_mols = [mol for mol, passes in zip(mol_list, results) if passes]
+```
+
+---
+
+## Module: medchem.utils
+
+Utility functions for working with molecules.
+
+### batch_process()
+
+```python
+batch_process(
+    mols: List[Chem.Mol],
+    func: Callable,
+    n_jobs: int = 1,
+    progress: bool = False,
+    batch_size: Optional[int] = None
+) -> List
+```
+
+Process molecules in parallel batches.
+
+**Parameters:**
+- `mols`: List of molecules
+- `func`: Function to apply to each molecule
+- `n_jobs`: Number of parallel workers
+- `progress`: Show progress bar
+- `batch_size`: Size of processing batches
+
+### standardize_mol()
+
+```python
+standardize_mol(mol: Chem.Mol) -> Chem.Mol
+```
+
+Standardize molecule representation (sanitize, neutralize charges, etc.).
+
+---
+
+## Common Patterns
+
+### Pattern: Parallel Processing
+
+All filters support parallelization:
+
+```python
+# Use all CPU cores
+results = filter_object(mols=mol_list, n_jobs=-1, progress=True)
+
+# Use specific number of cores
+results = filter_object(mols=mol_list, n_jobs=4, progress=True)
+```
+
+### Pattern: Combining Multiple Filters
+
+```python
+import medchem as mc
+
+# Apply multiple filters
+rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five"])
+alert_filter = mc.structural.CommonAlertsFilters()
+lilly_filter = mc.structural.LillyDemeritsFilters()
+
+# Get results
+rule_results = rule_filter(mols=mol_list, n_jobs=-1)
+alert_results = alert_filter(mols=mol_list, n_jobs=-1)
+lilly_results = lilly_filter(mols=mol_list, n_jobs=-1)
+
+# Combine criteria
+passing_mols = [
+    mol for i, mol in enumerate(mol_list)
+    if rule_results[i]["passes"]
+    and not alert_results[i]["has_alerts"]
+    and lilly_results[i]["passes"]
+]
+```
+
+### Pattern: Working with DataFrames
+
+```python
+import pandas as pd
+import datamol as dm
+import medchem as mc
+
+# Load data
+df = pd.read_csv("molecules.csv")
+df["mol"] = df["smiles"].apply(dm.to_mol)
+
+# Apply filters
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_cns"])
+results = rfilter(mols=df["mol"].tolist(), n_jobs=-1)
+
+# Add results to dataframe
+df["passes_ro5"] = [r["rule_of_five"] for r in results]
+df["passes_cns"] = [r["rule_of_cns"] for r in results]
+
+# Filter dataframe
+filtered_df = df[df["passes_ro5"] & df["passes_cns"]]
+```
--- a/skills/medchem/references/rules_catalog.md
+++ b/skills/medchem/references/rules_catalog.md
@@ -0,0 +1,604 @@
+# Medchem Rules and Filters Catalog
+
+Comprehensive catalog of all available medicinal chemistry rules, structural alerts, and filters in medchem.
+
+## Table of Contents
+
+1. [Drug-Likeness Rules](#drug-likeness-rules)
+2. [Lead-Likeness Rules](#lead-likeness-rules)
+3. [Fragment Rules](#fragment-rules)
+4. [CNS Rules](#cns-rules)
+5. [Structural Alert Filters](#structural-alert-filters)
+6. [Chemical Group Patterns](#chemical-group-patterns)
+
+---
+
+## Drug-Likeness Rules
+
+### Rule of Five (Lipinski)
+
+**Reference:** Lipinski et al., Adv Drug Deliv Rev (1997) 23:3-25
+
+**Purpose:** Predict oral bioavailability
+
+**Criteria:**
+- Molecular Weight ≤ 500 Da
+- LogP ≤ 5
+- Hydrogen Bond Donors ≤ 5
+- Hydrogen Bond Acceptors ≤ 10
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_five(mol)
+```
+
+**Notes:**
+- One of the most widely used filters in drug discovery
+- About 90% of orally active drugs comply with these rules
+- Exceptions exist, especially for natural products and antibiotics
+
+---
+
+### Rule of Veber
+
+**Reference:** Veber et al., J Med Chem (2002) 45:2615-2623
+
+**Purpose:** Additional criteria for oral bioavailability
+
+**Criteria:**
+- Rotatable Bonds ≤ 10
+- Topological Polar Surface Area (TPSA) ≤ 140 Ų
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_veber(mol)
+```
+
+**Notes:**
+- Complements Rule of Five
+- TPSA correlates with cell permeability
+- Rotatable bonds affect molecular flexibility
+
+---
+
+### Rule of Drug
+
+**Purpose:** Combined drug-likeness assessment
+
+**Criteria:**
+- Passes Rule of Five
+- Passes Veber rules
+- Does not contain PAINS substructures
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_drug(mol)
+```
+
+---
+
+### REOS (Rapid Elimination Of Swill)
+
+**Reference:** Walters & Murcko, Adv Drug Deliv Rev (2002) 54:255-271
+
+**Purpose:** Filter out compounds unlikely to be drugs
+
+**Criteria:**
+- Molecular Weight: 200-500 Da
+- LogP: -5 to 5
+- Hydrogen Bond Donors: 0-5
+- Hydrogen Bond Acceptors: 0-10
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_reos(mol)
+```
+
+---
+
+### Golden Triangle
+
+**Reference:** Johnson et al., J Med Chem (2009) 52:5487-5500
+
+**Purpose:** Balance lipophilicity and molecular weight
+
+**Criteria:**
+- 200 ≤ MW ≤ 50 × LogP + 400
+- LogP: -2 to 5
+
+**Usage:**
+```python
+mc.rules.basic_rules.golden_triangle(mol)
+```
+
+**Notes:**
+- Defines optimal physicochemical space
+- Visual representation resembles a triangle on MW vs LogP plot
+
+---
+
+## Lead-Likeness Rules
+
+### Rule of Oprea
+
+**Reference:** Oprea et al., J Chem Inf Comput Sci (2001) 41:1308-1315
+
+**Purpose:** Identify lead-like compounds for optimization
+
+**Criteria:**
+- Molecular Weight: 200-350 Da
+- LogP: -2 to 4
+- Rotatable Bonds ≤ 7
+- Number of Rings ≤ 4
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_oprea(mol)
+```
+
+**Rationale:** Lead compounds should have "room to grow" during optimization
+
+---
+
+### Rule of Leadlike (Soft)
+
+**Purpose:** Permissive lead-like criteria
+
+**Criteria:**
+- Molecular Weight: 250-450 Da
+- LogP: -3 to 4
+- Rotatable Bonds ≤ 10
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_leadlike_soft(mol)
+```
+
+---
+
+### Rule of Leadlike (Strict)
+
+**Purpose:** Restrictive lead-like criteria
+
+**Criteria:**
+- Molecular Weight: 200-350 Da
+- LogP: -2 to 3.5
+- Rotatable Bonds ≤ 7
+- Number of Rings: 1-3
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_leadlike_strict(mol)
+```
+
+---
+
+## Fragment Rules
+
+### Rule of Three
+
+**Reference:** Congreve et al., Drug Discov Today (2003) 8:876-877
+
+**Purpose:** Screen fragment libraries for fragment-based drug discovery
+
+**Criteria:**
+- Molecular Weight ≤ 300 Da
+- LogP ≤ 3
+- Hydrogen Bond Donors ≤ 3
+- Hydrogen Bond Acceptors ≤ 3
+- Rotatable Bonds ≤ 3
+- Polar Surface Area ≤ 60 Ų
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_three(mol)
+```
+
+**Notes:**
+- Fragments are grown into leads during optimization
+- Lower complexity allows more starting points
+
+---
+
+## CNS Rules
+
+### Rule of CNS
+
+**Purpose:** Central nervous system drug-likeness
+
+**Criteria:**
+- Molecular Weight ≤ 450 Da
+- LogP: -1 to 5
+- Hydrogen Bond Donors ≤ 2
+- TPSA ≤ 90 Ų
+
+**Usage:**
+```python
+mc.rules.basic_rules.rule_of_cns(mol)
+```
+
+**Rationale:**
+- Blood-brain barrier penetration requires specific properties
+- Lower TPSA and HBD count improve BBB permeability
+- Tight constraints reflect CNS challenges
+
+---
+
+## Structural Alert Filters
+
+### PAINS (Pan Assay INterference compoundS)
+
+**Reference:** Baell & Holloway, J Med Chem (2010) 53:2719-2740
+
+**Purpose:** Identify compounds that interfere with assays
+
+**Categories:**
+- Catechols
+- Quinones
+- Rhodanines
+- Hydroxyphenylhydrazones
+- Alkyl/aryl aldehydes
+- Michael acceptors (specific patterns)
+
+**Usage:**
+```python
+mc.rules.basic_rules.pains_filter(mol)
+# Returns True if NO PAINS found
+```
+
+**Notes:**
+- PAINS compounds show activity in multiple assays through non-specific mechanisms
+- Common false positives in screening campaigns
+- Should be deprioritized in lead selection
+
+---
+
+### Common Alerts Filters
+
+**Source:** Derived from ChEMBL curation and medicinal chemistry literature
+
+**Purpose:** Flag common problematic structural patterns
+
+**Alert Categories:**
+1. **Reactive Groups**
+   - Epoxides
+   - Aziridines
+   - Acid halides
+   - Isocyanates
+
+2. **Metabolic Liabilities**
+   - Hydrazines
+   - Thioureas
+   - Anilines (certain patterns)
+
+3. **Aggregators**
+   - Polyaromatic systems
+   - Long aliphatic chains
+
+4. **Toxicophores**
+   - Nitro aromatics
+   - Aromatic N-oxides
+   - Certain heterocycles
+
+**Usage:**
+```python
+alert_filter = mc.structural.CommonAlertsFilters()
+has_alerts, details = alert_filter.check_mol(mol)
+```
+
+**Return Format:**
+```python
+{
+    "has_alerts": True,
+    "alert_details": ["reactive_epoxide", "metabolic_hydrazine"],
+    "num_alerts": 2
+}
+```
+
+---
+
+### NIBR Filters
+
+**Source:** Novartis Institutes for BioMedical Research
+
+**Purpose:** Industrial medicinal chemistry filtering rules
+
+**Features:**
+- Proprietary filter set developed from Novartis experience
+- Balances drug-likeness with practical medicinal chemistry
+- Includes both structural alerts and property filters
+
+**Usage:**
+```python
+nibr_filter = mc.structural.NIBRFilters()
+results = nibr_filter(mols=mol_list, n_jobs=-1)
+```
+
+**Return Format:** Boolean list (True = passes)
+
+---
+
+### Lilly Demerits Filter
+
+**Reference:** Based on Eli Lilly medicinal chemistry rules
+
+**Source:** 275 structural patterns accumulated over 18 years
+
+**Purpose:** Identify assay interference and problematic functionalities
+
+**Mechanism:**
+- Each matched pattern adds demerits
+- Molecules with >100 demerits are rejected
+- Some patterns add 10-50 demerits, others add 100+ (instant rejection)
+
+**Demerit Categories:**
+
+1. **High Demerits (>50):**
+   - Known toxic groups
+   - Highly reactive functionalities
+   - Strong metal chelators
+
+2. **Medium Demerits (20-50):**
+   - Metabolic liabilities
+   - Aggregation-prone structures
+   - Frequent hitters
+
+3. **Low Demerits (5-20):**
+   - Minor concerns
+   - Context-dependent issues
+
+**Usage:**
+```python
+lilly_filter = mc.structural.LillyDemeritsFilters()
+results = lilly_filter(mols=mol_list, n_jobs=-1)
+```
+
+**Return Format:**
+```python
+{
+    "demerits": 35,
+    "passes": True,  # (demerits ≤ 100)
+    "matched_patterns": [
+        {"pattern": "phenolic_ester", "demerits": 20},
+        {"pattern": "aniline_derivative", "demerits": 15}
+    ]
+}
+```
+
+---
+
+## Chemical Group Patterns
+
+### Hinge Binders
+
+**Purpose:** Identify kinase hinge-binding motifs
+
+**Common Patterns:**
+- Aminopyridines
+- Aminopyrimidines
+- Indazoles
+- Benzimidazoles
+
+**Usage:**
+```python
+group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
+has_hinge = group.has_match(mol_list)
+```
+
+**Application:** Kinase inhibitor design
+
+---
+
+### Phosphate Binders
+
+**Purpose:** Identify phosphate-binding groups
+
+**Common Patterns:**
+- Basic amines in specific geometries
+- Guanidinium groups
+- Arginine mimetics
+
+**Usage:**
+```python
+group = mc.groups.ChemicalGroup(groups=["phosphate_binders"])
+```
+
+**Application:** Kinase inhibitors, phosphatase inhibitors
+
+---
+
+### Michael Acceptors
+
+**Purpose:** Identify electrophilic Michael acceptor groups
+
+**Common Patterns:**
+- α,β-Unsaturated carbonyls
+- α,β-Unsaturated nitriles
+- Vinyl sulfones
+- Acrylamides
+
+**Usage:**
+```python
+group = mc.groups.ChemicalGroup(groups=["michael_acceptors"])
+```
+
+**Notes:**
+- Can be desirable for covalent inhibitors
+- Often flagged as reactive alerts in screening
+
+---
+
+### Reactive Groups
+
+**Purpose:** Identify generally reactive functionalities
+
+**Common Patterns:**
+- Epoxides
+- Aziridines
+- Acyl halides
+- Isocyanates
+- Sulfonyl chlorides
+
+**Usage:**
+```python
+group = mc.groups.ChemicalGroup(groups=["reactive_groups"])
+```
+
+---
+
+## Custom SMARTS Patterns
+
+Define custom structural patterns using SMARTS:
+
+```python
+custom_patterns = {
+    "my_warhead": "[C;H0](=O)C(F)(F)F",  # Trifluoromethyl ketone
+    "my_scaffold": "c1ccc2c(c1)ncc(n2)N",  # Aminobenzimidazole
+}
+
+group = mc.groups.ChemicalGroup(
+    groups=["hinge_binders"],
+    custom_smarts=custom_patterns
+)
+```
+
+---
+
+## Filter Selection Guidelines
+
+### Initial Screening (High-Throughput)
+
+Recommended filters:
+- Rule of Five
+- PAINS filter
+- Common Alerts (permissive settings)
+
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "pains_filter"])
+alert_filter = mc.structural.CommonAlertsFilters()
+```
+
+---
+
+### Hit-to-Lead
+
+Recommended filters:
+- Rule of Oprea or Leadlike (soft)
+- NIBR filters
+- Lilly Demerits
+
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_oprea"])
+nibr_filter = mc.structural.NIBRFilters()
+lilly_filter = mc.structural.LillyDemeritsFilters()
+```
+
+---
+
+### Lead Optimization
+
+Recommended filters:
+- Rule of Drug
+- Leadlike (strict)
+- Full structural alert analysis
+- Complexity filters
+
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_drug", "rule_of_leadlike_strict"])
+alert_filter = mc.structural.CommonAlertsFilters()
+complexity_filter = mc.complexity.ComplexityFilter(max_complexity=400)
+```
+
+---
+
+### CNS Targets
+
+Recommended filters:
+- Rule of CNS
+- Reduced PAINS criteria (CNS-focused)
+- BBB permeability constraints
+
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_cns"])
+constraints = mc.constraints.Constraints(
+    tpsa_max=90,
+    hbd_max=2,
+    mw_range=(300, 450)
+)
+```
+
+---
+
+### Fragment-Based Drug Discovery
+
+Recommended filters:
+- Rule of Three
+- Minimal complexity
+- Basic reactive group check
+
+```python
+rfilter = mc.rules.RuleFilters(rule_list=["rule_of_three"])
+complexity_filter = mc.complexity.ComplexityFilter(max_complexity=250)
+```
+
+---
+
+## Important Considerations
+
+### False Positives and False Negatives
+
+**Filters are guidelines, not absolutes:**
+
+1. **False Positives** (good drugs flagged):
+   - ~10% of marketed drugs fail Rule of Five
+   - Natural products often violate standard rules
+   - Prodrugs intentionally break rules
+   - Antibiotics and antivirals frequently non-compliant
+
+2. **False Negatives** (bad compounds passing):
+   - Passing filters doesn't guarantee success
+   - Target-specific issues not captured
+   - In vivo properties not fully predicted
+
+### Context-Specific Application
+
+**Different contexts require different criteria:**
+
+- **Target Class:** Kinases vs GPCRs vs ion channels have different optimal spaces
+- **Modality:** Small molecules vs PROTACs vs molecular glues
+- **Administration Route:** Oral vs IV vs topical
+- **Disease Area:** CNS vs oncology vs infectious disease
+- **Stage:** Screening vs hit-to-lead vs lead optimization
+
+### Complementing with Machine Learning
+
+Modern approaches combine rules with ML:
+
+```python
+# Rule-based pre-filtering
+rule_results = mc.rules.RuleFilters(rule_list=["rule_of_five"])(mols)
+filtered_mols = [mol for mol, r in zip(mols, rule_results) if r["passes"]]
+
+# ML model scoring on filtered set
+ml_scores = ml_model.predict(filtered_mols)
+
+# Combined decision
+final_candidates = [
+    mol for mol, score in zip(filtered_mols, ml_scores)
+    if score > threshold
+]
+```
+
+---
+
+## References
+
+1. Lipinski CA et al. Adv Drug Deliv Rev (1997) 23:3-25
+2. Veber DF et al. J Med Chem (2002) 45:2615-2623
+3. Oprea TI et al. J Chem Inf Comput Sci (2001) 41:1308-1315
+4. Congreve M et al. Drug Discov Today (2003) 8:876-877
+5. Baell JB & Holloway GA. J Med Chem (2010) 53:2719-2740
+6. Johnson TW et al. J Med Chem (2009) 52:5487-5500
+7. Walters WP & Murcko MA. Adv Drug Deliv Rev (2002) 54:255-271
+8. Hann MM & Oprea TI. Curr Opin Chem Biol (2004) 8:255-263
+9. Rishton GM. Drug Discov Today (1997) 2:382-384
--- a/skills/medchem/scripts/filter_molecules.py
+++ b/skills/medchem/scripts/filter_molecules.py
@@ -0,0 +1,418 @@
+#!/usr/bin/env python3
+"""
+Batch molecular filtering using medchem library.
+
+This script provides a production-ready workflow for filtering compound libraries
+using medchem rules, structural alerts, and custom constraints.
+
+Usage:
+    python filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
+    python filter_molecules.py input.sdf --rules rule_of_drug --lilly --complexity 400 --output results.csv
+    python filter_molecules.py smiles.txt --nibr --pains --n-jobs -1 --output clean.csv
+"""
+
+import argparse
+import sys
+from pathlib import Path
+from typing import List, Dict, Optional, Tuple
+import json
+
+try:
+    import pandas as pd
+    import datamol as dm
+    import medchem as mc
+    from rdkit import Chem
+    from tqdm import tqdm
+except ImportError as e:
+    print(f"Error: Missing required package: {e}")
+    print("Install dependencies: pip install medchem datamol pandas tqdm")
+    sys.exit(1)
+
+
+def load_molecules(input_file: Path, smiles_column: str = "smiles") -> Tuple[pd.DataFrame, List[Chem.Mol]]:
+    """
+    Load molecules from various file formats.
+
+    Supports:
+    - CSV/TSV with SMILES column
+    - SDF files
+    - Plain text files with one SMILES per line
+
+    Returns:
+        Tuple of (DataFrame with metadata, list of RDKit molecules)
+    """
+    suffix = input_file.suffix.lower()
+
+    if suffix == ".sdf":
+        print(f"Loading SDF file: {input_file}")
+        supplier = Chem.SDMolSupplier(str(input_file))
+        mols = [mol for mol in supplier if mol is not None]
+
+        # Create DataFrame from SDF properties
+        data = []
+        for mol in mols:
+            props = mol.GetPropsAsDict()
+            props["smiles"] = Chem.MolToSmiles(mol)
+            data.append(props)
+        df = pd.DataFrame(data)
+
+    elif suffix in [".csv", ".tsv"]:
+        print(f"Loading CSV/TSV file: {input_file}")
+        sep = "\t" if suffix == ".tsv" else ","
+        df = pd.read_csv(input_file, sep=sep)
+
+        if smiles_column not in df.columns:
+            print(f"Error: Column '{smiles_column}' not found in file")
+            print(f"Available columns: {', '.join(df.columns)}")
+            sys.exit(1)
+
+        print(f"Converting SMILES to molecules...")
+        mols = [dm.to_mol(smi) for smi in tqdm(df[smiles_column], desc="Parsing")]
+
+    elif suffix == ".txt":
+        print(f"Loading text file: {input_file}")
+        with open(input_file) as f:
+            smiles_list = [line.strip() for line in f if line.strip()]
+
+        df = pd.DataFrame({"smiles": smiles_list})
+        print(f"Converting SMILES to molecules...")
+        mols = [dm.to_mol(smi) for smi in tqdm(smiles_list, desc="Parsing")]
+
+    else:
+        print(f"Error: Unsupported file format: {suffix}")
+        print("Supported formats: .csv, .tsv, .sdf, .txt")
+        sys.exit(1)
+
+    # Filter out invalid molecules
+    valid_indices = [i for i, mol in enumerate(mols) if mol is not None]
+    if len(valid_indices) < len(mols):
+        n_invalid = len(mols) - len(valid_indices)
+        print(f"Warning: {n_invalid} invalid molecules removed")
+        df = df.iloc[valid_indices].reset_index(drop=True)
+        mols = [mols[i] for i in valid_indices]
+
+    print(f"Loaded {len(mols)} valid molecules")
+    return df, mols
+
+
+def apply_rule_filters(mols: List[Chem.Mol], rules: List[str], n_jobs: int) -> pd.DataFrame:
+    """Apply medicinal chemistry rule filters."""
+    print(f"\nApplying rule filters: {', '.join(rules)}")
+
+    rfilter = mc.rules.RuleFilters(rule_list=rules)
+    results = rfilter(mols=mols, n_jobs=n_jobs, progress=True)
+
+    # Convert to DataFrame
+    df_results = pd.DataFrame(results)
+
+    # Add summary column
+    df_results["passes_all_rules"] = df_results.all(axis=1)
+
+    return df_results
+
+
+def apply_structural_alerts(mols: List[Chem.Mol], alert_type: str, n_jobs: int) -> pd.DataFrame:
+    """Apply structural alert filters."""
+    print(f"\nApplying {alert_type} structural alerts...")
+
+    if alert_type == "common":
+        alert_filter = mc.structural.CommonAlertsFilters()
+        results = alert_filter(mols=mols, n_jobs=n_jobs, progress=True)
+
+        df_results = pd.DataFrame({
+            "has_common_alerts": [r["has_alerts"] for r in results],
+            "num_common_alerts": [r["num_alerts"] for r in results],
+            "common_alert_details": [", ".join(r["alert_details"]) if r["alert_details"] else "" for r in results]
+        })
+
+    elif alert_type == "nibr":
+        nibr_filter = mc.structural.NIBRFilters()
+        results = nibr_filter(mols=mols, n_jobs=n_jobs, progress=True)
+
+        df_results = pd.DataFrame({
+            "passes_nibr": results
+        })
+
+    elif alert_type == "lilly":
+        lilly_filter = mc.structural.LillyDemeritsFilters()
+        results = lilly_filter(mols=mols, n_jobs=n_jobs, progress=True)
+
+        df_results = pd.DataFrame({
+            "lilly_demerits": [r["demerits"] for r in results],
+            "passes_lilly": [r["passes"] for r in results],
+            "lilly_patterns": [", ".join([p["pattern"] for p in r["matched_patterns"]]) for r in results]
+        })
+
+    elif alert_type == "pains":
+        results = [mc.rules.basic_rules.pains_filter(mol) for mol in tqdm(mols, desc="PAINS")]
+
+        df_results = pd.DataFrame({
+            "passes_pains": results
+        })
+
+    else:
+        raise ValueError(f"Unknown alert type: {alert_type}")
+
+    return df_results
+
+
+def apply_complexity_filter(mols: List[Chem.Mol], max_complexity: float, method: str = "bertz") -> pd.DataFrame:
+    """Calculate molecular complexity."""
+    print(f"\nCalculating molecular complexity (method={method}, max={max_complexity})...")
+
+    complexity_scores = [
+        mc.complexity.calculate_complexity(mol, method=method)
+        for mol in tqdm(mols, desc="Complexity")
+    ]
+
+    df_results = pd.DataFrame({
+        "complexity_score": complexity_scores,
+        "passes_complexity": [score <= max_complexity for score in complexity_scores]
+    })
+
+    return df_results
+
+
+def apply_constraints(mols: List[Chem.Mol], constraints: Dict, n_jobs: int) -> pd.DataFrame:
+    """Apply custom property constraints."""
+    print(f"\nApplying constraints: {constraints}")
+
+    constraint_filter = mc.constraints.Constraints(**constraints)
+    results = constraint_filter(mols=mols, n_jobs=n_jobs, progress=True)
+
+    df_results = pd.DataFrame({
+        "passes_constraints": [r["passes"] for r in results],
+        "constraint_violations": [", ".join(r["violations"]) if r["violations"] else "" for r in results]
+    })
+
+    return df_results
+
+
+def apply_chemical_groups(mols: List[Chem.Mol], groups: List[str]) -> pd.DataFrame:
+    """Detect chemical groups."""
+    print(f"\nDetecting chemical groups: {', '.join(groups)}")
+
+    group_detector = mc.groups.ChemicalGroup(groups=groups)
+    results = group_detector.get_all_matches(mols)
+
+    df_results = pd.DataFrame()
+    for group in groups:
+        df_results[f"has_{group}"] = [bool(r.get(group)) for r in results]
+
+    return df_results
+
+
+def generate_summary(df: pd.DataFrame, output_file: Path):
+    """Generate filtering summary report."""
+    summary_file = output_file.parent / f"{output_file.stem}_summary.txt"
+
+    with open(summary_file, "w") as f:
+        f.write("=" * 80 + "\n")
+        f.write("MEDCHEM FILTERING SUMMARY\n")
+        f.write("=" * 80 + "\n\n")
+
+        f.write(f"Total molecules processed: {len(df)}\n\n")
+
+        # Rule results
+        rule_cols = [col for col in df.columns if col.startswith("rule_") or col == "passes_all_rules"]
+        if rule_cols:
+            f.write("RULE FILTERS:\n")
+            f.write("-" * 40 + "\n")
+            for col in rule_cols:
+                if col in df.columns and df[col].dtype == bool:
+                    n_pass = df[col].sum()
+                    pct = 100 * n_pass / len(df)
+                    f.write(f"  {col}: {n_pass} passed ({pct:.1f}%)\n")
+            f.write("\n")
+
+        # Structural alerts
+        alert_cols = [col for col in df.columns if "alert" in col.lower() or "nibr" in col.lower() or "lilly" in col.lower() or "pains" in col.lower()]
+        if alert_cols:
+            f.write("STRUCTURAL ALERTS:\n")
+            f.write("-" * 40 + "\n")
+            if "has_common_alerts" in df.columns:
+                n_clean = (~df["has_common_alerts"]).sum()
+                pct = 100 * n_clean / len(df)
+                f.write(f"  No common alerts: {n_clean} ({pct:.1f}%)\n")
+            if "passes_nibr" in df.columns:
+                n_pass = df["passes_nibr"].sum()
+                pct = 100 * n_pass / len(df)
+                f.write(f"  Passes NIBR: {n_pass} ({pct:.1f}%)\n")
+            if "passes_lilly" in df.columns:
+                n_pass = df["passes_lilly"].sum()
+                pct = 100 * n_pass / len(df)
+                f.write(f"  Passes Lilly: {n_pass} ({pct:.1f}%)\n")
+                avg_demerits = df["lilly_demerits"].mean()
+                f.write(f"  Average Lilly demerits: {avg_demerits:.1f}\n")
+            if "passes_pains" in df.columns:
+                n_pass = df["passes_pains"].sum()
+                pct = 100 * n_pass / len(df)
+                f.write(f"  Passes PAINS: {n_pass} ({pct:.1f}%)\n")
+            f.write("\n")
+
+        # Complexity
+        if "complexity_score" in df.columns:
+            f.write("COMPLEXITY:\n")
+            f.write("-" * 40 + "\n")
+            avg_complexity = df["complexity_score"].mean()
+            f.write(f"  Average complexity: {avg_complexity:.1f}\n")
+            if "passes_complexity" in df.columns:
+                n_pass = df["passes_complexity"].sum()
+                pct = 100 * n_pass / len(df)
+                f.write(f"  Within threshold: {n_pass} ({pct:.1f}%)\n")
+            f.write("\n")
+
+        # Constraints
+        if "passes_constraints" in df.columns:
+            f.write("CONSTRAINTS:\n")
+            f.write("-" * 40 + "\n")
+            n_pass = df["passes_constraints"].sum()
+            pct = 100 * n_pass / len(df)
+            f.write(f"  Passes all constraints: {n_pass} ({pct:.1f}%)\n")
+            f.write("\n")
+
+        # Overall pass rate
+        pass_cols = [col for col in df.columns if col.startswith("passes_")]
+        if pass_cols:
+            df["passes_all_filters"] = df[pass_cols].all(axis=1)
+            n_pass = df["passes_all_filters"].sum()
+            pct = 100 * n_pass / len(df)
+            f.write("OVERALL:\n")
+            f.write("-" * 40 + "\n")
+            f.write(f"  Molecules passing all filters: {n_pass} ({pct:.1f}%)\n")
+
+        f.write("\n" + "=" * 80 + "\n")
+
+    print(f"\nSummary report saved to: {summary_file}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Batch molecular filtering using medchem",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=__doc__
+    )
+
+    # Input/Output
+    parser.add_argument("input", type=Path, help="Input file (CSV, TSV, SDF, or TXT)")
+    parser.add_argument("--output", "-o", type=Path, required=True, help="Output CSV file")
+    parser.add_argument("--smiles-column", default="smiles", help="Name of SMILES column (default: smiles)")
+
+    # Rule filters
+    parser.add_argument("--rules", help="Comma-separated list of rules (e.g., rule_of_five,rule_of_cns)")
+
+    # Structural alerts
+    parser.add_argument("--common-alerts", action="store_true", help="Apply common structural alerts")
+    parser.add_argument("--nibr", action="store_true", help="Apply NIBR filters")
+    parser.add_argument("--lilly", action="store_true", help="Apply Lilly demerits filter")
+    parser.add_argument("--pains", action="store_true", help="Apply PAINS filter")
+
+    # Complexity
+    parser.add_argument("--complexity", type=float, help="Maximum complexity threshold")
+    parser.add_argument("--complexity-method", default="bertz", choices=["bertz", "whitlock", "barone"],
+                       help="Complexity calculation method")
+
+    # Constraints
+    parser.add_argument("--mw-range", help="Molecular weight range (e.g., 200,500)")
+    parser.add_argument("--logp-range", help="LogP range (e.g., -2,5)")
+    parser.add_argument("--tpsa-max", type=float, help="Maximum TPSA")
+    parser.add_argument("--hbd-max", type=int, help="Maximum H-bond donors")
+    parser.add_argument("--hba-max", type=int, help="Maximum H-bond acceptors")
+    parser.add_argument("--rotatable-bonds-max", type=int, help="Maximum rotatable bonds")
+
+    # Chemical groups
+    parser.add_argument("--groups", help="Comma-separated chemical groups to detect")
+
+    # Processing options
+    parser.add_argument("--n-jobs", type=int, default=-1, help="Number of parallel jobs (-1 = all cores)")
+    parser.add_argument("--no-summary", action="store_true", help="Don't generate summary report")
+    parser.add_argument("--filter-output", action="store_true", help="Only output molecules passing all filters")
+
+    args = parser.parse_args()
+
+    # Load molecules
+    df, mols = load_molecules(args.input, args.smiles_column)
+
+    # Apply filters
+    result_dfs = [df]
+
+    # Rules
+    if args.rules:
+        rule_list = [r.strip() for r in args.rules.split(",")]
+        df_rules = apply_rule_filters(mols, rule_list, args.n_jobs)
+        result_dfs.append(df_rules)
+
+    # Structural alerts
+    if args.common_alerts:
+        df_alerts = apply_structural_alerts(mols, "common", args.n_jobs)
+        result_dfs.append(df_alerts)
+
+    if args.nibr:
+        df_nibr = apply_structural_alerts(mols, "nibr", args.n_jobs)
+        result_dfs.append(df_nibr)
+
+    if args.lilly:
+        df_lilly = apply_structural_alerts(mols, "lilly", args.n_jobs)
+        result_dfs.append(df_lilly)
+
+    if args.pains:
+        df_pains = apply_structural_alerts(mols, "pains", args.n_jobs)
+        result_dfs.append(df_pains)
+
+    # Complexity
+    if args.complexity:
+        df_complexity = apply_complexity_filter(mols, args.complexity, args.complexity_method)
+        result_dfs.append(df_complexity)
+
+    # Constraints
+    constraints = {}
+    if args.mw_range:
+        mw_min, mw_max = map(float, args.mw_range.split(","))
+        constraints["mw_range"] = (mw_min, mw_max)
+    if args.logp_range:
+        logp_min, logp_max = map(float, args.logp_range.split(","))
+        constraints["logp_range"] = (logp_min, logp_max)
+    if args.tpsa_max:
+        constraints["tpsa_max"] = args.tpsa_max
+    if args.hbd_max:
+        constraints["hbd_max"] = args.hbd_max
+    if args.hba_max:
+        constraints["hba_max"] = args.hba_max
+    if args.rotatable_bonds_max:
+        constraints["rotatable_bonds_max"] = args.rotatable_bonds_max
+
+    if constraints:
+        df_constraints = apply_constraints(mols, constraints, args.n_jobs)
+        result_dfs.append(df_constraints)
+
+    # Chemical groups
+    if args.groups:
+        group_list = [g.strip() for g in args.groups.split(",")]
+        df_groups = apply_chemical_groups(mols, group_list)
+        result_dfs.append(df_groups)
+
+    # Combine results
+    df_final = pd.concat(result_dfs, axis=1)
+
+    # Filter output if requested
+    if args.filter_output:
+        pass_cols = [col for col in df_final.columns if col.startswith("passes_")]
+        if pass_cols:
+            df_final["passes_all"] = df_final[pass_cols].all(axis=1)
+            df_final = df_final[df_final["passes_all"]]
+            print(f"\nFiltered to {len(df_final)} molecules passing all filters")
+
+    # Save results
+    args.output.parent.mkdir(parents=True, exist_ok=True)
+    df_final.to_csv(args.output, index=False)
+    print(f"\nResults saved to: {args.output}")
+
+    # Generate summary
+    if not args.no_summary:
+        generate_summary(df_final, args.output)
+
+    print("\nDone!")
+
+
+if __name__ == "__main__":
+    main()