Initial commit
130
skills/datamol/references/core_api.md
Normal file
@@ -0,0 +1,130 @@
# Datamol Core API Reference

This document covers the main functions available in the top-level `datamol` namespace, conventionally imported as `dm`.

## Molecule Creation and Conversion

### `to_mol(mol, ...)`

Convert a SMILES string (or an existing molecule) to an RDKit molecule object.

- **Parameters**: Accepts a SMILES string or an existing `rdkit.Chem.Mol`; for InChI, SMARTS, or SELFIES input, use the dedicated `from_*` functions in this section
- **Returns**: `rdkit.Chem.Mol` object, or `None` if the input cannot be parsed
- **Common usage**: `mol = dm.to_mol("CCO")`

### `from_inchi(inchi)`

Convert InChI string to molecule object.

### `from_smarts(smarts)`

Convert SMARTS pattern to molecule object.

### `from_selfies(selfies)`

Convert SELFIES string to molecule object.

### `copy_mol(mol)`

Create a copy of a molecule object to avoid modifying the original.

## Molecule Export

### `to_smiles(mol, ...)`

Convert molecule object to SMILES string.

- **Common parameters**: `canonical=True`, `isomeric=True`

### `to_inchi(mol, ...)`

Convert molecule to InChI string representation.

### `to_inchikey(mol)`

Convert molecule to InChI key (fixed-length hash).

### `to_smarts(mol)`

Convert molecule to SMARTS pattern.

### `to_selfies(mol)`

Convert molecule to SELFIES (Self-Referencing Embedded Strings) format.

## Sanitization and Standardization

### `sanitize_mol(mol, ...)`

Enhanced version of RDKit's sanitize operation using mol→SMILES→mol conversion and aromatic nitrogen fixing.

- **Purpose**: Fix common molecular structure issues
- **Returns**: Sanitized molecule or None if sanitization fails

### `standardize_mol(mol, disconnect_metals=False, normalize=True, reionize=True, ...)`

Apply comprehensive standardization procedures including:

- Metal disconnection
- Normalization (charge corrections)
- Reionization
- Fragment handling (largest fragment selection)

### `standardize_smiles(smiles, ...)`

Apply SMILES standardization procedures directly to a SMILES string.

### `fix_mol(mol)`

Attempt to fix molecular structure issues automatically.

### `fix_valence(mol)`

Correct valence errors in molecular structures.

## Molecular Properties

### `reorder_atoms(mol, ...)`

Ensure consistent atom ordering for the same molecule regardless of original SMILES representation.

- **Purpose**: Maintain reproducible feature generation

### `remove_hs(mol, ...)`

Remove hydrogen atoms from molecular structure.

### `add_hs(mol, ...)`

Add explicit hydrogen atoms to molecular structure.

## Fingerprints and Similarity

### `to_fp(mol, fp_type='ecfp', ...)`

Generate molecular fingerprints for similarity calculations.

- **Fingerprint types**:
  - `'ecfp'` - Extended Connectivity Fingerprints (Morgan)
  - `'fcfp'` - Functional Connectivity Fingerprints
  - `'maccs'` - MACCS keys
  - `'topological'` - Topological fingerprints
  - `'atompair'` - Atom pair fingerprints
- **Common parameters**: `n_bits`, `radius`
- **Returns**: Numpy array or RDKit fingerprint object

### `pdist(mols, ...)`

Calculate pairwise Tanimoto distances between all molecules in a list.

- **Supports**: Parallel processing via `n_jobs` parameter
- **Returns**: Distance matrix

### `cdist(mols1, mols2, ...)`

Calculate Tanimoto distances between two sets of molecules.

## Clustering and Diversity

### `cluster_mols(mols, cutoff=0.2, feature_fn=None, n_jobs=1)`

Cluster molecules using the Butina clustering algorithm.

- **Parameters**:
  - `cutoff`: Distance threshold (default 0.2)
  - `feature_fn`: Custom function for molecular features
  - `n_jobs`: Parallelization (-1 for all cores)
- **Important**: Computes all pairwise distances, so cost grows quadratically with the number of molecules - suitable for ~1,000 structures, not for 10,000+
- **Returns**: A pair `(cluster_indices, cluster_mols)`: the clusters as groups of molecule indices, and the same clusters as lists of molecules

### `pick_diverse(mols, npick, ...)`

Select diverse subset of molecules based on fingerprint diversity.

### `pick_centroids(mols, npick, ...)`

Select centroid molecules representing clusters.

## Graph Operations

### `to_graph(mol)`

Convert molecule to graph representation for graph-based analysis.

### `get_all_path_between(mol, start, end)`

Find all paths between two atoms in molecular structure.

## DataFrame Integration

### `to_df(mols, smiles_column='smiles', mol_column='mol')`

Convert list of molecules to pandas DataFrame.

### `from_df(df, smiles_column='smiles', mol_column='mol')`

Convert pandas DataFrame to list of molecules.