# Datamol Core API Reference

This document covers the main functions available in the datamol namespace. Usage snippets assume the conventional import `import datamol as dm`.

## Molecule Creation and Conversion

### `to_mol(mol, ...)`
Convert a SMILES string to an RDKit molecule object.
- **Parameters**: Accepts a SMILES string or an existing `rdkit.Chem.Mol`; other formats have dedicated readers (`from_inchi`, `from_smarts`, `from_selfies`)
- **Returns**: `rdkit.Chem.Mol` object, or `None` if the input cannot be parsed
- **Common usage**: `mol = dm.to_mol("CCO")`

### `from_inchi(inchi)`
Convert InChI string to molecule object.

### `from_smarts(smarts)`
Convert SMARTS pattern to molecule object.

### `from_selfies(selfies, as_mol=False)`
Convert a SELFIES string to a SMILES string, or to a molecule object when `as_mol=True`.

### `copy_mol(mol)`
Create a copy of a molecule object to avoid modifying the original.

## Molecule Export

### `to_smiles(mol, ...)`
Convert molecule object to SMILES string.
- **Common parameters**: `canonical=True`, `isomeric=True`

### `to_inchi(mol, ...)`
Convert molecule to InChI string representation.

### `to_inchikey(mol)`
Convert molecule to InChIKey (a fixed-length, 27-character hash of the InChI).

### `to_smarts(mol)`
Convert molecule to SMARTS pattern.

### `to_selfies(mol)`
Convert molecule to SELFIES (Self-Referencing Embedded Strings) format.

## Sanitization and Standardization

### `sanitize_mol(mol, ...)`
Enhanced version of RDKit's sanitize operation using mol→SMILES→mol conversion and aromatic nitrogen fixing.
- **Purpose**: Fix common molecular structure issues
- **Returns**: Sanitized molecule or None if sanitization fails

### `standardize_mol(mol, disconnect_metals=False, normalize=True, reionize=True, ...)`
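Apply standardization steps built on RDKit's `rdMolStandardize` module:
- Metal disconnection (off by default; enable with `disconnect_metals=True`)
- Normalization (standard transformations that correct functional groups and recombine charges)
- Reionization (ensures the strongest acid groups are ionized first)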

### `standardize_smiles(smiles, ...)`
Apply SMILES standardization procedures directly to a SMILES string.

### `fix_mol(mol)`
Attempt to fix molecular structure issues automatically.

### `fix_valence(mol)`
Correct valence errors in molecular structures.

## Atom Ordering and Hydrogen Handling

### `reorder_atoms(mol, ...)`
Reorder atoms so that the same molecule always gets the same atom order, regardless of the SMILES string it was parsed from.
- **Purpose**: Maintain reproducible feature generation

### `remove_hs(mol, ...)`
Remove hydrogen atoms from molecular structure.

### `add_hs(mol, ...)`
Add explicit hydrogen atoms to molecular structure.

## Fingerprints and Similarity

### `to_fp(mol, fp_type='ecfp', ...)`
Generate molecular fingerprints for similarity calculations.
- **Fingerprint types**:
  - `'ecfp'` - Extended Connectivity Fingerprints (Morgan)
  - `'fcfp'` - Functional Connectivity Fingerprints
  - `'maccs'` - MACCS keys
  - `'topological'` - Topological fingerprints
  - `'atompair'` - Atom pair fingerprints
- **Common parameters**: `n_bits`, `radius`
- **Returns**: NumPy array by default (`as_array=True`); the RDKit fingerprint object when `as_array=False`

### `pdist(mols, ...)`
Calculate pairwise Tanimoto distances between all molecules in a list.
- **Supports**: Parallel processing via `n_jobs` parameter
- **Returns**: Distance matrix

### `cdist(mols1, mols2, ...)`
Calculate Tanimoto distances between two sets of molecules.
- **Returns**: Distance matrix of shape `(len(mols1), len(mols2))`

## Clustering and Diversity

### `cluster_mols(mols, cutoff=0.2, feature_fn=None, n_jobs=1)`
Cluster molecules using Butina clustering algorithm.
- **Parameters**:
  - `cutoff`: Distance threshold (default 0.2)
  - `feature_fn`: Custom function for molecular features
  - `n_jobs`: Parallelization (-1 for all cores)
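- **Important**: Builds the full pairwise distance matrix, whose size grows quadratically with the number of molecules; suitable for ~1,000 structures, not for 10,000+
- **Returns**: Tuple `(cluster_indices, cluster_mols)`: one tuple of molecule indices per cluster (the first index is the cluster centroid) and the matching molecules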

### `pick_diverse(mols, npick, ...)`
Select a diverse subset of `npick` molecules using MaxMin picking on fingerprint distances.

### `pick_centroids(mols, npick, ...)`
Select centroid molecules representing clusters.

## Graph Operations

### `to_graph(mol)`
Convert molecule to graph representation for graph-based analysis.

### `get_all_path_between(mol, start, end)`
Find all simple paths (no repeated atoms) between two atoms, given by their indices.

## DataFrame Integration

### `to_df(mols, smiles_column='smiles', mol_column='mol')`
Convert list of molecules to pandas DataFrame.

### `from_df(df, smiles_column='smiles', mol_column='mol')`
Convert pandas DataFrame to list of molecules.