# BioServices: Complete Services Reference

This document provides a comprehensive reference for all major services available in BioServices, including key methods, parameters, and use cases.

## Protein & Gene Resources

### UniProt

Protein sequence and functional information database.

**Initialization:**
```python
from bioservices import UniProt
u = UniProt(verbose=False)
```

**Key Methods:**

- `search(query, frmt="tab", columns=None, limit=None, sort=None, compress=False, include=False, **kwargs)`
  - Search UniProt with flexible query syntax
  - `frmt`: "tab", "fasta", "xml", "rdf", "gff", "txt"
  - `columns`: Comma-separated list (e.g., "id,genes,organism,length")
  - Returns: String in requested format

- `retrieve(uniprot_id, frmt="txt")`
  - Retrieve specific UniProt entry
  - `frmt`: "txt", "fasta", "xml", "rdf", "gff"
  - Returns: Entry data in requested format

- `mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")`
  - Convert identifiers between databases
  - `fr`/`to`: Database identifiers (see identifier_mapping.md)
  - `query`: Single ID or comma-separated list
  - Returns: Dictionary mapping input to output IDs

- `searchUniProtId(pattern, columns="entry name,length,organism", limit=100)`
  - Convenience method for ID-based searches
  - Returns: Tab-separated values

**Common columns:** id, entry name, genes, organism, protein names, length, sequence, go-id, ec, pathway, interactor

**Use cases:**
- Protein sequence retrieval for BLAST
- Functional annotation lookup
- Cross-database identifier mapping
- Batch protein information retrieval

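Because `search` returns tab-separated text and `mapping` accepts a comma-separated `query`, two small helpers are often handy around these calls. The following is a minimal sketch using only the standard library; `parse_tab_output` and `join_ids` are hypothetical helper names, and the sample text is hand-written for illustration (the actual header names depend on the `columns` requested and the BioServices version).

```python
import csv
import io


def parse_tab_output(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def join_ids(ids):
    """Build the comma-separated string accepted by a `query` argument."""
    return ",".join(i.strip() for i in ids if i.strip())


# Hand-written sample standing in for a real search() response (assumption).
sample = "Entry\tEntry Name\tLength\nP43403\tZAP70_HUMAN\t619\n"
rows = parse_tab_output(sample)
print(rows[0]["Entry"], rows[0]["Length"])  # P43403 619
print(join_ids(["P43403", " P29317 "]))     # P43403,P29317
```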
---

### KEGG (Kyoto Encyclopedia of Genes and Genomes)

Metabolic pathways, genes, and organisms database.

**Initialization:**
```python
from bioservices import KEGG
k = KEGG()
k.organism = "hsa"  # Set default organism
```

**Key Methods:**

- `list(database)`
  - List entries in KEGG database
  - `database`: "organism", "pathway", "module", "disease", "drug", "compound"
  - Returns: Multi-line string with entries

- `find(database, query)`
  - Search database by keywords
  - Returns: List of matching entries with IDs

- `get(entry_id)`
  - Retrieve entry by ID
  - Supports genes, pathways, compounds, etc.
  - Returns: Raw entry text

- `parse(data)`
  - Parse KEGG entry into dictionary
  - Returns: Dict with structured data

- `lookfor_organism(name)`
  - Search organisms by name pattern
  - Returns: List of matching organism codes

- `lookfor_pathway(name)`
  - Search pathways by name
  - Returns: List of pathway IDs

- `get_pathway_by_gene(gene_id, organism)`
  - Find pathways containing gene
  - Returns: List of pathway IDs

- `parse_kgml_pathway(pathway_id)`
  - Parse pathway KGML for interactions
  - Returns: Dict with "entries" and "relations"

- `pathway2sif(pathway_id)`
  - Extract Simple Interaction Format data
  - Filters for activation/inhibition
  - Returns: List of interaction tuples

**Organism codes:**
- hsa: Homo sapiens
- mmu: Mus musculus
- dme: Drosophila melanogaster
- sce: Saccharomyces cerevisiae
- eco: Escherichia coli

**Use cases:**
- Pathway analysis and visualization
- Gene function annotation
- Metabolic network reconstruction
- Protein-protein interaction extraction

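Since `list` returns a multi-line string, a common first step is turning that text into a lookup table. The sketch below assumes each line holds an entry ID and a description separated by a tab; `parse_kegg_listing` and `matching` are hypothetical helper names, and the sample text is hand-written rather than a captured response.

```python
def parse_kegg_listing(text):
    """Map entry IDs to descriptions from tab-separated listing text."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        entries[entry_id.strip()] = description.strip()
    return entries


def matching(entries, keyword):
    """Return the IDs whose description contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [eid for eid, desc in entries.items() if needle in desc.lower()]


# Hand-written sample standing in for k.list("pathway") output (assumption).
sample = (
    "hsa04660\tT cell receptor signaling pathway\n"
    "hsa04064\tNF-kappa B signaling pathway\n"
)
pathways = parse_kegg_listing(sample)
print(matching(pathways, "nf-kappa"))  # ['hsa04064']
```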
---

### HGNC (HUGO Gene Nomenclature Committee)

Official human gene naming authority.

**Initialization:**
```python
from bioservices import HGNC
h = HGNC()
```

**Key Methods:**
- `search(query)`: Search gene symbols/names
- `fetch(format, query)`: Retrieve gene information

**Use cases:**
- Standardizing human gene names
- Looking up official gene symbols

---

### MyGeneInfo

Gene annotation and query service.

**Initialization:**
```python
from bioservices import MyGeneInfo
m = MyGeneInfo()
```

**Key Methods:**
- `querymany(ids, scopes, fields, species)`: Batch gene queries
- `getgene(geneid)`: Get gene annotation

**Use cases:**
- Batch gene annotation retrieval
- Gene ID conversion

---

## Chemical Compound Resources

### ChEBI (Chemical Entities of Biological Interest)

Dictionary of molecular entities.

**Initialization:**
```python
from bioservices import ChEBI
c = ChEBI()
```

**Key Methods:**
- `getCompleteEntity(chebi_id)`: Full compound information
- `getLiteEntity(chebi_id)`: Basic information
- `getCompleteEntityByList(chebi_ids)`: Batch retrieval

**Use cases:**
- Small molecule information
- Chemical structure data
- Compound property lookup

---

### ChEMBL

Bioactive drug-like compound database.

**Initialization:**
```python
from bioservices import ChEMBL
c = ChEMBL()
```

**Key Methods:**
- `get_molecule_form(chembl_id)`: Compound details
- `get_target(chembl_id)`: Target information
- `get_similarity(chembl_id)`: Compounds similar to the given compound
- `get_assays()`: Bioassay data

**Use cases:**
- Drug discovery data
- Finding similar compounds
- Bioactivity information
- Target-compound relationships

---

### UniChem

Chemical identifier mapping service.

**Initialization:**
```python
from bioservices import UniChem
u = UniChem()
```

**Key Methods:**
- `get_compound_id_from_kegg(kegg_id)`: KEGG → ChEMBL
- `get_all_compound_ids(src_compound_id, src_id)`: Get all IDs
- `get_src_compound_ids(src_compound_id, from_src_id, to_src_id)`: Convert IDs

**Source IDs:**
- 1: ChEMBL
- 2: DrugBank
- 3: PDB
- 6: KEGG
- 7: ChEBI
- 22: PubChem

**Use cases:**
- Cross-database compound ID mapping
- Linking chemical databases

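The numeric source IDs above are easy to mistype, so it can help to keep them in one table and look them up by name. This is a minimal sketch; `UNICHEM_SOURCES` and `source_id` are hypothetical names, and the table contains only the IDs listed in this section.

```python
# Source IDs as listed in this reference (not the full UniChem source list).
UNICHEM_SOURCES = {
    1: "ChEMBL",
    2: "DrugBank",
    3: "PDB",
    6: "KEGG",
    7: "ChEBI",
    22: "PubChem",
}


def source_id(name):
    """Return the numeric source ID for a database name (case-insensitive)."""
    wanted = name.strip().lower()
    for sid, label in UNICHEM_SOURCES.items():
        if label.lower() == wanted:
            return sid
    raise KeyError(f"Unknown UniChem source: {name!r}")


# The resulting numbers are what from_src_id / to_src_id style arguments expect.
print(source_id("KEGG"), source_id("chembl"))  # 6 1
```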
---

### PubChem

Chemical compound database from NIH.

**Initialization:**
```python
from bioservices import PubChem
p = PubChem()
```

**Key Methods:**
- `get_compounds(identifier, namespace)`: Retrieve compounds
- `get_properties(properties, identifier, namespace)`: Get properties

**Use cases:**
- Chemical structure retrieval
- Compound property information

---

## Sequence Analysis Tools

### NCBIblast

Sequence similarity searching.

**Initialization:**
```python
from bioservices import NCBIblast
s = NCBIblast(verbose=False)
```

**Key Methods:**
- `run(program, sequence, stype, database, email, **params)`
  - Submit BLAST job
  - `program`: "blastp", "blastn", "blastx", "tblastn", "tblastx"
  - `stype`: "protein" or "dna"
  - `database`: "uniprotkb", "pdb", "refseq_protein", etc.
  - `email`: Required by the EBI-hosted BLAST service that BioServices wraps
  - Returns: Job ID

- `getStatus(jobid)`
  - Check job status
  - Returns: "RUNNING", "FINISHED", "ERROR"

- `getResult(jobid, result_type)`
  - Retrieve results
  - `result_type`: "out" (default), "ids", "xml"

**Important:** BLAST jobs are asynchronous. Always check status before retrieving results.

**Use cases:**
- Protein homology searches
- Sequence similarity analysis
- Functional annotation by homology

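The asynchronous workflow described above (submit, check status, then fetch) can be sketched as a small polling loop. `wait_for_job` is a hypothetical helper, not part of BioServices; it takes the status function as an argument so that it can be exercised here with a stand-in instead of a live job. With a real instance one would presumably pass `s.getStatus` and call `s.getResult(jobid, "out")` only when the returned status is "FINISHED".

```python
import time


def wait_for_job(get_status, jobid, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status(jobid) until it stops reporting "RUNNING".

    Returns the final status string, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = get_status(jobid)
        if status != "RUNNING":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Job {jobid} still running after {max_polls} polls")


# Stand-in for a status call: reports RUNNING twice, then FINISHED.
_replies = iter(["RUNNING", "RUNNING", "FINISHED"])
final = wait_for_job(lambda jobid: next(_replies), "demo-job", poll_seconds=0)
print(final)  # FINISHED
```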
---

## Pathway & Interaction Resources

### Reactome

Pathway database.

**Initialization:**
```python
from bioservices import Reactome
r = Reactome()
```

**Key Methods:**
- `get_pathway_by_id(pathway_id)`: Pathway details
- `search_pathway(query)`: Search pathways

**Use cases:**
- Human pathway analysis
- Biological process annotation

---

### PSICQUIC

Protein interaction query service (federates 30+ databases).

**Initialization:**
```python
from bioservices import PSICQUIC
s = PSICQUIC()
```

**Key Methods:**
- `query(database, query_string)`
  - Query specific interaction database
  - Returns: PSI-MI TAB format

- `activeDBs`
  - Property listing available databases
  - Returns: List of database names

**Available databases:** MINT, IntAct, BioGRID, DIP, InnateDB, MatrixDB, MPIDB, UniProt, and others (30+ in total)

**Query syntax:** Supports AND, OR, species filters
- Example: "ZAP70 AND species:9606"

**Use cases:**
- Protein-protein interaction discovery
- Network analysis
- Interactome mapping

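To illustrate the query syntax and the PSI-MI TAB output, the sketch below builds a query string like the example above and pulls interactor pairs out of tab-separated result lines. `build_query` and `interactor_pairs` are hypothetical helpers; the code assumes the first two PSI-MI TAB columns hold the identifiers of interactors A and B, accepts either one string or a list of lines because the return type may differ between versions, and uses a hand-written sample line.

```python
def build_query(*terms, species=None):
    """Join search terms with AND and optionally add a species filter."""
    parts = [t.strip() for t in terms if t.strip()]
    if species is not None:
        parts.append(f"species:{species}")
    return " AND ".join(parts)


def interactor_pairs(mitab):
    """Return (A, B) identifier pairs from the first two PSI-MI TAB columns."""
    lines = mitab.splitlines() if isinstance(mitab, str) else mitab
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment/header lines
        cols = line.split("\t")
        if len(cols) >= 2:
            pairs.append((cols[0], cols[1]))
    return pairs


print(build_query("ZAP70", species=9606))  # ZAP70 AND species:9606

# Hand-written line in PSI-MI TAB layout, truncated to three columns (assumption).
sample = "uniprotkb:P43403\tuniprotkb:P06239\t-\n"
print(interactor_pairs(sample))
```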
---

### IntactComplex

Protein complex database.

**Initialization:**
```python
from bioservices import IntactComplex
i = IntactComplex()
```

**Key Methods:**
- `search(query)`: Search complexes
- `details(complex_ac)`: Complex details

**Use cases:**
- Protein complex composition
- Multi-protein assembly analysis

---

### OmniPath

Integrated signaling pathway database.

**Initialization:**
```python
from bioservices import OmniPath
o = OmniPath()
```

**Key Methods:**
- `interactions(datasets, organisms)`: Get interactions
- `ptms(datasets, organisms)`: Post-translational modifications

**Use cases:**
- Cell signaling analysis
- Regulatory network mapping

---

## Gene Ontology

### QuickGO

Gene Ontology annotation service.

**Initialization:**
```python
from bioservices import QuickGO
g = QuickGO()
```

**Key Methods:**
- `Term(go_id, frmt="obo")`
  - Retrieve GO term information
  - Returns: Term definition and metadata

- `Annotation(protein=None, goid=None, format="tsv")`
  - Get GO annotations
  - Returns: Annotations in requested format

**GO categories:**
- Biological Process (BP)
- Molecular Function (MF)
- Cellular Component (CC)

**Use cases:**
- Functional annotation
- Enrichment analysis
- GO term lookup

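Before sending a term to the service, it can be worth checking that the identifier is well formed and expanding the category abbreviations listed above. A minimal sketch follows, assuming the usual `GO:` prefix followed by seven digits; `is_go_id` and `aspect_name` are hypothetical helpers.

```python
import re

# Abbreviations as listed in this section.
GO_ASPECTS = {
    "BP": "Biological Process",
    "MF": "Molecular Function",
    "CC": "Cellular Component",
}


def is_go_id(value):
    """True if value looks like a GO identifier such as GO:0005515."""
    return re.fullmatch(r"GO:\d{7}", value) is not None


def aspect_name(code):
    """Expand a category abbreviation (BP, MF, CC) to its full name."""
    return GO_ASPECTS[code.upper()]


print(is_go_id("GO:0005515"), is_go_id("GO:515"))  # True False
print(aspect_name("bp"))                           # Biological Process
```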
---

## Genomic Resources

### BioMart

Data mining tool for genomic data.

**Initialization:**
```python
from bioservices import BioMart
b = BioMart()
```

**Key Methods:**
- `datasets(dataset)`: List available datasets
- `attributes(dataset)`: List attributes
- `query(query_xml)`: Execute BioMart query

**Use cases:**
- Bulk genomic data retrieval
- Custom genome annotations
- SNP information

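Since `query` takes an XML document, here is a hedged sketch of assembling one with the standard library. `build_biomart_query` is a hypothetical helper; the element and attribute names follow the BioMart XML query layout as commonly documented, and the dataset, filter, and attribute names in the usage lines are example values that should be checked against the output of `datasets` and `attributes`.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart-style XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict rows; attributes select the columns to return.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Example values only (assumptions): verify names against the live service.
xml_text = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "21"},
)
print(xml_text)
```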
---

### ArrayExpress

Gene expression database.

**Initialization:**
```python
from bioservices import ArrayExpress
a = ArrayExpress()
```

**Key Methods:**
- `queryExperiments(keywords)`: Search experiments
- `retrieveExperiment(accession)`: Get experiment data

**Use cases:**
- Gene expression data
- Microarray analysis
- RNA-seq data retrieval

---

### ENA (European Nucleotide Archive)

Nucleotide sequence database.

**Initialization:**
```python
from bioservices import ENA
e = ENA()
```

**Key Methods:**
- `search_data(query)`: Search sequences
- `retrieve_data(accession)`: Retrieve sequences

**Use cases:**
- Nucleotide sequence retrieval
- Genome assembly access

---

## Structural Biology

### PDB (Protein Data Bank)

3D protein structure database.

**Initialization:**
```python
from bioservices import PDB
p = PDB()
```

**Key Methods:**
- `get_file(pdb_id, file_format)`: Download structure files
- `search(query)`: Search structures

**File formats:** pdb, cif, xml

**Use cases:**
- 3D structure retrieval
- Structure-based analysis
- PyMOL visualization

---

### Pfam

Protein family database.

**Initialization:**
```python
from bioservices import Pfam
p = Pfam()
```

**Key Methods:**
- `searchSequence(sequence)`: Find domains in sequence
- `getPfamEntry(pfam_id)`: Domain information

**Use cases:**
- Protein domain identification
- Family classification
- Functional motif discovery

---

## Specialized Resources

### BioModels

Systems biology model repository.

**Initialization:**
```python
from bioservices import BioModels
b = BioModels()
```

**Key Methods:**
- `get_model_by_id(model_id)`: Retrieve SBML model

**Use cases:**
- Systems biology modeling
- SBML model retrieval

---

### COG (Clusters of Orthologous Genes)

Orthologous gene classification.

**Initialization:**
```python
from bioservices import COG
c = COG()
```

**Use cases:**
- Orthology analysis
- Functional classification

---

### BiGG Models

Metabolic network models.

**Initialization:**
```python
from bioservices import BiGG
b = BiGG()
```

**Key Methods:**
- `list_models()`: Available models
- `get_model(model_id)`: Model details

**Use cases:**
- Metabolic network analysis
- Flux balance analysis

---

## General Patterns

### Error Handling

Any service call may raise an exception. Wrap calls in try-except:

```python
# `service`, `method`, and `params` are placeholders for a concrete call
try:
    result = service.method(params)
    if result:
        # Process result
        pass
except Exception as e:
    print(f"Error: {e}")
```

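When many calls need the same treatment, the try-except pattern can be factored into a helper. The sketch below is one possible shape, not a BioServices feature; `safe_call` is a hypothetical name, and it treats an empty result the same way as a failure, mirroring the `if result:` check in the pattern above.

```python
import logging

logger = logging.getLogger("bioservices_examples")


def safe_call(func, *args, default=None, **kwargs):
    """Call func(*args, **kwargs); return `default` on error or empty result."""
    try:
        result = func(*args, **kwargs)
    except Exception as exc:  # broad on purpose: services differ in what they raise
        logger.warning("Call to %s failed: %s", getattr(func, "__name__", func), exc)
        return default
    return result if result else default


def flaky_lookup(identifier):
    """Stand-in for a service method that fails on unknown input."""
    if identifier != "P43403":
        raise ValueError(f"unknown identifier {identifier}")
    return "ZAP70_HUMAN"


print(safe_call(flaky_lookup, "P43403"))                 # ZAP70_HUMAN
print(safe_call(flaky_lookup, "bogus", default="n/a"))   # n/a
```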
### Verbosity Control

Most services support a `verbose` parameter:
```python
service = Service(verbose=False)  # Suppress HTTP logs
```

### Rate Limiting

Services have timeouts and rate limits:
```python
service.TIMEOUT = 30  # Adjust timeout
service.DELAY = 1  # Delay between requests (if supported)
```

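Where a service offers no built-in delay, spacing out requests on the client side is a simple alternative. The following sketch is independent of BioServices; `Throttle` is a hypothetical class that sleeps just long enough to keep a minimum interval between consecutive calls.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=0.01)
for accession in ["P43403", "P29317", "P06239"]:
    throttle.wait()  # a service call for `accession` would follow here
    print("would query", accession)
```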
### Output Formats

Common format parameters:
- `frmt`: "xml", "json", "tab", "txt", "fasta"
- `format`: Service-specific variants

### Caching

Some services cache results:
```python
service.CACHE = True  # Enable caching
service.clear_cache()  # Clear cache
```

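For services without caching support, repeated lookups can also be memoized on the client side. This sketch is separate from the library's own cache; `memoize_calls` is a hypothetical decorator that stores results keyed by positional arguments, and `fetch_entry` is a stand-in for a real service call.

```python
import functools


def memoize_calls(func):
    """Cache results of a fetch function keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.clear_cache = cache.clear  # allow callers to reset the cache
    return wrapper


calls = []


@memoize_calls
def fetch_entry(identifier):
    """Stand-in for a slow remote lookup; records each real invocation."""
    calls.append(identifier)
    return f"entry for {identifier}"


fetch_entry("P43403")
fetch_entry("P43403")  # served from the cache, no second invocation
print(len(calls))  # 1
```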
## Additional Resources

For detailed API documentation:
- Official docs: https://bioservices.readthedocs.io/
- Individual service docs linked from main page
- Source code: https://github.com/cokelaer/bioservices