BioServices: Complete Services Reference
This document provides a comprehensive reference for all major services available in BioServices, including key methods, parameters, and use cases.
Protein & Gene Resources
UniProt
Protein sequence and functional information database.
Initialization:
from bioservices import UniProt
u = UniProt(verbose=False)
Key Methods:
- search(query, frmt="tab", columns=None, limit=None, sort=None, compress=False, include=False, **kwargs): Search UniProt with flexible query syntax
  - frmt: "tab", "fasta", "xml", "rdf", "gff", "txt"
  - columns: Comma-separated list (e.g., "id,genes,organism,length")
  - Returns: String in requested format
- retrieve(uniprot_id, frmt="txt"): Retrieve a specific UniProt entry
  - frmt: "txt", "fasta", "xml", "rdf", "gff"
  - Returns: Entry data in requested format
- mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403"): Convert identifiers between databases
  - fr/to: Database identifiers (see identifier_mapping.md)
  - query: Single ID or comma-separated list
  - Returns: Dictionary mapping input to output IDs
- searchUniProtId(pattern, columns="entry name,length,organism", limit=100): Convenience method for ID-based searches
  - Returns: Tab-separated values
Common columns: id, entry name, genes, organism, protein names, length, sequence, go-id, ec, pathway, interactor
Use cases:
- Protein sequence retrieval for BLAST
- Functional annotation lookup
- Cross-database identifier mapping
- Batch protein information retrieval
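Tab-separated output from `search(..., frmt="tab")` is easiest to work with once each row is a dictionary. The following is a minimal sketch that assumes the first non-empty line is a header row; the sample string is illustrative only, since the real header labels depend on the columns you request.

```python
import csv
import io


def parse_tab_output(text):
    """Convert tab-separated search output into a list of dicts.

    Assumes the first non-empty line is a header row; every following
    line becomes one dict keyed by those header labels.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    reader = csv.DictReader(
        io.StringIO("\n".join(lines)), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


# Illustrative input only; real UniProt header labels may differ.
sample = "Entry\tEntry name\tOrganism\nP43403\tZAP70_HUMAN\tHomo sapiens (Human)\n"
rows = parse_tab_output(sample)
print(rows[0]["Entry name"])  # ZAP70_HUMAN
```

In practice you would pass the string returned by `u.search(...)` straight into `parse_tab_output`. `QUOTE_NONE` is used so that stray quote characters in protein names are kept verbatim instead of being treated as CSV quoting.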
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Metabolic pathways, genes, and organisms database.
Initialization:
from bioservices import KEGG
k = KEGG()
k.organism = "hsa" # Set default organism
Key Methods:
- list(database): List entries in a KEGG database
  - database: "organism", "pathway", "module", "disease", "drug", "compound"
  - Returns: Multi-line string with entries
- find(database, query): Search a database by keywords
  - Returns: List of matching entries with IDs
- get(entry_id): Retrieve an entry by ID
  - Supports genes, pathways, compounds, etc.
  - Returns: Raw entry text
- parse(data): Parse a KEGG entry into a dictionary
  - Returns: Dict with structured data
- lookfor_organism(name): Search organisms by name pattern
  - Returns: List of matching organism codes
- lookfor_pathway(name): Search pathways by name
  - Returns: List of pathway IDs
- get_pathway_by_gene(gene_id, organism): Find pathways containing a gene
  - Returns: List of pathway IDs
- parse_kgml_pathway(pathway_id): Parse pathway KGML for interactions
  - Returns: Dict with "entries" and "relations"
- pathway2sif(pathway_id): Extract Simple Interaction Format (SIF) data
  - Keeps only activation/inhibition relations
  - Returns: List of interaction tuples
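Interaction tuples are more convenient for network queries once grouped by source node. This sketch assumes each tuple from `pathway2sif` has the shape `(source, sign, target)`, with sign `"1"` for activation and `"-1"` for inhibition; that layout is an assumption, so check it against the output of your installed version. The triples below are hypothetical.

```python
from collections import defaultdict


def sif_to_adjacency(interactions):
    """Group (source, sign, target) triples into {source: {target: sign}}.

    ASSUMPTION: sign is "1" (activation) or "-1" (inhibition).
    """
    graph = defaultdict(dict)
    for source, sign, target in interactions:
        graph[source][target] = int(sign)
    return dict(graph)


# Hypothetical triples for illustration only.
triples = [("A", "1", "B"), ("B", "-1", "C"), ("A", "-1", "C")]
adj = sif_to_adjacency(triples)

# Which nodes activate B?
activators_of_b = [src for src, targets in adj.items() if targets.get("B") == 1]
print(activators_of_b)  # ['A']
```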
Organism codes:
- hsa: Homo sapiens
- mmu: Mus musculus
- dme: Drosophila melanogaster
- sce: Saccharomyces cerevisiae
- eco: Escherichia coli
Use cases:
- Pathway analysis and visualization
- Gene function annotation
- Metabolic network reconstruction
- Protein-protein interaction extraction
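`list()` returns one multi-line string, which can be turned into an ID-to-description dictionary for lookups and keyword filtering. A minimal sketch, assuming one entry per line with the ID and description separated by a tab (the layout of the KEGG REST "list" operation); the sample text is illustrative.

```python
def parse_kegg_list(text):
    """Parse the multi-line string returned by KEGG.list() into a dict.

    Assumes each line is "<entry id>\t<description>".
    """
    entries = {}
    for line in text.strip().splitlines():
        entry_id, _, description = line.partition("\t")
        entries[entry_id] = description
    return entries


# Illustrative input only; real output depends on the database listed.
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
pathways = parse_kegg_list(sample)
glycolysis = [pid for pid, desc in pathways.items() if "Glycolysis" in desc]
print(glycolysis)  # ['hsa00010']
```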
HGNC (HUGO Gene Nomenclature Committee)
Official human gene naming authority.
Initialization:
from bioservices import HGNC
h = HGNC()
Key Methods:
- search(query): Search gene symbols/names
- fetch(database, query): Retrieve gene information for a field/value pair (e.g., fetch("symbol", "ZNF3"))
Use cases:
- Standardizing human gene names
- Looking up official gene symbols
MyGeneInfo
Gene annotation and query service.
Initialization:
from bioservices import MyGeneInfo
m = MyGeneInfo()
Key Methods:
- querymany(ids, scopes, fields, species): Batch gene queries
- getgene(geneid): Get gene annotation
Use cases:
- Batch gene annotation retrieval
- Gene ID conversion
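For long identifier lists, splitting the input into fixed-size batches keeps each request small. In this sketch `fetch_batch` is a hypothetical callable (for example a lambda wrapping `querymany` with your chosen scopes, fields and species), and the default batch size of 100 is an arbitrary assumption, not a documented service limit.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` items from `ids`."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def annotate_in_batches(ids, fetch_batch, size=100):
    """Call the hypothetical fetch_batch once per batch and concatenate results."""
    results = []
    for batch in chunked(ids, size):
        results.extend(fetch_batch(batch))
    return results


# Offline demo with a stand-in fetcher that just doubles each "ID".
print(annotate_in_batches(range(5), lambda batch: [x * 2 for x in batch], size=2))
# [0, 2, 4, 6, 8]
```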
Chemical Compound Resources
ChEBI (Chemical Entities of Biological Interest)
Dictionary of molecular entities.
Initialization:
from bioservices import ChEBI
c = ChEBI()
Key Methods:
- getCompleteEntity(chebi_id): Full compound information
- getLiteEntity(chebi_id): Basic information
- getCompleteEntityByList(chebi_ids): Batch retrieval
Use cases:
- Small molecule information
- Chemical structure data
- Compound property lookup
ChEMBL
Bioactive drug-like compound database.
Initialization:
from bioservices import ChEMBL
c = ChEMBL()
Key Methods:
- get_molecule_form(chembl_id): Compound details
- get_target(chembl_id): Target information
- get_similarity(chembl_id): Get compounds similar to the given compound
- get_assays(): Bioassay data
Use cases:
- Drug discovery data
- Find similar compounds
- Bioactivity information
- Target-compound relationships
UniChem
Chemical identifier mapping service.
Initialization:
from bioservices import UniChem
u = UniChem()
Key Methods:
- get_compound_id_from_kegg(kegg_id): KEGG → ChEMBL
- get_all_compound_ids(src_compound_id, src_id): Get all IDs
- get_src_compound_ids(src_compound_id, from_src_id, to_src_id): Convert IDs
Source IDs:
- 1: ChEMBL
- 2: DrugBank
- 3: PDB
- 6: KEGG
- 7: ChEBI
- 22: PubChem
Use cases:
- Cross-database compound ID mapping
- Linking chemical databases
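To avoid hard-coding the numeric source IDs listed above when calling methods such as `get_src_compound_ids`, a small lookup table can translate database names into IDs. This sketch covers only the six sources in that list; UniChem defines many more.

```python
# Source IDs as listed above (a subset of what UniChem defines).
UNICHEM_SOURCES = {
    1: "ChEMBL",
    2: "DrugBank",
    3: "PDB",
    6: "KEGG",
    7: "ChEBI",
    22: "PubChem",
}
_BY_NAME = {name.lower(): src_id for src_id, name in UNICHEM_SOURCES.items()}


def source_id(name):
    """Return the UniChem source ID for a database name (case-insensitive)."""
    try:
        return _BY_NAME[name.lower()]
    except KeyError:
        raise ValueError(f"unknown UniChem source: {name!r}") from None


print(source_id("kegg"), source_id("ChEMBL"))  # 6 1
```

The resulting pair can then be passed as `from_src_id` and `to_src_id`.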
PubChem
Chemical compound database from NIH.
Initialization:
from bioservices import PubChem
p = PubChem()
Key Methods:
- get_compounds(identifier, namespace): Retrieve compounds
- get_properties(properties, identifier, namespace): Get properties
Use cases:
- Chemical structure retrieval
- Compound property information
Sequence Analysis Tools
NCBIblast
Sequence similarity searching.
Initialization:
from bioservices import NCBIblast
s = NCBIblast(verbose=False)
Key Methods:
- run(program, sequence, stype, database, email, **params): Submit a BLAST job
  - program: "blastp", "blastn", "blastx", "tblastn", "tblastx"
  - stype: "protein" or "dna"
  - database: "uniprotkb", "pdb", "refseq_protein", etc.
  - email: Required by the EBI job dispatcher that hosts the service
  - Returns: Job ID
- getStatus(jobid): Check job status
  - Returns: Status string such as "RUNNING", "FINISHED", or "ERROR"
- getResult(jobid, result_type): Retrieve results
  - result_type: "out" (default), "ids", "xml"
Important: BLAST jobs are asynchronous. Always check status before retrieving results.
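The check-then-retrieve cycle can be wrapped in one polling helper. This is a minimal sketch that works with any object exposing `getStatus(jobid)` and `getResult(jobid, result_type)`, such as an `NCBIblast` instance. Treating `"FAILURE"` and `"NOT_FOUND"` as terminal, alongside the documented `"ERROR"`, is an assumption, as are the default poll interval and timeout.

```python
import time

# "ERROR" is documented; the other two are assumed extra terminal states.
FAILED_STATES = {"ERROR", "FAILURE", "NOT_FOUND"}


def wait_for_blast(service, jobid, poll_interval=5.0, timeout=600.0, result_type="out"):
    """Poll getStatus() until the job finishes, then return getResult()."""
    deadline = time.monotonic() + timeout
    while True:
        status = service.getStatus(jobid)
        if status == "FINISHED":
            return service.getResult(jobid, result_type)
        if status in FAILED_STATES:
            raise RuntimeError(f"BLAST job {jobid} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"BLAST job {jobid} still {status} after {timeout} s")
        time.sleep(poll_interval)


# Typical use (requires network access, so shown as comments):
# s = NCBIblast(verbose=False)
# jobid = s.run(program="blastp", sequence=seq, stype="protein",
#               database="uniprotkb", email="you@example.org")
# report = wait_for_blast(s, jobid)
```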
Use cases:
- Protein homology searches
- Sequence similarity analysis
- Functional annotation by homology
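A common workflow fetches a query protein from UniProt with `retrieve(uniprot_id, frmt="fasta")` and submits it to BLAST. If you prefer to pass a bare sequence to `run()`, the FASTA header and line breaks can be stripped first. A minimal sketch that keeps only the first record:

```python
def fasta_to_sequence(fasta_text):
    """Return the sequence of the first record in a FASTA string.

    Lines starting with '>' are headers; the sequence lines that follow
    are joined without whitespace. Any later records are ignored.
    """
    seq_lines = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here
            seen_header = True
            continue
        if line:
            seq_lines.append(line)
    return "".join(seq_lines)


print(fasta_to_sequence(">sp|X|EXAMPLE hypothetical\nMKT\nLLV\n"))  # MKTLLV
```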
Pathway & Interaction Resources
Reactome
Pathway database.
Initialization:
from bioservices import Reactome
r = Reactome()
Key Methods:
- get_pathway_by_id(pathway_id): Pathway details
- search_pathway(query): Search pathways
Use cases:
- Human pathway analysis
- Biological process annotation
PSICQUIC
Protein interaction query service (federates 30+ databases).
Initialization:
from bioservices import PSICQUIC
s = PSICQUIC()
Key Methods:
- query(database, query_string): Query a specific interaction database
  - Returns: PSI-MI TAB format
- activeDBs: Property listing the available databases
  - Returns: List of database names
Available databases include MINT, IntAct, BioGRID, DIP, InnateDB, MatrixDB, MPIDB, UniProt, and many others
Query syntax: Supports AND, OR, species filters
- Example: "ZAP70 AND species:9606"
Use cases:
- Protein-protein interaction discovery
- Network analysis
- Interactome mapping
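For network analysis, PSI-MI TAB results usually need to be reduced to interactor pairs. In MITAB, each line is one interaction and the first two tab-separated columns hold the identifiers of interactors A and B. This sketch accepts either raw text or already-split rows, because the exact return shape of `query()` may vary between bioservices versions (an assumption worth checking). The sample IDs are placeholders.

```python
def mitab_pairs(mitab):
    """Extract unique (interactor A, interactor B) pairs from MITAB data.

    Accepts raw MITAB text or an iterable of rows (lists of columns).
    A pair and its reverse are reported once.
    """
    rows = mitab.splitlines() if isinstance(mitab, str) else mitab
    seen = set()
    pairs = []
    for row in rows:
        if isinstance(row, str):
            if not row.strip() or row.startswith("#"):
                continue
            row = row.split("\t")
        if len(row) < 2:
            continue
        a, b = row[0], row[1]
        key = frozenset((a, b))
        if key in seen:
            continue
        seen.add(key)
        pairs.append((a, b))
    return pairs


# Placeholder identifiers for illustration only.
text = "uniprotkb:A\tuniprotkb:B\t...\nuniprotkb:B\tuniprotkb:A\t...\n"
print(mitab_pairs(text))  # [('uniprotkb:A', 'uniprotkb:B')]
```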
IntactComplex
Protein complex database.
Initialization:
from bioservices import IntactComplex
i = IntactComplex()
Key Methods:
- search(query): Search complexes
- details(complex_ac): Complex details
Use cases:
- Protein complex composition
- Multi-protein assembly analysis
OmniPath
Integrated signaling pathway database.
Initialization:
from bioservices import OmniPath
o = OmniPath()
Key Methods:
- interactions(datasets, organisms): Get interactions
- ptms(datasets, organisms): Post-translational modifications
Use cases:
- Cell signaling analysis
- Regulatory network mapping
Gene Ontology
QuickGO
Gene Ontology annotation service.
Initialization:
from bioservices import QuickGO
g = QuickGO()
Key Methods:
- Term(go_id, frmt="obo"): Retrieve GO term information
  - Returns: Term definition and metadata
- Annotation(protein=None, goid=None, format="tsv"): Get GO annotations
  - Returns: Annotations in requested format
GO categories:
- Biological Process (BP)
- Molecular Function (MF)
- Cellular Component (CC)
Use cases:
- Functional annotation
- Enrichment analysis
- GO term lookup
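GO identifiers have the canonical form `GO:` followed by seven digits, but user input often arrives as `go:5515` or a bare number. Normalizing before calling `Term()` or `Annotation()` avoids lookup misses. This sketch also records the root terms of the three GO categories listed above.

```python
import re

# Root terms of the three GO categories.
GO_ROOTS = {"GO:0008150": "BP", "GO:0003674": "MF", "GO:0005575": "CC"}

_GO_ID = re.compile(r"^(?:GO:)?(\d{1,7})$", re.IGNORECASE)


def normalize_go_id(value):
    """Return a GO ID as 'GO:' plus seven zero-padded digits.

    Accepts inputs such as 'GO:0005515', 'go:5515' or 5515.
    """
    match = _GO_ID.match(str(value).strip())
    if match is None:
        raise ValueError(f"not a GO identifier: {value!r}")
    return "GO:" + match.group(1).zfill(7)


gid = normalize_go_id("go:8150")
print(gid, GO_ROOTS[gid])  # GO:0008150 BP
```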
Genomic Resources
BioMart
Data mining tool for genomic data.
Initialization:
from bioservices import BioMart
b = BioMart()
Key Methods:
- datasets(mart): List the datasets available in a mart
- attributes(dataset): List attributes
- query(query_xml): Execute BioMart query
Use cases:
- Bulk genomic data retrieval
- Custom genome annotations
- SNP information
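`query()` takes a BioMart query XML document. Building it with `xml.etree.ElementTree` avoids hand-escaping. This is a minimal sketch; the dataset, attribute and filter names in the example (`hsapiens_gene_ensembl`, `ensembl_gene_id`, `hgnc_symbol`, `chromosome_name`) are Ensembl examples that should be checked with `attributes(dataset)` for your mart. The `formatter="TSV"` and `uniqueRows="1"` settings are choices made for this sketch.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None, virtual_schema="default"):
    """Return a BioMart query XML string selecting `attributes` from `dataset`."""
    query = ET.Element(
        "Query",
        virtualSchemaName=virtual_schema,
        formatter="TSV",
        header="0",
        uniqueRows="1",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", name=name, value=str(value))
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


xml_str = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "hgnc_symbol"],
    filters={"chromosome_name": "1"},
)
# result = b.query(xml_str)  # requires network access
```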
ArrayExpress
Gene expression database.
Initialization:
from bioservices import ArrayExpress
a = ArrayExpress()
Key Methods:
- queryExperiments(keywords): Search experiments
- retrieveExperiment(accession): Get experiment data
Use cases:
- Gene expression data
- Microarray analysis
- RNA-seq data retrieval
ENA (European Nucleotide Archive)
Nucleotide sequence database.
Initialization:
from bioservices import ENA
e = ENA()
Key Methods:
- search_data(query): Search sequences
- retrieve_data(accession): Retrieve sequences
Use cases:
- Nucleotide sequence retrieval
- Genome assembly access
Structural Biology
PDB (Protein Data Bank)
3D protein structure database.
Initialization:
from bioservices import PDB
p = PDB()
Key Methods:
- get_file(pdb_id, file_format): Download structure files
- search(query): Search structures
File formats: pdb, cif, xml
Use cases:
- 3D structure retrieval
- Structure-based analysis
- PyMOL visualization
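Before opening a downloaded structure in PyMOL, it helps to write it to a file whose extension matches the format. This sketch validates classic four-character PDB IDs (a digit 1 to 9 followed by three alphanumerics) and the three formats listed above. Handling both `str` and `bytes` is a precaution, since the return type of `get_file` is not stated here.

```python
import re
from pathlib import Path

_PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")
_FORMATS = {"pdb", "cif", "xml"}


def save_structure(content, pdb_id, file_format, outdir="."):
    """Write structure data to <outdir>/<ID>.<format> and return the path."""
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    if file_format not in _FORMATS:
        raise ValueError(f"unsupported format: {file_format!r}")
    path = Path(outdir) / f"{pdb_id.upper()}.{file_format}"
    if isinstance(content, bytes):
        path.write_bytes(content)
    else:
        path.write_text(content)
    return path


# path = save_structure(p.get_file("1FAT", "pdb"), "1FAT", "pdb")  # needs network
```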
Pfam
Protein family database.
Initialization:
from bioservices import Pfam
p = Pfam()
Key Methods:
- searchSequence(sequence): Find domains in sequence
- getPfamEntry(pfam_id): Domain information
Use cases:
- Protein domain identification
- Family classification
- Functional motif discovery
Specialized Resources
BioModels
Systems biology model repository.
Initialization:
from bioservices import BioModels
b = BioModels()
Key Methods:
get_model_by_id(model_id): Retrieve SBML model
Use cases:
- Systems biology modeling
- SBML model retrieval
COG (Clusters of Orthologous Genes)
Orthologous gene classification.
Initialization:
from bioservices import COG
c = COG()
Use cases:
- Orthology analysis
- Functional classification
BiGG Models
Metabolic network models.
Initialization:
from bioservices import BiGG
b = BiGG()
Key Methods:
- list_models(): Available models
- get_model(model_id): Model details
Use cases:
- Metabolic network analysis
- Flux balance analysis
General Patterns
Error Handling
All services may raise exceptions (for example on network failures or timeouts). Wrap calls in try-except:
try:
    result = service.method(params)  # placeholder for any BioServices call
    if result:
        # Process result
        pass
except Exception as e:
    print(f"Error: {e}")
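Transient network errors often succeed on a second attempt. A minimal retry helper with exponential backoff is sketched below. It only retries calls that raise; if a method signals failure through its return value instead, this sketch will not detect it. The default of three attempts and a one-second initial delay are arbitrary choices.

```python
import time


def call_with_retries(func, *args, retries=3, delay=1.0, backoff=2.0, **kwargs):
    """Call func(*args, **kwargs), retrying on exceptions with exponential backoff.

    Re-raises the last exception once all attempts are used up.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f} s")
            time.sleep(delay)
            delay *= backoff


# Example: fasta = call_with_retries(u.retrieve, "P43403", frmt="fasta")
```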
Verbosity Control
Most services accept a verbose parameter:
service = Service(verbose=False) # Suppress HTTP logs
Rate Limiting
Services have timeouts and rate limits:
service.TIMEOUT = 30 # Adjust timeout
service.DELAY = 1 # Delay between requests (if supported)
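When a service offers no built-in delay, a client-side throttle gives the same effect by enforcing a minimum interval between consecutive requests. A minimal sketch; the interval you choose should follow each provider's usage policy.

```python
import time


class Throttle:
    """Enforce a minimum interval (seconds) between successive wait() calls."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Usage (network calls shown as comments):
# throttle = Throttle(min_interval=1.0)
# for uid in ids:
#     throttle.wait()
#     entries.append(u.retrieve(uid, frmt="fasta"))
```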
Output Formats
Common format parameters:
- frmt: "xml", "json", "tab", "txt", "fasta"
- format: Service-specific variants
Caching
Some services cache results:
service.CACHE = True # Enable caching
service.clear_cache() # Clear cache
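Independently of any service-level cache, repeated lookups within one session can be memoized in memory with `functools.lru_cache`. This sketch wraps any single-argument fetch function; results live only for the lifetime of the process and are not written to disk.

```python
from functools import lru_cache


def make_cached_fetcher(fetch, maxsize=1024):
    """Wrap a single-argument fetch function with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(identifier):
        return fetch(identifier)

    return cached


# get_fasta = make_cached_fetcher(lambda uid: u.retrieve(uid, frmt="fasta"))
# get_fasta("P43403")     # first call hits the network
# get_fasta("P43403")     # second call is served from memory
# get_fasta.cache_clear() # drop all cached entries
```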
Additional Resources
For detailed API documentation:
- Official docs: https://bioservices.readthedocs.io/
- Individual service docs linked from main page
- Source code: https://github.com/cokelaer/bioservices