BioServices: Complete Services Reference

This document provides a comprehensive reference for all major services available in BioServices, including key methods, parameters, and use cases.

Protein & Gene Resources

UniProt

Protein sequence and functional information database.

Initialization:

from bioservices import UniProt
u = UniProt(verbose=False)

Key Methods:

  • search(query, frmt="tab", columns=None, limit=None, sort=None, compress=False, include=False, **kwargs)

    • Search UniProt with flexible query syntax
    • frmt: "tab", "fasta", "xml", "rdf", "gff", "txt"
    • columns: Comma-separated list (e.g., "id,genes,organism,length")
    • Returns: String in requested format
  • retrieve(uniprot_id, frmt="txt")

    • Retrieve specific UniProt entry
    • frmt: "txt", "fasta", "xml", "rdf", "gff"
    • Returns: Entry data in requested format
  • mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")

    • Convert identifiers between databases
    • fr/to: Database identifiers (see identifier_mapping.md)
    • query: Single ID or comma-separated list
    • Returns: Dictionary mapping input to output IDs
  • searchUniProtId(pattern, columns="entry name,length,organism", limit=100)

    • Convenience method for ID-based searches
    • Returns: Tab-separated values

Common columns: id, entry name, genes, organism, protein names, length, sequence, go-id, ec, pathway, interactor
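Because search returns a plain string, tabular results usually need to be split into records before further processing. The sketch below is one way to do that with the standard library; `parse_tab_result` is a hypothetical helper (not part of BioServices), and the sample string is illustrative rather than captured output.

```python
import csv
import io


def parse_tab_result(text):
    """Turn a tab-separated result string into a list of dicts keyed by the header row."""
    if not text or not text.strip():
        return []
    # QUOTE_NONE keeps stray quote characters in annotations from confusing the parser.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Illustrative sample only; real output depends on the query and the columns requested.
sample = "Entry\tGene names\tLength\nP43403\tZAP70 SRK\t619\n"
rows = parse_tab_result(sample)
print(rows[0]["Entry"], rows[0]["Length"])
```

With a live service, the same helper could be applied to the string returned by `u.search(...)` when a tabular `frmt` is requested. The accepted format and column names depend on the installed BioServices version and the current UniProt API, so treat the names listed here as a starting point.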

Use cases:

  • Protein sequence retrieval for BLAST
  • Functional annotation lookup
  • Cross-database identifier mapping
  • Batch protein information retrieval

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Metabolic pathways, genes, and organisms database.

Initialization:

from bioservices import KEGG
k = KEGG()
k.organism = "hsa"  # Set default organism

Key Methods:

  • list(database)

    • List entries in KEGG database
    • database: "organism", "pathway", "module", "disease", "drug", "compound"
    • Returns: Multi-line string with entries
  • find(database, query)

    • Search database by keywords
    • Returns: List of matching entries with IDs
  • get(entry_id)

    • Retrieve entry by ID
    • Supports genes, pathways, compounds, etc.
    • Returns: Raw entry text
  • parse(data)

    • Parse KEGG entry into dictionary
    • Returns: Dict with structured data
  • lookfor_organism(name)

    • Search organisms by name pattern
    • Returns: List of matching organism codes
  • lookfor_pathway(name)

    • Search pathways by name
    • Returns: List of pathway IDs
  • get_pathway_by_gene(gene_id, organism)

    • Find pathways containing gene
    • Returns: List of pathway IDs
  • parse_kgml_pathway(pathway_id)

    • Parse pathway KGML for interactions
    • Returns: Dict with "entries" and "relations"
  • pathway2sif(pathway_id)

    • Extract Simple Interaction Format data
    • Filters for activation/inhibition
    • Returns: List of interaction tuples
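Since `list` returns a multi-line string, a small parser makes the entries easier to work with. The following is a minimal sketch that assumes one entry per line with a tab between the identifier and its description; `parse_kegg_list` is a hypothetical helper, and the sample text is illustrative.

```python
def parse_kegg_list(text):
    """Map each KEGG entry ID to its description from a multi-line listing."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the ID and the description are separated by the first tab.
        entry_id, _, description = line.partition("\t")
        entries[entry_id.strip()] = description.strip()
    return entries


# Illustrative sample only; a real listing is much longer.
sample = (
    "hsa04660\tT cell receptor signaling pathway\n"
    "hsa04064\tNF-kappa B signaling pathway\n"
)
pathways = parse_kegg_list(sample)
for pathway_id, name in pathways.items():
    print(pathway_id, name)
```

With a live service, the input would be the string returned by `k.list("pathway")` or a similar call.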

Organism codes:

  • hsa: Homo sapiens
  • mmu: Mus musculus
  • dme: Drosophila melanogaster
  • sce: Saccharomyces cerevisiae
  • eco: Escherichia coli

Use cases:

  • Pathway analysis and visualization
  • Gene function annotation
  • Metabolic network reconstruction
  • Protein-protein interaction extraction

HGNC (HUGO Gene Nomenclature Committee)

Official human gene naming authority.

Initialization:

from bioservices import HGNC
h = HGNC()

Key Methods:

  • search(query): Search gene symbols/names
  • fetch(format, query): Retrieve gene information

Use cases:

  • Standardizing human gene names
  • Looking up official gene symbols

MyGeneInfo

Gene annotation and query service.

Initialization:

from bioservices import MyGeneInfo
m = MyGeneInfo()

Key Methods:

  • querymany(ids, scopes, fields, species): Batch gene queries
  • getgene(geneid): Get gene annotation

Use cases:

  • Batch gene annotation retrieval
  • Gene ID conversion

Chemical Compound Resources

ChEBI (Chemical Entities of Biological Interest)

Dictionary of molecular entities.

Initialization:

from bioservices import ChEBI
c = ChEBI()

Key Methods:

  • getCompleteEntity(chebi_id): Full compound information
  • getLiteEntity(chebi_id): Basic information
  • getCompleteEntityByList(chebi_ids): Batch retrieval

Use cases:

  • Small molecule information
  • Chemical structure data
  • Compound property lookup

ChEMBL

Bioactive drug-like compound database.

Initialization:

from bioservices import ChEMBL
c = ChEMBL()

Key Methods:

  • get_molecule_form(chembl_id): Compound details
  • get_target(chembl_id): Target information
  • get_similarity(chembl_id): Find compounds similar to the given compound
  • get_assays(): Bioassay data

Use cases:

  • Drug discovery data
  • Find similar compounds
  • Bioactivity information
  • Target-compound relationships

UniChem

Chemical identifier mapping service.

Initialization:

from bioservices import UniChem
u = UniChem()

Key Methods:

  • get_compound_id_from_kegg(kegg_id): KEGG → ChEMBL
  • get_all_compound_ids(src_compound_id, src_id): Get all IDs
  • get_src_compound_ids(src_compound_id, from_src_id, to_src_id): Convert IDs

Source IDs:

  • 1: ChEMBL
  • 2: DrugBank
  • 3: PDB
  • 6: KEGG
  • 7: ChEBI
  • 22: PubChem

Use cases:

  • Cross-database compound ID mapping
  • Linking chemical databases

PubChem

Chemical compound database from NIH.

Initialization:

from bioservices import PubChem
p = PubChem()

Key Methods:

  • get_compounds(identifier, namespace): Retrieve compounds
  • get_properties(properties, identifier, namespace): Get properties

Use cases:

  • Chemical structure retrieval
  • Compound property information

Sequence Analysis Tools

NCBIblast

Sequence similarity searching.

Initialization:

from bioservices import NCBIblast
s = NCBIblast(verbose=False)

Key Methods:

  • run(program, sequence, stype, database, email, **params)

    • Submit BLAST job
    • program: "blastp", "blastn", "blastx", "tblastn", "tblastx"
    • stype: "protein" or "dna"
    • database: "uniprotkb", "pdb", "refseq_protein", etc.
    • email: Required by the job-submission service (BioServices runs NCBI BLAST through the EMBL-EBI web service)
    • Returns: Job ID
  • getStatus(jobid)

    • Check job status
    • Returns: "RUNNING", "FINISHED", "ERROR"
  • getResult(jobid, result_type)

    • Retrieve results
    • result_type: "out" (default), "ids", "xml"

Important: BLAST jobs are asynchronous. Always check status before retrieving results.
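The status check can be wrapped in a polling loop. The sketch below assumes only the `getStatus(jobid)` method and the status strings listed above; `wait_for_job` and `FakeBlast` are hypothetical names, and the fake service exists only so the loop can be exercised without a network connection.

```python
import time


def wait_for_job(service, jobid, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll service.getStatus(jobid) until the job finishes; return the final status."""
    for _ in range(max_polls):
        status = service.getStatus(jobid)
        if status == "FINISHED":
            return status
        if status == "ERROR":
            raise RuntimeError(f"Job {jobid} ended with status {status!r}")
        # Any other status (e.g. "RUNNING") means the job is still in progress.
        sleep(poll_seconds)
    raise TimeoutError(f"Job {jobid} not finished after {max_polls} polls")


class FakeBlast:
    """Stand-in for a BLAST service object, used here for an offline demonstration."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def getStatus(self, jobid):
        return next(self._statuses)


fake = FakeBlast(["RUNNING", "RUNNING", "FINISHED"])
final_status = wait_for_job(fake, "job-1", sleep=lambda seconds: None)
print(final_status)
```

With a real service object, `getResult(jobid, "out")` would be called only after `wait_for_job` returns. The `max_polls` cap keeps a stalled job from blocking a script indefinitely.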

Use cases:

  • Protein homology searches
  • Sequence similarity analysis
  • Functional annotation by homology

Pathway & Interaction Resources

Reactome

Pathway database.

Initialization:

from bioservices import Reactome
r = Reactome()

Key Methods:

  • get_pathway_by_id(pathway_id): Pathway details
  • search_pathway(query): Search pathways

Use cases:

  • Human pathway analysis
  • Biological process annotation

PSICQUIC

Protein interaction query service (federates 30+ databases).

Initialization:

from bioservices import PSICQUIC
s = PSICQUIC()

Key Methods:

  • query(database, query_string)

    • Query specific interaction database
    • Returns: PSI-MI TAB format
  • activeDBs

    • Property listing available databases
    • Returns: List of database names

Available databases include MINT, IntAct, BioGRID, DIP, InnateDB, MatrixDB, MPIDB, and UniProt; check activeDBs for the databases currently online.

Query syntax: Supports AND, OR, species filters

  • Example: "ZAP70 AND species:9606"
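Query strings like the example above can be assembled programmatically, and PSI-MI TAB rows can be reduced to interactor pairs. This is a minimal sketch: `build_query` and `interactor_pairs` are hypothetical helpers, the sample row is illustrative, and the parser assumes each row is a tab-separated string whose first two columns identify interactors A and B (as in the PSI-MI TAB format).

```python
def build_query(terms, taxid=None):
    """Join search terms with AND and optionally append a species filter."""
    parts = [str(term) for term in terms]
    if taxid is not None:
        parts.append(f"species:{taxid}")
    return " AND ".join(parts)


def interactor_pairs(mitab_lines):
    """Extract (interactor A, interactor B) from tab-separated PSI-MI TAB rows."""
    pairs = []
    for line in mitab_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 2:
            pairs.append((columns[0], columns[1]))
    return pairs


query = build_query(["ZAP70"], taxid=9606)
print(query)

# Illustrative row, truncated to three columns for brevity.
sample_rows = ["uniprotkb:P43403\tuniprotkb:P06239\tintact:EBI-1211276"]
pairs = interactor_pairs(sample_rows)
print(pairs)
```

The string from `build_query` can be passed as the `query_string` argument of `query`. If the installed BioServices version returns rows that are already split into columns, the `split` step can be skipped.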

Use cases:

  • Protein-protein interaction discovery
  • Network analysis
  • Interactome mapping

IntactComplex

Protein complex database.

Initialization:

from bioservices import IntactComplex
i = IntactComplex()

Key Methods:

  • search(query): Search complexes
  • details(complex_ac): Complex details

Use cases:

  • Protein complex composition
  • Multi-protein assembly analysis

OmniPath

Integrated signaling pathway database.

Initialization:

from bioservices import OmniPath
o = OmniPath()

Key Methods:

  • interactions(datasets, organisms): Get interactions
  • ptms(datasets, organisms): Post-translational modifications

Use cases:

  • Cell signaling analysis
  • Regulatory network mapping

Gene Ontology

QuickGO

Gene Ontology annotation service.

Initialization:

from bioservices import QuickGO
g = QuickGO()

Key Methods:

  • Term(go_id, frmt="obo")

    • Retrieve GO term information
    • Returns: Term definition and metadata
  • Annotation(protein=None, goid=None, format="tsv")

    • Get GO annotations
    • Returns: Annotations in requested format

GO categories:

  • Biological Process (BP)
  • Molecular Function (MF)
  • Cellular Component (CC)

Use cases:

  • Functional annotation
  • Enrichment analysis
  • GO term lookup

Genomic Resources

BioMart

Data mining tool for genomic data.

Initialization:

from bioservices import BioMart
b = BioMart()

Key Methods:

  • datasets(mart): List datasets available in a mart
  • attributes(dataset): List attributes
  • query(query_xml): Execute BioMart query
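The `query` method takes a BioMart query document as XML. The sketch below builds such a document with the standard library; `build_biomart_query` is a hypothetical helper, and the dataset, filter, and attribute names are assumptions based on common Ensembl BioMart naming that should be checked against the mart actually in use.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart query document (as a string) for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows; attributes choose the output columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed names for illustration; verify them with attributes(dataset) first.
xml_text = build_biomart_query(
    "hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "21"},
)
print(xml_text)
```

The resulting string would be passed to `b.query(xml_text)`. Building the XML from a function keeps attribute and filter lists in ordinary Python data structures, which is easier to review than hand-edited XML.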

Use cases:

  • Bulk genomic data retrieval
  • Custom genome annotations
  • SNP information

ArrayExpress

Gene expression database.

Initialization:

from bioservices import ArrayExpress
a = ArrayExpress()

Key Methods:

  • queryExperiments(keywords): Search experiments
  • retrieveExperiment(accession): Get experiment data

Use cases:

  • Gene expression data
  • Microarray analysis
  • RNA-seq data retrieval

ENA (European Nucleotide Archive)

Nucleotide sequence database.

Initialization:

from bioservices import ENA
e = ENA()

Key Methods:

  • search_data(query): Search sequences
  • retrieve_data(accession): Retrieve sequences

Use cases:

  • Nucleotide sequence retrieval
  • Genome assembly access

Structural Biology

PDB (Protein Data Bank)

3D protein structure database.

Initialization:

from bioservices import PDB
p = PDB()

Key Methods:

  • get_file(pdb_id, file_format): Download structure files
  • search(query): Search structures

File formats: pdb, cif, xml

Use cases:

  • 3D structure retrieval
  • Structure-based analysis
  • PyMOL visualization

Pfam

Protein family database.

Initialization:

from bioservices import Pfam
p = Pfam()

Key Methods:

  • searchSequence(sequence): Find domains in sequence
  • getPfamEntry(pfam_id): Domain information

Use cases:

  • Protein domain identification
  • Family classification
  • Functional motif discovery

Specialized Resources

BioModels

Systems biology model repository.

Initialization:

from bioservices import BioModels
b = BioModels()

Key Methods:

  • get_model_by_id(model_id): Retrieve SBML model

Use cases:

  • Systems biology modeling
  • SBML model retrieval

COG (Clusters of Orthologous Genes)

Orthologous gene classification.

Initialization:

from bioservices import COG
c = COG()

Use cases:

  • Orthology analysis
  • Functional classification

BiGG Models

Metabolic network models.

Initialization:

from bioservices import BiGG
b = BiGG()

Key Methods:

  • list_models(): Available models
  • get_model(model_id): Model details

Use cases:

  • Metabolic network analysis
  • Flux balance analysis

General Patterns

Error Handling

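Any service call can raise an exception, for example on a network failure, a timeout, or a malformed query. Wrap calls in try/except and check that the result is non-empty: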

try:
    result = service.method(params)
    if result:
        # Process result
        pass
except Exception as e:
    print(f"Error: {e}")
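For transient failures such as timeouts, the try/except pattern can be extended into a retry loop. This is a sketch under stated assumptions: `call_with_retry` is a hypothetical helper, it retries on any exception (which may be too broad for some workflows), and `flaky` simulates a service method that fails twice before succeeding.

```python
import time


def call_with_retry(func, *args, retries=3, delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with a growing pause after each failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as err:  # broad on purpose; narrow this in real code
            last_error = err
            if attempt < retries:
                sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...
    raise RuntimeError(f"Call failed after {retries} attempts") from last_error


# Simulated service method: fails twice, then returns a value.
attempts = {"count": 0}


def flaky(value):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return value * 2


result = call_with_retry(flaky, 21, sleep=lambda seconds: None)
print(result, attempts["count"])
```

A real call would pass a bound service method as `func`. The pause between attempts also spaces out requests, which helps when a remote service enforces rate limits.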

Verbosity Control

Most services accept a verbose parameter at construction time:

service = Service(verbose=False)  # Suppress HTTP logs

Rate Limiting

Services have timeouts and rate limits:

service.TIMEOUT = 30  # Adjust timeout
service.DELAY = 1     # Delay between requests (if supported)

Output Formats

Common format parameters:

  • frmt: "xml", "json", "tab", "txt", "fasta"
  • format: Service-specific variants

Caching

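Some services can cache responses locally so that repeated requests do not hit the network again:

service.CACHING = True  # Enable caching (attribute name may differ between BioServices versions)
service.clear_cache()   # Clear cache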

Additional Resources

For detailed API documentation: