Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/bioservices/references/services_reference.md
+++ b/skills/bioservices/references/services_reference.md
@@ -0,0 +1,636 @@
+# BioServices: Complete Services Reference
+
+This document provides a comprehensive reference for all major services available in BioServices, including key methods, parameters, and use cases.
+
+## Protein & Gene Resources
+
+### UniProt
+
+Protein sequence and functional information database.
+
+**Initialization:**
+```python
+from bioservices import UniProt
+u = UniProt(verbose=False)
+```
+
+**Key Methods:**
+
+- `search(query, frmt="tab", columns=None, limit=None, sort=None, compress=False, include=False, **kwargs)`
+  - Search UniProt with flexible query syntax
+  - `frmt`: "tab", "fasta", "xml", "rdf", "gff", "txt"
+  - `columns`: Comma-separated list (e.g., "id,genes,organism,length")
+  - Returns: String in requested format
+
+- `retrieve(uniprot_id, frmt="txt")`
+  - Retrieve specific UniProt entry
+  - `frmt`: "txt", "fasta", "xml", "rdf", "gff"
+  - Returns: Entry data in requested format
+
+- `mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")`
+  - Convert identifiers between databases
+  - `fr`/`to`: Database identifiers (see identifier_mapping.md)
+  - `query`: Single ID or comma-separated list
+  - Returns: Dictionary mapping input to output IDs
+
+- `searchUniProtId(pattern, columns="entry name,length,organism", limit=100)`
+  - Convenience method for ID-based searches
+  - Returns: Tab-separated values
+
+**Common columns:** id, entry name, genes, organism, protein names, length, sequence, go-id, ec, pathway, interactor
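
Because `search` returns a single string, tabular results usually need a little parsing before the rows can be used. The following is a minimal sketch using only the standard library. The helper name `parse_tab` and the sample text are illustrative assumptions, not part of BioServices and not real UniProt output.

```python
import csv
import io


def parse_tab(text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [dict(row) for row in reader]


# Made-up sample shaped like a tab-separated search result.
sample = "Entry\tLength\nX00001\t100\nX00002\t250\n"
rows = parse_tab(sample)
print(rows[0]["Entry"], rows[0]["Length"])
```

With a live service, the input would be the string returned by `u.search(...)` when a tabular `frmt` is requested.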
+
+**Use cases:**
+- Protein sequence retrieval for BLAST
+- Functional annotation lookup
+- Cross-database identifier mapping
+- Batch protein information retrieval
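
For the BLAST use case, the FASTA text returned by `retrieve(..., frmt="fasta")` has to be reduced to a bare sequence first. A small sketch of that step follows. The function name `fasta_to_sequence` and the sample record are hypothetical and chosen only for illustration.

```python
def fasta_to_sequence(fasta_text):
    """Return the first record's sequence, without header or line breaks."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected FASTA text starting with '>'")
    sequence_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record
            break
        sequence_lines.append(line.strip())
    return "".join(sequence_lines)


# Made-up record; a real one would come from UniProt.retrieve.
sample = ">sp|X00000|DEMO_HUMAN made-up record\nMKTAYIAK\nQRQISFVK\n"
print(fasta_to_sequence(sample))  # MKTAYIAKQRQISFVK
```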
+
+---
+
+### KEGG (Kyoto Encyclopedia of Genes and Genomes)
+
+Metabolic pathways, genes, and organisms database.
+
+**Initialization:**
+```python
+from bioservices import KEGG
+k = KEGG()
+k.organism = "hsa"  # Set default organism
+```
+
+**Key Methods:**
+
+- `list(database)`
+  - List entries in KEGG database
+  - `database`: "organism", "pathway", "module", "disease", "drug", "compound"
+  - Returns: Multi-line string with entries
+
+- `find(database, query)`
+  - Search database by keywords
+  - Returns: List of matching entries with IDs
+
+- `get(entry_id)`
+  - Retrieve entry by ID
+  - Supports genes, pathways, compounds, etc.
+  - Returns: Raw entry text
+
+- `parse(data)`
+  - Parse KEGG entry into dictionary
+  - Returns: Dict with structured data
+
+- `lookfor_organism(name)`
+  - Search organisms by name pattern
+  - Returns: List of matching organism codes
+
+- `lookfor_pathway(name)`
+  - Search pathways by name
+  - Returns: List of pathway IDs
+
+- `get_pathway_by_gene(gene_id, organism)`
+  - Find pathways containing gene
+  - Returns: List of pathway IDs
+
+- `parse_kgml_pathway(pathway_id)`
+  - Parse pathway KGML for interactions
+  - Returns: Dict with "entries" and "relations"
+
+- `pathway2sif(pathway_id)`
+  - Extract Simple Interaction Format data
+  - Filters for activation/inhibition
+  - Returns: List of interaction tuples
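
Since `list` returns a multi-line string, a common first step is to split it into ID and description pairs. The sketch below assumes each line holds an entry ID, a tab, and a description. The helper name `parse_kegg_list` and the sample text are illustrative and not part of BioServices.

```python
def parse_kegg_list(text):
    """Split KEGG list text into (entry_id, description) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        pairs.append((entry_id.strip(), description.strip()))
    return pairs


# Illustrative sample; exact ID prefixes depend on the KEGG release.
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
for entry_id, description in parse_kegg_list(sample):
    print(entry_id, "->", description)
```

With a live service, the input would be a string such as the one returned by `k.list("pathway")`.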
+
+**Organism codes:**
+- hsa: Homo sapiens
+- mmu: Mus musculus
+- dme: Drosophila melanogaster
+- sce: Saccharomyces cerevisiae
+- eco: Escherichia coli
+
+**Use cases:**
+- Pathway analysis and visualization
+- Gene function annotation
+- Metabolic network reconstruction
+- Protein-protein interaction extraction
+
+---
+
+### HGNC (HUGO Gene Nomenclature Committee)
+
+Official human gene naming authority.
+
+**Initialization:**
+```python
+from bioservices import HGNC
+h = HGNC()
+```
+
+**Key Methods:**
+- `search(query)`: Search gene symbols/names
+- `fetch(format, query)`: Retrieve gene information
+
+**Use cases:**
+- Standardizing human gene names
+- Looking up official gene symbols
+
+---
+
+### MyGeneInfo
+
+Gene annotation and query service.
+
+**Initialization:**
+```python
+from bioservices import MyGeneInfo
+m = MyGeneInfo()
+```
+
+**Key Methods:**
+- `querymany(ids, scopes, fields, species)`: Batch gene queries
+- `getgene(geneid)`: Get gene annotation
+
+**Use cases:**
+- Batch gene annotation retrieval
+- Gene ID conversion
+
+---
+
+## Chemical Compound Resources
+
+### ChEBI (Chemical Entities of Biological Interest)
+
+Dictionary of molecular entities.
+
+**Initialization:**
+```python
+from bioservices import ChEBI
+c = ChEBI()
+```
+
+**Key Methods:**
+- `getCompleteEntity(chebi_id)`: Full compound information
+- `getLiteEntity(chebi_id)`: Basic information
+- `getCompleteEntityByList(chebi_ids)`: Batch retrieval
+
+**Use cases:**
+- Small molecule information
+- Chemical structure data
+- Compound property lookup
+
+---
+
+### ChEMBL
+
+Bioactive drug-like compound database.
+
+**Initialization:**
+```python
+from bioservices import ChEMBL
+c = ChEMBL()
+```
+
+**Key Methods:**
+- `get_molecule_form(chembl_id)`: Compound details
+- `get_target(chembl_id)`: Target information
+- `get_similarity(chembl_id)`: Compounds similar to the given molecule
+- `get_assays()`: Bioassay data
+
+**Use cases:**
+- Drug discovery data
+- Find similar compounds
+- Bioactivity information
+- Target-compound relationships
+
+---
+
+### UniChem
+
+Chemical identifier mapping service.
+
+**Initialization:**
+```python
+from bioservices import UniChem
+u = UniChem()
+```
+
+**Key Methods:**
+- `get_compound_id_from_kegg(kegg_id)`: KEGG → ChEMBL
+- `get_all_compound_ids(src_compound_id, src_id)`: Get all IDs
+- `get_src_compound_ids(src_compound_id, from_src_id, to_src_id)`: Convert IDs
+
+**Source IDs:**
+- 1: ChEMBL
+- 2: DrugBank
+- 3: PDB
+- 6: KEGG
+- 7: ChEBI
+- 22: PubChem
+
+**Use cases:**
+- Cross-database compound ID mapping
+- Linking chemical databases
+
+---
+
+### PubChem
+
+Chemical compound database from NIH.
+
+**Initialization:**
+```python
+from bioservices import PubChem
+p = PubChem()
+```
+
+**Key Methods:**
+- `get_compounds(identifier, namespace)`: Retrieve compounds
+- `get_properties(properties, identifier, namespace)`: Get properties
+
+**Use cases:**
+- Chemical structure retrieval
+- Compound property information
+
+---
+
+## Sequence Analysis Tools
+
+### NCBIblast
+
+Sequence similarity searching.
+
+**Initialization:**
+```python
+from bioservices import NCBIblast
+s = NCBIblast(verbose=False)
+```
+
+**Key Methods:**
+- `run(program, sequence, stype, database, email, **params)`
+  - Submit BLAST job
+  - `program`: "blastp", "blastn", "blastx", "tblastn", "tblastx"
+  - `stype`: "protein" or "dna"
+  - `database`: "uniprotkb", "pdb", "refseq_protein", etc.
+  - `email`: Contact address, required by the EBI job service that runs the search
+  - Returns: Job ID
+
+- `getStatus(jobid)`
+  - Check job status
+  - Returns: Status string such as "RUNNING", "FINISHED", or "ERROR"
+
+- `getResult(jobid, result_type)`
+  - Retrieve results
+  - `result_type`: "out" (default), "ids", "xml"
+
+**Important:** BLAST jobs are asynchronous. Always check status before retrieving results.
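
The check-then-retrieve pattern can be sketched as a polling loop. Only `getStatus`, `getResult`, and the status strings come from the reference above. The names `wait_for_result`, `poll_seconds`, `max_wait`, and the `FakeBlast` stand-in are assumptions made so the example runs offline.

```python
import time


def wait_for_result(service, jobid, result_type="out",
                    poll_seconds=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll getStatus until the job finishes, then return getResult."""
    waited = 0.0
    while True:
        status = service.getStatus(jobid)
        if status == "FINISHED":
            return service.getResult(jobid, result_type)
        if status == "ERROR":
            raise RuntimeError(f"job {jobid} ended with status {status!r}")
        if waited >= max_wait:
            raise TimeoutError(f"job {jobid} not finished after {max_wait} s")
        sleep(poll_seconds)  # any other status: keep waiting
        waited += poll_seconds


class FakeBlast:
    """Stand-in for NCBIblast (hypothetical) that replays canned statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def getStatus(self, jobid):
        return next(self._statuses)

    def getResult(self, jobid, result_type):
        return f"{result_type} for {jobid}"


fake = FakeBlast(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_result(fake, "job-1", sleep=lambda seconds: None))
```

With a real `NCBIblast` instance, the job ID returned by `run` would be passed in place of `"job-1"`, and the default `time.sleep` would provide the pause between checks.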
+
+**Use cases:**
+- Protein homology searches
+- Sequence similarity analysis
+- Functional annotation by homology
+
+---
+
+## Pathway & Interaction Resources
+
+### Reactome
+
+Pathway database.
+
+**Initialization:**
+```python
+from bioservices import Reactome
+r = Reactome()
+```
+
+**Key Methods:**
+- `get_pathway_by_id(pathway_id)`: Pathway details
+- `search_pathway(query)`: Search pathways
+
+**Use cases:**
+- Human pathway analysis
+- Biological process annotation
+
+---
+
+### PSICQUIC
+
+Protein interaction query service (federates 30+ databases).
+
+**Initialization:**
+```python
+from bioservices import PSICQUIC
+s = PSICQUIC()
+```
+
+**Key Methods:**
+- `query(database, query_string)`
+  - Query specific interaction database
+  - Returns: PSI-MI TAB format
+
+- `activeDBs`
+  - Property listing available databases
+  - Returns: List of database names
+
+**Available databases:** MINT, IntAct, BioGRID, DIP, InnateDB, MatrixDB, MPIDB, and UniProt, among others
+
+**Query syntax:** Supports AND, OR, species filters
+- Example: "ZAP70 AND species:9606"
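
The query syntax and the tabular result format can be handled with two small helpers, sketched below. The sketch assumes results arrive as an iterable of PSI-MI TAB rows whose first two tab-separated columns identify interactors A and B. The names `build_query` and `interactor_pairs`, and the sample row, are hypothetical.

```python
def build_query(term, species=None):
    """Combine a search term with an optional species (taxonomy ID) filter."""
    if species is None:
        return term
    return f"{term} AND species:{species}"


def interactor_pairs(mitab_rows):
    """Return (interactor A, interactor B) identifiers from PSI-MI TAB rows."""
    pairs = []
    for row in mitab_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) >= 2:  # ignore empty or malformed rows
            pairs.append((fields[0], fields[1]))
    return pairs


print(build_query("ZAP70", species=9606))  # ZAP70 AND species:9606

# Made-up, shortened row; real rows carry many more columns.
rows = ["uniprotkb:P43403\tuniprotkb:X00000\t-\t-"]
print(interactor_pairs(rows))
```

If a service call returns one string rather than a list of rows, calling `.splitlines()` on it first gives the expected input.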
+
+**Use cases:**
+- Protein-protein interaction discovery
+- Network analysis
+- Interactome mapping
+
+---
+
+### IntactComplex
+
+Protein complex database.
+
+**Initialization:**
+```python
+from bioservices import IntactComplex
+i = IntactComplex()
+```
+
+**Key Methods:**
+- `search(query)`: Search complexes
+- `details(complex_ac)`: Complex details
+
+**Use cases:**
+- Protein complex composition
+- Multi-protein assembly analysis
+
+---
+
+### OmniPath
+
+Integrated signaling pathway database.
+
+**Initialization:**
+```python
+from bioservices import OmniPath
+o = OmniPath()
+```
+
+**Key Methods:**
+- `interactions(datasets, organisms)`: Get interactions
+- `ptms(datasets, organisms)`: Post-translational modifications
+
+**Use cases:**
+- Cell signaling analysis
+- Regulatory network mapping
+
+---
+
+## Gene Ontology
+
+### QuickGO
+
+Gene Ontology annotation service.
+
+**Initialization:**
+```python
+from bioservices import QuickGO
+g = QuickGO()
+```
+
+**Key Methods:**
+- `Term(go_id, frmt="obo")`
+  - Retrieve GO term information
+  - Returns: Term definition and metadata
+
+- `Annotation(protein=None, goid=None, format="tsv")`
+  - Get GO annotations
+  - Returns: Annotations in requested format
+
+**GO categories:**
+- Biological Process (BP)
+- Molecular Function (MF)
+- Cellular Component (CC)
+
+**Use cases:**
+- Functional annotation
+- Enrichment analysis
+- GO term lookup
+
+---
+
+## Genomic Resources
+
+### BioMart
+
+Data mining tool for genomic data.
+
+**Initialization:**
+```python
+from bioservices import BioMart
+b = BioMart()
+```
+
+**Key Methods:**
+- `datasets(dataset)`: List available datasets
+- `attributes(dataset)`: List attributes
+- `query(query_xml)`: Execute BioMart query
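
The `query_xml` argument is a BioMart XML document. One way to assemble it with the standard library is sketched below. The helper name `build_biomart_query` is hypothetical, and the dataset, filter, and attribute names in the example are illustrative assumptions that should be checked against `datasets` and `attributes` for the mart in use.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Build a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Illustrative names; verify them against the target mart.
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "21"},
)
print(xml_query)
```

The resulting string is what would be handed to `b.query(...)`.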
+
+**Use cases:**
+- Bulk genomic data retrieval
+- Custom genome annotations
+- SNP information
+
+---
+
+### ArrayExpress
+
+Gene expression database.
+
+**Initialization:**
+```python
+from bioservices import ArrayExpress
+a = ArrayExpress()
+```
+
+**Key Methods:**
+- `queryExperiments(keywords)`: Search experiments
+- `retrieveExperiment(accession)`: Get experiment data
+
+**Use cases:**
+- Gene expression data
+- Microarray analysis
+- RNA-seq data retrieval
+
+---
+
+### ENA (European Nucleotide Archive)
+
+Nucleotide sequence database.
+
+**Initialization:**
+```python
+from bioservices import ENA
+e = ENA()
+```
+
+**Key Methods:**
+- `search_data(query)`: Search sequences
+- `retrieve_data(accession)`: Retrieve sequences
+
+**Use cases:**
+- Nucleotide sequence retrieval
+- Genome assembly access
+
+---
+
+## Structural Biology
+
+### PDB (Protein Data Bank)
+
+3D protein structure database.
+
+**Initialization:**
+```python
+from bioservices import PDB
+p = PDB()
+```
+
+**Key Methods:**
+- `get_file(pdb_id, file_format)`: Download structure files
+- `search(query)`: Search structures
+
+**File formats:** pdb, cif, xml
+
+**Use cases:**
+- 3D structure retrieval
+- Structure-based analysis
+- PyMOL visualization
+
+---
+
+### Pfam
+
+Protein family database.
+
+**Initialization:**
+```python
+from bioservices import Pfam
+p = Pfam()
+```
+
+**Key Methods:**
+- `searchSequence(sequence)`: Find domains in sequence
+- `getPfamEntry(pfam_id)`: Domain information
+
+**Use cases:**
+- Protein domain identification
+- Family classification
+- Functional motif discovery
+
+---
+
+## Specialized Resources
+
+### BioModels
+
+Systems biology model repository.
+
+**Initialization:**
+```python
+from bioservices import BioModels
+b = BioModels()
+```
+
+**Key Methods:**
+- `get_model_by_id(model_id)`: Retrieve SBML model
+
+**Use cases:**
+- Systems biology modeling
+- SBML model retrieval
+
+---
+
+### COG (Clusters of Orthologous Genes)
+
+Orthologous gene classification.
+
+**Initialization:**
+```python
+from bioservices import COG
+c = COG()
+```
+
+**Use cases:**
+- Orthology analysis
+- Functional classification
+
+---
+
+### BiGG Models
+
+Metabolic network models.
+
+**Initialization:**
+```python
+from bioservices import BiGG
+b = BiGG()
+```
+
+**Key Methods:**
+- `list_models()`: Available models
+- `get_model(model_id)`: Model details
+
+**Use cases:**
+- Metabolic network analysis
+- Flux balance analysis
+
+---
+
+## General Patterns
+
+### Error Handling
+
+Any service call can raise an exception, for example on a network failure. Wrap calls in try-except:
+
+```python
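+from bioservices import UniProt
+
+u = UniProt(verbose=False)
+try:
+    result = u.retrieve("P43403", frmt="fasta")  # Any service call fits here
+    if result:
+        print(result)  # Process result
+except Exception as e:  # Broad on purpose: failures differ between services
+    print(f"Error: {e}")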
+```
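
When failures are transient, the try-except pattern extends naturally to a retry loop. The sketch below is one possible shape, not a BioServices feature. The name `call_with_retries` and its parameters are assumptions, and the `flaky` function stands in for a real service call.

```python
import time


def call_with_retries(func, *args, retries=3, delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func, allowing up to `retries` attempts in total."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:  # broad, matching the pattern above
            last_error = error
            if attempt < retries:
                sleep(delay * attempt)  # wait longer after each failure
    raise last_error


attempts = {"count": 0}


def flaky():
    """Stand-in for a service call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

A real call would look like `call_with_retries(u.retrieve, "P43403", frmt="fasta")`, using the default `time.sleep` between attempts.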
+
+### Verbosity Control
+
+Most services accept a `verbose` parameter:
+```python
+service = Service(verbose=False)  # Suppress HTTP logs
+```
+
+### Rate Limiting
+
+Requests can time out, and remote services may enforce rate limits. Where a service exposes these attributes, adjust them as needed:
+```python
+service.TIMEOUT = 30  # Adjust timeout
+service.DELAY = 1     # Delay between requests (if supported)
+```
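
Where a service offers no delay setting, spacing out requests on the client side is an alternative. The sketch below enforces a minimum interval between calls. The `Throttle` class and its parameters are hypothetical and independent of BioServices.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(0.01)
for _ in range(3):
    throttle.wait()  # a service call would follow each wait
print("done")
```

Calling `throttle.wait()` immediately before each request keeps a batch loop under the chosen request rate.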
+
+### Output Formats
+
+Common format parameters:
+- `frmt`: "xml", "json", "tab", "txt", "fasta"
+- `format`: Used instead of `frmt` by some methods (for example, QuickGO's `Annotation`)
+
+### Caching
+
+Some services cache results:
+```python
+service.CACHING = True  # Enable caching
+service.clear_cache()  # Clear cache
+```
+
+## Additional Resources
+
+For detailed API documentation:
+- Official docs: https://bioservices.readthedocs.io/
+- Individual service docs linked from main page
+- Source code: https://github.com/cokelaer/bioservices