13 KiB
STRING Database API Reference
Overview
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a comprehensive database of known and predicted protein-protein interactions integrating data from over 40 sources.
Database Statistics (v12.0+):
- Coverage: 5000+ genomes
- Proteins: ~59.3 million
- Interactions: 20+ billion
- Data types: Physical interactions, functional associations, co-expression, co-occurrence, text-mining, databases
Core Data Resource: Designated by Global Biodata Coalition and ELIXIR
API Base URLs
- Current version: https://string-db.org/api
- Version-specific: https://version-12-0.string-db.org/api (for reproducibility)
- API documentation: https://string-db.org/help/api/
Best Practices
- Identifier Mapping: Always map identifiers first using
get_string_idsfor faster subsequent queries - Use STRING IDs: Prefer STRING identifiers (e.g.,
9606.ENSP00000269305) over gene names - Specify Species: For networks with >10 proteins, always specify species NCBI taxon ID
- Rate Limiting: Wait 1 second between API calls to avoid server overload
- Versioned URLs: Use version-specific URLs for reproducible research
- POST over GET: Use POST requests for large protein lists
- Caller Identity: Include
caller_identityparameter for tracking (e.g., your application name)
API Methods
1. Identifier Mapping (get_string_ids)
Purpose: Maps common protein names, gene symbols, UniProt IDs, and other identifiers to STRING identifiers.
Endpoint: /api/tsv/get_string_ids
Parameters:
identifiers(required): Protein names/IDs separated by newlines (%0d)species(required): NCBI taxon IDlimit: Number of matches per identifier (default: 1)echo_query: Include query term in output (1 or 0)caller_identity: Application identifier
Output Format: TSV with columns:
queryItem: Original queryqueryIndex: Query positionstringId: STRING identifierncbiTaxonId: Species taxon IDtaxonName: Species namepreferredName: Preferred gene nameannotation: Protein description
Example:
identifiers=TP53%0dBRCA1&species=9606&limit=1
Use cases:
- Converting gene symbols to STRING IDs
- Validating protein identifiers
- Finding canonical protein names
2. Network Data (network)
Purpose: Retrieves protein-protein interaction network data in tabular format.
Endpoint: /api/tsv/network
Parameters:
identifiers(required): Protein IDs separated by%0dspecies: NCBI taxon IDrequired_score: Confidence threshold 0-1000 (default: 400)- 150: low confidence
- 400: medium confidence
- 700: high confidence
- 900: highest confidence
network_type:functional(default) orphysicaladd_nodes: Add N interacting proteins (0-10)caller_identity: Application identifier
Output Format: TSV with columns:
stringId_A,stringId_B: Interacting proteinspreferredName_A,preferredName_B: Gene namesncbiTaxonId: Speciesscore: Combined interaction score (0-1000)nscore: Neighborhood scorefscore: Fusion scorepscore: Phylogenetic profile scoreascore: Coexpression scoreescore: Experimental scoredscore: Database scoretscore: Text-mining score
Network Types:
- Functional: All interaction evidence types (recommended for most analyses)
- Physical: Only direct physical binding evidence
Example:
identifiers=9606.ENSP00000269305%0d9606.ENSP00000275493&required_score=700
3. Network Image (image/network)
Purpose: Generates visual network representation as PNG image.
Endpoint: /api/image/network
Parameters:
identifiers(required): Protein IDs separated by%0dspecies: NCBI taxon IDrequired_score: Confidence threshold 0-1000network_flavor: Visualization styleevidence: Show evidence types as colored linesconfidence: Show confidence as line thicknessactions: Show activating/inhibiting interactions
add_nodes: Add N interacting proteins (0-10)caller_identity: Application identifier
Output: PNG image (binary data)
Image Specifications:
- Format: PNG
- Size: Automatically scaled based on network size
- High-resolution option available (add
?highres=1)
Example:
identifiers=TP53%0dMDM2&species=9606&network_flavor=evidence
4. Interaction Partners (interaction_partners)
Purpose: Retrieves all STRING interaction partners for given protein(s).
Endpoint: /api/tsv/interaction_partners
Parameters:
identifiers(required): Protein IDsspecies: NCBI taxon IDrequired_score: Confidence threshold 0-1000limit: Maximum number of partners (default: 10)caller_identity: Application identifier
Output Format: TSV with same columns as network method
Use cases:
- Finding hub proteins
- Expanding networks
- Discovery of novel interactions
Example:
identifiers=TP53&species=9606&limit=20&required_score=700
5. Functional Enrichment (enrichment)
Purpose: Performs functional enrichment analysis for a set of proteins across multiple annotation databases.
Endpoint: /api/tsv/enrichment
Parameters:
identifiers(required): List of protein IDsspecies(required): NCBI taxon IDcaller_identity: Application identifier
Enrichment Categories:
- Gene Ontology: Biological Process, Molecular Function, Cellular Component
- KEGG Pathways: Metabolic and signaling pathways
- Pfam: Protein domains
- InterPro: Protein families and domains
- SMART: Domain architecture
- UniProt Keywords: Curated functional keywords
Output Format: TSV with columns:
category: Annotation categoryterm: Term IDdescription: Term descriptionnumber_of_genes: Genes in input with this termnumber_of_genes_in_background: Total genes with this termncbiTaxonId: SpeciesinputGenes: Comma-separated gene listpreferredNames: Comma-separated gene namesp_value: Enrichment p-value (uncorrected)fdr: False discovery rate (corrected p-value)
Statistical Method: Fisher's exact test with Benjamini-Hochberg FDR correction
Example:
identifiers=TP53%0dMDM2%0dATM%0dCHEK2&species=9606
6. PPI Enrichment (ppi_enrichment)
Purpose: Tests if a network has significantly more interactions than expected by chance.
Endpoint: /api/json/ppi_enrichment
Parameters:
identifiers(required): List of protein IDsspecies: NCBI taxon IDrequired_score: Confidence thresholdcaller_identity: Application identifier
Output Format: JSON with fields:
number_of_nodes: Proteins in networknumber_of_edges: Interactions observedexpected_number_of_edges: Expected interactions (random)p_value: Statistical significance
Interpretation:
- p-value < 0.05: Network is significantly enriched
- Low p-value indicates proteins form functional module
Example:
identifiers=TP53%0dMDM2%0dATM%0dCHEK2&species=9606
7. Homology Scores (homology)
Purpose: Retrieves protein similarity/homology scores.
Endpoint: /api/tsv/homology
Parameters:
identifiers(required): Protein IDsspecies: NCBI taxon IDcaller_identity: Application identifier
Output Format: TSV with homology scores between proteins
Use cases:
- Identifying protein families
- Paralog analysis
- Cross-species comparisons
8. Version Information (version)
Purpose: Returns current STRING database version.
Endpoint: /api/tsv/version
Output: Version string (e.g., "12.0")
Common Species NCBI Taxon IDs
| Organism | Common Name | Taxon ID |
|---|---|---|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli K-12 | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |
| Gallus gallus | Chicken | 9031 |
Full list: https://string-db.org/cgi/input?input_page_active_form=organisms
STRING Identifier Format
STRING uses Ensembl protein IDs with taxon prefix:
- Format:
{taxonId}.{ensemblProteinId} - Example:
9606.ENSP00000269305(human TP53)
ID Components:
- Taxon ID: NCBI taxonomy identifier
- Protein ID: Usually Ensembl protein ID (ENSP...)
Interaction Confidence Scores
STRING provides combined confidence scores (0-1000) based on multiple evidence channels:
Evidence Channels
- Neighborhood (nscore): Gene fusion and conserved genomic neighborhood
- Fusion (fscore): Gene fusion events across species
- Phylogenetic Profile (pscore): Co-occurrence across species
- Coexpression (ascore): RNA expression correlation
- Experimental (escore): Biochemical/genetic experiments
- Database (dscore): Curated pathway/complex databases
- Text-mining (tscore): Literature co-occurrence
Recommended Thresholds
- 150: Low confidence (exploratory analysis)
- 400: Medium confidence (standard analysis)
- 700: High confidence (conservative analysis)
- 900: Highest confidence (very stringent)
Output Formats
Available Formats
- TSV: Tab-separated values (default, best for data processing)
- JSON: JavaScript Object Notation (structured data)
- XML: Extensible Markup Language
- PSI-MI: Proteomics Standards Initiative format
- PSI-MITAB: Tab-delimited PSI-MI format
- PNG: Image format (for network visualizations)
- SVG: Scalable vector graphics (for network visualizations)
Format Selection
Replace /tsv/ in URL with desired format:
/json/network- JSON format/xml/network- XML format/image/network- PNG image
Error Handling
HTTP Status Codes
- 200 OK: Successful request
- 400 Bad Request: Invalid parameters or syntax
- 404 Not Found: Protein/species not found
- 500 Internal Server Error: Server error
Common Errors
- "No proteins found": Invalid identifiers or species mismatch
- "Species required": Missing species parameter for large networks
- Empty results: No interactions above score threshold
- Timeout: Network too large, reduce protein count
Advanced Features
Bulk Network Upload
For complete proteome analysis:
- Navigate to https://string-db.org/
- Select "Upload proteome" option
- Upload FASTA file
- STRING generates complete interaction network and predicts functions
Values/Ranks Enrichment API
For differential expression/proteomics data:
- Get API Key:
/api/json/get_api_key
-
Submit Data: Tab-separated protein ID and value pairs
-
Check Status:
/api/json/valuesranks_enrichment_status?job_id={id}
- Retrieve Results: Access enrichment tables and figures
Requirements:
- Complete protein set (no filtering)
- Numeric values for each protein
- Proper species identifier
Network Customization
Network Size Control:
add_nodes=N: Adds N most connected proteinslimit: Controls partner retrieval
Confidence Filtering:
- Adjust
required_scorebased on analysis goals - Higher scores = fewer false positives, more false negatives
Network Type Selection:
functional: All evidence (recommended for pathway analysis)physical: Direct binding only (recommended for structural studies)
Integration with Other Tools
Python Libraries
requests (recommended):
import requests
url = "https://string-db.org/api/tsv/network"
params = {"identifiers": "TP53", "species": 9606}
response = requests.get(url, params=params)
urllib (standard library):
import urllib.request
url = "https://string-db.org/api/tsv/network?identifiers=TP53&species=9606"
response = urllib.request.urlopen(url)
R Integration
STRINGdb Bioconductor package:
library(STRINGdb)
string_db <- STRINGdb$new(version="12", species=9606)
Cytoscape
STRING networks can be imported into Cytoscape for visualization and analysis:
- Use stringApp plugin
- Import TSV network data
- Apply layouts and styling
Data License
STRING data is freely available under Creative Commons BY 4.0 license:
- ✓ Free to use for academic and commercial purposes
- ✓ Attribution required
- ✓ Modifications allowed
- ✓ Redistribution allowed
Citation: Szklarczyk et al. (latest publication)
Rate Limits and Usage
- Rate limiting: No strict limit, but avoid rapid-fire requests
- Recommendation: Wait 1 second between calls
- Large datasets: Use bulk download from https://string-db.org/cgi/download
- Proteome-scale: Use web upload feature instead of API
Related Resources
- STRING website: https://string-db.org
- Download page: https://string-db.org/cgi/download
- Help center: https://string-db.org/help/
- API documentation: https://string-db.org/help/api/
- Publications: https://string-db.org/cgi/about
Troubleshooting
No results returned:
- Verify species parameter matches identifiers
- Check identifier format
- Lower confidence threshold
- Use identifier mapping first
Timeout errors:
- Reduce number of input proteins
- Split large queries into batches
- Use bulk download for proteome-scale analyses
Version inconsistencies:
- Use version-specific URLs
- Check STRING version with
/versionendpoint - Update identifiers if using old IDs