UniProt ID Mapping Databases
Complete list of databases supported by the UniProt ID Mapping service. Use these database names when calling the ID mapping API.
Retrieving Database List Programmatically
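import requests

# Fetch the ID mapping configuration, which lists the supported databases
response = requests.get("https://rest.uniprot.org/configure/idmapping/fields", timeout=30)
response.raise_for_status()  # stop early on an HTTP error
databases = response.json()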
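The response is a JSON document rather than a flat list of names. As a rough sketch, assuming it contains a top-level `groups` array whose entries each hold an `items` list with `name`, `from`, and `to` keys (an assumption about the payload shape that should be checked against a live response), the names could be extracted as follows. `list_database_names` is a hypothetical helper, and the sample data is hand-written for illustration.

```python
def list_database_names(config, direction="from"):
    """Return names of databases usable in the given direction ("from" or "to").

    Assumed shape: {"groups": [{"items": [{"name": str, "from": bool, "to": bool}]}]}
    """
    names = []
    for group in config.get("groups", []):
        for item in group.get("items", []):
            if item.get(direction):
                names.append(item["name"])
    return names


# Hand-written sample in the assumed shape (not real API output)
sample = {
    "groups": [
        {"items": [
            {"name": "UniProtKB_AC-ID", "from": True, "to": False},
            {"name": "UniProtKB", "from": False, "to": True},
        ]},
        {"items": [
            {"name": "PDB", "from": True, "to": True},
        ]},
    ]
}

from_names = list_database_names(sample, "from")
to_names = list_database_names(sample, "to")
print(from_names)
print(to_names)
```

Separating the "from" and "to" lists matters if a database can only be used on one side of a mapping, which is what the assumed `from`/`to` flags would indicate.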
UniProt Databases
UniProtKB
- UniProtKB_AC-ID - UniProt accession and ID
- UniProtKB - UniProt Knowledgebase
- UniProtKB-Swiss-Prot - Reviewed (Swiss-Prot)
- UniProtKB-TrEMBL - Unreviewed (TrEMBL)
- UniParc - UniProt Archive
- UniRef50 - UniRef 50% identity clusters
- UniRef90 - UniRef 90% identity clusters
- UniRef100 - UniRef 100% identity clusters
Sequence Databases
Nucleotide Sequence
- EMBL - EMBL/GenBank/DDBJ
- EMBL-CDS - EMBL coding sequences
- RefSeq_Nucleotide - RefSeq nucleotide sequences
- CCDS - Consensus CDS
Protein Sequence
- RefSeq_Protein - RefSeq protein sequences
- PIR - Protein Information Resource
Gene Databases
- GeneID - Entrez Gene
- Gene_Name - Gene name
- Gene_Synonym - Gene synonym
- Gene_OrderedLocusName - Ordered locus name
- Gene_ORFName - ORF name
Genome Databases
General
- Ensembl - Ensembl
- EnsemblGenomes - Ensembl Genomes
- EnsemblGenomes_PRO - Ensembl Genomes protein
- EnsemblGenomes_TRS - Ensembl Genomes transcript
- Ensembl_PRO - Ensembl protein
- Ensembl_TRS - Ensembl transcript
Organism-Specific
- KEGG - KEGG Genes
- PATRIC - PATRIC
- UCSC - UCSC Genome Browser
- VectorBase - VectorBase
- WBParaSite - WormBase ParaSite
Structure Databases
- PDB - Protein Data Bank
- AlphaFoldDB - AlphaFold Database
- BMRB - Biological Magnetic Resonance Data Bank
- PDBsum - PDB summary
- SASBDB - Small Angle Scattering Biological Data Bank
- SMR - SWISS-MODEL Repository
Protein Family and Domain Databases
- InterPro - InterPro
- Pfam - Pfam protein families
- PROSITE - PROSITE
- SMART - SMART domains
- CDD - Conserved Domain Database
- HAMAP - HAMAP
- PANTHER - PANTHER
- PRINTS - PRINTS
- ProDom - ProDom
- SFLD - Structure-Function Linkage Database
- SUPFAM - SUPERFAMILY
- TIGRFAMs - TIGRFAMs
Organism-Specific Databases
Model Organisms
- MGI - Mouse Genome Informatics
- RGD - Rat Genome Database
- FlyBase - FlyBase (Drosophila)
- WormBase - WormBase (C. elegans)
- Xenbase - Xenbase (Xenopus)
- ZFIN - Zebrafish Information Network
- dictyBase - dictyBase (Dictyostelium)
- EcoGene - EcoGene (E. coli)
- SGD - Saccharomyces Genome Database (yeast)
- PomBase - PomBase (S. pombe)
- TAIR - The Arabidopsis Information Resource
Human-Specific
- HGNC - HUGO Gene Nomenclature Committee
- CCDS - Consensus Coding Sequence Database
Pathway Databases
- Reactome - Reactome
- BioCyc - BioCyc
- PlantReactome - Plant Reactome
- SIGNOR - SIGNOR
- SignaLink - SignaLink
Enzyme and Metabolism
- EC - Enzyme Commission number
- BRENDA - BRENDA enzyme database
- SABIO-RK - SABIO-RK (biochemical reactions)
- MetaCyc - MetaCyc
Disease and Phenotype Databases
- OMIM - Online Mendelian Inheritance in Man
- MIM - MIM (same as OMIM)
- OrphaNet - Orphanet (rare diseases)
- DisGeNET - DisGeNET
- MalaCards - MalaCards
- CTD - Comparative Toxicogenomics Database
- OpenTargets - Open Targets
Drug and Chemical Databases
- ChEMBL - ChEMBL
- DrugBank - DrugBank
- DrugCentral - DrugCentral
- GuidetoPHARMACOLOGY - Guide to Pharmacology
- SwissLipids - SwissLipids
Gene Expression Databases
- Bgee - Bgee gene expression
- ExpressionAtlas - Expression Atlas
- Genevisible - Genevisible
- CleanEx - CleanEx
Proteomics Databases
- PRIDE - PRIDE proteomics
- PeptideAtlas - PeptideAtlas
- ProteomicsDB - ProteomicsDB
- CPTAC - CPTAC
- jPOST - jPOST
- MassIVE - MassIVE
- MaxQB - MaxQB
- PaxDb - PaxDb
- TopDownProteomics - Top Down Proteomics
Protein-Protein Interaction
- STRING - STRING
- BioGRID - BioGRID
- IntAct - IntAct
- MINT - MINT
- DIP - Database of Interacting Proteins
- ComplexPortal - Complex Portal
Ontologies
- GO - Gene Ontology
- GeneTree - Ensembl GeneTree
- HOGENOM - HOGENOM
- HOVERGEN - HOVERGEN
- KO - KEGG Orthology
- OMA - OMA orthology
- OrthoDB - OrthoDB
- TreeFam - TreeFam
Other Specialized Databases
Glycosylation
- GlyConnect - GlyConnect
- GlyGen - GlyGen
Protein Modifications
- PhosphoSitePlus - PhosphoSitePlus
- iPTMnet - iPTMnet
Antibodies
- Antibodypedia - Antibodypedia
- DNASU - DNASU
Protein Localization
- COMPARTMENTS - COMPARTMENTS
- NeXtProt - NeXtProt (human proteins)
Evolution and Phylogeny
- eggNOG - eggNOG
- GeneTree - Ensembl GeneTree
- InParanoid - InParanoid
Technical Resources
- PRO - Protein Ontology
- GenomeRNAi - GenomeRNAi
- PubMed - PubMed literature references
Common Mapping Scenarios
Example 1: UniProt to PDB
from_db = "UniProtKB_AC-ID"
to_db = "PDB"
ids = ["P01308", "P04637"]
Example 2: Gene Name to UniProt
from_db = "Gene_Name"
to_db = "UniProtKB"
ids = ["BRCA1", "TP53", "INSR"]
Example 3: UniProt to Ensembl
from_db = "UniProtKB_AC-ID"
to_db = "Ensembl"
ids = ["P12345"]
Example 4: RefSeq to UniProt
from_db = "RefSeq_Protein"
to_db = "UniProtKB"
ids = ["NP_000207.1"]
Example 5: UniProt to GO Terms
from_db = "UniProtKB_AC-ID"
to_db = "GO"
ids = ["P01308"]
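Each scenario above comes down to three values. The following is a minimal sketch of turning them into the form fields that the submit endpoint expects (`from`, `to`, and a comma-separated `ids` string). `build_payload` is a hypothetical helper name for illustration, not part of any UniProt client library.

```python
def build_payload(from_db, to_db, ids):
    """Build the form fields for an ID mapping job from one scenario."""
    # Drop empty entries and surrounding whitespace before joining
    cleaned = [i.strip() for i in ids if i and i.strip()]
    if not cleaned:
        raise ValueError("at least one ID is required")
    return {"from": from_db, "to": to_db, "ids": ",".join(cleaned)}


# Example 1 from above: UniProt to PDB
payload = build_payload("UniProtKB_AC-ID", "PDB", ["P01308", "P04637"])
print(payload)
```

The resulting dictionary can be passed as form data to an HTTP client, which takes care of the URL encoding.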
Usage Notes
- Database names are case-sensitive: Use exact names as listed
- Many-to-many mappings: One ID may map to multiple target IDs
- Failed mappings: Some IDs may not have mappings; check the failedIds field in results
- Batch size limit: Maximum 100,000 IDs per job
- Result expiration: Results are stored for 7 days
- Bidirectional mapping: Most databases support mapping in both directions
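Two of these notes translate directly into code: splitting large inputs to stay within the per-job limit, and collecting one-to-many results. The sketch below assumes the results arrive as a JSON object with a `results` list of `{"from": ..., "to": ...}` pairs and a `failedIds` list. The pair shape is an assumption, and for some targets `to` may be a nested object rather than a string. The helper names and sample IDs are hypothetical.

```python
from collections import defaultdict

MAX_IDS_PER_JOB = 100_000  # batch size limit stated in the notes above


def chunk_ids(ids, size=MAX_IDS_PER_JOB):
    """Yield successive slices of ids, each small enough for one job."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def group_results(payload):
    """Return (mapping, failed): source ID -> list of targets, plus unmapped IDs."""
    mapping = defaultdict(list)
    for pair in payload.get("results", []):
        mapping[pair["from"]].append(pair["to"])
    return dict(mapping), list(payload.get("failedIds", []))


# Hand-written sample in the assumed shape (placeholder IDs, not real API output)
sample = {
    "results": [
        {"from": "ID_A", "to": "TARGET_1"},
        {"from": "ID_A", "to": "TARGET_2"},
        {"from": "ID_B", "to": "TARGET_3"},
    ],
    "failedIds": ["ID_C"],
}
mapping, failed = group_results(sample)
chunk_sizes = [len(chunk) for chunk in chunk_ids(list(range(250_000)))]
print(mapping)
print(failed)
print(chunk_sizes)
```

Keeping the targets in a list per source ID avoids silently dropping the extra hits that a plain dictionary assignment would overwrite.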
API Endpoints
Get available databases
GET https://rest.uniprot.org/configure/idmapping/fields
Submit mapping job
POST https://rest.uniprot.org/idmapping/run
Content-Type: application/x-www-form-urlencoded
from={from_db}&to={to_db}&ids={comma_separated_ids}
Check job status
GET https://rest.uniprot.org/idmapping/status/{jobId}
Get results
GET https://rest.uniprot.org/idmapping/results/{jobId}
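The three job endpoints above are typically used in sequence: submit, poll, fetch. A minimal sketch of that flow follows, with several assumptions flagged in the comments: that the submit response carries a `jobId` key, that the status response carries a `jobStatus` key with values such as `NEW`, `RUNNING`, or `FINISHED`, and that a status response without `jobStatus` means the client was redirected to finished results. `run_mapping` is a hypothetical helper, and `session` is any object with requests-style `post()` and `get()` methods.

```python
import time

API_URL = "https://rest.uniprot.org"


def run_mapping(session, from_db, to_db, ids, poll_seconds=3, max_polls=100):
    """Submit a mapping job, wait for it to finish, and return the results JSON."""
    # 1. Submit the job as form-encoded data
    resp = session.post(
        f"{API_URL}/idmapping/run",
        data={"from": from_db, "to": to_db, "ids": ",".join(ids)},
    )
    resp.raise_for_status()
    job_id = resp.json()["jobId"]  # assumed response key

    # 2. Poll the status endpoint until the job is no longer pending
    for _ in range(max_polls):
        resp = session.get(f"{API_URL}/idmapping/status/{job_id}")
        resp.raise_for_status()
        status = resp.json().get("jobStatus")  # assumed response key
        if status in ("NEW", "RUNNING"):
            time.sleep(poll_seconds)
            continue
        # A missing jobStatus is treated as "finished" (assumed redirect to results)
        if status not in (None, "FINISHED"):
            raise RuntimeError(f"job {job_id} ended with status {status}")
        break
    else:
        raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")

    # 3. Fetch the results (first response only; any pagination is not handled)
    resp = session.get(f"{API_URL}/idmapping/results/{job_id}")
    resp.raise_for_status()
    return resp.json()
```

With the `requests` package installed, a call might look like `run_mapping(requests.Session(), "UniProtKB_AC-ID", "PDB", ["P01308", "P04637"])`. Passing the session in as a parameter keeps the function easy to exercise with a stub object instead of the live service.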
Resources
- ID Mapping tool: https://www.uniprot.org/id-mapping
- API documentation: https://www.uniprot.org/help/id_mapping
- Programmatic access: https://www.uniprot.org/help/api_idmapping