UniProt Query Syntax Reference
Comprehensive guide to UniProt search query syntax for constructing complex searches.
Basic Syntax
Simple Queries
insulin
kinase
Field-Specific Searches
gene:BRCA1
accession:P12345
organism_name:human
protein_name:kinase
Boolean Operators
AND (both terms must be present)
insulin AND diabetes
kinase AND human
gene:BRCA1 AND reviewed:true
OR (either term can be present)
diabetes OR insulin
(cancer OR tumor) AND human
NOT (exclude terms)
kinase NOT human
protein_name:kinase NOT organism_name:mouse
Grouping with Parentheses
(diabetes OR insulin) AND reviewed:true
(gene:BRCA1 OR gene:BRCA2) AND organism_id:9606
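The operator rules above can be sketched as a few string helpers. This is only an illustration: the function names all_of, any_of, and exclude are hypothetical and not part of any UniProt client. Each helper wraps its result in parentheses so that nesting never depends on operator precedence.

```python
# Hypothetical helpers for composing UniProt query strings.
# They only build text; nothing here contacts UniProt.

def _join(operator, clauses):
    """Join clauses with an operator, parenthesizing when there are several."""
    if len(clauses) == 1:
        return clauses[0]
    return "(" + f" {operator} ".join(clauses) + ")"

def all_of(*clauses):
    """Every clause must match (AND)."""
    return _join("AND", clauses)

def any_of(*clauses):
    """At least one clause must match (OR)."""
    return _join("OR", clauses)

def exclude(base, unwanted):
    """Match base but drop entries that match unwanted (NOT)."""
    return f"({base} NOT {unwanted})"

query = all_of(any_of("gene:BRCA1", "gene:BRCA2"), "organism_id:9606")
print(query)  # ((gene:BRCA1 OR gene:BRCA2) AND organism_id:9606)
```

The outer parentheses are redundant for a top-level query, but they keep the result safe to embed in a larger expression.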
Common Search Fields
Identification
- accession:P12345 - UniProt accession number
- id:INSR_HUMAN - Entry name
- gene:BRCA1 - Gene name
- gene_exact:BRCA1 - Exact gene name match
Organism/Taxonomy
- organism_name:human - Organism name
- organism_name:"Homo sapiens" - Exact organism name (use quotes for multi-word)
- organism_id:9606 - NCBI taxonomy ID of the organism itself
- taxonomy_id:9606 - NCBI taxonomy ID, matching the taxon and all of its descendants
- taxonomy_name:"Homo sapiens" - Taxonomy name
Protein Information
- protein_name:insulin - Protein name
- protein_name:"insulin receptor" - Exact protein name
- reviewed:true - Only Swiss-Prot (reviewed) entries
- reviewed:false - Only TrEMBL (unreviewed) entries
Sequence Properties
- length:[100 TO 500] - Sequence length range
- mass:[50000 TO 100000] - Molecular mass in Daltons
- sequence:MVLSPADKTNVK - Exact sequence match
- fragment:false - Exclude fragment sequences
Gene Ontology (GO)
- go:0005515 - GO term ID (0005515 = protein binding)
- go_f:* - Any molecular function
- go_p:* - Any biological process
- go_c:* - Any cellular component
Annotations
- annotation:(type:signal) - Has signal peptide annotation
- annotation:(type:transmem) - Has transmembrane region
- cc_function:* - Has function comment
- cc_interaction:* - Has interaction comment
- ft_domain:* - Has domain feature
Database Cross-References
- xref:pdb - Has PDB structure
- xref:ensembl - Has Ensembl reference
- database:pdb - Same as xref
- database:(type:pdb) - Alternative syntax
Protein Families and Domains
- family:"protein kinase" - Protein family
- keyword:"Protein kinase" - Keyword annotation
- cc_similarity:* - Has similarity comment
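Since every entry in the lists above has the shape field:value, and multi-word values need quotes, a small formatter can take care of the quoting. The sketch below is one possible approach; the function name field is hypothetical, and the rule "quote anything containing a space unless it already starts with a quote, bracket, or parenthesis" is an assumption chosen to leave ranges and grouped values untouched.

```python
# Hypothetical formatter for a single field:value clause.

def field(name, value):
    """Return name:value, quoting the value if it contains a space."""
    text = str(value)
    # Leave values alone when they are already quoted, a [range],
    # or a (grouped) expression.
    needs_quotes = " " in text and text[0] not in '"[('
    if needs_quotes:
        text = f'"{text}"'
    return f"{name}:{text}"

print(field("organism_name", "Homo sapiens"))  # organism_name:"Homo sapiens"
print(field("organism_id", 9606))              # organism_id:9606
print(field("length", "[100 TO 500]"))         # length:[100 TO 500]
```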
Range Queries
Numeric Ranges
length:[100 TO 500] # Between 100 and 500
mass:[* TO 50000] # Less than or equal to 50000
created:[2023-01-01 TO *] # Created on or after Jan 1, 2023
Date Ranges
created:[2023-01-01 TO 2023-12-31]
modified:[2024-01-01 TO *]
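The bracketed range syntax, including the * placeholder for an open bound, can be generated from ordinary Python values. This is a minimal sketch; range_clause is a hypothetical name, and the field names are simply the ones used in the examples above.

```python
from datetime import date

# Hypothetical helper that formats a range clause.
# A bound of None becomes "*", meaning the range is open on that side.

def range_clause(name, low=None, high=None):
    def fmt(bound):
        if bound is None:
            return "*"
        if isinstance(bound, date):
            return bound.isoformat()  # YYYY-MM-DD
        return str(bound)
    return f"{name}:[{fmt(low)} TO {fmt(high)}]"

print(range_clause("length", 100, 500))            # length:[100 TO 500]
print(range_clause("mass", high=50000))            # mass:[* TO 50000]
print(range_clause("created", date(2023, 1, 1)))   # created:[2023-01-01 TO *]
```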
Wildcards
Single Character (?)
gene:BRCA? # Matches BRCA1, BRCA2, etc.
Multiple Characters (*)
gene:BRCA* # Matches BRCA1, BRCA2, BRCA1P1, etc.
protein_name:kinase*
organism_name:Homo*
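To get a feel for the difference between ? and *, the same two patterns can be tried locally with Python's shell-style matcher. This is only an analogy: fnmatchcase is a standard-library function, not UniProt's search engine, and UniProt's handling of case and tokenization may differ. The gene list is illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative gene symbols; not fetched from UniProt.
genes = ["BRCA1", "BRCA2", "BRCA1P1"]

# "?" stands for exactly one character.
single = [g for g in genes if fnmatchcase(g, "BRCA?")]

# "*" stands for a run of characters, so longer symbols match too.
multi = [g for g in genes if fnmatchcase(g, "BRCA*")]

print(single)  # ['BRCA1', 'BRCA2']
print(multi)   # ['BRCA1', 'BRCA2', 'BRCA1P1']
```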
Advanced Searches
Existence Queries
cc_function:* # Has any function annotation
ft_domain:* # Has any domain feature
xref:pdb # Has PDB structure
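When several annotations must all be present, the field:* clauses can be generated from a list of field names. A small sketch, where has_all is a hypothetical name and the fields are the ones shown above:

```python
# Hypothetical helper: require that every listed field has some value.

def has_all(*field_names):
    return " AND ".join(f"{name}:*" for name in field_names)

print(has_all("cc_function", "ft_domain"))  # cc_function:* AND ft_domain:*
```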
Combined Complex Queries
# Human reviewed kinases with PDB structure
(protein_name:kinase OR family:kinase) AND organism_id:9606 AND reviewed:true AND xref:pdb
# Cancer-related proteins excluding mice
(disease:cancer OR keyword:cancer) NOT organism_name:mouse
# Membrane proteins with signal peptides
annotation:(type:transmem) AND annotation:(type:signal) AND reviewed:true
# Recently updated human proteins
organism_id:9606 AND modified:[2024-01-01 TO *] AND reviewed:true
Field-Specific Examples
Protein Names
protein_name:"insulin receptor" # Exact phrase
protein_name:insulin* # Starts with insulin
recommended_name:insulin # Recommended name only
alternative_name:insulin # Alternative names only
Genes
gene:BRCA1 # Gene symbol
gene_exact:BRCA1 # Exact gene match
olnName:BRCA1 # Ordered locus name
orfName:BRCA1 # ORF name
Organisms
organism_name:human # Common name
organism_name:"Homo sapiens" # Scientific name
organism_id:9606 # Taxonomy ID
lineage:primates # Taxonomic lineage
Features
ft_signal:* # Signal peptide
ft_transmem:* # Transmembrane region
ft_domain:"Protein kinase" # Specific domain
ft_binding:* # Binding site
ft_site:* # Any site
Comments (cc_)
cc_function:* # Function description
cc_catalytic_activity:* # Catalytic activity
cc_pathway:* # Pathway involvement
cc_interaction:* # Protein interactions
cc_subcellular_location:* # Subcellular location
cc_tissue_specificity:* # Tissue specificity
cc_disease:cancer # Disease association
Tips and Best Practices
- Use quotes for exact phrases: organism_name:"Homo sapiens", not organism_name:Homo sapiens
- Filter by review status: add AND reviewed:true for high-quality Swiss-Prot entries
- Combine wildcards carefully: *kinase* may be too broad; kinase* is more specific
- Use parentheses for complex logic: (A OR B) AND (C OR D) is clearer than A OR B AND C OR D
- Numeric ranges are inclusive: length:[100 TO 500] includes both 100 and 500
- Field prefixes: learn the common prefixes:
  - cc_ = Comments
  - ft_ = Features
  - go_ = Gene Ontology
  - xref_ = Cross-references
- Check field names: use the API's /configure/uniprotkb/result-fields endpoint to see all available fields
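Two of the tips above, quoting phrases and grouping with parentheses, lend themselves to a quick local check before a query is sent. The following is a rough sketch, not a full parser: check_query is a hypothetical name, and it only looks for unbalanced double quotes and parentheses.

```python
# Hypothetical pre-flight check for a query string.
# Returns a list of problems; an empty list means none were detected.

def check_query(query):
    problems = []
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            # Parentheses inside a quoted phrase are ignored.
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    problems.append("closing parenthesis without an opening one")
                    depth = 0
    if in_quotes:
        problems.append("unbalanced double quote")
    if depth > 0:
        problems.append("unclosed parenthesis")
    return problems

print(check_query("(A OR B) AND (C OR D)"))        # []
print(check_query('organism_name:"Homo sapiens'))  # ['unbalanced double quote']
```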
Query Validation
Test queries using:
- Web interface: https://www.uniprot.org/uniprotkb
- API: https://rest.uniprot.org/uniprotkb/search?query=YOUR_QUERY
- API documentation: https://www.uniprot.org/help/query-fields
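When testing through the API, the query has to be URL-encoded before it replaces YOUR_QUERY. The sketch below builds such a URL with the standard library. The names build_search_url and fetch_json are hypothetical, and the format and size parameters are assumptions beyond what is listed above, so check them against the API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://rest.uniprot.org/uniprotkb/search"

def build_search_url(query, fmt="json", size=5):
    """Return a search URL with the query safely encoded.

    fmt and size are assumed parameters (response format, page size).
    """
    params = {"query": query, "format": fmt, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"

def fetch_json(url, timeout=30):
    """Download and parse a JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

url = build_search_url("gene:BRCA1 AND reviewed:true")
print(url)
# To run the query for real: data = fetch_json(url)
```

Encoding matters because colons, spaces, quotes, and brackets all appear in typical queries and are not safe to paste into a URL unescaped.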
Common Patterns
Find well-characterized proteins
reviewed:true AND xref:pdb AND cc_function:*
Find disease-associated proteins
cc_disease:* AND organism_id:9606 AND reviewed:true
Find proteins with experimental evidence
existence:"Evidence at protein level" AND reviewed:true
Find secreted proteins
cc_subcellular_location:secreted AND reviewed:true
Find drug targets
keyword:"Pharmaceutical" OR keyword:"Drug target"
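Patterns like these are often reused with one extra filter. The sketch below stores a few of them by name and narrows one to a single organism; PATTERNS and restrict_to_organism are hypothetical names. Wrapping the stored pattern in parentheses matters for the drug-target pattern, whose top-level OR would otherwise combine unpredictably with the added AND.

```python
# Hypothetical registry of reusable query patterns (copied from above).
PATTERNS = {
    "well_characterized": "reviewed:true AND xref:pdb AND cc_function:*",
    "secreted": "cc_subcellular_location:secreted AND reviewed:true",
    "drug_targets": 'keyword:"Pharmaceutical" OR keyword:"Drug target"',
}

def restrict_to_organism(query, taxon_id):
    """Parenthesize the query, then require a specific organism_id."""
    return f"({query}) AND organism_id:{taxon_id}"

print(restrict_to_organism(PATTERNS["drug_targets"], 9606))
# (keyword:"Pharmaceutical" OR keyword:"Drug target") AND organism_id:9606
```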
Resources
- Full query field reference: https://www.uniprot.org/help/query-fields
- API query documentation: https://www.uniprot.org/help/api_queries
- Text search documentation: https://www.uniprot.org/help/text-search