# UniProt Query Syntax Reference
Comprehensive guide to UniProt search query syntax for constructing complex searches.
## Basic Syntax
### Simple Queries
```
insulin
kinase
```
### Field-Specific Searches
```
gene:BRCA1
accession:P12345
organism_name:human
protein_name:kinase
```
## Boolean Operators
### AND (both terms must be present)
```
insulin AND diabetes
kinase AND human
gene:BRCA1 AND reviewed:true
```
### OR (either term can be present)
```
diabetes OR insulin
(cancer OR tumor) AND human
```
### NOT (exclude terms)
```
kinase NOT human
protein_name:kinase NOT organism_name:mouse
```
### Grouping with Parentheses
```
(diabetes OR insulin) AND reviewed:true
(gene:BRCA1 OR gene:BRCA2) AND organism_id:9606
```
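
When queries are assembled programmatically, grouping is easy to get wrong. The following is a minimal sketch, not part of any UniProt client library; `any_of` and `all_of` are hypothetical helper names that join clauses and add the parentheses for OR groups.

```python
def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (no outer parentheses)."""
    return " AND ".join(clauses)


# Rebuilds the second grouping example from this section.
query = all_of(any_of("gene:BRCA1", "gene:BRCA2"), "organism_id:9606")
print(query)
```

Wrapping every OR group in parentheses keeps the result independent of how the server resolves operator precedence.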
## Common Search Fields
### Identification
- `accession:P12345` - UniProt accession number
- `id:INSR_HUMAN` - Entry name
- `gene:BRCA1` - Gene name
- `gene_exact:BRCA1` - Exact gene name match
### Organism/Taxonomy
- `organism_name:human` - Organism name
- `organism_name:"Homo sapiens"` - Exact organism name (use quotes for multi-word)
- `organism_id:9606` - NCBI taxonomy ID
- `taxonomy_id:9606` - NCBI taxonomy ID, including all descendant taxa (`organism_id` matches only the exact organism)
- `taxonomy_name:"Homo sapiens"` - Taxonomy name
### Protein Information
- `protein_name:insulin` - Protein name
- `protein_name:"insulin receptor"` - Exact protein name
- `reviewed:true` - Only Swiss-Prot (reviewed) entries
- `reviewed:false` - Only TrEMBL (unreviewed) entries
### Sequence Properties
- `length:[100 TO 500]` - Sequence length range
- `mass:[50000 TO 100000]` - Molecular mass in Daltons
- `sequence:MVLSPADKTNVK` - Exact sequence match
- `fragment:false` - Exclude fragment sequences
### Gene Ontology (GO)
- `go:0005515` - GO term ID (0005515 = protein binding)
- `go_f:*` - Any molecular function
- `go_p:*` - Any biological process
- `go_c:*` - Any cellular component
### Annotations
- `ft_signal:*` - Has signal peptide annotation
- `ft_transmem:*` - Has transmembrane region
- `cc_function:*` - Has function comment
- `cc_interaction:*` - Has interaction comment
- `ft_domain:*` - Has domain feature
### Database Cross-References
- `xref:pdb` - Has PDB structure
- `xref:ensembl` - Has Ensembl reference
- `database:pdb` - Same as xref
- `database:(type:pdb)` - Alternative syntax
### Protein Families and Domains
- `family:"protein kinase"` - Protein family
- `keyword:"Protein kinase"` - Keyword annotation
- `cc_similarity:*` - Has similarity comment
## Range Queries
### Numeric Ranges
```
length:[100 TO 500] # Between 100 and 500
mass:[* TO 50000] # Less than or equal to 50000
date_created:[2023-01-01 TO *] # Created on or after Jan 1, 2023
```
### Date Ranges
```
date_created:[2023-01-01 TO 2023-12-31]
date_modified:[2024-01-01 TO *]
```
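
The bracketed range syntax above can be generated from optional bounds. This is a small sketch under the assumption that an open end is always written as `*`; `range_clause` is a hypothetical helper name, and date bounds can be passed as `YYYY-MM-DD` strings.

```python
from typing import Union

Bound = Union[int, str, None]


def range_clause(field: str, low: Bound = None, high: Bound = None) -> str:
    """Build `field:[low TO high]`, using `*` for a missing bound."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"


print(range_clause("length", 100, 500))
print(range_clause("mass", high=50000))
```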
## Wildcards
### Single Character (?)
```
gene:BRCA? # Matches BRCA1, BRCA2, etc.
```
### Multiple Characters (*)
```
gene:BRCA* # Matches BRCA1, BRCA2, BRCA1P1, etc.
protein_name:kinase*
organism_name:Homo*
```
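
The difference between `?` and `*` can be checked locally before sending a query. The sketch below uses Python's `fnmatch` module as a stand-in; this is an assumption for illustration only, since UniProt evaluates wildcards on the server and may differ in details such as case handling. The list of names is sample data.

```python
from fnmatch import fnmatchcase

# Sample gene names, used only to illustrate the two wildcard forms.
names = ["BRCA1", "BRCA2", "BRCA1P1", "BRCC3"]

# `?` stands for exactly one character.
single = [n for n in names if fnmatchcase(n, "BRCA?")]

# `*` stands for any run of characters, including none.
multi = [n for n in names if fnmatchcase(n, "BRCA*")]

print(single)
print(multi)
```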
## Advanced Searches
### Existence Queries
```
cc_function:* # Has any function annotation
ft_domain:* # Has any domain feature
xref:pdb # Has PDB structure
```
### Combined Complex Queries
```
# Human reviewed kinases with PDB structure
(protein_name:kinase OR family:kinase) AND organism_id:9606 AND reviewed:true AND xref:pdb
# Cancer-related proteins excluding mice
(disease:cancer OR keyword:cancer) NOT organism_name:mouse
# Membrane proteins with signal peptides
ft_transmem:* AND ft_signal:* AND reviewed:true
# Recently updated human proteins
organism_id:9606 AND date_modified:[2024-01-01 TO *] AND reviewed:true
```
## Field-Specific Examples
### Protein Names
```
protein_name:"insulin receptor" # Exact phrase
protein_name:insulin* # Starts with insulin
recommended_name:insulin # Recommended name only
alternative_name:insulin # Alternative names only
```
### Genes
```
gene:BRCA1 # Gene symbol
gene_exact:BRCA1 # Exact gene match
olnName:BRCA1 # Ordered locus name
orfName:BRCA1 # ORF name
```
### Organisms
```
organism_name:human # Common name
organism_name:"Homo sapiens" # Scientific name
organism_id:9606 # Taxonomy ID
lineage:primates # Taxonomic lineage
```
### Features
```
ft_signal:* # Signal peptide
ft_transmem:* # Transmembrane region
ft_domain:"Protein kinase" # Specific domain
ft_binding:* # Binding site
ft_site:* # Any site
```
### Comments (cc_)
```
cc_function:* # Function description
cc_catalytic_activity:* # Catalytic activity
cc_pathway:* # Pathway involvement
cc_interaction:* # Protein interactions
cc_subcellular_location:* # Subcellular location
cc_tissue_specificity:* # Tissue specificity
cc_disease:cancer # Disease association
```
## Tips and Best Practices
1. **Use quotes for exact phrases**: `organism_name:"Homo sapiens"` not `organism_name:Homo sapiens`
2. **Filter by review status**: Add `AND reviewed:true` for high-quality Swiss-Prot entries
3. **Combine wildcards carefully**: `*kinase*` may be too broad; `kinase*` is more specific
4. **Use parentheses for complex logic**: `(A OR B) AND (C OR D)` states the grouping explicitly, whereas `A OR B AND C OR D` leaves it to operator precedence and may be evaluated differently
5. **Numeric ranges are inclusive**: `length:[100 TO 500]` includes both 100 and 500
6. **Field prefixes**: Learn common prefixes:
- `cc_` = Comments
- `ft_` = Features
- `go_` = Gene Ontology
- `xref_` = Cross-references
7. **Check field names**: Use the API's `/configure/uniprotkb/search-fields` endpoint to see the available query fields; `/configure/uniprotkb/result-fields` lists the fields that can be returned
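
The quoting rule in tip 1 is easy to forget when values come from user input. A minimal sketch, assuming that any value containing whitespace should be sent as a quoted phrase; `field_clause` is a hypothetical helper name, and it does not handle values that themselves contain double quotes.

```python
def field_clause(field: str, value: str) -> str:
    """Return `field:value`, quoting the value if it contains whitespace."""
    if any(ch.isspace() for ch in value):
        return f'{field}:"{value}"'
    return f"{field}:{value}"


print(field_clause("organism_name", "Homo sapiens"))
print(field_clause("gene", "BRCA1"))
```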
## Query Validation
Test queries using:
- **Web interface**: https://www.uniprot.org/uniprotkb
- **API**: https://rest.uniprot.org/uniprotkb/search?query=YOUR_QUERY
- **Query field documentation**: https://www.uniprot.org/help/query-fields
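
Queries contain spaces, colons, and brackets, so `YOUR_QUERY` has to be URL-encoded before it goes into the API address. The sketch below builds such a URL with the standard library only and makes no network request; `build_search_url` is a hypothetical helper name, and the `format` and `size` values are illustrative defaults.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fmt: str = "json", size: int = 5) -> str:
    """Return a search URL with the query string percent-encoded."""
    params = {"query": query, "format": fmt, "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("gene:BRCA1 AND reviewed:true")
print(url)
```

The resulting URL can be pasted into a browser or passed to an HTTP client to check that the query is accepted.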
## Common Patterns
### Find well-characterized proteins
```
reviewed:true AND xref:pdb AND cc_function:*
```
### Find disease-associated proteins
```
cc_disease:* AND organism_id:9606 AND reviewed:true
```
### Find proteins with experimental evidence
```
existence:1 AND reviewed:true
```
### Find secreted proteins
```
cc_subcellular_location:secreted AND reviewed:true
```
### Find drug targets
```
keyword:"Pharmaceutical" OR keyword:"Drug target"
```
## Resources
- Full query field reference: https://www.uniprot.org/help/query-fields
- API query documentation: https://www.uniprot.org/help/api_queries
- Text search documentation: https://www.uniprot.org/help/text-search