# UniProt API Fields Reference
Complete list of available fields for customizing UniProt API queries. Use these fields with the `fields` parameter to retrieve only the data you need.
## Usage
Add the `fields` parameter to your query:
```
https://rest.uniprot.org/uniprotkb/search?query=insulin&fields=accession,gene_names,organism_name,length
```
Separate multiple fields with commas, with no spaces after the commas.
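
As a minimal sketch of the same request built in Python, the helper below joins a list of field names into the comma-separated form and assembles the search URL. The function name `build_search_url` is hypothetical and not part of any UniProt client; only the endpoint and field names come from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields):
    """Return a search URL that requests only the given return fields.

    Hypothetical helper for illustration; not part of an official client.
    """
    # Strip stray whitespace so the names are joined with bare commas.
    field_list = ",".join(name.strip() for name in fields)
    # safe="," keeps the commas readable instead of percent-encoding them.
    params = urlencode({"query": query, "fields": field_list}, safe=",")
    return f"{BASE_URL}?{params}"


url = build_search_url("insulin", ["accession", "gene_names", "organism_name", "length"])
print(url)
```

Building the URL separately from sending it makes it easy to log or inspect the exact request before any network call is made.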
## Core Fields
### Identification
- `accession` - Primary accession number (e.g., P12345)
- `id` - Entry name (e.g., INSR_HUMAN)
- `uniprotkb_id` - Same as id
- `entryType` - REVIEWED (Swiss-Prot) or UNREVIEWED (TrEMBL)
### Protein Names
- `protein_name` - Recommended and alternative protein names
- `gene_names` - Gene name(s)
- `gene_primary` - Primary gene name
- `gene_synonym` - Gene synonyms
- `gene_oln` - Ordered locus names
- `gene_orf` - ORF names
### Organism Information
- `organism_name` - Organism scientific name
- `organism_id` - NCBI taxonomy identifier
- `lineage` - Taxonomic lineage
- `virus_hosts` - Virus host organisms (for viral proteins)
### Sequence Information
- `sequence` - Amino acid sequence
- `length` - Sequence length
- `mass` - Molecular mass (Daltons)
- `fragment` - Whether entry is a fragment
- `checksum` - Sequence CRC64 checksum
## Annotation Fields
### Function and Biology
- `cc_function` - Function description
- `cc_catalytic_activity` - Catalytic activity
- `cc_activity_regulation` - Activity regulation
- `cc_pathway` - Metabolic pathway information
- `cc_cofactor` - Cofactor information
### Interaction and Localization
- `cc_interaction` - Protein-protein interactions
- `cc_subunit` - Subunit structure
- `cc_subcellular_location` - Subcellular location
- `cc_tissue_specificity` - Tissue specificity
- `cc_developmental_stage` - Developmental stage expression
### Disease and Phenotype
- `cc_disease` - Disease associations
- `cc_disruption_phenotype` - Disruption phenotype
- `cc_allergen` - Allergen information
- `cc_toxic_dose` - Toxic dose information
### Post-translational Modifications
- `cc_ptm` - Post-translational modifications
- `cc_mass_spectrometry` - Mass spectrometry data
### Other Comments
- `cc_alternative_products` - Alternative products (isoforms)
- `cc_polymorphism` - Polymorphism information
- `cc_rna_editing` - RNA editing
- `cc_caution` - Caution notes
- `cc_miscellaneous` - Miscellaneous information
- `cc_similarity` - Sequence similarities
- `cc_sequence_caution` - Sequence caution
- `cc_web_resource` - Web resources
## Feature Fields (ft_)
### Molecular Processing
- `ft_signal` - Signal peptide
- `ft_transit` - Transit peptide
- `ft_init_met` - Initiator methionine
- `ft_propep` - Propeptide
- `ft_chain` - Chain (mature protein)
- `ft_peptide` - Peptide
### Regions and Sites
- `ft_domain` - Domain
- `ft_repeat` - Repeat
- `ft_ca_bind` - Calcium binding
- `ft_zn_fing` - Zinc finger
- `ft_dna_bind` - DNA binding
- `ft_np_bind` - Nucleotide binding
- `ft_region` - Region of interest
- `ft_coiled` - Coiled coil
- `ft_motif` - Short sequence motif
- `ft_compbias` - Compositional bias
### Sites and Modifications
- `ft_act_site` - Active site
- `ft_metal` - Metal binding
- `ft_binding` - Binding site
- `ft_site` - Site
- `ft_mod_res` - Modified residue
- `ft_lipid` - Lipidation
- `ft_carbohyd` - Glycosylation
- `ft_disulfid` - Disulfide bond
- `ft_crosslnk` - Cross-link
### Structural Features
- `ft_helix` - Helix
- `ft_strand` - Beta strand
- `ft_turn` - Turn
- `ft_transmem` - Transmembrane region
- `ft_intramem` - Intramembrane region
- `ft_topo_dom` - Topological domain
### Variation and Conflict
- `ft_variant` - Natural variant
- `ft_var_seq` - Alternative sequence
- `ft_mutagen` - Mutagenesis
- `ft_unsure` - Unsure residue
- `ft_conflict` - Sequence conflict
- `ft_non_cons` - Non-consecutive residues
- `ft_non_ter` - Non-terminal residue
- `ft_non_std` - Non-standard residue
## Gene Ontology (GO)
- `go` - All GO terms
- `go_p` - Biological process
- `go_c` - Cellular component
- `go_f` - Molecular function
- `go_id` - GO term identifiers
## Cross-References (xref_)
### Sequence Databases
- `xref_embl` - EMBL/GenBank/DDBJ
- `xref_refseq` - RefSeq
- `xref_ccds` - CCDS
- `xref_pir` - PIR
### 3D Structure Databases
- `xref_pdb` - Protein Data Bank
- `xref_pcddb` - PCDDB (Protein Circular Dichroism Data Bank)
- `xref_alphafolddb` - AlphaFold database
- `xref_smr` - SWISS-MODEL Repository
### Protein Family/Domain Databases
- `xref_interpro` - InterPro
- `xref_pfam` - Pfam
- `xref_prosite` - PROSITE
- `xref_smart` - SMART
### Genome Databases
- `xref_ensembl` - Ensembl
- `xref_ensemblgenomes` - Ensembl Genomes
- `xref_geneid` - Entrez Gene
- `xref_kegg` - KEGG
### Organism-Specific Databases
- `xref_mgi` - MGI (mouse)
- `xref_rgd` - RGD (rat)
- `xref_flybase` - FlyBase (fly)
- `xref_wormbase` - WormBase (worm)
- `xref_xenbase` - Xenbase (frog)
- `xref_zfin` - ZFIN (zebrafish)
### Pathway Databases
- `xref_reactome` - Reactome
- `xref_signor` - SIGNOR
- `xref_signalink` - SignaLink
### Disease Databases
- `xref_disgenet` - DisGeNET
- `xref_malacards` - MalaCards
- `xref_omim` - OMIM
- `xref_orphanet` - Orphanet
### Drug Databases
- `xref_chembl` - ChEMBL
- `xref_drugbank` - DrugBank
- `xref_guidetopharmacology` - Guide to Pharmacology
### Expression Databases
- `xref_bgee` - Bgee
- `xref_expressionatlas` - Expression Atlas
- `xref_genevisible` - Genevisible
## Metadata Fields
### Dates
- `date_created` - Entry creation date
- `date_modified` - Last modification date
- `date_sequence_modified` - Last sequence modification date
### Evidence and Quality
- `annotation_score` - Annotation score (1-5)
- `protein_existence` - Protein existence level
- `reviewed` - Whether entry is reviewed (Swiss-Prot)
### Literature
- `lit_pubmed_id` - PubMed identifiers
- `lit_doi` - DOI identifiers
### Proteomics
- `proteome` - Proteome identifier
- `tools` - Tools used for annotation
## Retrieving Available Fields Programmatically
Use the configuration endpoint to get all available fields:
```bash
curl https://rest.uniprot.org/configure/uniprotkb/result-fields
```
Or in Python:
```python
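import requests

# Fetch the catalogue of return fields supported by UniProtKB
response = requests.get("https://rest.uniprot.org/configure/uniprotkb/result-fields", timeout=30)
response.raise_for_status()  # fail early on HTTP errors
fields = response.json()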
```
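
The response is nested, so a flat list of field names takes one more step. The sketch below assumes the payload is a list of groups, each carrying a `fields` list whose items have a `name` key; that layout is an assumption and should be checked against the live response. The sample data is illustrative and the helper name `extract_field_names` is hypothetical.

```python
def extract_field_names(groups):
    """Flatten a grouped field listing into a list of field names.

    Assumes each group is a dict with a "fields" list of dicts that
    carry a "name" key (layout assumed, verify against the real payload).
    """
    names = []
    for group in groups:
        for field in group.get("fields", []):
            name = field.get("name")
            if name:  # skip items that have no usable name
                names.append(name)
    return names


# Illustrative sample in the assumed layout, not a real API response.
sample = [
    {
        "groupName": "Names & Taxonomy",
        "fields": [
            {"label": "Entry", "name": "accession"},
            {"label": "Entry Name", "name": "id"},
        ],
    },
    {"groupName": "Sequences", "fields": [{"label": "Length", "name": "length"}]},
]

print(extract_field_names(sample))  # ['accession', 'id', 'length']
```

With the real payload, `extract_field_names(fields)` would give a list that can be used to check field names before sending a query.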
## Common Field Combinations
### Basic protein information
```
fields=accession,id,protein_name,gene_names,organism_name,length
```
### Sequence and structure
```
fields=accession,sequence,length,mass,xref_pdb,xref_alphafolddb
```
### Functional annotation
```
fields=accession,protein_name,cc_function,cc_catalytic_activity,cc_pathway,go
```
### Disease information
```
fields=accession,protein_name,gene_names,cc_disease,xref_omim,xref_malacards
```
### Expression patterns
```
fields=accession,gene_names,cc_tissue_specificity,cc_developmental_stage,xref_bgee
```
### Complete annotation
```
fields=accession,id,protein_name,gene_names,organism_name,sequence,length,cc_*,ft_*,go,xref_pdb
```
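
One way to reuse these combinations is to keep them as named presets and build the `fields` value on demand. The sketch below is only an illustration: `FIELD_PRESETS` and `fields_param` are hypothetical names, and the optional `available` argument is assumed to be a collection of valid field names, for example one derived from the configuration endpoint.

```python
# Named presets mirroring some of the combinations listed above.
FIELD_PRESETS = {
    "basic": ["accession", "id", "protein_name", "gene_names", "organism_name", "length"],
    "structure": ["accession", "sequence", "length", "mass", "xref_pdb", "xref_alphafolddb"],
    "function": ["accession", "protein_name", "cc_function",
                 "cc_catalytic_activity", "cc_pathway", "go"],
    "disease": ["accession", "protein_name", "gene_names",
                "cc_disease", "xref_omim", "xref_malacards"],
}


def fields_param(preset, available=None):
    """Return the comma-separated `fields` value for a named preset.

    If `available` is given, raise ValueError for any field name
    that is not in it, so typos surface before the request is sent.
    """
    names = FIELD_PRESETS[preset]
    if available is not None:
        unknown = [name for name in names if name not in available]
        if unknown:
            raise ValueError(f"Unknown fields: {', '.join(unknown)}")
    return ",".join(names)


print(fields_param("basic"))
```

Checking a preset against the known field names before sending the request gives a clearer error than a rejected query.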
## Notes
1. **Wildcards**: Some fields support wildcards (e.g., `cc_*` for all comment fields, `ft_*` for all features)
2. **Performance**: Requesting fewer fields improves response time and reduces bandwidth
3. **Format dependency**: Some fields may be formatted differently depending on output format (JSON vs TSV)
4. **Null values**: Fields without data may be omitted from response (JSON) or empty (TSV)
5. **Arrays vs strings**: In JSON format, many fields return arrays of objects rather than simple strings
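
To illustrate note 4 for TSV output, the sketch below parses a tab-separated response and turns empty cells into `None`, so missing data is explicit downstream. The sample text is illustrative: the column headers and the second row are made up for the example, and `parse_tsv` is a hypothetical helper.

```python
import csv
import io


def parse_tsv(text):
    """Parse TSV text into a list of dicts, mapping empty cells to None."""
    # QUOTE_NONE treats double quotes in free-text columns as plain characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [{key: (value or None) for key, value in row.items()} for row in reader]


# Illustrative TSV: the second row has an empty "Gene Names" cell.
sample_tsv = "Entry\tGene Names\tLength\nP01308\tINS\t110\nEXAMPLE1\t\t42\n"

rows = parse_tsv(sample_tsv)
for row in rows:
    print(row)
```

All values arrive as strings in TSV, so numeric columns such as the length still need an explicit conversion (for example `int(row["Length"])`).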
## Resources
- Interactive field explorer: https://www.uniprot.org/api-documentation
- API fields endpoint: https://rest.uniprot.org/configure/uniprotkb/result-fields
- Return fields documentation: https://www.uniprot.org/help/return_fields