# KEGG Database Reference

## Overview

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource that maintains manually curated pathway maps and molecular interaction networks. It provides "wiring diagrams of molecular interactions, reactions and relations" for understanding biological systems.

- **Base URL**: https://rest.kegg.jp
- **Official Documentation**: https://www.kegg.jp/kegg/rest/keggapi.html
- **Access Restrictions**: The KEGG API is made available only for academic use by academic users.

## KEGG Databases

KEGG integrates 16 primary databases organized into systems information, genomic information, chemical information, and health information categories:

### Systems Information
- **PATHWAY**: Manually drawn pathway maps for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development
- **MODULE**: Functional units and building blocks of pathways
- **BRITE**: Hierarchical classifications and ontologies

### Genomic Information
- **GENOME**: Complete genomes with annotations
- **GENES**: Gene catalogs for all organisms
- **ORTHOLOGY**: Ortholog groups (KO: KEGG Orthology)
- **SSDB**: Sequence similarity database

### Chemical Information
- **COMPOUND**: Metabolites and other chemical substances
- **GLYCAN**: Glycan structures
- **REACTION**: Chemical reactions
- **RCLASS**: Reaction class (chemical structure transformation patterns)
- **ENZYME**: Enzyme nomenclature

### Health Information
- **NETWORK**: Disease-related network variations
- **DISEASE**: Human diseases with genetic and environmental factors
- **DRUG**: Approved drugs with chemical structures and target information
- **DGROUP**: Drug groups

### External Database Links
KEGG cross-references external databases, including:
- **PubMed**: Literature references
- **NCBI Gene**: Gene database
- **UniProt**: Protein sequences
- **PubChem**: Chemical compounds
- **ChEBI**: Chemical entities of biological interest

## REST API Operations

### 1. INFO - Database Metadata

**Syntax**: `/info/<database>`

Retrieves release information and statistics for a database.

**Examples**:
- `/info/kegg` - KEGG system information
- `/info/pathway` - Pathway database information
- `/info/hsa` - Human organism information

### 2. LIST - Entry Listings

**Syntax**: `/list/<database>[/<organism>]`

Lists entry identifiers and associated names.

**Parameters**:
- `database` - Database name (e.g., pathway, enzyme, compound), an organism code (e.g., hsa for all human genes), or entry IDs (e.g., hsa:10458)
- `organism` - Optional organism code (e.g., hsa for human, eco for E. coli)

**Examples**:
- `/list/pathway` - All reference pathways
- `/list/pathway/hsa` - Human-specific pathways
- `/list/hsa:10458+ece:Z5100` - Specific gene entries (max 10)

**Organism Codes**: Three- or four-letter codes, for example:
- `hsa` - Homo sapiens (human)
- `mmu` - Mus musculus (mouse)
- `dme` - Drosophila melanogaster (fruit fly)
- `sce` - Saccharomyces cerevisiae (yeast)
- `eco` - Escherichia coli K-12 MG1655

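As a minimal sketch of how LIST requests and responses might be handled in Python, the helpers below build a request URL from the syntax above and split the tab-delimited response into pairs. The function names `list_url` and `parse_list` are hypothetical, and the sample text is illustrative rather than fetched from KEGG.

```python
BASE_URL = "https://rest.kegg.jp"


def list_url(database, organism=None):
    """Build a /list URL; `organism` is appended only when given."""
    parts = [BASE_URL, "list", database]
    if organism:
        parts.append(organism)
    return "/".join(parts)


def parse_list(text):
    """Split tab-delimited LIST output into (entry_id, description) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        entry_id, _, description = line.partition("\t")
        rows.append((entry_id, description))
    return rows


# Illustrative sample in the two-column shape LIST returns (not live data).
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
rows = parse_list(sample)
```

Keeping URL construction separate from parsing means the parser can be exercised on saved responses without touching the network.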
### 3. FIND - Search Entries

**Syntax**: `/find/<database>/<query>[/<option>]`

Searches for entries by keywords or molecular properties.

**Parameters**:
- `database` - Database to search
- `query` - Search term or molecular property
- `option` - Optional: `formula`, `exact_mass`, `mol_weight`

**Search Fields** (database dependent):
- ENTRY, NAME, SYMBOL, GENE_NAME, DESCRIPTION, DEFINITION
- ORGANISM, TAXONOMY, ORTHOLOGY, PATHWAY, etc.

**Examples**:
- `/find/genes/shiga+toxin` - Keyword search in genes (keywords joined with `+`)
- `/find/compound/C7H10N4O2/formula` - Chemical formula search (KEGG matches formulas partially, regardless of atom order)
- `/find/drug/300-310/exact_mass` - Exact mass range search (300-310 Da)
- `/find/compound/300-310/mol_weight` - Molecular weight range search

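Free-text queries contain spaces and other characters that cannot be placed in a URL path as-is. The sketch below, which assumes a hypothetical helper named `find_url`, joins keywords with `+`, percent-encodes each keyword with the standard library, and rejects options other than the three listed above.

```python
from urllib.parse import quote

BASE_URL = "https://rest.kegg.jp"
FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}


def find_url(database, query, option=None):
    """Build a /find URL from a free-text query or a molecular property."""
    if option is not None and option not in FIND_OPTIONS:
        raise ValueError(f"unsupported FIND option: {option}")
    # Encode each keyword separately so the joining '+' stays literal.
    keywords = "+".join(quote(word) for word in query.split())
    url = f"{BASE_URL}/find/{database}/{keywords}"
    if option:
        url += f"/{option}"
    return url


keyword_url = find_url("genes", "shiga toxin")
mass_url = find_url("drug", "300-310", "exact_mass")
```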
### 4. GET - Retrieve Entries

**Syntax**: `/get/<entry>[+<entry>...][/<option>]`

Retrieves full database entries or specific data formats.

**Parameters**:
- `entry` - Entry ID(s) (max 10, joined with +)
- `option` - Output format (optional)

**Output Options**:
- `aaseq` - Amino acid sequences (FASTA)
- `ntseq` - Nucleotide sequences (FASTA)
- `mol` - MOL format (compounds/drugs)
- `kcf` - KCF format (KEGG Chemical Function, compounds/drugs)
- `image` - PNG image (pathway maps, single entry only)
- `kgml` - KGML XML (pathway structure, single entry only)
- `json` - JSON format (BRITE hierarchies, single entry only)

**Examples**:
- `/get/hsa00010` - Glycolysis pathway (human)
- `/get/hsa:10458+ece:Z5100` - Multiple genes (max 10)
- `/get/hsa:10458/aaseq` - Protein sequence
- `/get/cpd:C00002` - ATP compound entry
- `/get/br:br08301/json` - BRITE hierarchy as JSON
- `/get/hsa05130/image` - Pathway map as PNG

**Image Restrictions**: Only one entry is allowed with the `image` option.

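Because GET accepts at most 10 entries per request, longer ID lists have to be split into batches. The following sketch shows one way to do that; `get_urls` is a hypothetical helper, and treating `image`, `kgml`, and `json` as single-entry options follows the table above.

```python
BASE_URL = "https://rest.kegg.jp"
MAX_ENTRIES = 10
SINGLE_ENTRY_OPTIONS = {"image", "kgml", "json"}


def get_urls(entries, option=None):
    """Return one /get URL per batch of up to MAX_ENTRIES entry IDs."""
    entries = list(entries)
    if option in SINGLE_ENTRY_OPTIONS and len(entries) != 1:
        raise ValueError(f"option '{option}' accepts exactly one entry")
    urls = []
    for start in range(0, len(entries), MAX_ENTRIES):
        batch = "+".join(entries[start:start + MAX_ENTRIES])
        url = f"{BASE_URL}/get/{batch}"
        if option:
            url += f"/{option}"
        urls.append(url)
    return urls


# Twelve placeholder compound IDs (C00001 .. C00012) produce two requests.
compound_ids = [f"cpd:C{n:05d}" for n in range(1, 13)]
batched = get_urls(compound_ids)
```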
### 5. CONV - ID Conversion

**Syntax**: `/conv/<target_db>/<source_db>`

Converts identifiers between KEGG and external databases.

**Supported Conversions**:
- `ncbi-geneid` ↔ KEGG genes
- `ncbi-proteinid` ↔ KEGG genes
- `uniprot` ↔ KEGG genes
- `pubchem` ↔ KEGG compounds/drugs
- `chebi` ↔ KEGG compounds/drugs

**Examples**:
- `/conv/ncbi-geneid/hsa` - All human genes to NCBI Gene IDs
- `/conv/hsa/ncbi-geneid` - NCBI Gene IDs to human genes (reverse)
- `/conv/uniprot/hsa:10458` - Specific gene to UniProt
- `/conv/pubchem/compound` - All compounds to PubChem IDs

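CONV responses are two-column, tab-delimited text, so they map naturally onto a dictionary. The sketch below assumes the source ID is in the first column and the target ID in the second; `parse_conv` is a hypothetical name. Values are lists because one source ID can map to several targets (for example, multiple UniProt accessions for one gene).

```python
from collections import defaultdict


def parse_conv(text):
    """Turn CONV output into {source_id: [target_id, ...]}."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        source_id, target_id = fields
        mapping[source_id].append(target_id)
    return dict(mapping)


# Illustrative rows in the shape of a /conv/ncbi-geneid/hsa response.
sample = "hsa:10458\tncbi-geneid:10458\nhsa:7157\tncbi-geneid:7157\n"
gene_to_ncbi = parse_conv(sample)
```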
### 6. LINK - Cross-References

**Syntax**: `/link/<target_db>/<source_db>`

Finds related entries within and between KEGG databases.

**Common Links**:
- genes ↔ pathway
- pathway ↔ compound
- pathway ↔ enzyme
- genes ↔ orthology (KO)
- compound ↔ reaction

**Examples**:
- `/link/pathway/hsa` - All pathways linked to human genes
- `/link/genes/hsa00010` - Genes in glycolysis pathway
- `/link/pathway/hsa:10458` - Pathways containing specific gene
- `/link/compound/hsa00010` - Compounds in pathway

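A bulk LINK response such as `/link/pathway/hsa` lists one gene-pathway pair per line. For downstream analysis it is often more convenient to group genes by pathway. The sketch below does that inversion; `genes_by_pathway` is a hypothetical helper and the sample rows are illustrative, not fetched from KEGG.

```python
from collections import defaultdict


def genes_by_pathway(link_text):
    """Invert gene<TAB>pathway rows into {pathway_id: set_of_gene_ids}."""
    grouped = defaultdict(set)
    for line in link_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene_id, pathway_id = fields
        grouped[pathway_id].add(gene_id)
    return dict(grouped)


# Illustrative rows: one gene may appear in several pathways.
sample = (
    "hsa:10327\tpath:hsa00010\n"
    "hsa:124\tpath:hsa00010\n"
    "hsa:124\tpath:hsa00071\n"
)
pathway_genes = genes_by_pathway(sample)
```

Using sets removes duplicate pairs and makes later overlap calculations (for example, intersecting with a gene list of interest) straightforward.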
### 7. DDI - Drug-Drug Interactions

**Syntax**: `/ddi/<drug>[+<drug>...]`

Retrieves drug-drug interaction information extracted from Japanese drug labels.

**Parameters**:
- `drug` - Drug entry ID(s) (max 10, joined with +)

**Examples**:
- `/ddi/D00001` - Interactions for single drug
- `/ddi/D00001+D00002` - Interactions between multiple drugs

## Pathway Classification

KEGG organizes pathways into seven major categories:

### 1. Metabolism
Carbohydrate, energy, lipid, nucleotide, and amino acid metabolism; glycan biosynthesis and metabolism; cofactor and vitamin metabolism; terpenoid and polyketide metabolism; secondary metabolite biosynthesis; xenobiotics biodegradation

**Example pathways**:
- `map00010` - Glycolysis / Gluconeogenesis
- `map00020` - Citrate cycle (TCA cycle)
- `map00190` - Oxidative phosphorylation

### 2. Genetic Information Processing
Transcription, translation, folding/sorting/degradation, replication and repair

**Example pathways**:
- `map03010` - Ribosome
- `map03020` - RNA polymerase
- `map03040` - Spliceosome

### 3. Environmental Information Processing
Membrane transport, signal transduction

**Example pathways**:
- `map02010` - ABC transporters
- `map04010` - MAPK signaling pathway

### 4. Cellular Processes
Transport and catabolism, cell growth and death, cellular community, cell motility

**Example pathways**:
- `map04140` - Autophagy
- `map04210` - Apoptosis

### 5. Organismal Systems
Immune, endocrine, circulatory, digestive, nervous, and sensory systems; development; environmental adaptation

**Example pathways**:
- `map04610` - Complement and coagulation cascades
- `map04910` - Insulin signaling pathway

### 6. Human Diseases
Cancer, immune diseases, neurodegenerative diseases, cardiovascular diseases, metabolic diseases, infectious diseases

**Example pathways**:
- `map05200` - Pathways in cancer
- `map05010` - Alzheimer disease

### 7. Drug Development
Chronological classification and target-based classification

## Common Identifiers and Naming

### Pathway IDs
- `map#####` - Reference pathway (generic)
- `hsa#####` - Human-specific pathway
- `mmu#####` - Mouse-specific pathway
- Format: prefix (`map` or an organism code) + 5-digit number

### Gene IDs
- `hsa:10458` - Human gene (organism:gene_id)
- Format: organism code + colon + gene identifier

### Compound IDs
- `cpd:C00002` - ATP
- Format: cpd:C#####

### Drug IDs
- `dr:D00001` - Drug entry
- Format: dr:D#####

### Enzyme IDs
- `ec:1.1.1.1` - Alcohol dehydrogenase
- Format: ec:EC_number

### KO (KEGG Orthology) IDs
- `ko:K00001` - Ortholog group
- Format: ko:K#####

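The identifier formats above are regular enough to check with simple patterns before sending a request. The sketch below is a rough classifier covering only the forms listed in this section; the `classify` function and the exact patterns are assumptions, not an official KEGG grammar, and identifiers outside these forms return `None`.

```python
import re

# Checked in order: specific prefixes first, the generic gene form last.
PATTERNS = [
    ("compound", re.compile(r"^(cpd:)?C\d{5}$")),
    ("drug", re.compile(r"^(dr:)?D\d{5}$")),
    ("orthology", re.compile(r"^(ko:)?K\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
    ("pathway", re.compile(r"^(path:)?[a-z]{3,4}\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify(identifier):
    """Return the identifier type name, or None if no pattern matches."""
    for name, pattern in PATTERNS:
        if pattern.match(identifier):
            return name
    return None


kinds = [classify(x) for x in ("map00010", "hsa:10458", "cpd:C00002", "ec:1.1.1.1")]
```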
## API Limitations and Best Practices

### Rate Limits and Restrictions
- Maximum of 10 entries per request (1 entry for the `image`, `kgml`, and `json` options)
- Academic use only; commercial use requires separate licensing
- No explicit rate limit is documented, but avoid rapid-fire requests

### HTTP Status Codes
- `200` - Success
- `400` - Bad request (syntax error in query)
- `404` - Not found (entry or database doesn't exist)

### Best Practices
1. Always check the HTTP status code of each response
2. For bulk operations, batch entries using `+` (up to 10)
3. Cache results locally to reduce API calls
4. Restrict queries to a specific organism code when possible to keep responses small
5. For pathway visualization, use the web interface or the `kgml` output
6. Split output on tabs; the tab-delimited layout is consistent across operations

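Several of these practices (status checking, caching, and pacing requests) can be combined in a small client. The sketch below uses only the standard library. The class name `KeggClient`, the injectable `fetch` callable, and the 0.5-second default pause are all assumptions made for illustration; KEGG does not document a specific delay.

```python
import time
import urllib.error
import urllib.request

BASE_URL = "https://rest.kegg.jp"


class KeggClient:
    """Tiny REST helper: checks status codes, caches bodies, paces calls."""

    def __init__(self, fetch=None, delay=0.5):
        self._fetch = fetch or self._http_fetch  # injectable for offline use
        self._delay = delay  # assumed polite pause between live requests
        self._cache = {}

    @staticmethod
    def _http_fetch(url):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.status, response.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            return err.code, ""  # e.g. 400 or 404

    def request(self, path):
        """Return the response body for `path`, raising on non-200 status."""
        if path in self._cache:
            return self._cache[path]
        status, body = self._fetch(f"{BASE_URL}/{path.lstrip('/')}")
        if status != 200:
            raise RuntimeError(f"KEGG returned HTTP {status} for {path}")
        self._cache[path] = body
        time.sleep(self._delay)
        return body
```

A call such as `KeggClient().request("/info/kegg")` would perform a live request. Passing a custom `fetch` function allows the same logic to run against saved responses.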
## Integration with Other Tools

### Biopython Integration
Biopython provides the `Bio.KEGG.REST` module for easier Python integration:
```python
from Bio.KEGG import REST

# Fetch the list of reference pathways as tab-delimited text
result = REST.kegg_list("pathway").read()
```

### KEGGREST (R/Bioconductor)
R users can use the KEGGREST package:
```r
library(KEGGREST)

# Named character vector: pathway IDs mapped to pathway names
pathways <- keggList("pathway")
```

## Common Analysis Workflows

### Workflow 1: Gene to Pathway Mapping
1. Get gene ID(s) from your organism
2. Use `/link/pathway/<gene_id>` to find associated pathways
3. Use `/get/<pathway_id>` to retrieve detailed pathway information

### Workflow 2: Pathway Enrichment Context
1. Use `/list/pathway/<org>` to get all organism pathways
2. Use `/link/genes/<pathway_id>` to get genes in each pathway
3. Perform statistical enrichment analysis

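The enrichment step in Workflow 2 is left open by the text. One common choice, assumed here rather than prescribed by KEGG, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below computes that tail probability with the standard library; the function name and parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits, selected, pathway_size, background):
    """P(X >= hits) when drawing `selected` genes from `background` genes,
    of which `pathway_size` belong to the pathway."""
    upper = min(selected, pathway_size)
    if not (0 <= hits <= upper and max(selected, pathway_size) <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, selected)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the pathway, 2 selected, both hits.
p_value = enrichment_pvalue(hits=2, selected=2, pathway_size=5, background=10)
```

In practice the p-values for all tested pathways would also need a multiple-testing correction, which is outside the scope of this sketch.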
### Workflow 3: Compound to Reaction Mapping
1. Use `/find/compound/<name>` to find compound ID
2. Use `/link/reaction/<compound_id>` to find reactions
3. Use `/link/pathway/<reaction_id>` to find pathways containing reactions

### Workflow 4: ID Conversion for Integration
1. Use `/conv/uniprot/<org>` to map KEGG genes to UniProt
2. Use `/conv/ncbi-geneid/<org>` to map to NCBI Gene IDs
3. Integrate with other databases using converted IDs

## Additional Resources

- **KEGG Mapper**: https://www.kegg.jp/kegg/mapper/ - Interactive pathway mapping
- **BlastKOALA**: Automated annotation for sequenced genomes
- **GhostKOALA**: Annotation for metagenomes and metatranscriptomes
- **KEGG Modules**: https://www.kegg.jp/kegg/module.html
- **KEGG Brite**: https://www.kegg.jp/kegg/brite.html