Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/kegg-database/references/kegg_reference.md
+++ b/skills/kegg-database/references/kegg_reference.md
@@ -0,0 +1,326 @@
+# KEGG Database Reference
+
+## Overview
+
+KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource that maintains manually curated pathway maps and molecular interaction networks. It provides "wiring diagrams of molecular interactions, reactions and relations" for understanding biological systems.
+
+**Base URL**: https://rest.kegg.jp
+**Official Documentation**: https://www.kegg.jp/kegg/rest/keggapi.html
+**Access Restrictions**: KEGG API is made available only for academic use by academic users.
+
+## KEGG Databases
+
+KEGG integrates 16 primary databases organized into systems information, genomic information, chemical information, and health information categories:
+
+### Systems Information
+- **PATHWAY**: Manually drawn pathway maps for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development
+- **MODULE**: Functional units and building blocks of pathways
+- **BRITE**: Hierarchical classifications and ontologies
+
+### Genomic Information
+- **GENOME**: Complete genomes with annotations
+- **GENES**: Gene catalogs for all organisms
+- **ORTHOLOGY**: Ortholog groups (KO: KEGG Orthology)
+- **SSDB**: Sequence similarity database
+
+### Chemical Information
+- **COMPOUND**: Metabolites and other chemical substances
+- **GLYCAN**: Glycan structures
+- **REACTION**: Chemical reactions
+- **RCLASS**: Reaction class (chemical structure transformation patterns)
+- **ENZYME**: Enzyme nomenclature
+
+### Health Information
+- **NETWORK**: Disease-related network variations
+- **DISEASE**: Human diseases with genetic and environmental factors
+- **DRUG**: Approved drugs with chemical structures and target information
+- **DGROUP**: Drug groups
+
+### External Database Links
+KEGG cross-references to external databases including:
+- **PubMed**: Literature references
+- **NCBI Gene**: Gene database
+- **UniProt**: Protein sequences
+- **PubChem**: Chemical compounds
+- **ChEBI**: Chemical entities of biological interest
+
+## REST API Operations
+
+### 1. INFO - Database Metadata
+
+**Syntax**: `/info/<database>`
+
+Retrieves release information and statistics for a database.
+
+**Examples**:
+- `/info/kegg` - KEGG system information
+- `/info/pathway` - Pathway database information
+- `/info/hsa` - Human organism information
+
+### 2. LIST - Entry Listings
+
+**Syntax**: `/list/<database>[/<organism>]`
+
+Lists entry identifiers and associated names.
+
+**Parameters**:
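+- `database` - Database name (pathway, enzyme, compound, etc.), an organism code (e.g., `hsa` to list all human genes), or one or more entries (e.g., hsa:10458)
+- `organism` - Optional organism code (e.g., hsa for human, eco for E. coli); used with the `pathway` database, as in `/list/pathway/hsa`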
+
+**Examples**:
+- `/list/pathway` - All reference pathways
+- `/list/pathway/hsa` - Human-specific pathways
+- `/list/hsa:10458+ece:Z5100` - Specific gene entries (max 10)
+
+**Organism Codes**: Three- or four-letter codes
+- `hsa` - Homo sapiens (human)
+- `mmu` - Mus musculus (mouse)
+- `dme` - Drosophila melanogaster (fruit fly)
+- `sce` - Saccharomyces cerevisiae (yeast)
+- `eco` - Escherichia coli K-12 MG1655
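+
+The two-column listing described above can be parsed in a few lines of Python. This is a minimal sketch, assuming each output line holds an entry identifier, a tab, and a name; `parse_kegg_list` is a hypothetical helper, and the sample text is illustrative rather than a live response.
+
+```python
+def parse_kegg_list(text):
+    """Parse tab-delimited /list output into {entry_id: name}."""
+    entries = {}
+    for line in text.splitlines():
+        if not line.strip():
+            continue  # skip blank lines
+        # Split on the first tab only; names may contain further characters.
+        entry_id, _, name = line.partition("\t")
+        entries[entry_id] = name
+    return entries
+
+
+# Illustrative sample shaped like a /list/pathway/hsa response (assumed layout).
+sample = (
+    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
+    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
+)
+pathways = parse_kegg_list(sample)
+print(pathways["hsa00010"])
+```
+
+Keeping the parser separate from the HTTP call makes it easy to test against saved responses.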
+
+### 3. FIND - Search Entries
+
+**Syntax**: `/find/<database>/<query>[/<option>]`
+
+Searches for entries by keywords or molecular properties.
+
+**Parameters**:
+- `database` - Database to search
+- `query` - Search term or molecular property
+- `option` - Optional: `formula`, `exact_mass`, `mol_weight` (compound and drug databases only)
+
+**Search Fields** (database dependent):
+- ENTRY, NAME, SYMBOL, GENE_NAME, DESCRIPTION, DEFINITION
+- ORGANISM, TAXONOMY, ORTHOLOGY, PATHWAY, etc.
+
+**Examples**:
+- `/find/genes/shiga+toxin` - Keyword search in genes (keywords joined with +)
+- `/find/compound/C7H10N4O2/formula` - Formula search (partial match, irrespective of atom order)
+- `/find/drug/300-310/exact_mass` - Mass range search (300-310 Da)
+- `/find/compound/300-310/mol_weight` - Molecular weight range
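+
+A small URL builder can keep FIND requests well-formed. The sketch below is an assumption-laden illustration: `build_find_url` is a hypothetical helper, it joins keywords with `+` and percent-encodes each one, and it accepts only the three options listed above.
+
+```python
+from urllib.parse import quote
+
+BASE_URL = "https://rest.kegg.jp"
+FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}
+
+
+def build_find_url(database, query, option=None):
+    """Build a /find URL from a database name, a query, and an optional option."""
+    # Join whitespace-separated keywords with "+" and percent-encode each keyword.
+    encoded_query = "+".join(quote(word, safe="") for word in query.split())
+    parts = [BASE_URL, "find", database, encoded_query]
+    if option is not None:
+        if option not in FIND_OPTIONS:
+            raise ValueError(f"unsupported option: {option}")
+        parts.append(option)
+    return "/".join(parts)
+
+
+print(build_find_url("genes", "shiga toxin"))
+print(build_find_url("compound", "300-310", "mol_weight"))
+```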
+
+### 4. GET - Retrieve Entries
+
+**Syntax**: `/get/<entry>[+<entry>...][/<option>]`
+
+Retrieves full database entries or specific data formats.
+
+**Parameters**:
+- `entry` - Entry ID(s) (max 10, joined with +)
+- `option` - Output format (optional)
+
+**Output Options**:
+- `aaseq` - Amino acid sequences (FASTA)
+- `ntseq` - Nucleotide sequences (FASTA)
+- `mol` - MOL format (compounds/drugs)
+- `kcf` - KCF format (KEGG Chemical Function, compounds/drugs)
+- `image` - PNG image (pathway maps, single entry only)
+- `kgml` - KGML XML (pathway structure, single entry only)
+- `json` - JSON format (BRITE hierarchies only, single entry only)
+
+**Examples**:
+- `/get/hsa00010` - Glycolysis pathway (human)
+- `/get/hsa:10458+ece:Z5100` - Multiple genes (max 10)
+- `/get/hsa:10458/aaseq` - Protein sequence
+- `/get/cpd:C00002` - ATP compound entry
+- `/get/br:br08301/json` - BRITE hierarchy as JSON
+- `/get/hsa05130/image` - Pathogenic Escherichia coli infection pathway map as PNG
+
+**Single-Entry Restrictions**: The `image`, `kgml`, and `json` options accept only one entry per request
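+
+Because GET accepts at most 10 entries per request (and one entry for the single-entry options), longer ID lists have to be split into batches. The following sketch shows one way to do that; `build_get_urls` is a hypothetical helper, and the gene IDs are made up for illustration.
+
+```python
+BASE_URL = "https://rest.kegg.jp"
+MAX_ENTRIES = 10  # per-request limit stated above
+SINGLE_ENTRY_OPTIONS = {"image", "kgml", "json"}
+
+
+def build_get_urls(entries, option=None):
+    """Split entry IDs into batches and return one /get URL per batch."""
+    size = 1 if option in SINGLE_ENTRY_OPTIONS else MAX_ENTRIES
+    urls = []
+    for start in range(0, len(entries), size):
+        batch = "+".join(entries[start:start + size])
+        url = f"{BASE_URL}/get/{batch}"
+        if option:
+            url += f"/{option}"
+        urls.append(url)
+    return urls
+
+
+genes = [f"hsa:{n}" for n in range(1, 13)]  # 12 made-up IDs
+for url in build_get_urls(genes, "aaseq"):
+    print(url)
+```
+
+With 12 IDs this yields two requests: one with 10 entries and one with the remaining 2.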
+
+### 5. CONV - ID Conversion
+
+**Syntax**: `/conv/<target_db>/<source_db>` or `/conv/<target_db>/<entry>[+<entry>...]`
+
+Converts identifiers between KEGG and external databases.
+
+**Supported Conversions**:
+- `ncbi-geneid` ↔ KEGG genes
+- `ncbi-proteinid` ↔ KEGG genes
+- `uniprot` ↔ KEGG genes
+- `pubchem` ↔ KEGG compounds/drugs
+- `chebi` ↔ KEGG compounds/drugs
+
+**Examples**:
+- `/conv/ncbi-geneid/hsa` - All human genes to NCBI Gene IDs
+- `/conv/hsa/ncbi-geneid` - NCBI Gene IDs to human genes (reverse)
+- `/conv/uniprot/hsa:10458` - Specific gene to UniProt
+- `/conv/pubchem/compound` - All compounds to PubChem IDs
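+
+CONV results are typically post-processed into ID pairs. The sketch below assumes the output is two tab-separated columns with the source ID first and the converted ID second, each carrying a `db:` prefix; `parse_conv` is a hypothetical helper and the sample is illustrative.
+
+```python
+def strip_prefix(identifier):
+    """Drop a leading 'db:' prefix, e.g. 'ncbi-geneid:10458' -> '10458'."""
+    return identifier.split(":", 1)[1] if ":" in identifier else identifier
+
+
+def parse_conv(text):
+    """Return (source_id, target_id) pairs from tab-delimited /conv output."""
+    pairs = []
+    for line in text.splitlines():
+        fields = line.split("\t")
+        if len(fields) != 2:
+            continue  # skip blank or malformed lines
+        pairs.append((strip_prefix(fields[0]), strip_prefix(fields[1])))
+    return pairs
+
+
+# Illustrative sample in the assumed layout of /conv/ncbi-geneid/<entries>.
+sample = "hsa:10458\tncbi-geneid:10458\nhsa:3630\tncbi-geneid:3630\n"
+print(parse_conv(sample))
+```
+
+A list of pairs is used instead of a dictionary so that one-to-many conversions are not silently overwritten.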
+
+### 6. LINK - Cross-References
+
+**Syntax**: `/link/<target_db>/<source_db>` or `/link/<target_db>/<entry>[+<entry>...]`
+
+Finds related entries within and between KEGG databases.
+
+**Common Links**:
+- genes ↔ pathway
+- pathway ↔ compound
+- pathway ↔ enzyme
+- genes ↔ orthology (KO)
+- compound ↔ reaction
+
+**Examples**:
+- `/link/pathway/hsa` - All pathways linked to human genes
+- `/link/hsa/hsa00010` - Human genes in the glycolysis pathway
+- `/link/pathway/hsa:10458` - Pathways containing a specific gene
+- `/link/cpd/map00010` - Compounds in the reference glycolysis pathway
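+
+LINK output is many-to-many, so grouping it by the first column is a common first step. This is a minimal sketch under the assumption that each line has two tab-separated IDs; `group_links` is a hypothetical helper and the sample rows are illustrative, not a fetched response.
+
+```python
+from collections import defaultdict
+
+
+def group_links(text):
+    """Group two-column /link output into {first_column_id: set of second_column_ids}."""
+    grouped = defaultdict(set)
+    for line in text.splitlines():
+        fields = line.split("\t")
+        if len(fields) == 2:
+            grouped[fields[0]].add(fields[1])
+    return dict(grouped)
+
+
+# Illustrative rows in the assumed layout of /link/pathway/<gene entries>.
+sample = (
+    "hsa:10458\tpath:hsa04520\n"
+    "hsa:10458\tpath:hsa04810\n"
+    "hsa:3630\tpath:hsa04910\n"
+)
+links = group_links(sample)
+print(sorted(links["hsa:10458"]))
+```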
+
+### 7. DDI - Drug-Drug Interactions
+
+**Syntax**: `/ddi/<drug>[+<drug>...]`
+
+Retrieves drug-drug interaction information extracted from Japanese drug labels.
+
+**Parameters**:
+- `drug` - Drug entry ID(s) (max 10, joined with +)
+
+**Examples**:
+- `/ddi/D00001` - Interactions for single drug
+- `/ddi/D00001+D00002` - Checks for interactions among the listed drugs
+
+## Pathway Classification
+
+KEGG organizes pathways into seven major categories:
+
+### 1. Metabolism
+Carbohydrate, energy, lipid, nucleotide, amino acid, glycan biosynthesis and metabolism, cofactor and vitamin metabolism, terpenoid and polyketide metabolism, secondary metabolite biosynthesis, xenobiotics biodegradation
+
+**Example pathways**:
+- `map00010` - Glycolysis / Gluconeogenesis
+- `map00020` - Citrate cycle (TCA cycle)
+- `map00190` - Oxidative phosphorylation
+
+### 2. Genetic Information Processing
+Transcription, translation, folding/sorting/degradation, replication and repair
+
+**Example pathways**:
+- `map03010` - Ribosome
+- `map03020` - RNA polymerase
+- `map03040` - Spliceosome
+
+### 3. Environmental Information Processing
+Membrane transport, signal transduction
+
+**Example pathways**:
+- `map02010` - ABC transporters
+- `map04010` - MAPK signaling pathway
+
+### 4. Cellular Processes
+Transport and catabolism, cell growth and death, cellular community, cell motility
+
+**Example pathways**:
+- `map04140` - Autophagy
+- `map04210` - Apoptosis
+
+### 5. Organismal Systems
+Immune, endocrine, circulatory, digestive, nervous, sensory, development, environmental adaptation
+
+**Example pathways**:
+- `map04610` - Complement and coagulation cascades
+- `map04910` - Insulin signaling pathway
+
+### 6. Human Diseases
+Cancer, immune diseases, neurodegenerative diseases, cardiovascular diseases, metabolic diseases, infectious diseases
+
+**Example pathways**:
+- `map05200` - Pathways in cancer
+- `map05010` - Alzheimer disease
+
+### 7. Drug Development
+Chronological classification and target-based classification
+
+## Common Identifiers and Naming
+
+### Pathway IDs
+- `map#####` - Reference pathway (generic)
+- `hsa#####` - Human-specific pathway
+- `mmu#####` - Mouse-specific pathway
+- Format: prefix (`map` for reference pathways, or an organism code) + 5-digit number
+
+### Gene IDs
+- `hsa:10458` - Human gene (organism:gene_id)
+- Format: organism code + colon + gene identifier (a number such as 10458 or a locus tag such as Z5100)
+
+### Compound IDs
+- `cpd:C00002` - ATP
+- Format: cpd:C#####
+
+### Drug IDs
+- `dr:D00001` - Drug entry
+- Format: dr:D#####
+
+### Enzyme IDs
+- `ec:1.1.1.1` - Alcohol dehydrogenase
+- Format: ec:EC_number
+
+### KO (KEGG Orthology) IDs
+- `ko:K00001` - Ortholog group
+- Format: ko:K#####
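+
+The formats above can be turned into a rough identifier classifier. The patterns below are assumptions derived only from the formats listed in this section (for example, organism codes are taken to be three or four lowercase letters), and `classify_kegg_id` is a hypothetical helper, not part of any KEGG client.
+
+```python
+import re
+
+# Order matters: prefixed IDs such as "cpd:C00002" would otherwise
+# also match the generic gene pattern at the end of the list.
+ID_PATTERNS = [
+    ("compound", re.compile(r"^(cpd:)?C\d{5}$")),
+    ("drug", re.compile(r"^(dr:)?D\d{5}$")),
+    ("orthology", re.compile(r"^(ko:)?K\d{5}$")),
+    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
+    ("pathway", re.compile(r"^(path:)?[a-z]{2,4}\d{5}$")),
+    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
+]
+
+
+def classify_kegg_id(identifier):
+    """Return the first matching identifier type, or 'unknown'."""
+    for kind, pattern in ID_PATTERNS:
+        if pattern.match(identifier):
+            return kind
+    return "unknown"
+
+
+for example in ["hsa00010", "hsa:10458", "cpd:C00002", "ec:1.1.1.1"]:
+    print(example, classify_kegg_id(example))
+```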
+
+## API Limitations and Best Practices
+
+### Rate Limits and Restrictions
+- Maximum 10 entries per single operation (the image, kgml, and json options accept 1 entry)
+- Academic use only - commercial use requires separate licensing
+- No explicit rate limit documented, but avoid rapid-fire requests
+
+### HTTP Status Codes
+- `200` - Success
+- `400` - Bad request (syntax error, wrong database name, etc.)
+- `404` - Not found (e.g., the requested entry doesn't exist)
+
+### Best Practices
+1. Always check HTTP status codes in responses
+2. For bulk operations, batch entries using + (up to 10)
+3. Cache results locally to reduce API calls
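+4. Use organism-specific queries (e.g., `/list/pathway/hsa`) when possible to keep responses small
+5. For pathway visualization, use the web interface or the KGML format
+6. Parse output according to the operation: `list`, `find`, `conv`, and `link` return one tab-delimited record per line, while `get` returns a multi-line flat-file entry by default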
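+
+Several of these practices (checking status codes, caching, and spacing out requests) can be combined in a thin client. The sketch below is one possible arrangement, not an official client: `CachedKeggClient`, `http_fetch`, and the 0.5-second default interval are all hypothetical choices, since no explicit rate limit is documented.
+
+```python
+import time
+import urllib.error
+import urllib.request
+
+BASE_URL = "https://rest.kegg.jp"
+
+
+def http_fetch(url):
+    """Fetch a URL and return its body as text."""
+    try:
+        with urllib.request.urlopen(url, timeout=30) as response:
+            return response.read().decode("utf-8")
+    except urllib.error.HTTPError as err:
+        # urlopen raises HTTPError for 4xx/5xx statuses such as 400 and 404.
+        raise RuntimeError(f"KEGG request failed ({err.code}): {url}") from err
+
+
+class CachedKeggClient:
+    """Cache responses in memory and pause between uncached requests."""
+
+    def __init__(self, fetch=http_fetch, min_interval=0.5):
+        self._fetch = fetch  # injectable so tests can avoid the network
+        self._min_interval = min_interval  # seconds; an arbitrary assumption
+        self._cache = {}
+        self._last_request = None
+
+    def request(self, path):
+        """Return the response text for an API path such as '/info/kegg'."""
+        if path in self._cache:
+            return self._cache[path]
+        if self._last_request is not None:
+            wait = self._min_interval - (time.monotonic() - self._last_request)
+            if wait > 0:
+                time.sleep(wait)
+        text = self._fetch(f"{BASE_URL}/{path.lstrip('/')}")
+        self._last_request = time.monotonic()
+        self._cache[path] = text
+        return text
+```
+
+A call such as `CachedKeggClient().request("/list/pathway/hsa")` would hit the network once and serve repeat calls from memory; a persistent cache (for example, files on disk) would follow the same pattern.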
+
+## Integration with Other Tools
+
+### Biopython Integration
+Biopython provides the `Bio.KEGG.REST` module for easier Python integration:
+```python
+from Bio.KEGG import REST
+result = REST.kegg_list("pathway").read()
+```
+
+### KEGGREST (R/Bioconductor)
+R users can use the KEGGREST package:
+```r
+library(KEGGREST)
+pathways <- keggList("pathway")
+```
+
+## Common Analysis Workflows
+
+### Workflow 1: Gene to Pathway Mapping
+1. Get gene ID(s) from your organism
+2. Use `/link/pathway/<gene_id>` to find associated pathways
+3. Use `/get/<pathway_id>` to retrieve detailed pathway information
+
+### Workflow 2: Pathway Enrichment Context
+1. Use `/list/pathway/<org>` to get all organism pathways
+2. Use `/link/<org>/<pathway_id>` to get genes in each pathway
+3. Perform statistical enrichment analysis
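+
+The workflow does not name a specific test for step 3; a one-sided hypergeometric test is a common choice and is used here purely as an assumption. The sketch computes the probability of seeing at least `overlap` pathway genes in a selected gene list; `enrichment_p_value` and the example counts are hypothetical.
+
+```python
+from math import comb
+
+
+def enrichment_p_value(total_genes, pathway_genes, selected_genes, overlap):
+    """P(X >= overlap) under a hypergeometric null (over-representation)."""
+    upper = min(pathway_genes, selected_genes)
+    # Count the ways to draw i pathway genes and (selected_genes - i) others.
+    favorable = sum(
+        comb(pathway_genes, i) * comb(total_genes - pathway_genes, selected_genes - i)
+        for i in range(overlap, upper + 1)
+    )
+    return favorable / comb(total_genes, selected_genes)
+
+
+# Made-up counts: 20000 background genes, 100 in the pathway,
+# 200 selected genes, 8 of which fall in the pathway.
+print(enrichment_p_value(20000, 100, 200, 8))
+```
+
+When many pathways are tested, the resulting p-values would normally also need a multiple-testing correction.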
+
+### Workflow 3: Compound to Reaction Mapping
+1. Use `/find/compound/<name>` to find compound ID
+2. Use `/link/reaction/<compound_id>` to find reactions
+3. Use `/link/pathway/<reaction_id>` to find pathways containing reactions
+
+### Workflow 4: ID Conversion for Integration
+1. Use `/conv/uniprot/<org>` to map KEGG genes to UniProt
+2. Use `/conv/ncbi-geneid/<org>` to map to NCBI Gene IDs
+3. Integrate with other databases using converted IDs
+
+## Additional Resources
+
+- **KEGG Mapper**: https://www.kegg.jp/kegg/mapper/ - Interactive pathway mapping
+- **BlastKOALA**: Automated annotation for sequenced genomes
+- **GhostKOALA**: Annotation for metagenomes and metatranscriptomes
+- **KEGG Modules**: https://www.kegg.jp/kegg/module.html
+- **KEGG Brite**: https://www.kegg.jp/kegg/brite.html