zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

KEGG Database Reference

Overview

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource that maintains manually curated pathway maps and molecular interaction networks. It provides "wiring diagrams of molecular interactions, reactions and relations" for understanding biological systems.

Base URL: https://rest.kegg.jp
Official Documentation: https://www.kegg.jp/kegg/rest/keggapi.html
Access Restrictions: The KEGG API is made available only for academic use by academic users.

KEGG Databases

KEGG integrates 16 primary databases organized into systems information, genomic information, chemical information, and health information categories:

Systems Information

PATHWAY: Manually drawn pathway maps for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development
MODULE: Functional units and building blocks of pathways
BRITE: Hierarchical classifications and ontologies

Genomic Information

GENOME: Complete genomes with annotations
GENES: Gene catalogs for all organisms
ORTHOLOGY: Ortholog groups (KO: KEGG Orthology)
SSDB: Sequence similarity database

Chemical Information

COMPOUND: Metabolites and other chemical substances
GLYCAN: Glycan structures
REACTION: Chemical reactions
RCLASS: Reaction class (chemical structure transformation patterns)
ENZYME: Enzyme nomenclature
NETWORK: Network variations

Health Information

DISEASE: Human diseases with genetic and environmental factors
DRUG: Approved drugs with chemical structures and target information
DGROUP: Drug groups

External Database Links

KEGG cross-references to external databases including:

PubMed: Literature references
NCBI Gene: Gene database
UniProt: Protein sequences
PubChem: Chemical compounds
ChEBI: Chemical entities of biological interest

REST API Operations

1. INFO - Database Metadata

Syntax: /info/<database>

Retrieves release information and statistics for a database.

Examples:

/info/kegg - KEGG system information
/info/pathway - Pathway database information
/info/hsa - Human organism information
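
The operation paths in this reference are all appended to the base URL. The following is a minimal sketch of that mapping using only the Python standard library; the helper names `kegg_url` and `kegg_fetch` are hypothetical and not part of any KEGG client.

```python
from urllib.request import urlopen

BASE_URL = "https://rest.kegg.jp"  # base URL from the Overview section


def kegg_url(operation, *args):
    """Join an operation name and its arguments into a request URL."""
    parts = [operation, *args]
    return BASE_URL + "/" + "/".join(str(p).strip("/") for p in parts)


def kegg_fetch(operation, *args, timeout=30):
    """Send the request and return the response body as text.

    Not called at import time, so loading this module makes no network request.
    """
    with urlopen(kegg_url(operation, *args), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `kegg_url("info", "pathway")` builds the URL for the second example above, and `kegg_fetch("info", "pathway")` would retrieve it.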

2. LIST - Entry Listings

Syntax: /list/<database>[/<organism>]

Lists entry identifiers and associated names.

Parameters:

database - Database name (pathway, enzyme, genes, etc.) or entry (hsa:10458)
organism - Optional organism code (e.g., hsa for human, eco for E. coli)

Examples:

/list/pathway - All reference pathways
/list/pathway/hsa - Human-specific pathways
/list/hsa:10458+ece:Z5100 - Specific gene entries (max 10)

Organism Codes: Three- or four-letter codes, for example:

hsa - Homo sapiens (human)
mmu - Mus musculus (mouse)
dme - Drosophila melanogaster (fruit fly)
sce - Saccharomyces cerevisiae (yeast)
eco - Escherichia coli K-12 MG1655
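
LIST responses are plain text with one entry per line, the identifier and its name separated by a tab. A small parsing sketch follows; the function name `parse_kegg_list` is hypothetical, and the sample text is an illustrative stand-in rather than captured API output.

```python
def parse_kegg_list(text):
    """Parse /list output into (entry_id, description) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # Split on the first tab only; descriptions may contain spaces
        entry_id, _, description = line.partition("\t")
        entries.append((entry_id, description))
    return entries


# Illustrative sample in the shape of a /list/pathway/hsa response
sample = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
pathways = parse_kegg_list(sample)
```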

3. FIND - Search Entries

Syntax: /find/<database>/<query>[/<option>]

Searches for entries by keywords or molecular properties.

Parameters:

database - Database to search
query - Search term or molecular property
option - Optional: formula, exact_mass, mol_weight

Search Fields (database dependent):

ENTRY, NAME, SYMBOL, GENE_NAME, DESCRIPTION, DEFINITION
ORGANISM, TAXONOMY, ORTHOLOGY, PATHWAY, etc.

Examples:

/find/genes/shiga toxin - Keyword search in genes
/find/compound/C7H10N4O2/formula - Formula search (partial match, irrespective of atom order)
/find/drug/300-310/exact_mass - Mass range search (300-310 Da)
/find/compound/300-310/mol_weight - Molecular weight range
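
Keyword queries such as "shiga toxin" contain spaces, so the query segment needs URL encoding before it is sent. The sketch below builds FIND URLs with `urllib.parse.quote`; the helper `find_url` is hypothetical, and it assumes the server accepts percent-encoded spaces.

```python
from urllib.parse import quote

BASE_URL = "https://rest.kegg.jp"
FIND_OPTIONS = {"formula", "exact_mass", "mol_weight"}  # options listed above


def find_url(database, query, option=None):
    """Build a /find URL, percent-encoding the query segment."""
    url = f"{BASE_URL}/find/{quote(database, safe='')}/{quote(query, safe='')}"
    if option is not None:
        if option not in FIND_OPTIONS:
            raise ValueError(f"unsupported find option: {option!r}")
        url += f"/{option}"
    return url
```

Calling `find_url("genes", "shiga toxin")` encodes the space as `%20`, while `find_url("compound", "300-310", "mol_weight")` leaves the range untouched because hyphens and digits need no encoding.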

4. GET - Retrieve Entries

Syntax: /get/<entry>[+<entry>...][/<option>]

Retrieves full database entries or specific data formats.

Parameters:

entry - Entry ID(s) (max 10, joined with +)
option - Output format (optional)

Output Options:

aaseq - Amino acid sequences (FASTA)
ntseq - Nucleotide sequences (FASTA)
mol - MOL format (compounds/drugs)
kcf - KCF format (KEGG Chemical Function, compounds/drugs)
image - PNG image (pathway maps, single entry only)
kgml - KGML XML (pathway structure, single entry only)
json - JSON format (BRITE hierarchy files only, single entry only)

Examples:

/get/hsa00010 - Glycolysis pathway (human)
/get/hsa:10458+ece:Z5100 - Multiple genes (max 10)
/get/hsa:10458/aaseq - Protein sequence
/get/cpd:C00002 - ATP compound entry
/get/br:br08301/json - BRITE hierarchy file as JSON
/get/hsa05130/image - Pathway map as PNG

Image Restrictions: Only one entry allowed with image option
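
Because GET accepts at most 10 entries per request, and only one for the single-entry options, longer ID lists have to be split into several requests. A minimal batching sketch, with the hypothetical helper `batched_get_urls`:

```python
BASE_URL = "https://rest.kegg.jp"
MAX_ENTRIES = 10                           # per-request limit stated above
SINGLE_ENTRY_OPTIONS = {"image", "kgml", "json"}  # marked "single entry only"


def batched_get_urls(entry_ids, option=None, batch_size=MAX_ENTRIES):
    """Split entry IDs into /get URLs that respect the per-request limit."""
    if option in SINGLE_ENTRY_OPTIONS:
        batch_size = 1
    urls = []
    for start in range(0, len(entry_ids), batch_size):
        chunk = entry_ids[start:start + batch_size]
        url = f"{BASE_URL}/get/{'+'.join(chunk)}"
        if option:
            url += f"/{option}"
        urls.append(url)
    return urls
```

With 23 gene IDs this yields three URLs (10, 10, and 3 entries); with the `image` option it yields one URL per entry.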

5. CONV - ID Conversion

Syntax: /conv/<target_db>/<source_db>

Converts identifiers between KEGG and external databases.

Supported Conversions:

ncbi-geneid ↔ KEGG genes
ncbi-proteinid ↔ KEGG genes
uniprot ↔ KEGG genes
pubchem ↔ KEGG compounds/drugs
chebi ↔ KEGG compounds/drugs

Examples:

/conv/ncbi-geneid/hsa - All human genes to NCBI Gene IDs
/conv/hsa/ncbi-geneid - NCBI Gene IDs to human genes (reverse)
/conv/uniprot/hsa:10458 - Specific gene to UniProt
/conv/pubchem/compound - All compounds to PubChem IDs
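
CONV output can be loaded into a lookup table. The sketch below assumes a two-column, tab-delimited layout with the source ID first and the converted ID second; the function names and the sample lines are illustrative, not taken from the KEGG documentation. One source ID may map to several targets, so values are lists.

```python
from collections import defaultdict


def parse_conv(text):
    """Map each source ID (column 1) to the list of converted IDs (column 2)."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        source_id, target_id = fields
        mapping[source_id].append(target_id)
    return dict(mapping)


def strip_prefix(identifier):
    """Drop the database prefix, e.g. 'ncbi-geneid:10458' -> '10458'."""
    return identifier.split(":", 1)[-1]


# Illustrative sample in the assumed shape of a /conv/ncbi-geneid/<org> response
sample = "hsa:10458\tncbi-geneid:10458\nhsa:10327\tncbi-geneid:10327\n"
gene_to_ncbi = parse_conv(sample)
```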

6. LINK - Cross-References

Syntax: /link/<target_db>/<source_db>

Finds related entries within and between KEGG databases.

Common Links:

genes ↔ pathway
pathway ↔ compound
pathway ↔ enzyme
genes ↔ orthology (KO)
compound ↔ reaction

Examples:

/link/pathway/hsa - All pathways linked to human genes
/link/genes/hsa00010 - Genes in glycolysis pathway
/link/pathway/hsa:10458 - Pathways containing specific gene
/link/compound/hsa00010 - Compounds in pathway
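
LINK results are pairwise, so a common next step is to group them, for instance collecting the set of genes under each pathway. A sketch under the assumption that each line holds a source ID and a target ID separated by a tab; `group_by_target` and the sample lines are hypothetical.

```python
def group_by_target(link_text):
    """Group source IDs (column 1) under each target ID (column 2)."""
    groups = {}
    for line in link_text.splitlines():
        source_id, sep, target_id = line.partition("\t")
        if not sep:
            continue  # no tab found: blank or malformed line
        groups.setdefault(target_id.strip(), set()).add(source_id.strip())
    return groups


# Illustrative sample in the assumed shape of a /link/pathway/hsa response
sample = (
    "hsa:10327\tpath:hsa00010\n"
    "hsa:124\tpath:hsa00010\n"
    "hsa:124\tpath:hsa00071\n"
)
genes_by_pathway = group_by_target(sample)
```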

7. DDI - Drug-Drug Interactions

Syntax: /ddi/<drug>[+<drug>...]

Retrieves drug-drug interaction information extracted from Japanese drug labels.

Parameters:

drug - Drug entry ID(s) (max 10, joined with +)

Examples:

/ddi/D00001 - Interactions for single drug
/ddi/D00001+D00002 - Interactions between multiple drugs

Pathway Classification

KEGG organizes pathways into seven major categories:

1. Metabolism

Carbohydrate, energy, lipid, nucleotide, amino acid, glycan biosynthesis and metabolism, cofactor and vitamin metabolism, terpenoid and polyketide metabolism, secondary metabolite biosynthesis, xenobiotics biodegradation

Example pathways:

map00010 - Glycolysis / Gluconeogenesis
map00020 - Citrate cycle (TCA cycle)
map00190 - Oxidative phosphorylation

2. Genetic Information Processing

Transcription, translation, folding/sorting/degradation, replication and repair

Example pathways:

map03010 - Ribosome
map03020 - RNA polymerase
map03040 - Spliceosome

3. Environmental Information Processing

Membrane transport, signal transduction

Example pathways:

map02010 - ABC transporters
map04010 - MAPK signaling pathway

4. Cellular Processes

Transport and catabolism, cell growth and death, cellular community, cell motility

Example pathways:

map04140 - Autophagy
map04210 - Apoptosis

5. Organismal Systems

Immune, endocrine, circulatory, digestive, nervous, sensory, development, environmental adaptation

Example pathways:

map04610 - Complement and coagulation cascades
map04910 - Insulin signaling pathway

6. Human Diseases

Cancer, immune diseases, neurodegenerative diseases, cardiovascular diseases, metabolic diseases, infectious diseases

Example pathways:

map05200 - Pathways in cancer
map05010 - Alzheimer disease

7. Drug Development

Chronological classification and target-based classification

Common Identifiers and Naming

Pathway IDs

map##### - Reference pathway (generic)
hsa##### - Human-specific pathway
mmu##### - Mouse-specific pathway
Format: prefix (map for reference pathways, or an organism code) + 5-digit number

Gene IDs

hsa:10458 - Human gene (organism:gene_id)
Format: organism code + colon + gene identifier (numeric for human, e.g. Z5100 for ece)

Compound IDs

cpd:C00002 - ATP
Format: cpd:C#####

Drug IDs

dr:D00001 - Drug entry
Format: dr:D#####

Enzyme IDs

ec:1.1.1.1 - Alcohol dehydrogenase
Format: ec:EC_number

KO (KEGG Orthology) IDs

ko:K00001 - Ortholog group
Format: ko:K#####
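
The identifier formats above are regular enough to check with patterns before sending a request. The following sketch classifies an ID by shape only; the pattern table and `classify_id` are hypothetical, the database prefixes are treated as optional for compound, drug, and KO IDs, and a match says nothing about whether the entry actually exists.

```python
import re

# Checked in order: the specific formats come before the generic gene pattern
ID_PATTERNS = [
    ("compound", re.compile(r"^(cpd:)?C\d{5}$")),
    ("drug", re.compile(r"^(dr:)?D\d{5}$")),
    ("orthology", re.compile(r"^(ko:)?K\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+\.(\d+|-)\.(\d+|-)\.(\d+|-)$")),
    ("pathway", re.compile(r"^[a-z]{3,4}\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify_id(identifier):
    """Return the identifier type suggested by its format, or None."""
    for kind, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return kind
    return None
```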

API Limitations and Best Practices

Rate Limits and Restrictions

Maximum 10 entries per single operation (except image/kgml: 1 entry)
Academic use only - commercial use requires separate licensing
No explicit rate limit documented, but avoid rapid-fire requests

HTTP Status Codes

200 - Success
400 - Bad request (syntax error in query)
404 - Not found (entry or database doesn't exist)
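
One way to act on these codes with the standard library is sketched below. `urlopen` raises `HTTPError` for 4xx responses, so the handler translates the code into a readable message; `KeggRequestError`, `describe_status`, and `fetch_text` are hypothetical names.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

STATUS_MESSAGES = {
    200: "success",
    400: "bad request (syntax error in query)",
    404: "not found (entry or database doesn't exist)",
}


class KeggRequestError(Exception):
    """Raised when a request returns a non-success status."""


def describe_status(code):
    """Translate a status code into the meaning listed above."""
    return STATUS_MESSAGES.get(code, f"unexpected status {code}")


def fetch_text(url, timeout=30):
    """Return the response body, or raise KeggRequestError with context."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as err:
        raise KeggRequestError(f"{url}: {describe_status(err.code)}") from err
```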

Best Practices

Always check HTTP status codes in responses
For bulk operations, batch entries using + (up to 10)
Cache results locally to reduce API calls
Use specific organism codes when possible for faster results
For pathway visualization, use the web interface, or retrieve the map with the image or kgml output option
Parse tab-delimited output carefully (list, find, conv, and link return one tab-separated record per line)
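
The caching and pacing advice can be combined in a thin wrapper around whatever function performs the request. This is a sketch only: the class `CachedClient` is hypothetical, the cache lives in memory, and the 0.5 second default interval is an arbitrary assumption, since no explicit rate limit is documented.

```python
import time


class CachedClient:
    """Wrap a fetch function with an in-memory cache and a minimum delay."""

    def __init__(self, fetch, min_interval=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable: url -> response text
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]        # cache hit: no request, no delay
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        text = self._fetch(url)
        self._last_call = self._clock()
        self._cache[url] = text
        return text
```

Passing the fetch function in, rather than hard-coding it, keeps the wrapper usable with Biopython, `urllib`, or a stub during testing.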

Integration with Other Tools

Biopython Integration

Biopython provides the Bio.KEGG.REST module for easier Python integration:

from Bio.KEGG import REST
# kegg_list returns a file-like handle; read() yields the tab-delimited text
result = REST.kegg_list("pathway").read()

KEGGREST (R/Bioconductor)

R users can use the KEGGREST package:

library(KEGGREST)
pathways <- keggList("pathway")

Common Analysis Workflows

Workflow 1: Gene to Pathway Mapping

Get gene ID(s) from your organism
Use /link/pathway/<gene_id> to find associated pathways
Use /get/<pathway_id> to retrieve detailed pathway information

Workflow 2: Pathway Enrichment Context

Use /list/pathway/<org> to get all organism pathways
Use /link/genes/<pathway_id> to get genes in each pathway
Perform statistical enrichment analysis
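
The statistical step is often an over-representation test. The sketch below uses a one-sided hypergeometric test, which is one common choice and not something this reference prescribes; the function names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    upper = min(n, K)
    total = comb(N, n)
    # comb() returns 0 for impossible draws, so no extra bounds check is needed
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query_genes, pathway_genes, background):
    """Return (pathway_id, overlap, p_value) tuples sorted by p-value.

    pathway_genes maps pathway ID -> iterable of gene IDs, for example
    built from /link/genes/<pathway_id> results.
    """
    background = set(background)
    query = set(query_genes) & background
    results = []
    for pathway_id, members in pathway_genes.items():
        members = set(members) & background
        overlap = len(query & members)
        p = hypergeom_pvalue(overlap, len(query), len(members), len(background))
        results.append((pathway_id, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

In practice the p-values would then be adjusted for the number of pathways tested.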

Workflow 3: Compound to Reaction Mapping

Use /find/compound/<name> to find compound ID
Use /link/reaction/<compound_id> to find reactions
Use /link/pathway/<reaction_id> to find pathways containing reactions
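
The three steps above can be chained as in the sketch below. The `fetch` argument is a hypothetical callable that takes an operation path and returns response text; the code takes the first search hit only, assumes the two-column tab-delimited layout, and does not URL-encode the compound name. Reaction IDs are batched in groups of 10 to respect the per-request limit.

```python
def second_column(text):
    """Return the second tab-separated field of every well-formed line."""
    return [line.split("\t")[1] for line in text.splitlines() if "\t" in line]


def compound_to_pathways(name, fetch, batch_size=10):
    """Resolve a compound name to (compound_id, reactions, pathways)."""
    # Step 1: keyword search; use the first hit as the compound ID
    hits = fetch(f"/find/compound/{name}").splitlines()
    if not hits:
        return None, [], []
    compound_id = hits[0].split("\t")[0]

    # Step 2: reactions linked to the compound
    reactions = second_column(fetch(f"/link/reaction/{compound_id}"))

    # Step 3: pathways linked to those reactions, batched with '+'
    pathways = set()
    for start in range(0, len(reactions), batch_size):
        chunk = "+".join(reactions[start:start + batch_size])
        pathways.update(second_column(fetch(f"/link/pathway/{chunk}")))
    return compound_id, reactions, sorted(pathways)
```

Taking the first hit is a simplification; a keyword search can return several compounds, and a real workflow would let the user choose among them.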

Workflow 4: ID Conversion for Integration

Use /conv/uniprot/<org> to map KEGG genes to UniProt
Use /conv/ncbi-geneid/<org> to map to NCBI Gene IDs
Integrate with other databases using converted IDs

Additional Resources

KEGG Mapper: https://www.kegg.jp/kegg/mapper/ - Interactive pathway mapping
BlastKOALA: Automated annotation for sequenced genomes
GhostKOALA: Annotation for metagenomes and metatranscriptomes
KEGG Modules: https://www.kegg.jp/kegg/module.html
KEGG Brite: https://www.kegg.jp/kegg/brite.html
