gget Module Reference

Comprehensive parameter reference for all gget modules.

Reference & Gene Information Modules

gget ref

Retrieve Ensembl reference genome FTPs and metadata.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| species | str | Species in Genus_species format or shortcuts ('human', 'mouse') | Required |
| -w/--which | str | File types to return: gtf, cdna, dna, cds, cdrna, pep | All |
| -r/--release | int | Ensembl release number | Latest |
| -od/--out_dir | str | Output directory path | None |
| -o/--out | str | JSON file path for results | None |
| -l/--list_species | flag | List available vertebrate species | False |
| -liv/--list_iv_species | flag | List available invertebrate species | False |
| -ftp | flag | Return only FTP links | False |
| -d/--download | flag | Download files (requires curl) | False |
| -q/--quiet | flag | Suppress progress information | False |

Returns: JSON containing FTP links, Ensembl release numbers, release dates, file sizes
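
As a minimal sketch of how the flags above combine on the command line, the helper below assembles an argument list for gget ref. The name `build_ref_command` is hypothetical, and joining several `-w` values with commas is an assumption about the expected format, so check `gget ref --help` before relying on it.

```python
import shlex


def build_ref_command(species, which=None, release=None, ftp_only=False):
    """Assemble a `gget ref` argument list from the documented flags."""
    cmd = ["gget", "ref"]
    if which:
        # Assumption: several file types are passed as one comma-separated value.
        cmd += ["-w", ",".join(which)]
    if release is not None:
        cmd += ["-r", str(release)]
    if ftp_only:
        cmd.append("-ftp")
    cmd.append(species)
    return cmd


cmd = build_ref_command("homo_sapiens", which=["gtf", "dna"], release=110, ftp_only=True)
print(shlex.join(cmd))
```

With gget installed, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list avoids shell quoting problems.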


gget search

Search for genes by name or description in Ensembl.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| searchwords | str/list | Search terms (case-insensitive) | Required |
| -s/--species | str | Target species or core database name | Required |
| -r/--release | int | Ensembl release number | Latest |
| -t/--id_type | str | Return 'gene' or 'transcript' | 'gene' |
| -ao/--andor | str | 'or' (ANY term) or 'and' (ALL terms) | 'or' |
| -l/--limit | int | Maximum results to return | None |
| -o/--out | str | Output file path (CSV/JSON) | None |

Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL
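
The difference between the two `--andor` modes can be illustrated locally. The sketch below is not gget's implementation; it only mimics the documented behaviour (case-insensitive matching, ANY term versus ALL terms) on a few illustrative description strings that were not fetched from Ensembl. The function name `matches` is hypothetical.

```python
def matches(description, searchwords, andor="or"):
    """Return True if the description satisfies the search terms."""
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "and":
        return all(hits)  # every term must appear
    if andor == "or":
        return any(hits)  # one matching term is enough
    raise ValueError("andor must be 'or' or 'and'")


# Illustrative strings only, not real query output.
descriptions = [
    "gamma-aminobutyric acid type A receptor subunit alpha1",
    "GABA type A receptor-associated protein",
    "glutamate decarboxylase 1",
]
terms = ["gaba", "receptor"]

any_term = [d for d in descriptions if matches(d, terms, "or")]
all_terms = [d for d in descriptions if matches(d, terms, "and")]
print(len(any_term), len(all_terms))
```

Here 'or' keeps the first two descriptions, while 'and' keeps only the second, because the first never contains the literal string "gaba".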


gget info

Get comprehensive gene/transcript metadata from Ensembl, UniProt, and NCBI.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| ens_ids | str/list | Ensembl IDs (WormBase, Flybase also supported) | Required |
| -o/--out | str | Output file path (CSV/JSON) | None |
| -n/--ncbi | bool | Disable NCBI data retrieval | False |
| -u/--uniprot | bool | Disable UniProt data retrieval | False |
| -pdb | bool | Include PDB identifiers | False |
| -csv | flag | Return CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress display | False |

Python-specific:

  • save=True: Save output to current directory
  • wrap_text=True: Format dataframe with wrapped text

Note: Processing >1000 IDs simultaneously may cause server errors.

Returns: UniProt ID, NCBI gene ID, gene name, synonyms, protein names, descriptions, biotype, canonical transcript
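
One way to stay under the 1000-ID limit noted above is to split a long ID list into batches before querying. This is a minimal sketch: the helper name `chunk_ids`, the batch size of 500, and the placeholder IDs are all assumptions chosen for illustration.

```python
def chunk_ids(ens_ids, batch_size=500):
    """Yield successive batches of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ens_ids), batch_size):
        yield ens_ids[start:start + batch_size]


# Hypothetical placeholder IDs, used only to show the batching.
ids = [f"ENSG{i:011d}" for i in range(1, 2301)]
batches = list(chunk_ids(ids))
print([len(batch) for batch in batches])  # [500, 500, 500, 500, 300]
```

Each batch could then be passed to gget info in turn and the per-batch results concatenated.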


gget seq

Retrieve nucleotide or amino acid sequences in FASTA format.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| ens_ids | str/list | Ensembl identifiers | Required |
| -o/--out | str | Output file path | stdout |
| -t/--translate | flag | Fetch amino acid sequences | False |
| -iso/--isoforms | flag | Return all transcript variants | False |
| -q/--quiet | flag | Suppress progress information | False |

Data sources: Ensembl (nucleotide), UniProt (amino acid)

Returns: FASTA format sequences
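
Because the output is plain FASTA, it can be post-processed with a few lines of standard-library code. The sketch below assumes the FASTA text has already been read into a string (for example from a file written with `-o`). The parser name `parse_fasta` and the record contents are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up records for illustration; real headers come from Ensembl or UniProt.
example = ">example_transcript_1\nATGGCC\nTTGA\n>example_transcript_2\nATGAAA\n"
sequences = parse_fasta(example)
print(sequences)
```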


Sequence Analysis & Alignment Modules

gget blast

BLAST sequences against standard databases.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sequence | str | Sequence or path to FASTA/.txt | Required |
| -p/--program | str | blastn, blastp, blastx, tblastn, tblastx | Auto-detect |
| -db/--database | str | nt, refseq_rna, pdbnt, nr, swissprot, pdbaa, refseq_protein | nt or nr |
| -l/--limit | int | Max hits returned | 50 |
| -e/--expect | float | E-value cutoff | 10.0 |
| -lcf/--low_comp_filt | flag | Enable low complexity filtering | False |
| -mbo/--megablast_off | flag | Disable MegaBLAST (blastn only) | False |
| -o/--out | str | Output file path | None |
| -q/--quiet | flag | Suppress progress | False |

Returns: Description, Scientific Name, Common Name, Taxid, Max Score, Total Score, Query Coverage
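
The "Auto-detect" and "nt or nr" defaults imply that the program and database are inferred from the sequence type. The following is a hedged sketch of how such detection could work, not gget's actual logic. The function name and the choice of nucleotide alphabet are assumptions.

```python
# Assumption: a sequence made only of these letters is treated as nucleotide.
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_blast_defaults(sequence):
    """Guess a (program, database) pair from the sequence alphabet."""
    residues = set(sequence.upper()) - {"\n", " "}
    if not residues:
        raise ValueError("empty sequence")
    if residues <= NUCLEOTIDE_CHARS:
        return "blastn", "nt"
    return "blastp", "nr"


print(guess_blast_defaults("ATGCGTACGTTAGC"))
print(guess_blast_defaults("MSSPGTESAGKSLQ"))
```

A heuristic like this misclassifies short peptides that happen to contain only A, C, G, T, U, or N, so passing `-p` and `-db` explicitly is safer for ambiguous input.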


gget blat

Find genomic positions using UCSC BLAT.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sequence | str | Sequence or path to FASTA/.txt | Required |
| -st/--seqtype | str | 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' | Auto-detect |
| -a/--assembly | str | Target assembly (hg38, mm39, taeGut2, etc.) | 'human'/hg38 |
| -o/--out | str | Output file path | None |
| -csv | flag | Return CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: genome, query size, alignment start/end, matches, mismatches, alignment percentage


gget muscle

Align multiple sequences using Muscle5.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| fasta | str/list | Sequences or FASTA file path | Required |
| -o/--out | str | Output file path | stdout |
| -s5/--super5 | flag | Use Super5 algorithm (faster, large datasets) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: ClustalW format alignment or aligned FASTA (.afa)


gget diamond

Fast local protein/translated DNA alignment.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| query | str/list | Query sequences or FASTA file | Required |
| --reference | str/list | Reference sequences or FASTA file | Required |
| --sensitivity | str | fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive | very-sensitive |
| --threads | int | CPU threads | 1 |
| --diamond_binary | str | Path to DIAMOND installation | Auto-detect |
| --diamond_db | str | Save database for reuse | None |
| --translated | flag | Enable nucleotide-to-amino acid alignment | False |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: Identity %, sequence lengths, match positions, gap openings, E-values, bit scores
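
Since `--sensitivity` accepts only a fixed set of values, it can be worth validating the setting before launching a long alignment. The sketch below checks the value against the levels listed above and then assembles the argument list. The helper name and the file names are hypothetical.

```python
SENSITIVITY_LEVELS = (
    "fast", "mid-sensitive", "sensitive",
    "more-sensitive", "very-sensitive", "ultra-sensitive",
)


def build_diamond_command(query, reference, sensitivity="very-sensitive",
                          threads=1, translated=False, out=None):
    """Validate the sensitivity level, then assemble a `gget diamond` call."""
    if sensitivity not in SENSITIVITY_LEVELS:
        raise ValueError(f"unknown sensitivity: {sensitivity!r}")
    cmd = ["gget", "diamond", query,
           "--reference", reference,
           "--sensitivity", sensitivity,
           "--threads", str(threads)]
    if translated:
        cmd.append("--translated")
    if out:
        cmd += ["-o", out]
    return cmd


# Hypothetical file names, for illustration only.
diamond_cmd = build_diamond_command("query.fa", "reference.fa",
                                    sensitivity="sensitive", threads=4)
print(diamond_cmd)
```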


Structural & Protein Analysis Modules

gget pdb

Query RCSB Protein Data Bank.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| pdb_id | str | PDB identifier (e.g., '7S7U') | Required |
| -r/--resource | str | pdb, entry, pubmed, assembly, entity types | 'pdb' |
| -i/--identifier | str | Assembly, entity, or chain ID | None |
| -o/--out | str | Output file path | stdout |

Returns: PDB format (structures) or JSON (metadata)


gget alphafold

Predict 3D protein structures using AlphaFold2.

Setup: Requires OpenMM and gget setup alphafold (~4GB download)

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sequence | str/list | Amino acid sequence(s) or FASTA file | Required |
| -mr/--multimer_recycles | int | Recycling iterations for multimers | 3 |
| -o/--out | str | Output folder path | timestamped |
| -mfm/--multimer_for_monomer | flag | Apply multimer model to monomers | False |
| -r/--relax | flag | AMBER relaxation for top model | False |
| -q/--quiet | flag | Suppress progress | False |

Python-only:

  • plot (bool): Generate 3D visualization (default: True)
  • show_sidechains (bool): Include side chains (default: True)

Note: Multiple sequences automatically trigger multimer modeling

Returns: PDB structure file, JSON alignment error data, optional 3D plot


gget elm

Predict Eukaryotic Linear Motifs.

Setup: Requires gget setup elm

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sequence | str | Amino acid sequence or UniProt Acc | Required |
| -s/--sensitivity | str | DIAMOND alignment sensitivity | very-sensitive |
| -t/--threads | int | Number of threads | 1 |
| -bin/--diamond_binary | str | Path to DIAMOND binary | Auto-detect |
| -o/--out | str | Output directory path | None |
| -u/--uniprot | flag | Input is UniProt Acc | False |
| -e/--expand | flag | Include protein names, organisms, references | False |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: Two outputs:

  1. ortholog_df: Motifs from orthologous proteins
  2. regex_df: Motifs matched in input sequence

Expression & Disease Data Modules

gget archs4

Query ARCHS4 for gene correlation or tissue expression.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| gene | str | Gene symbol or Ensembl ID | Required |
| -w/--which | str | 'correlation' or 'tissue' | 'correlation' |
| -s/--species | str | 'human' or 'mouse' (tissue only) | 'human' |
| -o/--out | str | Output file path | None |
| -e/--ensembl | flag | Input is Ensembl ID | False |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns:

  • correlation: Gene symbols, Pearson correlation coefficients (top 100)
  • tissue: Tissue IDs, min/Q1/median/Q3/max expression

gget cellxgene

Query CZ CELLxGENE Discover Census for single-cell data.

Setup: Requires gget setup cellxgene

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| -g/--gene | list | Gene names or Ensembl IDs (case-sensitive!) | Required |
| --tissue | list | Tissue type(s) | None |
| --cell_type | list | Cell type(s) | None |
| -s/--species | str | 'homo_sapiens' or 'mus_musculus' | 'homo_sapiens' |
| -cv/--census_version | str | "stable", "latest", or dated version | "stable" |
| -o/--out | str | Output file path (required for CLI) | Required |
| -e/--ensembl | flag | Use Ensembl IDs | False |
| -mo/--meta_only | flag | Return metadata only | False |
| -q/--quiet | flag | Suppress progress | False |

Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

Important: Gene symbols are case-sensitive ('PAX7' for human, 'Pax7' for mouse)

Returns: AnnData object with count matrices and metadata
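
Because symbol case matters here, a small normalisation step before querying can prevent empty results. The sketch below applies the convention from the example above (all caps for human, leading capital for mouse). It is a heuristic with a hypothetical function name, and some gene symbols do not follow these conventions, so Ensembl IDs remain the more robust input.

```python
def normalize_symbol(symbol, species="homo_sapiens"):
    """Apply the usual symbol casing for the given species (heuristic)."""
    if species == "homo_sapiens":
        return symbol.upper()
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()
    raise ValueError(f"unsupported species: {species!r}")


print(normalize_symbol("pax7"))                  # PAX7
print(normalize_symbol("PAX7", "mus_musculus"))  # Pax7
```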


gget enrichr

Perform enrichment analysis using Enrichr/modEnrichr.

Parameters:

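| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| genes | list | Gene symbols or Ensembl IDs | Required |
| -db/--database | str | Reference database or shortcut | Required |
| -s/--species | str | human, mouse, fly, yeast, worm, fish | 'human' |
| -bkg_l/--background_list | list | Background genes | None |
| -o/--out | str | Output file path | None |
| -ko/--kegg_out | str | KEGG pathway images directory | None |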
-ko/--kegg_out str KEGG pathway images directory None

Python-only:

  • plot (bool): Generate graphical results

Database shortcuts:

  • 'pathway' → KEGG_2021_Human
  • 'transcription' → ChEA_2016
  • 'ontology' → GO_Biological_Process_2021
  • 'diseases_drugs' → GWAS_Catalog_2019
  • 'celltypes' → PanglaoDB_Augmented_2021

Returns: Pathway/function associations with adjusted p-values, overlapping gene counts
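
The shortcut table above amounts to a simple lookup. As a sketch (the names `DATABASE_SHORTCUTS` and `resolve_database` are hypothetical, and gget's internal handling may differ), a shortcut is expanded to its full database name and any other value is passed through unchanged:

```python
# Shortcut names and targets copied from the list above.
DATABASE_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}


def resolve_database(name):
    """Expand a shortcut to its full database name; pass other names through."""
    return DATABASE_SHORTCUTS.get(name, name)


print(resolve_database("ontology"))   # GO_Biological_Process_2021
print(resolve_database("ChEA_2016"))  # ChEA_2016
```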


gget bgee

Retrieve orthology and expression from Bgee.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| ens_id | str/list | Ensembl or NCBI gene ID | Required |
| -t/--type | str | 'orthologs' or 'expression' | 'orthologs' |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Note: Multiple IDs supported when type='expression'

Returns:

  • orthologs: Genes across species with IDs, names, taxonomic info
  • expression: Anatomical entities, confidence scores, expression status

gget opentargets

Retrieve disease/drug associations from OpenTargets.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| ens_id | str | Ensembl gene ID | Required |
| -r/--resource | str | diseases, drugs, tractability, pharmacogenetics, expression, depmap, interactions | 'diseases' |
| -l/--limit | int | Maximum results | None |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Resource-specific filters:

  • drugs: --filter_disease
  • pharmacogenetics: --filter_drug
  • expression/depmap: --filter_tissue, --filter_anat_sys, --filter_organ
  • interactions: --filter_protein_a, --filter_protein_b, --filter_gene_b

Returns: Disease/drug associations, tractability, pharmacogenetics, expression, DepMap, interactions
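
Since each filter applies only to certain resources, a pre-flight check can catch mismatched combinations early. The sketch below encodes the list above as a lookup table. It reads the "expression/depmap" line as applying all three filters to both resources, which is an assumption, and the function name is hypothetical.

```python
# Filters permitted per resource, transcribed from the list above.
ALLOWED_FILTERS = {
    "diseases": set(),
    "drugs": {"filter_disease"},
    "tractability": set(),
    "pharmacogenetics": {"filter_drug"},
    "expression": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "depmap": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "interactions": {"filter_protein_a", "filter_protein_b", "filter_gene_b"},
}


def unsupported_filters(resource, filters):
    """Return the requested filters that the resource does not accept."""
    if resource not in ALLOWED_FILTERS:
        raise ValueError(f"unknown resource: {resource!r}")
    return sorted(set(filters) - ALLOWED_FILTERS[resource])


print(unsupported_filters("drugs", ["filter_disease"]))  # []
print(unsupported_filters("drugs", ["filter_tissue"]))   # ['filter_tissue']
```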


gget cbio

Plot cancer genomics heatmaps from cBioPortal.

Subcommands: search, plot

search parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| keywords | list | Search terms | Required |

plot parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| -s/--study_ids | list | cBioPortal study IDs | Required |
| -g/--genes | list | Gene names or Ensembl IDs | Required |
| -st/--stratification | str | tissue, cancer_type, cancer_type_detailed, study_id, sample | None |
| -vt/--variation_type | str | mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence | None |
| -f/--filter | str | Filter by column value (e.g., 'study_id:msk_impact_2017') | None |
| -dd/--data_dir | str | Cache directory | ./gget_cbio_cache |
| -fd/--figure_dir | str | Output directory | ./gget_cbio_figures |
| -t/--title | str | Custom figure title | None |
| -dpi | int | Resolution | 100 |
| -q/--quiet | flag | Suppress progress | False |
| -nc/--no_confirm | flag | Skip download confirmations | False |
| -sh/--show | flag | Display plot in window | False |

Returns: PNG heatmap figure


gget cosmic

Search COSMIC database for cancer mutations.

Important: Commercial use of COSMIC data requires a paid license, and a COSMIC account is required to download the database.

Query parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| searchterm | str | Gene name, Ensembl ID, mutation, sample ID | Required |
| -ctp/--cosmic_tsv_path | str | Path to COSMIC TSV file | Required |
| -l/--limit | int | Maximum results | 100 |
| -csv | flag | CSV format (CLI) | False |

Download parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| -d/--download_cosmic | flag | Activate download mode | False |
| -gm/--gget_mutate | flag | Create version for gget mutate | False |
| -cp/--cosmic_project | str | cancer, census, cell_line, resistance, genome_screen, targeted_screen | None |
| -cv/--cosmic_version | str | COSMIC version | Latest |
| -gv/--grch_version | int | Human reference genome (37 or 38) | None |
| --email | str | COSMIC account email | Required |
| --password | str | COSMIC account password | Required |

Note: First-time users must download the database (download mode) before running queries against it.

Returns: Mutation data from COSMIC


Additional Tools

gget mutate

Generate mutated nucleotide sequences.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sequences | str/list | FASTA file or sequences | Required |
| -m/--mutations | str/df | CSV/TSV file or DataFrame | Required |
| -mc/--mut_column | str | Mutation column name | 'mutation' |
| -sic/--seq_id_column | str | Sequence ID column | 'seq_ID' |
| -mic/--mut_id_column | str | Mutation ID column | None |
| -k/--k | int | Length of flanking sequences | 30 |
| -o/--out | str | Output FASTA file path | stdout |
| -q/--quiet | flag | Suppress progress | False |

Returns: Mutated sequences in FASTA format
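
The mutations input is a table whose column names must match `--mut_column` and `--seq_id_column`. The sketch below writes a minimal CSV using the default column names 'mutation' and 'seq_ID'. The mutation strings and sequence IDs are illustrative, and the exact mutation notation gget mutate accepts is an assumption here, so check the tool's documentation.

```python
import csv
import io

# Illustrative rows: the notation and IDs are assumptions, not real data.
rows = [
    {"mutation": "c.2C>T", "seq_ID": "seq1"},
    {"mutation": "c.4G>A", "seq_ID": "seq2"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["mutation", "seq_ID"])
writer.writeheader()    # header row must use the expected column names
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing `csv_text` to a file gives a table that could be supplied through `-m`. If the columns are named differently, `-mc` and `-sic` point the tool at them.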


gget gpt

Generate text using OpenAI's API.

Setup: Requires gget setup gpt and OpenAI API key

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | Text input for generation | Required |
| api_key | str | OpenAI API key | Required |
| model | str | OpenAI model name | gpt-3.5-turbo |
| temperature | float | Sampling temperature (0-2) | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| max_tokens | int | Maximum tokens to generate | None |
| frequency_penalty | float | Frequency penalty (0-2) | 0 |
| presence_penalty | float | Presence penalty (0-2) | 0 |

Important: The OpenAI free tier is limited to 3 months; set a billing limit on the account to avoid unexpected charges.

Returns: Generated text string


gget setup

Install/download dependencies for modules.

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| module | str | Module name | Required |
| -o/--out | str | Output folder (elm only) | Package install folder |
| -q/--quiet | flag | Suppress progress | False |

Modules requiring setup:

  • alphafold - Downloads ~4GB model parameters
  • cellxgene - Installs cellxgene-census
  • elm - Downloads local ELM database
  • gpt - Configures OpenAI integration

Returns: None (installs dependencies)
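
For a pipeline that uses several modules, the setup steps can be derived from the list above. This is a minimal sketch with a hypothetical helper name. It only builds the `gget setup <module>` argument lists and does not run them.

```python
# Modules that need a one-time setup step, per the list above.
SETUP_MODULES = ("alphafold", "cellxgene", "elm", "gpt")


def setup_commands(modules_in_use):
    """Return `gget setup` argument lists for the modules that need one."""
    wanted = set(modules_in_use)
    return [["gget", "setup", name] for name in SETUP_MODULES if name in wanted]


commands = setup_commands(["blast", "elm", "alphafold"])
print(commands)  # [['gget', 'setup', 'alphafold'], ['gget', 'setup', 'elm']]
```

Each list can be handed to `subprocess.run(..., check=True)` once, before the first use of the corresponding module.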