DNAnexus App Configuration and Dependencies
Overview
This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.
dxapp.json Structure
The dxapp.json file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.
Minimal Example
{
"name": "my-app",
"title": "My Analysis App",
"summary": "Performs analysis on input files",
"dxapi": "1.0.0",
"version": "1.0.0",
"inputSpec": [],
"outputSpec": [],
"runSpec": {
"interpreter": "python3",
"file": "src/my-app.py",
"distribution": "Ubuntu",
"release": "24.04"
}
}
Metadata Fields
Required Fields
{
"name": "my-app", // Unique identifier (lowercase, numbers, hyphens, underscores)
"title": "My App", // Human-readable name
"summary": "One line description",
"dxapi": "1.0.0" // API version
}
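The required fields can be checked mechanically before a build. The following is a minimal sketch, not part of the DNAnexus toolchain: `check_required_metadata` is a hypothetical helper, and the name pattern is only an approximation of the rule given in the comment above.

```python
import re

# Fields this section lists as required.
REQUIRED_FIELDS = ("name", "title", "summary", "dxapi")

# Assumed pattern: lowercase letters, digits, hyphens, underscores.
NAME_PATTERN = re.compile(r"^[a-z0-9_-]+$")


def check_required_metadata(config: dict) -> list[str]:
    """Return a list of problems found in a parsed dxapp.json dict."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if field not in config]
    name = config.get("name")
    if isinstance(name, str) and not NAME_PATTERN.match(name):
        problems.append(f"invalid name: {name!r}")
    return problems


if __name__ == "__main__":
    for problem in check_required_metadata({"name": "My App", "title": "My App"}):
        print(problem)
```

An empty list means the sketch found nothing wrong; it does not guarantee that the platform will accept the file.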
Optional Metadata
{
"version": "1.0.0", // Semantic version (required for apps)
"description": "Extended description...",
"developerNotes": "Implementation notes...",
"categories": [ // For app discovery
"Read Mapping",
"Variation Calling"
],
"details": { // Arbitrary metadata
"contactEmail": "dev@example.com",
"upstreamVersion": "2.1.0",
"citations": ["doi:10.1000/example"],
"changelog": {
"1.0.0": "Initial release"
}
}
}
Input Specification
Define input parameters:
{
"inputSpec": [
{
"name": "reads",
"label": "Input reads",
"class": "file",
"patterns": ["*.fastq", "*.fastq.gz"],
"optional": false,
"help": "FASTQ file containing sequencing reads"
},
{
"name": "quality_threshold",
"label": "Quality threshold",
"class": "int",
"default": 30,
"optional": true,
"help": "Minimum base quality score"
},
{
"name": "reference",
"label": "Reference genome",
"class": "file",
"patterns": ["*.fa", "*.fasta"],
"suggestions": [
{
"name": "Human GRCh37 (g1k v37)",
"project": "project-xxxx",
"path": "/references/human_g1k_v37.fasta"
}
]
}
]
}
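The patterns entries are shell-style globs. As a rough illustration of which file names a pattern list would cover, here is a small sketch using Python's standard fnmatch module. `matches_patterns` is a hypothetical helper; how the platform itself applies patterns is not covered here.

```python
from fnmatch import fnmatchcase


def matches_patterns(filename: str, patterns: list[str]) -> bool:
    """Return True if filename matches at least one glob pattern."""
    # fnmatchcase compares case-sensitively on every operating system.
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    read_patterns = ["*.fastq", "*.fastq.gz"]
    for candidate in ("sample_R1.fastq.gz", "sample.bam"):
        print(candidate, matches_patterns(candidate, read_patterns))
```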
Input Classes
- file: File object
- record: Record object
- applet: Applet reference
- string: Text string
- int: Integer number
- float: Floating point number
- boolean: True/false
- hash: Key-value mapping
- array:class: Array of the specified class (for example array:file)
Input Options
- name: Parameter name (required)
- class: Data type (required)
- optional: Whether the parameter is optional (default: false)
- default: Default value for optional parameters
- label: Display name in UI
- help: Description text
- patterns: File name patterns (for files)
- suggestions: Pre-defined reference data
- choices: Allowed values (for strings/numbers)
- group: UI grouping
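The two required keys and the class names listed above lend themselves to a quick sanity check. This is a sketch under stated assumptions: `check_input_entry` is a hypothetical helper, and it assumes any of the listed classes may appear after the array: prefix, which the platform may not allow for every class.

```python
# Base classes taken from the "Input Classes" list.
BASE_CLASSES = {"file", "record", "applet", "string", "int",
                "float", "boolean", "hash"}


def check_input_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one inputSpec entry."""
    problems = [f"missing required key: {key}"
                for key in ("name", "class") if key not in entry]
    cls = entry.get("class", "")
    # Strip an optional "array:" prefix before checking the base class.
    base = cls[len("array:"):] if cls.startswith("array:") else cls
    if cls and base not in BASE_CLASSES:
        problems.append(f"unknown class: {cls}")
    return problems


if __name__ == "__main__":
    print(check_input_entry({"name": "reads", "class": "array:file"}))
    print(check_input_entry({"name": "label", "class": "text"}))
```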
Output Specification
Define output parameters:
{
"outputSpec": [
{
"name": "aligned_reads",
"label": "Aligned reads",
"class": "file",
"patterns": ["*.bam"],
"help": "BAM file with aligned reads"
},
{
"name": "mapping_stats",
"label": "Mapping statistics",
"class": "record",
"help": "Record containing alignment statistics"
}
]
}
Run Specification
Define how the app executes:
{
"runSpec": {
"interpreter": "python3", // or "bash"
"file": "src/my-app.py", // Entry point script
"distribution": "Ubuntu",
"release": "24.04", // Ubuntu release
"version": "0", // Execution environment version within the release
"execDepends": [ // System packages
{"name": "samtools"},
{"name": "bwa"}
],
"bundledDepends": [ // Bundled resources
{"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
],
"assetDepends": [ // Asset dependencies
{"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
],
"systemRequirements": {
"*": {
"instanceType": "mem2_ssd1_v2_x4"
}
},
"headJobOnDemand": true,
"restartableEntryPoints": "all" // "master" (default) or "all"
}
}
System Requirements
Instance Type Selection
{
"systemRequirements": {
"main": {
"instanceType": "mem2_ssd1_v2_x8"
},
"process": {
"instanceType": "mem3_ssd1_v2_x16"
}
}
}
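The keys of systemRequirements are entry point names, with "*" used elsewhere in this guide as a catch-all. The lookup can be sketched as follows. `instance_type_for` is a hypothetical helper, and it assumes "*" acts as the fallback for entry points without their own entry; the platform's real resolution may take further sources, such as run-time overrides, into account.

```python
from typing import Optional


def instance_type_for(system_requirements: dict, entry_point: str) -> Optional[str]:
    """Return the instance type for an entry point, else the "*" default."""
    for key in (entry_point, "*"):
        spec = system_requirements.get(key)
        if spec and "instanceType" in spec:
            return spec["instanceType"]
    return None


if __name__ == "__main__":
    requirements = {
        "main": {"instanceType": "mem2_ssd1_v2_x8"},
        "process": {"instanceType": "mem3_ssd1_v2_x16"},
    }
    print(instance_type_for(requirements, "process"))
```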
Common instance types:
- mem1_ssd1_v2_x2: 2 cores, 3.9 GB RAM
- mem1_ssd1_v2_x4: 4 cores, 7.8 GB RAM
- mem2_ssd1_v2_x4: 4 cores, 15.6 GB RAM
- mem2_ssd1_v2_x8: 8 cores, 31.2 GB RAM
- mem3_ssd1_v2_x8: 8 cores, 62.5 GB RAM
- mem3_ssd1_v2_x16: 16 cores, 125 GB RAM
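To pick from the list, one simple approach is to choose the smallest entry that meets a core and memory floor. The sketch below hard-codes only the six types listed above. `pick_instance_type` is a hypothetical helper, and "smallest" here means fewest cores, then least RAM, which is not necessarily the cheapest option.

```python
from typing import Optional

# (name, cores, RAM in GB), copied from the list of common instance types.
INSTANCE_TYPES = [
    ("mem1_ssd1_v2_x2", 2, 3.9),
    ("mem1_ssd1_v2_x4", 4, 7.8),
    ("mem2_ssd1_v2_x4", 4, 15.6),
    ("mem2_ssd1_v2_x8", 8, 31.2),
    ("mem3_ssd1_v2_x8", 8, 62.5),
    ("mem3_ssd1_v2_x16", 16, 125.0),
]


def pick_instance_type(min_cores: int, min_ram_gb: float) -> Optional[str]:
    """Return the smallest listed type meeting both minimums, or None."""
    candidates = [t for t in INSTANCE_TYPES
                  if t[1] >= min_cores and t[2] >= min_ram_gb]
    if not candidates:
        return None
    # Order by cores first, then RAM.
    return min(candidates, key=lambda t: (t[1], t[2]))[0]


if __name__ == "__main__":
    print(pick_instance_type(4, 10))
```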
Cluster Specifications
For distributed computing:
{
"systemRequirements": {
"main": {
"clusterSpec": {
"type": "spark",
"version": "3.1.2",
"initialInstanceCount": 3,
"instanceType": "mem1_ssd1_v2_x4",
"bootstrapScript": "bootstrap.sh"
}
}
}
}
Regional Options
Deploy apps across regions:
{
"regionalOptions": {
"aws:us-east-1": {
"systemRequirements": {
"*": {"instanceType": "mem2_ssd1_v2_x4"}
},
"assetDepends": [
{"id": "record-xxxx"}
]
},
"azure:westus": {
"systemRequirements": {
"*": {"instanceType": "azure:mem2_ssd1_x4"}
}
}
}
}
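When reviewing a multi-region configuration, it can help to list the default instance type declared for each region. A minimal sketch, assuming the parsed dxapp.json is available as a dict; `default_instance_by_region` is a hypothetical helper that only reads the "*" entry.

```python
def default_instance_by_region(config: dict) -> dict:
    """Map each region in regionalOptions to its "*" instance type (or None)."""
    result = {}
    for region, options in config.get("regionalOptions", {}).items():
        requirements = options.get("systemRequirements", {})
        result[region] = requirements.get("*", {}).get("instanceType")
    return result


if __name__ == "__main__":
    config = {
        "regionalOptions": {
            "aws:us-east-1": {
                "systemRequirements": {"*": {"instanceType": "mem2_ssd1_v2_x4"}}
            },
            "azure:westus": {
                "systemRequirements": {"*": {"instanceType": "azure:mem2_ssd1_x4"}}
            },
        }
    }
    for region, instance in default_instance_by_region(config).items():
        print(region, instance)
```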
Dependency Management
System Packages (execDepends)
Install Ubuntu packages at runtime:
{
"runSpec": {
"execDepends": [
{"name": "samtools"},
{"name": "bwa"},
{"name": "python3-pip"},
{"name": "r-base", "version": "4.0.0"}
]
}
}
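By default, packages are installed with apt-get from the Ubuntu repositories. An entry can select a different package manager (for example pip) through its package_manager field.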
Python Dependencies
Option 1: Install via pip in execDepends
{
"runSpec": {
"execDepends": [
{"name": "python3-pip"}
]
}
}
Then in your app script:
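import subprocess
import sys
# Invoke pip through the running interpreter so packages install into the same environment
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy==1.24.0", "pandas==2.0.0"])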
Option 2: Requirements file
Create resources/requirements.txt:
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
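In your app (the contents of resources/ are unpacked at the root of the job's filesystem, so the file is available as /requirements.txt):
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "/requirements.txt"])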
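The requirements file above pins every package to an exact version, which keeps builds reproducible. A small check for unpinned entries could look like the sketch below. `unpinned_requirements` is a hypothetical helper with a deliberately simplistic parser: it ignores comments, blank lines, and pip option lines, and treats anything without == as unpinned.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with ==."""
    unpinned = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "numpy==1.24.0\npandas==2.0.0\nscikit-learn==1.3.0\n"
    print(unpinned_requirements(sample))
```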
Bundled Dependencies
Include custom tools or libraries in the app:
File structure:
my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py
Access resources in app:
import os
import subprocess

# The contents of resources/ are unpacked at the root of the job's filesystem,
# so resources/tools/custom_tool is available at /tools/custom_tool
tool_path = os.path.join("/", "tools", "custom_tool")

# Run bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])
Asset Dependencies
Assets are pre-built bundles of dependencies that can be shared across apps.
Using Assets
{
"runSpec": {
"assetDepends": [
{
"name": "bwa-asset",
"id": {"$dnanexus_link": "record-xxxx"}
}
]
}
}
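Asset contents are unpacked onto the job's filesystem before the app starts, so a tool that the asset installed into a standard location such as /usr/local/bin can be found on PATH:
import shutil
bwa_path = shutil.which("bwa")
if bwa_path is None:
    raise RuntimeError("bwa not found on PATH; check that the asset is listed in assetDepends")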
Creating Assets
Create asset directory (dx build_asset expects it to contain a dxasset.json file describing the asset):
mkdir bwa-asset
cd bwa-asset
# Install software
./configure --prefix=$PWD/usr/local
make && make install
Build asset:
dx build_asset bwa-asset --destination=project-xxxx:/assets/
Docker Integration
Using Docker Images
{
"runSpec": {
"interpreter": "python3",
"file": "src/my-app.py",
"distribution": "Ubuntu",
"release": "24.04",
"systemRequirements": {
"*": {
"instanceType": "mem2_ssd1_v2_x4"
}
},
"execDepends": [
{"name": "docker.io"}
]
}
}
Use Docker in app:
import os
import subprocess

# Pull Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])

# Run command in container, mounting the working directory at /data
subprocess.check_call([
    "docker", "run",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
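When several tools run in containers, building the argument list in one place avoids repeating the mount logic. The following is a sketch only: `docker_run_command` is a hypothetical helper, and the --rm flag (remove the container after it exits) is an addition not present in the example above.

```python
import os
from typing import Optional


def docker_run_command(image: str, command: list[str],
                       host_dir: Optional[str] = None,
                       mount_point: str = "/data") -> list[str]:
    """Build a `docker run` argument list that mounts host_dir at mount_point."""
    host_dir = host_dir or os.getcwd()
    return ["docker", "run", "--rm",
            "-v", f"{host_dir}:{mount_point}",
            image, *command]


if __name__ == "__main__":
    args = docker_run_command("biocontainers/samtools:v1.9",
                              ["samtools", "view", "/data/input.bam"])
    print(" ".join(args))  # pass `args` to subprocess.check_call to execute
```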
Docker as Base Image
For apps that run entirely in Docker:
{
"runSpec": {
"interpreter": "bash",
"file": "src/wrapper.sh",
"distribution": "Ubuntu",
"release": "24.04",
"execDepends": [
{"name": "docker.io"}
]
}
}
Access Requirements
Request special permissions:
{
"access": {
"network": ["*"], // Internet access
"project": "CONTRIBUTE", // Project write access
"allProjects": "VIEW", // Read other projects
"developer": true // Advanced permissions
}
}
Network access:
- ["*"]: Full internet
- ["github.com", "pypi.org"]: Specific domains
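To review whether a narrower network list would still cover the hosts an app contacts, a simple membership check is enough for a first pass. This sketch assumes exact host-name matching plus the "*" wildcard; `host_allowed` is a hypothetical helper, and the platform's actual matching rules may differ.

```python
def host_allowed(host: str, network: list[str]) -> bool:
    """Return True if host is covered by the access.network list."""
    # "*" grants full access; otherwise require an exact host-name match.
    return "*" in network or host in network


if __name__ == "__main__":
    allowed = ["github.com", "pypi.org"]
    for host in ("pypi.org", "example.com"):
        print(host, host_allowed(host, allowed))
```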
Timeout Configuration
{
"runSpec": {
"timeoutPolicy": {
"*": {
"days": 1,
"hours": 12,
"minutes": 30
}
}
}
}
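The days, hours, and minutes in a timeoutPolicy entry add up to a single limit; the example above comes to 1 day, 12 hours, and 30 minutes. A minimal sketch of that arithmetic follows. `timeout_for` is a hypothetical helper, and it assumes "*" is the fallback for entry points without their own entry.

```python
from datetime import timedelta
from typing import Optional


def timeout_for(timeout_policy: dict, entry_point: str) -> Optional[timedelta]:
    """Return the total timeout for an entry point, else the "*" default."""
    spec = timeout_policy.get(entry_point, timeout_policy.get("*"))
    if spec is None:
        return None
    return timedelta(days=spec.get("days", 0),
                     hours=spec.get("hours", 0),
                     minutes=spec.get("minutes", 0))


if __name__ == "__main__":
    policy = {"*": {"days": 1, "hours": 12, "minutes": 30}}
    print(timeout_for(policy, "main"))
```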
Example: Complete dxapp.json
{
"name": "rna-seq-pipeline",
"title": "RNA-Seq Analysis Pipeline",
"summary": "Aligns RNA-seq reads and quantifies gene expression",
"description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
"version": "1.0.0",
"dxapi": "1.0.0",
"categories": ["Read Mapping", "RNA-Seq"],
"inputSpec": [
{
"name": "reads",
"label": "FASTQ reads",
"class": "array:file",
"patterns": ["*.fastq.gz", "*.fq.gz"],
"help": "Single-end or paired-end RNA-seq reads"
},
{
"name": "reference_genome",
"label": "Reference genome",
"class": "file",
"patterns": ["*.fa", "*.fasta"],
"suggestions": [
{
"name": "Human GRCh38",
"project": "project-reference",
"path": "/genomes/GRCh38.fa"
}
]
},
{
"name": "gtf_file",
"label": "Gene annotation (GTF)",
"class": "file",
"patterns": ["*.gtf", "*.gtf.gz"]
}
],
"outputSpec": [
{
"name": "aligned_bam",
"label": "Aligned reads (BAM)",
"class": "file",
"patterns": ["*.bam"]
},
{
"name": "counts",
"label": "Gene counts",
"class": "file",
"patterns": ["*.counts.txt"]
},
{
"name": "qc_report",
"label": "QC report",
"class": "file",
"patterns": ["*.html"]
}
],
"runSpec": {
"interpreter": "python3",
"file": "src/rna-seq-pipeline.py",
"distribution": "Ubuntu",
"release": "24.04",
"execDepends": [
{"name": "python3-pip"},
{"name": "samtools"},
{"name": "subread"}
],
"assetDepends": [
{
"name": "star-aligner",
"id": {"$dnanexus_link": "record-star-asset"}
}
],
"systemRequirements": {
"main": {
"instanceType": "mem3_ssd1_v2_x16"
}
},
"timeoutPolicy": {
"*": {"hours": 8}
}
},
"access": {
"network": ["*"]
},
"details": {
"contactEmail": "support@example.com",
"upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
"citations": ["doi:10.1093/bioinformatics/bts635"]
}
}
Best Practices
- Version Management: Use semantic versioning for apps
- Instance Type: Start with smaller instances, scale up as needed
- Dependencies: Document all dependencies clearly
- Error Messages: Provide helpful error messages for invalid inputs
- Testing: Test with various input types and sizes
- Documentation: Write clear descriptions and help text
- Resources: Bundle frequently-used tools to avoid repeated downloads
- Docker: Use Docker for complex dependency chains
- Assets: Create assets for heavy dependencies shared across apps
- Timeouts: Set reasonable timeouts based on expected runtime
- Network Access: Request only necessary network permissions
- Region Support: Use regionalOptions for multi-region apps
Common Patterns
Bioinformatics Tool
{
"inputSpec": [
{"name": "input_file", "class": "file", "patterns": ["*.bam"]},
{"name": "threads", "class": "int", "default": 4, "optional": true}
],
"runSpec": {
"execDepends": [{"name": "tool-name"}],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x8"}
}
}
}
Python Data Analysis
{
"runSpec": {
"interpreter": "python3",
"execDepends": [
{"name": "python3-pip"}
],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x4"}
}
}
}
Docker-based App
{
"runSpec": {
"interpreter": "bash",
"execDepends": [
{"name": "docker.io"}
],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x8"}
}
},
"access": {
"network": ["*"]
}
}