# DNAnexus App Configuration and Dependencies
## Overview
This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.
## dxapp.json Structure
The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.
### Minimal Example
```json
{
"name": "my-app",
"title": "My Analysis App",
"summary": "Performs analysis on input files",
"dxapi": "1.0.0",
"version": "1.0.0",
"inputSpec": [],
"outputSpec": [],
"runSpec": {
"interpreter": "python3",
"file": "src/my-app.py",
"distribution": "Ubuntu",
"release": "24.04"
}
}
```
## Metadata Fields
### Required Fields
The `//` comments in this guide's JSON snippets are annotations for the reader. Standard JSON has no comment syntax, so leave them out of a real `dxapp.json`.
```json
{
"name": "my-app", // Unique identifier (lowercase, numbers, hyphens, underscores)
"title": "My App", // Human-readable name
"summary": "One line description",
"dxapi": "1.0.0" // API version
}
```
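A quick pre-build check can catch a missing key before upload. The sketch below is not part of the DNAnexus toolkit: `missing_fields` is a hypothetical helper, and its key lists are assumptions taken from the required fields and the minimal example in this guide.

```python
import json

# Assumption: these keys mirror this guide's "Required Fields" list and the
# runSpec keys used in its minimal example; the platform may enforce more.
REQUIRED_KEYS = ("name", "title", "summary", "dxapi")
RUNSPEC_KEYS = ("interpreter", "file", "distribution", "release")


def missing_fields(config: dict) -> list:
    """Return dotted names of expected keys that are absent from config."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    run_spec = config.get("runSpec", {})
    missing += [f"runSpec.{key}" for key in RUNSPEC_KEYS if key not in run_spec]
    return missing


minimal = json.loads("""
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
""")

print(missing_fields(minimal))             # []
print(missing_fields({"name": "my-app"}))  # every other expected key
```

In practice the dictionary would come from `json.load` on the app's own `dxapp.json` rather than an inline string.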
### Optional Metadata
```json
{
"version": "1.0.0", // Semantic version (required for apps)
"description": "Extended description...",
"developerNotes": "Implementation notes...",
"categories": [ // For app discovery
"Read Mapping",
"Variation Calling"
],
"details": { // Arbitrary metadata
"contactEmail": "dev@example.com",
"upstreamVersion": "2.1.0",
"citations": ["doi:10.1000/example"],
"changelog": {
"1.0.0": "Initial release"
}
}
}
```
## Input Specification
Define input parameters:
```json
{
"inputSpec": [
{
"name": "reads",
"label": "Input reads",
"class": "file",
"patterns": ["*.fastq", "*.fastq.gz"],
"optional": false,
"help": "FASTQ file containing sequencing reads"
},
{
"name": "quality_threshold",
"label": "Quality threshold",
"class": "int",
"default": 30,
"optional": true,
"help": "Minimum base quality score"
},
{
"name": "reference",
"label": "Reference genome",
"class": "file",
"patterns": ["*.fa", "*.fasta"],
"suggestions": [
{
"name": "Human GRCh37 (g1k v37)",
"project": "project-xxxx",
"path": "/references/human_g1k_v37.fasta"
}
]
}
]
}
```
### Input Classes
- `file` - File object
- `record` - Record object
- `applet` - Applet reference
- `string` - Text string
- `int` - Integer number
- `float` - Floating point number
- `boolean` - True/false
- `hash` - Key-value mapping
- `array:class` - Array of specified class
### Input Options
- `name` - Parameter name (required)
- `class` - Data type (required)
- `optional` - Whether parameter is optional (default: false)
- `default` - Default value for optional parameters
- `label` - Display name in UI
- `help` - Description text
- `patterns` - File name patterns (for files)
- `suggestions` - Pre-defined reference data
- `choices` - Allowed values (for strings/numbers)
- `group` - UI grouping
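The class and option lists above can be turned into a small lint for `inputSpec` entries. This is only a sketch: `check_input_entry` is a hypothetical helper, the class set is copied from this guide, and treating `array:<class>` as valid for every listed class is an assumption.

```python
# Classes copied from the "Input Classes" list; the platform may accept others.
SCALAR_CLASSES = {"file", "record", "applet", "string", "int", "float", "boolean", "hash"}


def check_input_entry(entry: dict) -> list:
    """Return a list of problems found in one inputSpec entry (empty if none)."""
    problems = []
    for key in ("name", "class"):
        if key not in entry:
            problems.append(f"missing required key: {key}")
    cls = entry.get("class", "")
    # Strip the array prefix so "array:file" is checked as "file".
    base = cls[len("array:"):] if cls.startswith("array:") else cls
    if cls and base not in SCALAR_CLASSES:
        problems.append(f"unknown class: {cls}")
    if "patterns" in entry and base != "file":
        problems.append("patterns only make sense for file inputs")
    return problems


print(check_input_entry({"name": "reads", "class": "array:file"}))  # []
print(check_input_entry({"class": "integer"}))
```

Returning a list of messages rather than raising on the first problem makes it easy to report everything wrong with a spec in one pass.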
## Output Specification
Define output parameters:
```json
{
"outputSpec": [
{
"name": "aligned_reads",
"label": "Aligned reads",
"class": "file",
"patterns": ["*.bam"],
"help": "BAM file with aligned reads"
},
{
"name": "mapping_stats",
"label": "Mapping statistics",
"class": "record",
"help": "Record containing alignment statistics"
}
]
}
```
## Run Specification
Define how the app executes:
```json
{
"runSpec": {
"interpreter": "python3", // or "bash"
"file": "src/my-app.py", // Entry point script
"distribution": "Ubuntu",
"release": "24.04",
"version": "0", // Distribution version
"execDepends": [ // System packages
{"name": "samtools"},
{"name": "bwa"}
],
"bundledDepends": [ // Bundled resources
{"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
],
"assetDepends": [ // Asset dependencies
{"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
],
"systemRequirements": {
"*": {
"instanceType": "mem2_ssd1_v2_x4"
}
},
"headJobOnDemand": true,
"restartableEntryPoints": "master" // "master" or "all"
}
}
```
## System Requirements
### Instance Type Selection
```json
{
"systemRequirements": {
"main": {
"instanceType": "mem2_ssd1_v2_x8"
},
"process": {
"instanceType": "mem3_ssd1_v2_x16"
}
}
}
```
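The keys of `systemRequirements` are entry point names, with `"*"` acting as the catch-all used elsewhere in this guide. The following sketch shows that lookup order; `resolve_instance_type` is a hypothetical helper for reasoning about a config, not a platform API.

```python
from typing import Optional


def resolve_instance_type(system_requirements: dict, entry_point: str) -> Optional[str]:
    """Return the instanceType for entry_point, falling back to the "*" entry."""
    for key in (entry_point, "*"):
        spec = system_requirements.get(key, {})
        if "instanceType" in spec:
            return spec["instanceType"]
    return None


requirements = {
    "main": {"instanceType": "mem2_ssd1_v2_x8"},
    "process": {"instanceType": "mem3_ssd1_v2_x16"},
}
print(resolve_instance_type(requirements, "process"))      # mem3_ssd1_v2_x16
print(resolve_instance_type(requirements, "postprocess"))  # None
```

A `None` result here means the config names no instance type for that entry point and has no `"*"` entry to fall back on.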
**Common instance types**:
- `mem1_ssd1_v2_x2` - 2 cores, 3.9 GB RAM
- `mem1_ssd1_v2_x4` - 4 cores, 7.8 GB RAM
- `mem2_ssd1_v2_x4` - 4 cores, 15.6 GB RAM
- `mem2_ssd1_v2_x8` - 8 cores, 31.2 GB RAM
- `mem3_ssd1_v2_x8` - 8 cores, 62.5 GB RAM
- `mem3_ssd1_v2_x16` - 16 cores, 125 GB RAM
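To pick a starting point from this list, one option is to choose the entry with the fewest cores that still meets the job's needs. The sketch below assumes the core and memory figures listed above and uses core count, then memory, as a rough stand-in for cost. `smallest_instance` is a hypothetical helper.

```python
# (cores, memory in GB) copied from the list above; check the platform's
# current instance catalogue before relying on these numbers.
INSTANCE_TYPES = {
    "mem1_ssd1_v2_x2": (2, 3.9),
    "mem1_ssd1_v2_x4": (4, 7.8),
    "mem2_ssd1_v2_x4": (4, 15.6),
    "mem2_ssd1_v2_x8": (8, 31.2),
    "mem3_ssd1_v2_x8": (8, 62.5),
    "mem3_ssd1_v2_x16": (16, 125.0),
}


def smallest_instance(min_cores: int, min_memory_gb: float):
    """Return the listed type with the fewest cores (then least memory) that fits."""
    candidates = [
        (cores, memory, name)
        for name, (cores, memory) in INSTANCE_TYPES.items()
        if cores >= min_cores and memory >= min_memory_gb
    ]
    return min(candidates)[2] if candidates else None


print(smallest_instance(4, 10))  # mem2_ssd1_v2_x4
print(smallest_instance(2, 50))  # mem3_ssd1_v2_x8
print(smallest_instance(32, 1))  # None
```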
### Cluster Specifications
For distributed computing:
```json
{
  "systemRequirements": {
    "main": {
      "instanceType": "mem1_ssd1_v2_x4",
      "clusterSpec": {
        "type": "dxspark",
        "version": "3.1.2",
        "initialInstanceCount": 3,
        "bootstrapScript": "bootstrap.sh"
      }
    }
  }
}
```
## Regional Options
Deploy apps across regions:
```json
{
"regionalOptions": {
"aws:us-east-1": {
"systemRequirements": {
"*": {"instanceType": "mem2_ssd1_v2_x4"}
},
"assetDepends": [
{"id": "record-xxxx"}
]
},
"azure:westus": {
"systemRequirements": {
"*": {"instanceType": "azure:mem2_ssd1_x4"}
}
}
}
}
```
## Dependency Management
### System Packages (execDepends)
Install Ubuntu packages at runtime:
```json
{
"runSpec": {
"execDepends": [
{"name": "samtools"},
{"name": "bwa"},
{"name": "python3-pip"},
{"name": "r-base", "version": "4.0.0"}
]
}
}
```
Packages are installed using `apt-get` from Ubuntu repositories.
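To make the mapping concrete, the sketch below turns an `execDepends` list into the equivalent `apt-get` command line, using apt's `name=version` syntax for pinned packages. It only illustrates the idea. It is not how the platform installs packages internally, and `apt_install_command` is a hypothetical helper.

```python
def apt_install_command(exec_depends: list) -> list:
    """Build an apt-get argv from execDepends entries (illustration only)."""
    packages = []
    for dep in exec_depends:
        name = dep["name"]
        # apt pins an exact version with the name=version form.
        packages.append(f"{name}={dep['version']}" if "version" in dep else name)
    return ["apt-get", "install", "--yes", *packages]


exec_depends = [
    {"name": "samtools"},
    {"name": "r-base", "version": "4.0.0"},
]
print(" ".join(apt_install_command(exec_depends)))
# apt-get install --yes samtools r-base=4.0.0
```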
### Python Dependencies
#### Option 1: Install pip via execDepends, then call pip at runtime
```json
{
"runSpec": {
"execDepends": [
{"name": "python3-pip"}
]
}
}
```
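Then install the libraries from your app script. Downloading from PyPI at runtime requires network access (see Access Requirements):
```python
import subprocess
import sys

# Call pip through the running interpreter so packages land in its environment
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy==1.24.0", "pandas==2.0.0"])
```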
#### Option 2: Requirements file
Create `resources/requirements.txt`:
```
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
```
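The contents of `resources/` are unpacked at the root of the worker's filesystem, so the file is available as `/requirements.txt` in your app:
```python
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "/requirements.txt"])
```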
### Bundled Dependencies
Include custom tools or libraries in the app:
**File structure**:
```
my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py
```
Access resources in the app. The contents of `resources/` are unpacked into the root of the worker's filesystem, so `resources/tools/custom_tool` becomes `/tools/custom_tool`:
```python
import os
import subprocess

# resources/ is unpacked at the filesystem root of the worker
tool_path = os.path.join("/", "tools", "custom_tool")
# Run the bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])
```
### Asset Dependencies
Assets are pre-built bundles of dependencies that can be shared across apps.
#### Using Assets
```json
{
"runSpec": {
"assetDepends": [
{
"name": "bwa-asset",
"id": {"$dnanexus_link": "record-xxxx"}
}
]
}
}
```
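When the job starts, the asset's contents are unpacked onto the worker's filesystem, so tools installed into standard locations such as `/usr/local/bin` can be found on `PATH`:
```python
import shutil

# Locate the bwa binary provided by the asset and fail early if it is missing
bwa_path = shutil.which("bwa")
if bwa_path is None:
    raise RuntimeError("bwa not found on PATH; check the assetDepends entry")
```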
#### Creating Assets
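Create an asset directory and install the software into it. `dx build_asset` also expects a `dxasset.json` file in this directory that describes the asset (name, title, description, version, distribution, release):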
```bash
mkdir bwa-asset
cd bwa-asset
# Install software
./configure --prefix=$PWD/usr/local
make && make install
```
Build asset:
```bash
dx build_asset bwa-asset --destination=project-xxxx:/assets/
```
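The asset directory also needs a `dxasset.json` describing the bundle. The sketch below generates one. The field set is an assumption based on common `dxasset.json` files, the version `0.0.1` is a placeholder, and `make_dxasset` is a hypothetical helper, so check the result against the current DNAnexus asset documentation.

```python
import json


def make_dxasset(name: str, version: str, release: str = "24.04", exec_depends=None) -> dict:
    """Assemble the metadata for an asset bundle (assumed field set)."""
    return {
        "name": name,
        "title": name,
        "description": f"Asset bundle for {name}",
        "version": version,
        "distribution": "Ubuntu",
        "release": release,
        # Packages to install into the asset, same format as in dxapp.json.
        "execDepends": exec_depends or [],
    }


asset = make_dxasset("bwa-asset", "0.0.1", exec_depends=[{"name": "bwa"}])
print(json.dumps(asset, indent=2))
```

Writing the printed JSON to `bwa-asset/dxasset.json` before running `dx build_asset` keeps the asset's release in step with the `release` in the app's `runSpec`.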
## Docker Integration
### Using Docker Images
```json
{
"runSpec": {
"interpreter": "python3",
"file": "src/my-app.py",
"distribution": "Ubuntu",
"release": "24.04",
"systemRequirements": {
"*": {
"instanceType": "mem2_ssd1_v2_x4"
}
},
"execDepends": [
{"name": "docker.io"}
]
}
}
```
Use Docker in the app:
```python
import os
import subprocess

# Pull the Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])
# Run a command in the container, mounting the working directory at /data
subprocess.check_call([
    "docker", "run",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
```
### Docker as Base Image
For apps that run entirely in Docker:
```json
{
"runSpec": {
"interpreter": "bash",
"file": "src/wrapper.sh",
"distribution": "Ubuntu",
"release": "24.04",
"execDepends": [
{"name": "docker.io"}
]
}
}
```
## Access Requirements
Request special permissions:
```json
{
"access": {
"network": ["*"], // Internet access
"project": "CONTRIBUTE", // Project write access
"allProjects": "VIEW", // Read other projects
"developer": true // Advanced permissions
}
}
```
**Network access**:
- `["*"]` - Full internet
- `["github.com", "pypi.org"]` - Specific domains
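Because broad permissions are easy to leave in by accident, a small lint over the `access` block can flag them for review. This is a sketch only: `broad_permissions` is a hypothetical helper, and which settings count as "broad" is an assumption based on the fields shown above.

```python
def broad_permissions(access: dict) -> list:
    """Return warnings for access settings that deserve a second look."""
    warnings = []
    if "*" in access.get("network", []):
        warnings.append('network ["*"] allows connections to any host')
    if access.get("developer"):
        warnings.append("developer: true grants advanced permissions")
    if "allProjects" in access:
        warnings.append(f"allProjects: {access['allProjects']} reaches beyond the current project")
    return warnings


print(broad_permissions({"network": ["github.com", "pypi.org"]}))  # []
for warning in broad_permissions({"network": ["*"], "developer": True}):
    print(warning)
```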
## Timeout Configuration
```json
{
"runSpec": {
"timeoutPolicy": {
"*": {
"days": 1,
"hours": 12,
"minutes": 30
}
}
}
}
```
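The `days`, `hours`, and `minutes` fields combine into one duration. The sketch below adds them up with `datetime.timedelta`, assuming the fields are summed and that `"*"` serves as the fallback entry. `total_timeout` is a hypothetical helper.

```python
from datetime import timedelta


def total_timeout(policy: dict, entry_point: str = "*") -> timedelta:
    """Sum the days/hours/minutes of a timeoutPolicy entry into one duration."""
    spec = policy.get(entry_point, policy.get("*", {}))
    return timedelta(
        days=spec.get("days", 0),
        hours=spec.get("hours", 0),
        minutes=spec.get("minutes", 0),
    )


policy = {"*": {"days": 1, "hours": 12, "minutes": 30}}
print(total_timeout(policy))                  # 1 day, 12:30:00
print(total_timeout(policy).total_seconds())  # 131400.0
```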
## Example: Complete dxapp.json
```json
{
"name": "rna-seq-pipeline",
"title": "RNA-Seq Analysis Pipeline",
"summary": "Aligns RNA-seq reads and quantifies gene expression",
"description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
"version": "1.0.0",
"dxapi": "1.0.0",
"categories": ["Read Mapping", "RNA-Seq"],
"inputSpec": [
{
"name": "reads",
"label": "FASTQ reads",
"class": "array:file",
"patterns": ["*.fastq.gz", "*.fq.gz"],
"help": "Single-end or paired-end RNA-seq reads"
},
{
"name": "reference_genome",
"label": "Reference genome",
"class": "file",
"patterns": ["*.fa", "*.fasta"],
"suggestions": [
{
"name": "Human GRCh38",
"project": "project-reference",
"path": "/genomes/GRCh38.fa"
}
]
},
{
"name": "gtf_file",
"label": "Gene annotation (GTF)",
"class": "file",
"patterns": ["*.gtf", "*.gtf.gz"]
}
],
"outputSpec": [
{
"name": "aligned_bam",
"label": "Aligned reads (BAM)",
"class": "file",
"patterns": ["*.bam"]
},
{
"name": "counts",
"label": "Gene counts",
"class": "file",
"patterns": ["*.counts.txt"]
},
{
"name": "qc_report",
"label": "QC report",
"class": "file",
"patterns": ["*.html"]
}
],
"runSpec": {
"interpreter": "python3",
"file": "src/rna-seq-pipeline.py",
"distribution": "Ubuntu",
"release": "24.04",
"execDepends": [
{"name": "python3-pip"},
{"name": "samtools"},
{"name": "subread"}
],
"assetDepends": [
{
"name": "star-aligner",
"id": {"$dnanexus_link": "record-star-asset"}
}
],
"systemRequirements": {
"main": {
"instanceType": "mem3_ssd1_v2_x16"
}
},
"timeoutPolicy": {
"*": {"hours": 8}
}
},
"access": {
"network": ["*"]
},
"details": {
"contactEmail": "support@example.com",
"upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
"citations": ["doi:10.1093/bioinformatics/bts635"]
}
}
```
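When reading a spec like this one, it helps to list the inputs a caller must supply. The sketch below treats an input as required when it is not marked `optional` and has no `default`. That rule is an assumption drawn from the Input Options section, and `required_inputs` is a hypothetical helper.

```python
def required_inputs(input_spec: list) -> list:
    """Names of inputs with neither "optional": true nor a default value."""
    return [
        entry["name"]
        for entry in input_spec
        if not entry.get("optional", False) and "default" not in entry
    ]


# Trimmed copy of the inputSpec above, plus one optional input for contrast.
input_spec = [
    {"name": "reads", "class": "array:file"},
    {"name": "reference_genome", "class": "file"},
    {"name": "gtf_file", "class": "file"},
    {"name": "quality_threshold", "class": "int", "default": 30, "optional": True},
]
print(required_inputs(input_spec))  # ['reads', 'reference_genome', 'gtf_file']
```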
## Best Practices
1. **Version Management**: Use semantic versioning for apps
2. **Instance Type**: Start with smaller instances, scale up as needed
3. **Dependencies**: Document all dependencies clearly
4. **Error Messages**: Provide helpful error messages for invalid inputs
5. **Testing**: Test with various input types and sizes
6. **Documentation**: Write clear descriptions and help text
7. **Resources**: Bundle frequently used tools to avoid repeated downloads
8. **Docker**: Use Docker for complex dependency chains
9. **Assets**: Create assets for heavy dependencies shared across apps
10. **Timeouts**: Set reasonable timeouts based on expected runtime
11. **Network Access**: Request only necessary network permissions
12. **Region Support**: Use regionalOptions for multi-region apps
## Common Patterns
### Bioinformatics Tool
```json
{
"inputSpec": [
{"name": "input_file", "class": "file", "patterns": ["*.bam"]},
{"name": "threads", "class": "int", "default": 4, "optional": true}
],
"runSpec": {
"execDepends": [{"name": "tool-name"}],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x8"}
}
}
}
```
### Python Data Analysis
```json
{
"runSpec": {
"interpreter": "python3",
"execDepends": [
{"name": "python3-pip"}
],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x4"}
}
}
}
```
### Docker-based App
```json
{
"runSpec": {
"interpreter": "bash",
"execDepends": [
{"name": "docker.io"}
],
"systemRequirements": {
"main": {"instanceType": "mem2_ssd1_v2_x8"}
}
},
"access": {
"network": ["*"]
}
}
```