# DNAnexus App Configuration and Dependencies

## Overview

This guide covers configuring apps through `dxapp.json` metadata and managing dependencies including system packages, Python libraries, and Docker containers.

## dxapp.json Structure

The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.

### Minimal Example

```json
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```

## Metadata Fields

### Required Fields

The `//` comments in this and the following examples are annotations only. Remove them from a real `dxapp.json`, which must be valid JSON.

```json
{
  "name": "my-app",         // Unique identifier (lowercase letters, numbers, hyphens, underscores)
  "title": "My App",        // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"          // API version
}
```

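A quick pre-build check can catch missing required fields before you run `dx build`. The sketch below uses a hypothetical helper (`check_metadata` is not part of `dxpy`). It assumes the naming rule stated in the comment above and checks only these four fields.

```python
import re

# Required top-level keys, taken from the "Required Fields" example.
REQUIRED_FIELDS = ("name", "title", "summary", "dxapi")

# Assumed naming rule, following the comment in the example:
# lowercase letters, digits, hyphens and underscores only.
NAME_RE = re.compile(r"[a-z0-9_-]+")


def check_metadata(spec: dict) -> list[str]:
    """Return a list of problems found in a parsed dxapp.json dict."""
    problems = [
        f"missing required field: {key}" for key in REQUIRED_FIELDS if key not in spec
    ]
    name = spec.get("name")
    if name is not None and not NAME_RE.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    return problems
```

For example, `check_metadata({"name": "My App", "title": "My App", "dxapi": "1.0.0"})` returns `['missing required field: summary', "invalid name: 'My App'"]`.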
### Optional Metadata

```json
{
  "version": "1.0.0",            // Semantic version (required for apps)
  "description": "Extended description...",
  "developerNotes": "Implementation notes...",
  "categories": [                // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {                   // Arbitrary metadata
    "contactEmail": "dev@example.com",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}
```

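`version` follows semantic versioning and `details` accepts arbitrary metadata, so a release script can bump the version and log the change in one step. This is a hypothetical helper. It assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release suffix, and it uses the `details.changelog` layout shown above.

```python
import re

SEMVER_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def bump_version(spec: dict, part: str, note: str) -> dict:
    """Bump spec["version"] and record `note` under details.changelog.

    `part` is "major", "minor" or "patch". Returns a new dict and leaves
    the input (including its nested details/changelog) untouched.
    """
    match = SEMVER_RE.fullmatch(spec["version"])
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {spec['version']!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")

    new_version = f"{major}.{minor}.{patch}"
    # Copy the nested dicts so the caller's spec is not mutated.
    details = dict(spec.get("details", {}))
    changelog = dict(details.get("changelog", {}))
    changelog[new_version] = note
    details["changelog"] = changelog
    return {**spec, "version": new_version, "details": details}
```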
## Input Specification

Define input parameters:

```json
{
  "inputSpec": [
    {
      "name": "reads",
      "label": "Input reads",
      "class": "file",
      "patterns": ["*.fastq", "*.fastq.gz"],
      "optional": false,
      "help": "FASTQ file containing sequencing reads"
    },
    {
      "name": "quality_threshold",
      "label": "Quality threshold",
      "class": "int",
      "default": 30,
      "optional": true,
      "help": "Minimum base quality score"
    },
    {
      "name": "reference",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh37 (1000 Genomes b37)",
          "project": "project-xxxx",
          "path": "/references/human_g1k_v37.fasta"
        }
      ]
    }
  ]
}
```

### Input Classes

- `file` - File object
- `record` - Record object
- `applet` - Applet reference
- `string` - Text string
- `int` - Integer number
- `float` - Floating point number
- `boolean` - True/false
- `hash` - Key-value mapping
- `array:class` - Array of specified class

### Input Options

- `name` - Parameter name (required)
- `class` - Data type (required)
- `optional` - Whether parameter is optional (default: false)
- `default` - Default value for optional parameters
- `label` - Display name in UI
- `help` - Description text
- `patterns` - File name patterns (for files)
- `suggestions` - Pre-defined reference data
- `choices` - Allowed values (for strings/numbers)
- `group` - UI grouping

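To make these options concrete, here is a simplified sketch of resolving a job's input against an `inputSpec`. Supplied values win, `default` fills in missing ones, required inputs must be present, and `choices` restricts values. It illustrates the options under stated assumptions and is not the platform's validator:

- File inputs are plain file names here; a real job receives `{"$dnanexus_link": ...}` objects.
- `array:` classes are not handled.
- A pattern mismatch only produces a warning.

```python
from fnmatch import fnmatch


def resolve_inputs(input_spec: list[dict], job_input: dict) -> tuple[dict, list[str]]:
    """Resolve a job input against an inputSpec.

    Returns (resolved values, warnings). Raises ValueError for a missing
    required input or a value outside `choices`.
    """
    resolved: dict = {}
    warnings: list[str] = []
    for field in input_spec:
        name = field["name"]
        if name in job_input:
            value = job_input[name]
        elif "default" in field:
            value = field["default"]
        elif field.get("optional", False):
            continue  # optional with no default: omit it
        else:
            raise ValueError(f"missing required input: {name}")

        choices = field.get("choices")
        if choices is not None and value not in choices:
            raise ValueError(f"{name}: {value!r} is not one of {choices}")

        # Patterns are treated as a soft check in this sketch.
        patterns = field.get("patterns")
        if field["class"] == "file" and patterns and not any(fnmatch(value, p) for p in patterns):
            warnings.append(f"{name}: {value!r} matches none of {patterns}")

        resolved[name] = value
    return resolved, warnings
```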
## Output Specification

Define output parameters:

```json
{
  "outputSpec": [
    {
      "name": "aligned_reads",
      "label": "Aligned reads",
      "class": "file",
      "patterns": ["*.bam"],
      "help": "BAM file with aligned reads"
    },
    {
      "name": "mapping_stats",
      "label": "Mapping statistics",
      "class": "record",
      "help": "Record containing alignment statistics"
    }
  ]
}
```

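At the end of a run, the app returns one value per `outputSpec` entry, with data objects given as links. The sketch below builds that mapping from object IDs you already have. In a real Python app the IDs would typically come from uploading results with `dxpy` (for example `dxpy.upload_local_file`), and `dxpy.dxlink` builds the same link objects. `build_output` is a hypothetical helper, and it assumes an output is required unless marked `"optional": true`.

```python
def build_output(output_spec: list[dict], object_ids: dict) -> dict:
    """Map each outputSpec name to a {"$dnanexus_link": id} object.

    `object_ids` maps output names to IDs of already-created data
    objects (e.g. "file-xxxx", "record-xxxx"). Raises ValueError if a
    non-optional output has no ID.
    """
    output = {}
    for field in output_spec:
        name = field["name"]
        if name in object_ids:
            output[name] = {"$dnanexus_link": object_ids[name]}
        elif not field.get("optional", False):
            raise ValueError(f"no value for required output: {name}")
    return output
```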
## Run Specification

Define how the app executes:

```json
{
  "runSpec": {
    "interpreter": "python3",          // or "bash"
    "file": "src/my-app.py",           // Entry point script
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0",                    // Execution environment version
    "execDepends": [                   // System packages
      {"name": "samtools"},
      {"name": "bwa"}
    ],
    "bundledDepends": [                // Bundled resources
      {"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
    ],
    "assetDepends": [                  // Asset dependencies
      {"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
    ],
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "headJobOnDemand": true,
    "restartableEntryPoints": ["main"]
  }
}
```

## System Requirements

### Instance Type Selection

```json
{
  "systemRequirements": {
    "main": {
      "instanceType": "mem2_ssd1_v2_x8"
    },
    "process": {
      "instanceType": "mem3_ssd1_v2_x16"
    }
  }
}
```

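The keys of `systemRequirements` are entry-point names, and `"*"` acts as a catch-all. The lookup below assumes that an entry point's own entry takes precedence over `"*"`. `instance_type_for` is a hypothetical helper.

```python
from typing import Optional


def instance_type_for(system_requirements: dict, entry_point: str) -> Optional[str]:
    """Return the instance type configured for `entry_point`.

    Assumes an entry-point-specific entry overrides the "*" catch-all;
    returns None when neither sets an instanceType.
    """
    for key in (entry_point, "*"):
        entry = system_requirements.get(key, {})
        if "instanceType" in entry:
            return entry["instanceType"]
    return None
```

With the example above, `"process"` resolves to `mem3_ssd1_v2_x16` and `"main"` to `mem2_ssd1_v2_x8`.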
**Common instance types**:
- `mem1_ssd1_v2_x2` - 2 cores, 3.9 GB RAM
- `mem1_ssd1_v2_x4` - 4 cores, 7.8 GB RAM
- `mem2_ssd1_v2_x4` - 4 cores, 15.6 GB RAM
- `mem2_ssd1_v2_x8` - 8 cores, 31.2 GB RAM
- `mem3_ssd1_v2_x8` - 8 cores, 62.5 GB RAM
- `mem3_ssd1_v2_x16` - 16 cores, 125 GB RAM

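One way to pick a starting instance is to take the listed type with the fewest cores, then the least RAM, that still meets the job's needs. This sketch only knows the six types listed above. Using core count as a rough stand-in for cost is an assumption, not platform pricing.

```python
# (name, cores, RAM in GB), copied from the list of common instance types.
INSTANCE_TYPES = [
    ("mem1_ssd1_v2_x2", 2, 3.9),
    ("mem1_ssd1_v2_x4", 4, 7.8),
    ("mem2_ssd1_v2_x4", 4, 15.6),
    ("mem2_ssd1_v2_x8", 8, 31.2),
    ("mem3_ssd1_v2_x8", 8, 62.5),
    ("mem3_ssd1_v2_x16", 16, 125.0),
]


def smallest_instance(min_cores: int, min_ram_gb: float) -> str:
    """Return the listed type with the fewest cores, then least RAM, that fits."""
    fits = [t for t in INSTANCE_TYPES if t[1] >= min_cores and t[2] >= min_ram_gb]
    if not fits:
        raise ValueError(f"no listed type has {min_cores} cores and {min_ram_gb} GB RAM")
    name, _, _ = min(fits, key=lambda t: (t[1], t[2]))
    return name
```

For example, `smallest_instance(4, 10)` returns `"mem2_ssd1_v2_x4"`, and `smallest_instance(2, 40)` returns `"mem3_ssd1_v2_x8"`.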
### Cluster Specifications

For distributed computing:

```json
{
  "systemRequirements": {
    "main": {
      "clusterSpec": {
        "type": "spark",
        "version": "3.1.2",
        "initialInstanceCount": 3,
        "instanceType": "mem1_ssd1_v2_x4",
        "bootstrapScript": "bootstrap.sh"
      }
    }
  }
}
```

## Regional Options

Deploy apps across regions:

```json
{
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {"instanceType": "mem2_ssd1_v2_x4"}
      },
      "assetDepends": [
        {"id": "record-xxxx"}
      ]
    },
    "azure:westus": {
      "systemRequirements": {
        "*": {"instanceType": "azure:mem2_ssd1_x4"}
      }
    }
  }
}
```

## Dependency Management

### System Packages (execDepends)

Install Ubuntu packages at runtime:

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "samtools"},
      {"name": "bwa"},
      {"name": "python3-pip"},
      {"name": "r-base", "version": "4.0.0"}
    ]
  }
}
```

Packages are installed with `apt-get` from the Ubuntu repositories for the app's `release`. A pinned `version` must exactly match a package version available in those repositories; if it does not, installation fails.

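To show what a pinned entry means in `apt-get` terms, the sketch below renders `execDepends` entries as an equivalent command line, using apt's `package=version` syntax for pinned versions. It is an illustration only (`apt_install_command` is hypothetical) and does not reproduce how the platform actually performs the installation.

```python
import shlex


def apt_install_command(exec_depends: list[dict]) -> list[str]:
    """Render execDepends entries as an equivalent apt-get argument list."""
    packages = []
    for dep in exec_depends:
        if "version" in dep:
            # apt pins an exact version with package=version
            packages.append(f"{dep['name']}={dep['version']}")
        else:
            packages.append(dep["name"])
    return ["apt-get", "install", "-y", *packages]


if __name__ == "__main__":
    deps = [{"name": "samtools"}, {"name": "r-base", "version": "4.0.0"}]
    print(shlex.join(apt_install_command(deps)))  # apt-get install -y samtools r-base=4.0.0
```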
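### Python Dependencies

#### Option 1: Install pip via execDepends, then packages at runtime

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "python3-pip"}
    ]
  }
}
```

Then in your app script (pin versions that support the release's Python; Ubuntu 24.04 ships Python 3.12):
```python
import subprocess
import sys
# Invoke pip through the running interpreter so packages install for the same Python
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy==1.26.4", "pandas==2.2.2"])
```

If pip refuses to install with an `externally-managed-environment` error (PEP 668), install into a virtual environment or pass pip's `--break-system-packages` flag.
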
#### Option 2: Requirements file

Create `resources/requirements.txt`:
```
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.5.0
```

Files under `resources/` are placed at the root of the worker's filesystem when the job starts, so the file is available as `/requirements.txt`. In your app:
```python
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "/requirements.txt"])
```

### Bundled Dependencies

Include custom tools or libraries in the app:

**File structure**:
```
my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py
```

Files under `resources/` are unpacked relative to the root of the worker's filesystem when the job starts, so `resources/tools/custom_tool` is available at `/tools/custom_tool`:
```python
import subprocess

# resources/tools/custom_tool is unpacked to /tools/custom_tool
tool_path = "/tools/custom_tool"

# Run bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])
```

### Asset Dependencies

Assets are pre-built bundles of dependencies that can be shared across apps.

#### Using Assets

```json
{
  "runSpec": {
    "assetDepends": [
      {
        "name": "bwa-asset",
        "id": {"$dnanexus_link": "record-xxxx"}
      }
    ]
  }
}
```

At job start, the asset's archive is unpacked into the root of the worker's filesystem, so its files keep the paths they were installed under when the asset was built. If the asset installed `bwa` into a directory on `PATH`, it can be found by name:
```python
import shutil
# The asset is unpacked under /, so bwa is found at its install path via PATH
bwa_path = shutil.which("bwa")
```

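#### Creating Assets

Create an asset source directory with a `dxasset.json` file describing the asset. Its `distribution` and `release` should match those of the apps that will use it, and `execDepends` lists the packages to install into it:
```bash
mkdir bwa-asset
# dxasset.json describes the asset and the packages installed into it
cat > bwa-asset/dxasset.json <<'EOF'
{
  "name": "bwa-asset", "title": "BWA", "version": "0.0.1",
  "description": "BWA aligner packaged for reuse across apps",
  "distribution": "Ubuntu", "release": "24.04",
  "execDepends": [{"name": "bwa"}]
}
EOF
```

To compile software that is not available as a package, add a `Makefile` to the asset directory; files under its `resources/` subdirectory are included as-is.

Build asset:
```bash
dx build_asset bwa-asset --destination=project-xxxx:/assets/
```
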
## Docker Integration

### Using Docker Images

Pulling images from a public registry requires network access, so this example also requests it (see Access Requirements).

```json
{
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "execDepends": [
      {"name": "docker.io"}
    ]
  },
  "access": {
    "network": ["*"]
  }
}
```

Use Docker in app:
```python
import os
import subprocess

# Pull Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])

# Run command in container, mounting the working directory at /data
subprocess.check_call([
    "docker", "run", "--rm",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
```

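When several steps run in containers, a small helper keeps the `docker run` invocations consistent. This sketch (the `docker_run_cmd` name and its defaults are illustrative) does four things:

- mounts a host directory;
- sets the container's working directory to that mount;
- removes the container on exit;
- runs as the calling user, so output files are not owned by root.

It assumes a POSIX worker where `os.getuid` is available.

```python
import os


def docker_run_cmd(
    image: str,
    command: list[str],
    host_dir: str,
    container_dir: str = "/data",
) -> list[str]:
    """Build a `docker run` argument list for `command` inside `image`."""
    return [
        "docker", "run",
        "--rm",                                                # delete the container when it exits
        "-u", f"{os.getuid()}:{os.getgid()}",                  # write outputs as the calling user
        "-v", f"{os.path.abspath(host_dir)}:{container_dir}",  # mount the host directory
        "-w", container_dir,                                   # resolve relative paths in the mount
        image,
        *command,
    ]
```

For example, `subprocess.check_call(docker_run_cmd("biocontainers/samtools:v1.9", ["samtools", "view", "input.bam"], os.getcwd()))` runs the same samtools command as above, with `input.bam` resolved inside `/data`.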
### Docker as Base Image

For apps whose work runs entirely inside containers, a thin bash entry point (`src/wrapper.sh`) can call `docker run` for each step. As above, pulling images requires network access:

```json
{
  "runSpec": {
    "interpreter": "bash",
    "file": "src/wrapper.sh",
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": [
      {"name": "docker.io"}
    ]
  },
  "access": {
    "network": ["*"]
  }
}
```

## Access Requirements

Request special permissions:

```json
{
  "access": {
    "network": ["*"],            // Internet access
    "project": "CONTRIBUTE",     // Project write access
    "allProjects": "VIEW",       // Read other projects
    "developer": true            // Advanced permissions
  }
}
```

**Network access**:
- `["*"]` - Full internet
- `["github.com", "pypi.org"]` - Specific domains

## Timeout Configuration

```json
{
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "days": 1,
        "hours": 12,
        "minutes": 30
      }
    }
  }
}
```

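If the units in a `timeoutPolicy` entry add up, the policy above allows 1 day, 12 hours and 30 minutes (36.5 hours) per job. The sketch below computes that total with `datetime.timedelta` and falls back to `"*"` when an entry point has no policy of its own. `timeout_for` is a hypothetical helper, and the additive reading of the units is an assumption.

```python
from datetime import timedelta


def timeout_for(timeout_policy: dict, entry_point: str) -> timedelta:
    """Total timeout for an entry point, summing days, hours and minutes.

    Uses the entry point's own policy if present, otherwise "*".
    """
    policy = timeout_policy.get(entry_point, timeout_policy.get("*", {}))
    return timedelta(
        days=policy.get("days", 0),
        hours=policy.get("hours", 0),
        minutes=policy.get("minutes", 0),
    )
```

For example, `print(timeout_for({"*": {"days": 1, "hours": 12, "minutes": 30}}, "main"))` prints `1 day, 12:30:00`.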
## Example: Complete dxapp.json

```json
{
  "name": "rna-seq-pipeline",
  "title": "RNA-Seq Analysis Pipeline",
  "summary": "Aligns RNA-seq reads and quantifies gene expression",
  "description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
  "version": "1.0.0",
  "dxapi": "1.0.0",
  "categories": ["Read Mapping", "RNA-Seq"],

  "inputSpec": [
    {
      "name": "reads",
      "label": "FASTQ reads",
      "class": "array:file",
      "patterns": ["*.fastq.gz", "*.fq.gz"],
      "help": "Single-end or paired-end RNA-seq reads"
    },
    {
      "name": "reference_genome",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh38",
          "project": "project-reference",
          "path": "/genomes/GRCh38.fa"
        }
      ]
    },
    {
      "name": "gtf_file",
      "label": "Gene annotation (GTF)",
      "class": "file",
      "patterns": ["*.gtf", "*.gtf.gz"]
    }
  ],

  "outputSpec": [
    {
      "name": "aligned_bam",
      "label": "Aligned reads (BAM)",
      "class": "file",
      "patterns": ["*.bam"]
    },
    {
      "name": "counts",
      "label": "Gene counts",
      "class": "file",
      "patterns": ["*.counts.txt"]
    },
    {
      "name": "qc_report",
      "label": "QC report",
      "class": "file",
      "patterns": ["*.html"]
    }
  ],

  "runSpec": {
    "interpreter": "python3",
    "file": "src/rna-seq-pipeline.py",
    "distribution": "Ubuntu",
    "release": "24.04",

    "execDepends": [
      {"name": "python3-pip"},
      {"name": "samtools"},
      {"name": "subread"}
    ],

    "assetDepends": [
      {
        "name": "star-aligner",
        "id": {"$dnanexus_link": "record-star-asset"}
      }
    ],

    "systemRequirements": {
      "main": {
        "instanceType": "mem3_ssd1_v2_x16"
      }
    },

    "timeoutPolicy": {
      "*": {"hours": 8}
    }
  },

  "access": {
    "network": ["*"]
  },

  "details": {
    "contactEmail": "support@example.com",
    "upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
    "citations": ["doi:10.1093/bioinformatics/bts635"]
  }
}
```

## Best Practices

1. **Version Management**: Use semantic versioning for apps
2. **Instance Type**: Start with smaller instances, scale up as needed
3. **Dependencies**: Document all dependencies clearly
4. **Error Messages**: Provide helpful error messages for invalid inputs
5. **Testing**: Test with various input types and sizes
6. **Documentation**: Write clear descriptions and help text
7. **Resources**: Bundle frequently-used tools to avoid repeated downloads
8. **Docker**: Use Docker for complex dependency chains
9. **Assets**: Create assets for heavy dependencies shared across apps
10. **Timeouts**: Set reasonable timeouts based on expected runtime
11. **Network Access**: Request only necessary network permissions
12. **Region Support**: Use regionalOptions for multi-region apps

## Common Patterns

### Bioinformatics Tool

```json
{
  "inputSpec": [
    {"name": "input_file", "class": "file", "patterns": ["*.bam"]},
    {"name": "threads", "class": "int", "default": 4, "optional": true}
  ],
  "runSpec": {
    "execDepends": [{"name": "tool-name"}],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  }
}
```

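The `threads` input defaults to 4, while the suggested `mem2_ssd1_v2_x8` instance has 8 cores. A tool wrapper might clamp the requested value to the cores the worker actually reports. The helper below is a hypothetical sketch of that design choice, not a platform feature.

```python
import os
from typing import Optional


def effective_threads(requested: Optional[int], default: int = 4) -> int:
    """Clamp a `threads` input to the cores available on the worker.

    `default` mirrors the inputSpec default; os.cpu_count() may return
    None, in which case one core is assumed.
    """
    available = os.cpu_count() or 1
    wanted = requested if requested is not None else default
    return max(1, min(wanted, available))
```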
### Python Data Analysis

```json
{
  "runSpec": {
    "interpreter": "python3",
    "execDepends": [
      {"name": "python3-pip"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x4"}
    }
  }
}
```

### Docker-based App

```json
{
  "runSpec": {
    "interpreter": "bash",
    "execDepends": [
      {"name": "docker.io"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  },
  "access": {
    "network": ["*"]
  }
}
```