# DNAnexus App Configuration and Dependencies

## Overview

This guide covers configuring apps through `dxapp.json` metadata and managing dependencies, including system packages, Python libraries, and Docker containers.

## dxapp.json Structure

The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.

### Minimal Example

```json
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```

## Metadata Fields

### Required Fields

```json
{
  "name": "my-app",                  // Unique identifier (lowercase, numbers, hyphens, underscores)
  "title": "My App",                 // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"                   // API version
}
```

### Optional Metadata

```json
{
  "version": "1.0.0",                // Semantic version (required for apps)
  "description": "Extended description...",
  "developerNotes": "Implementation notes...",
  "categories": [                    // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {                       // Arbitrary metadata
    "contactEmail": "dev@example.com",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}
```

## Input Specification

Define input parameters:

```json
{
  "inputSpec": [
    {
      "name": "reads",
      "label": "Input reads",
      "class": "file",
      "patterns": ["*.fastq", "*.fastq.gz"],
      "optional": false,
      "help": "FASTQ file containing sequencing reads"
    },
    {
      "name": "quality_threshold",
      "label": "Quality threshold",
      "class": "int",
      "default": 30,
      "optional": true,
      "help": "Minimum base quality score"
    },
    {
      "name": "reference",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh37 (g1k v37)",
          "project": "project-xxxx",
          "path": "/references/human_g1k_v37.fasta"
        }
      ]
    }
  ]
}
```

### Input Classes

- `file` - File object
- `record` - Record object
- `applet` - Applet reference
- `string` - Text string
- `int` - Integer number
- `float` - Floating point number
- `boolean` - True/false
- `hash` - Key-value mapping
- `array:class` - Array of the specified class (for example, `array:file`)

### Input Options

- `name` - Parameter name (required)
- `class` - Data type (required)
- `optional` - Whether the parameter is optional (default: false)
- `default` - Default value for optional parameters
- `label` - Display name in the UI
- `help` - Description text
- `patterns` - File name patterns (for files)
- `suggestions` - Pre-defined reference data
- `choices` - Allowed values (for strings/numbers)
- `group` - UI grouping

## Output Specification

Define output parameters:

```json
{
  "outputSpec": [
    {
      "name": "aligned_reads",
      "label": "Aligned reads",
      "class": "file",
      "patterns": ["*.bam"],
      "help": "BAM file with aligned reads"
    },
    {
      "name": "mapping_stats",
      "label": "Mapping statistics",
      "class": "record",
      "help": "Record containing alignment statistics"
    }
  ]
}
```

## Run Specification

Define how the app executes:

```json
{
  "runSpec": {
    "interpreter": "python3",        // or "bash"
    "file": "src/my-app.py",         // Entry point script
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0",                  // Execution environment version for this release
    "execDepends": [                 // System packages
      {"name": "samtools"},
      {"name": "bwa"}
    ],
    "bundledDepends": [              // Bundled resources
      {"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
    ],
    "assetDepends": [                // Asset dependencies
      {"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
    ],
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "headJobOnDemand": true,
    "restartableEntryPoints": "master"   // "master" or "all"
  }
}
```

## System Requirements

### Instance Type Selection

```json
{
  "systemRequirements": {
    "main": {
      "instanceType": "mem2_ssd1_v2_x8"
    },
    "process": {
      "instanceType": "mem3_ssd1_v2_x16"
    }
  }
}
```

**Common instance types**:

- `mem1_ssd1_v2_x2` - 2 cores, 3.9 GB RAM
- `mem1_ssd1_v2_x4` - 4 cores, 7.8 GB RAM
- `mem2_ssd1_v2_x4` - 4 cores, 15.6 GB RAM
- `mem2_ssd1_v2_x8` - 8 cores, 31.2 GB RAM
- `mem3_ssd1_v2_x8` - 8 cores, 62.5 GB RAM
- `mem3_ssd1_v2_x16` - 16 cores, 125 GB RAM

### Cluster Specifications

For distributed computing:

```json
{
  "systemRequirements": {
    "main": {
      "clusterSpec": {
        "type": "spark",
        "version": "3.1.2",
        "initialInstanceCount": 3,
        "instanceType": "mem1_ssd1_v2_x4",
        "bootstrapScript": "bootstrap.sh"
      }
    }
  }
}
```

## Regional Options

Deploy apps across regions:

```json
{
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {"instanceType": "mem2_ssd1_v2_x4"}
      },
      "assetDepends": [
        {"id": "record-xxxx"}
      ]
    },
    "azure:westus": {
      "systemRequirements": {
        "*": {"instanceType": "azure:mem2_ssd1_x4"}
      }
    }
  }
}
```

## Dependency Management

### System Packages (execDepends)

Install Ubuntu packages at runtime:

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "samtools"},
      {"name": "bwa"},
      {"name": "python3-pip"},
      {"name": "r-base", "version": "4.0.0"}
    ]
  }
}
```

Packages are installed with `apt-get` from the Ubuntu repositories.

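
Because a package name does not always match the executable it installs, it can help to confirm at the start of a job that the expected tools are actually on `PATH`. The following is a minimal sketch using only the Python standard library; the helper names `missing_tools` and `require_tools` are illustrative assumptions and not part of any DNAnexus API.

```python
import shutil


def missing_tools(executables):
    """Return the executables from the list that cannot be found on PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


def require_tools(executables):
    """Raise a clear error early if any expected executable is absent."""
    missing = missing_tools(executables)
    if missing:
        raise RuntimeError("Missing executables: " + ", ".join(missing))
```

Calling `require_tools(["samtools", "bwa"])` at the top of the entry point would fail fast with a readable message, rather than partway through the analysis.
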
### Python Dependencies

#### Option 1: Install via pip in execDepends

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "python3-pip"}
    ]
  }
}
```

Then in your app script:

```python
import subprocess
import sys

# Use the running interpreter's pip so packages land in the right environment
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy==1.24.0", "pandas==2.0.0"])
```

#### Option 2: Requirements file

Create `resources/requirements.txt`:

```
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
```

In your app (the contents of `resources/` are unpacked into the root directory of the worker, so the file is at `/requirements.txt`):

```python
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "/requirements.txt"])
```

### Bundled Dependencies

Include custom tools or libraries in the app:

**File structure**:

```
my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py
```

Access resources in app:

```python
import os
import subprocess

# The contents of resources/ are unpacked into the root directory (/) of the
# worker, so resources/tools/custom_tool is available at /tools/custom_tool
tool_path = os.path.join("/", "tools", "custom_tool")

# Run bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])
```

### Asset Dependencies

Assets are pre-built bundles of dependencies that can be shared across apps.

#### Using Assets

```json
{
  "runSpec": {
    "assetDepends": [
      {
        "name": "bwa-asset",
        "id": {"$dnanexus_link": "record-xxxx"}
      }
    ]
  }
}
```

Asset contents are unpacked into the root filesystem of the worker before the app starts, so bundled files appear at the paths they were given when the asset was built:

```python
import shutil

# Assumes the asset placed bwa in a directory on PATH, such as /usr/local/bin
bwa_path = shutil.which("bwa")
```

#### Creating Assets

Create an asset directory. `dx build_asset` expects it to contain a `dxasset.json` file describing the asset, and files placed under `resources/` are unpacked at `/` on the worker:

```bash
mkdir bwa-asset
cd bwa-asset
# Add a dxasset.json describing the asset here

# Install software under resources/
./configure --prefix=$PWD/resources/usr/local
make && make install
cd ..
```

Build asset:

```bash
dx build_asset bwa-asset --destination=project-xxxx:/assets/
```

## Docker Integration

### Using Docker Images

```json
{
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}
```

Use Docker in app:

```python
import os
import subprocess

# Pull Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])

# Run command in container, mounting the working directory at /data
subprocess.check_call([
    "docker", "run",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
```

### Docker as Base Image

For apps that run entirely in Docker:

```json
{
  "runSpec": {
    "interpreter": "bash",
    "file": "src/wrapper.sh",
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}
```

## Access Requirements

Request special permissions:

```json
{
  "access": {
    "network": ["*"],                // Internet access
    "project": "CONTRIBUTE",         // Project write access
    "allProjects": "VIEW",           // Read other projects
    "developer": true                // Advanced permissions
  }
}
```

**Network access**:

- `["*"]` - Full internet
- `["github.com", "pypi.org"]` - Specific domains

## Timeout Configuration

```json
{
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "days": 1,
        "hours": 12,
        "minutes": 30
      }
    }
  }
}
```

## Example: Complete dxapp.json

```json
{
  "name": "rna-seq-pipeline",
  "title": "RNA-Seq Analysis Pipeline",
  "summary": "Aligns RNA-seq reads and quantifies gene expression",
  "description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
  "version": "1.0.0",
  "dxapi": "1.0.0",
  "categories": ["Read Mapping", "RNA-Seq"],
  "inputSpec": [
    {
      "name": "reads",
      "label": "FASTQ reads",
      "class": "array:file",
      "patterns": ["*.fastq.gz", "*.fq.gz"],
      "help": "Single-end or paired-end RNA-seq reads"
    },
    {
      "name": "reference_genome",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh38",
          "project": "project-reference",
          "path": "/genomes/GRCh38.fa"
        }
      ]
    },
    {
      "name": "gtf_file",
      "label": "Gene annotation (GTF)",
      "class": "file",
      "patterns": ["*.gtf", "*.gtf.gz"]
    }
  ],
  "outputSpec": [
    {
      "name": "aligned_bam",
      "label": "Aligned reads (BAM)",
      "class": "file",
      "patterns": ["*.bam"]
    },
    {
      "name": "counts",
      "label": "Gene counts",
      "class": "file",
      "patterns": ["*.counts.txt"]
    },
    {
      "name": "qc_report",
      "label": "QC report",
      "class": "file",
      "patterns": ["*.html"]
    }
  ],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/rna-seq-pipeline.py",
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": [
      {"name": "python3-pip"},
      {"name": "samtools"},
      {"name": "subread"}
    ],
    "assetDepends": [
      {
        "name": "star-aligner",
        "id": {"$dnanexus_link": "record-star-asset"}
      }
    ],
    "systemRequirements": {
      "main": {
        "instanceType": "mem3_ssd1_v2_x16"
      }
    },
    "timeoutPolicy": {
      "*": {"hours": 8}
    }
  },
  "access": {
    "network": ["*"]
  },
  "details": {
    "contactEmail": "support@example.com",
    "upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
    "citations": ["doi:10.1093/bioinformatics/bts635"]
  }
}
```

## Best Practices

1. **Version Management**: Use semantic versioning for apps
2. **Instance Type**: Start with smaller instances, scale up as needed
3. **Dependencies**: Document all dependencies clearly
4. **Error Messages**: Provide helpful error messages for invalid inputs
5. **Testing**: Test with various input types and sizes
6. **Documentation**: Write clear descriptions and help text
7. **Resources**: Bundle frequently used tools to avoid repeated downloads
8. **Docker**: Use Docker for complex dependency chains
9. **Assets**: Create assets for heavy dependencies shared across apps
10. **Timeouts**: Set reasonable timeouts based on expected runtime
11. **Network Access**: Request only necessary network permissions
12. **Region Support**: Use `regionalOptions` for multi-region apps

## Common Patterns

### Bioinformatics Tool

```json
{
  "inputSpec": [
    {"name": "input_file", "class": "file", "patterns": ["*.bam"]},
    {"name": "threads", "class": "int", "default": 4, "optional": true}
  ],
  "runSpec": {
    "execDepends": [{"name": "tool-name"}],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  }
}
```

### Python Data Analysis

```json
{
  "runSpec": {
    "interpreter": "python3",
    "execDepends": [
      {"name": "python3-pip"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x4"}
    }
  }
}
```

### Docker-based App

```json
{
  "runSpec": {
    "interpreter": "bash",
    "execDepends": [
      {"name": "docker.io"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  },
  "access": {
    "network": ["*"]
  }
}
```
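
Whichever pattern is used, a configuration can be sanity-checked locally before it is built. The sketch below checks only what this guide lists: the four required metadata fields and the input/output classes named under Input Classes. The function names `check_dxapp` and `check_dxapp_file` are hypothetical, the class set is an assumption taken from this guide rather than an exhaustive platform list, and the file is assumed to be plain JSON without the `//` annotations used in the examples here.

```python
import json

# Required metadata fields and parameter classes as listed in this guide.
REQUIRED_FIELDS = ("name", "title", "summary", "dxapi")
BASE_CLASSES = {"file", "record", "applet", "string",
                "int", "float", "boolean", "hash"}


def check_dxapp(config):
    """Return a list of problems found in a parsed dxapp.json dictionary."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if field not in config]
    for section in ("inputSpec", "outputSpec"):
        for i, param in enumerate(config.get(section, [])):
            label = param.get("name", f"{section}[{i}]")
            if "name" not in param:
                problems.append(f"{label}: missing name")
            cls = param.get("class")
            if cls is None:
                problems.append(f"{label}: missing class")
                continue
            # "array:file" and similar are checked against their element class.
            base = cls[len("array:"):] if cls.startswith("array:") else cls
            if base not in BASE_CLASSES:
                problems.append(f"{label}: unknown class {cls!r}")
    return problems


def check_dxapp_file(path="dxapp.json"):
    """Load a dxapp.json file (plain JSON, no comments) and check it."""
    with open(path) as handle:
        return check_dxapp(json.load(handle))
```

An empty list from `check_dxapp_file()` means none of these specific problems were found; it does not guarantee that the platform will accept the app.
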