2025-11-30 08:30:10 +08:00

DNAnexus App Configuration and Dependencies

Overview

This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.

dxapp.json Structure

The dxapp.json file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.

Minimal Example

{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}

Metadata Fields

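Required Fields

The // comments in the examples below are annotations only. Standard JSON has no comment syntax, so leave them out of a real dxapp.json.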

{
  "name": "my-app",           // Unique identifier (lowercase, numbers, hyphens, underscores)
  "title": "My App",          // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"           // API version
}
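A quick pre-build check can catch missing metadata before upload. The sketch below is a hypothetical helper (`missing_required_fields` is not part of dx-toolkit) that reports which of the required fields named in this guide are absent from a parsed dxapp.json.

```python
import json

# Required fields as listed in this guide; "version" is additionally required for apps.
REQUIRED_FIELDS = ("name", "title", "summary", "dxapi")

def missing_required_fields(config, is_app=False):
    """Return the required top-level keys that are absent from a parsed dxapp.json."""
    required = REQUIRED_FIELDS + (("version",) if is_app else ())
    return [key for key in required if key not in config]

config = json.loads('{"name": "my-app", "title": "My App", "dxapi": "1.0.0"}')
print(missing_required_fields(config))               # ['summary']
print(missing_required_fields(config, is_app=True))  # ['summary', 'version']
```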

Optional Metadata

{
  "version": "1.0.0",        // Semantic version (required for apps)
  "description": "Extended description...",
  "developerNotes": "Implementation notes...",
  "categories": [            // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {               // Arbitrary metadata
    "contactEmail": "dev@example.com",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}

Input Specification

Define input parameters:

{
  "inputSpec": [
    {
      "name": "reads",
      "label": "Input reads",
      "class": "file",
      "patterns": ["*.fastq", "*.fastq.gz"],
      "optional": false,
      "help": "FASTQ file containing sequencing reads"
    },
    {
      "name": "quality_threshold",
      "label": "Quality threshold",
      "class": "int",
      "default": 30,
      "optional": true,
      "help": "Minimum base quality score"
    },
    {
      "name": "reference",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh37 (b37)",
          "project": "project-xxxx",
          "path": "/references/human_g1k_v37.fasta"
        }
      ]
    }
  ]
}
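The patterns field holds shell-style globs. A script that wants to double-check a file name against the same globs can do so with the standard library, as in this minimal sketch. It assumes ordinary glob semantics, and `matches_patterns` is a hypothetical helper name.

```python
from fnmatch import fnmatch

def matches_patterns(filename, patterns):
    """Return True if the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)

patterns = ["*.fastq", "*.fastq.gz"]
print(matches_patterns("sample1.fastq.gz", patterns))  # True
print(matches_patterns("sample1.bam", patterns))       # False
```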

Input Classes

  • file - File object
  • record - Record object
  • applet - Applet reference
  • string - Text string
  • int - Integer number
  • float - Floating point number
  • boolean - True/false
  • hash - Key-value mapping
  • array:class - Array of specified class

Input Options

  • name - Parameter name (required)
  • class - Data type (required)
  • optional - Whether parameter is optional (default: false)
  • default - Default value for optional parameters
  • label - Display name in UI
  • help - Description text
  • patterns - File name patterns (for files)
  • suggestions - Pre-defined reference data
  • choices - Allowed values (for strings/numbers)
  • group - UI grouping
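The interaction of optional, default, and choices can be illustrated with a small local check. This is only a sketch of the rules described above, not the platform's own validation. `resolve_inputs` and the `mode` parameter are hypothetical, and choices is assumed to be a plain list of values.

```python
def resolve_inputs(input_spec, provided):
    """Apply defaults and check required parameters and choices for a set of inputs."""
    resolved = {}
    for param in input_spec:
        name = param["name"]
        if name in provided:
            value = provided[name]
        elif "default" in param:
            value = param["default"]
        elif param.get("optional", False):
            continue  # optional with no default: leave unset
        else:
            raise ValueError(f"missing required input: {name}")
        if "choices" in param and value not in param["choices"]:
            raise ValueError(f"{name}={value!r} is not one of {param['choices']}")
        resolved[name] = value
    return resolved

spec = [
    {"name": "reads", "class": "file"},
    {"name": "quality_threshold", "class": "int", "default": 30, "optional": True},
    {"name": "mode", "class": "string", "choices": ["fast", "sensitive"], "optional": True},
]
print(resolve_inputs(spec, {"reads": {"$dnanexus_link": "file-xxxx"}}))
```

Here quality_threshold falls back to its default, while mode stays unset because it is optional and has no default.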

Output Specification

Define output parameters:

{
  "outputSpec": [
    {
      "name": "aligned_reads",
      "label": "Aligned reads",
      "class": "file",
      "patterns": ["*.bam"],
      "help": "BAM file with aligned reads"
    },
    {
      "name": "mapping_stats",
      "label": "Mapping statistics",
      "class": "record",
      "help": "Record containing alignment statistics"
    }
  ]
}
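Before a job returns, it can be useful to compare the output dictionary against the outputSpec. The following sketch assumes outputs are collected in a plain dict keyed by output name. `check_outputs` is a hypothetical helper, and the `log` key is an invented example of an undeclared output.

```python
def check_outputs(output_spec, outputs):
    """Return (missing, unexpected) output names relative to the outputSpec."""
    declared = {param["name"] for param in output_spec}
    required = {param["name"] for param in output_spec if not param.get("optional", False)}
    missing = sorted(required - set(outputs))
    unexpected = sorted(set(outputs) - declared)
    return missing, unexpected

output_spec = [
    {"name": "aligned_reads", "class": "file"},
    {"name": "mapping_stats", "class": "record"},
]
outputs = {"aligned_reads": {"$dnanexus_link": "file-xxxx"}, "log": "done"}
print(check_outputs(output_spec, outputs))  # (['mapping_stats'], ['log'])
```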

Run Specification

Define how the app executes:

{
  "runSpec": {
    "interpreter": "python3",        // or "bash"
    "file": "src/my-app.py",         // Entry point script
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0",                   // Distribution version
    "execDepends": [                  // System packages
      {"name": "samtools"},
      {"name": "bwa"}
    ],
    "bundledDepends": [              // Bundled resources
      {"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
    ],
    "assetDepends": [                // Asset dependencies
      {"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
    ],
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "headJobOnDemand": true,
    "restartableEntryPoints": "all"   // "master" (default) or "all"
  }
}

System Requirements

Instance Type Selection

{
  "systemRequirements": {
    "main": {
      "instanceType": "mem2_ssd1_v2_x8"
    },
    "process": {
      "instanceType": "mem3_ssd1_v2_x16"
    }
  }
}
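Keys in systemRequirements are entry point names, with "*" acting as a wildcard for entry points that are not listed. The lookup can be sketched as follows. `instance_type_for` is a hypothetical helper that mirrors this rule locally and is not a dx-toolkit function.

```python
def instance_type_for(system_requirements, entry_point):
    """Look up the instance type for an entry point, falling back to the "*" wildcard."""
    for key in (entry_point, "*"):
        requirement = system_requirements.get(key, {})
        if "instanceType" in requirement:
            return requirement["instanceType"]
    return None  # nothing was requested for this entry point

requirements = {
    "main": {"instanceType": "mem2_ssd1_v2_x8"},
    "*": {"instanceType": "mem2_ssd1_v2_x4"},
}
print(instance_type_for(requirements, "main"))     # mem2_ssd1_v2_x8
print(instance_type_for(requirements, "process"))  # mem2_ssd1_v2_x4
```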

Common instance types (memory figures are approximate):

  • mem1_ssd1_v2_x2 - 2 cores, 3.9 GB RAM
  • mem1_ssd1_v2_x4 - 4 cores, 7.8 GB RAM
  • mem2_ssd1_v2_x4 - 4 cores, 15.6 GB RAM
  • mem2_ssd1_v2_x8 - 8 cores, 31.2 GB RAM
  • mem3_ssd1_v2_x8 - 8 cores, 62.5 GB RAM
  • mem3_ssd1_v2_x16 - 16 cores, 125 GB RAM
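To start small and scale up only when needed, one option is to pick the smallest listed type that satisfies a job's core and memory needs. The sketch below hard-codes the figures from the list above. `smallest_fitting` is a hypothetical helper, and core count followed by memory is used only as a rough proxy for cost.

```python
# (name, cores, memory in GB) copied from the list above
INSTANCE_TYPES = [
    ("mem1_ssd1_v2_x2", 2, 3.9),
    ("mem1_ssd1_v2_x4", 4, 7.8),
    ("mem2_ssd1_v2_x4", 4, 15.6),
    ("mem2_ssd1_v2_x8", 8, 31.2),
    ("mem3_ssd1_v2_x8", 8, 62.5),
    ("mem3_ssd1_v2_x16", 16, 125.0),
]

def smallest_fitting(cores_needed, mem_gb_needed):
    """Return the smallest listed instance type meeting both requirements, or None."""
    candidates = [
        entry for entry in INSTANCE_TYPES
        if entry[1] >= cores_needed and entry[2] >= mem_gb_needed
    ]
    # Prefer fewer cores, then less memory
    candidates.sort(key=lambda entry: (entry[1], entry[2]))
    return candidates[0][0] if candidates else None

print(smallest_fitting(4, 10))  # mem2_ssd1_v2_x4
print(smallest_fitting(8, 50))  # mem3_ssd1_v2_x8
```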

Cluster Specifications

For distributed computing:

{
  "systemRequirements": {
    "main": {
      "instanceType": "mem1_ssd1_v2_x4",
      "clusterSpec": {
        "type": "dxspark",
        "version": "3.1.2",
        "initialInstanceCount": 3,
        "bootstrapScript": "bootstrap.sh"
      }
    }
  }
}

Regional Options

Deploy apps across regions:

{
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {"instanceType": "mem2_ssd1_v2_x4"}
      },
      "assetDepends": [
        {"id": "record-xxxx"}
      ]
    },
    "azure:westus": {
      "systemRequirements": {
        "*": {"instanceType": "azure:mem2_ssd1_x4"}
      }
    }
  }
}

Dependency Management

System Packages (execDepends)

Install Ubuntu packages at runtime:

{
  "runSpec": {
    "execDepends": [
      {"name": "samtools"},
      {"name": "bwa"},
      {"name": "python3-pip"},
      {"name": "r-base", "version": "4.0.0"}
    ]
  }
}

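By default, packages are installed with apt-get from the Ubuntu repositories. An entry can set "package_manager" (for example "pip") to install through a different package manager.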
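To see what an execDepends list amounts to, the entries can be translated into an apt-get command line. This is an illustrative sketch only: `apt_install_command` is a hypothetical helper, the platform's actual invocation may differ, and entries that name another package_manager are skipped.

```python
def apt_install_command(exec_depends):
    """Build an apt-get command line for execDepends entries that use apt."""
    specs = []
    for dep in exec_depends:
        if dep.get("package_manager", "apt") != "apt":
            continue  # entries for pip, gem, etc. are not apt packages
        spec = dep["name"]
        if "version" in dep:
            spec += "=" + dep["version"]  # apt's name=version pinning syntax
        specs.append(spec)
    return ["apt-get", "install", "--yes"] + specs

deps = [
    {"name": "samtools"},
    {"name": "r-base", "version": "4.0.0"},
    {"name": "numpy", "package_manager": "pip"},
]
print(" ".join(apt_install_command(deps)))
# apt-get install --yes samtools r-base=4.0.0
```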

Python Dependencies

Option 1: Install pip via execDepends, then install packages at runtime

{
  "runSpec": {
    "execDepends": [
      {"name": "python3-pip"}
    ]
  }
}

Then in your app script:

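import subprocess
import sys
# Use the running interpreter's pip so packages land in the environment that executes this script
subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy==1.24.0", "pandas==2.0.0"])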

Option 2: Requirements file

Create resources/requirements.txt:

numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0

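In your app (files under resources/ are unpacked at the root of the worker's filesystem, so the file is available as /requirements.txt):

import subprocess
subprocess.check_call(["python3", "-m", "pip", "install", "-r", "/requirements.txt"])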

Bundled Dependencies

Include custom tools or libraries in the app:

File structure:

my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py

Access resources in app:

import os
import subprocess

# The contents of resources/ are unpacked at the root of the worker's filesystem,
# so resources/tools/custom_tool is available as /tools/custom_tool
tool_path = os.path.join("/", "tools", "custom_tool")

# Run bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])

Asset Dependencies

Assets are pre-built bundles of dependencies that can be shared across apps.

Using Assets

{
  "runSpec": {
    "assetDepends": [
      {
        "name": "bwa-asset",
        "id": {"$dnanexus_link": "record-xxxx"}
      }
    ]
  }
}

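When the job starts, the asset's files are unpacked onto the worker's filesystem at the paths where they were installed when the asset was built. A tool installed under /usr/local/bin is therefore on the PATH:

import shutil

# Assumes the asset installed bwa into a directory on PATH; returns None if it is not found
bwa_path = shutil.which("bwa")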

Creating Assets

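Create an asset directory. dx build_asset expects it to contain a dxasset.json file describing the asset (name, title, description, version, distribution, release, and any execDepends):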

mkdir bwa-asset
cd bwa-asset
# Install software
./configure --prefix=$PWD/usr/local
make && make install

Build asset:

dx build_asset bwa-asset --destination=project-xxxx:/assets/

Docker Integration

Using Docker Images

{
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}

Use Docker in app:

import os
import subprocess

# Pull Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])

# Run command in container
subprocess.check_call([
    "docker", "run",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
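When several containerized steps share the same mount layout, building the argument list in one place keeps the calls consistent. The sketch below only constructs the list and does not run Docker. `docker_run_args` is a hypothetical helper, and the /home/dnanexus working directory in the example is an assumption.

```python
import os

def docker_run_args(image, command, workdir=None, mount_point="/data"):
    """Build the argument list for `docker run` with a directory bind-mounted."""
    workdir = workdir or os.getcwd()
    # --rm removes the container once the command exits
    return ["docker", "run", "--rm", "-v", f"{workdir}:{mount_point}", image] + list(command)

args = docker_run_args(
    "biocontainers/samtools:v1.9",
    ["samtools", "view", "/data/input.bam"],
    workdir="/home/dnanexus",
)
print(args)
```

The resulting list can be passed directly to subprocess.check_call.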

Docker as Base Image

For apps that run entirely in Docker:

{
  "runSpec": {
    "interpreter": "bash",
    "file": "src/wrapper.sh",
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}

Access Requirements

Request special permissions:

{
  "access": {
    "network": ["*"],           // Internet access
    "project": "CONTRIBUTE",    // Project write access
    "allProjects": "VIEW",      // Read other projects
    "developer": true           // Advanced permissions
  }
}

Network access:

  • ["*"] - Full internet
  • ["github.com", "pypi.org"] - Specific domains

Timeout Configuration

{
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "days": 1,
        "hours": 12,
        "minutes": 30
      }
    }
  }
}
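The days, hours, and minutes in a timeoutPolicy entry add up to a single duration. A minimal sketch of that arithmetic follows. `timeout_for` is a hypothetical helper, and falling back to the "*" key is assumed to work as it does for systemRequirements.

```python
from datetime import timedelta

def timeout_for(timeout_policy, entry_point):
    """Return the timeout for an entry point as a timedelta, or None if unset."""
    spec = timeout_policy.get(entry_point, timeout_policy.get("*"))
    if spec is None:
        return None
    return timedelta(
        days=spec.get("days", 0),
        hours=spec.get("hours", 0),
        minutes=spec.get("minutes", 0),
    )

policy = {"*": {"days": 1, "hours": 12, "minutes": 30}}
print(timeout_for(policy, "main"))                  # 1 day, 12:30:00
print(timeout_for(policy, "main").total_seconds())  # 131400.0
```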

Example: Complete dxapp.json

{
  "name": "rna-seq-pipeline",
  "title": "RNA-Seq Analysis Pipeline",
  "summary": "Aligns RNA-seq reads and quantifies gene expression",
  "description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
  "version": "1.0.0",
  "dxapi": "1.0.0",
  "categories": ["Read Mapping", "RNA-Seq"],

  "inputSpec": [
    {
      "name": "reads",
      "label": "FASTQ reads",
      "class": "array:file",
      "patterns": ["*.fastq.gz", "*.fq.gz"],
      "help": "Single-end or paired-end RNA-seq reads"
    },
    {
      "name": "reference_genome",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh38",
          "project": "project-reference",
          "path": "/genomes/GRCh38.fa"
        }
      ]
    },
    {
      "name": "gtf_file",
      "label": "Gene annotation (GTF)",
      "class": "file",
      "patterns": ["*.gtf", "*.gtf.gz"]
    }
  ],

  "outputSpec": [
    {
      "name": "aligned_bam",
      "label": "Aligned reads (BAM)",
      "class": "file",
      "patterns": ["*.bam"]
    },
    {
      "name": "counts",
      "label": "Gene counts",
      "class": "file",
      "patterns": ["*.counts.txt"]
    },
    {
      "name": "qc_report",
      "label": "QC report",
      "class": "file",
      "patterns": ["*.html"]
    }
  ],

  "runSpec": {
    "interpreter": "python3",
    "file": "src/rna-seq-pipeline.py",
    "distribution": "Ubuntu",
    "release": "24.04",

    "execDepends": [
      {"name": "python3-pip"},
      {"name": "samtools"},
      {"name": "subread"}
    ],

    "assetDepends": [
      {
        "name": "star-aligner",
        "id": {"$dnanexus_link": "record-star-asset"}
      }
    ],

    "systemRequirements": {
      "main": {
        "instanceType": "mem3_ssd1_v2_x16"
      }
    },

    "timeoutPolicy": {
      "*": {"hours": 8}
    }
  },

  "access": {
    "network": ["*"]
  },

  "details": {
    "contactEmail": "support@example.com",
    "upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
    "citations": ["doi:10.1093/bioinformatics/bts635"]
  }
}

Best Practices

  1. Version Management: Use semantic versioning for apps
  2. Instance Type: Start with smaller instances, scale up as needed
  3. Dependencies: Document all dependencies clearly
  4. Error Messages: Provide helpful error messages for invalid inputs
  5. Testing: Test with various input types and sizes
  6. Documentation: Write clear descriptions and help text
  7. Resources: Bundle frequently-used tools to avoid repeated downloads
  8. Docker: Use Docker for complex dependency chains
  9. Assets: Create assets for heavy dependencies shared across apps
  10. Timeouts: Set reasonable timeouts based on expected runtime
  11. Network Access: Request only necessary network permissions
  12. Region Support: Use regionalOptions for multi-region apps

Common Patterns

Bioinformatics Tool

{
  "inputSpec": [
    {"name": "input_file", "class": "file", "patterns": ["*.bam"]},
    {"name": "threads", "class": "int", "default": 4, "optional": true}
  ],
  "runSpec": {
    "execDepends": [{"name": "tool-name"}],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  }
}

Python Data Analysis

{
  "runSpec": {
    "interpreter": "python3",
    "execDepends": [
      {"name": "python3-pip"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x4"}
    }
  }
}

Docker-based App

{
  "runSpec": {
    "interpreter": "bash",
    "execDepends": [
      {"name": "docker.io"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  },
  "access": {
    "network": ["*"]
  }
}