DNAnexus App Development

Overview

Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.

Applets vs Apps

  • Applets: Data objects that live inside projects. Good for development and testing.
  • Apps: Versioned, shareable executables that don't live inside projects. Can be published for others to use.

Both are created identically until the final build step. Applets can be converted to apps later.

Creating an App/Applet

Using dx-app-wizard

Generate a skeleton app directory structure:

dx-app-wizard

This creates:

  • dxapp.json - Configuration file
  • src/ - Source code directory
  • resources/ - Bundled dependencies
  • test/ - Test files
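
As a rough picture of what a generated dxapp.json holds, the sketch below builds a minimal configuration as a Python dict and serializes it. The app name, the input and output field names, and the release value are illustrative assumptions, not the wizard's literal output.

```python
import json

# Illustrative values only; dx-app-wizard fills these in from your answers.
dxapp = {
    "name": "my-app",
    "title": "My App",
    "summary": "Trims reads",
    "dxapi": "1.0.0",
    "version": "0.0.1",
    "inputSpec": [
        {"name": "reads_file", "class": "file", "optional": False},
    ],
    "outputSpec": [
        {"name": "trimmed_reads", "class": "file"},
    ],
    "runSpec": {
        "interpreter": "python3",
        "file": "src/my-app.py",
        "distribution": "Ubuntu",
        "release": "24.04",
        "version": "0",
    },
}

# Serialize in the same shape the file on disk would have.
dxapp_json = json.dumps(dxapp, indent=2)
print(dxapp_json)
```

The names in inputSpec become the arguments of the main entry point, and the names in outputSpec are the keys the entry point is expected to return.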

Building and Deploying

Build an applet:

dx build

Build an app:

dx build --app

The build process:

  1. Validates dxapp.json configuration
  2. Bundles source code and resources
  3. Deploys to the platform
  4. Returns the applet/app ID
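
The platform performs the authoritative validation during dx build, but a quick local check can catch obvious omissions first. The sketch below is a hypothetical helper (check_dxapp is not part of the dx toolkit), and the set of keys it insists on is this sketch's assumption about what a typical app wants, not the platform's exact rule set.

```python
def check_dxapp(config):
    """Return a list of problems found in a dxapp.json-style dict (hypothetical helper)."""
    problems = []

    # Keys this sketch assumes every app should declare.
    for key in ("name", "runSpec", "inputSpec", "outputSpec"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    run_spec = config.get("runSpec", {})
    for key in ("interpreter", "file"):
        if key not in run_spec:
            problems.append(f"missing runSpec key: {key}")

    # Field names should be unique within each spec.
    for spec_name in ("inputSpec", "outputSpec"):
        names = [field.get("name", "") for field in config.get(spec_name, [])]
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"duplicate {spec_name} name: {name}")

    return problems


example = {
    "name": "my-app",
    "runSpec": {"interpreter": "python3", "file": "src/my-app.py"},
    "inputSpec": [{"name": "reads_file", "class": "file"}],
    "outputSpec": [{"name": "trimmed_reads", "class": "file"}],
}
print(check_dxapp(example))
```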

App Directory Structure

my-app/
├── dxapp.json          # Metadata and configuration
├── src/
│   ├── my-app.py       # Main executable (Python)
│   └── my-app.sh       # Or Bash script
├── resources/          # Bundled files and dependencies
│   ├── tools/
│   └── data/
└── test/               # Test data and scripts
    └── test.json

Python App Structure

Entry Points

Python apps use the @dxpy.entry_point() decorator to register the functions a job can run. The 'main' entry point is the one invoked when the app starts:

import dxpy

@dxpy.entry_point('main')
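def main(input1, input2):
    # Process inputs (placeholder logic; replace with real work)
    result1 = input1
    result2 = input2

    # Return outputs, keyed by the names declared in outputSpec
    return {
        "output1": result1,
        "output2": result2
    }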

dxpy.run()

Input/Output Handling

Inputs: DNAnexus data objects are represented as dicts containing links:

@dxpy.entry_point('main')
def main(reads_file):
    # Convert link to handler
    reads_dxfile = dxpy.DXFile(reads_file)

    # Download to local filesystem
    dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")

    # Process file...
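
The link dicts mentioned above come in two shapes: a simple form holding only an object ID, and an extended form that also names the project. The sketch below shows both using placeholder IDs. The helper extract_object_id is hypothetical and purely illustrative, since dxpy.DXFile accepts a link dict directly.

```python
def extract_object_id(link):
    """Return the object ID from a DNAnexus link dict (hypothetical helper)."""
    target = link["$dnanexus_link"]
    # Simple form: {"$dnanexus_link": "file-xxxx"}
    if isinstance(target, str):
        return target
    # Extended form: {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}
    return target["id"]


simple_link = {"$dnanexus_link": "file-xxxx"}
extended_link = {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}

print(extract_object_id(simple_link))
print(extract_object_id(extended_link))
```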

Outputs: Return primitive types directly; upload file outputs and return them as links:

    # Upload result file
    output_file = dxpy.upload_local_file("output.fastq")

    return {
        "trimmed_reads": dxpy.dxlink(output_file)
    }

Bash App Structure

Bash apps define a main function in a shell script. Each input is available as a shell variable named after the input field:

#!/bin/bash
set -e -x -o pipefail

main() {
    # Download inputs
    dx download "$reads_file" -o reads.fastq

    # Process
    process_reads reads.fastq > output.fastq

    # Upload outputs
    trimmed_reads=$(dx upload output.fastq --brief)

    # Set job output
    dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
}

Common Development Patterns

1. Bioinformatics Pipeline

Download → Process → Upload pattern:

import subprocess

import dxpy


def run_pipeline(input_file_id):
    # Download input
    dxpy.download_dxfile(input_file_id, "input.fastq")

    # Run analysis ("tool" stands in for the real command)
    subprocess.check_call(["tool", "input.fastq", "output.bam"])

    # Upload result
    output = dxpy.upload_local_file("output.bam")
    return {"aligned_reads": dxpy.dxlink(output)}

2. Multi-file Processing

# Process multiple inputs
for file_link in input_files:
    file_handler = dxpy.DXFile(file_link)
    local_path = file_handler.name
    dxpy.download_dxfile(file_handler.get_id(), local_path)
    # Process each file...
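
Because the platform allows several files to share a name, downloading each input to a path equal to its name can silently overwrite an earlier download. A minimal sketch of one way around this, assuming a hypothetical helper unique_local_path that appends a numeric suffix on collision:

```python
import os


def unique_local_path(name, used):
    """Return a filename not already in `used`, adding a numeric suffix if needed."""
    stem, ext = os.path.splitext(name)
    candidate = name
    counter = 1
    while candidate in used:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    used.add(candidate)
    return candidate


used = set()
names = ["reads.fastq", "reads.fastq", "other.fastq"]
paths = [unique_local_path(name, used) for name in names]
print(paths)  # ['reads.fastq', 'reads_1.fastq', 'other.fastq']
```

In the loop above, the returned value would replace local_path before the call to dxpy.download_dxfile.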

3. Parallel Processing

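Apps can spawn subjobs for parallel execution. Each subjob runs a named entry point of the same app (here process_item, which must be registered with @dxpy.entry_point):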

# Create subjobs
subjobs = []
for item in input_list:
    subjob = dxpy.new_dxjob(
        fn_input={"input": item},
        fn_name="process_item"
    )
    subjobs.append(subjob)

# Collect job-based object references (resolved when each subjob finishes)
results = [job.get_output_ref("result") for job in subjobs]
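
The values collected from the subjobs are references rather than finished results: each one points at a field of a job's future output and is resolved by the platform once that job completes. The sketch below builds such references by hand to show their shape. The job IDs are placeholders, and job_output_ref is a hypothetical stand-in for what get_output_ref returns.

```python
def job_output_ref(job_id, field):
    """Build a job-based object reference to one output field of a job."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


subjob_ids = ["job-aaaa", "job-bbbb"]  # placeholder IDs
results = [job_output_ref(job_id, "result") for job_id in subjob_ids]

# References can be passed along unresolved, for example as the input
# of a later gather step or as the parent job's own output.
gather_input = {"results": results}
print(gather_input)
```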

Execution Environment

Apps run in isolated Linux VMs (Ubuntu 24.04 here; the release is set by runSpec.release in dxapp.json) with:

  • Internet access (when granted through access.network in dxapp.json)
  • DNAnexus API access
  • Temporary scratch space in /home/dnanexus
  • Input files downloaded to job workspace
  • Root access for installing dependencies

Testing Apps

Local Testing

Test app logic locally before deploying:

cd my-app
python src/my-app.py
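
Local testing is easier when the processing logic lives in plain functions that never touch dxpy, so they can be exercised without platform credentials. A minimal sketch, assuming a hypothetical filter_short_reads function that the entry point would call between download and upload:

```python
def filter_short_reads(records, min_length):
    """Keep (header, sequence) pairs whose sequence has at least min_length bases."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# A tiny in-memory test case; no files or platform access required.
records = [("@r1", "ACGTACGT"), ("@r2", "ACG")]
kept = filter_short_reads(records, 5)
assert kept == [("@r1", "ACGTACGT")]
print(kept)
```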

Platform Testing

Run the applet on the platform:

dx run applet-xxxx -i input1=file-yyyy

Monitor job execution:

dx watch job-zzzz

View job logs:

dx watch job-zzzz --get-streams

Best Practices

  1. Error Handling: Use try-except blocks and provide informative error messages
  2. Logging: Print progress and debug information to stdout/stderr
  3. Validation: Validate inputs before processing
  4. Cleanup: Remove temporary files when done
  5. Documentation: Include clear descriptions in dxapp.json
  6. Testing: Test with various input types and edge cases
  7. Versioning: Use semantic versioning for apps
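
Several of these practices fit in a few lines. The sketch below combines input validation, logging, and automatic cleanup using only the standard library. The function names and the sample_name and min_length inputs are hypothetical, chosen for illustration.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("my-app")


def validate_inputs(min_length, sample_name):
    """Raise ValueError with a readable message when an input is unusable."""
    if not isinstance(min_length, int) or min_length < 1:
        raise ValueError(f"min_length must be a positive integer, got {min_length!r}")
    if not sample_name.strip():
        raise ValueError("sample_name must not be empty")


def process(sample_name, min_length):
    validate_inputs(min_length, sample_name)
    # TemporaryDirectory removes scratch files even if processing raises.
    with tempfile.TemporaryDirectory() as workdir:
        scratch = os.path.join(workdir, f"{sample_name}.tmp")
        log.info("writing scratch file %s", scratch)
        with open(scratch, "w") as handle:
            handle.write("intermediate data\n")
        size = os.path.getsize(scratch)
    log.info("done; scratch directory removed")
    return size


print(process("sample1", 3))
```

Messages sent through logging go to stderr by default, so they appear in the job log alongside anything printed to stdout.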

Common Issues

File Not Found

Ensure files have finished downloading before you open them:

dxpy.download_dxfile(file_id, local_path)
# Now safe to open local_path

Out of Memory

Specify a larger instance type under runSpec.systemRequirements in dxapp.json.
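
The instance type is keyed by entry point inside runSpec, with "*" covering all of them. A sketch of the resulting structure, written as a Python dict edit: set_instance_type is a hypothetical helper, and mem3_ssd1_v2_x8 is only an example name whose availability depends on the region.

```python
def set_instance_type(config, instance_type, entry_point="*"):
    """Set the instance type for one entry point ("*" means all) in a dxapp.json-style dict."""
    requirements = config.setdefault("runSpec", {}).setdefault("systemRequirements", {})
    requirements.setdefault(entry_point, {})["instanceType"] = instance_type
    return config


config = set_instance_type({"name": "my-app"}, "mem3_ssd1_v2_x8")
print(config["runSpec"]["systemRequirements"])
# {'*': {'instanceType': 'mem3_ssd1_v2_x8'}}
```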

Timeout

Increase the limit under runSpec.timeoutPolicy in dxapp.json, or split the work into smaller jobs.
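
The timeout is likewise keyed by entry point inside runSpec. A sketch of the structure, with a hypothetical set_timeout helper and a 12-hour value picked only for illustration:

```python
def set_timeout(config, hours, entry_point="*"):
    """Set the timeout in hours for one entry point ("*" means all)."""
    policy = config.setdefault("runSpec", {}).setdefault("timeoutPolicy", {})
    policy[entry_point] = {"hours": hours}
    return config


config = set_timeout({"name": "my-app"}, 12)
print(config["runSpec"]["timeoutPolicy"])
# {'*': {'hours': 12}}
```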

Permission Errors

Ensure the app requests the permissions it needs in the access section of dxapp.json.
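
A sketch of what such a request looks like, again as a Python dict edit. The request_access helper is hypothetical, and the values shown (all network hosts, CONTRIBUTE on the parent project) are examples; a real app should ask for the narrowest access that works.

```python
def request_access(config, network=None, project=None):
    """Add network and project permissions to a dxapp.json-style dict."""
    access = config.setdefault("access", {})
    if network is not None:
        access["network"] = list(network)
    if project is not None:
        access["project"] = project
    return config


config = request_access({"name": "my-app"}, network=["*"], project="CONTRIBUTE")
print(config["access"])
# {'network': ['*'], 'project': 'CONTRIBUTE'}
```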