# DNAnexus App Configuration and Dependencies

## Overview

This guide covers configuring apps through dxapp.json metadata and managing dependencies including system packages, Python libraries, and Docker containers.

## dxapp.json Structure

The `dxapp.json` file is the configuration file for DNAnexus apps and applets. It defines metadata, inputs, outputs, execution requirements, and dependencies.

### Minimal Example

```json
{
  "name": "my-app",
  "title": "My Analysis App",
  "summary": "Performs analysis on input files",
  "dxapi": "1.0.0",
  "version": "1.0.0",
  "inputSpec": [],
  "outputSpec": [],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```

## Metadata Fields

### Required Fields

```json
{
  "name": "my-app",           // Unique identifier (lowercase, numbers, hyphens, underscores)
  "title": "My App",          // Human-readable name
  "summary": "One line description",
  "dxapi": "1.0.0"           // API version
}
```

### Optional Metadata

```json
{
  "version": "1.0.0",        // Semantic version (required for apps)
  "description": "Extended description...",
  "developerNotes": "Implementation notes...",
  "categories": [            // For app discovery
    "Read Mapping",
    "Variation Calling"
  ],
  "details": {               // Arbitrary metadata
    "contactEmail": "dev@example.com",
    "upstreamVersion": "2.1.0",
    "citations": ["doi:10.1000/example"],
    "changelog": {
      "1.0.0": "Initial release"
    }
  }
}
```

## Input Specification

Define input parameters:

```json
{
  "inputSpec": [
    {
      "name": "reads",
      "label": "Input reads",
      "class": "file",
      "patterns": ["*.fastq", "*.fastq.gz"],
      "optional": false,
      "help": "FASTQ file containing sequencing reads"
    },
    {
      "name": "quality_threshold",
      "label": "Quality threshold",
      "class": "int",
      "default": 30,
      "optional": true,
      "help": "Minimum base quality score"
    },
    {
      "name": "reference",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh38",
          "project": "project-xxxx",
          "path": "/references/human_g1k_v37.fasta"
        }
      ]
    }
  ]
}
```

### Input Classes

- `file` - File object
- `record` - Record object
- `applet` - Applet reference
- `string` - Text string
- `int` - Integer number
- `float` - Floating point number
- `boolean` - True/false
- `hash` - Key-value mapping
- `array:class` - Array of specified class

### Input Options

- `name` - Parameter name (required)
- `class` - Data type (required)
- `optional` - Whether parameter is optional (default: false)
- `default` - Default value for optional parameters
- `label` - Display name in UI
- `help` - Description text
- `patterns` - File name patterns (for files)
- `suggestions` - Pre-defined reference data
- `choices` - Allowed values (for strings/numbers)
- `group` - UI grouping

## Output Specification

Define output parameters:

```json
{
  "outputSpec": [
    {
      "name": "aligned_reads",
      "label": "Aligned reads",
      "class": "file",
      "patterns": ["*.bam"],
      "help": "BAM file with aligned reads"
    },
    {
      "name": "mapping_stats",
      "label": "Mapping statistics",
      "class": "record",
      "help": "Record containing alignment statistics"
    }
  ]
}
```

## Run Specification

Define how the app executes:

```json
{
  "runSpec": {
    "interpreter": "python3",        // or "bash"
    "file": "src/my-app.py",         // Entry point script
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0",                   // Distribution version
    "execDepends": [                  // System packages
      {"name": "samtools"},
      {"name": "bwa"}
    ],
    "bundledDepends": [              // Bundled resources
      {"name": "scripts.tar.gz", "id": {"$dnanexus_link": "file-xxxx"}}
    ],
    "assetDepends": [                // Asset dependencies
      {"name": "asset-name", "id": {"$dnanexus_link": "record-xxxx"}}
    ],
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "headJobOnDemand": true,
    "restartableEntryPoints": ["main"]
  }
}
```

## System Requirements

### Instance Type Selection

```json
{
  "systemRequirements": {
    "main": {
      "instanceType": "mem2_ssd1_v2_x8"
    },
    "process": {
      "instanceType": "mem3_ssd1_v2_x16"
    }
  }
}
```

**Common instance types**:
- `mem1_ssd1_v2_x2` - 2 cores, 3.9 GB RAM
- `mem1_ssd1_v2_x4` - 4 cores, 7.8 GB RAM
- `mem2_ssd1_v2_x4` - 4 cores, 15.6 GB RAM
- `mem2_ssd1_v2_x8` - 8 cores, 31.2 GB RAM
- `mem3_ssd1_v2_x8` - 8 cores, 62.5 GB RAM
- `mem3_ssd1_v2_x16` - 16 cores, 125 GB RAM

### Cluster Specifications

For distributed computing:

```json
{
  "systemRequirements": {
    "main": {
      "clusterSpec": {
        "type": "spark",
        "version": "3.1.2",
        "initialInstanceCount": 3,
        "instanceType": "mem1_ssd1_v2_x4",
        "bootstrapScript": "bootstrap.sh"
      }
    }
  }
}
```

## Regional Options

Deploy apps across regions:

```json
{
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {
        "*": {"instanceType": "mem2_ssd1_v2_x4"}
      },
      "assetDepends": [
        {"id": "record-xxxx"}
      ]
    },
    "azure:westus": {
      "systemRequirements": {
        "*": {"instanceType": "azure:mem2_ssd1_x4"}
      }
    }
  }
}
```

## Dependency Management

### System Packages (execDepends)

Install Ubuntu packages at runtime:

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "samtools"},
      {"name": "bwa"},
      {"name": "python3-pip"},
      {"name": "r-base", "version": "4.0.0"}
    ]
  }
}
```

Packages are installed using `apt-get` from Ubuntu repositories.

### Python Dependencies

#### Option 1: Install via pip in execDepends

```json
{
  "runSpec": {
    "execDepends": [
      {"name": "python3-pip"}
    ]
  }
}
```

Then in your app script:
```python
import subprocess
subprocess.check_call(["pip", "install", "numpy==1.24.0", "pandas==2.0.0"])
```

#### Option 2: Requirements file

Create `resources/requirements.txt`:
```
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
```

In your app:
```python
subprocess.check_call(["pip", "install", "-r", "requirements.txt"])
```

### Bundled Dependencies

Include custom tools or libraries in the app:

**File structure**:
```
my-app/
├── dxapp.json
├── src/
│   └── my-app.py
└── resources/
    ├── tools/
    │   └── custom_tool
    └── scripts/
        └── helper.py
```

Access resources in app:
```python
import os

# Resources are in parent directory
resources_dir = os.path.join(os.path.dirname(__file__), "..", "resources")
tool_path = os.path.join(resources_dir, "tools", "custom_tool")

# Run bundled tool
subprocess.check_call([tool_path, "arg1", "arg2"])
```

### Asset Dependencies

Assets are pre-built bundles of dependencies that can be shared across apps.

#### Using Assets

```json
{
  "runSpec": {
    "assetDepends": [
      {
        "name": "bwa-asset",
        "id": {"$dnanexus_link": "record-xxxx"}
      }
    ]
  }
}
```

Assets are mounted at runtime and accessible via environment variable:
```python
import os
asset_dir = os.environ.get("DX_ASSET_BWA")
bwa_path = os.path.join(asset_dir, "bin", "bwa")
```

#### Creating Assets

Create asset directory:
```bash
mkdir bwa-asset
cd bwa-asset
# Install software
./configure --prefix=$PWD/usr/local
make && make install
```

Build asset:
```bash
dx build_asset bwa-asset --destination=project-xxxx:/assets/
```

## Docker Integration

### Using Docker Images

```json
{
  "runSpec": {
    "interpreter": "python3",
    "file": "src/my-app.py",
    "distribution": "Ubuntu",
    "release": "24.04",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}
```

Use Docker in app:
```python
import subprocess

# Pull Docker image
subprocess.check_call(["docker", "pull", "biocontainers/samtools:v1.9"])

# Run command in container
subprocess.check_call([
    "docker", "run",
    "-v", f"{os.getcwd()}:/data",
    "biocontainers/samtools:v1.9",
    "samtools", "view", "/data/input.bam"
])
```

### Docker as Base Image

For apps that run entirely in Docker:

```json
{
  "runSpec": {
    "interpreter": "bash",
    "file": "src/wrapper.sh",
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": [
      {"name": "docker.io"}
    ]
  }
}
```

## Access Requirements

Request special permissions:

```json
{
  "access": {
    "network": ["*"],           // Internet access
    "project": "CONTRIBUTE",    // Project write access
    "allProjects": "VIEW",      // Read other projects
    "developer": true           // Advanced permissions
  }
}
```

**Network access**:
- `["*"]` - Full internet
- `["github.com", "pypi.org"]` - Specific domains

## Timeout Configuration

```json
{
  "runSpec": {
    "timeoutPolicy": {
      "*": {
        "days": 1,
        "hours": 12,
        "minutes": 30
      }
    }
  }
}
```

## Example: Complete dxapp.json

```json
{
  "name": "rna-seq-pipeline",
  "title": "RNA-Seq Analysis Pipeline",
  "summary": "Aligns RNA-seq reads and quantifies gene expression",
  "description": "Comprehensive RNA-seq pipeline using STAR aligner and featureCounts",
  "version": "1.0.0",
  "dxapi": "1.0.0",
  "categories": ["Read Mapping", "RNA-Seq"],

  "inputSpec": [
    {
      "name": "reads",
      "label": "FASTQ reads",
      "class": "array:file",
      "patterns": ["*.fastq.gz", "*.fq.gz"],
      "help": "Single-end or paired-end RNA-seq reads"
    },
    {
      "name": "reference_genome",
      "label": "Reference genome",
      "class": "file",
      "patterns": ["*.fa", "*.fasta"],
      "suggestions": [
        {
          "name": "Human GRCh38",
          "project": "project-reference",
          "path": "/genomes/GRCh38.fa"
        }
      ]
    },
    {
      "name": "gtf_file",
      "label": "Gene annotation (GTF)",
      "class": "file",
      "patterns": ["*.gtf", "*.gtf.gz"]
    }
  ],

  "outputSpec": [
    {
      "name": "aligned_bam",
      "label": "Aligned reads (BAM)",
      "class": "file",
      "patterns": ["*.bam"]
    },
    {
      "name": "counts",
      "label": "Gene counts",
      "class": "file",
      "patterns": ["*.counts.txt"]
    },
    {
      "name": "qc_report",
      "label": "QC report",
      "class": "file",
      "patterns": ["*.html"]
    }
  ],

  "runSpec": {
    "interpreter": "python3",
    "file": "src/rna-seq-pipeline.py",
    "distribution": "Ubuntu",
    "release": "24.04",

    "execDepends": [
      {"name": "python3-pip"},
      {"name": "samtools"},
      {"name": "subread"}
    ],

    "assetDepends": [
      {
        "name": "star-aligner",
        "id": {"$dnanexus_link": "record-star-asset"}
      }
    ],

    "systemRequirements": {
      "main": {
        "instanceType": "mem3_ssd1_v2_x16"
      }
    },

    "timeoutPolicy": {
      "*": {"hours": 8}
    }
  },

  "access": {
    "network": ["*"]
  },

  "details": {
    "contactEmail": "support@example.com",
    "upstreamVersion": "STAR 2.7.10a, Subread 2.0.3",
    "citations": ["doi:10.1093/bioinformatics/bts635"]
  }
}
```

## Best Practices

1. **Version Management**: Use semantic versioning for apps
2. **Instance Type**: Start with smaller instances, scale up as needed
3. **Dependencies**: Document all dependencies clearly
4. **Error Messages**: Provide helpful error messages for invalid inputs
5. **Testing**: Test with various input types and sizes
6. **Documentation**: Write clear descriptions and help text
7. **Resources**: Bundle frequently-used tools to avoid repeated downloads
8. **Docker**: Use Docker for complex dependency chains
9. **Assets**: Create assets for heavy dependencies shared across apps
10. **Timeouts**: Set reasonable timeouts based on expected runtime
11. **Network Access**: Request only necessary network permissions
12. **Region Support**: Use regionalOptions for multi-region apps

## Common Patterns

### Bioinformatics Tool

```json
{
  "inputSpec": [
    {"name": "input_file", "class": "file", "patterns": ["*.bam"]},
    {"name": "threads", "class": "int", "default": 4, "optional": true}
  ],
  "runSpec": {
    "execDepends": [{"name": "tool-name"}],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  }
}
```

### Python Data Analysis

```json
{
  "runSpec": {
    "interpreter": "python3",
    "execDepends": [
      {"name": "python3-pip"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x4"}
    }
  }
}
```

### Docker-based App

```json
{
  "runSpec": {
    "interpreter": "bash",
    "execDepends": [
      {"name": "docker.io"}
    ],
    "systemRequirements": {
      "main": {"instanceType": "mem2_ssd1_v2_x8"}
    }
  },
  "access": {
    "network": ["*"]
  }
}
```