248 lines
5.2 KiB
Markdown
248 lines
5.2 KiB
Markdown
# DNAnexus App Development
|
|
|
|
## Overview
|
|
|
|
Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.
|
|
|
|
## Applets vs Apps
|
|
|
|
- **Applets**: Data objects that live inside projects. Good for development and testing.
|
|
- **Apps**: Versioned, shareable executables that don't live inside projects. Can be published for others to use.
|
|
|
|
Both are created identically until the final build step. Applets can be converted to apps later.
|
|
|
|
## Creating an App/Applet
|
|
|
|
### Using dx-app-wizard
|
|
|
|
Generate a skeleton app directory structure:
|
|
|
|
```bash
|
|
dx-app-wizard
|
|
```
|
|
|
|
This creates:
|
|
- `dxapp.json` - Configuration file
|
|
- `src/` - Source code directory
|
|
- `resources/` - Bundled dependencies
|
|
- `test/` - Test files
|
|
|
|
### Building and Deploying
|
|
|
|
Build an applet:
|
|
```bash
|
|
dx build
|
|
```
|
|
|
|
Build an app:
|
|
```bash
|
|
dx build --app
|
|
```
|
|
|
|
The build process:
|
|
1. Validates dxapp.json configuration
|
|
2. Bundles source code and resources
|
|
3. Deploys to the platform
|
|
4. Returns the applet/app ID
|
|
|
|
## App Directory Structure
|
|
|
|
```
|
|
my-app/
|
|
├── dxapp.json # Metadata and configuration
|
|
├── src/
|
|
│ └── my-app.py # Main executable (Python)
|
|
│ └── my-app.sh # Or Bash script
|
|
├── resources/ # Bundled files and dependencies
|
|
│ └── tools/
|
|
│ └── data/
|
|
└── test/ # Test data and scripts
|
|
└── test.json
|
|
```
|
|
|
|
## Python App Structure
|
|
|
|
### Entry Points
|
|
|
|
Python apps use the `@dxpy.entry_point()` decorator to define functions:
|
|
|
|
```python
|
|
import dxpy
|
|
|
|
@dxpy.entry_point('main')
|
|
def main(input1, input2):
|
|
# Process inputs
|
|
# Return outputs
|
|
return {
|
|
"output1": result1,
|
|
"output2": result2
|
|
}
|
|
|
|
dxpy.run()
|
|
```
|
|
|
|
### Input/Output Handling
|
|
|
|
**Inputs**: DNAnexus data objects are represented as dicts containing links:
|
|
|
|
```python
|
|
@dxpy.entry_point('main')
|
|
def main(reads_file):
|
|
# Convert link to handler
|
|
reads_dxfile = dxpy.DXFile(reads_file)
|
|
|
|
# Download to local filesystem
|
|
dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")
|
|
|
|
# Process file...
|
|
```
|
|
|
|
**Outputs**: Return primitive types directly, convert file outputs to links:
|
|
|
|
```python
|
|
# Upload result file
|
|
output_file = dxpy.upload_local_file("output.fastq")
|
|
|
|
return {
|
|
"trimmed_reads": dxpy.dxlink(output_file)
|
|
}
|
|
```
|
|
|
|
## Bash App Structure
|
|
|
|
Bash apps use a simpler shell script approach:
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
set -e -x -o pipefail
|
|
|
|
main() {
|
|
# Download inputs
|
|
dx download "$reads_file" -o reads.fastq
|
|
|
|
# Process
|
|
process_reads reads.fastq > output.fastq
|
|
|
|
# Upload outputs
|
|
trimmed_reads=$(dx upload output.fastq --brief)
|
|
|
|
# Set job output
|
|
dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
|
|
}
|
|
```
|
|
|
|
## Common Development Patterns
|
|
|
|
### 1. Bioinformatics Pipeline
|
|
|
|
Download → Process → Upload pattern:
|
|
|
|
```python
|
|
# Download input
|
|
dxpy.download_dxfile(input_file_id, "input.fastq")
|
|
|
|
# Run analysis
|
|
subprocess.check_call(["tool", "input.fastq", "output.bam"])
|
|
|
|
# Upload result
|
|
output = dxpy.upload_local_file("output.bam")
|
|
return {"aligned_reads": dxpy.dxlink(output)}
|
|
```
|
|
|
|
### 2. Multi-file Processing
|
|
|
|
```python
|
|
# Process multiple inputs
|
|
for file_link in input_files:
|
|
file_handler = dxpy.DXFile(file_link)
|
|
local_path = f"{file_handler.name}"
|
|
dxpy.download_dxfile(file_handler.get_id(), local_path)
|
|
# Process each file...
|
|
```
|
|
|
|
### 3. Parallel Processing
|
|
|
|
Apps can spawn subjobs for parallel execution:
|
|
|
|
```python
|
|
# Create subjobs
|
|
subjobs = []
|
|
for item in input_list:
|
|
subjob = dxpy.new_dxjob(
|
|
fn_input={"input": item},
|
|
fn_name="process_item"
|
|
)
|
|
subjobs.append(subjob)
|
|
|
|
# Collect results
|
|
results = [job.get_output_ref("result") for job in subjobs]
|
|
```
|
|
|
|
## Execution Environment
|
|
|
|
Apps run in isolated Linux VMs (Ubuntu 24.04) with:
|
|
- Internet access
|
|
- DNAnexus API access
|
|
- Temporary scratch space in `/home/dnanexus`
|
|
- Input files downloaded to job workspace
|
|
- Root access for installing dependencies
|
|
|
|
## Testing Apps
|
|
|
|
### Local Testing
|
|
|
|
Test app logic locally before deploying:
|
|
|
|
```bash
|
|
cd my-app
|
|
python src/my-app.py
|
|
```
|
|
|
|
### Platform Testing
|
|
|
|
Run the applet on the platform:
|
|
|
|
```bash
|
|
dx run applet-xxxx -i input1=file-yyyy
|
|
```
|
|
|
|
Monitor job execution:
|
|
|
|
```bash
|
|
dx watch job-zzzz
|
|
```
|
|
|
|
View job logs:
|
|
|
|
```bash
|
|
dx watch job-zzzz --get-streams
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Error Handling**: Use try-except blocks and provide informative error messages
|
|
2. **Logging**: Print progress and debug information to stdout/stderr
|
|
3. **Validation**: Validate inputs before processing
|
|
4. **Cleanup**: Remove temporary files when done
|
|
5. **Documentation**: Include clear descriptions in dxapp.json
|
|
6. **Testing**: Test with various input types and edge cases
|
|
7. **Versioning**: Use semantic versioning for apps
|
|
|
|
## Common Issues
|
|
|
|
### File Not Found
|
|
Ensure files are properly downloaded before accessing:
|
|
```python
|
|
dxpy.download_dxfile(file_id, local_path)
|
|
# Now safe to open local_path
|
|
```
|
|
|
|
### Out of Memory
|
|
Specify larger instance type in dxapp.json systemRequirements
|
|
|
|
### Timeout
|
|
Increase timeout in dxapp.json or split into smaller jobs
|
|
|
|
### Permission Errors
|
|
Ensure app has necessary permissions in dxapp.json
|