# DNAnexus App Development
## Overview
Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.
## Applets vs Apps
- **Applets**: Data objects that live inside projects. Good for development and testing.
- **Apps**: Versioned, shareable executables that don't live inside projects. Can be published for others to use.
Both are developed identically up to the final build step, and an applet can later be rebuilt as an app.
## Creating an App/Applet
### Using dx-app-wizard
Generate a skeleton app directory structure:
```bash
dx-app-wizard
```
This creates:
- `dxapp.json` - Configuration file
- `src/` - Source code directory
- `resources/` - Bundled dependencies
- `test/` - Test files
### Building and Deploying
Build an applet:
```bash
dx build
```
Build an app:
```bash
dx build --app
```
The build process:
1. Validates dxapp.json configuration
2. Bundles source code and resources
3. Deploys to the platform
4. Returns the applet/app ID
## App Directory Structure
```
my-app/
├── dxapp.json # Metadata and configuration
├── src/
│ └── my-app.py # Main executable (Python)
│ └── my-app.sh # Or Bash script
├── resources/ # Bundled files and dependencies
│ └── tools/
│ └── data/
└── test/ # Test data and scripts
└── test.json
```
## Python App Structure
### Entry Points
Python apps use the `@dxpy.entry_point()` decorator to define functions:
```python
import dxpy

@dxpy.entry_point('main')
def main(input1, input2):
    # Process inputs
    # Return outputs
    return {
        "output1": result1,
        "output2": result2
    }

dxpy.run()
```
### Input/Output Handling
**Inputs**: DNAnexus data objects are represented as dicts containing links:
```python
@dxpy.entry_point('main')
def main(reads_file):
    # Convert link to handler
    reads_dxfile = dxpy.DXFile(reads_file)
    # Download to local filesystem
    dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")
    # Process file...
```
**Outputs**: Return primitive types directly, convert file outputs to links:
```python
# Upload result file
output_file = dxpy.upload_local_file("output.fastq")
return {
    "trimmed_reads": dxpy.dxlink(output_file)
}
```
## Bash App Structure
Bash apps use a simpler shell script approach:
```bash
#!/bin/bash
set -e -x -o pipefail

main() {
    # Download inputs
    dx download "$reads_file" -o reads.fastq

    # Process
    process_reads reads.fastq > output.fastq

    # Upload outputs
    trimmed_reads=$(dx upload output.fastq --brief)

    # Set job output
    dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
}
```
## Common Development Patterns
### 1. Bioinformatics Pipeline
Download → Process → Upload pattern:
```python
# Inside the @dxpy.entry_point('main') function (requires import subprocess):
# Download input
dxpy.download_dxfile(input_file_id, "input.fastq")
# Run analysis
subprocess.check_call(["tool", "input.fastq", "output.bam"])
# Upload result
output = dxpy.upload_local_file("output.bam")
return {"aligned_reads": dxpy.dxlink(output)}
```
### 2. Multi-file Processing
```python
# Process multiple inputs (input_files is an array of file links)
for file_link in input_files:
    file_handler = dxpy.DXFile(file_link)
    local_path = file_handler.describe()["name"]
    dxpy.download_dxfile(file_handler.get_id(), local_path)
    # Process each file...
```
### 3. Parallel Processing
Apps can spawn subjobs for parallel execution:
```python
# Create subjobs
subjobs = []
for item in input_list:
    subjob = dxpy.new_dxjob(
        fn_input={"input": item},
        fn_name="process_item"
    )
    subjobs.append(subjob)

# Collect results
results = [job.get_output_ref("result") for job in subjobs]
```
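The `fn_name` above must match an entry point defined in the same source file. A minimal self-contained sketch of the full fan-out pattern, where `do_work` is a placeholder for your own per-item logic and the `result`/`results` field names are illustrative:
```python
import dxpy

@dxpy.entry_point('process_item')
def process_item(input):
    # Per-item work runs in its own subjob; do_work is a placeholder
    return {"result": do_work(input)}

@dxpy.entry_point('main')
def main(input_list):
    subjobs = [dxpy.new_dxjob(fn_input={"input": item}, fn_name="process_item")
               for item in input_list]
    # Job output references can be returned directly; the platform resolves
    # them once every subjob has finished.
    return {"results": [job.get_output_ref("result") for job in subjobs]}

dxpy.run()
```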
## Execution Environment
Apps run in isolated Linux VMs (Ubuntu, with the release set in dxapp.json's `runSpec`, e.g. 24.04) with:
- Internet access (only when requested through the `access.network` field in dxapp.json)
- DNAnexus API access
- Temporary scratch space in `/home/dnanexus`
- Input files downloaded to the job workspace
- Root access for installing dependencies
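For orientation, a small sketch that inspects this environment from inside a running job; it assumes the worker exposes the usual `DX_JOB_ID` and `DX_PROJECT_CONTEXT_ID` environment variables:
```python
import os
import shutil

# Job identity and project context as set by the platform (assumed variable names)
print("job:", os.environ.get("DX_JOB_ID"))
print("project context:", os.environ.get("DX_PROJECT_CONTEXT_ID"))

# Scratch space available under the job's home directory
total, used, free = shutil.disk_usage("/home/dnanexus")
print(f"scratch free: {free / 1e9:.1f} GB")
```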
## Testing Apps
### Local Testing
Syntax-check and exercise helper logic locally before deploying (entry points that call the platform API still need real job inputs, so they only run end to end on the platform):
```bash
cd my-app
python src/my-app.py
```
### Platform Testing
Run the applet on the platform:
```bash
dx run applet-xxxx -i input1=file-yyyy
```
Monitor job execution:
```bash
dx watch job-zzzz
```
Extract just the job's stdout and stderr:
```bash
dx watch job-zzzz --get-streams
```
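The same run can also be launched from Python via dxpy, which is handy in integration tests; the applet and file IDs below are placeholders:
```python
import dxpy

# Launch the applet with a file input (IDs are placeholders)
job = dxpy.DXApplet("applet-xxxx").run({"input1": dxpy.dxlink("file-yyyy")})

# Block until the job finishes (raises on failure), then inspect its output
job.wait_on_done()
print(job.describe()["output"])
```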
## Best Practices
1. **Error Handling**: Use try-except blocks and provide informative error messages (see the sketch after this list)
2. **Logging**: Print progress and debug information to stdout/stderr
3. **Validation**: Validate inputs before processing
4. **Cleanup**: Remove temporary files when done
5. **Documentation**: Include clear descriptions in dxapp.json
6. **Testing**: Test with various input types and edge cases
7. **Versioning**: Use semantic versioning for apps
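As an illustration of points 1-3, a minimal sketch combining validation, informative errors, and logging; `dxpy.AppError` is the dxpy exception class reported to the user as an app error, and the input names and checks are illustrative:
```python
import sys
import dxpy

@dxpy.entry_point('main')
def main(reads_file, min_length=20):
    # Validate inputs before doing any heavy work
    if min_length < 1:
        # AppError messages surface to the user as the job's failure reason
        raise dxpy.AppError(f"min_length must be >= 1, got {min_length}")

    print("Downloading input...", file=sys.stderr)  # progress goes to the job log
    dxpy.download_dxfile(dxpy.DXFile(reads_file).get_id(), "reads.fastq")

    # ... process, upload, and return outputs as in the earlier examples ...
    return {}

dxpy.run()
```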
## Common Issues
### File Not Found
Ensure files are properly downloaded before accessing:
```python
dxpy.download_dxfile(file_id, local_path)
# Now safe to open local_path
```
### Out of Memory
Specify a larger instance type in the `systemRequirements` section of dxapp.json's `runSpec`.
### Timeout
Increase the `timeoutPolicy` in dxapp.json's `runSpec`, or split the work into smaller jobs.
### Permission Errors
Ensure the app requests the permissions it needs in dxapp.json's `access` section.
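All three of these are controlled by fields in dxapp.json. A hedged sketch of the relevant fragment, written as a Python dict for consistency with the other examples (instance type, hours, and access values are illustrative):
```python
import json

# Fragment of dxapp.json covering instance size, timeout, and permissions
dxapp_fragment = {
    "runSpec": {
        # Larger instance type; "*" applies to all entry points
        "systemRequirements": {"*": {"instanceType": "mem2_ssd1_v2_x8"}},
        # Per-entry-point timeout
        "timeoutPolicy": {"*": {"hours": 48}},
    },
    # Extra permissions the app requests (network access, project access level)
    "access": {"network": ["*"], "project": "CONTRIBUTE"},
}

print(json.dumps(dxapp_fragment, indent=2))
```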