Initial commit
This commit is contained in:
247
skills/dnanexus-integration/references/app-development.md
Normal file
247
skills/dnanexus-integration/references/app-development.md
Normal file
@@ -0,0 +1,247 @@
|
||||
# DNAnexus App Development
|
||||
|
||||
## Overview
|
||||
|
||||
Apps and applets are executable programs that run on the DNAnexus platform. They can be written in Python or Bash and are deployed with all necessary dependencies and configuration.
|
||||
|
||||
## Applets vs Apps
|
||||
|
||||
- **Applets**: Data objects that live inside projects. Good for development and testing.
|
||||
- **Apps**: Versioned, shareable executables that don't live inside projects. Can be published for others to use.
|
||||
|
||||
Both are created identically until the final build step. Applets can be converted to apps later.
|
||||
|
||||
## Creating an App/Applet
|
||||
|
||||
### Using dx-app-wizard
|
||||
|
||||
Generate a skeleton app directory structure:
|
||||
|
||||
```bash
|
||||
dx-app-wizard
|
||||
```
|
||||
|
||||
This creates:
|
||||
- `dxapp.json` - Configuration file
|
||||
- `src/` - Source code directory
|
||||
- `resources/` - Bundled dependencies
|
||||
- `test/` - Test files
|
||||
|
||||
### Building and Deploying
|
||||
|
||||
Build an applet:
|
||||
```bash
|
||||
dx build
|
||||
```
|
||||
|
||||
Build an app:
|
||||
```bash
|
||||
dx build --app
|
||||
```
|
||||
|
||||
The build process:
|
||||
1. Validates dxapp.json configuration
|
||||
2. Bundles source code and resources
|
||||
3. Deploys to the platform
|
||||
4. Returns the applet/app ID
|
||||
|
||||
## App Directory Structure
|
||||
|
||||
```
|
||||
my-app/
|
||||
├── dxapp.json # Metadata and configuration
|
||||
├── src/
|
||||
│ └── my-app.py # Main executable (Python)
|
||||
│ └── my-app.sh # Or Bash script
|
||||
├── resources/ # Bundled files and dependencies
|
||||
│ └── tools/
|
||||
│ └── data/
|
||||
└── test/ # Test data and scripts
|
||||
└── test.json
|
||||
```
|
||||
|
||||
## Python App Structure
|
||||
|
||||
### Entry Points
|
||||
|
||||
Python apps use the `@dxpy.entry_point()` decorator to define functions:
|
||||
|
||||
```python
|
||||
import dxpy
|
||||
|
||||
@dxpy.entry_point('main')
|
||||
def main(input1, input2):
|
||||
# Process inputs
|
||||
# Return outputs
|
||||
return {
|
||||
"output1": result1,
|
||||
"output2": result2
|
||||
}
|
||||
|
||||
dxpy.run()
|
||||
```
|
||||
|
||||
### Input/Output Handling
|
||||
|
||||
**Inputs**: DNAnexus data objects are represented as dicts containing links:
|
||||
|
||||
```python
|
||||
@dxpy.entry_point('main')
|
||||
def main(reads_file):
|
||||
# Convert link to handler
|
||||
reads_dxfile = dxpy.DXFile(reads_file)
|
||||
|
||||
# Download to local filesystem
|
||||
dxpy.download_dxfile(reads_dxfile.get_id(), "reads.fastq")
|
||||
|
||||
# Process file...
|
||||
```
|
||||
|
||||
**Outputs**: Return primitive types directly, convert file outputs to links:
|
||||
|
||||
```python
|
||||
# Upload result file
|
||||
output_file = dxpy.upload_local_file("output.fastq")
|
||||
|
||||
return {
|
||||
"trimmed_reads": dxpy.dxlink(output_file)
|
||||
}
|
||||
```
|
||||
|
||||
## Bash App Structure
|
||||
|
||||
Bash apps use a simpler shell script approach:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
set -e -x -o pipefail
|
||||
|
||||
main() {
|
||||
# Download inputs
|
||||
dx download "$reads_file" -o reads.fastq
|
||||
|
||||
# Process
|
||||
process_reads reads.fastq > output.fastq
|
||||
|
||||
# Upload outputs
|
||||
trimmed_reads=$(dx upload output.fastq --brief)
|
||||
|
||||
# Set job output
|
||||
dx-jobutil-add-output trimmed_reads "$trimmed_reads" --class=file
|
||||
}
|
||||
```
|
||||
|
||||
## Common Development Patterns
|
||||
|
||||
### 1. Bioinformatics Pipeline
|
||||
|
||||
Download → Process → Upload pattern:
|
||||
|
||||
```python
|
||||
# Download input
|
||||
dxpy.download_dxfile(input_file_id, "input.fastq")
|
||||
|
||||
# Run analysis
|
||||
subprocess.check_call(["tool", "input.fastq", "output.bam"])
|
||||
|
||||
# Upload result
|
||||
output = dxpy.upload_local_file("output.bam")
|
||||
return {"aligned_reads": dxpy.dxlink(output)}
|
||||
```
|
||||
|
||||
### 2. Multi-file Processing
|
||||
|
||||
```python
|
||||
# Process multiple inputs
|
||||
for file_link in input_files:
|
||||
file_handler = dxpy.DXFile(file_link)
|
||||
local_path = f"{file_handler.name}"
|
||||
dxpy.download_dxfile(file_handler.get_id(), local_path)
|
||||
# Process each file...
|
||||
```
|
||||
|
||||
### 3. Parallel Processing
|
||||
|
||||
Apps can spawn subjobs for parallel execution:
|
||||
|
||||
```python
|
||||
# Create subjobs
|
||||
subjobs = []
|
||||
for item in input_list:
|
||||
subjob = dxpy.new_dxjob(
|
||||
fn_input={"input": item},
|
||||
fn_name="process_item"
|
||||
)
|
||||
subjobs.append(subjob)
|
||||
|
||||
# Collect results
|
||||
results = [job.get_output_ref("result") for job in subjobs]
|
||||
```
|
||||
|
||||
## Execution Environment
|
||||
|
||||
Apps run in isolated Linux VMs (Ubuntu 24.04) with:
|
||||
- Internet access
|
||||
- DNAnexus API access
|
||||
- Temporary scratch space in `/home/dnanexus`
|
||||
- Input files downloaded to job workspace
|
||||
- Root access for installing dependencies
|
||||
|
||||
## Testing Apps
|
||||
|
||||
### Local Testing
|
||||
|
||||
Test app logic locally before deploying:
|
||||
|
||||
```bash
|
||||
cd my-app
|
||||
python src/my-app.py
|
||||
```
|
||||
|
||||
### Platform Testing
|
||||
|
||||
Run the applet on the platform:
|
||||
|
||||
```bash
|
||||
dx run applet-xxxx -i input1=file-yyyy
|
||||
```
|
||||
|
||||
Monitor job execution:
|
||||
|
||||
```bash
|
||||
dx watch job-zzzz
|
||||
```
|
||||
|
||||
View job logs:
|
||||
|
||||
```bash
|
||||
dx watch job-zzzz --get-streams
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Error Handling**: Use try-except blocks and provide informative error messages
|
||||
2. **Logging**: Print progress and debug information to stdout/stderr
|
||||
3. **Validation**: Validate inputs before processing
|
||||
4. **Cleanup**: Remove temporary files when done
|
||||
5. **Documentation**: Include clear descriptions in dxapp.json
|
||||
6. **Testing**: Test with various input types and edge cases
|
||||
7. **Versioning**: Use semantic versioning for apps
|
||||
|
||||
## Common Issues
|
||||
|
||||
### File Not Found
|
||||
Ensure files are properly downloaded before accessing:
|
||||
```python
|
||||
dxpy.download_dxfile(file_id, local_path)
|
||||
# Now safe to open local_path
|
||||
```
|
||||
|
||||
### Out of Memory
|
||||
Specify larger instance type in dxapp.json systemRequirements
|
||||
|
||||
### Timeout
|
||||
Increase timeout in dxapp.json or split into smaller jobs
|
||||
|
||||
### Permission Errors
|
||||
Ensure app has necessary permissions in dxapp.json
|
||||
Reference in New Issue
Block a user