# DNAnexus Job Execution and Workflows

## Overview

Jobs are the fundamental execution units on DNAnexus. When an applet or app runs, a job is created and executed on a worker node in an isolated Linux environment with constant API access.

## Job Types

### Origin Jobs

Created directly by a user or an automated system (for example, via `dx run` or an API call) rather than by another job.

### Master Jobs

The top-level job created when an executable (app/applet) is launched directly; any subjobs it spawns belong to its job tree.

### Child Jobs

Spawned by parent jobs for parallel processing or sub-workflows.

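A job's place in this hierarchy can be read from its describe output: origin jobs have no `parentJob`, while subjobs record both their parent and the origin of the tree. A minimal sketch (the job ID is a placeholder):

```python
import dxpy

# Inspect how a job was launched; the job ID is a placeholder
desc = dxpy.DXJob("job-xxxx").describe()

if desc.get("parentJob") is None:
    print("Origin job: launched directly by a user or automated system")
else:
    print(f"Child job spawned by {desc['parentJob']}")
    print(f"Origin of this job tree: {desc.get('originJob')}")
```
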
## Running Jobs

### Running an Applet

**Basic execution**:
```python
import dxpy

# Run an applet
job = dxpy.DXApplet("applet-xxxx").run({
    "input1": {"$dnanexus_link": "file-yyyy"},
    "input2": "parameter_value"
})

print(f"Job ID: {job.get_id()}")
```

**Using command line**:
```bash
dx run applet-xxxx -i input1=file-yyyy -i input2="value"
```

### Running an App

```python
# Run an app by name
job = dxpy.DXApp(name="my-app").run({
    "reads": {"$dnanexus_link": "file-xxxx"},
    "quality_threshold": 30
})
```

### Specifying Execution Parameters

```python
job = dxpy.DXApplet("applet-xxxx").run(
    applet_input={
        "input_file": {"$dnanexus_link": "file-yyyy"}
    },
    project="project-zzzz",        # Output project
    folder="/results",             # Output folder
    name="My Analysis Job",        # Job name
    instance_type="mem2_hdd2_x4",  # Override instance type
    priority="high"                # Job priority
)
```

## Job Monitoring

### Checking Job Status

```python
job = dxpy.DXJob("job-xxxx")
state = job.describe()["state"]

# States: idle, waiting_on_input, runnable, running, done, failed, terminated
print(f"Job state: {state}")
```

**Using command line**:
```bash
dx watch job-xxxx
```

### Waiting for Job Completion

```python
job = dxpy.DXJob("job-xxxx")

# Block until the job completes
# (wait_on_done raises DXJobFailureError if the job does not finish successfully)
try:
    job.wait_on_done()
    output = job.describe()["output"]
    print(f"Job completed: {output}")
except dxpy.exceptions.DXJobFailureError:
    print("Job failed")
```

### Getting Job Output

```python
job = dxpy.DXJob("job-xxxx")

# Wait for completion
job.wait_on_done()

# Get outputs
output = job.describe()["output"]
output_file_id = output["result_file"]["$dnanexus_link"]

# Download result
dxpy.download_dxfile(output_file_id, "result.txt")
```

### Job Output References

Create references to job outputs before they complete:

```python
# Launch first job
job1 = dxpy.DXApplet("applet-1").run({"input": "..."})

# Launch second job using an output reference; get_output_ref() returns a
# job-based link that resolves once job1 finishes
job2 = dxpy.DXApplet("applet-2").run({
    "input": job1.get_output_ref("output_name")
})
```

## Job Logs

### Viewing Logs

**Command line**:
```bash
dx watch job-xxxx --get-streams
```

**Programmatically** (job logs are streamed by the platform; a simple approach is to capture the output of the `dx` CLI):
```python
import subprocess

# Capture the log of a job by shelling out to `dx watch`
result = subprocess.run(
    ["dx", "watch", "job-xxxx", "--get-streams"],
    capture_output=True, text=True, check=True
)

for log_line in result.stdout.splitlines():
    print(log_line)
```

## Parallel Execution

### Creating Subjobs

```python
@dxpy.entry_point('main')
def main(input_files):
    # Create subjobs for parallel processing
    subjobs = []

    for input_file in input_files:
        subjob = dxpy.new_dxjob(
            fn_input={"file": input_file},
            fn_name="process_file"
        )
        subjobs.append(subjob)

    # Collect results
    results = []
    for subjob in subjobs:
        result = subjob.get_output_ref("processed_file")
        results.append(result)

    return {"all_results": results}


@dxpy.entry_point('process_file')
def process_file(file):
    # Process single file
    # ...
    return {"processed_file": output_file}
```

### Scatter-Gather Pattern

```python
# Scatter: Process items in parallel
scatter_jobs = []
for item in items:
    job = dxpy.new_dxjob(
        fn_input={"item": item},
        fn_name="process_item"
    )
    scatter_jobs.append(job)

# Gather: Combine results
gather_job = dxpy.new_dxjob(
    fn_input={
        "results": [job.get_output_ref("result") for job in scatter_jobs]
    },
    fn_name="combine_results"
)
```

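The gather step above hands the scattered output references to a `combine_results` entry point. A minimal sketch of what that entry point might look like (the field names mirror the scatter code and are illustrative):

```python
@dxpy.entry_point('combine_results')
def combine_results(results):
    # By the time this subjob runs, the platform has resolved the
    # job-based references in "results" into concrete values
    combined = list(results)
    # ... merge or summarize the per-item outputs here ...
    return {"combined": combined}
```
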
## Workflows

Workflows combine multiple apps/applets into multi-step pipelines.

### Creating a Workflow

```python
# Create workflow
workflow = dxpy.new_dxworkflow(
    name="My Analysis Pipeline",
    project="project-xxxx"
)

# Add stages (add_stage returns the new stage's ID)
stage1 = workflow.add_stage(
    dxpy.DXApplet("applet-1"),
    name="Quality Control",
    folder="/qc"
)

stage2 = workflow.add_stage(
    dxpy.DXApplet("applet-2"),
    name="Alignment",
    folder="/alignment"
)

# Connect stages: feed stage 1's "filtered_reads" output into stage 2's "reads" input
workflow.update_stage(
    stage2,
    stage_input={
        "reads": {"$dnanexus_link": {"stage": stage1, "outputField": "filtered_reads"}}
    }
)

# Close workflow
workflow.close()
```

### Running a Workflow

```python
# Run workflow
analysis = workflow.run({
    "stage-xxxx.input1": {"$dnanexus_link": "file-yyyy"}
})

# Monitor analysis (collection of jobs)
analysis.wait_on_done()

# Get workflow outputs
outputs = analysis.describe()["output"]
```

**Using command line**:
```bash
dx run workflow-xxxx -i stage-xxxx.input1=file-yyyy
```

## Job Permissions and Context

### Workspace Context

Jobs run in a temporary workspace project into which input data is cloned:
- Jobs require `CONTRIBUTE` permission to their workspace
- Jobs need `VIEW` access to source projects
- All charges accumulate to the originating project

### Data Requirements

Jobs cannot start until:
1. All input data objects are in the `closed` state
2. Required permissions are available
3. Resources are allocated

Output objects must reach the `closed` state before the workspace is cleaned up.

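A launch script can make the input requirement explicit. A minimal sketch, assuming placeholder file and applet IDs, that checks the input's state and uses the `depends_on` run parameter to defer execution until dependencies are ready:

```python
import dxpy

input_file = dxpy.DXFile("file-yyyy")

# A job cannot consume this input until it reaches the "closed" state
state = input_file.describe()["state"]
print(f"Input state: {state}")

# depends_on defers the job until the listed objects (or jobs) are ready
job = dxpy.DXApplet("applet-xxxx").run(
    {"input_file": dxpy.dxlink(input_file)},
    depends_on=[input_file.get_id()]
)
```
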
## Job Lifecycle

```
Created → Waiting on Input → Runnable → Running → Done/Failed
```

**States**:
- `idle`: Job created but not yet queued
- `waiting_on_input`: Waiting for input data objects to close
- `runnable`: Ready to run, waiting for resources
- `running`: Currently executing
- `done`: Completed successfully
- `failed`: Execution failed
- `terminated`: Manually stopped

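For long-running jobs, an alternative to blocking on `wait_on_done()` is to poll the state until it reaches a terminal value, for example:

```python
import time
import dxpy

job = dxpy.DXJob("job-xxxx")

# Poll until the job reaches a terminal state
while True:
    state = job.describe()["state"]
    print(f"Current state: {state}")
    if state in ("done", "failed", "terminated"):
        break
    time.sleep(30)  # check every 30 seconds
```
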
## Error Handling

### Job Failure

```python
job = dxpy.DXJob("job-xxxx")

# wait_on_done() raises DXJobFailureError if the job does not finish successfully
try:
    job.wait_on_done()
except dxpy.exceptions.DXJobFailureError:
    desc = job.describe()
    print(f"Job failed: {desc.get('failureReason', 'Unknown')}")
    print(f"Failure message: {desc.get('failureMessage', '')}")
```

### Retry Failed Jobs

```python
# Rerun a failed job with its original inputs
desc = dxpy.DXJob("job-xxxx").describe()

new_job = dxpy.DXApplet(desc["applet"]).run(
    desc["originalInput"],
    project=desc["project"]
)
```

### Terminating Jobs

```python
# Stop a running job
job = dxpy.DXJob("job-xxxx")
job.terminate()
```

**Using command line**:
```bash
dx terminate job-xxxx
```

## Resource Management

### Instance Types

Specify computational resources:

```python
# Run with specific instance type
job = dxpy.DXApplet("applet-xxxx").run(
    {"input": "..."},
    instance_type="mem3_ssd1_v2_x8"  # 8 cores, high memory, SSD
)
```

Common instance types:
- `mem1_ssd1_v2_x4` - 4 cores, standard memory
- `mem2_ssd1_v2_x8` - 8 cores, high memory
- `mem3_ssd1_v2_x16` - 16 cores, very high memory
- `mem1_ssd1_v2_x36` - 36 cores for parallel workloads

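Instance selection can also be made data-driven. A minimal sketch that picks a larger instance for larger inputs (the size threshold and instance names are illustrative):

```python
import dxpy

input_file = dxpy.DXFile("file-yyyy")
size_gb = input_file.describe()["size"] / (1024 ** 3)

# Choose a bigger machine for bigger inputs (threshold is illustrative)
instance = "mem1_ssd1_v2_x4" if size_gb < 10 else "mem2_ssd1_v2_x8"

job = dxpy.DXApplet("applet-xxxx").run(
    {"input_file": dxpy.dxlink(input_file)},
    instance_type=instance
)
```
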
### Timeout Settings

Set maximum execution time:

```python
job = dxpy.DXApplet("applet-xxxx").run(
    {"input": "..."},
    timeout="24h"  # Maximum runtime
)
```

## Job Tagging and Metadata

### Add Job Tags

```python
job = dxpy.DXApplet("applet-xxxx").run(
    {"input": "..."},
    tags=["experiment1", "batch2", "production"]
)
```

### Add Job Properties

```python
job = dxpy.DXApplet("applet-xxxx").run(
    {"input": "..."},
    properties={
        "experiment": "exp001",
        "sample": "sample1",
        "batch": "batch2"
    }
)
```

### Finding Jobs

```python
# Find jobs by tag
jobs = dxpy.find_jobs(
    project="project-xxxx",
    tags=["experiment1"],
    describe=True
)

for job in jobs:
    print(f"{job['describe']['name']}: {job['id']}")
```

## Best Practices

1. **Job Naming**: Use descriptive names for easier tracking
2. **Tags and Properties**: Tag jobs for organization and searchability
3. **Resource Selection**: Choose appropriate instance types for the workload
4. **Error Handling**: Check job state and handle failures gracefully
5. **Parallel Processing**: Use subjobs for independent parallel tasks
6. **Workflows**: Use workflows for complex multi-step analyses
7. **Monitoring**: Monitor long-running jobs and check logs for issues
8. **Cost Management**: Use appropriate instance types to balance cost/performance
9. **Timeouts**: Set reasonable timeouts to prevent runaway jobs
10. **Cleanup**: Remove failed or obsolete jobs

## Debugging Tips

1. **Check Logs**: Always review job logs for error messages
2. **Verify Inputs**: Ensure input files are closed and accessible
3. **Test Locally**: Test logic locally before deploying to the platform
4. **Start Small**: Test with small datasets before scaling up
5. **Monitor Resources**: Check if the job is running out of memory or disk space
6. **Instance Type**: Try a larger instance if the job fails due to resource limits