Initial commit

Zhongwei Li
2025-11-29 18:26:08 +08:00
commit 8f22ddf339
295 changed files with 59710 additions and 0 deletions


@@ -0,0 +1,309 @@
---
name: Workflow Compose
description: Executes multi-step workflows by chaining Betty Framework skills.
---
# workflow.compose
## Purpose
Allows declarative execution of Betty Framework workflows by reading a YAML definition and chaining skills like `skill.create`, `skill.define`, and `registry.update`.
Enables complex multi-step processes to be defined once and executed reliably with proper error handling and audit logging.
## Usage
### Basic Usage
```bash
python skills/workflow.compose/workflow_compose.py <path_to_workflow.yaml>
```
### Arguments
| Argument | Type | Required | Description |
|----------|------|----------|-------------|
| workflow_path | string | Yes | Path to the workflow YAML file to execute |
## Workflow YAML Structure
```yaml
# workflows/create_and_register.yaml
name: "Create and Register Skill"
description: "Complete lifecycle: create, validate, and register a new skill"
steps:
  - skill: skill.create
    args: ["workflow.validate", "Validates workflow definitions"]
    required: true
  - skill: skill.define
    args: ["skills/workflow.validate/skill.yaml"]
    required: true
  - skill: registry.update
    args: ["skills/workflow.validate/skill.yaml"]
    required: false  # Continue even if this fails
```
### Workflow Fields
| Field | Required | Description | Example |
|-------|----------|-------------|---------|
| `name` | No | Workflow name | `"API Design Workflow"` |
| `description` | No | What the workflow does | `"Complete API lifecycle"` |
| `steps` | Yes | Array of steps to execute | See below |
### Step Fields
| Field | Required | Description | Example |
|-------|----------|-------------|---------|
| `skill` | Yes | Skill name to execute | `api.validate` |
| `args` | No | Arguments to pass to skill | `["specs/api.yaml", "zalando"]` |
| `required` | No | Stop workflow if step fails | `true` (default: `false`) |
## Behavior
1. **Load Workflow**: Parses the workflow YAML file
2. **Sequential Execution**: Runs each step in order
3. **Error Handling** (see the sketch below):
   - If `required: true`, the workflow stops on failure
   - If `required: false`, the workflow continues and logs the error
4. **Audit Logging**: Calls `audit.log` skill (if available) for each step
5. **History Tracking**: Records execution history in `/registry/workflow_history.json`
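
The loop described above is roughly the following. This is a minimal sketch, assuming the `skills/<name>/<name_with_underscores>.py` handler layout used in the examples; the real executor in `workflow_compose.py` also validates the workflow schema, records audit entries, and captures telemetry:

```python
# Minimal sketch of the documented step loop (illustrative, not the real implementation).
import subprocess
import sys

import yaml


def run_step(skill: str, args: list) -> int:
    # Stand-in for skill execution; the real code runs handlers in-process with a timeout.
    handler = f"skills/{skill}/{skill.replace('.', '_')}.py"  # assumed handler layout
    return subprocess.run([sys.executable, handler, *args]).returncode


def run_workflow(workflow_path: str) -> bool:
    with open(workflow_path) as f:
        workflow = yaml.safe_load(f)

    ok = True
    for index, step in enumerate(workflow.get("steps", []), start=1):
        returncode = run_step(step["skill"], step.get("args", []))
        if returncode != 0:
            ok = False
            if step.get("required", False):
                break  # a required step failed: stop the workflow here
    return ok
```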
## Outputs
### Success Response
```json
{
  "ok": true,
  "status": "success",
  "errors": [],
  "path": "workflows/create_and_register.yaml",
  "details": {
    "workflow_name": "Create and Register Skill",
    "steps_executed": 3,
    "steps_succeeded": 3,
    "steps_failed": 0,
    "duration_ms": 1234,
    "history_file": "/registry/workflow_history.json"
  }
}
```
### Partial Failure Response
```json
{
  "ok": false,
  "status": "failed",
  "errors": [
    "Step 2 (skill.define) failed: Missing required fields: version"
  ],
  "path": "workflows/create_and_register.yaml",
  "details": {
    "workflow_name": "Create and Register Skill",
    "steps_executed": 2,
    "steps_succeeded": 1,
    "steps_failed": 1,
    "failed_step": "skill.define",
    "failed_step_index": 1
  }
}
```
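Callers can parse this JSON from stdout to decide how to proceed. A minimal sketch, assuming the invocation shown under Usage:

```python
# Sketch: run workflow.compose and act on the JSON response it prints.
import json
import subprocess
import sys

result = subprocess.run(
    [
        sys.executable,
        "skills/workflow.compose/workflow_compose.py",
        "workflows/create_and_register.yaml",
    ],
    capture_output=True,
    text=True,
)
response = json.loads(result.stdout)
if not response["ok"]:
    for error in response["errors"]:
        print(f"workflow error: {error}")
```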
## Example Workflow Files
### Example 1: Complete Skill Lifecycle
```yaml
# workflows/create_and_register.yaml
name: "Create and Register Skill"
description: "Scaffold, validate, and register a new skill"
steps:
  - skill: skill.create
    args: ["workflow.validate", "Validates workflow definitions"]
    required: true
  - skill: skill.define
    args: ["skills/workflow.validate/skill.yaml"]
    required: true
  - skill: registry.update
    args: ["skills/workflow.validate/skill.yaml"]
    required: true
```
**Execution**:
```bash
$ python skills/workflow.compose/workflow_compose.py workflows/create_and_register.yaml
{
  "ok": true,
  "status": "success",
  "details": {
    "steps_executed": 3,
    "steps_succeeded": 3
  }
}
```
### Example 2: API Design Workflow
```yaml
# workflows/api_design.yaml
name: "API Design Workflow"
description: "Design, validate, and generate models for new API"
steps:
  - skill: api.define
    args: ["user-service", "openapi", "zalando", "specs", "1.0.0"]
    required: true
  - skill: api.validate
    args: ["specs/user-service.openapi.yaml", "zalando", "true"]
    required: true
  - skill: api.generate-models
    args: ["specs/user-service.openapi.yaml", "typescript", "src/models"]
    required: false  # Continue even if model generation fails
```
### Example 3: Multi-Spec Validation
```yaml
# workflows/validate_all_specs.yaml
name: "Validate All API Specs"
description: "Validate all OpenAPI specifications in specs directory"
steps:
  - skill: api.validate
    args: ["specs/users.openapi.yaml", "zalando"]
    required: false
  - skill: api.validate
    args: ["specs/orders.openapi.yaml", "zalando"]
    required: false
  - skill: api.validate
    args: ["specs/payments.openapi.yaml", "zalando"]
    required: false
```
## Workflow History
Execution history is logged to `/registry/workflow_history.json`:
```json
{
  "executions": [
    {
      "workflow_path": "workflows/create_and_register.yaml",
      "workflow_name": "Create and Register Skill",
      "timestamp": "2025-10-23T12:34:56Z",
      "status": "success",
      "steps_executed": 3,
      "steps_succeeded": 3,
      "duration_ms": 1234
    }
  ]
}
```
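A small sketch for summarizing recent runs from this file, assuming the structure shown above:

```python
# Sketch: print a one-line summary for the most recent workflow runs.
import json

with open("/registry/workflow_history.json") as f:
    history = json.load(f)

for run in history.get("executions", [])[-10:]:
    print(
        f'{run["timestamp"]}  {run["workflow_name"]}: {run["status"]} '
        f'({run["steps_succeeded"]}/{run["steps_executed"]} steps)'
    )
```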
## Audit Integration
If `audit.log` skill is available, each step execution is logged:
```python
log_audit_entry(
    skill_name="api.validate",
    status="success",
    duration_ms=456,
    metadata={"workflow": "api_design.yaml", "step": 1}
)
```
## Integration
### With workflow.validate
Validate workflow syntax before execution:
```bash
# Validate first
python skills/workflow.validate/workflow_validate.py workflows/my-workflow.yaml
# Then execute
python skills/workflow.compose/workflow_compose.py workflows/my-workflow.yaml
```
### With Hooks
Auto-validate workflows when saved:
```bash
python skills/hook.define/hook_define.py \
--event on_file_save \
--pattern "workflows/*.yaml" \
--command "python skills/workflow.validate/workflow_validate.py {file_path}" \
--blocking true
```
### In CI/CD
```yaml
# .github/workflows/test.yml
- name: Run workflow tests
  run: |
    python skills/workflow.compose/workflow_compose.py workflows/test_suite.yaml
```
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| "Workflow file not found" | Path incorrect | Check workflow file path |
| "Invalid YAML in workflow" | Malformed YAML | Fix YAML syntax errors |
| "Skill handler not found" | Referenced skill doesn't exist | Ensure skill is registered or path is correct |
| "Step X failed" | Skill execution failed | Check skill's error output, fix issues |
| "Skill execution timed out" | Skill took >5 minutes | Optimize skill or increase timeout in code |
## Best Practices
1. **Validate First**: Run `workflow.validate` before executing workflows
2. **Use Required Judiciously**: Only mark critical steps as `required: true`
3. **Small Workflows**: Keep each workflow focused on a single logical task
4. **Error Handling**: Plan for partial failures in non-required steps
5. **Test Workflows**: Test workflows in development before using them in production
6. **Version Control**: Keep workflow files in git
## Files Modified
- **History**: `/registry/workflow_history.json` (execution history)
- **Logs**: Step execution is logged to Betty's logging system
## Exit Codes
- **0**: Success (all required steps succeeded)
- **1**: Failure (at least one required step failed)
## Timeout
Each skill invocation has a default timeout of 5 minutes (300 seconds). A skill that exceeds it is treated as a failed step, which fails the workflow if the step is required.
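For reference, an equivalent stand-alone timeout on a single skill invocation looks like this (illustrative; inside workflow.compose the timeout is applied by the in-process skill executor):

```python
# Sketch: applying the documented 300-second per-skill timeout to a subprocess call.
import subprocess
import sys

try:
    subprocess.run(
        [sys.executable, "skills/workflow.validate/workflow_validate.py", "workflows/my-workflow.yaml"],
        timeout=300,  # documented per-skill default
    )
except subprocess.TimeoutExpired:
    print("Skill exceeded the 300s timeout; the step is treated as failed")
```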
## See Also
- **workflow.validate**: Validate workflow syntax ([workflow.validate SKILL.md](../workflow.validate/SKILL.md))
- **Betty Architecture**: [Five-Layer Model](../../docs/betty-architecture.md) for understanding workflows
- **API-Driven Development**: [Example workflows](../../docs/api-driven-development.md)
## Status
**Active**: Production-ready, core orchestration skill
## Version History
- **0.1.0** (Oct 2025): Initial implementation with sequential execution, error handling, and audit logging


@@ -0,0 +1 @@
# Auto-generated package initializer for skills.


@@ -0,0 +1,30 @@
name: workflow.compose
version: 0.1.0
description: >
  Executes multi-step Betty Framework workflows by chaining existing skills.
  Enables declarative orchestration of skill pipelines.
inputs:
  - workflow_path
outputs:
  - workflow_history.json
dependencies:
  - skill.create
  - skill.define
  - registry.update
status: active
entrypoints:
  - command: /workflow/compose
    handler: workflow_compose.py
    runtime: python
    description: >
      Execute a Betty workflow defined in a YAML file.
    parameters:
      - name: workflow_path
        type: string
        required: true
        description: Path to a workflow YAML file to execute.
permissions:
  - filesystem
  - read
  - write


@@ -0,0 +1,496 @@
#!/usr/bin/env python3
"""
workflow_compose.py: Implementation of the workflow.compose skill
Executes multi-step Betty Framework workflows by chaining existing skills.
"""
import os
import sys
import yaml
import json
from typing import Dict, Any, List, Optional, Tuple
from datetime import datetime, timezone
from pydantic import ValidationError as PydanticValidationError
from betty.config import BASE_DIR, WORKFLOW_HISTORY_FILE, get_skill_handler_path
from betty.file_utils import safe_update_json
from betty.validation import validate_path
from betty.logging_utils import setup_logger
from betty.errors import WorkflowError, format_error_response
from betty.telemetry_capture import capture_skill_execution
from betty.models import WorkflowDefinition
from betty.skill_executor import execute_skill_in_process
from betty.provenance import compute_hash, get_provenance_logger
from utils.telemetry_utils import capture_telemetry
logger = setup_logger(__name__)
def log_audit_entry(
skill_name: str,
status: str,
duration_ms: Optional[int] = None,
errors: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""
Log an audit entry for skill execution.
Args:
skill_name: Name of the skill
status: Execution status (success, failed, etc.)
duration_ms: Execution duration in milliseconds
errors: List of errors (if any)
metadata: Additional metadata
"""
try:
args = [skill_name, status]
if duration_ms is not None:
args.append(str(duration_ms))
else:
args.append("")
if errors:
args.append(json.dumps(errors))
else:
args.append("[]")
if metadata:
args.append(json.dumps(metadata))
result = execute_skill_in_process("audit.log", args, timeout=10)
if result["returncode"] != 0:
logger.warning(f"Failed to log audit entry for {skill_name}: {result['stderr']}")
except Exception as e:
logger.warning(f"Failed to log audit entry for {skill_name}: {e}")
def build_response(ok: bool, path: str, errors: Optional[List[str]] = None, details: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
response: Dict[str, Any] = {
"ok": ok,
"status": "success" if ok else "failed",
"errors": errors or [],
"path": path,
}
if details is not None:
response["details"] = details
return response
def load_workflow(workflow_file: str) -> Dict[str, Any]:
"""
Load and parse a workflow YAML file.
Args:
workflow_file: Path to workflow YAML file
Returns:
Parsed workflow dictionary
Raises:
WorkflowError: If workflow cannot be loaded
"""
try:
with open(workflow_file) as f:
workflow = yaml.safe_load(f)
# Validate with Pydantic schema
try:
WorkflowDefinition.model_validate(workflow)
logger.info("Pydantic schema validation passed for workflow")
except PydanticValidationError as exc:
errors = []
for error in exc.errors():
field = ".".join(str(loc) for loc in error["loc"])
message = error["msg"]
errors.append(f"{field}: {message}")
raise WorkflowError(f"Workflow schema validation failed: {'; '.join(errors)}")
return workflow
except FileNotFoundError:
raise WorkflowError(f"Workflow file not found: {workflow_file}")
except yaml.YAMLError as e:
raise WorkflowError(f"Invalid YAML in workflow: {e}")
def run_skill(skill_name: str, args: List[str]) -> Dict[str, Any]:
"""
Run a skill handler in-process using dynamic imports.
Args:
skill_name: Name of the skill (e.g., "workflow.validate", "audit.log")
args: Arguments to pass to the skill
Returns:
Dictionary with stdout, stderr, and return code
Raises:
WorkflowError: If skill execution fails
"""
logger.info(f"▶ Running skill {skill_name} with args: {' '.join(args)}")
# Verify skill handler exists
skill_path = get_skill_handler_path(skill_name)
if not os.path.exists(skill_path):
raise WorkflowError(f"Skill handler not found: {skill_path}")
try:
# Execute skill in-process with 5 minute timeout
result = execute_skill_in_process(skill_name, args, timeout=300)
return result
except Exception as e:
raise WorkflowError(f"Failed to execute skill: {e}")
def save_workflow_history(log: Dict[str, Any]) -> None:
"""
Save workflow execution history with content hashing for provenance.
Args:
log: Workflow execution log
"""
try:
# Compute content hash for provenance tracking
content_hash = compute_hash(log)
log["content_hash"] = content_hash
# Log to provenance system
provenance = get_provenance_logger()
workflow_name = log.get("workflow", "unknown")
artifact_id = f"workflow.execution.{workflow_name}"
provenance.log_artifact(
artifact_id=artifact_id,
version=log.get("started_at", "unknown"),
content_hash=content_hash,
artifact_type="workflow-execution",
metadata={
"workflow": workflow_name,
"status": log.get("status", "unknown"),
"total_steps": len(log.get("steps", [])),
}
)
# Save to history file
def update_fn(history_data):
"""Update function for safe_update_json."""
if not isinstance(history_data, list):
history_data = []
history_data.append(log)
# Keep only last 100 workflow runs
return history_data[-100:]
safe_update_json(WORKFLOW_HISTORY_FILE, update_fn, default=[])
logger.info(f"Workflow history saved to {WORKFLOW_HISTORY_FILE} with hash {content_hash[:8]}...")
except Exception as e:
logger.warning(f"Failed to save workflow history: {e}")
def execute_workflow(workflow_file: str) -> Dict[str, Any]:
"""Read a workflow YAML and execute skills sequentially."""
validate_path(workflow_file, must_exist=True)
workflow = load_workflow(workflow_file)
fail_fast = workflow.get("fail_fast", True)
log: Dict[str, Any] = {
"workflow": os.path.basename(workflow_file),
"workflow_path": workflow_file,
"started_at": datetime.now(timezone.utc).isoformat(),
"fail_fast": fail_fast,
"steps": [],
"status": "running",
}
aggregated_errors: List[str] = []
# Validate workflow definition before executing steps
validation_result = run_skill("workflow.validate", [workflow_file])
validation_log: Dict[str, Any] = {
"step": "validation",
"skill": "workflow.validate",
"args": [workflow_file],
"returncode": validation_result["returncode"],
"stdout": validation_result["stdout"],
"stderr": validation_result["stderr"],
"parsed": validation_result.get("parsed"),
"parse_error": validation_result.get("parse_error"),
"status": "success" if validation_result["returncode"] == 0 else "failed",
"errors": [],
}
parsed_validation = validation_result.get("parsed")
if isinstance(parsed_validation, dict) and parsed_validation.get("errors"):
validation_log["errors"] = [str(err) for err in parsed_validation.get("errors", [])]
if validation_result.get("parse_error") and not validation_log["errors"]:
validation_log["errors"] = [validation_result["parse_error"]]
log["validation"] = validation_log
if validation_log["status"] != "success" or (
isinstance(parsed_validation, dict) and not parsed_validation.get("ok", True)
):
if validation_log["errors"]:
aggregated_errors.extend(validation_log["errors"])
else:
aggregated_errors.append(
f"workflow.validate failed with return code {validation_result['returncode']}"
)
log["status"] = "failed"
log["errors"] = aggregated_errors
log["completed_at"] = datetime.now(timezone.utc).isoformat()
save_workflow_history(log)
return log
steps = workflow.get("steps", [])
if not isinstance(steps, list) or not steps:
raise WorkflowError("Workflow has no steps defined")
logger.info(f"Executing workflow: {workflow_file}")
logger.info(f"Total steps: {len(steps)}")
failed_steps: List[Dict[str, Any]] = []
for i, step in enumerate(steps, 1):
# Support both 'skill' and 'agent' step types
is_agent_step = "agent" in step
is_skill_step = "skill" in step
step_log: Dict[str, Any] = {
"step_number": i,
"skill": step.get("skill") if is_skill_step else None,
"agent": step.get("agent") if is_agent_step else None,
"args": step.get("args", []),
"status": "pending",
}
# Validate step has either skill or agent field
if not is_skill_step and not is_agent_step:
error = f"Step {i} missing required 'skill' or 'agent' field"
logger.error(error)
step_log["status"] = "failed"
step_log["errors"] = [error]
aggregated_errors.append(error)
failed_steps.append({"step": i, "error": error})
log["steps"].append(step_log)
if fail_fast:
break
continue
if is_skill_step and is_agent_step:
error = f"Step {i} cannot have both 'skill' and 'agent' fields"
logger.error(error)
step_log["status"] = "failed"
step_log["errors"] = [error]
aggregated_errors.append(error)
failed_steps.append({"step": i, "error": error})
log["steps"].append(step_log)
if fail_fast:
break
continue
# Handle agent steps by delegating to run.agent skill
if is_agent_step:
agent_name = step["agent"]
input_text = step.get("input", "")
skill_name = "run.agent"
args = [agent_name]
if input_text:
args.append(input_text)
logger.info(f"\n=== Step {i}/{len(steps)}: Executing agent {agent_name} via run.agent ===")
else:
skill_name = step["skill"]
args = step.get("args", [])
logger.info(f"\n=== Step {i}/{len(steps)}: Executing {skill_name} ===")
try:
step_start_time = datetime.now(timezone.utc)
execution_result = run_skill(skill_name, args)
step_end_time = datetime.now(timezone.utc)
step_duration_ms = int((step_end_time - step_start_time).total_seconds() * 1000)
parsed_step = execution_result.get("parsed")
step_errors: List[str] = []
if isinstance(parsed_step, dict) and parsed_step.get("errors"):
step_errors = [str(err) for err in parsed_step.get("errors", [])]
elif execution_result["returncode"] != 0:
step_errors = [
f"Step {i} failed with return code {execution_result['returncode']}"
]
step_log.update(
{
"skill": skill_name,
"args": args,
"returncode": execution_result["returncode"],
"stdout": execution_result["stdout"],
"stderr": execution_result["stderr"],
"parsed": parsed_step,
"parse_error": execution_result.get("parse_error"),
"status": "success" if execution_result["returncode"] == 0 else "failed",
"errors": step_errors,
"duration_ms": step_duration_ms,
}
)
log["steps"].append(step_log)
# Log audit entry for this step
log_audit_entry(
skill_name=skill_name,
status="success" if execution_result["returncode"] == 0 else "failed",
duration_ms=step_duration_ms,
errors=step_errors if step_errors else None,
metadata={
"workflow": os.path.basename(workflow_file),
"step_number": i,
"total_steps": len(steps),
}
)
# Capture telemetry for this step
capture_skill_execution(
skill_name=skill_name,
inputs={"args": args},
status="success" if execution_result["returncode"] == 0 else "failed",
duration_ms=step_duration_ms,
workflow=os.path.basename(workflow_file),
caller="workflow.compose",
error=step_errors[0] if step_errors else None,
step_number=i,
total_steps=len(steps),
)
if execution_result["returncode"] != 0:
failed_steps.append({
"step": i,
"skill": skill_name,
"returncode": execution_result["returncode"],
"errors": step_errors,
})
aggregated_errors.extend(step_errors or [
f"Step {i} ({skill_name}) failed with return code {execution_result['returncode']}"
])
logger.error(
f"❌ Step {i} failed with return code {execution_result['returncode']}"
)
if fail_fast:
logger.error("Stopping workflow due to failure (fail_fast=true)")
break
else:
logger.info(f"✅ Step {i} completed successfully")
except WorkflowError as e:
error_msg = str(e)
logger.error(f"❌ Step {i} failed: {error_msg}")
step_log.update(
{
"returncode": None,
"stdout": "",
"stderr": "",
"parsed": None,
"parse_error": None,
"status": "failed",
"errors": [error_msg],
}
)
aggregated_errors.append(error_msg)
failed_steps.append({"step": i, "skill": skill_name, "error": error_msg})
log["steps"].append(step_log)
# Log audit entry for failed step
log_audit_entry(
skill_name=skill_name,
status="failed",
errors=[error_msg],
metadata={
"workflow": os.path.basename(workflow_file),
"step_number": i,
"total_steps": len(steps),
"error_type": "WorkflowError",
}
)
# Capture telemetry for failed step
capture_skill_execution(
skill_name=skill_name,
inputs={"args": args},
status="failed",
duration_ms=0, # No duration available for exception cases
workflow=os.path.basename(workflow_file),
caller="workflow.compose",
error=error_msg,
step_number=i,
total_steps=len(steps),
error_type="WorkflowError",
)
if fail_fast:
break
if failed_steps:
log["status"] = "failed"
log["failed_steps"] = failed_steps
else:
log["status"] = "success"
log["errors"] = aggregated_errors
log["completed_at"] = datetime.now(timezone.utc).isoformat()
save_workflow_history(log)
# Calculate total workflow duration
workflow_duration_ms = None
if "started_at" in log and "completed_at" in log:
start = datetime.fromisoformat(log["started_at"])
end = datetime.fromisoformat(log["completed_at"])
workflow_duration_ms = int((end - start).total_seconds() * 1000)
# Log audit entry for overall workflow
log_audit_entry(
skill_name="workflow.compose",
status=log["status"],
duration_ms=workflow_duration_ms,
errors=aggregated_errors if aggregated_errors else None,
metadata={
"workflow": os.path.basename(workflow_file),
"total_steps": len(steps),
"failed_steps": len(failed_steps),
}
)
# Capture telemetry for overall workflow
capture_skill_execution(
skill_name="workflow.compose",
inputs={"workflow_file": workflow_file},
status=log["status"],
duration_ms=workflow_duration_ms or 0,
caller="cli",
error=aggregated_errors[0] if aggregated_errors else None,
workflow=os.path.basename(workflow_file),
total_steps=len(steps),
failed_steps=len(failed_steps),
completed_steps=len(steps) - len(failed_steps),
)
if log["status"] == "success":
logger.info("\n✅ Workflow completed successfully")
else:
logger.error(f"\n❌ Workflow completed with {len(failed_steps)} failed step(s)")
return log
@capture_telemetry(skill_name="workflow.compose", caller="cli")
def main():
"""Main CLI entry point."""
if len(sys.argv) < 2:
message = "Usage: workflow_compose.py <workflow.yaml>"
response = build_response(
False,
path="",
errors=[message],
details={"error": {"error": "UsageError", "message": message, "details": {}}},
)
print(json.dumps(response, indent=2))
sys.exit(1)
workflow_file = sys.argv[1]
try:
details = execute_workflow(workflow_file)
response = build_response(
details.get("status") == "success",
path=WORKFLOW_HISTORY_FILE,
errors=details.get("errors", []),
details=details,
)
print(json.dumps(response, indent=2))
sys.exit(0 if response["ok"] else 1)
except WorkflowError as e:
logger.error(str(e))
error_info = format_error_response(e)
response = build_response(
False,
path=workflow_file,
errors=[error_info.get("message", str(e))],
details={"error": error_info},
)
print(json.dumps(response, indent=2))
sys.exit(1)
except Exception as e:
logger.error(f"Unexpected error: {e}")
error_info = format_error_response(e, include_traceback=True)
response = build_response(
False,
path=workflow_file,
errors=[error_info.get("message", str(e))],
details={"error": error_info},
)
print(json.dumps(response, indent=2))
sys.exit(1)
if __name__ == "__main__":
main()