Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:26:08 +08:00
commit 8f22ddf339
295 changed files with 59710 additions and 0 deletions

View File

@@ -0,0 +1,378 @@
# artifact.validate.types
## Overview
Validates artifact type names against the Betty Framework registry and returns complete metadata for each type. Provides intelligent fuzzy matching and suggestions for invalid types.
**Version**: 0.1.0
**Status**: active
## Purpose
This skill is critical for ensuring skills reference valid artifact types before creation. It validates artifact type names against `registry/artifact_types.json`, retrieves complete metadata (file_pattern, content_type, schema), and suggests alternatives for invalid types using three fuzzy matching strategies:
1. **Singular/Plural Detection** (high confidence) - Detects "data-flow-diagram" vs "data-flow-diagrams"
2. **Generic vs Specific Variants** (medium confidence) - Suggests "logical-data-model" for "data-model"
3. **Levenshtein Distance** (low confidence) - Catches typos like "thret-model" → "threat-model"
This skill is specifically designed to be called by `meta.skill` during Step 2 (Validate Artifact Types) of the skill creation workflow.
## Inputs
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `artifact_types` | array | Yes | - | List of artifact type names to validate |
| `check_schemas` | boolean | No | `true` | Whether to verify schema files exist on filesystem |
| `suggest_alternatives` | boolean | No | `true` | Whether to suggest similar types for invalid ones |
| `max_suggestions` | number | No | `3` | Maximum number of suggestions per invalid type |
## Outputs
| Output | Type | Description |
|--------|------|-------------|
| `validation_results` | object | Validation results for each artifact type with complete metadata |
| `all_valid` | boolean | Whether all artifact types are valid |
| `invalid_types` | array | List of artifact types that don't exist in registry |
| `suggestions` | object | Suggested alternatives for each invalid type |
| `warnings` | array | List of warnings (e.g., schema file missing) |
## Artifact Metadata
### Produces
- **validation-report** (`*.validation.json`) - Validation results with metadata and suggestions
### Consumes
None - reads directly from registry files
## Usage
### Example 1: Validate Single Artifact Type
```bash
python artifact_validate_types.py \
--artifact_types '["threat-model"]' \
--check_schemas true
```
**Output:**
```json
{
"validation_results": {
"threat-model": {
"valid": true,
"file_pattern": "*.threat-model.yaml",
"content_type": "application/yaml",
"schema": "schemas/artifacts/threat-model-schema.json",
"description": "Threat model (STRIDE, attack trees)..."
}
},
"all_valid": true,
"invalid_types": [],
"suggestions": {},
"warnings": []
}
```
### Example 2: Invalid Type with Suggestions
```bash
python artifact_validate_types.py \
--artifact_types '["data-flow-diagram", "threat-model"]' \
--suggest_alternatives true
```
**Output:**
```json
{
"validation_results": {
"data-flow-diagram": {
"valid": false
},
"threat-model": {
"valid": true,
"file_pattern": "*.threat-model.yaml",
"content_type": "application/yaml",
"schema": "schemas/artifacts/threat-model-schema.json"
}
},
"all_valid": false,
"invalid_types": ["data-flow-diagram"],
"suggestions": {
"data-flow-diagram": [
{
"type": "data-flow-diagrams",
"reason": "Plural form",
"confidence": "high"
},
{
"type": "dataflow-diagram",
"reason": "Similar spelling",
"confidence": "low"
}
]
},
"warnings": [],
"ok": false,
"status": "validation_failed"
}
```
### Example 3: Multiple Invalid Types with Generic → Specific Suggestions
```bash
python artifact_validate_types.py \
--artifact_types '["data-model", "api-spec", "test-result"]' \
--max_suggestions 3
```
**Output:**
```json
{
"all_valid": false,
"invalid_types": ["data-model", "api-spec"],
"suggestions": {
"data-model": [
{
"type": "logical-data-model",
"reason": "Specific variant of model",
"confidence": "medium"
},
{
"type": "physical-data-model",
"reason": "Specific variant of model",
"confidence": "medium"
},
{
"type": "enterprise-data-model",
"reason": "Specific variant of model",
"confidence": "medium"
}
],
"api-spec": [
{
"type": "openapi-spec",
"reason": "Specific variant of spec",
"confidence": "medium"
},
{
"type": "asyncapi-spec",
"reason": "Specific variant of spec",
"confidence": "medium"
}
]
},
"validation_results": {
"test-result": {
"valid": true,
"file_pattern": "*.test-result.json",
"content_type": "application/json"
}
}
}
```
### Example 4: Save Validation Report to File
```bash
python artifact_validate_types.py \
--artifact_types '["threat-model", "architecture-overview"]' \
--output validation-results.validation.json
```
Creates `validation-results.validation.json` with complete validation report.
## Integration with meta.skill
The `meta.skill` agent calls this skill in Step 2 of its workflow:
```yaml
# meta.skill workflow Step 2
2. **Validate Artifact Types**
- Extract artifact types from skill description
- Call artifact.validate.types with all types
- If all_valid == false:
→ Display suggestions to user
→ Ask user to confirm correct types
→ HALT until types are validated
- Store validated metadata for use in skill.yaml generation
```
**Example Integration:**
```python
# meta.skill calls artifact.validate.types
result = subprocess.run([
'python', 'skills/artifact.validate.types/artifact_validate_types.py',
'--artifact_types', json.dumps(["threat-model", "data-flow-diagrams"]),
'--suggest_alternatives', 'true'
], capture_output=True, text=True)
validation = json.loads(result.stdout)
if not validation['all_valid']:
print(f"❌ Invalid artifact types: {validation['invalid_types']}")
for invalid_type, suggestions in validation['suggestions'].items():
print(f"\n Suggestions for '{invalid_type}':")
for s in suggestions:
print(f" - {s['type']} ({s['confidence']} confidence): {s['reason']}")
# HALT skill creation
else:
print("✅ All artifact types validated")
# Continue with skill.yaml generation using validated metadata
```
## Fuzzy Matching Strategies
### Strategy 1: Singular/Plural Detection (High Confidence)
Detects when a user forgets the "s":
| Invalid Type | Suggested Type | Reason |
|-------------|----------------|--------|
| `data-flow-diagram` | `data-flow-diagrams` | Plural form |
| `threat-models` | `threat-model` | Singular form |
### Strategy 2: Generic vs Specific Variants (Medium Confidence)
Suggests specific variants when a generic term is used:
| Invalid Type | Suggested Types |
|-------------|-----------------|
| `data-model` | `logical-data-model`, `physical-data-model`, `enterprise-data-model` |
| `api-spec` | `openapi-spec`, `asyncapi-spec`, `graphql-spec` |
| `architecture-diagram` | `system-architecture-diagram`, `component-architecture-diagram` |
### Strategy 3: Levenshtein Distance (Low Confidence)
Catches typos and misspellings (60%+ similarity):
| Invalid Type | Suggested Type | Similarity |
|-------------|----------------|------------|
| `thret-model` | `threat-model` | ~90% |
| `architecure-overview` | `architecture-overview` | ~85% |
| `api-specfication` | `api-specification` | ~92% |
## Error Handling
### Missing Registry File
```json
{
"ok": false,
"status": "error",
"error": "Artifact registry not found: registry/artifact_types.json"
}
```
**Resolution**: Ensure you're running from the Betty Framework root directory.
### Invalid JSON in artifact_types Parameter
```json
{
"ok": false,
"status": "error",
"error": "Invalid JSON: Expecting ',' delimiter: line 1 column 15 (char 14)"
}
```
**Resolution**: Ensure artifact_types is a valid JSON array with proper quoting.
### Corrupted Registry File
```json
{
"ok": false,
"status": "error",
"error": "Invalid JSON in registry file: ..."
}
```
**Resolution**: Validate and fix `registry/artifact_types.json` syntax.
## Performance
- **Single type validation**: <100ms
- **20 types validation**: <1 second
- **All 409 types validation**: <5 seconds
Memory usage is minimal as registry is loaded once and indexed by name for O(1) lookups.
## Dependencies
- **Python 3.7+**
- **PyYAML** - For reading registry
- **difflib** - For fuzzy matching (Python stdlib)
- **jsonschema** - For validation (optional)
## Testing
Run the test suite:
```bash
cd skills/artifact.validate.types
python test_artifact_validate_types.py
```
**Test Coverage:**
- ✅ Valid artifact type validation
- ✅ Invalid artifact type detection
- ✅ Singular/plural suggestion
- ✅ Generic → specific suggestion
- ✅ Typo detection with Levenshtein distance
- ✅ Max suggestions limit
- ✅ Schema file existence checking
- ✅ Empty input handling
- ✅ Mixed valid/invalid types
## Quality Standards
- **Accuracy**: 100% for exact matches in registry
- **Suggestion Quality**: >80% relevant for common mistakes
- **Performance**: <1s for 20 types, <100ms for single type
- **Schema Verification**: 100% accurate file existence check
- **Error Handling**: Graceful handling of corrupted registry files
## Success Criteria
- ✅ Validates all 409 artifact types correctly
- ✅ Provides accurate suggestions for common mistakes (singular/plural)
- ✅ Returns exact metadata from registry (file_pattern, content_type, schema)
- ✅ Detects missing schema files and warns appropriately
- ✅ Completes validation in <1 second for up to 20 types
- ✅ Fuzzy matching handles typos within 40% character difference
## Troubleshooting
### Skill returns all_valid=false but I think types are correct
1. Check the exact spelling in `registry/artifact_types.json`
2. Look at suggestions - they often reveal plural/singular issues
3. Use `jq` to search registry:
```bash
jq '.artifact_types[] | select(.name | contains("your-search"))' registry/artifact_types.json
```
### Fuzzy matching isn't suggesting the type I expect
1. Check if the type name follows patterns (ending in common suffix like "-model", "-spec")
2. Increase `max_suggestions` to see more options
3. The type might be too dissimilar (< 60% match threshold)
### Schema warnings appearing for valid types
This is normal if schema files haven't been created yet. Schema files are optional for many artifact types. Set `check_schemas=false` to suppress these warnings.
## Related Skills
- **artifact.define** - Define new artifact types
- **artifact.create** - Create artifact files
- **skill.define** - Validate skill manifests
- **registry.update** - Update skill registry
## References
- [Python difflib](https://docs.python.org/3/library/difflib.html) - Fuzzy string matching
- [Betty Artifact Registry](../../registry/artifact_types.json) - Source of truth for artifact types
- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) - String similarity algorithm
- [meta.skill Agent](../../agents/meta.skill/agent.yaml) - Primary consumer of this skill