# artifact.validate.types

## Overview

Validates artifact type names against the Betty Framework registry and returns complete metadata for each type. Provides intelligent fuzzy matching and suggestions for invalid types.

**Version**: 0.1.0
**Status**: active

## Purpose

This skill is critical for ensuring skills reference valid artifact types before creation. It validates artifact type names against `registry/artifact_types.json`, retrieves complete metadata (file_pattern, content_type, schema), and suggests alternatives for invalid types using three fuzzy matching strategies:

1. **Singular/Plural Detection** (high confidence) - Detects "data-flow-diagram" vs "data-flow-diagrams"
2. **Generic vs Specific Variants** (medium confidence) - Suggests "logical-data-model" for "data-model"
3. **Levenshtein Distance** (low confidence) - Catches typos like "thret-model" → "threat-model"

This skill is specifically designed to be called by `meta.skill` during Step 2 (Validate Artifact Types) of the skill creation workflow.

## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `artifact_types` | array | Yes | - | List of artifact type names to validate |
| `check_schemas` | boolean | No | `true` | Whether to verify schema files exist on filesystem |
| `suggest_alternatives` | boolean | No | `true` | Whether to suggest similar types for invalid ones |
| `max_suggestions` | number | No | `3` | Maximum number of suggestions per invalid type |

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `validation_results` | object | Validation results for each artifact type with complete metadata |
| `all_valid` | boolean | Whether all artifact types are valid |
| `invalid_types` | array | List of artifact types that don't exist in registry |
| `suggestions` | object | Suggested alternatives for each invalid type |
| `warnings` | array | List of warnings (e.g., schema file missing) |

## Artifact Metadata

### Produces
- **validation-report** (`*.validation.json`) - Validation results with metadata and suggestions

### Consumes
None - reads directly from registry files

## Usage

### Example 1: Validate Single Artifact Type

```bash
python artifact_validate_types.py \
  --artifact_types '["threat-model"]' \
  --check_schemas true
```

**Output:**
```json
{
  "validation_results": {
    "threat-model": {
      "valid": true,
      "file_pattern": "*.threat-model.yaml",
      "content_type": "application/yaml",
      "schema": "schemas/artifacts/threat-model-schema.json",
      "description": "Threat model (STRIDE, attack trees)..."
    }
  },
  "all_valid": true,
  "invalid_types": [],
  "suggestions": {},
  "warnings": []
}
```

### Example 2: Invalid Type with Suggestions

```bash
python artifact_validate_types.py \
  --artifact_types '["data-flow-diagram", "threat-model"]' \
  --suggest_alternatives true
```

**Output:**
```json
{
  "validation_results": {
    "data-flow-diagram": {
      "valid": false
    },
    "threat-model": {
      "valid": true,
      "file_pattern": "*.threat-model.yaml",
      "content_type": "application/yaml",
      "schema": "schemas/artifacts/threat-model-schema.json"
    }
  },
  "all_valid": false,
  "invalid_types": ["data-flow-diagram"],
  "suggestions": {
    "data-flow-diagram": [
      {
        "type": "data-flow-diagrams",
        "reason": "Plural form",
        "confidence": "high"
      },
      {
        "type": "dataflow-diagram",
        "reason": "Similar spelling",
        "confidence": "low"
      }
    ]
  },
  "warnings": [],
  "ok": false,
  "status": "validation_failed"
}
```

### Example 3: Multiple Invalid Types with Generic → Specific Suggestions

```bash
python artifact_validate_types.py \
  --artifact_types '["data-model", "api-spec", "test-result"]' \
  --max_suggestions 3
```

**Output:**
```json
{
  "all_valid": false,
  "invalid_types": ["data-model", "api-spec"],
  "suggestions": {
    "data-model": [
      {
        "type": "logical-data-model",
        "reason": "Specific variant of model",
        "confidence": "medium"
      },
      {
        "type": "physical-data-model",
        "reason": "Specific variant of model",
        "confidence": "medium"
      },
      {
        "type": "enterprise-data-model",
        "reason": "Specific variant of model",
        "confidence": "medium"
      }
    ],
    "api-spec": [
      {
        "type": "openapi-spec",
        "reason": "Specific variant of spec",
        "confidence": "medium"
      },
      {
        "type": "asyncapi-spec",
        "reason": "Specific variant of spec",
        "confidence": "medium"
      }
    ]
  },
  "validation_results": {
    "test-result": {
      "valid": true,
      "file_pattern": "*.test-result.json",
      "content_type": "application/json"
    }
  }
}
```

### Example 4: Save Validation Report to File

```bash
python artifact_validate_types.py \
  --artifact_types '["threat-model", "architecture-overview"]' \
  --output validation-results.validation.json
```

Creates `validation-results.validation.json` with complete validation report.

## Integration with meta.skill

The `meta.skill` agent calls this skill in Step 2 of its workflow:

```yaml
# meta.skill workflow Step 2
2. **Validate Artifact Types**
   - Extract artifact types from skill description
   - Call artifact.validate.types with all types
   - If all_valid == false:
     → Display suggestions to user
     → Ask user to confirm correct types
     → HALT until types are validated
   - Store validated metadata for use in skill.yaml generation
```

**Example Integration:**

```python
# meta.skill calls artifact.validate.types
result = subprocess.run([
    'python', 'skills/artifact.validate.types/artifact_validate_types.py',
    '--artifact_types', json.dumps(["threat-model", "data-flow-diagrams"]),
    '--suggest_alternatives', 'true'
], capture_output=True, text=True)

validation = json.loads(result.stdout)

if not validation['all_valid']:
    print(f"❌ Invalid artifact types: {validation['invalid_types']}")
    for invalid_type, suggestions in validation['suggestions'].items():
        print(f"\n  Suggestions for '{invalid_type}':")
        for s in suggestions:
            print(f"    - {s['type']} ({s['confidence']} confidence): {s['reason']}")
    # HALT skill creation
else:
    print("✅ All artifact types validated")
    # Continue with skill.yaml generation using validated metadata
```

## Fuzzy Matching Strategies

### Strategy 1: Singular/Plural Detection (High Confidence)

Detects when a user forgets the "s":

| Invalid Type | Suggested Type | Reason |
|-------------|----------------|--------|
| `data-flow-diagram` | `data-flow-diagrams` | Plural form |
| `threat-models` | `threat-model` | Singular form |

### Strategy 2: Generic vs Specific Variants (Medium Confidence)

Suggests specific variants when a generic term is used:

| Invalid Type | Suggested Types |
|-------------|-----------------|
| `data-model` | `logical-data-model`, `physical-data-model`, `enterprise-data-model` |
| `api-spec` | `openapi-spec`, `asyncapi-spec`, `graphql-spec` |
| `architecture-diagram` | `system-architecture-diagram`, `component-architecture-diagram` |

### Strategy 3: Levenshtein Distance (Low Confidence)

Catches typos and misspellings (60%+ similarity):

| Invalid Type | Suggested Type | Similarity |
|-------------|----------------|------------|
| `thret-model` | `threat-model` | ~90% |
| `architecure-overview` | `architecture-overview` | ~85% |
| `api-specfication` | `api-specification` | ~92% |

## Error Handling

### Missing Registry File

```json
{
  "ok": false,
  "status": "error",
  "error": "Artifact registry not found: registry/artifact_types.json"
}
```

**Resolution**: Ensure you're running from the Betty Framework root directory.

### Invalid JSON in artifact_types Parameter

```json
{
  "ok": false,
  "status": "error",
  "error": "Invalid JSON: Expecting ',' delimiter: line 1 column 15 (char 14)"
}
```

**Resolution**: Ensure artifact_types is a valid JSON array with proper quoting.

### Corrupted Registry File

```json
{
  "ok": false,
  "status": "error",
  "error": "Invalid JSON in registry file: ..."
}
```

**Resolution**: Validate and fix `registry/artifact_types.json` syntax.

## Performance

- **Single type validation**: <100ms
- **20 types validation**: <1 second
- **All 409 types validation**: <5 seconds

Memory usage is minimal as registry is loaded once and indexed by name for O(1) lookups.

## Dependencies

- **Python 3.7+**
- **PyYAML** - For reading registry
- **difflib** - For fuzzy matching (Python stdlib)
- **jsonschema** - For validation (optional)

## Testing

Run the test suite:

```bash
cd skills/artifact.validate.types
python test_artifact_validate_types.py
```

**Test Coverage:**
- ✅ Valid artifact type validation
- ✅ Invalid artifact type detection
- ✅ Singular/plural suggestion
- ✅ Generic → specific suggestion
- ✅ Typo detection with Levenshtein distance
- ✅ Max suggestions limit
- ✅ Schema file existence checking
- ✅ Empty input handling
- ✅ Mixed valid/invalid types

## Quality Standards

- **Accuracy**: 100% for exact matches in registry
- **Suggestion Quality**: >80% relevant for common mistakes
- **Performance**: <1s for 20 types, <100ms for single type
- **Schema Verification**: 100% accurate file existence check
- **Error Handling**: Graceful handling of corrupted registry files

## Success Criteria

- ✅ Validates all 409 artifact types correctly
- ✅ Provides accurate suggestions for common mistakes (singular/plural)
- ✅ Returns exact metadata from registry (file_pattern, content_type, schema)
- ✅ Detects missing schema files and warns appropriately
- ✅ Completes validation in <1 second for up to 20 types
- ✅ Fuzzy matching handles typos within 40% character difference

## Troubleshooting

### Skill returns all_valid=false but I think types are correct

1. Check the exact spelling in `registry/artifact_types.json`
2. Look at suggestions - they often reveal plural/singular issues
3. Use `jq` to search registry:
   ```bash
   jq '.artifact_types[] | select(.name | contains("your-search"))' registry/artifact_types.json
   ```

### Fuzzy matching isn't suggesting the type I expect

1. Check if the type name follows patterns (ending in common suffix like "-model", "-spec")
2. Increase `max_suggestions` to see more options
3. The type might be too dissimilar (< 60% match threshold)

### Schema warnings appearing for valid types

This is normal if schema files haven't been created yet. Schema files are optional for many artifact types. Set `check_schemas=false` to suppress these warnings.

## Related Skills

- **artifact.define** - Define new artifact types
- **artifact.create** - Create artifact files
- **skill.define** - Validate skill manifests
- **registry.update** - Update skill registry

## References

- [Python difflib](https://docs.python.org/3/library/difflib.html) - Fuzzy string matching
- [Betty Artifact Registry](../../registry/artifact_types.json) - Source of truth for artifact types
- [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) - String similarity algorithm
- [meta.skill Agent](../../agents/meta.skill/agent.yaml) - Primary consumer of this skill