Task Content Quality Analysis

Purpose: Analyze information added to tasks by specialists to detect wasteful content, measure information density, and suggest improvements.

When: After Implementation Specialists complete tasks (Backend, Frontend, Database, Test, Technical Writer)

Applies To: Implementation Specialist Subagents only

Token Cost: ~500-700 tokens

Overview

Implementation specialists add content to tasks through:

  1. Summary field (300-500 chars) - What was accomplished
  2. Task sections - Detailed results, approach, decisions
  3. Files Changed section (ordinal 999) - List of modified files

This analysis ensures specialists add high-density, non-redundant information while avoiding token waste.

Quality Metrics

1. Information Density

Definition: Ratio of useful information to total tokens added

Formula: density = informative_tokens / total_tokens, where informative tokens are those that convey a unique concept or an actionable detail

Target: ≥ 70% (at least 7 of every 10 tokens carry information)

Good Example (High Density):

Summary (87 tokens):
"Implemented OAuth2 authentication with JWT tokens. Added UserService with
login/logout endpoints. All 12 tests passing. Files: AuthController.kt,
UserService.kt, SecurityConfig.kt, AuthControllerTest.kt"

Density: 85% (7 concepts: OAuth2, JWT, UserService, login, logout, tests passing, files)

Bad Example (Low Density):

Summary (143 tokens):
"I have successfully completed the implementation of the authentication feature
as requested. The work involved creating the necessary components and ensuring
everything works correctly. Testing was performed and all tests are now passing
successfully."

Density: 35% (3 concepts: authentication, components created, tests passing)
Waste: 60 tokens of filler words

2. Redundancy Score

Definition: Percentage of information duplicated across summary + sections

Formula: redundancy = duplicate_tokens / (summary_tokens + section_tokens)

Target: ≤ 20% (minimal overlap between summary and sections)

Detection:

// Extract key phrases from summary
summaryPhrases = extractPhrases(task.summary)
// e.g., ["OAuth2 authentication", "JWT tokens", "UserService", "12 tests passing"]

// Check sections for duplicate phrases
sectionContent = task.sections.map(s => s.content).join(" ")
duplicates = summaryPhrases.filter(phrase => sectionContent.includes(phrase))

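// Phrase-level approximation of the formula above:
// share of summary phrases that are repeated in the sections
redundancy = (duplicates.length / summaryPhrases.length) * 100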

High Redundancy Example (Bad):

Summary:
"Implemented OAuth2 authentication with JWT tokens. Added UserService."

Technical Approach Section:
"For this task, I implemented OAuth2 authentication using JWT tokens.
I created a UserService to handle authentication logic..."

Redundancy: 70% (both mention OAuth2, JWT, UserService)

Low Redundancy Example (Good):

Summary:
"Implemented OAuth2 authentication. 12 tests passing."

Technical Approach Section:
"Used Spring Security OAuth2 library. Token validation in JwtFilter.
Refresh token rotation every 24h. Rate limiting: 5 attempts/min."

Redundancy: 15% (summary is high-level, section adds technical details)
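
The phrase check can be turned into a small runnable function. This is a minimal sketch, not the skill's actual implementation: `phraseRedundancy` is a hypothetical name, the key phrases are assumed to be supplied by the caller (phrase extraction is not shown), and matching is a plain case-insensitive substring test.

```javascript
// Hypothetical helper: share of summary phrases that reappear in the sections.
// Assumes the caller supplies the key phrases; extraction is not shown.
function phraseRedundancy(summaryPhrases, sections) {
  if (summaryPhrases.length === 0) return 0;
  const sectionText = sections.map((s) => s.content).join(" ").toLowerCase();
  const duplicates = summaryPhrases.filter((phrase) =>
    sectionText.includes(phrase.toLowerCase())
  );
  return Math.round((duplicates.length / summaryPhrases.length) * 100);
}

// Illustrative data only.
const sections = [
  {
    content:
      "Token validation for JWT tokens lives in JwtFilter. " +
      "UserService delegates to Spring Security.",
  },
];
const redundancy = phraseRedundancy(
  ["OAuth2 authentication", "JWT tokens", "UserService", "12 tests passing"],
  sections
);
console.log(`Redundancy: ${redundancy}%`);
```

Because it counts phrases rather than tokens, the result is coarser than the token-based formula and is best read as a quick signal.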

3. Code Snippet Ratio

Definition: Percentage of section content that is code vs explanation

Formula: code_ratio = code_block_tokens / section_tokens

Target: ≤ 30% (sections explain, files contain code)

Detection:

// Count tokens in code blocks
codeBlocks = extractCodeBlocks(section.content)  // ```language ... ```
codeTokens = sum(codeBlocks.map(b => estimateTokens(b)))

// Total section tokens
sectionTokens = estimateTokens(section.content)

ratio = (codeTokens / sectionTokens) * 100
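
The detection steps above leave `extractCodeBlocks` and `estimateTokens` undefined. One possible way to fill them in is sketched below, under two assumptions that are not from the original: tokens are estimated at roughly 4 characters each, and code blocks are delimited by triple-backtick fences.

```javascript
// Fence marker built indirectly so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);
const CODE_BLOCK = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");

// Rough token estimate: ~4 characters per token (an assumed heuristic).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Returns the contents of every fenced code block, fences excluded.
function extractCodeBlocks(markdown) {
  return [...markdown.matchAll(CODE_BLOCK)].map((match) => match[1]);
}

// Percentage of a section's estimated tokens that sit inside code blocks.
function codeRatio(content) {
  const total = estimateTokens(content);
  if (total === 0) return 0;
  const codeTokens = extractCodeBlocks(content)
    .map(estimateTokens)
    .reduce((sum, n) => sum + n, 0);
  return Math.round((codeTokens / total) * 100);
}

// Illustrative section: two lines of explanation and one short snippet.
const section = [
  "Created UserService with login/logout methods.",
  "Password hashing uses BCrypt with cost factor 12.",
  FENCE + "kotlin",
  "userService.login(email, password)",
  FENCE,
].join("\n");

console.log(`Code ratio: ${codeRatio(section)}%`);
```

The fence markers themselves are counted as section tokens but not as code tokens, so the ratio slightly understates code-heavy sections.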

Bad Example (High Code Ratio):

## Implementation Details

Here's the UserService implementation:

```kotlin
@Service
class UserService(
    private val userRepository: UserRepository,
    private val passwordEncoder: PasswordEncoder
) {
    fun login(email: String, password: String): User? {
        val user = userRepository.findByEmail(email)
        return if (user != null && passwordEncoder.matches(password, user.password)) {
            user
        } else null
    }
    // ... 50 more lines
}
```

And here's the test:

```kotlin
@Test
fun `login with valid credentials returns user`() {
    // ... 30 lines of test code
}
```

Code Ratio: 85% (300 code tokens / 350 total tokens)
Issue: Full code belongs in files, not task sections

Good Example (Low Code Ratio):

## Implementation Details

Created UserService with login/logout methods. Key decisions:
- Password hashing: BCrypt (cost factor 12)
- Session management: JWT with 1h expiration
- Rate limiting: 5 failed attempts → 15min lockout

Example usage:
```kotlin
userService.login(email, password) // Returns User or null
```

Code Ratio: 12% (20 code tokens / 165 total tokens)
Quality: Explains approach, minimal code snippet for clarity

4. Summary Quality

Definition: Summary is concise, informative, and follows best practices

Checks:

  • ✅ Length: 300-500 characters (enforced by Status Progression Skill)
  • ✅ Mentions what was done (not how or why - that's in sections)
  • ✅ Includes test status
  • ✅ Lists key files changed
  • ✅ No filler words ("I have...", "successfully...", "as requested...")

Scoring:

quality = {
  length: inRange(summary.length, 300, 500) ? 25 : 0,
  mentions_what: containsActionVerbs(summary) ? 25 : 0,  // "Implemented", "Added", "Fixed"
  test_status: mentionsTests(summary) ? 25 : 0,          // "12 tests passing"
  no_filler: !containsFiller(summary) ? 25 : 0           // No "successfully", "I have"
}

score = sum(quality.values)  // 0-100

Example Scores:

100/100 (Excellent):

"Implemented OAuth2 authentication with JWT tokens. Added UserService for
user management. All 12 tests passing. Files: AuthController.kt, UserService.kt,
SecurityConfig.kt"

✓ Length: 387 chars
✓ Mentions what: "Implemented", "Added"
✓ Test status: "12 tests passing"
✓ No filler: Clean, direct

50/100 (Poor):

"I have successfully completed the authentication feature as requested. The
implementation involved creating the necessary components and ensuring that
everything works correctly. All tests are passing."

✓ Length: 349 chars
✗ Mentions what: Vague "components"
✓ Test status: "tests are passing"
✗ No filler: "successfully", "as requested", "I have"
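
The four scoring checks can be approximated with regular expressions. The sketch below is one possible reading, not the skill's implementation: the verb list, the test-status pattern, and the filler list are all assumptions chosen for illustration.

```javascript
// Assumed heuristics for the three text checks.
const ACTION_VERBS = /\b(implemented|added|fixed|created|updated|removed)\b/i;
const TEST_STATUS = /\btests?\b[^.]*\b(pass(ing|ed)?|fail(ing|ed)?)\b/i;
const FILLER = /\b(i have|successfully|as requested)\b/i;

// Scores a summary 0-100 in steps of 25, one step per passed check.
function scoreSummary(summary) {
  const checks = {
    length: summary.length >= 300 && summary.length <= 500,
    mentionsWhat: ACTION_VERBS.test(summary),
    testStatus: TEST_STATUS.test(summary),
    noFiller: !FILLER.test(summary),
  };
  const score = Object.values(checks).filter(Boolean).length * 25;
  return { checks, score };
}

// Direct but too short: fails only the length check.
const direct = scoreSummary("Implemented OAuth2 login. All 12 tests passing.");
// Filler-heavy and vague: fails every check.
const vague = scoreSummary("I have successfully completed the work as requested.");

console.log(direct.score, vague.score);
```

Since each check is worth 25 points, the only possible scores are 0, 25, 50, 75, and 100.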

5. Section Usefulness

Definition: Sections add value beyond what's in summary and files

Checks per section:

  • Explains decisions/trade-offs
  • Documents non-obvious approach
  • Provides context for future developers
  • References files instead of duplicating code
  • Concise (bullet points > paragraphs)

Scoring:

usefulness = {
  explains_why: containsRationale(section) ? 20 : 0,     // "Chose X because..."
  approach: describesApproach(section) ? 20 : 0,         // "Used pattern Y"
  future_context: providesContext(section) ? 20 : 0,     // "Note: Z limitation"
  references_files: hasFileReferences(section) ? 20 : 0, // "See AuthController.kt:45"
  concise: isConcise(section) ? 20 : 0                   // Bullet points, not prose
}

score = sum(usefulness.values)  // 0-100

Wasteful Patterns to Detect

Pattern 1: Full Code in Sections

Issue: Code belongs in files, not task documentation

Detection:

if (section.codeBlockCount > 2 || section.codeRatio > 30) {
  return {
    pattern: "Full code in sections",
    severity: "WARN",
    found: `${section.codeBlockCount} code blocks, ${section.codeRatio}% of content`,
    expected: "≤ 2 brief code snippets, ≤ 30% code ratio",
    recommendation: "Move code to files, reference with: 'See FileName.kt:lineNumber'",
    savings: estimateSavings(section)  // e.g., "~500 tokens"
  }
}

Pattern 2: Full Test Output

Issue: Test results should be summarized, not pasted verbatim

Detection:

if (section.title.includes("Test") && section.content.includes("PASSED") && section.content.length > 500) {
  return {
    pattern: "Full test output in section",
    severity: "WARN",
    found: `${section.content.length} chars of test output`,
    expected: "Test summary: X/Y passed, failure details if any",
    recommendation: "Summarize: '12/12 tests passing' or '11/12 passing (1 flaky test)'",
    savings: `~${Math.round(section.content.length / 4)} tokens`  // rough estimate: ~4 chars per token
  }
}
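
The recommendation to summarize test output can be automated. A minimal sketch, assuming the runner prints one result per line ending in `PASSED` or `FAILED`; that output format and the sample test names are hypothetical, and `summarizeTestOutput` is an illustrative helper, not part of the skill.

```javascript
// Collapses verbose runner output into a one-line summary.
// Assumes one result per line containing the word PASSED or FAILED.
function summarizeTestOutput(output) {
  const lines = output.split("\n");
  const passed = lines.filter((line) => /\bPASSED\b/.test(line)).length;
  const failedLines = lines.filter((line) => /\bFAILED\b/.test(line));
  const total = passed + failedLines.length;
  const summary = `${passed}/${total} tests passing`;
  if (failedLines.length === 0) return summary;
  // Keep only the test name for each failure.
  const names = failedLines.map((line) =>
    line.replace(/\s*\bFAILED\b.*$/, "").trim()
  );
  return `${summary} (failed: ${names.join(", ")})`;
}

const rawOutput = [
  "AuthControllerTest > login with valid credentials PASSED",
  "AuthControllerTest > login with wrong password PASSED",
  "AuthControllerTest > logout clears session FAILED",
].join("\n");

console.log(summarizeTestOutput(rawOutput));
// prints "2/3 tests passing (failed: AuthControllerTest > logout clears session)"
```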

Pattern 3: Summary Redundancy

Issue: Summary repeats information already in sections

Detection:

overlap = calculateOverlap(task.summary, task.sections)

if (overlap > 40) {
  return {
    pattern: "High summary-section redundancy",
    severity: "INFO",
    found: `${overlap}% overlap between summary and sections`,
    expected: "≤ 20% overlap (summary = high-level, sections = details)",
    recommendation: "Make summary more concise, or add new details to sections",
    savings: `~${estimateRedundantTokens(task)} tokens`
  }
}

Pattern 4: Filler Language

Issue: Verbose, unnecessary words that don't add information

Detection:

fillerPhrases = [
  "I have successfully",
  "as requested",
  "in order to",
  "it should be noted that",
  "for the purpose of",
  "with regards to",
  "in conclusion"
]

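// Case-insensitive match so that "In order to" at the start of a sentence is caught too
summaryLower = task.summary.toLowerCase()
found = fillerPhrases.filter(phrase => summaryLower.includes(phrase.toLowerCase()))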

if (found.length > 0) {
  return {
    pattern: "Filler language in summary",
    severity: "INFO",
    found: found.join(", "),
    expected: "Direct, concise language",
    recommendation: "Remove filler: 'Implemented X' not 'I have successfully implemented X as requested'",
    savings: `~${found.length * 3} tokens`
  }
}

Pattern 5: Over-Explaining Obvious

Issue: Explaining what's clear from file/function names

Detection:

if (section.title === "Implementation" && containsObvious(section.content)) {
  return {
    pattern: "Over-explaining obvious implementation",
    severity: "INFO",
    example: "Explaining 'UserService manages users' when class is named UserService",
    recommendation: "Focus on non-obvious: design decisions, trade-offs, gotchas",
    savings: "~100-200 tokens"
  }
}

Pattern 6: Uncustomized Template Sections

Issue: Generic template sections with placeholder text that provide zero value

Detection:

placeholderPatterns = [
  /\[Component\s*\d*\]/i,
  /\[Library\s*Name\]/i,
  /\[Phase\s*Name\]/i,
  /\[Library\]/i,
  /\[Version\]/i,
  /\[What it does\]/i,
  /\[Why chosen\]/i,
  /\[Goal\]:/i,
  /\[Deliverables\]:/i
]

for (const section of task.sections) {  // for...of iterates the sections themselves, not their indices
  // Check for placeholder patterns
  hasPlaceholder = placeholderPatterns.some(pattern => pattern.test(section.content))

  // Check for generic template titles with minimal content
  genericTitles = ["Architecture Overview", "Key Dependencies", "Implementation Strategy"]
  isGenericTitle = genericTitles.includes(section.title)
  hasMinimalCustomization = section.content.length < 300 || section.content.includes('[')

  if (hasPlaceholder || (isGenericTitle && hasMinimalCustomization)) {
    return {
      pattern: "Uncustomized template section",
      severity: "WARN",  // High priority - significant token waste
      found: `Section "${section.title}" contains placeholder text or generic template`,
      expected: "Task-specific content ≥200 chars, OR delete section entirely",
      recommendation: `DELETE section using manage_sections(operation='delete', id='${section.id}') - Templates provide sufficient structure`,
      savings: `~${estimateTokens(section.content)} tokens`,
      sectionId: section.id,
      action: "DELETE"  // Explicit action to take
    }
  }
}

Common Placeholder Patterns:

  • [Component 1], [Component 2] - Generic component names
  • [Library Name], [Version] - Dependency table placeholders
  • [Phase Name], [Goal]:, [Deliverables]: - Implementation strategy placeholders
  • [What it does], [Why chosen] - Generic explanations

Examples of Violations:

Bad Example 1 - Architecture Overview with placeholders:

Title: Architecture Overview
Content:
This task involves the following components:
- [Component 1]: [What it does]
- [Component 2]: [What it does]

Technical approach:
- [Library Name] for [functionality]
- [Library Name] for [functionality]

(72 tokens of waste - DELETE this section)

Bad Example 2 - Key Dependencies with placeholders:

Title: Key Dependencies
Content:
| Library | Version | Purpose |
|---------|---------|---------|
| [Library Name] | [Version] | [What it does] |
| [Library Name] | [Version] | [What it does] |

Rationale:
- [Library]: [Why chosen]

(85 tokens of waste - DELETE this section)

Bad Example 3 - Implementation Strategy with placeholders:

Title: Implementation Strategy
Content:
Phase 1: [Phase Name]
- Goal: [Goal]
- Deliverables: [Deliverables]

Phase 2: [Phase Name]
- Goal: [Goal]
- Deliverables: [Deliverables]

(98 tokens of waste - DELETE this section)
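
To see the detection rules at work, the sketch below runs the placeholder patterns listed earlier against the content of Bad Example 1. `findPlaceholders` is a hypothetical helper added for illustration; it reports which placeholder strings were matched instead of returning a full finding object.

```javascript
// Placeholder patterns copied from the detection rules above.
const placeholderPatterns = [
  /\[Component\s*\d*\]/i,
  /\[Library\s*Name\]/i,
  /\[Phase\s*Name\]/i,
  /\[Library\]/i,
  /\[Version\]/i,
  /\[What it does\]/i,
  /\[Why chosen\]/i,
  /\[Goal\]:/i,
  /\[Deliverables\]:/i,
];

// Returns the distinct placeholder strings found in a section's content.
function findPlaceholders(content) {
  const hits = new Set();
  for (const pattern of placeholderPatterns) {
    // Re-create each pattern with the global flag so matchAll can be used.
    const globalPattern = new RegExp(pattern.source, "gi");
    for (const match of content.matchAll(globalPattern)) {
      hits.add(match[0]);
    }
  }
  return [...hits];
}

const architectureOverview = [
  "This task involves the following components:",
  "- [Component 1]: [What it does]",
  "- [Component 2]: [What it does]",
].join("\n");

console.log(findPlaceholders(architectureOverview));
// prints [ '[Component 1]', '[Component 2]', '[What it does]' ]
```

Listing the matched strings makes it easy to fill in the "Placeholder patterns" line of the report for each violation.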

Proper Response When Detected:

⚠️ WARN - Uncustomized Template Sections (Pattern 6)

**Found**: 3 task sections contain placeholder text, wasting ~255 tokens

**Violations**:
1. Task [ID] - Section "Architecture Overview" (72 tokens)
   - Placeholder patterns: `[Component 1]`, `[What it does]`
   - **Action**: DELETE section (ID: xxx)
   - **Reason**: Templates provide sufficient structure

2. Task [ID] - Section "Key Dependencies" (85 tokens)
   - Placeholder patterns: `[Library Name]`, `[Version]`, `[Why chosen]`
   - **Action**: DELETE section (ID: yyy)
   - **Reason**: Generic table with no actual dependencies

3. Task [ID] - Section "Implementation Strategy" (98 tokens)
   - Placeholder patterns: `[Phase Name]`, `[Goal]:`, `[Deliverables]:`
   - **Action**: DELETE section (ID: zzz)
   - **Reason**: Uncustomized phases with no specific strategy

**Expected**: Task-specific content ≥200 chars with NO placeholder text, OR delete section entirely

**Recommendation**:
- Planning Specialist must customize ALL sections before returning to orchestrator (Step 7.5 validation)
- Implementation Specialists must DELETE any placeholder sections during Step 4
- Templates provide sufficient structure for 95% of tasks (complexity ≤7)

**Root Cause**: Planning Specialist's bulkCreate operation included generic template sections without customization

**Prevention**:
1. Planning Specialist Step 7.5 (Validate Task Quality) must detect and delete placeholder sections
2. Implementation Specialists Step 4 must check for and delete placeholder sections
3. Orchestration QA Skill now detects this pattern automatically

**Token Savings**: ~255 tokens (current waste) → 0 tokens (after deletion)

Analysis Workflow

Step 1: Capture Baseline

Before specialist executes:

baseline = {
  taskId: task.id,
  summaryLength: task.summary?.length || 0,
  sectionCount: task.sections.length,
  totalTokens: estimateTaskTokens(task)
}

Step 2: Measure Addition

After specialist completes:

delta = {
  summaryAdded: (task.summary?.length || 0) - baseline.summaryLength,  // summary may still be empty
  sectionsAdded: task.sections.length - baseline.sectionCount,
  tokensAdded: estimateTaskTokens(task) - baseline.totalTokens
}
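
Steps 1 and 2 can be combined into a pair of small functions. This is a sketch under stated assumptions: `estimateTaskTokens` is not defined in this document, so the version here simply estimates about 4 characters per token over the summary and all section titles and contents, and the task shape (`id`, `summary`, `sections`) is inferred from the snippets above.

```javascript
// Rough token estimate: ~4 characters per token (an assumed heuristic).
const estimateTokens = (text) => Math.ceil((text || "").length / 4);

// Assumed definition: summary tokens plus title and content tokens per section.
function estimateTaskTokens(task) {
  const sectionTokens = task.sections
    .map((s) => estimateTokens(s.title) + estimateTokens(s.content))
    .reduce((sum, n) => sum + n, 0);
  return estimateTokens(task.summary) + sectionTokens;
}

// Step 1: snapshot taken before the specialist runs.
function captureBaseline(task) {
  return {
    taskId: task.id,
    summaryLength: task.summary?.length || 0,
    sectionCount: task.sections.length,
    totalTokens: estimateTaskTokens(task),
  };
}

// Step 2: what the specialist added relative to the snapshot.
function measureDelta(task, baseline) {
  return {
    summaryAdded: (task.summary?.length || 0) - baseline.summaryLength,
    sectionsAdded: task.sections.length - baseline.sectionCount,
    tokensAdded: estimateTaskTokens(task) - baseline.totalTokens,
  };
}

// Illustrative task before and after a specialist run.
const before = { id: "T-1", summary: null, sections: [] };
const after = {
  id: "T-1",
  summary: "Implemented OAuth2 login. 12 tests passing.",
  sections: [
    { title: "Technical Approach", content: "Token validation in JwtFilter." },
  ],
};

const baseline = captureBaseline(before);
console.log(measureDelta(after, baseline));
```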

Step 3: Analyze Quality

Run quality checks:

analysis = {
  informationDensity: calculateDensity(task, delta),
  redundancyScore: calculateRedundancy(task),
  codeRatio: calculateCodeRatio(task),
  summaryQuality: scoreSummary(task.summary),
  sectionUsefulness: task.sections.map(s => scoreSection(s)),
  wastefulPatterns: detectWaste(task)
}

Step 4: Generate Report

Format findings:

report = {
  specialist: entityType,
  taskId: task.id,
  tokensAdded: delta.tokensAdded,
  quality: {
    informationDensity: `${analysis.informationDensity}%`,
    redundancy: `${analysis.redundancyScore}%`,
    codeRatio: `${analysis.codeRatio}%`,
    summaryScore: `${analysis.summaryQuality}/100`,
    avgSectionScore: average(analysis.sectionUsefulness)
  },
  wastefulPatterns: analysis.wastefulPatterns,
  potentialSavings: calculateSavings(analysis.wastefulPatterns)
}

Aggregate across tasks:

session.contentQuality.push(report)

// After N tasks (e.g., 5), analyze trends
if (session.contentQuality.length >= 5) {
  trends = analyzeTrends(session.contentQuality)
  // e.g., "Backend Engineer consistently has high code ratio (avg 65%)"
}
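
`analyzeTrends` is left undefined above. One possible shape is sketched below, assuming each report carries a `specialist` name and a `wastefulPatterns` array whose entries have a `pattern` field, and treating a pattern as recurring once it appears 2 or more times. The returned `recurringPatterns[].name` field matches how the TodoWrite snippet later in this document reads it.

```javascript
// Assumed report shape: { specialist: string, wastefulPatterns: [{ pattern: string }] }.
function analyzeTrends(reports) {
  const byPattern = new Map();
  for (const report of reports) {
    for (const { pattern } of report.wastefulPatterns) {
      const entry =
        byPattern.get(pattern) ?? { name: pattern, occurrences: 0, specialists: {} };
      entry.occurrences += 1;
      entry.specialists[report.specialist] =
        (entry.specialists[report.specialist] ?? 0) + 1;
      byPattern.set(pattern, entry);
    }
  }
  // Recurring = same issue seen 2+ times, most frequent first.
  const recurringPatterns = [...byPattern.values()]
    .filter((entry) => entry.occurrences >= 2)
    .sort((a, b) => b.occurrences - a.occurrences);
  return { tasksAnalyzed: reports.length, recurringPatterns };
}

// Illustrative session of five reports.
const sessionReports = [
  { specialist: "Backend Engineer", wastefulPatterns: [{ pattern: "Full code in sections" }] },
  {
    specialist: "Backend Engineer",
    wastefulPatterns: [
      { pattern: "Full code in sections" },
      { pattern: "Filler language in summary" },
    ],
  },
  { specialist: "Frontend Developer", wastefulPatterns: [{ pattern: "Full code in sections" }] },
  { specialist: "Frontend Developer", wastefulPatterns: [] },
  { specialist: "Backend Engineer", wastefulPatterns: [] },
];

const trends = analyzeTrends(sessionReports);
console.log(trends.recurringPatterns);
```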

Report Template

## 📊 Task Content Quality Analysis

**Specialist**: [Backend Engineer / Frontend Developer / etc.]
**Task**: [Task Title] ([ID])

### Tokens Added
- Summary: [X] chars ([Y] tokens)
- Sections: [N] sections added ([Z] tokens)
- **Total Added**: [Y+Z] tokens

### Quality Metrics
- **Information Density**: [X]% ([Target: ≥70%])
- **Redundancy Score**: [Y]% ([Target: ≤20%])
- **Code Ratio**: [Z]% ([Target: ≤30%])
- **Summary Quality**: [Score]/100

### ✅ Strengths
- [What was done well]
- [Good practice observed]

### ⚠️ Wasteful Patterns Detected ([count])

**Pattern 1: [Name]**
- Found: [What was observed]
- Expected: [Best practice]
- Recommendation: [How to improve]
- Potential Savings: ~[X] tokens

**Pattern 2: [Name]**
- Found: [What was observed]
- Expected: [Best practice]
- Recommendation: [How to improve]
- Potential Savings: ~[Y] tokens

### 💰 Total Potential Savings
- Current: [N] tokens added
- Optimized: [N-X-Y] tokens
- **Savings**: ~[X+Y] tokens ([Z]% reduction)

### 🎯 Specific Recommendations
1. [Most impactful improvement]
2. [Secondary improvement]
3. [Optional enhancement]

Trend Analysis (After 5+ Tasks)

## 📈 Content Quality Trends

**Session**: [N] tasks analyzed
**Specialists**: [List of specialists used]

### Average Metrics
- Information Density: [X]% (Target: ≥70%)
- Redundancy: [Y]% (Target: ≤20%)
- Code Ratio: [Z]% (Target: ≤30%)
- Summary Quality: [Score]/100

### Recurring Patterns
**Most Common Issue**: [Pattern name] ([N] occurrences)
- **Specialists Affected**: [Backend Engineer (3x), Frontend (2x)]
- **Total Waste**: ~[X] tokens across tasks
- **Recommendation**: Update [specialist].md to emphasize [practice]

**Second Most Common**: [Pattern name] ([M] occurrences)
- **Specialists Affected**: [...]
- **Recommendation**: [...]

### Specialist Performance

**Backend Engineer** ([N] tasks):
- Avg Density: [X]%
- Avg Redundancy: [Y]%
- Common Issue: High code ratio (avg [Z]%)
- **Recommendation**: Reference files instead of embedding code

**Frontend Developer** ([M] tasks):
- Avg Density: [X]%
- Avg Redundancy: [Y]%
- Strengths: Excellent summary quality (avg 85/100)

### System-Wide Opportunities
1. **Update Specialist Templates**
   - Add "Code in Files, Not Sections" guideline to all implementation specialists
   - Estimated Impact: [X]% token reduction

2. **Enhance Summary Guidelines**
   - Add anti-pattern examples (filler language)
   - Estimated Impact: [Y]% improvement in quality scores

3. **Section Template Improvements**
   - Provide better examples of useful vs wasteful sections
   - Estimated Impact: [Z]% reduction in redundancy

Integration with Post-Execution Review

// In post-execution.md, after Step 4 (Validate completion quality):

if (isImplementationSpecialist(entityType)) {
  // Read task-content-quality.md
  Read ".claude/skills/orchestration-qa/task-content-quality.md"

  // Run content quality analysis
  contentAnalysis = analyzeTaskContent(task, baseline)

  // Add to report
  report.contentQuality = contentAnalysis

  // Track for trends
  session.contentQuality.push(contentAnalysis)

  // If patterns found, add to deviations
  if (contentAnalysis.wastefulPatterns.length > 0) {
    deviations.push({
      severity: "INFO",  // Usually INFO, can be WARN if severe
      type: "Content Quality",
      patterns: contentAnalysis.wastefulPatterns,
      savings: contentAnalysis.potentialSavings
    })
  }
}

When to Report

Individual Task:

  • Report if wasteful patterns detected
  • Report if quality scores below targets

Session Trends:

  • After 5+ tasks analyzed
  • When recurring patterns detected (same issue 2+ times)
  • At session end (via phase="summary")

Add to TodoWrite (If Issues Found)

if (contentAnalysis.potentialSavings > 100) {
  TodoWrite([{
    content: `Review ${specialist} content quality: ${contentAnalysis.potentialSavings} tokens wasted`,
    activeForm: `Reviewing ${specialist} content patterns`,
    status: "pending"
  }])
}

// If recurring pattern
if (trends.recurringPatterns.length > 0) {
  TodoWrite([{
    content: `Update ${specialist}.md: ${trends.recurringPatterns[0].name} pattern recurring`,
    activeForm: `Improving ${specialist} guidelines`,
    status: "pending"
  }])
}

Target Benchmarks

Excellent (95%+ of metrics in target):

  • Information Density: ≥ 80%
  • Redundancy: ≤ 15%
  • Code Ratio: ≤ 20%
  • Summary Quality: ≥ 85/100
  • No wasteful patterns

Good (80%+ of metrics in target):

  • Information Density: 70-79%
  • Redundancy: 16-20%
  • Code Ratio: 21-30%
  • Summary Quality: 70-84/100
  • Minor wasteful patterns (< 100 tokens waste)

Needs Improvement (< 80% of metrics in target):

  • Information Density: < 70%
  • Redundancy: > 20%
  • Code Ratio: > 30%
  • Summary Quality: < 70/100
  • Significant waste (≥ 100 tokens)
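
The thresholds above can be encoded as a small lookup. The sketch below is an assumption-laden reading: this document does not define how "percent of metrics in target" is computed, so the code rates each metric on its own and reports the worst tier as the overall result. It also treats exactly 100 tokens of waste as "Needs Improvement". `rateTask` and the metric field names are hypothetical.

```javascript
const TIERS = ["Excellent", "Good", "Needs Improvement"];

// One rule per metric; each returns an index into TIERS.
const tierRules = {
  density: (v) => (v >= 80 ? 0 : v >= 70 ? 1 : 2),
  redundancy: (v) => (v <= 15 ? 0 : v <= 20 ? 1 : 2),
  codeRatio: (v) => (v <= 20 ? 0 : v <= 30 ? 1 : 2),
  summaryScore: (v) => (v >= 85 ? 0 : v >= 70 ? 1 : 2),
  wasteTokens: (v) => (v === 0 ? 0 : v < 100 ? 1 : 2),
};

// Rates every metric, then takes the worst tier as the overall rating (an assumption).
function rateTask(metrics) {
  const perMetric = {};
  let worst = 0;
  for (const [name, rule] of Object.entries(tierRules)) {
    const tier = rule(metrics[name]);
    perMetric[name] = TIERS[tier];
    worst = Math.max(worst, tier);
  }
  return { perMetric, overall: TIERS[worst] };
}

const rating = rateTask({
  density: 82,
  redundancy: 18,
  codeRatio: 12,
  summaryScore: 75,
  wasteTokens: 40,
});
console.log(rating.overall, rating.perMetric);
```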

Continuous Improvement

Track over time:

  • Are quality scores improving?
  • Are wasteful patterns decreasing?
  • Which specialists need guideline updates?
  • What best practices emerge from high-quality tasks?

Update specialist definitions when:

  • Same pattern occurs 3+ times
  • Potential savings > 500 tokens across multiple tasks
  • Quality scores consistently below targets
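
The first two update triggers are concrete enough to check mechanically. A minimal sketch, assuming reports shaped like `{ wastefulPatterns: [{ pattern }], potentialSavings: number }`; `updateReasons` is a hypothetical helper. The third trigger is left out because this document does not define "consistently".

```javascript
// Returns the reasons (if any) to revisit a specialist's definition.
function updateReasons(reports) {
  const reasons = [];

  // Trigger 1: the same pattern occurs 3+ times.
  const counts = {};
  for (const report of reports) {
    for (const { pattern } of report.wastefulPatterns) {
      counts[pattern] = (counts[pattern] || 0) + 1;
    }
  }
  for (const [pattern, count] of Object.entries(counts)) {
    if (count >= 3) reasons.push(`"${pattern}" occurred ${count} times`);
  }

  // Trigger 2: potential savings above 500 tokens, spread over more than one task.
  const wasteful = reports.filter((r) => r.potentialSavings > 0);
  const totalSavings = wasteful.reduce((sum, r) => sum + r.potentialSavings, 0);
  if (wasteful.length > 1 && totalSavings > 500) {
    reasons.push(
      `~${totalSavings} tokens of potential savings across ${wasteful.length} tasks`
    );
  }
  return reasons;
}

// Illustrative reports for one specialist.
const backendReports = [
  { wastefulPatterns: [{ pattern: "Full code in sections" }], potentialSavings: 200 },
  { wastefulPatterns: [{ pattern: "Full code in sections" }], potentialSavings: 150 },
  { wastefulPatterns: [{ pattern: "Full code in sections" }], potentialSavings: 100 },
];

console.log(updateReasons(backendReports));
```

In this sample the pattern trigger fires (3 occurrences) while the savings trigger does not, since the total is 450 tokens.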