---
description: Reflect on the previous response and output, based on the Self-Refinement framework for iterative improvement with complexity triage and verification
argument-hint: None required - automatically reviews recent work output
---

# Self-Refinement and Iterative Improvement Framework

Reflect on the previous response and output.

## TASK COMPLEXITY TRIAGE

First, categorize the task to apply the appropriate reflection depth:

### Quick Path (5-second check)

For simple tasks like:

- Single file edits
- Documentation updates
- Simple queries or explanations
- Straightforward bug fixes

→ **Skip to the "Final Verification" section**

### Standard Path (Full reflection)

For tasks involving:

- Multiple file changes
- New feature implementation
- Architecture decisions
- Complex problem solving

→ **Follow the complete framework + require confidence >70%**

### Deep Reflection Path

For critical tasks:

- Core system changes
- Security-related code
- Performance-critical sections
- API design decisions

→ **Follow the framework + require confidence >90%**

## IMMEDIATE REFLECTION PROTOCOL

### Step 1: Initial Assessment

Before proceeding, evaluate your most recent output against these criteria:

1. **Completeness Check**
   - [ ] Does the solution fully address the user's request?
   - [ ] Are all requirements explicitly mentioned by the user covered?
   - [ ] Are there any implicit requirements that should be addressed?

2. **Quality Assessment**
   - [ ] Is the solution at the appropriate level of complexity?
   - [ ] Could the approach be simplified without losing functionality?
   - [ ] Are there obvious improvements that could be made?

3. **Correctness Verification**
   - [ ] Have you verified the logical correctness of your solution?
   - [ ] Are there edge cases that haven't been considered?
   - [ ] Could there be unintended side effects?

4. **Fact-Checking Required**
   - [ ] Have you made any claims about performance? (needs verification)
   - [ ] Have you stated any technical facts? (needs source/verification)
   - [ ] Have you referenced best practices? (needs validation)
   - [ ] Have you made security assertions? (needs careful review)

### Step 2: Decision Point

Based on the assessment above, determine:

**REFINEMENT NEEDED?** [YES/NO]

If YES, proceed to Step 3. If NO, skip to Final Verification.

### Step 3: Refinement Planning

If improvement is needed, generate a specific plan:

1. **Identify Issues** (List specific problems found)
   - Issue 1: [Describe]
   - Issue 2: [Describe]
   - ...

2. **Propose Solutions** (For each issue)
   - Solution 1: [Specific improvement]
   - Solution 2: [Specific improvement]
   - ...

3. **Priority Order**
   - Critical fixes first
   - Performance improvements second
   - Style/readability improvements last

### Concrete Example

**Issue Identified**: A function has six levels of nesting
**Solution**: Extract the nested conditions into a named predicate and use a guard clause
**Implementation**:

```javascript
// Before: deeply nested conditionals
if (a) { if (b) { if (c) { /* ... */ } } }

// After: guard clause with an extracted, named predicate
if (!shouldProcess(a, b, c)) return;
processData();
```

## CODE-SPECIFIC REFLECTION CRITERIA

When the output involves code, additionally evaluate:

### STOP: Library & Existing Solution Check

**BEFORE PROCEEDING WITH CUSTOM CODE:**

1. **Search for Existing Libraries**
   - [ ] Have you searched npm/PyPI/Maven for existing solutions?
   - [ ] Is this a common problem that others have already solved?
   - [ ] Are you reinventing the wheel for utility functions?

   **Common areas to check:**
   - Date/time manipulation → moment.js, date-fns, dayjs
   - Form validation → joi, yup, zod
   - HTTP requests → axios, fetch, got
   - State management → Redux, MobX, Zustand
   - Utility functions → lodash, ramda, underscore

2. **Existing Service/Solution Evaluation**
   - [ ] Could this be handled by an existing service/SaaS?
   - [ ] Is there an open-source solution that fits?
   - [ ] Would a third-party API be more maintainable?

   **Examples:**
   - Authentication → Auth0, Supabase, Firebase Auth
   - Email sending → SendGrid, Mailgun, AWS SES
   - File storage → S3, Cloudinary, Firebase Storage
   - Search → Elasticsearch, Algolia, MeiliSearch
   - Queue/Jobs → Bull, RabbitMQ, AWS SQS

3. **Decision Framework**

   ```
   IF common utility function → Use established library
   ELSE IF complex domain-specific → Check for specialized libraries
   ELSE IF infrastructure concern → Look for managed services
   ELSE → Consider custom implementation
   ```

4. **When Custom Code IS Justified**
   - Specific business logic unique to your domain
   - Performance-critical paths with special requirements
   - When external dependencies would be overkill (e.g., lodash for one function)
   - Security-sensitive code requiring full control
   - When existing solutions don't meet requirements after evaluation

### Real Examples of Library-First Approach

**❌ BAD: Custom Implementation**

```javascript
// utils/dateFormatter.js
function formatDate(date) {
  const d = new Date(date);
  return `${d.getMonth() + 1}/${d.getDate()}/${d.getFullYear()}`;
}
```

**✅ GOOD: Use an Existing Library**

```javascript
import { format } from 'date-fns';

const formatted = format(new Date(), 'MM/dd/yyyy');
```

**❌ BAD: Generic Utilities Folder**

```
/src/utils/
- helpers.js
- common.js
- shared.js
```

**✅ GOOD: Domain-Driven Structure**

```
/src/order/
- domain/OrderCalculator.js
- infrastructure/OrderRepository.js
/src/user/
- domain/UserValidator.js
- application/UserRegistrationService.js
```

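The same library-first principle applies to validation. Here is a minimal sketch assuming the zod library listed above; the schema fields are illustrative only:

```javascript
// A declarative schema replaces ad-hoc checks scattered across the codebase.
import { z } from 'zod';

const UserSchema = z.object({
  email: z.string().email(),
  age: z.number().int().min(0),
});

// parse() throws a descriptive ZodError when the input is invalid.
const user = UserSchema.parse({ email: 'ada@example.com', age: 36 });
```

Validating once at the boundary with a schema keeps business logic free of repetitive defensive checks.
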
### Common Anti-Patterns to Avoid

1. **NIH (Not Invented Here) Syndrome**
   - Building custom auth when Auth0/Supabase exists
   - Writing custom state management instead of using Redux/Zustand
   - Creating custom form validation instead of using Formik/React Hook Form

2. **Poor Architectural Choices**
   - Mixing business logic with UI components
   - Database queries in controllers
   - No clear separation of concerns

3. **Generic Naming Anti-Patterns**
   - `utils.js` with 50 unrelated functions
   - `helpers/misc.js` as a dumping ground
   - `common/shared.js` with unclear purpose

**Remember**: Every line of custom code is a liability that must be maintained, tested, and documented. Use existing solutions whenever possible.

### Architecture and Design

1. **Clean Architecture & DDD Alignment** (see the sketch after this list)
   - [ ] Does naming follow the ubiquitous language of the domain?
   - [ ] Are domain entities separated from infrastructure?
   - [ ] Is business logic independent of frameworks?
   - [ ] Are use cases clearly defined and isolated?

   **Naming Convention Check:**
   - Avoid generic names: `utils`, `helpers`, `common`, `shared`
   - Use domain-specific names: `OrderCalculator`, `UserAuthenticator`
   - Follow bounded-context naming: `Billing.InvoiceGenerator`

2. **Design Patterns**
   - Is the current design pattern appropriate?
   - Could a different pattern simplify the solution?
   - Are SOLID principles being followed?

3. **Modularity**
   - Can the code be broken into smaller, reusable functions?
   - Are responsibilities properly separated?
   - Is there unnecessary coupling between components?
   - Does each module have a single, clear purpose?

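To make the first checklist item concrete, here is a minimal sketch of domain logic kept independent of infrastructure; `OrderCalculator` and the injected repository are illustrative names, not tied to any particular framework:

```javascript
// domain/OrderCalculator.js: pure business logic, no framework imports.
export class OrderCalculator {
  // The repository is injected as an abstraction, so the domain layer
  // never depends on a concrete database or HTTP client.
  constructor(orderRepository) {
    this.orderRepository = orderRepository;
  }

  async totalFor(orderId) {
    const order = await this.orderRepository.findById(orderId);
    // The domain rule lives here, independent of how orders are stored.
    return order.items.reduce(
      (sum, item) => sum + item.price * item.quantity,
      0
    );
  }
}
```

Any repository implementation (SQL-backed in production, in-memory in tests) can be injected, which is what keeps the use case isolated and testable.
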
### Code Quality

1. **Simplification Opportunities**
   - Can any complex logic be simplified?
   - Are there redundant operations?
   - Can loops be replaced with more elegant solutions?

2. **Performance Considerations**
   - Are there obvious performance bottlenecks?
   - Could algorithmic complexity be improved?
   - Are resources being used efficiently?
   - **IMPORTANT**: Any performance claims in comments must be verified

3. **Error Handling** (a sketch follows this list)
   - Are all potential errors properly handled?
   - Is error handling consistent throughout?
   - Are error messages informative?

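As a reference point for the error-handling items, here is a minimal sketch of consistent, informative error handling; `OrderNotFoundError` and `findOrder` are hypothetical names used only for illustration:

```javascript
// A domain-specific error type carries enough context to act on.
class OrderNotFoundError extends Error {
  constructor(orderId) {
    super(`Order not found: ${orderId}`);
    this.name = 'OrderNotFoundError';
    this.orderId = orderId;
  }
}

async function findOrder(repository, orderId) {
  // Validate input at the boundary, not deep inside business logic.
  if (!orderId) throw new TypeError('orderId is required');
  const order = await repository.findById(orderId);
  if (!order) throw new OrderNotFoundError(orderId);
  return order;
}
```
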
### Testing and Validation

1. **Test Coverage**
   - Are all critical paths tested?
   - Missing edge cases to test:
     - Boundary conditions
     - Null/empty inputs
     - Large/extreme values
     - Concurrent access scenarios
   - Are tests meaningful and not written just for coverage?

2. **Test Quality** (see the sketch below)
   - Are tests independent and isolated?
   - Do tests follow the AAA pattern (Arrange, Act, Assert)?
   - Are test names descriptive?

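For reference, a minimal AAA-style test sketch, Jest-flavored; `calculateTotal` is a hypothetical unit under test defined inline to keep the example self-contained:

```javascript
// Hypothetical unit under test.
const calculateTotal = (cart) =>
  cart.items.reduce((sum, item) => sum + item.price, 0);

test('calculateTotal returns 0 for an empty cart', () => {
  // Arrange: set up the input state.
  const cart = { items: [] };

  // Act: exercise the unit under test.
  const total = calculateTotal(cart);

  // Assert: verify the observable result.
  expect(total).toBe(0);
});
```
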
## FACT-CHECKING AND CLAIM VERIFICATION

### Claims Requiring Immediate Verification

1. **Performance Claims**
   - "This is X% faster" → Requires benchmarking
   - "This has O(n) complexity" → Requires analysis proof
   - "This reduces memory usage" → Requires profiling

   **Verification Method**: Run actual benchmarks where possible, or provide algorithmic analysis

2. **Technical Facts**
   - "This API supports..." → Check official documentation
   - "The framework requires..." → Verify with current docs
   - "This library version..." → Confirm version compatibility

   **Verification Method**: Cross-reference with official documentation

3. **Security Assertions**
   - "This is secure against..." → Requires security analysis
   - "This prevents injection..." → Needs proof/testing
   - "This follows OWASP..." → Verify against standards

   **Verification Method**: Reference security standards and test

4. **Best Practice Claims**
   - "It's best practice to..." → Cite an authoritative source
   - "Industry standard is..." → Provide a reference
   - "Most developers prefer..." → Needs data/surveys

   **Verification Method**: Cite specific sources or standards

### Fact-Checking Checklist

- [ ] All performance claims have benchmarks or Big-O analysis
- [ ] Technical specifications match current documentation
- [ ] Security claims are backed by standards or testing
- [ ] Best practices are cited from authoritative sources
- [ ] Version numbers and compatibility are verified
- [ ] Statistical claims have sources or data

### Red Flags Requiring Double-Check

- Absolute statements ("always", "never", "only")
- Superlatives ("best", "fastest", "most secure")
- Specific numbers without context (percentages, metrics)
- Claims about third-party tools/libraries
- Historical or temporal claims ("recently", "nowadays")

### Concrete Example of Fact-Checking

**Claim Made**: "Using Map is 50% faster than using Object for this use case"

**Verification Process**:

1. Search for benchmarks or documentation comparing both approaches (a rough benchmark sketch follows below)
2. Provide algorithmic analysis

**Corrected Statement**: "Map performs better for large collections (10K+ items), while Object is more efficient for small sets (<100 items)"

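A rough micro-benchmark sketch of how such a claim could be sanity-checked before it is stated, assuming a modern Node.js runtime where `performance` is global; the sizes and workload are illustrative, and a dedicated benchmarking harness should be used for real numbers:

```javascript
// Compare Map vs Object lookups. Results vary by engine, key shape,
// and JIT warmup; treat the output as indicative only.
const N = 100000;
const keys = Array.from({ length: N }, (_, i) => `k${i}`);

const obj = {};
const map = new Map();
for (const k of keys) { obj[k] = 1; map.set(k, 1); }

let t = performance.now();
let sum = 0;
for (const k of keys) sum += obj[k];
console.log(`Object lookups: ${(performance.now() - t).toFixed(2)} ms`);

t = performance.now();
sum = 0;
for (const k of keys) sum += map.get(k);
console.log(`Map lookups: ${(performance.now() - t).toFixed(2)} ms`);
console.log('checksum:', sum); // keeps the loops from being optimized away
```
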
## NON-CODE OUTPUT REFLECTION

For documentation, explanations, and analysis outputs:

### Content Quality

1. **Clarity and Structure**
   - Is the information well organized?
   - Are complex concepts explained simply?
   - Is there a logical flow of ideas?

2. **Completeness**
   - Are all aspects of the question addressed?
   - Are examples provided where helpful?
   - Are limitations or caveats mentioned?

3. **Accuracy**
   - Are technical details correct?
   - Are claims verifiable?
   - Are sources or reasoning provided?

### Improvement Triggers for Non-Code Output

- Ambiguous explanations
- Missing context or background
- Overly complex language for the audience
- Lack of concrete examples
- Unsubstantiated claims

## ITERATIVE REFINEMENT WORKFLOW

### Chain of Verification (CoV)

1. **Generate**: Create the initial solution
2. **Verify**: Check each component/claim
3. **Question**: What could go wrong?
4. **Re-answer**: Address the identified issues

### Tree of Thoughts (ToT)

For complex problems, consider multiple approaches:

1. **Branch 1**: Current approach
   - Pros: [List advantages]
   - Cons: [List disadvantages]

2. **Branch 2**: Alternative approach
   - Pros: [List advantages]
   - Cons: [List disadvantages]

3. **Decision**: Choose the best path based on:
   - Simplicity
   - Maintainability
   - Performance
   - Extensibility

## REFINEMENT TRIGGERS

Automatically trigger refinement if any of these conditions are met (see the sketch after this list for a typical fix):

1. **Complexity Threshold**
   - Cyclomatic complexity > 10
   - Nesting depth > 3 levels
   - Function length > 50 lines

2. **Code Smells**
   - Duplicate code blocks
   - Long parameter lists (>4)
   - God classes/functions
   - Magic numbers/strings
   - Generic utility folders (`utils/`, `helpers/`, `common/`)
   - NIH syndrome indicators (custom implementations of standard solutions)

3. **Missing Elements**
   - No error handling
   - No input validation
   - No documentation for complex logic
   - No tests for critical functionality
   - No library search for common problems
   - No consideration of existing services

4. **Architecture Violations**
   - Business logic in controllers/views
   - Domain logic depending on infrastructure
   - Unclear boundaries between contexts
   - Generic naming instead of domain terms

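As one example of acting on these triggers, a long parameter list is often fixed by collapsing the arguments into an options object; `createUser` and its fields are hypothetical names used only for illustration:

```javascript
// Before: positional parameters are easy to pass in the wrong order.
function createUserBefore(name, email, role, isActive, locale) { /* ... */ }

// After: a single options object is self-documenting at the call site.
function createUser({ name, email, role = 'member', isActive = true, locale = 'en' }) {
  // Destructuring with defaults also removes magic values from the body.
  return { name, email, role, isActive, locale };
}

createUser({ name: 'Ada', email: 'ada@example.com' });
```
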
## FINAL VERIFICATION

Before finalizing any output:

### Self-Refine Checklist

- [ ] Have I considered at least one alternative approach?
- [ ] Have I verified my assumptions?
- [ ] Is this the simplest correct solution?
- [ ] Would another developer easily understand this?
- [ ] Have I anticipated likely future requirements?
- [ ] Have all factual claims been verified or sourced?
- [ ] Are performance/security assertions backed by evidence?
- [ ] Did I search for existing libraries before writing custom code?
- [ ] Is the architecture aligned with Clean Architecture/DDD principles?
- [ ] Are names domain-specific rather than generic (utils/helpers)?

### Reflexion Questions

1. **What worked well in this solution?**
2. **What could be improved?**
3. **What would I do differently next time?**
4. **Are there patterns here that could be reused?**

## IMPROVEMENT DIRECTIVE

If after reflection you identify improvements:

1. **STOP** the current implementation
2. **SEARCH** for existing solutions before continuing
   - Check package registries (npm, PyPI, etc.)
   - Research existing services/APIs
   - Review architectural patterns and libraries
3. **DOCUMENT** the improvements needed
   - Why custom vs. library?
   - What architectural pattern fits?
   - How does it align with Clean Architecture/DDD?
4. **IMPLEMENT** the refined solution
5. **RE-EVALUATE** using this framework again

## CONFIDENCE ASSESSMENT

Rate your confidence in the current solution:

- [ ] High (>90%) - Solution is robust and well-tested
- [ ] Medium (70-90%) - Solution works but could be improved
- [ ] Low (<70%) - Significant improvements needed

If your confidence does not meet the threshold set in the TASK COMPLEXITY TRIAGE, iterate again.

## REFINEMENT METRICS

Track the effectiveness of refinements:

### Iteration Count

- First attempt: [Initial solution]
- Iteration 1: [What was improved]
- Iteration 2: [Further improvements]
- Final: [Convergence achieved]

### Quality Indicators

- **Complexity Reduction**: Did refactoring simplify the code?
- **Bug Prevention**: Were potential issues identified and fixed?
- **Performance Gain**: Was efficiency improved?
- **Readability**: Is the final version clearer?

### Learning Points

Document patterns for future use:

- What type of issue was this?
- What solution pattern worked?
- Can this be reused elsewhere?

---

**REMEMBER**: The goal is not perfection on the first try, but continuous improvement through structured reflection. Each iteration should bring the solution closer to optimal.