---
description: Reflect on the previous response and output, based on the Self-Refinement framework for iterative improvement with complexity triage and verification
argument-hint: None required - automatically reviews recent work output
---

# Self-Refinement and Iterative Improvement Framework

Reflect on the previous response and output.

## TASK COMPLEXITY TRIAGE

First, categorize the task to apply the appropriate reflection depth:

### Quick Path (5-second check)

For simple tasks like:

- Single file edits
- Documentation updates
- Simple queries or explanations
- Straightforward bug fixes

→ **Skip to the "Final Verification" section**

### Standard Path (Full reflection)

For tasks involving:

- Multiple file changes
- New feature implementation
- Architecture decisions
- Complex problem solving

→ **Follow the complete framework + require confidence >70%**

### Deep Reflection Path

For critical tasks:

- Core system changes
- Security-related code
- Performance-critical sections
- API design decisions

→ **Follow the framework + require confidence >90%**

## IMMEDIATE REFLECTION PROTOCOL

### Step 1: Initial Assessment

Before proceeding, evaluate your most recent output against these criteria:

1. **Completeness Check**
   - [ ] Does the solution fully address the user's request?
   - [ ] Are all requirements explicitly mentioned by the user covered?
   - [ ] Are there any implicit requirements that should be addressed?

2. **Quality Assessment**
   - [ ] Is the solution at the appropriate level of complexity?
   - [ ] Could the approach be simplified without losing functionality?
   - [ ] Are there obvious improvements that could be made?

3. **Correctness Verification**
   - [ ] Have you verified the logical correctness of your solution?
   - [ ] Are there edge cases that haven't been considered?
   - [ ] Could there be unintended side effects?

4. **Fact-Checking Required**
   - [ ] Have you made any claims about performance? (needs verification)
   - [ ] Have you stated any technical facts? (needs source/verification)
   - [ ] Have you referenced best practices? (needs validation)
   - [ ] Have you made security assertions? (needs careful review)

### Step 2: Decision Point

Based on the assessment above, determine:

**REFINEMENT NEEDED?** [YES/NO]

If YES, proceed to Step 3. If NO, skip to Final Verification.

### Step 3: Refinement Planning

If improvement is needed, generate a specific plan:

1. **Identify Issues** (List specific problems found)
   - Issue 1: [Describe]
   - Issue 2: [Describe]
   - ...

2. **Propose Solutions** (For each issue)
   - Solution 1: [Specific improvement]
   - Solution 2: [Specific improvement]
   - ...

3. **Priority Order**
   - Critical fixes first
   - Performance improvements second
   - Style/readability improvements last

### Concrete Example

**Issue Identified**: A function has six levels of nesting
**Solution**: Extract the nested conditions into a named predicate and use a guard clause
**Implementation**:

```javascript
// Before: deeply nested conditionals
if (a) { if (b) { if (c) { /* ... */ } } }

// After: guard clause with an extracted, named predicate
if (!shouldProcess(a, b, c)) return;
processData();
```

## CODE-SPECIFIC REFLECTION CRITERIA

When the output involves code, additionally evaluate:

### STOP: Library & Existing Solution Check

**BEFORE PROCEEDING WITH CUSTOM CODE:**

1. **Search for Existing Libraries**
   - [ ] Have you searched npm/PyPI/Maven for existing solutions?
   - [ ] Is this a common problem that others have already solved?
   - [ ] Are you reinventing the wheel for utility functions?

   **Common areas to check:**
   - Date/time manipulation → moment.js, date-fns, dayjs
   - Form validation → joi, yup, zod
   - HTTP requests → axios, fetch, got
   - State management → Redux, MobX, Zustand
   - Utility functions → lodash, ramda, underscore

2. **Existing Service/Solution Evaluation**
   - [ ] Could this be handled by an existing service/SaaS?
   - [ ] Is there an open-source solution that fits?
   - [ ] Would a third-party API be more maintainable?

   **Examples:**
   - Authentication → Auth0, Supabase, Firebase Auth
   - Email sending → SendGrid, Mailgun, AWS SES
   - File storage → S3, Cloudinary, Firebase Storage
   - Search → Elasticsearch, Algolia, MeiliSearch
   - Queue/Jobs → Bull, RabbitMQ, AWS SQS

3. **Decision Framework**

   ```
   IF common utility function → Use established library
   ELSE IF complex domain-specific → Check for specialized libraries
   ELSE IF infrastructure concern → Look for managed services
   ELSE → Consider custom implementation
   ```

4. **When Custom Code IS Justified**
   - Specific business logic unique to your domain
   - Performance-critical paths with special requirements
   - When external dependencies would be overkill (e.g., lodash for one function)
   - Security-sensitive code requiring full control
   - When existing solutions don't meet requirements after evaluation

### Real Examples of Library-First Approach

**❌ BAD: Custom Implementation**

```javascript
// utils/dateFormatter.js
function formatDate(date) {
  const d = new Date(date);
  return `${d.getMonth() + 1}/${d.getDate()}/${d.getFullYear()}`;
}
```

**✅ GOOD: Use an Existing Library**

```javascript
import { format } from 'date-fns';

const formatted = format(new Date(), 'MM/dd/yyyy');
```

**❌ BAD: Generic Utilities Folder**

```
/src/utils/
- helpers.js
- common.js
- shared.js
```

**✅ GOOD: Domain-Driven Structure**

```
/src/order/
- domain/OrderCalculator.js
- infrastructure/OrderRepository.js
/src/user/
- domain/UserValidator.js
- application/UserRegistrationService.js
```

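The same library-first principle applies to validation. Here is a minimal sketch assuming the zod library listed above; the schema fields are illustrative only:

```javascript
// A declarative schema replaces ad-hoc checks scattered across the codebase.
import { z } from 'zod';

const UserSchema = z.object({
  email: z.string().email(),
  age: z.number().int().min(0),
});

// parse() throws a descriptive ZodError when the input is invalid.
const user = UserSchema.parse({ email: 'ada@example.com', age: 36 });
```

Validating once at the boundary with a schema keeps business logic free of repetitive defensive checks.
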
### Common Anti-Patterns to Avoid

1. **NIH (Not Invented Here) Syndrome**
   - Building custom auth when Auth0/Supabase exists
   - Writing custom state management instead of using Redux/Zustand
   - Creating custom form validation instead of using Formik/React Hook Form

2. **Poor Architectural Choices**
   - Mixing business logic with UI components
   - Database queries in controllers
   - No clear separation of concerns

3. **Generic Naming Anti-Patterns**
   - `utils.js` with 50 unrelated functions
   - `helpers/misc.js` as a dumping ground
   - `common/shared.js` with unclear purpose

**Remember**: Every line of custom code is a liability that must be maintained, tested, and documented. Use existing solutions whenever possible.

### Architecture and Design

1. **Clean Architecture & DDD Alignment** (see the sketch after this list)
   - [ ] Does naming follow the ubiquitous language of the domain?
   - [ ] Are domain entities separated from infrastructure?
   - [ ] Is business logic independent of frameworks?
   - [ ] Are use cases clearly defined and isolated?

   **Naming Convention Check:**
   - Avoid generic names: `utils`, `helpers`, `common`, `shared`
   - Use domain-specific names: `OrderCalculator`, `UserAuthenticator`
   - Follow bounded-context naming: `Billing.InvoiceGenerator`

2. **Design Patterns**
   - Is the current design pattern appropriate?
   - Could a different pattern simplify the solution?
   - Are SOLID principles being followed?

3. **Modularity**
   - Can the code be broken into smaller, reusable functions?
   - Are responsibilities properly separated?
   - Is there unnecessary coupling between components?
   - Does each module have a single, clear purpose?

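To make the first checklist item concrete, here is a minimal sketch of domain logic kept independent of infrastructure; `OrderCalculator` and the injected repository are illustrative names, not tied to any particular framework:

```javascript
// domain/OrderCalculator.js: pure business logic, no framework imports.
export class OrderCalculator {
  // The repository is injected as an abstraction, so the domain layer
  // never depends on a concrete database or HTTP client.
  constructor(orderRepository) {
    this.orderRepository = orderRepository;
  }

  async totalFor(orderId) {
    const order = await this.orderRepository.findById(orderId);
    // The domain rule lives here, independent of how orders are stored.
    return order.items.reduce(
      (sum, item) => sum + item.price * item.quantity,
      0
    );
  }
}
```

Any repository implementation (SQL-backed in production, in-memory in tests) can be injected, which is what keeps the use case isolated and testable.
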
### Code Quality

1. **Simplification Opportunities**
   - Can any complex logic be simplified?
   - Are there redundant operations?
   - Can loops be replaced with more elegant solutions?

2. **Performance Considerations**
   - Are there obvious performance bottlenecks?
   - Could algorithmic complexity be improved?
   - Are resources being used efficiently?
   - **IMPORTANT**: Any performance claims in comments must be verified

3. **Error Handling** (a sketch follows this list)
   - Are all potential errors properly handled?
   - Is error handling consistent throughout?
   - Are error messages informative?

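As a reference point for the error-handling items, here is a minimal sketch of consistent, informative error handling; `OrderNotFoundError` and `findOrder` are hypothetical names used only for illustration:

```javascript
// A domain-specific error type carries enough context to act on.
class OrderNotFoundError extends Error {
  constructor(orderId) {
    super(`Order not found: ${orderId}`);
    this.name = 'OrderNotFoundError';
    this.orderId = orderId;
  }
}

async function findOrder(repository, orderId) {
  // Validate input at the boundary, not deep inside business logic.
  if (!orderId) throw new TypeError('orderId is required');
  const order = await repository.findById(orderId);
  if (!order) throw new OrderNotFoundError(orderId);
  return order;
}
```
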
### Testing and Validation

1. **Test Coverage**
   - Are all critical paths tested?
   - Missing edge cases to test:
     - Boundary conditions
     - Null/empty inputs
     - Large/extreme values
     - Concurrent access scenarios
   - Are tests meaningful and not written just for coverage?

2. **Test Quality** (see the sketch below)
   - Are tests independent and isolated?
   - Do tests follow the AAA pattern (Arrange, Act, Assert)?
   - Are test names descriptive?

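For reference, a minimal AAA-style test sketch, Jest-flavored; `calculateTotal` is a hypothetical unit under test defined inline to keep the example self-contained:

```javascript
// Hypothetical unit under test.
const calculateTotal = (cart) =>
  cart.items.reduce((sum, item) => sum + item.price, 0);

test('calculateTotal returns 0 for an empty cart', () => {
  // Arrange: set up the input state.
  const cart = { items: [] };

  // Act: exercise the unit under test.
  const total = calculateTotal(cart);

  // Assert: verify the observable result.
  expect(total).toBe(0);
});
```
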
## FACT-CHECKING AND CLAIM VERIFICATION

### Claims Requiring Immediate Verification

1. **Performance Claims**
   - "This is X% faster" → Requires benchmarking
   - "This has O(n) complexity" → Requires analysis proof
   - "This reduces memory usage" → Requires profiling

   **Verification Method**: Run actual benchmarks where possible, or provide algorithmic analysis

2. **Technical Facts**
   - "This API supports..." → Check official documentation
   - "The framework requires..." → Verify with current docs
   - "This library version..." → Confirm version compatibility

   **Verification Method**: Cross-reference with official documentation

3. **Security Assertions**
   - "This is secure against..." → Requires security analysis
   - "This prevents injection..." → Needs proof/testing
   - "This follows OWASP..." → Verify against standards

   **Verification Method**: Reference security standards and test

4. **Best Practice Claims**
   - "It's best practice to..." → Cite an authoritative source
   - "Industry standard is..." → Provide a reference
   - "Most developers prefer..." → Needs data/surveys

   **Verification Method**: Cite specific sources or standards

### Fact-Checking Checklist

- [ ] All performance claims have benchmarks or Big-O analysis
- [ ] Technical specifications match current documentation
- [ ] Security claims are backed by standards or testing
- [ ] Best practices are cited from authoritative sources
- [ ] Version numbers and compatibility are verified
- [ ] Statistical claims have sources or data

### Red Flags Requiring Double-Check

- Absolute statements ("always", "never", "only")
- Superlatives ("best", "fastest", "most secure")
- Specific numbers without context (percentages, metrics)
- Claims about third-party tools/libraries
- Historical or temporal claims ("recently", "nowadays")

### Concrete Example of Fact-Checking

**Claim Made**: "Using Map is 50% faster than using Object for this use case"

**Verification Process**:

1. Search for benchmarks or documentation comparing both approaches (a rough benchmark sketch follows below)
2. Provide algorithmic analysis

**Corrected Statement**: "Map performs better for large collections (10K+ items), while Object is more efficient for small sets (<100 items)"

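A rough micro-benchmark sketch of how such a claim could be sanity-checked before it is stated, assuming a modern Node.js runtime where `performance` is global; the sizes and workload are illustrative, and a dedicated benchmarking harness should be used for real numbers:

```javascript
// Compare Map vs Object lookups. Results vary by engine, key shape,
// and JIT warmup; treat the output as indicative only.
const N = 100000;
const keys = Array.from({ length: N }, (_, i) => `k${i}`);

const obj = {};
const map = new Map();
for (const k of keys) { obj[k] = 1; map.set(k, 1); }

let t = performance.now();
let sum = 0;
for (const k of keys) sum += obj[k];
console.log(`Object lookups: ${(performance.now() - t).toFixed(2)} ms`);

t = performance.now();
sum = 0;
for (const k of keys) sum += map.get(k);
console.log(`Map lookups: ${(performance.now() - t).toFixed(2)} ms`);
console.log('checksum:', sum); // keeps the loops from being optimized away
```
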
## NON-CODE OUTPUT REFLECTION

For documentation, explanations, and analysis outputs:

### Content Quality

1. **Clarity and Structure**
   - Is the information well organized?
   - Are complex concepts explained simply?
   - Is there a logical flow of ideas?

2. **Completeness**
   - Are all aspects of the question addressed?
   - Are examples provided where helpful?
   - Are limitations or caveats mentioned?

3. **Accuracy**
   - Are technical details correct?
   - Are claims verifiable?
   - Are sources or reasoning provided?

### Improvement Triggers for Non-Code Output

- Ambiguous explanations
- Missing context or background
- Overly complex language for the audience
- Lack of concrete examples
- Unsubstantiated claims

## ITERATIVE REFINEMENT WORKFLOW

### Chain of Verification (CoV)

1. **Generate**: Create the initial solution
2. **Verify**: Check each component/claim
3. **Question**: What could go wrong?
4. **Re-answer**: Address the identified issues

### Tree of Thoughts (ToT)

For complex problems, consider multiple approaches:

1. **Branch 1**: Current approach
   - Pros: [List advantages]
   - Cons: [List disadvantages]

2. **Branch 2**: Alternative approach
   - Pros: [List advantages]
   - Cons: [List disadvantages]

3. **Decision**: Choose the best path based on:
   - Simplicity
   - Maintainability
   - Performance
   - Extensibility

## REFINEMENT TRIGGERS

Automatically trigger refinement if any of these conditions are met (see the sketch after this list for a typical fix):

1. **Complexity Threshold**
   - Cyclomatic complexity > 10
   - Nesting depth > 3 levels
   - Function length > 50 lines

2. **Code Smells**
   - Duplicate code blocks
   - Long parameter lists (>4)
   - God classes/functions
   - Magic numbers/strings
   - Generic utility folders (`utils/`, `helpers/`, `common/`)
   - NIH syndrome indicators (custom implementations of standard solutions)

3. **Missing Elements**
   - No error handling
   - No input validation
   - No documentation for complex logic
   - No tests for critical functionality
   - No library search for common problems
   - No consideration of existing services

4. **Architecture Violations**
   - Business logic in controllers/views
   - Domain logic depending on infrastructure
   - Unclear boundaries between contexts
   - Generic naming instead of domain terms

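As one example of acting on these triggers, a long parameter list is often fixed by collapsing the arguments into an options object; `createUser` and its fields are hypothetical names used only for illustration:

```javascript
// Before: positional parameters are easy to pass in the wrong order.
function createUserBefore(name, email, role, isActive, locale) { /* ... */ }

// After: a single options object is self-documenting at the call site.
function createUser({ name, email, role = 'member', isActive = true, locale = 'en' }) {
  // Destructuring with defaults also removes magic values from the body.
  return { name, email, role, isActive, locale };
}

createUser({ name: 'Ada', email: 'ada@example.com' });
```
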
## FINAL VERIFICATION

Before finalizing any output:

### Self-Refine Checklist

- [ ] Have I considered at least one alternative approach?
- [ ] Have I verified my assumptions?
- [ ] Is this the simplest correct solution?
- [ ] Would another developer easily understand this?
- [ ] Have I anticipated likely future requirements?
- [ ] Have all factual claims been verified or sourced?
- [ ] Are performance/security assertions backed by evidence?
- [ ] Did I search for existing libraries before writing custom code?
- [ ] Is the architecture aligned with Clean Architecture/DDD principles?
- [ ] Are names domain-specific rather than generic (utils/helpers)?

### Reflexion Questions

1. **What worked well in this solution?**
2. **What could be improved?**
3. **What would I do differently next time?**
4. **Are there patterns here that could be reused?**

## IMPROVEMENT DIRECTIVE

If after reflection you identify improvements:

1. **STOP** the current implementation
2. **SEARCH** for existing solutions before continuing
   - Check package registries (npm, PyPI, etc.)
   - Research existing services/APIs
   - Review architectural patterns and libraries
3. **DOCUMENT** the improvements needed
   - Why custom vs. library?
   - What architectural pattern fits?
   - How does it align with Clean Architecture/DDD?
4. **IMPLEMENT** the refined solution
5. **RE-EVALUATE** using this framework again

## CONFIDENCE ASSESSMENT

Rate your confidence in the current solution:

- [ ] High (>90%) - Solution is robust and well-tested
- [ ] Medium (70-90%) - Solution works but could be improved
- [ ] Low (<70%) - Significant improvements needed

If your confidence does not meet the threshold set in the TASK COMPLEXITY TRIAGE, iterate again.

## REFINEMENT METRICS

Track the effectiveness of refinements:

### Iteration Count

- First attempt: [Initial solution]
- Iteration 1: [What was improved]
- Iteration 2: [Further improvements]
- Final: [Convergence achieved]

### Quality Indicators

- **Complexity Reduction**: Did refactoring simplify the code?
- **Bug Prevention**: Were potential issues identified and fixed?
- **Performance Gain**: Was efficiency improved?
- **Readability**: Is the final version clearer?

### Learning Points

Document patterns for future use:

- What type of issue was this?
- What solution pattern worked?
- Can this be reused elsewhere?

---

**REMEMBER**: The goal is not perfection on the first try, but continuous improvement through structured reflection. Each iteration should bring the solution closer to optimal.