description: Reflect on the previous response and output, based on a self-refinement framework for iterative improvement with complexity triage and verification
argument-hint: None required - automatically reviews recent work output

Self-Refinement and Iterative Improvement Framework

Reflect on the previous response and output.

TASK COMPLEXITY TRIAGE

First, categorize the task to apply appropriate reflection depth:

Quick Path (5-second check)

For simple tasks like:

  • Single file edits
  • Documentation updates
  • Simple queries or explanations
  • Straightforward bug fixes

Skip to the "Final Verification" section

Standard Path (Full reflection)

For tasks involving:

  • Multiple file changes
  • New feature implementation
  • Architecture decisions
  • Complex problem solving

Follow complete framework + require confidence >70%

Deep Reflection Path

For critical tasks:

  • Core system changes
  • Security-related code
  • Performance-critical sections
  • API design decisions

Follow framework + require confidence >90%

IMMEDIATE REFLECTION PROTOCOL

Step 1: Initial Assessment

Before proceeding, evaluate your most recent output against these criteria:

  1. Completeness Check

    • Does the solution fully address the user's request?
    • Are all requirements explicitly mentioned by the user covered?
    • Are there any implicit requirements that should be addressed?
  2. Quality Assessment

    • Is the solution at the appropriate level of complexity?
    • Could the approach be simplified without losing functionality?
    • Are there obvious improvements that could be made?
  3. Correctness Verification

    • Have you verified the logical correctness of your solution?
    • Are there edge cases that haven't been considered?
    • Could there be unintended side effects?
  4. Fact-Checking Required

    • Have you made any claims about performance? (needs verification)
    • Have you stated any technical facts? (needs source/verification)
    • Have you referenced best practices? (needs validation)
    • Have you made security assertions? (needs careful review)

Step 2: Decision Point

Based on the assessment above, determine:

REFINEMENT NEEDED? [YES/NO]

If YES, proceed to Step 3. If NO, skip to Final Verification.

Step 3: Refinement Planning

If improvement is needed, generate a specific plan:

  1. Identify Issues (List specific problems found)

    • Issue 1: [Describe]
    • Issue 2: [Describe]
    • ...
  2. Propose Solutions (For each issue)

    • Solution 1: [Specific improvement]
    • Solution 2: [Specific improvement]
    • ...
  3. Priority Order

    • Critical fixes first
    • Performance improvements second
    • Style/readability improvements last

Concrete Example

Issue Identified: Function has 6 levels of nesting
Solution: Extract nested logic into separate functions
Implementation:

Before: if (a) { if (b) { if (c) { ... } } }
After: if (!shouldProcess(a, b, c)) return;
       processData();
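
A slightly fuller sketch of the same refactor; shouldProcess and processData are illustrative names, not taken from any particular codebase:

// Guard clause replaces the nested conditionals
function handleRequest(a, b, c) {
  if (!shouldProcess(a, b, c)) return;
  processData();
}

// The extracted predicate keeps the intent readable in one place
function shouldProcess(a, b, c) {
  return Boolean(a) && Boolean(b) && Boolean(c);
}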

CODE-SPECIFIC REFLECTION CRITERIA

When the output involves code, additionally evaluate:

STOP: Library & Existing Solution Check

BEFORE PROCEEDING WITH CUSTOM CODE:

  1. Search for Existing Libraries

    • Have you searched npm/PyPI/Maven for existing solutions?
    • Is this a common problem that others have already solved?
    • Are you reinventing the wheel for utility functions?

    Common areas to check:

    • Date/time manipulation → moment.js, date-fns, dayjs
    • Form validation → joi, yup, zod
    • HTTP requests → axios, fetch, got
    • State management → Redux, MobX, Zustand
    • Utility functions → lodash, ramda, underscore
  2. Existing Service/Solution Evaluation

    • Could this be handled by an existing service/SaaS?
    • Is there an open-source solution that fits?
    • Would a third-party API be more maintainable?

    Examples:

    • Authentication → Auth0, Supabase, Firebase Auth
    • Email sending → SendGrid, Mailgun, AWS SES
    • File storage → S3, Cloudinary, Firebase Storage
    • Search → Elasticsearch, Algolia, MeiliSearch
    • Queue/Jobs → Bull, RabbitMQ, AWS SQS
  3. Decision Framework

    IF common utility function → Use established library
    ELSE IF complex domain-specific → Check for specialized libraries
    ELSE IF infrastructure concern → Look for managed services
    ELSE → Consider custom implementation
    
  4. When Custom Code IS Justified

    • Specific business logic unique to your domain
    • Performance-critical paths with special requirements
    • When external dependencies would be overkill (e.g., lodash for one function)
    • Security-sensitive code requiring full control
    • When existing solutions don't meet requirements after evaluation
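
As a small illustration of the "overkill" case above, pulling in a utility library for a single call is rarely worth the dependency when a native equivalent exists (the values array is illustrative):

// With a dependency: import uniq from 'lodash/uniq'; const unique = uniq(values);
// Native equivalent, no dependency needed:
const unique = [...new Set(values)];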

Real Examples of Library-First Approach

BAD: Custom Implementation

// utils/dateFormatter.js
function formatDate(date) {
  const d = new Date(date);
  return `${d.getMonth()+1}/${d.getDate()}/${d.getFullYear()}`;
}

GOOD: Use Existing Library

import { format } from 'date-fns';
const formatted = format(new Date(), 'MM/dd/yyyy');

BAD: Generic Utilities Folder

/src/utils/
  - helpers.js
  - common.js
  - shared.js

GOOD: Domain-Driven Structure

/src/order/
  - domain/OrderCalculator.js
  - infrastructure/OrderRepository.js
/src/user/
  - domain/UserValidator.js
  - application/UserRegistrationService.js
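
A minimal sketch of what the domain side of that layout might contain; the fields and pricing rule are purely illustrative:

// src/order/domain/OrderCalculator.js
// Pure domain logic: no framework, database, or HTTP imports
export class OrderCalculator {
  // items: [{ unitPrice, quantity }], discountRate between 0 and 1
  calculateTotal(items, discountRate = 0) {
    const subtotal = items.reduce(
      (sum, item) => sum + item.unitPrice * item.quantity,
      0
    );
    return subtotal * (1 - discountRate);
  }
}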

Common Anti-Patterns to Avoid

  1. NIH (Not Invented Here) Syndrome

    • Building custom auth when Auth0/Supabase exists
    • Writing custom state management instead of using Redux/Zustand
    • Creating custom form validation instead of using Formik/React Hook Form
  2. Poor Architectural Choices

    • Mixing business logic with UI components
    • Database queries in controllers
    • No clear separation of concerns
  3. Generic Naming Anti-Patterns

    • utils.js with 50 unrelated functions
    • helpers/misc.js as a dumping ground
    • common/shared.js with unclear purpose

Remember: Every line of custom code is a liability that needs to be maintained, tested, and documented. Use existing solutions whenever possible.

Architecture and Design

  1. Clean Architecture & DDD Alignment

    • Does naming follow ubiquitous language of the domain?
    • Are domain entities separated from infrastructure?
    • Is business logic independent of frameworks?
    • Are use cases clearly defined and isolated?

    Naming Convention Check:

    • Avoid generic names: utils, helpers, common, shared
    • Use domain-specific names: OrderCalculator, UserAuthenticator
    • Follow bounded context naming: Billing.InvoiceGenerator
  2. Design Patterns

    • Is the current design pattern appropriate?
    • Could a different pattern simplify the solution?
    • Are SOLID principles being followed?
  3. Modularity

    • Can the code be broken into smaller, reusable functions?
    • Are responsibilities properly separated?
    • Is there unnecessary coupling between components?
    • Does each module have a single, clear purpose?
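
To make that separation concrete, here is a minimal sketch of a use case that depends on an abstraction rather than on infrastructure directly; the class and method names are illustrative:

// src/order/application/PlaceOrderService.js (use case, framework-agnostic)
export class PlaceOrderService {
  // Any object exposing save(order) will do; the concrete repository is injected
  constructor(orderRepository) {
    this.orderRepository = orderRepository;
  }

  async execute(order) {
    // Business rules live here, not in a controller or in the repository
    if (order.items.length === 0) {
      throw new Error('Cannot place an empty order');
    }
    return this.orderRepository.save(order);
  }
}

// src/order/infrastructure/OrderRepository.js would implement save() against the database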

Code Quality

  1. Simplification Opportunities

    • Can any complex logic be simplified?
    • Are there redundant operations?
    • Can loops be replaced with more elegant solutions?
  2. Performance Considerations

    • Are there obvious performance bottlenecks?
    • Could algorithmic complexity be improved?
    • Are resources being used efficiently?
    • IMPORTANT: Any performance claims in comments must be verified
  3. Error Handling

    • Are all potential errors properly handled?
    • Is error handling consistent throughout?
    • Are error messages informative?
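
A brief sketch of the difference between swallowing an error and handling it informatively; loadConfig and the path variable are illustrative, and the { cause } option requires a reasonably recent JavaScript runtime:

// Uninformative: the original failure and its context are lost
try {
  loadConfig(path);
} catch (e) {
  console.log('error');
}

// Informative: adds context and preserves the underlying cause
try {
  loadConfig(path);
} catch (cause) {
  throw new Error(`Failed to load configuration from ${path}`, { cause });
}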

Testing and Validation

  1. Test Coverage

    • Are all critical paths tested?
    • Missing edge cases to test:
      • Boundary conditions
      • Null/empty inputs
      • Large/extreme values
      • Concurrent access scenarios
    • Are tests meaningful and not just for coverage?
  2. Test Quality

    • Are tests independent and isolated?
    • Do tests follow AAA pattern (Arrange, Act, Assert)?
    • Are test names descriptive?
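
A short sketch of the AAA pattern with a descriptive test name, written in Jest-style syntax and reusing the illustrative OrderCalculator from earlier:

test('calculateTotal applies the discount to the item subtotal', () => {
  // Arrange
  const calculator = new OrderCalculator();
  const items = [{ unitPrice: 10, quantity: 2 }];

  // Act
  const total = calculator.calculateTotal(items, 0.5);

  // Assert
  expect(total).toBe(10);
});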

FACT-CHECKING AND CLAIM VERIFICATION

Claims Requiring Immediate Verification

  1. Performance Claims

    • "This is X% faster" → Requires benchmarking
    • "This has O(n) complexity" → Requires analysis proof
    • "This reduces memory usage" → Requires profiling

    Verification Method: Run actual benchmarks where possible, or provide algorithmic analysis

  2. Technical Facts

    • "This API supports..." → Check official documentation
    • "The framework requires..." → Verify with current docs
    • "This library version..." → Confirm version compatibility

    Verification Method: Cross-reference with official documentation

  3. Security Assertions

    • "This is secure against..." → Requires security analysis
    • "This prevents injection..." → Needs proof/testing
    • "This follows OWASP..." → Verify against standards

    Verification Method: Reference security standards and test

  4. Best Practice Claims

    • "It's best practice to..." → Cite authoritative source
    • "Industry standard is..." → Provide reference
    • "Most developers prefer..." → Need data/surveys

    Verification Method: Cite specific sources or standards

Fact-Checking Checklist

  • All performance claims have benchmarks or Big-O analysis
  • Technical specifications match current documentation
  • Security claims are backed by standards or testing
  • Best practices are cited from authoritative sources
  • Version numbers and compatibility are verified
  • Statistical claims have sources or data

Red Flags Requiring Double-Check

  • Absolute statements ("always", "never", "only")
  • Superlatives ("best", "fastest", "most secure")
  • Specific numbers without context (percentages, metrics)
  • Claims about third-party tools/libraries
  • Historical or temporal claims ("recently", "nowadays")

Concrete Example of Fact-Checking

Claim Made: "Using Map is 50% faster than using Object for this use case"

Verification Process:

  1. Search for benchmarks or documentation comparing both approaches
  2. Provide algorithmic analysis

Corrected Statement: "Map performs better for large collections (10K+ items), while Object is more efficient for small sets (<100 items)"
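
If a benchmark is warranted, even a rough micro-benchmark beats an unverified number; results vary by runtime, data shape, and size, so treat the following only as a sketch:

const N = 100_000;
const keys = Array.from({ length: N }, (_, i) => `key${i}`);

// Object lookups
const obj = Object.fromEntries(keys.map((k) => [k, true]));
let start = performance.now();
for (const k of keys) obj[k];
console.log(`Object: ${(performance.now() - start).toFixed(2)} ms`);

// Map lookups
const map = new Map(keys.map((k) => [k, true]));
start = performance.now();
for (const k of keys) map.get(k);
console.log(`Map: ${(performance.now() - start).toFixed(2)} ms`);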

NON-CODE OUTPUT REFLECTION

For documentation, explanations, and analysis outputs:

Content Quality

  1. Clarity and Structure

    • Is the information well-organized?
    • Are complex concepts explained simply?
    • Is there a logical flow of ideas?
  2. Completeness

    • Are all aspects of the question addressed?
    • Are examples provided where helpful?
    • Are limitations or caveats mentioned?
  3. Accuracy

    • Are technical details correct?
    • Are claims verifiable?
    • Are sources or reasoning provided?

Improvement Triggers for Non-Code

  • Ambiguous explanations
  • Missing context or background
  • Overly complex language for the audience
  • Lack of concrete examples
  • Unsubstantiated claims

ITERATIVE REFINEMENT WORKFLOW

Chain of Verification (CoV)

  1. Generate: Create initial solution
  2. Verify: Check each component/claim
  3. Question: What could go wrong?
  4. Re-answer: Address identified issues

Tree of Thoughts (ToT)

For complex problems, consider multiple approaches:

  1. Branch 1: Current approach

    • Pros: [List advantages]
    • Cons: [List disadvantages]
  2. Branch 2: Alternative approach

    • Pros: [List advantages]
    • Cons: [List disadvantages]
  3. Decision: Choose best path based on:

    • Simplicity
    • Maintainability
    • Performance
    • Extensibility

REFINEMENT TRIGGERS

Automatically trigger refinement if any of these conditions are met:

  1. Complexity Threshold

    • Cyclomatic complexity > 10
    • Nested depth > 3 levels
    • Function length > 50 lines
  2. Code Smells

    • Duplicate code blocks
    • Long parameter lists (>4)
    • God classes/functions
    • Magic numbers/strings
    • Generic utility folders (utils/, helpers/, common/)
    • NIH syndrome indicators (custom implementations of standard solutions)
  3. Missing Elements

    • No error handling
    • No input validation
    • No documentation for complex logic
    • No tests for critical functionality
    • No library search for common problems
    • No consideration of existing services
  4. Architecture Violations

    • Business logic in controllers/views
    • Domain logic depending on infrastructure
    • Unclear boundaries between contexts
    • Generic naming instead of domain terms
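
Two of these triggers often appear together; here is a before/after sketch of a long parameter list and a magic number being refactored (function and constant names are illustrative):

// Before: six positional parameters and a magic number
function createUser(name, email, age, country, newsletter, referrer) {
  if (age < 13) throw new Error('User is too young to register');
}

// After: options object and a named constant
const MINIMUM_USER_AGE = 13;

function createUser({ name, email, age, country, newsletter, referrer }) {
  if (age < MINIMUM_USER_AGE) throw new Error('User is too young to register');
}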

FINAL VERIFICATION

Before finalizing any output:

Self-Refine Checklist

  • Have I considered at least one alternative approach?
  • Have I verified my assumptions?
  • Is this the simplest correct solution?
  • Would another developer easily understand this?
  • Have I anticipated likely future requirements?
  • Have all factual claims been verified or sourced?
  • Are performance/security assertions backed by evidence?
  • Did I search for existing libraries before writing custom code?
  • Is the architecture aligned with Clean Architecture/DDD principles?
  • Are names domain-specific rather than generic (utils/helpers)?

Reflexion Questions

  1. What worked well in this solution?
  2. What could be improved?
  3. What would I do differently next time?
  4. Are there patterns here that could be reused?

IMPROVEMENT DIRECTIVE

If after reflection you identify improvements:

  1. STOP current implementation
  2. SEARCH for existing solutions before continuing
    • Check package registries (npm, PyPI, etc.)
    • Research existing services/APIs
    • Review architectural patterns and libraries
  3. DOCUMENT the improvements needed
    • Why custom vs library?
    • What architectural pattern fits?
    • How does it align with Clean Architecture/DDD?
  4. IMPLEMENT the refined solution
  5. RE-EVALUATE using this framework again

CONFIDENCE ASSESSMENT

Rate your confidence in the current solution:

  • High (>90%) - Solution is robust and well-tested
  • Medium (70-90%) - Solution works but could be improved
  • Low (<70%) - Significant improvements needed

If confidence does not meet the threshold required by the TASK COMPLEXITY TRIAGE, iterate again.

REFINEMENT METRICS

Track the effectiveness of refinements:

Iteration Count

  • First attempt: [Initial solution]
  • Iteration 1: [What was improved]
  • Iteration 2: [Further improvements]
  • Final: [Convergence achieved]

Quality Indicators

  • Complexity Reduction: Did refactoring simplify the code?
  • Bug Prevention: Were potential issues identified and fixed?
  • Performance Gain: Was efficiency improved?
  • Readability Score: Is the final version clearer?

Learning Points

Document patterns for future use:

  • What type of issue was this?
  • What solution pattern worked?
  • Can this be reused elsewhere?

REMEMBER: The goal is not perfection on the first try, but continuous improvement through structured reflection. Each iteration should bring the solution closer to optimal.