Initial commit

2025-11-29 18:28:34 +08:00
commit 390afca02b
220 changed files with 86013 additions and 0 deletions
--- a/skills/prompt-engineering/references/cot-patterns.md
+++ b/skills/prompt-engineering/references/cot-patterns.md
@@ -0,0 +1,426 @@
+# Chain-of-Thought Reasoning Patterns
+
+This reference provides comprehensive frameworks for implementing effective chain-of-thought (CoT) reasoning that improves model performance on complex, multi-step problems.
+
+## Core Principles
+
+### Step-by-Step Reasoning Elicitation
+
+#### Problem Decomposition Strategy
+- Break complex problems into manageable sub-problems
+- Identify dependencies and relationships between components
+- Establish logical flow and sequence of reasoning steps
+- Define clear decision points and validation criteria
+
+#### Verification and Validation Integration
+- Include self-checking mechanisms at critical junctures
+- Implement consistency checks across reasoning steps
+- Add confidence scoring for uncertain conclusions
+- Provide fallback strategies for ambiguous situations
+
+## Zero-Shot Chain-of-Thought Patterns
+
+### Basic CoT Initiation
+```
+Let's think step by step to solve this problem:
+
+1. First, I need to understand what the question is asking for
+2. Then, I'll identify the key information and constraints
+3. Next, I'll consider different approaches to solve it
+4. I'll work through the solution methodically
+5. Finally, I'll verify my answer makes sense
+
+Problem: {problem_statement}
+
+Step 1: Understanding the question
+{analysis}
+
+Step 2: Key information and constraints
+{information_analysis}
+
+Step 3: Solution approach
+{approach_analysis}
+
+Step 4: Working through the solution
+{detailed_solution}
+
+Step 5: Verification
+{verification}
+
+Final Answer: {conclusion}
+```
+
+### Enhanced CoT with Confidence
+```
+Let me think through this systematically, breaking down the problem and checking my reasoning at each step.
+
+**Problem**: {problem_description}
+
+**Step 1: Problem Analysis**
+- What am I being asked to solve?
+- What information is provided?
+- What are the constraints?
+- My confidence in understanding: {score}/10
+
+**Step 2: Strategy Selection**
+- Possible approaches:
+  1. {approach_1}
+  2. {approach_2}
+  3. {approach_3}
+- Selected approach: {chosen_approach}
+- Rationale: {reasoning_for_choice}
+
+**Step 3: Execution**
+- {detailed_step_by_step_solution}
+
+**Step 4: Verification**
+- Does the answer make sense?
+- Have I addressed all parts of the question?
+- Confidence in final answer: {score}/10
+
+**Final Answer**: {solution_with_confidence_score}
+```
+
+## Few-Shot Chain-of-Thought Patterns
+
+### Mathematical Reasoning Template
+```
+Solve the following math problem step by step.
+
+Example 1:
+Problem: John buys 4 apples and 2 oranges for $14. Mary buys 2 apples and 3 oranges for $13. How much does each fruit cost?
+
+Step 1: Set up the equations
+Let a = cost of an apple, o = cost of an orange
+4a + 2o = 14
+2a + 3o = 13
+
+Step 2: Simplify the first equation
+Divide both sides by 2: 2a + o = 7, so o = 7 - 2a
+
+Step 3: Substitute into the second equation
+2a + 3(7 - 2a) = 13
+2a + 21 - 6a = 13
+-4a = -8
+a = 2, so o = 7 - 2(2) = 3
+
+Step 4: Verify against the original problem
+John: 4(2) + 2(3) = 8 + 6 = 14 ✓
+Mary: 2(2) + 3(3) = 4 + 9 = 13 ✓
+
+Final Answer: Apples cost $2 each and oranges cost $3 each.
+
+Example 2:
+Problem: Sarah is twice as old as Tom was 3 years ago. In 5 years, Sarah will be 1.5 times as old as Tom will be then. How old are they now?
+
+Step 1: Define variables
+Let S = Sarah's current age, T = Tom's current age
+
+Step 2: Set up equations
+"Sarah is twice as old as Tom was 3 years ago": S = 2(T - 3)
+"In 5 years, Sarah will be 1.5 times as old as Tom will be then": S + 5 = 1.5(T + 5)
+
+Step 3: Solve the system
+From first equation: S = 2T - 6
+Substitute into second: (2T - 6) + 5 = 1.5T + 7.5
+2T - 1 = 1.5T + 7.5
+0.5T = 8.5
+T = 17
+
+Step 4: Find Sarah's age
+S = 2(17) - 6 = 34 - 6 = 28
+
+Step 5: Verify the solution
+Tom was 14 three years ago, and 2(14) = 28 ✓
+In 5 years Sarah will be 33 and Tom will be 22, and 1.5(22) = 33 ✓
+Both ages are positive, so the solution is plausible.
+
+Final Answer: Tom is 17 and Sarah is 28.
+
+Now solve:
+Problem: {math_problem}
+
+Step 1: Define variables
+{variable_definitions}
+
+Step 2: Set up equations
+{equation_setup}
+
+Step 3: Solve the system
+{solution_process}
+
+Step 4: Verify the solution
+{verification}
+
+Final Answer: {answer}
+```
+
+### Logical Reasoning Template
+```
+Analyze the logical argument and determine if it's valid.
+
+Example 1:
+Premise 1: All birds can fly
+Premise 2: Penguins are birds
+Conclusion: Therefore, penguins can fly
+
+Step 1: Analyze the structure
+This is a syllogism with form:
+All A are B
+C is A
+Therefore, C is B
+
+Step 2: Evaluate premise validity
+Premise 1: "All birds can fly" - This is false (penguins, ostriches cannot fly)
+Premise 2: "Penguins are birds" - This is true
+
+Step 3: Check logical validity
+The logical structure is valid, but since Premise 1 is false, the conclusion may not be true
+
+Step 4: Real-world verification
+In reality, penguins cannot fly despite being birds
+
+Conclusion: The argument is logically valid but unsound, because Premise 1 is false
+
+Example 2:
+Premise 1: If it rains, then the ground gets wet
+Premise 2: It is raining
+Conclusion: Therefore, the ground gets wet
+
+Step 1: Analyze the structure
+This is modus ponens:
+If P, then Q
+P
+Therefore, Q
+
+Step 2: Evaluate premise validity
+Premise 1: "If it rains, then the ground gets wet" - Generally true
+Premise 2: "It is raining" - Given as true
+
+Step 3: Check logical validity
+Modus ponens is a valid argument form
+
+Step 4: Verify the conclusion
+Given the premises, the conclusion follows logically
+
+Conclusion: The argument is both logically valid and sound
+
+Now analyze:
+Argument: {logical_argument}
+
+Step 1: Analyze the argument structure
+{structure_analysis}
+
+Step 2: Evaluate premise validity
+{premise_evaluation}
+
+Step 3: Check logical validity
+{validity_check}
+
+Step 4: Verify the conclusion
+{conclusion_verification}
+
+Final Assessment: {argument_validity_assessment}
+```
+
+## Self-Consistency Techniques
+
+### Multiple Reasoning Paths
+```
+I'll solve this problem using three different approaches and see which result is most reliable.
+
+**Problem**: {complex_problem}
+
+**Approach 1: Direct Calculation**
+{first_approach_reasoning}
+Result 1: {result_1}
+
+**Approach 2: Logical Deduction**
+{second_approach_reasoning}
+Result 2: {result_2}
+
+**Approach 3: Pattern Recognition**
+{third_approach_reasoning}
+Result 3: {result_3}
+
+**Consistency Analysis:**
+- Approach 1 and 2 agree: {yes/no}
+- Approach 1 and 3 agree: {yes/no}
+- Approach 2 and 3 agree: {yes/no}
+
+**Final Decision:**
+{majority_result} appears in {count} out of 3 approaches.
+Confidence: {high/medium/low}
+
+Most Likely Answer: {final_answer_with_confidence}
+```
+
+### Verification Loop Pattern
+```
+Let me solve this step by step and verify each step.
+
+**Problem**: {problem_description}
+
+**Step 1: Initial Analysis**
+{initial_analysis}
+
+Verification: Does this make sense? {verification_1}
+
+**Step 2: Solution Development**
+{solution_development}
+
+Verification: Does this logically follow from step 1? {verification_2}
+
+**Step 3: Result Calculation**
+{result_calculation}
+
+Verification: Does this answer the original question? {verification_3}
+
+**Step 4: Cross-Check**
+Let me try a different approach to confirm:
+{alternative_approach}
+
+Results comparison: {comparison_analysis}
+
+**Final Answer:**
+{conclusion_with_verification_status}
+```
+
+## Specialized CoT Patterns
+
+### Code Debugging CoT
+```
+Debug the following code by analyzing it step by step.
+
+**Code:**
+{code_snippet}
+
+**Step 1: Understand the Code's Purpose**
+{purpose_analysis}
+
+**Step 2: Identify Expected Behavior**
+{expected_behavior}
+
+**Step 3: Trace the Execution**
+{execution_trace}
+
+**Step 4: Find the Error**
+{error_identification}
+
+**Step 5: Propose Fix**
+{fix_proposal}
+
+**Step 6: Verify the Fix**
+{fix_verification}
+
+**Fixed Code:**
+{corrected_code}
+```
+
+### Data Analysis CoT
+```
+Analyze this data systematically to draw meaningful conclusions.
+
+**Data:**
+{dataset}
+
+**Step 1: Understand the Data Structure**
+{data_structure_analysis}
+
+**Step 2: Identify Patterns and Trends**
+{pattern_identification}
+
+**Step 3: Calculate Key Metrics**
+{metrics_calculation}
+
+**Step 4: Compare with Benchmarks**
+{benchmark_comparison}
+
+**Step 5: Formulate Insights**
+{insight_generation}
+
+**Step 6: Validate Conclusions**
+{conclusion_validation}
+
+**Key Findings:**
+{summary_of_insights}
+```
+
+### Creative Problem Solving CoT
+```
+Generate creative solutions to this challenging problem.
+
+**Problem:**
+{creative_problem}
+
+**Step 1: Reframe the Problem**
+{problem_reframing}
+
+**Step 2: Brainstorm Multiple Angles**
+- Technical approach: {technical_ideas}
+- Business approach: {business_ideas}
+- User experience approach: {ux_ideas}
+- Unconventional approach: {unconventional_ideas}
+
+**Step 3: Evaluate Each Approach**
+{approach_evaluation}
+
+**Step 4: Synthesize Best Elements**
+{synthesis_process}
+
+**Step 5: Develop Final Solution**
+{solution_development}
+
+**Step 6: Test for Feasibility**
+{feasibility_testing}
+
+**Recommended Solution:**
+{final_creative_solution}
+```
+
+## Implementation Guidelines
+
+### When to Use Chain-of-Thought
+- **Multi-step problems**: Tasks requiring sequential reasoning
+- **Complex calculations**: Mathematical or logical derivations
+- **Problem decomposition**: Tasks that benefit from breaking down
+- **Verification needs**: When accuracy is critical
+- **Educational contexts**: When showing reasoning is valuable
+
+### CoT Effectiveness Factors
+- **Problem complexity**: Higher benefit for complex problems
+- **Task type**: Mathematical, logical, and analytical tasks benefit most
+- **Model capability**: Larger, more capable models tend to benefit more from CoT
+- **Context window**: Ensure sufficient space for reasoning steps
+- **Output requirements**: Detailed explanations benefit from CoT
+
+### Common Pitfalls to Avoid
+- **Over-explaining simple steps**: Keep proportional detail
+- **Circular reasoning**: Ensure logical progression
+- **Missing verification**: Always include validation steps
+- **Inconsistent confidence**: Use realistic confidence scoring
+- **Premature conclusions**: Don't jump to answers without full reasoning
+
+## Integration with Other Techniques
+
+### CoT + Few-Shot Learning
+- Include reasoning traces in examples
+- Show step-by-step problem-solving demonstrations
+- Teach verification and self-checking patterns
+
+### CoT + Template Systems
+- Embed CoT patterns within structured templates
+- Use conditional CoT based on problem complexity
+- Implement adaptive reasoning depth
+
+### CoT + Prompt Optimization
+- Test different CoT formulations
+- Optimize reasoning step granularity
+- Balance detail with efficiency
+
+This framework provides comprehensive patterns for implementing effective chain-of-thought reasoning across diverse problem types and applications.
--- a/skills/prompt-engineering/references/few-shot-patterns.md
+++ b/skills/prompt-engineering/references/few-shot-patterns.md
@@ -0,0 +1,273 @@
+# Few-Shot Learning Patterns
+
+This reference provides comprehensive frameworks for implementing effective few-shot learning strategies that maximize model performance within context window constraints.
+
+## Core Principles
+
+### Example Selection Strategy
+
+#### Semantic Similarity Selection
+- Use embedding similarity to find examples closest to target input
+- Cluster similar examples to avoid redundancy
+- Select diverse representatives from different semantic regions
+- Prioritize examples that cover key variations in problem space
+
+#### Diversity Sampling Approach
+- Ensure coverage of different input types and patterns
+- Include boundary cases and edge conditions
+- Balance simple and complex examples
+- Select examples that demonstrate different solution strategies
+
+#### Progressive Complexity Ordering
+- Start with simplest, most straightforward examples
+- Progress to increasingly complex scenarios
+- Include challenging edge cases last
+- Use this ordering to build understanding incrementally
+
+## Example Templates
+
+### Classification Tasks
+
+#### Binary Classification Template
+```
+Classify if the text expresses positive or negative sentiment.
+
+Example 1:
+Text: "I love this product! It works exactly as advertised and exceeded my expectations."
+Sentiment: Positive
+Reasoning: Contains enthusiastic language, positive adjectives, and satisfaction indicators
+
+Example 2:
+Text: "The customer service was terrible and the product broke after one day of use."
+Sentiment: Negative
+Reasoning: Contains negative adjectives, complaint language, and dissatisfaction indicators
+
+Example 3:
+Text: "It's okay, nothing special but does the basic job."
+Sentiment: Negative
+Reasoning: Lukewarm language with no enthusiasm; in a two-class scheme, faint praise such as "nothing special" leans negative
+
+Now classify:
+Text: {input_text}
+Sentiment:
+Reasoning:
+```
+
+#### Multi-Class Classification Template
+```
+Categorize the customer inquiry into one of: Technical Support, Billing, Sales, or General.
+
+Example 1:
+Inquiry: "My account was charged twice for the same subscription this month"
+Category: Billing
+Key indicators: "charged twice", "subscription", "account", financial terms
+
+Example 2:
+Inquiry: "The app keeps crashing when I try to upload files larger than 10MB"
+Category: Technical Support
+Key indicators: "crashing", "upload files", technical issue, error report
+
+Example 3:
+Inquiry: "What are your pricing plans for enterprise customers?"
+Category: Sales
+Key indicators: "pricing plans", "enterprise", business inquiry, sales question
+
+Now categorize:
+Inquiry: {inquiry_text}
+Category:
+Key indicators:
+```
+
+### Transformation Tasks
+
+#### Text Transformation Template
+```
+Convert formal business text into casual, friendly language.
+
+Example 1:
+Formal: "We regret to inform you that your request cannot be processed at this time due to insufficient documentation."
+Casual: "Sorry, but we can't process your request right now because some documents are missing."
+
+Example 2:
+Formal: "The aforementioned individual has demonstrated exceptional proficiency in the designated responsibilities."
+Casual: "They've done a great job with their tasks and really know what they're doing."
+
+Example 3:
+Formal: "Please be advised that the scheduled meeting has been postponed pending further notice."
+Casual: "Hey, just letting you know that we've put off the meeting for now and will let you know when it's rescheduled."
+
+Now convert:
+Formal: {formal_text}
+Casual:
+```
+
+#### Data Extraction Template
+```
+Extract key information from the job posting into structured format.
+
+Example 1:
+Job Posting: "We are seeking a Senior Software Engineer with 5+ years of experience in Python and cloud technologies. This is a remote position offering $120k-$150k salary plus equity."
+
+Extracted:
+- Position: Senior Software Engineer
+- Experience Required: 5+ years
+- Skills: Python, cloud technologies
+- Location: Remote
+- Salary: $120k-$150k plus equity
+
+Example 2:
+Job Posting: "Marketing Manager needed for growing startup. Must have 3 years experience in digital marketing, social media management, and content creation. San Francisco office, competitive compensation."
+
+Extracted:
+- Position: Marketing Manager
+- Experience Required: 3 years
+- Skills: Digital marketing, social media management, content creation
+- Location: San Francisco
+- Salary: Competitive compensation
+
+Now extract:
+Job Posting: {job_posting_text}
+Extracted:
+```
+
+### Generation Tasks
+
+#### Creative Writing Template
+```
+Generate compelling product descriptions following the shown patterns.
+
+Example 1:
+Product: Wireless headphones with noise cancellation
+Description: "Immerse yourself in crystal-clear audio with our premium wireless headphones. Advanced noise cancellation technology blocks out distractions while 30-hour battery life keeps you connected all day long."
+
+Example 2:
+Product: Smart home security camera
+Description: "Protect what matters most with intelligent monitoring that alerts you to activity instantly. AI-powered detection distinguishes between people, pets, and vehicles for truly smart security."
+
+Example 3:
+Product: Portable espresso maker
+Description: "Barista-quality espresso anywhere, anytime. Compact design meets professional-grade extraction in this revolutionary portable machine that delivers perfect shots in under 30 seconds."
+
+Now generate:
+Product: {product_description}
+Description:
+```
+
+### Error Correction Patterns
+
+#### Error Detection and Correction Template
+```
+Identify and correct errors in the given text.
+
+Example 1:
+Text with errors: "Their going to the park to play there new game with they're friends."
+Correction: "They're going to the park to play their new game with their friends."
+Errors fixed: "Their → They're", "there → their", "they're → their"
+
+Example 2:
+Text with errors: "The company's new policy effects every employee and there morale."
+Correction: "The company's new policy affects every employee and their morale."
+Errors fixed: "effects → affects", "there → their"
+
+Example 3:
+Text with errors: "Its important to review you're work carefully before submiting."
+Correction: "It's important to review your work carefully before submitting."
+Errors fixed: "Its → It's", "you're → your", "submiting → submitting"
+
+Now correct:
+Text with errors: {text_with_errors}
+Correction:
+Errors fixed:
+```
+
+## Advanced Strategies
+
+### Dynamic Example Selection
+
+#### Context-Aware Selection
+```python
+def select_examples(input_text, example_database, max_examples=3):
+    """
+    Select most relevant examples based on semantic similarity and diversity.
+    """
+    # 1. Calculate similarity scores
+    similarities = calculate_similarity(input_text, example_database)
+
+    # 2. Sort by similarity
+    sorted_examples = sort_by_similarity(similarities)
+
+    # 3. Apply diversity sampling
+    diverse_examples = diversity_sampling(sorted_examples, max_examples)
+
+    # 4. Order by complexity
+    final_examples = order_by_complexity(diverse_examples)
+
+    return final_examples
+```
+
+#### Adaptive Example Count
+```python
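+def determine_example_count(input_complexity, context_limit):
+    """
+    Adjust example count based on input complexity and available context.
+
+    Assumes input_complexity is a score in [0, 1] and context_limit is the
+    maximum number of examples that fit in the available context.
+    """
+    base_count = 3
+
+    # Complex inputs benefit from more examples
+    if input_complexity > 0.8:
+        count = base_count + 2
+    elif input_complexity > 0.5:
+        count = base_count + 1
+    else:
+        count = max(base_count - 1, 2)
+
+    # Apply the context cap on every branch, not only the most complex one
+    return min(count, context_limit)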
+```
+
+### Quality Metrics for Examples
+
+#### Example Effectiveness Scoring
+```python
+def score_example_effectiveness(example, test_cases):
+    """
+    Score how effectively an example teaches the desired pattern.
+    """
+    metrics = {
+        'coverage': measure_pattern_coverage(example),
+        'clarity': measure_instructional_clarity(example),
+        'uniqueness': measure_uniqueness_from_other_examples(example),
+        'difficulty': measure_appropriateness_difficulty(example)
+    }
+
+    return weighted_average(metrics, weights=[0.3, 0.3, 0.2, 0.2])
+```
+
+## Best Practices
+
+### Example Quality Guidelines
+- **Clarity**: Examples should clearly demonstrate the desired pattern
+- **Accuracy**: Input-output pairs must be correct and consistent
+- **Relevance**: Examples should be representative of target task
+- **Diversity**: Include variation in input types and complexity levels
+- **Completeness**: Cover edge cases and boundary conditions
+
+### Context Management
+- **Token Efficiency**: Optimize example length while maintaining clarity
+- **Progressive Disclosure**: Start simple, increase complexity gradually
+- **Redundancy Elimination**: Remove overlapping or duplicate examples
+- **Compression**: Use concise representations where possible
+
+### Common Pitfalls to Avoid
+- **Overfitting**: Don't include too many examples from same pattern
+- **Under-representation**: Ensure coverage of important variations
+- **Ambiguity**: Examples should have clear, unambiguous solutions
+- **Context Overflow**: Balance example count with window limitations
+- **Poor Ordering**: Place examples in logical progression order
+
+## Integration with Other Patterns
+
+Few-shot learning combines effectively with:
+- **Chain-of-Thought**: Add reasoning steps to examples
+- **Template Systems**: Use few-shot within structured templates
+- **Prompt Optimization**: Test different example selections
+- **System Prompts**: Establish few-shot learning expectations in system prompts
+
+This framework provides the foundation for implementing effective few-shot learning across diverse tasks and model types.
--- a/skills/prompt-engineering/references/optimization-frameworks.md
+++ b/skills/prompt-engineering/references/optimization-frameworks.md
@@ -0,0 +1,488 @@
+# Prompt Optimization Frameworks
+
+This reference provides systematic methodologies for iteratively improving prompt performance through structured testing, measurement, and refinement processes.
+
+## Optimization Process Overview
+
+### Iterative Improvement Cycle
+```mermaid
+graph TD
+    A[Baseline Measurement] --> B[Hypothesis Generation]
+    B --> C[Controlled Test]
+    C --> D[Performance Analysis]
+    D --> E[Statistical Validation]
+    E --> F[Implementation Decision]
+    F --> G[Monitor Impact]
+    G --> H[Learn & Iterate]
+    H --> B
+```
+
+### Core Optimization Principles
+- **Single Variable Testing**: Change one element at a time for accurate attribution
+- **Measurable Metrics**: Define quantitative success criteria
+- **Statistical Significance**: Use proper sample sizes and validation methods
+- **Controlled Environment**: Test conditions must be consistent
+- **Baseline Comparison**: Always measure against established baseline
+
+## Performance Metrics Framework
+
+### Primary Metrics
+
+#### Task Success Rate
+```python
+def calculate_success_rate(results, expected_outputs):
+    """
+    Measure percentage of tasks completed correctly.
+    """
+    if not results:
+        return 0.0  # Avoid division by zero on an empty result set
+    correct = sum(1 for result, expected in zip(results, expected_outputs)
+                  if result == expected)
+    return (correct / len(results)) * 100
+```
+
+#### Response Consistency
+```python
+def measure_consistency(prompt, test_cases, num_runs=5):
+    """
+    Measure response stability across multiple runs.
+    """
+    consistency_scores = []
+    for test_case in test_cases:
+        test_responses = []
+        for _ in range(num_runs):
+            response = execute_prompt(prompt, test_case)
+            test_responses.append(response)
+
+        # Calculate similarity score for consistency
+        # (collected in a list because test cases may not be hashable dict keys)
+        consistency_scores.append(calculate_similarity(test_responses))
+
+    return sum(consistency_scores) / len(consistency_scores) if consistency_scores else 0.0
+```
+
+#### Token Efficiency
+```python
+def calculate_token_efficiency(prompt, test_cases):
+    """
+    Measure token usage per successful task completion.
+    """
+    total_tokens = 0
+    successful_tasks = 0
+
+    for test_case in test_cases:
+        response = execute_prompt_with_metrics(prompt, test_case)
+        total_tokens += response.token_count
+        if response.is_successful:
+            successful_tasks += 1
+
+    return total_tokens / successful_tasks if successful_tasks > 0 else float('inf')
+```
+
+#### Response Latency
+```python
+import time
+
+def measure_response_time(prompt, test_cases):
+    """
+    Measure average response time in seconds.
+    """
+    times = []
+    for test_case in test_cases:
+        # perf_counter is a monotonic clock intended for timing intervals
+        start_time = time.perf_counter()
+        execute_prompt(prompt, test_case)
+        end_time = time.perf_counter()
+        times.append(end_time - start_time)
+
+    return sum(times) / len(times) if times else 0.0
+```
+
+### Secondary Metrics
+
+#### Output Quality Score
+```python
+def assess_output_quality(response, criteria):
+    """
+    Multi-dimensional quality assessment.
+    """
+    scores = {
+        'accuracy': measure_accuracy(response),
+        'completeness': measure_completeness(response),
+        'coherence': measure_coherence(response),
+        'relevance': measure_relevance(response),
+        'format_compliance': measure_format_compliance(response)
+    }
+
+    # Key weights by metric name so they cannot drift out of order with scores
+    weights = {
+        'accuracy': 0.3,
+        'completeness': 0.2,
+        'coherence': 0.2,
+        'relevance': 0.2,
+        'format_compliance': 0.1
+    }
+    return sum(scores[name] * weights[name] for name in scores)
+```
+
+#### Safety Compliance
+```python
+def check_safety_compliance(response):
+    """
+    Measure adherence to safety guidelines.
+    """
+    violations = []
+
+    # Check for various safety issues
+    if contains_harmful_content(response):
+        violations.append('harmful_content')
+    if has_bias(response):
+        violations.append('bias')
+    if violates_privacy(response):
+        violations.append('privacy_violation')
+
+    safety_score = max(0, 100 - len(violations) * 25)
+    return safety_score, violations
+```
+
+## A/B Testing Methodology
+
+### Controlled Test Design
+```python
+import random
+
+def design_ab_test(baseline_prompt, variant_prompt, test_cases):
+    """
+    Design controlled A/B test with proper statistical power.
+    """
+    # Calculate required sample size
+    effect_size = estimate_effect_size(baseline_prompt, variant_prompt)
+    sample_size = calculate_sample_size(effect_size, power=0.8, alpha=0.05)
+
+    # Random assignment (random.sample raises ValueError when asked for more
+    # items than exist, so cap at the number of available test cases)
+    randomized_cases = random.sample(test_cases, min(sample_size, len(test_cases)))
+    split_point = len(randomized_cases) // 2
+
+    group_a = randomized_cases[:split_point]
+    group_b = randomized_cases[split_point:]
+
+    return {
+        'baseline_group': group_a,
+        'variant_group': group_b,
+        'sample_size': sample_size,
+        'statistical_power': 0.8,
+        'significance_level': 0.05
+    }
+```
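+
+As a rough guide to what a helper like `calculate_sample_size` might compute (the helper is not defined in this reference, so this is an assumption), the standard normal-approximation formula for comparing two group means with standardized effect size $d$ is:
+
+```latex
+n_{\text{per group}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{d^2}
+```
+
+With `alpha=0.05` and `power=0.8` ($z_{0.975} \approx 1.96$, $z_{0.80} \approx 0.84$), a medium effect of $d = 0.5$ needs roughly 63 test cases per group under this approximation.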
+
+### Statistical Analysis
+```python
+import numpy as np
+from scipy import stats
+
+def analyze_ab_results(baseline_results, variant_results):
+    """
+    Perform statistical analysis of A/B test results.
+    """
+    # Calculate means and sample standard deviations
+    # (ddof=1 matches the n - 1 terms in the pooled formula below)
+    baseline_mean = np.mean(baseline_results)
+    variant_mean = np.mean(variant_results)
+    baseline_std = np.std(baseline_results, ddof=1)
+    variant_std = np.std(variant_results, ddof=1)
+
+    # Perform t-test
+    t_statistic, p_value = stats.ttest_ind(baseline_results, variant_results)
+
+    # Calculate effect size (Cohen's d)
+    pooled_std = np.sqrt(((len(baseline_results) - 1) * baseline_std**2 +
+                         (len(variant_results) - 1) * variant_std**2) /
+                        (len(baseline_results) + len(variant_results) - 2))
+    cohens_d = (variant_mean - baseline_mean) / pooled_std
+
+    return {
+        'baseline_mean': baseline_mean,
+        'variant_mean': variant_mean,
+        'improvement': ((variant_mean - baseline_mean) / baseline_mean) * 100,
+        'p_value': p_value,
+        'statistical_significance': p_value < 0.05,
+        'effect_size': cohens_d,
+        'recommendation': 'implement_variant' if p_value < 0.05 and cohens_d > 0.2 else 'keep_baseline'
+    }
+```
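+
+The effect size computed above corresponds to the pooled-standard-deviation form of Cohen's d. Written out, with $s_A$ and $s_B$ taken as the sample standard deviations of the baseline and variant groups (notation introduced here for illustration):
+
+```latex
+s_p = \sqrt{\frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}},
+\qquad
+d = \frac{\bar{x}_B - \bar{x}_A}{s_p}
+```
+
+By Cohen's conventional thresholds, $d \approx 0.2$ is a small effect, $0.5$ medium, and $0.8$ large, which is presumably the basis for the 0.2 cutoff in the recommendation.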
+
+## Optimization Strategies
+
+### Strategy 1: Progressive Enhancement
+
+#### Stepwise Improvement Process
+```python
+def progressive_optimization(base_prompt, test_cases, max_iterations=10):
+    """
+    Incrementally improve prompt through systematic testing.
+    """
+    current_prompt = base_prompt
+    current_performance = evaluate_prompt(current_prompt, test_cases)
+    optimization_history = []
+
+    for iteration in range(max_iterations):
+        # Generate improvement hypotheses
+        hypotheses = generate_improvement_hypotheses(current_prompt, current_performance)
+
+        best_improvement = None
+        best_performance = current_performance
+
+        for hypothesis in hypotheses:
+            # Test hypothesis
+            test_prompt = apply_hypothesis(current_prompt, hypothesis)
+            test_performance = evaluate_prompt(test_prompt, test_cases)
+
+            # Validate improvement
+            if is_statistically_significant(current_performance, test_performance):
+                if test_performance.overall_score > best_performance.overall_score:
+                    best_improvement = hypothesis
+                    best_performance = test_performance
+
+        # Apply best improvement if found
+        if best_improvement:
+            current_prompt = apply_hypothesis(current_prompt, best_improvement)
+            optimization_history.append({
+                'iteration': iteration,
+                'hypothesis': best_improvement,
+                'performance_before': current_performance,
+                'performance_after': best_performance,
+                'improvement': best_performance.overall_score - current_performance.overall_score
+            })
+            current_performance = best_performance
+        else:
+            break  # No further improvements found
+
+    return current_prompt, optimization_history
+```
+
+### Strategy 2: Multi-Objective Optimization
+
+#### Pareto Optimization Framework
+```python
+def multi_objective_optimization(prompt_variants, objectives):
+    """
+    Optimize for multiple competing objectives using Pareto efficiency.
+    """
+    results = []
+
+    for variant in prompt_variants:
+        scores = {}
+        for objective in objectives:
+            scores[objective] = evaluate_objective(variant, objective)
+
+        results.append({
+            'prompt': variant,
+            'scores': scores,
+            'dominates': []
+        })
+
+    # Find Pareto optimal solutions
+    pareto_optimal = []
+    for i, result_i in enumerate(results):
+        is_dominated = False
+        for j, result_j in enumerate(results):
+            if i != j and dominates(result_j, result_i):
+                is_dominated = True
+                break
+
+        if not is_dominated:
+            pareto_optimal.append(result_i)
+
+    return pareto_optimal
+
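+def dominates(result_a, result_b):
+    """
+    Check if result_a Pareto-dominates result_b: at least as good in every
+    objective and strictly better in at least one (higher scores are better).
+    """
+    objectives = result_a['scores']
+    at_least_as_good = all(result_a['scores'][obj] >= result_b['scores'][obj]
+                           for obj in objectives)
+    strictly_better = any(result_a['scores'][obj] > result_b['scores'][obj]
+                          for obj in objectives)
+    return at_least_as_good and strictly_better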
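+
+For reference, the usual definition of Pareto dominance, assuming every objective score $f_k$ is to be maximized (notation introduced here for illustration), is:
+
+```latex
+a \succ b \iff \big(\forall k:\ f_k(a) \ge f_k(b)\big) \ \wedge\ \big(\exists k:\ f_k(a) > f_k(b)\big)
+```
+
+The strict inequality matters: without it, two variants with identical scores would each count as dominating the other, and both would be dropped from the Pareto set.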
+
+### Strategy 3: Adaptive Testing
+
+#### Dynamic Test Allocation
+```python
+def adaptive_testing(prompt_variants, initial_budget):
+    """
+    Dynamically allocate testing budget to promising variants.
+    """
+    # Initial exploration phase - spend half the budget evenly across variants
+    # (the 50/50 split is an illustrative choice, not a tuned value)
+    exploration_results = {}
+    exploration_budget = initial_budget // 2
+    budget_per_variant = max(1, exploration_budget // len(prompt_variants))
+
+    for variant in prompt_variants:
+        exploration_results[variant] = test_prompt(variant, budget_per_variant)
+
+    # Exploitation phase - allocate more budget to promising variants
+    total_budget_spent = len(prompt_variants) * budget_per_variant
+    remaining_budget = initial_budget - total_budget_spent
+
+    # Sort by performance
+    sorted_variants = sorted(exploration_results.items(),
+                           key=lambda x: x[1].overall_score, reverse=True)
+
+    # Allocate remaining budget in rank order: each variant gets half of what
+    # is left, so better performers receive more (one simple weighting scheme)
+    final_results = {}
+    for variant, initial_result in sorted_variants:
+        if remaining_budget > 0:
+            additional_budget = max(1, remaining_budget // 2)
+            final_results[variant] = test_prompt(variant, additional_budget)
+            remaining_budget -= additional_budget
+        else:
+            final_results[variant] = initial_result
+
+    return final_results
+```
+
+## Optimization Hypotheses
+
+### Common Optimization Areas
+
+#### Instruction Clarity
+```python
+instruction_clarity_hypotheses = [
+    "Add numbered steps to instructions",
+    "Include specific output format examples",
+    "Clarify role and expertise level",
+    "Add context and background information",
+    "Specify constraints and boundaries",
+    "Include success criteria and evaluation standards"
+]
+```
+
+#### Example Quality
+```python
+example_optimization_hypotheses = [
+    "Increase number of examples from 3 to 5",
+    "Add edge case examples",
+    "Reorder examples by complexity",
+    "Include negative examples",
+    "Add reasoning traces to examples",
+    "Improve example diversity and coverage"
+]
+```
+
+#### Structure Optimization
+```python
+structure_hypotheses = [
+    "Add clear section headings",
+    "Reorganize content flow",
+    "Include summary at the beginning",
+    "Add checklist for verification",
+    "Separate instructions from examples",
+    "Add troubleshooting section"
+]
+```
+
+#### Model-Specific Optimization
+```python
+model_specific_hypotheses = {
+    'claude': [
+        "Use XML tags for structure",
+        "Add <thinking> sections for reasoning",
+        "Include constitutional AI principles",
+        "Use system message format",
+        "Add safety guidelines and constraints"
+    ],
+    'gpt-4': [
+        "Use numbered sections with ### headers",
+        "Include JSON format specifications",
+        "Add function calling patterns",
+        "Use bullet points for clarity",
+        "Include error handling instructions"
+    ],
+    'gemini': [
+        "Use bold headers with ** formatting",
+        "Include step-by-step process descriptions",
+        "Add validation checkpoints",
+        "Use conversational tone",
+        "Include confidence scoring"
+    ]
+}
+```
+
+## Continuous Monitoring
+
+### Production Performance Tracking
+```python
+def setup_monitoring(prompt, alert_thresholds):
+    """
+    Set up continuous monitoring for deployed prompts.
+    """
+    monitors = {
+        'success_rate': MetricMonitor('success_rate', alert_thresholds['success_rate']),
+        'response_time': MetricMonitor('response_time', alert_thresholds['response_time']),
+        'token_cost': MetricMonitor('token_cost', alert_thresholds['token_cost']),
+        'safety_score': MetricMonitor('safety_score', alert_thresholds['safety_score'])
+    }
+
+    def monitor_performance():
+        recent_data = collect_recent_performance(prompt)
+        alerts = []
+
+        for metric_name, monitor in monitors.items():
+            if metric_name in recent_data:
+                alert = monitor.check(recent_data[metric_name])
+                if alert:
+                    alerts.append(alert)
+
+        return alerts
+
+    return monitor_performance
+```
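+
+`MetricMonitor` is not defined in this reference. A minimal sketch of what it could look like follows, assuming a single numeric threshold per metric and a `check` method that returns an alert string or `None`. The `higher_is_better` flag is an assumption added here so that metrics such as response time, where lower is better, can be monitored too.
+
+```python
+class MetricMonitor:
+    """Hypothetical threshold monitor for a single metric."""
+
+    def __init__(self, name, threshold, higher_is_better=True):
+        self.name = name
+        self.threshold = threshold
+        self.higher_is_better = higher_is_better
+
+    def check(self, value):
+        """Return an alert message if the value breaches the threshold, else None."""
+        if self.higher_is_better:
+            breached = value < self.threshold
+        else:
+            breached = value > self.threshold
+        if breached:
+            return f"ALERT: {self.name}={value} breached threshold {self.threshold}"
+        return None
+
+
+success = MetricMonitor('success_rate', 0.90)
+latency = MetricMonitor('response_time', 2.0, higher_is_better=False)
+
+print(success.check(0.95))  # within threshold, no alert
+print(success.check(0.80))  # below the minimum success rate
+print(latency.check(3.5))   # slower than the allowed maximum
+```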
+
+### Automated Rollback System
+```python
+def automated_rollback_system(prompts, monitoring_data):
+    """
+    Automatically rollback to previous version if performance degrades.
+    """
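+    # Metrics where a lower value is better (names as used in setup_monitoring)
+    lower_is_better = {'response_time', 'token_cost'}
+
+    def check_and_rollback(current_prompt, baseline_prompt):
+        current_metrics = monitoring_data.get_metrics(current_prompt)
+        baseline_metrics = monitoring_data.get_metrics(baseline_prompt)
+
+        # Check if performance degradation exceeds threshold
+        degradation_threshold = 0.1  # 10% degradation
+
+        for metric, current_value in current_metrics.items():
+            if metric not in baseline_metrics:
+                continue  # no baseline to compare against
+            baseline_value = baseline_metrics[metric]
+            if metric in lower_is_better:
+                degraded = current_value > baseline_value * (1 + degradation_threshold)
+            else:
+                degraded = current_value < baseline_value * (1 - degradation_threshold)
+            if degraded:
+                return True, f"Performance degradation in {metric}"
+
+        return False, "Performance acceptable"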
+
+    return check_and_rollback
+```
+
+## Optimization Tools and Utilities
+
+### Prompt Variation Generator
+```python
+def generate_prompt_variations(base_prompt):
+    """
+    Generate systematic variations for testing.
+    """
+    variations = {}
+
+    # Instruction variations
+    variations['more_detailed'] = add_detailed_instructions(base_prompt)
+    variations['simplified'] = simplify_instructions(base_prompt)
+    variations['structured'] = add_structured_format(base_prompt)
+
+    # Example variations
+    variations['more_examples'] = add_examples(base_prompt)
+    variations['better_examples'] = improve_example_quality(base_prompt)
+    variations['diverse_examples'] = add_example_diversity(base_prompt)
+
+    # Format variations
+    variations['numbered_steps'] = add_numbered_steps(base_prompt)
+    variations['bullet_points'] = use_bullet_points(base_prompt)
+    variations['sections'] = add_section_headers(base_prompt)
+
+    return variations
+```
+
+### Performance Dashboard
+```python
+def create_performance_dashboard(optimization_history):
+    """
+    Create visualization of optimization progress.
+    """
+    # Generate performance metrics over time
+    metrics_over_time = {
+        'iterations': [h['iteration'] for h in optimization_history],
+        'success_rates': [h['performance_after'].success_rate for h in optimization_history],
+        'token_efficiency': [h['performance_after'].token_efficiency for h in optimization_history],
+        'response_times': [h['performance_after'].response_time for h in optimization_history]
+    }
+
+    return PerformanceDashboard(metrics_over_time)
+```
+
+This comprehensive framework provides systematic methodologies for continuous prompt improvement through data-driven optimization and rigorous testing processes.
--- a/skills/prompt-engineering/references/system-prompt-design.md
+++ b/skills/prompt-engineering/references/system-prompt-design.md
@@ -0,0 +1,494 @@
+# System Prompt Design
+
+This reference provides comprehensive frameworks for designing effective system prompts that establish consistent model behavior, define clear boundaries, and ensure reliable performance across diverse applications.
+
+## System Prompt Architecture
+
+### Core Components Structure
+```
+1. Role Definition & Expertise
+2. Behavioral Guidelines & Constraints
+3. Interaction Protocols
+4. Output Format Specifications
+5. Safety & Ethical Guidelines
+6. Context & Background Information
+7. Quality Standards & Verification
+8. Error Handling & Uncertainty Protocols
+```
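+
+One way to keep this ordering consistent across prompts is to assemble the system prompt from named sections in code. The sketch below is only an illustration: `SECTION_ORDER` mirrors the list above, while `build_system_prompt` and the `## ` heading style are assumptions, not part of any library.
+
+```python
+SECTION_ORDER = [
+    "Role Definition & Expertise",
+    "Behavioral Guidelines & Constraints",
+    "Interaction Protocols",
+    "Output Format Specifications",
+    "Safety & Ethical Guidelines",
+    "Context & Background Information",
+    "Quality Standards & Verification",
+    "Error Handling & Uncertainty Protocols",
+]
+
+
+def build_system_prompt(sections):
+    """Join the supplied sections in the canonical order, skipping absent ones."""
+    unknown = set(sections) - set(SECTION_ORDER)
+    if unknown:
+        raise ValueError(f"Unknown sections: {sorted(unknown)}")
+    parts = [f"## {title}\n{sections[title].strip()}"
+             for title in SECTION_ORDER if title in sections]
+    return "\n\n".join(parts)
+
+
+prompt = build_system_prompt({
+    "Safety & Ethical Guidelines": "Decline requests for harmful content.",
+    "Role Definition & Expertise": "You are an experienced math tutor.",
+})
+print(prompt)
+```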
+
+## Component Design Patterns
+
+### 1. Role Definition Framework
+
+#### Comprehensive Role Specification
+```markdown
+## Role Definition
+You are an expert {role} with {experience_level} of specialized experience in {domain}. Your expertise includes:
+
+### Core Competencies
+- {competency_1}
+- {competency_2}
+- {competency_3}
+- {competency_4}
+
+### Knowledge Boundaries
+- You have deep knowledge of {strength_area_1} and {strength_area_2}
+- Your knowledge is current as of {knowledge_cutoff_date}
+- You should acknowledge limitations in {limitation_area}
+- When uncertain about recent developments, state this explicitly
+
+### Professional Standards
+- Adhere to {industry_standard_1} guidelines
+- Follow {industry_standard_2} best practices
+- Maintain {professional_attribute} in all interactions
+- Ensure compliance with {regulatory_framework}
+```
+
+#### Specialized Role Templates
+
+##### Technical Expert Role
+```markdown
+## Technical Expert Role
+You are a Senior {domain} Engineer with {years} years of experience in {specialization}. Your expertise encompasses:
+
+### Technical Proficiency
+- Deep understanding of {technology_stack}
+- Experience with {specific_frameworks} and {tools}
+- Knowledge of {design_patterns} and {architectures}
+- Proficiency in {programming_languages} and {development_methodologies}
+
+### Problem-Solving Approach
+- Analyze problems systematically using {methodology}
+- Consider multiple solution approaches before recommending
+- Evaluate trade-offs between {criteria_1}, {criteria_2}, and {criteria_3}
+- Provide scalable and maintainable solutions
+
+### Communication Style
+- Explain technical concepts clearly to both technical and non-technical audiences
+- Use precise terminology when appropriate
+- Provide concrete examples and code snippets when helpful
+- Structure responses with clear sections and logical flow
+```
+
+##### Analyst Role
+```markdown
+## Analyst Role
+You are a professional {analysis_type} Analyst with expertise in {data_domain} and {methodology}. Your analytical approach includes:
+
+### Analytical Framework
+- Apply {analytical_methodology} for systematic analysis
+- Use {statistical_techniques} for data interpretation
+- Consider {contextual_factors} in your analysis
+- Validate findings through {verification_methods}
+
+### Critical Thinking Process
+- Question assumptions and identify potential biases
+- Evaluate evidence quality and source reliability
+- Consider alternative explanations and perspectives
+- Synthesize information from multiple sources
+
+### Reporting Standards
+- Present findings with appropriate confidence levels
+- Distinguish between facts, interpretations, and recommendations
+- Provide evidence-based conclusions
+- Acknowledge limitations and uncertainties
+```
+
+### 2. Behavioral Guidelines Design
+
+#### Comprehensive Behavior Framework
+```markdown
+## Behavioral Guidelines
+
+### Interaction Style
+- Maintain {tone} tone throughout all interactions
+- Use {communication_approach} when explaining complex concepts
+- Be {responsiveness_level} in addressing user questions
+- Demonstrate {empathy_level} when dealing with user challenges
+
+### Response Standards
+- Provide responses that are {length_preference} and {detail_preference}
+- Structure information using {organization_pattern}
+- Include {frequency} examples and illustrations
+- Use {format_preference} formatting for clarity
+
+### Quality Expectations
+- Ensure all information is {accuracy_standard}
+- Provide citations for {information_type} when available
+- Cross-verify information using {verification_method}
+- Update knowledge based on {update_criteria}
+```
+
+#### Model-Specific Behavior Patterns
+
+##### Claude 3.5/4 Specific Guidelines
+```markdown
+## Claude-Specific Behavioral Guidelines
+
+### Constitutional Alignment
+- Follow constitutional AI principles in all responses
+- Prioritize helpfulness while maintaining safety
+- Consider multiple perspectives before concluding
+- Avoid harmful content while remaining useful
+
+### Output Formatting
+- Use XML tags for structured information: <tag>content</tag>
+- Include thinking blocks for complex reasoning: <thinking>...</thinking>
+- Provide clear section headers with proper hierarchy
+- Use markdown formatting for improved readability
+
+### Safety Protocols
+- Apply content policies consistently
+- Identify and flag potentially harmful requests
+- Provide safe alternatives when appropriate
+- Maintain transparency about limitations
+```
+
+##### GPT-4 Specific Guidelines
+```markdown
+## GPT-4 Specific Behavioral Guidelines
+
+### Structured Response Patterns
+- Use numbered lists for step-by-step processes
+- Implement clear section boundaries with ### headers
+- Provide JSON formatted outputs when specified
+- Use consistent indentation and formatting
+
+### Function Calling Integration
+- Recognize when function calling would be appropriate
+- Structure responses to facilitate tool usage
+- Provide clear parameter specifications
+- Handle function results systematically
+
+### Optimization Behaviors
+- Balance conciseness with comprehensiveness
+- Prioritize information relevance and importance
+- Use efficient language patterns
+- Minimize redundancy while maintaining clarity
+```
+
+### 3. Output Format Specifications
+
+#### Comprehensive Format Framework
+```markdown
+## Output Format Requirements
+
+### Structure Standards
+- Begin responses with {opening_pattern}
+- Use {section_pattern} for major sections
+- Implement {hierarchy_pattern} for information organization
+- Include {closing_pattern} for response completion
+
+### Content Organization
+- Present information in {presentation_order}
+- Group related information using {grouping_method}
+- Use {transition_pattern} between sections
+- Include {summary_element} for complex responses
+
+### Format Specifications
+{if json_format_required}
+- Provide responses in valid JSON format
+- Use consistent key naming conventions
+- Include all required fields
+- Validate JSON syntax before output
+{endif}
+
+{if markdown_format_required}
+- Use markdown for formatting and emphasis
+- Include appropriate heading levels
+- Use code blocks for technical content
+- Implement tables for structured data
+{endif}
+```
+
+### 4. Safety and Ethical Guidelines
+
+#### Comprehensive Safety Framework
+```markdown
+## Safety and Ethical Guidelines
+
+### Content Policies
+- Avoid generating {prohibited_content_1}
+- Do not provide {prohibited_content_2}
+- Flag {sensitive_topics} for human review
+- Provide {safe_alternatives} when appropriate
+
+### Ethical Considerations
+- Consider {ethical_principle_1} in all responses
+- Evaluate potential {ethical_impact} of provided information
+- Balance helpfulness with {safety_consideration}
+- Maintain {transparency_standard} about limitations
+
+### Bias Mitigation
+- Actively identify and mitigate {bias_type_1}
+- Present information {neutrality_standard}
+- Include {diverse_perspectives} when appropriate
+- Avoid {stereotype_patterns}
+
+### Harm Prevention
+- Identify potential {harm_type_1} in responses
+- Implement {prevention_mechanism} for harmful content
+- Provide {warning_system} for sensitive topics
+- Include {escalation_protocol} for concerning requests
+```
+
+### 5. Error Handling and Uncertainty
+
+#### Comprehensive Error Management
+```markdown
+## Error Handling and Uncertainty Protocols
+
+### Uncertainty Management
+- Explicitly state confidence levels for uncertain information
+- Use phrases like "I believe," "It appears that," "Based on available information"
+- Acknowledge when information may be {uncertainty_type}
+- Provide {verification_method} for uncertain claims
+
+### Error Recognition
+- Identify when {error_pattern} might have occurred
+- Implement {self_checking_mechanism} for accuracy
+- Use {validation_process} for important information
+- Provide {correction_protocol} when errors are identified
+
+### Limitation Acknowledgment
+- Clearly state {knowledge_limitation} when relevant
+- Explain {limitation_reason} when unable to provide complete information
+- Suggest {alternative_approach} when direct assistance isn't possible
+- Provide {escalation_option} for complex scenarios
+
+### Correction Procedures
+- Implement {correction_workflow} for identified errors
+- Provide {explanation_format} for corrections
+- Use {acknowledgment_pattern} for mistakes
+- Include {improvement_commitment} for future accuracy
+```
+
+## Specialized System Prompt Templates
+
+### 1. Educational Assistant System Prompt
+```markdown
+# Educational Assistant System Prompt
+
+## Role Definition
+You are an expert educational assistant specializing in {subject_area} with {experience_level} of teaching experience. Your pedagogical approach emphasizes {teaching_philosophy} and adapts to different learning styles.
+
+## Educational Philosophy
+- Create inclusive and supportive learning environments
+- Adapt explanations to match learner's comprehension level
+- Use scaffolding techniques to build understanding progressively
+- Encourage critical thinking and independent learning
+
+## Teaching Standards
+- Provide accurate, up-to-date information verified through {verification_sources}
+- Use clear, accessible language appropriate for the target audience
+- Include relevant examples and analogies to enhance understanding
+- Structure learning objectives with clear progression
+
+## Interaction Protocols
+- Assess learner's current understanding before providing explanations
+- Ask clarifying questions to tailor responses appropriately
+- Provide opportunities for learner questions and feedback
+- Offer additional resources for extended learning
+
+## Output Format
+- Begin with brief assessment of learner's needs
+- Use clear headings and organized structure
+- Include summary points for key takeaways
+- Provide practice exercises when appropriate
+- End with suggestions for further learning
+
+## Safety Guidelines
+- Create psychologically safe learning environments
+- Avoid language that might discourage or intimidate learners
+- Be patient and supportive when learners struggle with concepts
+- Respect diverse backgrounds and learning abilities
+
+## Uncertainty Handling
+- Acknowledge when topics are beyond current expertise
+- Suggest reliable resources for additional information
+- Be transparent about the limits of available knowledge
+- Encourage critical thinking and independent verification
+```
+
+### 2. Technical Documentation Generator System Prompt
+```markdown
+# Technical Documentation System Prompt
+
+## Role Definition
+You are a Senior Technical Writer with {years} years of experience creating documentation for {technology_domain}. Your expertise encompasses {documentation_types} and you follow {industry_standards} for technical communication.
+
+## Documentation Standards
+- Follow {style_guide} for consistent formatting and terminology
+- Ensure clarity and accuracy in all technical explanations
+- Include practical examples and code snippets when helpful
+- Structure content with clear hierarchy and logical flow
+
+## Quality Requirements
+- Maintain technical accuracy verified through {review_process}
+- Use consistent terminology throughout documentation
+- Provide comprehensive coverage of topics without overwhelming detail
+- Include troubleshooting information for common issues
+
+## Audience Considerations
+- Target documentation at {audience_level} technical proficiency
+- Define technical terms and concepts appropriately
+- Provide progressive disclosure of complex information
+- Include context and motivation for technical decisions
+
+## Format Specifications
+- Use markdown formatting for clear structure and readability
+- Include code blocks with syntax highlighting
+- Implement consistent section headings and numbering
+- Provide navigation aids and cross-references
+
+## Review Process
+- Verify technical accuracy through {verification_method}
+- Test all code examples and procedures
+- Ensure completeness of coverage for documented features
+- Validate clarity and comprehensibility with target audience
+
+## Safety and Compliance
+- Include security considerations where relevant
+- Document potential risks and mitigation strategies
+- Follow industry compliance requirements
+- Maintain confidentiality for sensitive information
+```
+
+### 3. Data Analysis System Prompt
+```markdown
+# Data Analysis System Prompt
+
+## Role Definition
+You are an expert Data Analyst specializing in {data_domain} with {years} years of experience in {analysis_methodologies}. Your analytical approach combines {technical_skills} with {business_acumen} to deliver actionable insights.
+
+## Analytical Framework
+- Apply {statistical_methods} for rigorous data analysis
+- Use {visualization_techniques} for effective data communication
+- Implement {quality_assurance} processes for data validation
+- Follow {ethical_guidelines} for responsible data handling
+
+## Analysis Standards
+- Ensure methodological soundness in all analyses
+- Provide clear documentation of analytical processes
+- Include appropriate statistical measures and confidence intervals
+- Validate findings through {validation_methods}
+
+## Communication Requirements
+- Present findings with appropriate technical depth for the audience
+- Use clear visualizations and narrative explanations
+- Highlight actionable insights and recommendations
+- Acknowledge limitations and uncertainties in analyses
+
+## Output Structure
+~~~json
+{
+  "executive_summary": "High-level overview of key findings",
+  "methodology": "Description of analytical approach and methods used",
+  "data_overview": "Summary of data sources, quality, and limitations",
+  "key_findings": [
+    {
+      "finding": "Specific discovery or insight",
+      "evidence": "Supporting data and statistical measures",
+      "confidence": "Confidence level in the finding",
+      "implications": "Business or operational implications"
+    }
+  ],
+  "recommendations": [
+    {
+      "action": "Recommended action",
+      "priority": "High/Medium/Low",
+      "expected_impact": "Anticipated outcome",
+      "implementation_considerations": "Factors to consider"
+    }
+  ],
+  "limitations": "Constraints and limitations of the analysis",
+  "next_steps": "Suggested follow-up analyses or actions"
+}
+~~~
+
+## Ethical Considerations
+- Protect privacy and confidentiality of data subjects
+- Ensure unbiased analysis and interpretation
+- Consider potential impact of findings on stakeholders
+- Maintain transparency about analytical limitations
+```
+
+## System Prompt Testing and Validation
+
+### Validation Framework
+```python
+class SystemPromptValidator:
+    def __init__(self):
+        self.validation_criteria = {
+            'role_clarity': 0.2,
+            'instruction_specificity': 0.2,
+            'safety_completeness': 0.15,
+            'output_format_clarity': 0.15,
+            'error_handling_coverage': 0.1,
+            'behavioral_consistency': 0.1,
+            'ethical_considerations': 0.1
+        }
+
+    def validate_prompt(self, system_prompt):
+        """Validate system prompt against quality criteria."""
+        scores = {}
+
+        scores['role_clarity'] = self.assess_role_clarity(system_prompt)
+        scores['instruction_specificity'] = self.assess_instruction_specificity(system_prompt)
+        scores['safety_completeness'] = self.assess_safety_completeness(system_prompt)
+        scores['output_format_clarity'] = self.assess_output_format_clarity(system_prompt)
+        scores['error_handling_coverage'] = self.assess_error_handling(system_prompt)
+        scores['behavioral_consistency'] = self.assess_behavioral_consistency(system_prompt)
+        scores['ethical_considerations'] = self.assess_ethical_considerations(system_prompt)
+
+        # Calculate overall score, pairing each score with its weight by name
+        overall_score = sum(scores[name] * weight
+                            for name, weight in self.validation_criteria.items())
+
+        return {
+            'overall_score': overall_score,
+            'individual_scores': scores,
+            'recommendations': self.generate_recommendations(scores)
+        }
+
+    def test_prompt_consistency(self, system_prompt, test_scenarios):
+        """Test prompt behavior consistency across different scenarios."""
+        results = []
+
+        for scenario in test_scenarios:
+            response = execute_with_system_prompt(system_prompt, scenario)
+
+            # Analyze response consistency
+            consistency_score = self.analyze_response_consistency(response, system_prompt)
+            results.append({
+                'scenario': scenario,
+                'response': response,
+                'consistency_score': consistency_score
+            })
+
+        # Guard against an empty scenario list
+        average_consistency = (
+            sum(r['consistency_score'] for r in results) / len(results)
+            if results else 0.0
+        )
+
+        return {
+            'average_consistency': average_consistency,
+            'scenario_results': results,
+            'recommendations': self.generate_consistency_recommendations(results)
+        }
+```
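+
+The overall score is a weighted average of the seven criteria, and the weights listed in `validation_criteria` sum to 1.0. The sketch below works through the arithmetic on made-up scores; `weighted_score` is a hypothetical helper and the example scores are not real measurements.
+
+```python
+import math
+
+# Weights copied from validation_criteria in the validator
+WEIGHTS = {
+    'role_clarity': 0.2,
+    'instruction_specificity': 0.2,
+    'safety_completeness': 0.15,
+    'output_format_clarity': 0.15,
+    'error_handling_coverage': 0.1,
+    'behavioral_consistency': 0.1,
+    'ethical_considerations': 0.1,
+}
+
+
+def weighted_score(scores, weights=WEIGHTS):
+    """Combine per-criterion scores (0 to 1) into one weighted score."""
+    missing = set(weights) - set(scores)
+    if missing:
+        raise KeyError(f"Missing scores for: {sorted(missing)}")
+    return sum(scores[name] * weight for name, weight in weights.items())
+
+
+# The weights should sum to 1 so the result stays on the same 0-to-1 scale
+assert math.isclose(sum(WEIGHTS.values()), 1.0)
+
+# Made-up scores: perfect role clarity, 0.5 everywhere else
+example_scores = {name: 0.5 for name in WEIGHTS}
+example_scores['role_clarity'] = 1.0
+overall = weighted_score(example_scores)  # 1.0 * 0.2 + 0.5 * 0.8
+print(round(overall, 3))
+```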
+
+## Best Practices Summary
+
+### Design Principles
+- **Clarity First**: Ensure role and instructions are unambiguous
+- **Comprehensive Coverage**: Address all aspects of model behavior
+- **Consistency Focus**: Maintain consistent behavior across scenarios
+- **Safety Priority**: Include robust safety guidelines and constraints
+- **Flexibility Built-in**: Allow for adaptation to different contexts
+
+### Common Pitfalls to Avoid
+- **Vague Instructions**: Be specific about expected behaviors
+- **Over-constraining**: Allow room for intelligent adaptation
+- **Missing Safety Guidelines**: Always include comprehensive safety measures
+- **Inconsistent Formatting**: Use consistent structure throughout
+- **Ignoring Model Capabilities**: Design prompts that leverage model strengths
+
+This comprehensive system prompt design framework provides the foundation for creating effective, reliable, and safe AI system behaviors across diverse applications and use cases.
--- a/skills/prompt-engineering/references/template-systems.md
+++ b/skills/prompt-engineering/references/template-systems.md
@@ -0,0 +1,599 @@
+# Template Systems Architecture
+
+This reference provides comprehensive frameworks for building modular, reusable prompt templates with variable interpolation, conditional sections, and hierarchical composition.
+
+## Template Design Principles
+
+### Modularity and Reusability
+- **Single Responsibility**: Each template handles one specific type of task
+- **Composability**: Templates can be combined to create complex prompts
+- **Parameterization**: Variables allow customization without core changes
+- **Inheritance**: Base templates can be extended for specific use cases
+
+### Clear Variable Naming Conventions
+```
+{user_input}          - Direct input from user
+{context}             - Background information
+{examples}            - Few-shot learning examples
+{constraints}         - Task limitations and requirements
+{output_format}       - Desired output structure
+{role}                - AI role or persona
+{expertise_level}     - Level of expertise for the role
+{domain}              - Specific domain or field
+{difficulty}          - Task complexity level
+{language}            - Output language specification
+```
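+
+With a fixed `{name}` convention, a template can be checked for unfilled variables before it is sent to a model. The following is a minimal sketch under that assumption; `find_variables` and `missing_variables` are hypothetical helper names, and `{endif}` is treated as a reserved marker rather than a variable.
+
+```python
+import re
+
+# Matches {name} placeholders; "{if x}" contains a space and is not matched
+PLACEHOLDER = re.compile(r'\{(\w+)\}')
+RESERVED = {'endif'}  # control markers that look like variables
+
+
+def find_variables(template):
+    """Return the set of variable names referenced by a template."""
+    return set(PLACEHOLDER.findall(template)) - RESERVED
+
+
+def missing_variables(template, variables):
+    """Return the referenced names that have no value supplied, sorted."""
+    return sorted(find_variables(template) - set(variables))
+
+
+template = (
+    "You are a {role} with {expertise_level} expertise in {domain}.\n"
+    "{if context}\n{context}\n{endif}\n"
+    "{user_input}"
+)
+print(missing_variables(template, {'role': 'tutor', 'domain': 'algebra'}))
+```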
+
+## Core Template Components
+
+### 1. Base Template Structure
+```
+# Template: Universal Task Framework
+# Purpose: Base template for most task types
+# Variables: {role}, {expertise_level}, {domain}, {task_description}, {context}, {examples}, {output_format}, {constraints}, {user_input}
+
+## System Instructions
+You are a {role} with {expertise_level} expertise in {domain}.
+
+## Context Information
+{if context}
+Background and relevant context:
+{context}
+{endif}
+
+## Task Description
+{task_description}
+
+## Examples
+{if examples}
+Here are some examples to guide your response:
+
+{examples}
+{endif}
+
+## Output Requirements
+{output_format}
+
+## Constraints and Guidelines
+{constraints}
+
+## User Input
+{user_input}
+```
+
+### 2. Conditional Sections Framework
+```python
+def process_conditional_template(template, variables):
+    """
+    Process template with conditional sections.
+    """
+    # Process if/endif blocks (nested blocks are not supported)
+    endif_tag = '{endif}'
+    while '{if ' in template:
+        start = template.find('{if ')
+        end_condition = template.find('}', start)
+        condition = template[start+4:end_condition].strip()
+
+        start_endif = template.find(endif_tag, end_condition)
+        if start_endif == -1:
+            raise ValueError(f"Missing {endif_tag} for condition '{condition}'")
+        if_content = template[end_condition+1:start_endif].strip()
+        # Skip the full tag; a hard-coded offset of 6 leaves a stray '}'
+        after_block = template[start_endif + len(endif_tag):]
+
+        # Evaluate condition
+        if evaluate_condition(condition, variables):
+            template = template[:start] + if_content + after_block
+        else:
+            template = template[:start] + after_block
+
+    # Replace variables
+    for key, value in variables.items():
+        template = template.replace(f'{{{key}}}', str(value))
+
+    return template
+```
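+
+`evaluate_condition` is left undefined above. The sketch below assumes the simplest rule, that a condition is true when the named variable is present and non-empty, and swaps the manual index arithmetic for a regular expression. It is an alternative illustration rather than the implementation above, and it does not handle nested blocks.
+
+```python
+import re
+
+# Non-greedy match of one {if name} ... {endif} block, across newlines
+IF_BLOCK = re.compile(r'\{if (\w+)\}(.*?)\{endif\}', re.DOTALL)
+
+
+def evaluate_condition(condition, variables):
+    """Assumed rule: true when the variable is set and non-empty."""
+    return bool(variables.get(condition))
+
+
+def render_conditionals(template, variables):
+    """Keep the body of each block whose condition holds, drop the rest."""
+    def keep_or_drop(match):
+        condition, body = match.groups()
+        return body.strip() if evaluate_condition(condition, variables) else ''
+    return IF_BLOCK.sub(keep_or_drop, template)
+
+
+template = "Task: {task}\n{if context}\nContext: {context}\n{endif}\nDone."
+with_context = render_conditionals(template, {'context': 'billing'})
+without_context = render_conditionals(template, {})
+print(with_context)
+print(without_context)
+```
+
+Variable substitution is deliberately left out here, so `{task}` and `{context}` remain in the output for a later replacement pass.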
+
+### 3. Variable Interpolation System
+```python
+class TemplateEngine:
+    def __init__(self):
+        self.variables = {}
+        self.functions = {
+            'upper': str.upper,
+            'lower': str.lower,
+            'capitalize': str.capitalize,
+            'pluralize': self.pluralize,
+            'format_date': self.format_date,
+            'truncate': self.truncate
+        }
+
+    def set_variable(self, name, value):
+        """Set a template variable."""
+        self.variables[name] = value
+
+    def render(self, template):
+        """Render template with variable substitution."""
+        # Process function calls {variable|function}
+        template = self.process_functions(template)
+
+        # Replace variables
+        for key, value in self.variables.items():
+            template = template.replace(f'{{{key}}}', str(value))
+
+        return template
+
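+    def process_functions(self, template):
+        """Process template functions of the form {variable|function}."""
+        import re
+        pattern = r'\{(\w+)\|(\w+)\}'
+
+        def replace_function(match):
+            var_name, func_name = match.groups()
+            # re.sub requires a string result, so coerce the value first
+            value = str(self.variables.get(var_name, ''))
+            if func_name in self.functions:
+                return str(self.functions[func_name](value))
+            return value
+
+        return re.sub(pattern, replace_function, template)
+
+    # Minimal placeholder helpers (assumptions) so that the function table
+    # built in __init__ resolves; replace them with real implementations.
+    def pluralize(self, word):
+        return word if word.endswith('s') else word + 's'
+
+    def format_date(self, value):
+        return str(value)
+
+    def truncate(self, text, max_length=100):
+        return text if len(text) <= max_length else text[:max_length - 3] + '...'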
+```
+
+## Specialized Template Types
+
+### 1. Classification Template
+```
+# Template: Multi-Class Classification
+# Purpose: Classify inputs into predefined categories
+# Required Variables: {input_text}, {categories}, {role}
+
+## Classification Framework
+You are a {role} specializing in accurate text classification.
+
+## Classification Categories
+{categories}
+
+## Classification Process
+1. Analyze the input text carefully
+2. Identify key indicators and features
+3. Match against category definitions
+4. Select the most appropriate category
+5. Provide confidence score
+
+## Input to Classify
+{input_text}
+
+## Output Format
+~~~json
+{{
+  "category": "selected_category",
+  "confidence": 0.95,
+  "reasoning": "Brief explanation of classification logic",
+  "key_indicators": ["indicator1", "indicator2"]
+}}
+~~~
+```
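+
+Because the template asks for a fixed JSON shape, the model's reply can be validated before it is used downstream. The sketch below assumes the reply is the bare JSON object with the four keys from the output format; `parse_classification` and the sample categories are hypothetical.
+
+```python
+import json
+
+REQUIRED_KEYS = {"category", "confidence", "reasoning", "key_indicators"}
+
+
+def parse_classification(raw, allowed_categories):
+    """Parse a classification reply and check it against the expected shape."""
+    data = json.loads(raw)
+    missing = REQUIRED_KEYS - data.keys()
+    if missing:
+        raise ValueError(f"Missing keys: {sorted(missing)}")
+    if data["category"] not in allowed_categories:
+        raise ValueError(f"Unexpected category: {data['category']!r}")
+    if not 0.0 <= float(data["confidence"]) <= 1.0:
+        raise ValueError("confidence must be between 0 and 1")
+    return data
+
+
+# Made-up reply for illustration
+raw_reply = json.dumps({
+    "category": "billing",
+    "confidence": 0.95,
+    "reasoning": "Mentions an invoice and a refund request",
+    "key_indicators": ["invoice", "refund"],
+})
+result = parse_classification(raw_reply, {"billing", "technical", "other"})
+print(result["category"], result["confidence"])
+```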
+
+### 2. Transformation Template
+```
+# Template: Text Transformation
+# Purpose: Transform text from one format/style to another
+# Required Variables: {source_text}, {source_format}, {target_format}, {transformation_rules}
+
+## Transformation Task
+Transform the given {source_format} text into {target_format} following these rules:
+{transformation_rules}
+
+## Source Text
+{source_text}
+
+## Transformation Process
+1. Analyze the structure and content of the source text
+2. Apply the specified transformation rules
+3. Maintain the core meaning and intent
+4. Ensure proper {target_format} formatting
+5. Verify completeness and accuracy
+
+## Transformed Output
+```
+
+### 3. Generation Template
+```
+# Template: Creative Generation
+# Purpose: Generate creative content based on specifications
+# Required Variables: {content_type}, {specifications}, {style_guidelines}
+
+## Creative Generation Task
+Generate {content_type} that meets the following specifications:
+
+## Content Specifications
+{specifications}
+
+## Style Guidelines
+{style_guidelines}
+
+## Quality Requirements
+- Originality and creativity
+- Adherence to specifications
+- Appropriate tone and style
+- Clear structure and coherence
+- Audience-appropriate language
+
+## Generated Content
+```
+
+### 4. Analysis Template
+```
+# Template: Comprehensive Analysis
+# Purpose: Perform detailed analysis of given input
+# Required Variables: {input_data}, {analysis_framework}, {focus_areas}, {domain}
+
+## Analysis Framework
+You are an expert analyst with deep expertise in {domain}.
+
+## Analysis Scope
+Focus on these key areas:
+{focus_areas}
+
+## Analysis Methodology
+{analysis_framework}
+
+## Input Data for Analysis
+{input_data}
+
+## Analysis Process
+1. Initial assessment and context understanding
+2. Detailed examination of each focus area
+3. Pattern and trend identification
+4. Comparative analysis with benchmarks
+5. Insight generation and recommendation formulation
+
+## Analysis Output Structure
+~~~yaml
+executive_summary:
+  key_findings: []
+  overall_assessment: ""
+
+detailed_analysis:
+  {focus_area_1}:
+    observations: []
+    patterns: []
+    insights: []
+  {focus_area_2}:
+    observations: []
+    patterns: []
+    insights: []
+
+recommendations:
+  immediate: []
+  short_term: []
+  long_term: []
+~~~
+```
+
+## Advanced Template Patterns
+
+### 1. Hierarchical Template Composition
+```python
+import re
+
+
+class HierarchicalTemplate:
+    def __init__(self, name, content, parent=None):
+        self.name = name
+        self.content = content
+        self.parent = parent
+        self.children = []
+        self.variables = {}
+
+    def add_child(self, child_template):
+        """Add a child template."""
+        child_template.parent = self
+        self.children.append(child_template)
+
+    def render_content(self, content, variables):
+        """Substitute {name} placeholders; unknown names are left as-is."""
+        return re.sub(
+            r'\{(\w+)\}',
+            lambda m: str(variables.get(m.group(1), m.group(0))),
+            content,
+        )
+
+    def render(self, variables=None):
+        """Render template with inherited variables."""
+        # Collect the ancestor chain, nearest parent first
+        ancestors = []
+        current = self.parent
+        while current:
+            ancestors.append(current)
+            current = current.parent
+
+        # Apply from the root down so nearer ancestors override distant ones
+        combined_vars = {}
+        for ancestor in reversed(ancestors):
+            combined_vars.update(ancestor.variables)
+
+        # Add current variables
+        combined_vars.update(self.variables)
+
+        # Override with provided variables
+        if variables:
+            combined_vars.update(variables)
+
+        # Render content
+        rendered_content = self.render_content(self.content, combined_vars)
+
+        # Render children; each child rebuilds its own inherited variables,
+        # so pass only the caller's overrides to keep child values intact
+        for child in self.children:
+            child_rendered = child.render(variables)
+            rendered_content = rendered_content.replace(
+                f'{{child:{child.name}}}', child_rendered
+            )
+
+        return rendered_content
+```
+
+### 2. Role-Based Template System
+```python
+class RoleBasedTemplate:
+    def __init__(self):
+        self.roles = {
+            'analyst': {
+                'persona': 'You are a professional analyst with expertise in data interpretation and pattern recognition.',
+                'approach': 'systematic',
+                'output_style': 'detailed and evidence-based',
+                'verification': 'Always cross-check findings and cite sources'
+            },
+            'creative_writer': {
+                'persona': 'You are a creative writer with a talent for engaging storytelling and vivid descriptions.',
+                'approach': 'imaginative',
+                'output_style': 'descriptive and engaging',
+                'verification': 'Ensure narrative consistency and flow'
+            },
+            'technical_expert': {
+                'persona': 'You are a technical expert with deep knowledge of {domain} and practical implementation experience.',
+                'approach': 'methodical',
+                'output_style': 'precise and technical',
+                'verification': 'Include technical accuracy and best practices'
+            }
+        }
+
+    def create_prompt(self, role, task, domain=None):
+        """Create role-specific prompt template."""
+        # Unknown roles fall back to the analyst configuration
+        role_config = self.roles.get(role, self.roles['analyst'])
+
+        # Fill {domain} in the persona only, so a literal '{domain}' inside
+        # the task text is left untouched; the default wording is a placeholder
+        persona = role_config['persona'].replace(
+            '{domain}', domain or 'the relevant field'
+        )
+
+        template = f"""
+## Role Definition
+{persona}
+
+## Approach
+Use a {role_config['approach']} approach to this task.
+
+## Task
+{task}
+
+## Output Style
+{role_config['output_style']}
+
+## Verification
+{role_config['verification']}
+"""
+
+        return template
+```
+
+### 3. Dynamic Template Selection
+```python
+class DynamicTemplateSelector:
+    def __init__(self):
+        self.templates = {}
+        self.selection_rules = {}
+
+    def register_template(self, name, template, selection_criteria):
+        """Register a template with selection criteria."""
+        self.templates[name] = template
+        self.selection_rules[name] = selection_criteria
+
+    def select_template(self, task_characteristics):
+        """Select the most appropriate template based on task characteristics."""
+        best_template = None
+        best_score = 0
+
+        for name, criteria in self.selection_rules.items():
+            score = self.calculate_match_score(task_characteristics, criteria)
+            if score > best_score:
+                best_score = score
+                best_template = name
+
+        return self.templates[best_template] if best_template else None
+
+    def calculate_match_score(self, task_characteristics, criteria):
+        """Calculate how well task matches template criteria."""
+        score = 0
+        total_weight = 0
+
+        # Each criterion maps to {'value': expected_value, 'weight': number}
+        for characteristic, rule in criteria.items():
+            # Characteristics the task does not specify are skipped entirely
+            if characteristic in task_characteristics:
+                if task_characteristics[characteristic] == rule['value']:
+                    score += rule['weight']
+                total_weight += rule['weight']
+
+        return score / total_weight if total_weight > 0 else 0
+```
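+
+The scoring logic expects each criterion to be a dict with `value` and `weight` keys, which is easy to miss when registering a template. The sketch below restates that logic as a standalone function to show the data shape; the characteristic names (`task_type`, `audience`) and the weights are hypothetical and chosen only for illustration.
+
+```python
+def match_score(task_characteristics, criteria):
+    """Weighted share of criteria met, counting only characteristics the task specifies."""
+    score = 0
+    total_weight = 0
+    for characteristic, rule in criteria.items():
+        if characteristic in task_characteristics:
+            if task_characteristics[characteristic] == rule['value']:
+                score += rule['weight']
+            total_weight += rule['weight']
+    return score / total_weight if total_weight > 0 else 0
+
+
+# Hypothetical criteria for illustration only
+analysis_criteria = {
+    'task_type': {'value': 'analysis', 'weight': 3},
+    'audience': {'value': 'executive', 'weight': 1},
+}
+
+task = {'task_type': 'analysis', 'audience': 'engineer'}
+print(match_score(task, analysis_criteria))  # 3 / (3 + 1) = 0.75
+```
+
+Because unspecified characteristics are skipped, a template with a single matching criterion scores 1.0; weighting the criteria that matter most helps keep such templates from winning by default.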
+
+## Template Implementation Examples
+
+### Example 1: Customer Service Template
+```python
+customer_service_template = """
+# Customer Service Response Template
+
+## Role Definition
+You are a {customer_service_role} with {experience_level} of customer service experience in {industry}.
+
+## Context
+{if customer_history}
+Customer History:
+{customer_history}
+{endif}
+
+{if issue_context}
+Issue Context:
+{issue_context}
+{endif}
+
+## Response Guidelines
+- Maintain {tone} tone throughout
+- Address all aspects of the customer's inquiry
+- Provide {level_of_detail} explanation
+- Include {additional_elements}
+- Follow company {communication_style} style
+
+## Customer Inquiry
+{customer_inquiry}
+
+## Response Structure
+1. Greeting and acknowledgment
+2. Understanding and empathy
+3. Solution or explanation
+4. Additional assistance offered
+5. Professional closing
+
+## Response
+"""
+```
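+
+The `{if name}` ... `{endif}` markers in this template are not understood by Python's built-in string formatting, so a renderer has to handle them. The following is a minimal sketch, assuming non-nested conditionals and treating a missing or empty variable as false; `render_template` is a hypothetical helper, not part of any library.
+
+```python
+import re
+
+IF_BLOCK = re.compile(r'\{if (\w+)\}\n?(.*?)\{endif\}\n?', re.DOTALL)
+PLACEHOLDER = re.compile(r'\{(\w+)\}')
+
+
+def render_template(template, variables):
+    """Render {if name}...{endif} blocks and {name} placeholders (no nesting)."""
+    def keep_or_drop(match):
+        name, body = match.group(1), match.group(2)
+        # Keep the block body only when the variable is present and truthy
+        return body if variables.get(name) else ''
+
+    without_conditionals = IF_BLOCK.sub(keep_or_drop, template)
+    # Unknown placeholders are left in place rather than raising an error
+    return PLACEHOLDER.sub(
+        lambda m: str(variables.get(m.group(1), m.group(0))),
+        without_conditionals,
+    )
+
+
+demo = "Hello {name}.\n{if history}\nHistory: {history}\n{endif}\nBye."
+print(render_template(demo, {'name': 'Ana'}))
+print(render_template(demo, {'name': 'Ana', 'history': '2 prior tickets'}))
+```
+
+Substitution happens in a single pass, so placeholder-like text inside a variable's value is not expanded again.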
+
+### Example 2: Technical Documentation Template
+````python
+documentation_template = """
+# Technical Documentation Generator
+
+## Role Definition
+You are a {technical_writer_role} specializing in {technology} documentation with {experience_level} of experience.
+
+## Documentation Standards
+- Target audience: {audience_level}
+- Technical depth: {technical_depth}
+- Include examples: {include_examples}
+- Add troubleshooting: {add_troubleshooting}
+- Version: {version}
+
+## Content to Document
+{content_to_document}
+
+## Documentation Structure
+```markdown
+# {title}
+
+## Overview
+{overview}
+
+## Prerequisites
+{prerequisites}
+
+## {main_sections}
+
+## Examples
+{if include_examples}
+{examples}
+{endif}
+
+## Troubleshooting
+{if add_troubleshooting}
+{troubleshooting}
+{endif}
+
+## Additional Resources
+{additional_resources}
+```
+
+## Generated Documentation
+"""
+````
+
+## Template Management System
+
+### Version Control Integration
+```python
+import datetime
+import hashlib
+
+
+class TemplateVersionManager:
+    def __init__(self):
+        self.versions = {}
+        self.current_versions = {}
+
+    def create_version(self, template_name, template_content, author, description):
+        """Create a new version of a template."""
+        # MD5 serves only as a short content fingerprint here, not for security;
+        # identical content always yields the same version_id
+        version_id = hashlib.md5(template_content.encode()).hexdigest()[:8]
+        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
+
+        version_info = {
+            'version_id': version_id,
+            'content': template_content,
+            'author': author,
+            'description': description,
+            'timestamp': timestamp,
+            'parent_version': self.current_versions.get(template_name)
+        }
+
+        if template_name not in self.versions:
+            self.versions[template_name] = []
+
+        self.versions[template_name].append(version_info)
+        self.current_versions[template_name] = version_id
+
+        return version_id
+
+    def rollback(self, template_name, version_id):
+        """Rollback to a specific version."""
+        if template_name in self.versions:
+            for version in self.versions[template_name]:
+                if version['version_id'] == version_id:
+                    self.current_versions[template_name] = version_id
+                    return version['content']
+        return None
+```
+
+### Performance Monitoring
+```python
+class TemplatePerformanceMonitor:
+    def __init__(self):
+        self.usage_stats = {}
+        self.performance_metrics = {}
+
+    def track_usage(self, template_name, execution_time, success):
+        """Track template usage and performance."""
+        if template_name not in self.usage_stats:
+            self.usage_stats[template_name] = {
+                'usage_count': 0,
+                'total_time': 0,
+                'success_count': 0,
+                'failure_count': 0
+            }
+
+        stats = self.usage_stats[template_name]
+        stats['usage_count'] += 1
+        stats['total_time'] += execution_time
+
+        if success:
+            stats['success_count'] += 1
+        else:
+            stats['failure_count'] += 1
+
+    def get_performance_report(self, template_name):
+        """Generate performance report for a template."""
+        if template_name not in self.usage_stats:
+            return None
+
+        stats = self.usage_stats[template_name]
+        avg_time = stats['total_time'] / stats['usage_count']
+        success_rate = stats['success_count'] / stats['usage_count']
+
+        return {
+            'template_name': template_name,
+            'total_usage': stats['usage_count'],
+            'average_execution_time': avg_time,
+            'success_rate': success_rate,
+            'failure_rate': 1 - success_rate
+        }
+```
+
+## Best Practices
+
+### Template Quality Guidelines
+- **Clear Documentation**: Include purpose, variables, and usage examples
+- **Consistent Naming**: Use standardized variable naming conventions
+- **Error Handling**: Include fallback mechanisms for missing variables
+- **Performance Optimization**: Minimize template complexity and rendering time
+- **Testing**: Implement comprehensive template testing frameworks
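+
+One way to provide a fallback for missing variables is a dict subclass with `__missing__`, passed to `str.format_map`. This is a sketch under the assumption that the template contains only plain `{name}` placeholders; `DefaultingDict`, `render_with_fallback`, and the marker text are hypothetical names introduced here.
+
+```python
+class DefaultingDict(dict):
+    """Dict that returns a visible marker for missing template variables."""
+
+    def __missing__(self, key):
+        return f'[missing: {key}]'
+
+
+def render_with_fallback(template, variables, defaults=None):
+    """Fill {name} placeholders from variables, then defaults, then a marker."""
+    merged = DefaultingDict(defaults or {})
+    merged.update(variables)
+    return template.format_map(merged)
+
+
+prompt = render_with_fallback(
+    'Maintain a {tone} tone for {audience}.',
+    {'audience': 'new customers'},
+    defaults={'tone': 'friendly'},
+)
+print(prompt)  # Maintain a friendly tone for new customers.
+```
+
+A visible marker makes an unfilled variable easy to spot in review or tests, whereas silently substituting an empty string can hide the gap.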
+
+### Security Considerations
+- **Input Validation**: Sanitize all template variables
+- **Injection Prevention**: Prevent code injection in template rendering
+- **Access Control**: Implement proper authorization for template modifications
+- **Audit Trail**: Track template changes and usage
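+
+As a rough illustration of input validation, the sketch below applies one possible policy to a user-supplied value before it is placed into a template: drop control characters, replace braces so the value cannot introduce new placeholders, and cap the length. The function name, the brace replacement, and the 2000-character limit are assumptions, and this guards only against template-level placeholder injection, not against instructions written in natural language.
+
+```python
+import re
+
+# Control characters except tab, newline, and carriage return
+CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')
+
+
+def sanitize_variable(value, max_length=2000):
+    """Make a user-supplied value safer to place into a template."""
+    text = str(value)
+    # Drop control characters (keeps tab, newline, carriage return)
+    text = CONTROL_CHARS.sub('', text)
+    # Replace braces so the value cannot introduce new placeholders
+    # in renderers that scan the text more than once
+    text = text.replace('{', '(').replace('}', ')')
+    return text[:max_length]
+
+
+print(sanitize_variable('Ignore {system_prompt}\x00 please'))
+# Ignore (system_prompt) please
+```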
+
+Together, the template patterns, version management, and performance monitoring described above provide a foundation for prompt templates that can be maintained and tuned across diverse use cases.