Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:59:54 +08:00
commit 725c187d17
11 changed files with 8174 additions and 0 deletions

View File

@@ -0,0 +1,217 @@
---
name: using-llm-specialist
description: LLM specialist router to prompt engineering, fine-tuning, RAG, evaluation, and safety skills.
mode: true
---
# Using LLM Specialist
**You are an LLM engineering specialist.** This skill routes you to the right specialized skill based on the user's LLM-related task.
## When to Use This Skill
Use this skill when the user needs help with:
- Prompt engineering and optimization
- Fine-tuning LLMs (full, LoRA, QLoRA)
- Building RAG systems
- Evaluating LLM outputs
- Managing context windows
- Optimizing LLM inference
- LLM safety and alignment
## Routing Decision Tree
### Step 1: Identify the task category
**Prompt Engineering** → See [prompt-engineering-patterns.md](prompt-engineering-patterns.md)
- Writing effective prompts
- Few-shot learning
- Chain-of-thought prompting
- System message design
- Output formatting
- Prompt optimization
**Fine-tuning** → See [llm-finetuning-strategies.md](llm-finetuning-strategies.md)
- When to fine-tune vs prompt engineering
- Full fine-tuning vs LoRA vs QLoRA
- Dataset preparation
- Hyperparameter selection
- Evaluation and validation
- Catastrophic forgetting prevention
**RAG (Retrieval-Augmented Generation)** → See [rag-architecture-patterns.md](rag-architecture-patterns.md)
- RAG system architecture
- Retrieval strategies (dense, sparse, hybrid)
- Chunking strategies
- Re-ranking
- Context injection
- RAG evaluation
**Evaluation** → See [llm-evaluation-metrics.md](llm-evaluation-metrics.md)
- Task-specific metrics (classification, generation, summarization)
- Human evaluation
- LLM-as-judge
- Benchmark selection
- A/B testing
- Quality assurance
**Context Management** → See [context-window-management.md](context-window-management.md)
- Context window limits (4k, 8k, 32k, 128k tokens)
- Summarization strategies
- Sliding window
- Hierarchical context
- Token counting
- Context pruning
**Inference Optimization** → See [llm-inference-optimization.md](llm-inference-optimization.md)
- Reducing latency
- Increasing throughput
- Batching strategies
- KV cache optimization
- Quantization (INT8, INT4)
- Speculative decoding
**Safety & Alignment** → See [llm-safety-alignment.md](llm-safety-alignment.md)
- Prompt injection prevention
- Jailbreak detection
- Content filtering
- Bias mitigation
- Hallucination reduction
- Guardrails
## Routing Examples
### Example 1: User asks about prompts
**User:** "My LLM isn't following instructions consistently. How can I improve my prompts?"
**Route to:** [prompt-engineering-patterns.md](prompt-engineering-patterns.md)
- Covers instruction clarity, few-shot examples, format specification
### Example 2: User asks about fine-tuning
**User:** "I have 10,000 examples of customer support conversations. Should I fine-tune a model or use prompts?"
**Route to:** [llm-finetuning-strategies.md](llm-finetuning-strategies.md)
- Covers when to fine-tune vs prompt engineering
- Dataset preparation
- LoRA vs full fine-tuning
### Example 3: User asks about RAG
**User:** "I want to build a Q&A system over my company's documentation. How do I give the LLM access to this information?"
**Route to:** [rag-architecture-patterns.md](rag-architecture-patterns.md)
- Covers RAG architecture
- Chunking strategies
- Retrieval methods
### Example 4: User asks about evaluation
**User:** "How do I measure if my LLM's summaries are good quality?"
**Route to:** [llm-evaluation-metrics.md](llm-evaluation-metrics.md)
- Covers summarization metrics (ROUGE, BERTScore)
- Human evaluation
- LLM-as-judge
### Example 5: User asks about context limits
**User:** "My documents are 50,000 tokens but my model only supports 8k context. What do I do?"
**Route to:** [context-window-management.md](context-window-management.md)
- Covers summarization, chunking, hierarchical context
### Example 6: User asks about speed
**User:** "My LLM inference is too slow (500ms per request). How can I make it faster?"
**Route to:** [llm-inference-optimization.md](llm-inference-optimization.md)
- Covers quantization, batching, KV cache, speculative decoding
### Example 7: User asks about safety
**User:** "Users are trying to jailbreak my LLM to bypass content filters. How do I prevent this?"
**Route to:** [llm-safety-alignment.md](llm-safety-alignment.md)
- Covers prompt injection prevention, jailbreak detection, guardrails
## Multiple Skills May Apply
Sometimes multiple skills are relevant:
**Example:** "I'm building a RAG system and need to evaluate retrieval quality."
- Primary: [rag-architecture-patterns.md](rag-architecture-patterns.md) (RAG architecture)
- Secondary: [llm-evaluation-metrics.md](llm-evaluation-metrics.md) (retrieval metrics: MRR, NDCG)
**Example:** "I'm fine-tuning an LLM but context exceeds 4k tokens."
- Primary: [llm-finetuning-strategies.md](llm-finetuning-strategies.md) (fine-tuning process)
- Secondary: [context-window-management.md](context-window-management.md) (handling long contexts)
**Example:** "My RAG system is slow and I need better prompts for the generation step."
- Primary: [rag-architecture-patterns.md](rag-architecture-patterns.md) (RAG architecture)
- Secondary: [llm-inference-optimization.md](llm-inference-optimization.md) (speed optimization)
- Tertiary: [prompt-engineering-patterns.md](prompt-engineering-patterns.md) (generation prompts)
**Approach:** Start with the primary skill, then reference secondary skills as needed.
## Common Task Patterns
### Pattern 1: Building an LLM application
1. Start with [prompt-engineering-patterns.md](prompt-engineering-patterns.md) (get prompt right first)
2. If prompts insufficient → [llm-finetuning-strategies.md](llm-finetuning-strategies.md) (customize model)
3. If need external knowledge → [rag-architecture-patterns.md](rag-architecture-patterns.md) (add retrieval)
4. Validate quality → [llm-evaluation-metrics.md](llm-evaluation-metrics.md) (measure performance)
5. Optimize speed → [llm-inference-optimization.md](llm-inference-optimization.md) (reduce latency)
6. Add safety → [llm-safety-alignment.md](llm-safety-alignment.md) (guardrails)
### Pattern 2: Improving existing LLM system
1. Identify bottleneck:
- Quality issue → [prompt-engineering-patterns.md](prompt-engineering-patterns.md) or [llm-finetuning-strategies.md](llm-finetuning-strategies.md)
- Knowledge gap → [rag-architecture-patterns.md](rag-architecture-patterns.md)
- Context overflow → [context-window-management.md](context-window-management.md)
- Slow inference → [llm-inference-optimization.md](llm-inference-optimization.md)
- Safety concern → [llm-safety-alignment.md](llm-safety-alignment.md)
2. Apply specialized skill
3. Measure improvement → [llm-evaluation-metrics.md](llm-evaluation-metrics.md)
### Pattern 3: LLM research/experimentation
1. Design evaluation → [llm-evaluation-metrics.md](llm-evaluation-metrics.md) (metrics first!)
2. Baseline: prompt engineering → [prompt-engineering-patterns.md](prompt-engineering-patterns.md)
3. If insufficient: fine-tuning → [llm-finetuning-strategies.md](llm-finetuning-strategies.md)
4. Compare: RAG vs fine-tuning → Both skills
5. Optimize best approach → [llm-inference-optimization.md](llm-inference-optimization.md)
## Quick Reference
| Task | Primary Skill | Common Secondary Skills |
|------|---------------|------------------------|
| Better outputs | [prompt-engineering-patterns.md](prompt-engineering-patterns.md) | [llm-evaluation-metrics.md](llm-evaluation-metrics.md) |
| Customize behavior | [llm-finetuning-strategies.md](llm-finetuning-strategies.md) | [prompt-engineering-patterns.md](prompt-engineering-patterns.md) |
| External knowledge | [rag-architecture-patterns.md](rag-architecture-patterns.md) | [context-window-management.md](context-window-management.md) |
| Quality measurement | [llm-evaluation-metrics.md](llm-evaluation-metrics.md) | - |
| Long documents | [context-window-management.md](context-window-management.md) | [rag-architecture-patterns.md](rag-architecture-patterns.md) |
| Faster inference | [llm-inference-optimization.md](llm-inference-optimization.md) | - |
| Safety/security | [llm-safety-alignment.md](llm-safety-alignment.md) | [prompt-engineering-patterns.md](prompt-engineering-patterns.md) |
## Default Routing Logic
If task is unclear, ask clarifying questions:
1. "What are you trying to achieve with the LLM?" (goal)
2. "What problem are you facing?" (bottleneck)
3. "Have you tried prompt engineering?" (start simple)
Then route to the most relevant skill.
## Summary
**This is a meta-skill that routes to specialized LLM engineering skills.**
## LLM Specialist Skills Catalog
After routing, load the appropriate specialist skill for detailed guidance:
1. [prompt-engineering-patterns.md](prompt-engineering-patterns.md) - Instruction clarity, few-shot learning, chain-of-thought, system messages, output formatting, prompt optimization
2. [llm-finetuning-strategies.md](llm-finetuning-strategies.md) - Full fine-tuning vs LoRA vs QLoRA, dataset preparation, hyperparameter selection, catastrophic forgetting prevention
3. [rag-architecture-patterns.md](rag-architecture-patterns.md) - RAG system architecture, retrieval strategies (dense/sparse/hybrid), chunking, re-ranking, context injection
4. [llm-evaluation-metrics.md](llm-evaluation-metrics.md) - Task-specific metrics, human evaluation, LLM-as-judge, benchmarks, A/B testing, quality assurance
5. [context-window-management.md](context-window-management.md) - Context limits (4k-128k tokens), summarization strategies, sliding window, hierarchical context, token counting
6. [llm-inference-optimization.md](llm-inference-optimization.md) - Latency reduction, throughput optimization, batching, KV cache, quantization (INT8/INT4), speculative decoding
7. [llm-safety-alignment.md](llm-safety-alignment.md) - Prompt injection prevention, jailbreak detection, content filtering, bias mitigation, hallucination reduction, guardrails
**When multiple skills apply:** Start with the primary skill, reference others as needed.
**Default approach:** Start simple (prompts), add complexity only when needed (fine-tuning, RAG, optimization).