10 KiB
name, description, mode
| name | description | mode |
|---|---|---|
| using-llm-specialist | LLM specialist router to prompt engineering, fine-tuning, RAG, evaluation, and safety skills. | true |
Using LLM Specialist
You are an LLM engineering specialist. This skill routes you to the right specialized skill based on the user's LLM-related task.
When to Use This Skill
Use this skill when the user needs help with:
- Prompt engineering and optimization
- Fine-tuning LLMs (full, LoRA, QLoRA)
- Building RAG systems
- Evaluating LLM outputs
- Managing context windows
- Optimizing LLM inference
- LLM safety and alignment
Routing Decision Tree
Step 1: Identify the task category
Prompt Engineering → See prompt-engineering-patterns.md
- Writing effective prompts
- Few-shot learning
- Chain-of-thought prompting
- System message design
- Output formatting
- Prompt optimization
Fine-tuning → See llm-finetuning-strategies.md
- When to fine-tune vs prompt engineering
- Full fine-tuning vs LoRA vs QLoRA
- Dataset preparation
- Hyperparameter selection
- Evaluation and validation
- Catastrophic forgetting prevention
RAG (Retrieval-Augmented Generation) → See rag-architecture-patterns.md
- RAG system architecture
- Retrieval strategies (dense, sparse, hybrid)
- Chunking strategies
- Re-ranking
- Context injection
- RAG evaluation
Evaluation → See llm-evaluation-metrics.md
- Task-specific metrics (classification, generation, summarization)
- Human evaluation
- LLM-as-judge
- Benchmark selection
- A/B testing
- Quality assurance
Context Management → See context-window-management.md
- Context window limits (4k, 8k, 32k, 128k tokens)
- Summarization strategies
- Sliding window
- Hierarchical context
- Token counting
- Context pruning
Inference Optimization → See llm-inference-optimization.md
- Reducing latency
- Increasing throughput
- Batching strategies
- KV cache optimization
- Quantization (INT8, INT4)
- Speculative decoding
Safety & Alignment → See llm-safety-alignment.md
- Prompt injection prevention
- Jailbreak detection
- Content filtering
- Bias mitigation
- Hallucination reduction
- Guardrails
Routing Examples
Example 1: User asks about prompts
User: "My LLM isn't following instructions consistently. How can I improve my prompts?"
Route to: prompt-engineering-patterns.md
- Covers instruction clarity, few-shot examples, format specification
Example 2: User asks about fine-tuning
User: "I have 10,000 examples of customer support conversations. Should I fine-tune a model or use prompts?"
Route to: llm-finetuning-strategies.md
- Covers when to fine-tune vs prompt engineering
- Dataset preparation
- LoRA vs full fine-tuning
Example 3: User asks about RAG
User: "I want to build a Q&A system over my company's documentation. How do I give the LLM access to this information?"
Route to: rag-architecture-patterns.md
- Covers RAG architecture
- Chunking strategies
- Retrieval methods
Example 4: User asks about evaluation
User: "How do I measure if my LLM's summaries are good quality?"
Route to: llm-evaluation-metrics.md
- Covers summarization metrics (ROUGE, BERTScore)
- Human evaluation
- LLM-as-judge
Example 5: User asks about context limits
User: "My documents are 50,000 tokens but my model only supports 8k context. What do I do?"
Route to: context-window-management.md
- Covers summarization, chunking, hierarchical context
Example 6: User asks about speed
User: "My LLM inference is too slow (500ms per request). How can I make it faster?"
Route to: llm-inference-optimization.md
- Covers quantization, batching, KV cache, speculative decoding
Example 7: User asks about safety
User: "Users are trying to jailbreak my LLM to bypass content filters. How do I prevent this?"
Route to: llm-safety-alignment.md
- Covers prompt injection prevention, jailbreak detection, guardrails
Multiple Skills May Apply
Sometimes multiple skills are relevant:
Example: "I'm building a RAG system and need to evaluate retrieval quality."
- Primary: rag-architecture-patterns.md (RAG architecture)
- Secondary: llm-evaluation-metrics.md (retrieval metrics: MRR, NDCG)
Example: "I'm fine-tuning an LLM but context exceeds 4k tokens."
- Primary: llm-finetuning-strategies.md (fine-tuning process)
- Secondary: context-window-management.md (handling long contexts)
Example: "My RAG system is slow and I need better prompts for the generation step."
- Primary: rag-architecture-patterns.md (RAG architecture)
- Secondary: llm-inference-optimization.md (speed optimization)
- Tertiary: prompt-engineering-patterns.md (generation prompts)
Approach: Start with the primary skill, then reference secondary skills as needed.
Common Task Patterns
Pattern 1: Building an LLM application
- Start with prompt-engineering-patterns.md (get prompt right first)
- If prompts insufficient → llm-finetuning-strategies.md (customize model)
- If need external knowledge → rag-architecture-patterns.md (add retrieval)
- Validate quality → llm-evaluation-metrics.md (measure performance)
- Optimize speed → llm-inference-optimization.md (reduce latency)
- Add safety → llm-safety-alignment.md (guardrails)
Pattern 2: Improving existing LLM system
- Identify bottleneck:
- Quality issue → prompt-engineering-patterns.md or llm-finetuning-strategies.md
- Knowledge gap → rag-architecture-patterns.md
- Context overflow → context-window-management.md
- Slow inference → llm-inference-optimization.md
- Safety concern → llm-safety-alignment.md
- Apply specialized skill
- Measure improvement → llm-evaluation-metrics.md
Pattern 3: LLM research/experimentation
- Design evaluation → llm-evaluation-metrics.md (metrics first!)
- Baseline: prompt engineering → prompt-engineering-patterns.md
- If insufficient: fine-tuning → llm-finetuning-strategies.md
- Compare: RAG vs fine-tuning → Both skills
- Optimize best approach → llm-inference-optimization.md
Quick Reference
| Task | Primary Skill | Common Secondary Skills |
|---|---|---|
| Better outputs | prompt-engineering-patterns.md | llm-evaluation-metrics.md |
| Customize behavior | llm-finetuning-strategies.md | prompt-engineering-patterns.md |
| External knowledge | rag-architecture-patterns.md | context-window-management.md |
| Quality measurement | llm-evaluation-metrics.md | - |
| Long documents | context-window-management.md | rag-architecture-patterns.md |
| Faster inference | llm-inference-optimization.md | - |
| Safety/security | llm-safety-alignment.md | prompt-engineering-patterns.md |
Default Routing Logic
If task is unclear, ask clarifying questions:
- "What are you trying to achieve with the LLM?" (goal)
- "What problem are you facing?" (bottleneck)
- "Have you tried prompt engineering?" (start simple)
Then route to the most relevant skill.
Summary
This is a meta-skill that routes to specialized LLM engineering skills.
LLM Specialist Skills Catalog
After routing, load the appropriate specialist skill for detailed guidance:
- prompt-engineering-patterns.md - Instruction clarity, few-shot learning, chain-of-thought, system messages, output formatting, prompt optimization
- llm-finetuning-strategies.md - Full fine-tuning vs LoRA vs QLoRA, dataset preparation, hyperparameter selection, catastrophic forgetting prevention
- rag-architecture-patterns.md - RAG system architecture, retrieval strategies (dense/sparse/hybrid), chunking, re-ranking, context injection
- llm-evaluation-metrics.md - Task-specific metrics, human evaluation, LLM-as-judge, benchmarks, A/B testing, quality assurance
- context-window-management.md - Context limits (4k-128k tokens), summarization strategies, sliding window, hierarchical context, token counting
- llm-inference-optimization.md - Latency reduction, throughput optimization, batching, KV cache, quantization (INT8/INT4), speculative decoding
- llm-safety-alignment.md - Prompt injection prevention, jailbreak detection, content filtering, bias mitigation, hallucination reduction, guardrails
When multiple skills apply: Start with the primary skill, reference others as needed.
Default approach: Start simple (prompts), add complexity only when needed (fine-tuning, RAG, optimization).