gh-tachyon-beep-skillpacks-…/skills/using-llm-specialist/SKILL.md
2025-11-30 08:59:54 +08:00

name: using-llm-specialist
description: LLM specialist router to prompt engineering, fine-tuning, RAG, evaluation, and safety skills.
mode: true

Using LLM Specialist

You are an LLM engineering specialist. This skill routes you to the right specialized skill based on the user's LLM-related task.

When to Use This Skill

Use this skill when the user needs help with:

  • Prompt engineering and optimization
  • Fine-tuning LLMs (full, LoRA, QLoRA)
  • Building RAG systems
  • Evaluating LLM outputs
  • Managing context windows
  • Optimizing LLM inference
  • LLM safety and alignment

Routing Decision Tree

Step 1: Identify the task category

Prompt Engineering → See prompt-engineering-patterns.md

  • Writing effective prompts
  • Few-shot learning
  • Chain-of-thought prompting
  • System message design
  • Output formatting
  • Prompt optimization

Fine-tuning → See llm-finetuning-strategies.md

  • When to fine-tune vs. rely on prompt engineering
  • Full fine-tuning vs LoRA vs QLoRA
  • Dataset preparation
  • Hyperparameter selection
  • Evaluation and validation
  • Catastrophic forgetting prevention

RAG (Retrieval-Augmented Generation) → See rag-architecture-patterns.md

  • RAG system architecture
  • Retrieval strategies (dense, sparse, hybrid)
  • Chunking strategies
  • Re-ranking
  • Context injection
  • RAG evaluation

Evaluation → See llm-evaluation-metrics.md

  • Task-specific metrics (classification, generation, summarization)
  • Human evaluation
  • LLM-as-judge
  • Benchmark selection
  • A/B testing
  • Quality assurance

Context Management → See context-window-management.md

  • Context window limits (4k, 8k, 32k, 128k tokens)
  • Summarization strategies
  • Sliding window
  • Hierarchical context
  • Token counting
  • Context pruning

Inference Optimization → See llm-inference-optimization.md

  • Reducing latency
  • Increasing throughput
  • Batching strategies
  • KV cache optimization
  • Quantization (INT8, INT4)
  • Speculative decoding

Safety & Alignment → See llm-safety-alignment.md

  • Prompt injection prevention
  • Jailbreak detection
  • Content filtering
  • Bias mitigation
  • Hallucination reduction
  • Guardrails
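
The decision tree above can be sketched as a small keyword router. This is only an illustration: the skill file names come from this document, but the keyword lists and the `route` function are assumptions made for the example, and real routing will usually need more judgment than keyword matching.

```python
import re

# Skill file names come from the decision tree; the keyword lists are
# illustrative assumptions, not part of the skill pack.
ROUTES: dict[str, list[str]] = {
    "prompt-engineering-patterns.md": ["prompt", "few-shot", "chain-of-thought", "system message"],
    "llm-finetuning-strategies.md": ["fine-tun", "lora", "qlora", "hyperparameter"],
    "rag-architecture-patterns.md": ["rag", "retrieval", "chunking", "re-rank"],
    "llm-evaluation-metrics.md": ["evaluat", "metric", "benchmark", "a/b test"],
    "context-window-management.md": ["context window", "token count", "sliding window"],
    "llm-inference-optimization.md": ["latency", "throughput", "quantiz", "kv cache"],
    "llm-safety-alignment.md": ["jailbreak", "injection", "guardrail", "bias", "hallucinat"],
}


def route(task: str) -> list[str]:
    """Return matching skill files, best match first (empty list if none)."""
    text = task.lower()
    scores: dict[str, int] = {}
    for skill, keywords in ROUTES.items():
        # \b anchors each keyword at a word start, so "rag" does not match "storage".
        hits = sum(bool(re.search(r"\b" + re.escape(kw), text)) for kw in keywords)
        if hits:
            scores[skill] = hits
    # Stable sort: ties keep the order of the decision tree.
    return sorted(scores, key=scores.get, reverse=True)
```

An empty result means no category matched, in which case asking a clarifying question is the safer next step.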

Routing Examples

Example 1: User asks about prompts

User: "My LLM isn't following instructions consistently. How can I improve my prompts?"

Route to: prompt-engineering-patterns.md

  • Covers instruction clarity, few-shot examples, format specification

Example 2: User asks about fine-tuning

User: "I have 10,000 examples of customer support conversations. Should I fine-tune a model or use prompts?"

Route to: llm-finetuning-strategies.md

  • Covers when to fine-tune vs. rely on prompt engineering
  • Dataset preparation
  • LoRA vs full fine-tuning

Example 3: User asks about RAG

User: "I want to build a Q&A system over my company's documentation. How do I give the LLM access to this information?"

Route to: rag-architecture-patterns.md

  • Covers RAG architecture
  • Chunking strategies
  • Retrieval methods

Example 4: User asks about evaluation

User: "How do I measure if my LLM's summaries are good quality?"

Route to: llm-evaluation-metrics.md

  • Covers summarization metrics (ROUGE, BERTScore)
  • Human evaluation
  • LLM-as-judge

Example 5: User asks about context limits

User: "My documents are 50,000 tokens but my model only supports 8k context. What do I do?"

Route to: context-window-management.md

  • Covers summarization, chunking, hierarchical context
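
To make the size of the gap in this example concrete, here is a rough arithmetic sketch. It assumes "8k" means 8,000 tokens and that some tokens may be reserved for instructions and the model's output; the function name and the 1,000-token reserve are hypothetical.

```python
import math


def chunks_needed(doc_tokens: int, context_limit: int, reserved_tokens: int = 0) -> int:
    """Minimum number of chunks so each chunk fits the usable context budget."""
    budget = context_limit - reserved_tokens  # tokens left for document text
    if budget <= 0:
        raise ValueError("reserved_tokens must be smaller than context_limit")
    return math.ceil(doc_tokens / budget)


print(chunks_needed(50_000, 8_000))         # 7
print(chunks_needed(50_000, 8_000, 1_000))  # 8 (assumed 1,000-token reserve)
```

Either way the document does not fit in one call, which is why the request belongs with context-window-management.md.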

Example 6: User asks about speed

User: "My LLM inference is too slow (500ms per request). How can I make it faster?"

Route to: llm-inference-optimization.md

  • Covers quantization, batching, KV cache, speculative decoding

Example 7: User asks about safety

User: "Users are trying to jailbreak my LLM to bypass content filters. How do I prevent this?"

Route to: llm-safety-alignment.md

  • Covers prompt injection prevention, jailbreak detection, guardrails

Multiple Skills May Apply

Sometimes multiple skills are relevant:

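Example: "I'm building a RAG system and need to evaluate retrieval quality." (rag-architecture-patterns.md and llm-evaluation-metrics.md both apply)

Example: "I'm fine-tuning an LLM but context exceeds 4k tokens." (llm-finetuning-strategies.md and context-window-management.md both apply)

Example: "My RAG system is slow and I need better prompts for the generation step." (rag-architecture-patterns.md, llm-inference-optimization.md, and prompt-engineering-patterns.md all apply)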

Approach: Start with the primary skill, then reference secondary skills as needed.
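
One way to picture the primary/secondary split is to rank the relevant skills and take the top one first. The sketch below assumes the caller already has a relevance score per skill; the scores and the function name are hypothetical, and the document does not prescribe how relevance is judged.

```python
def split_primary_secondary(relevance: dict[str, float]) -> tuple[str, list[str]]:
    """Pick the highest-scoring skill as primary; the rest become secondary."""
    if not relevance:
        raise ValueError("no relevant skills to choose from")
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[0], ranked[1:]


# Hypothetical scores for "building a RAG system and evaluating retrieval quality".
primary, secondary = split_primary_secondary({
    "llm-evaluation-metrics.md": 0.6,
    "rag-architecture-patterns.md": 0.9,
})
print(primary)    # rag-architecture-patterns.md
print(secondary)  # ['llm-evaluation-metrics.md']
```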

Common Task Patterns

Pattern 1: Building an LLM application

  1. Start with prompt-engineering-patterns.md (get the prompt right first)
  2. If prompts are insufficient → llm-finetuning-strategies.md (customize the model)
  3. If external knowledge is needed → rag-architecture-patterns.md (add retrieval)
  4. Validate quality → llm-evaluation-metrics.md (measure performance)
  5. Optimize speed → llm-inference-optimization.md (reduce latency)
  6. Add safety → llm-safety-alignment.md (guardrails)
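
The escalation order in Pattern 1 can be written as a short function that skips the conditional steps when they are not needed. The two boolean flags and the name `build_plan` are assumptions for illustration; the order of skills follows the list above.

```python
def build_plan(prompts_sufficient: bool, needs_external_knowledge: bool) -> list[str]:
    """Return the skills to consult, in the order Pattern 1 suggests."""
    plan = ["prompt-engineering-patterns.md"]  # always start with prompts
    if not prompts_sufficient:
        plan.append("llm-finetuning-strategies.md")  # customize the model
    if needs_external_knowledge:
        plan.append("rag-architecture-patterns.md")  # add retrieval
    # Validation, speed, and safety apply to every application.
    plan += [
        "llm-evaluation-metrics.md",
        "llm-inference-optimization.md",
        "llm-safety-alignment.md",
    ]
    return plan
```

For a documentation Q&A app where prompting alone works, `build_plan(True, True)` skips fine-tuning but keeps the retrieval step.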

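Pattern 2: Improving an existing LLM system

  1. Identify the bottleneck (for example output quality, missing knowledge, context length, latency, or safety)
  2. Apply the specialized skill that matches the bottleneck
  3. Measure improvement → llm-evaluation-metrics.md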

Pattern 3: LLM research/experimentation

  1. Design evaluation → llm-evaluation-metrics.md (metrics first!)
  2. Baseline: prompt engineering → prompt-engineering-patterns.md
  3. If insufficient: fine-tuning → llm-finetuning-strategies.md
  4. Compare: RAG vs fine-tuning → rag-architecture-patterns.md and llm-finetuning-strategies.md
  5. Optimize best approach → llm-inference-optimization.md

Quick Reference

Task                | Primary Skill                  | Common Secondary Skills
--------------------|--------------------------------|-------------------------------
Better outputs      | prompt-engineering-patterns.md | llm-evaluation-metrics.md
Customize behavior  | llm-finetuning-strategies.md   | prompt-engineering-patterns.md
External knowledge  | rag-architecture-patterns.md   | context-window-management.md
Quality measurement | llm-evaluation-metrics.md      | -
Long documents      | context-window-management.md   | rag-architecture-patterns.md
Faster inference    | llm-inference-optimization.md  | -
Safety/security     | llm-safety-alignment.md        | prompt-engineering-patterns.md
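
The quick reference can also be held as a lookup table. The sketch below copies the rows of the table verbatim; the names `QUICK_REFERENCE` and `skills_for` are hypothetical.

```python
# task -> (primary skill, common secondary skills), copied from the table.
QUICK_REFERENCE: dict[str, tuple[str, list[str]]] = {
    "Better outputs": ("prompt-engineering-patterns.md", ["llm-evaluation-metrics.md"]),
    "Customize behavior": ("llm-finetuning-strategies.md", ["prompt-engineering-patterns.md"]),
    "External knowledge": ("rag-architecture-patterns.md", ["context-window-management.md"]),
    "Quality measurement": ("llm-evaluation-metrics.md", []),
    "Long documents": ("context-window-management.md", ["rag-architecture-patterns.md"]),
    "Faster inference": ("llm-inference-optimization.md", []),
    "Safety/security": ("llm-safety-alignment.md", ["prompt-engineering-patterns.md"]),
}


def skills_for(task: str) -> list[str]:
    """Return the primary skill followed by any common secondary skills."""
    primary, secondary = QUICK_REFERENCE[task]
    return [primary, *secondary]


print(skills_for("Long documents"))
# ['context-window-management.md', 'rag-architecture-patterns.md']
```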

Default Routing Logic

If the task is unclear, ask clarifying questions:

  1. "What are you trying to achieve with the LLM?" (goal)
  2. "What problem are you facing?" (bottleneck)
  3. "Have you tried prompt engineering?" (start simple)

Then route to the most relevant skill.
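
As a minimal sketch of this fallback, the function below returns the three clarifying questions when no skill has been matched and otherwise routes to the best match. The return shape and the name `next_action` are assumptions made for the example.

```python
# The three questions are quoted from the list above.
CLARIFYING_QUESTIONS = [
    "What are you trying to achieve with the LLM?",  # goal
    "What problem are you facing?",                  # bottleneck
    "Have you tried prompt engineering?",            # start simple
]


def next_action(matched_skills: list[str]) -> dict:
    """Ask for clarification when nothing matched; otherwise route to the best match."""
    if not matched_skills:
        return {"action": "ask", "questions": CLARIFYING_QUESTIONS}
    return {"action": "route", "skill": matched_skills[0]}
```

Here `matched_skills` is assumed to be ordered from most to least relevant.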

Summary

This is a meta-skill that routes to specialized LLM engineering skills.

LLM Specialist Skills Catalog

After routing, load the appropriate specialist skill for detailed guidance:

  1. prompt-engineering-patterns.md - Instruction clarity, few-shot learning, chain-of-thought, system messages, output formatting, prompt optimization
  2. llm-finetuning-strategies.md - Full fine-tuning vs LoRA vs QLoRA, dataset preparation, hyperparameter selection, catastrophic forgetting prevention
  3. rag-architecture-patterns.md - RAG system architecture, retrieval strategies (dense/sparse/hybrid), chunking, re-ranking, context injection
  4. llm-evaluation-metrics.md - Task-specific metrics, human evaluation, LLM-as-judge, benchmarks, A/B testing, quality assurance
  5. context-window-management.md - Context limits (4k-128k tokens), summarization strategies, sliding window, hierarchical context, token counting
  6. llm-inference-optimization.md - Latency reduction, throughput optimization, batching, KV cache, quantization (INT8/INT4), speculative decoding
  7. llm-safety-alignment.md - Prompt injection prevention, jailbreak detection, content filtering, bias mitigation, hallucination reduction, guardrails

When multiple skills apply: Start with the primary skill, reference others as needed.

Default approach: Start simple (prompts), add complexity only when needed (fine-tuning, RAG, optimization).