---
name: using-neural-architectures
description: "Architecture selection router: CNNs, Transformers, RNNs, GANs, GNNs by data modality and constraints"
mode: true
pack: neural-architectures
faction: yzmir
---
# Using Neural Architectures: Architecture Selection Router
<CRITICAL_CONTEXT>
Architecture selection comes BEFORE training optimization. Wrong architecture = no amount of training will fix it.
This meta-skill routes you to the right architecture guidance based on:
- Data modality (images, sequences, graphs, etc.)
- Problem type (classification, generation, regression)
- Constraints (data size, compute, latency, interpretability)
Load this skill when architecture decisions are needed.
</CRITICAL_CONTEXT>
## When to Use This Skill
Use this skill when:
- ✅ Selecting an architecture for a new problem
- ✅ Comparing architecture families (CNN vs Transformer, RNN vs Transformer, etc.)
- ✅ Designing custom network topology
- ✅ Troubleshooting architectural instability (deep networks, gradient issues)
- ✅ Understanding when to use specialized architectures (GNNs, generative models)
DO NOT use for:
- ❌ Training/optimization issues (use training-optimization pack)
- ❌ PyTorch implementation details (use pytorch-engineering pack)
- ❌ Production deployment (use ml-production pack)
**When in doubt:** If choosing WHAT architecture → this skill. If training or deploying an architecture → a different pack.
---
## Core Routing Logic
### Step 1: Identify Data Modality
**Question to ask:** "What type of data are you working with?"
| Data Type | Route To | Why |
|-----------|----------|-----|
| Images (photos, medical scans, etc.) | [cnn-families-and-selection.md](cnn-families-and-selection.md) | CNNs excel at spatial hierarchies |
| Sequences (time series, text, audio) | [sequence-models-comparison.md](sequence-models-comparison.md) | Temporal dependencies need sequential models |
| Graphs (social networks, molecules) | [graph-neural-networks-basics.md](graph-neural-networks-basics.md) | Graph structure requires GNNs |
| Generation task (create images, text) | [generative-model-families.md](generative-model-families.md) | Generative models are specialized |
| Multiple modalities (text + images) | [architecture-design-principles.md](architecture-design-principles.md) | Need custom design |
| Unclear / Generic | [architecture-design-principles.md](architecture-design-principles.md) | Start with fundamentals |
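The table above can be treated as data. A minimal sketch, assuming you script this router (the `MODALITY_ROUTES` dict and `route_by_modality` function are illustrative names, not part of any existing tool):
```python
# Illustrative sketch only: the Step 1 table as a lookup. The skill
# filenames come from the table above; MODALITY_ROUTES and
# route_by_modality are hypothetical names.
MODALITY_ROUTES = {
    "images": "cnn-families-and-selection.md",
    "sequences": "sequence-models-comparison.md",
    "graphs": "graph-neural-networks-basics.md",
    "generation": "generative-model-families.md",
    "multimodal": "architecture-design-principles.md",
}

def route_by_modality(modality: str) -> str:
    """Fall back to design fundamentals when the modality is unclear."""
    return MODALITY_ROUTES.get(modality, "architecture-design-principles.md")
```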
### Step 2: Check for Special Requirements
**If any of these apply, address FIRST:**
| Requirement | Route To | Priority |
|-------------|----------|----------|
| Deep network (> 20 layers) unstable | [normalization-techniques.md](normalization-techniques.md) | CRITICAL - fix before continuing |
| Need attention mechanisms | [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md) | Specialized component |
| Custom architecture design | [architecture-design-principles.md](architecture-design-principles.md) | Foundation before specifics |
| Transformer-specific question | [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) | Specialized architecture |
### Step 3: Consider Problem Characteristics
**Clarify BEFORE routing:**
Ask:
- "How large is your dataset?" (Small < 10k, Medium 10k-1M, Large > 1M)
- "What are your computational constraints?" (Edge device, cloud, GPU availability)
- "What are your latency requirements?" (Real-time, batch, offline)
- "Do you need interpretability?" (Clinical, research, production)
These answers determine architecture appropriateness.
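These answers can be captured as a structured record before routing. A hypothetical sketch (the `ProblemConstraints` dataclass and every field name are invented for illustration):
```python
from dataclasses import dataclass

# Hypothetical sketch: record the Step 3 answers before routing.
# The dataclass and all field names are invented for illustration.
@dataclass
class ProblemConstraints:
    dataset_size: int            # e.g. 8_000 (small), 500_000 (medium)
    compute: str                 # "edge", "cloud", "gpu-cluster"
    latency: str                 # "real-time", "batch", "offline"
    needs_interpretability: bool

constraints = ProblemConstraints(
    dataset_size=8_000, compute="edge",
    latency="real-time", needs_interpretability=True,
)
```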
---
## Routing by Data Modality
### Images → CNN Families
**Symptoms triggering this route:**
- "classify images"
- "object detection"
- "semantic segmentation"
- "medical imaging"
- "computer vision"
**Route to:** See [cnn-families-and-selection.md](cnn-families-and-selection.md) for CNN architecture selection and comparison.
**When to route here:**
- ANY vision task (CNNs are default for spatial data)
- Even if considering Transformers, check CNN families first (often better with less data)
**Clarifying questions:**
- "Dataset size?" (< 10k → Start with proven CNNs, > 100k → Consider ViT)
- "Deployment target?" (Edge → EfficientNet, Cloud → Anything)
- "Task type?" (Classification → ResNet/EfficientNet, Detection → YOLO/Faster-RCNN)
---
### Sequences → Sequence Models Comparison
**Symptoms triggering this route:**
- "time series"
- "forecasting"
- "natural language" (NLP)
- "sequential data"
- "temporal patterns"
- "RNN vs LSTM vs Transformer"
**Route to:** See [sequence-models-comparison.md](sequence-models-comparison.md) for sequential model selection (RNN, LSTM, Transformer, TCN).
**When to route here:**
- ANY sequential data
- When user asks "RNN vs LSTM" (skill will present modern alternatives)
- Time-dependent patterns
**Clarifying questions:**
- "Sequence length?" (< 100 → RNN/LSTM/TCN, 100-1000 → Transformer, > 1000 → Sparse Transformers)
- "Latency requirements?" (Real-time → TCN/LSTM, Offline → Transformer)
- "Data volume?" (Small → Simpler models, Large → Transformers)
**CRITICAL:** Challenge the "RNN vs LSTM" premise if they ask. Modern alternatives (Transformers, TCN) are often better.
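To make the trade-off concrete, a minimal PyTorch sketch instantiating both ends of the spectrum on the same input (shapes and hyperparameters are illustrative, not tuned):
```python
import torch
import torch.nn as nn

# Sketch: both ends of the sequence-model trade-off on the same input.
batch, seq_len, d_model = 8, 128, 64
x = torch.randn(batch, seq_len, d_model)

lstm = nn.LSTM(input_size=d_model, hidden_size=d_model, batch_first=True)
out_lstm, _ = lstm(x)  # sequential in time: cheap per step, hard to parallelize

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=2)
out_tr = transformer(x)  # O(seq_len^2) attention, fully parallel across positions
```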
---
### Graphs → Graph Neural Networks
**Symptoms triggering this route:**
- "social network"
- "molecular structure"
- "knowledge graph"
- "graph data"
- "node classification"
- "link prediction"
- "graph embeddings"
**Route to:** See [graph-neural-networks-basics.md](graph-neural-networks-basics.md) for GNN architectures and graph learning.
**When to route here:**
- Data has explicit graph structure (nodes + edges)
- Relational information is important
- Network topology matters
**Red flag:** If treating graph as tabular data (extracting features and ignoring edges) → WRONG. Route to GNN skill.
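A minimal node-classification sketch, assuming PyTorch Geometric is installed (`GCNConv` and the `(x, edge_index)` convention come from that library; the toy graph is invented):
```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumes PyTorch Geometric is installed

# Minimal sketch: edge_index is exactly the structure a tabular model
# throws away, and exactly what the GNN consumes.
class TwoLayerGCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

x = torch.randn(5, 16)                    # 5 nodes, 16 features each
edge_index = torch.tensor([[0, 1, 2, 3],  # source nodes
                           [1, 2, 3, 4]]) # target nodes
logits = TwoLayerGCN(16, 32, 3)(x, edge_index)  # per-node class logits
```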
---
### Generation → Generative Model Families
**Symptoms triggering this route:**
- "generate images"
- "synthesize data"
- "GAN vs VAE vs Diffusion"
- "image-to-image translation"
- "style transfer"
- "generative modeling"
**Route to:** See [generative-model-families.md](generative-model-families.md) for GANs, VAEs, and Diffusion models.
**When to route here:**
- Goal is to CREATE data, not classify/predict
- Need to sample from distribution
- Data augmentation through generation
**Clarifying questions:**
- "Use case?" (Real-time game → GAN, Art/research → Diffusion, Fast training → VAE)
- "Quality vs speed?" (Quality → Diffusion, Speed → GAN)
- "Controllability?" (Fine control → StyleGAN/Conditional models)
**CRITICAL:** Different generative models have VERY different trade-offs. Must clarify requirements.
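The quality-vs-speed row is mostly about sampling cost. A schematic sketch (both networks are stand-in MLPs, and the denoising update is a caricature, not a real diffusion sampler):
```python
import torch
import torch.nn as nn

# Schematic cost comparison only: stand-in MLPs, caricature updates.
z_dim, img_dim, steps = 64, 784, 50
generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))
denoiser = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))

gan_sample = generator(torch.randn(1, z_dim))  # GAN: one network call per sample

x = torch.randn(1, img_dim)                    # Diffusion: `steps` network calls
for _ in range(steps):
    x = x - denoiser(x)
```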
---
## Routing by Architecture Component
### Attention Mechanisms
**Symptoms triggering this route:**
- "when to use attention"
- "self-attention vs cross-attention"
- "attention in CNNs"
- "attention bottleneck"
- "multi-head attention"
**Route to:** See [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md) for attention mechanism selection and design.
**When to route here:**
- Designing custom architecture that might benefit from attention
- Understanding where attention helps vs hinders
- Comparing attention variants
**NOT for:** General Transformer questions → [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) instead
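The common core of these variants is scaled dot-product attention. A minimal self-contained sketch (for cross-attention, `q` would come from one sequence and `k`, `v` from another):
```python
import math
import torch

# Minimal sketch of scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, q_len, k_len)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 10, 64)  # self-attention: q, k, v from one sequence
out = scaled_dot_product_attention(q, k, v)  # (2, 10, 64)
```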
---
### Transformer Deep Dive
**Symptoms triggering this route:**
- "how do transformers work"
- "Vision Transformer (ViT)"
- "BERT architecture"
- "positional encoding"
- "transformer blocks"
- "scaling transformers"
**Route to:** See [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) for Transformer internals and implementation.
**When to route here:**
- Implementing/customizing transformers
- Understanding transformer internals
- Debugging transformer-specific issues
**Cross-reference:**
- For sequence models generally → [sequence-models-comparison.md](sequence-models-comparison.md) (includes transformers in context)
- For LLMs specifically → `yzmir/llm-specialist/transformer-for-llms` (LLM-specific transformers)
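One internals topic this route covers is positional encoding. A sketch of the standard sinusoidal form from "Attention Is All You Need" (assumes an even `d_model`):
```python
import math
import torch

# Sketch of sinusoidal positional encoding: PE(pos, 2i) = sin(pos / 10000^(2i/d)),
# PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). Assumes an even d_model.
def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)  # added to embeddings
```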
---
### Normalization Techniques
**Symptoms triggering this route:**
- "gradient explosion"
- "training instability in deep network"
- "BatchNorm vs LayerNorm"
- "normalization layers"
- "50+ layer network won't train"
**Route to:** See [normalization-techniques.md](normalization-techniques.md) for deep network stability and normalization methods.
**When to route here:**
- Deep networks (> 20 layers) with training instability
- Choosing between normalization methods
- Architectural stability issues
**CRITICAL:** This is often the ROOT CAUSE of "training won't work" - fix architecture before blaming hyperparameters.
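The practical BatchNorm-vs-LayerNorm difference is which axes the statistics are computed over. A minimal sketch (shapes are illustrative):
```python
import torch
import torch.nn as nn

# Sketch: BatchNorm normalizes each channel across the batch (statistics
# depend on batch size); LayerNorm normalizes each sample across its
# features (batch-size independent, the Transformer default).
images = torch.randn(8, 32, 16, 16)     # (batch, channels, H, W)
tokens = torch.randn(8, 128, 64)        # (batch, seq_len, d_model)

bn = nn.BatchNorm2d(num_features=32)    # stats over batch, H, W per channel
ln = nn.LayerNorm(normalized_shape=64)  # stats over d_model per token

out_images = bn(images)
out_tokens = ln(tokens)
```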
---
### Architecture Design Principles
**Symptoms triggering this route:**
- "how to design architecture"
- "architecture best practices"
- "when to use skip connections"
- "how deep should network be"
- "custom architecture for [novel task]"
- Unclear problem modality
**Route to:** See [architecture-design-principles.md](architecture-design-principles.md) for custom architecture design fundamentals.
**When to route here:**
- Designing custom architectures
- Novel problems without established architecture
- Understanding WHY architectures work
- User is unsure what modality/problem type they have
**This is the foundational skill** - route here if other specific skills don't match.
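One design principle worth a concrete picture: skip connections. A minimal residual-block sketch (the two-layer MLP body is illustrative; real blocks use convolutions or attention):
```python
import torch
import torch.nn as nn

# Sketch: a block that learns a residual F(x) and outputs x + F(x),
# keeping gradients flowing through deep stacks.
class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)  # identity path + learned residual

deep = nn.Sequential(*[ResidualBlock(64) for _ in range(30)])  # 30 blocks, still trainable
out = deep(torch.randn(4, 64))
```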
---
## Multi-Modal / Cross-Pack Routing
### When Problem Spans Multiple Modalities
**Example:** "Text + image classification" (multimodal)
**Route to BOTH:**
1. [sequence-models-comparison.md](sequence-models-comparison.md) (for text)
2. [cnn-families-and-selection.md](cnn-families-and-selection.md) (for images)
3. [architecture-design-principles.md](architecture-design-principles.md) (for fusion strategy)
**Order matters:** Understand individual modalities BEFORE fusion.
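A hypothetical late-fusion sketch of that order (the feature tensors stand in for outputs of the CNN and sequence-model skills above):
```python
import torch
import torch.nn as nn

# Hypothetical late-fusion sketch: encode each modality separately, then
# concatenate embeddings for a joint head.
image_feat = torch.randn(4, 512)  # e.g. pooled CNN features
text_feat = torch.randn(4, 256)   # e.g. pooled Transformer features

fusion_head = nn.Sequential(nn.Linear(512 + 256, 128), nn.ReLU(), nn.Linear(128, 10))
logits = fusion_head(torch.cat([image_feat, text_feat], dim=-1))
```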
### When Architecture + Other Concerns
**Example:** "Select architecture AND optimize training"
**Route order:**
1. Architecture skill FIRST (this pack)
2. Training-optimization SECOND (after architecture chosen)
**Why:** Wrong architecture can't be fixed by better training.
**Example:** "Select architecture AND deploy efficiently"
**Route order:**
1. Architecture skill FIRST
2. ML-production SECOND (quantization, serving)
**Deployment constraints might influence architecture choice** - if so, note constraints during architecture selection.
---
## Common Routing Mistakes (DON'T DO THESE)
| Symptom | Wrong Route | Correct Route | Why |
|---------|-------------|---------------|-----|
| "My transformer won't train" | [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) | training-optimization | Training issue, not architecture understanding |
| "Deploy image classifier" | [cnn-families-and-selection.md](cnn-families-and-selection.md) | ml-production | Deployment, not selection |
| "ViT vs ResNet for medical imaging" | [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) | [cnn-families-and-selection.md](cnn-families-and-selection.md) | Comparative selection, not single architecture detail |
| "Implement BatchNorm in PyTorch" | [normalization-techniques.md](normalization-techniques.md) | pytorch-engineering | Implementation, not architecture concept |
| "GAN won't converge" | [generative-model-families.md](generative-model-families.md) | training-optimization | Training stability, not architecture selection |
| "Which optimizer for CNN" | [cnn-families-and-selection.md](cnn-families-and-selection.md) | training-optimization | Optimization, not architecture |
**Rule:** Architecture pack is for CHOOSING and DESIGNING architectures. Training/deployment/implementation are other packs.
---
## Red Flags: Stop and Clarify
If query contains these patterns, ASK clarifying questions before routing:
| Pattern | Why Clarify | What to Ask |
|---------|-------------|--------------|
| "Best architecture for X" | "Best" depends on constraints | "What are your data size, compute, and latency constraints?" |
| Generic problem description | Can't route without modality | "What type of data? (images, sequences, graphs, etc.)" |
| Latest trend mentioned (ViT, Diffusion) | Recency bias risk | "Have you considered alternatives? What are your specific requirements?" |
| "Should I use X or Y" | May be wrong question | "What's the underlying problem? There might be option Z." |
| Very deep network (> 50 layers) | Likely needs normalization first | "Are you using normalization layers? Skip connections?" |
**Never guess modality or constraints. Always clarify.**
---
## Recency Bias: Resistance Table
| Trendy Architecture | When NOT to Use | Better Alternative |
|---------------------|------------------|-------------------|
| **Vision Transformers (ViT)** | Small datasets (< 10k images) | CNNs (ResNet, EfficientNet) |
| **Vision Transformers (ViT)** | Edge deployment (latency/power) | EfficientNets, MobileNets |
| **Transformers (general)** | Very small datasets | RNNs, CNNs (lower capacity, less overfitting) |
| **Diffusion Models** | Real-time generation needed | GANs (1 forward pass vs 50-1000 steps) |
| **Diffusion Models** | Limited compute for training | VAEs (faster training) |
| **Graph Transformers** | Small graphs (< 100 nodes) | Standard GNNs (GCN, GAT); simpler and effective at this scale |
| **LLMs (GPT-style)** | < 1M tokens of training data | Simpler language models or fine-tuning |
**Counter-narrative:** "New architecture ≠ better for your use case. Match architecture to constraints."
---
## Decision Tree
```
Start here: What's your primary goal?
┌─ SELECT architecture for task
│ ├─ Data modality?
│ │ ├─ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
│ │ ├─ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
│ │ ├─ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
│ │ ├─ Generation → [generative-model-families.md](generative-model-families.md)
│ │ └─ Unknown/Multiple → [architecture-design-principles.md](architecture-design-principles.md)
│ └─ Special requirements?
│ ├─ Deep network (>20 layers) unstable → [normalization-techniques.md](normalization-techniques.md) (CRITICAL)
│ ├─ Need attention mechanism → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│ └─ None → Proceed with modality-based route
├─ UNDERSTAND specific architecture
│ ├─ Transformers → [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)
│ ├─ Attention → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│ ├─ Normalization → [normalization-techniques.md](normalization-techniques.md)
│ └─ General principles → [architecture-design-principles.md](architecture-design-principles.md)
├─ DESIGN custom architecture
│ └─ [architecture-design-principles.md](architecture-design-principles.md) (start here always)
└─ COMPARE architectures
├─ CNNs (ResNet vs EfficientNet) → [cnn-families-and-selection.md](cnn-families-and-selection.md)
├─ Sequence models (RNN vs Transformer) → [sequence-models-comparison.md](sequence-models-comparison.md)
├─ Generative (GAN vs Diffusion) → [generative-model-families.md](generative-model-families.md)
└─ General comparison → [architecture-design-principles.md](architecture-design-principles.md)
```
---
## Workflow
**Standard Architecture Selection Workflow:**
```
1. Clarify Problem
☐ What data modality? (images, sequences, graphs, etc.)
☐ What's the task? (classification, generation, regression, etc.)
☐ Dataset size?
☐ Computational constraints?
☐ Latency requirements?
☐ Interpretability needs?
2. Route Based on Modality
☐ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
☐ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
☐ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
☐ Generation → [generative-model-families.md](generative-model-families.md)
☐ Custom/Unclear → [architecture-design-principles.md](architecture-design-principles.md)
3. Check for Critical Issues
☐ Deep network unstable? → [normalization-techniques.md](normalization-techniques.md) FIRST
☐ Need specialized component? → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md) or [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)
4. Apply Architecture Skill
☐ Follow guidance from routed skill
☐ Consider trade-offs (accuracy vs speed vs data requirements)
5. Cross-Pack if Needed
☐ Architecture chosen → training-optimization (for training)
☐ Architecture chosen → ml-production (for deployment)
```
---
## Rationalization Table
| Rationalization | Reality | Counter |
|-----------------|---------|---------|
| "Transformers are SOTA, recommend them" | SOTA on benchmark ≠ best for user's constraints | "Ask about dataset size and compute first" |
| "User said RNN vs LSTM, answer that" | Question premise might be outdated | "Challenge: Have you considered Transformers or TCN?" |
| "Just recommend latest architecture" | Latest ≠ appropriate | "Match architecture to requirements, not trends" |
| "Architecture doesn't matter, training matters" | Wrong architecture can't be fixed by training | "Architecture is foundation - get it right first" |
| "They seem rushed, skip clarification" | Wrong route wastes more time than clarification | "30 seconds to clarify saves hours of wasted effort" |
| "Generic architecture advice is safe" | Generic = useless for specific domains | "Route to domain-specific skill for actionable guidance" |
---
## Integration with Other Packs
### After Architecture Selection
Once architecture is chosen, route to:
**Training the architecture:**
`yzmir/training-optimization/using-training-optimization`
- Optimizer selection
- Learning rate schedules
- Debugging training issues
**Implementing in PyTorch:**
`yzmir/pytorch-engineering/using-pytorch-engineering`
- Module design patterns
- Performance optimization
- Custom components
**Deploying to production:**
`yzmir/ml-production/using-ml-production`
- Model serving
- Quantization
- Inference optimization
### Before Architecture Selection
If problem involves:
**Reinforcement learning:**
`yzmir/deep-rl/using-deep-rl` FIRST
- RL algorithms dictate architecture requirements
- Value networks vs policy networks have different needs
**Large language models:**
`yzmir/llm-specialist/using-llm-specialist` FIRST
- LLM architectures are specialized transformers
- Different considerations than general sequence models
**Architecture is downstream of algorithm choice in RL and LLMs.**
---
## Summary
**Use this meta-skill to:**
- ✅ Route architecture queries to appropriate specialized skill
- ✅ Identify data modality and problem type
- ✅ Clarify constraints before recommending
- ✅ Resist recency bias (latest ≠ best)
- ✅ Recognize when architecture is the problem (vs training/implementation)
## Neural Architecture Specialist Skills
After routing, load the appropriate specialist skill for detailed guidance:
1. [architecture-design-principles.md](architecture-design-principles.md) - Custom design, architectural best practices, skip connections, network depth fundamentals
2. [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md) - Self-attention, cross-attention, multi-head attention, attention in CNNs, attention variants comparison
3. [cnn-families-and-selection.md](cnn-families-and-selection.md) - ResNet, EfficientNet, MobileNet, YOLO, computer vision architecture selection
4. [generative-model-families.md](generative-model-families.md) - GANs, VAEs, Diffusion models, image generation, style transfer, generative modeling trade-offs
5. [graph-neural-networks-basics.md](graph-neural-networks-basics.md) - GCN, GAT, node classification, link prediction, graph embeddings, molecular structures
6. [normalization-techniques.md](normalization-techniques.md) - BatchNorm, LayerNorm, GroupNorm, training stability for deep networks (>20 layers)
7. [sequence-models-comparison.md](sequence-models-comparison.md) - RNN, LSTM, Transformer, TCN comparison, time series, NLP, sequential data
8. [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md) - Transformer internals, ViT, BERT, positional encoding, scaling transformers
**Critical principle:** Architecture comes BEFORE training. Get this right first.
---
**END OF SKILL**