---
name: using-neural-architectures
description: "Architecture selection router: CNNs, Transformers, RNNs, GANs, GNNs by data modality and constraints"
mode: true
pack: neural-architectures
faction: yzmir
---

# Using Neural Architectures: Architecture Selection Router

<CRITICAL_CONTEXT> Architecture selection comes BEFORE training optimization. Wrong architecture = no amount of training will fix it.

This meta-skill routes you to the right architecture guidance based on:

  • Data modality (images, sequences, graphs, etc.)
  • Problem type (classification, generation, regression)
  • Constraints (data size, compute, latency, interpretability)

Load this skill when architecture decisions are needed. </CRITICAL_CONTEXT>

## When to Use This Skill

Use this skill when:

  • Selecting an architecture for a new problem
  • Comparing architecture families (CNN vs Transformer, RNN vs Transformer, etc.)
  • Designing custom network topology
  • Troubleshooting architectural instability (deep networks, gradient issues)
  • Understanding when to use specialized architectures (GNNs, generative models)

DO NOT use for:

  • Training/optimization issues (use training-optimization pack)
  • PyTorch implementation details (use pytorch-engineering pack)
  • Production deployment (use ml-production pack)

When in doubt: If choosing WHAT architecture → this skill. If training/deploying architecture → different pack.


## Core Routing Logic

### Step 1: Identify Data Modality

Question to ask: "What type of data are you working with?"

| Data Type | Route To | Why |
|---|---|---|
| Images (photos, medical scans, etc.) | cnn-families-and-selection.md | CNNs excel at spatial hierarchies |
| Sequences (time series, text, audio) | sequence-models-comparison.md | Temporal dependencies need sequential models |
| Graphs (social networks, molecules) | graph-neural-networks-basics.md | Graph structure requires GNNs |
| Generation task (create images, text) | generative-model-families.md | Generative models are specialized |
| Multiple modalities (text + images) | architecture-design-principles.md | Need custom design |
| Unclear / Generic | architecture-design-principles.md | Start with fundamentals |
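
This routing table can also be expressed as a simple lookup. A minimal Python sketch (the `route` helper and its modality keys are illustrative, not part of the skill pack):

```python
# Minimal sketch: the Step 1 routing table as data, using the skill files listed above.
MODALITY_ROUTES = {
    "images": "cnn-families-and-selection.md",
    "sequences": "sequence-models-comparison.md",
    "graphs": "graph-neural-networks-basics.md",
    "generation": "generative-model-families.md",
    "multimodal": "architecture-design-principles.md",
}

def route(modality: str) -> str:
    """Return the skill file for a data modality; unclear/generic falls back to fundamentals."""
    return MODALITY_ROUTES.get(modality.lower(), "architecture-design-principles.md")

print(route("images"))   # cnn-families-and-selection.md
print(route("tabular"))  # architecture-design-principles.md (fallback)
```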

### Step 2: Check for Special Requirements

If any of these apply, address FIRST:

| Requirement | Route To | Priority |
|---|---|---|
| Deep network (> 20 layers) unstable | normalization-techniques.md | CRITICAL - fix before continuing |
| Need attention mechanisms | attention-mechanisms-catalog.md | Specialized component |
| Custom architecture design | architecture-design-principles.md | Foundation before specifics |
| Transformer-specific question | transformer-architecture-deepdive.md | Specialized architecture |

### Step 3: Consider Problem Characteristics

Clarify BEFORE routing. Ask:

  • "How large is your dataset?" (Small < 10k, Medium 10k-1M, Large > 1M)
  • "What are your computational constraints?" (Edge device, cloud, GPU availability)
  • "What are your latency requirements?" (Real-time, batch, offline)
  • "Do you need interpretability?" (Clinical, research, production)

These answers determine architecture appropriateness.
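
It can help to capture these answers explicitly before routing. A hypothetical sketch (the `ProblemConstraints` class is illustrative; the size buckets mirror the thresholds above):

```python
from dataclasses import dataclass

@dataclass
class ProblemConstraints:
    dataset_size: int            # number of examples
    deployment: str              # "edge" | "cloud"
    latency: str                 # "real-time" | "batch" | "offline"
    needs_interpretability: bool

    @property
    def size_bucket(self) -> str:
        # Small < 10k, Medium 10k-1M, Large > 1M (from Step 3 above)
        if self.dataset_size < 10_000:
            return "small"
        return "medium" if self.dataset_size <= 1_000_000 else "large"

c = ProblemConstraints(8_000, "edge", "real-time", False)
print(c.size_bucket)  # "small" -> favors proven, compact architectures
```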


## Routing by Data Modality

### Images → CNN Families

Symptoms triggering this route:

  • "classify images"
  • "object detection"
  • "semantic segmentation"
  • "medical imaging"
  • "computer vision"

Route to: See cnn-families-and-selection.md for CNN architecture selection and comparison.

When to route here:

  • ANY vision task (CNNs are default for spatial data)
  • Even if considering Transformers, check CNN families first (often better with less data)

Clarifying questions:

  • "Dataset size?" (< 10k → Start with proven CNNs, > 100k → Consider ViT)
  • "Deployment target?" (Edge → EfficientNet, Cloud → Anything)
  • "Task type?" (Classification → ResNet/EfficientNet, Detection → YOLO/Faster-RCNN)

### Sequences → Sequence Models Comparison

Symptoms triggering this route:

  • "time series"
  • "forecasting"
  • "natural language" (NLP)
  • "sequential data"
  • "temporal patterns"
  • "RNN vs LSTM vs Transformer"

Route to: See sequence-models-comparison.md for sequential model selection (RNN, LSTM, Transformer, TCN).

When to route here:

  • ANY sequential data
  • When user asks "RNN vs LSTM" (skill will present modern alternatives)
  • Time-dependent patterns

Clarifying questions:

  • "Sequence length?" (< 100 → RNN/LSTM/TCN, 100-1000 → Transformer, > 1000 → Sparse Transformers)
  • "Latency requirements?" (Real-time → TCN/LSTM, Offline → Transformer)
  • "Data volume?" (Small → Simpler models, Large → Transformers)

CRITICAL: Challenge "RNN vs LSTM" premise if they ask. Modern alternatives (Transformers, TCN) often better.
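
For intuition on the trade-off behind these questions, a minimal PyTorch sketch contrasting the two main encoder families on the same batch (all dimensions are arbitrary illustrative numbers):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 50, 32)  # (batch=4, seq_len=50, features=32)

# Recurrent encoder: processes the sequence step by step (O(seq_len) sequential work).
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
lstm_out, _ = lstm(x)        # (4, 50, 64)

# Transformer encoder: attends over all positions at once (O(seq_len^2) attention cost).
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=2)
tr_out = transformer(x)      # (4, 50, 32)
```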


### Graphs → Graph Neural Networks

Symptoms triggering this route:

  • "social network"
  • "molecular structure"
  • "knowledge graph"
  • "graph data"
  • "node classification"
  • "link prediction"
  • "graph embeddings"

Route to: See graph-neural-networks-basics.md for GNN architectures and graph learning.

When to route here:

  • Data has explicit graph structure (nodes + edges)
  • Relational information is important
  • Network topology matters

Red flag: If treating graph as tabular data (extracting features and ignoring edges) → WRONG. Route to GNN skill.
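
To see what "using the edges" means concretely, a minimal sketch of a two-layer GCN for node classification, assuming PyTorch Geometric is installed (the toy graph and class count are illustrative):

```python
import torch
from torch_geometric.nn import GCNConv  # requires torch_geometric

# 4 nodes with 16 features each; edges are exactly what a tabular model would throw away.
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2, 3],   # source nodes
                           [1, 0, 3, 2]])  # target nodes

conv1 = GCNConv(16, 32)
conv2 = GCNConv(32, 3)                     # 3 hypothetical node classes

h = conv1(x, edge_index).relu()            # aggregates each node's neighborhood
logits = conv2(h, edge_index)              # (4, 3) per-node class scores
```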


### Generation → Generative Model Families

Symptoms triggering this route:

  • "generate images"
  • "synthesize data"
  • "GAN vs VAE vs Diffusion"
  • "image-to-image translation"
  • "style transfer"
  • "generative modeling"

Route to: See generative-model-families.md for GANs, VAEs, and Diffusion models.

When to route here:

  • Goal is to CREATE data, not classify/predict
  • Need to sample from distribution
  • Data augmentation through generation

Clarifying questions:

  • "Use case?" (Real-time game → GAN, Art/research → Diffusion, Fast training → VAE)
  • "Quality vs speed?" (Quality → Diffusion, Speed → GAN)
  • "Controllability?" (Fine control → StyleGAN/Conditional models)

CRITICAL: Different generative models have VERY different trade-offs. Must clarify requirements.
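
The sampling-cost contrast is the core trade-off. An illustrative toy sketch (stand-in modules, not real generative models): a GAN generator samples in one forward pass, while a diffusion model iterates many denoising steps:

```python
import torch
import torch.nn as nn

# Toy stand-ins, dimensions arbitrary (64-dim latent, 784-dim "image").
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
denoiser = nn.Linear(784, 784)

# GAN-style sampling: one forward pass.
z = torch.randn(1, 64)
gan_sample = generator(z)

# Diffusion-style sampling: iterative refinement (50-1000 steps is typical).
x = torch.randn(1, 784)
for _ in range(50):
    x = x - 0.01 * denoiser(x)  # stand-in for a real denoising update, illustrative only
```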


## Routing by Architecture Component

### Attention Mechanisms

Symptoms triggering this route:

  • "when to use attention"
  • "self-attention vs cross-attention"
  • "attention in CNNs"
  • "attention bottleneck"
  • "multi-head attention"

Route to: See attention-mechanisms-catalog.md for attention mechanism selection and design.

When to route here:

  • Designing custom architecture that might benefit from attention
  • Understanding where attention helps vs hinders
  • Comparing attention variants

NOT for: General Transformer questions → transformer-architecture-deepdive.md instead
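
For orientation, PyTorch ships a built-in multi-head attention module. A minimal sketch showing the self- vs cross-attention distinction (dimensions are arbitrary):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

x = torch.randn(2, 10, 32)              # (batch, seq_len, embed_dim)
self_out, self_weights = attn(x, x, x)  # self-attention: Q, K, V from the same sequence

ctx = torch.randn(2, 20, 32)            # a second sequence to attend over
cross_out, _ = attn(x, ctx, ctx)        # cross-attention: query differs from key/value
```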


### Transformer Deep Dive

Symptoms triggering this route:

  • "how do transformers work"
  • "Vision Transformer (ViT)"
  • "BERT architecture"
  • "positional encoding"
  • "transformer blocks"
  • "scaling transformers"

Route to: See transformer-architecture-deepdive.md for Transformer internals and implementation.

When to route here:

  • Implementing/customizing transformers
  • Understanding transformer internals
  • Debugging transformer-specific issues

Cross-reference:

  • For sequence models generally → sequence-models-comparison.md (includes transformers in context)
  • For LLMs specifically → yzmir/llm-specialist/transformer-for-llms (LLM-specific transformers)
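
As an example of a transformer internal covered there, a minimal sketch of the standard sinusoidal positional encoding (the classic formulation; dimensions are illustrative):

```python
import math
import torch

# PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe                                      # added to token embeddings before the first block

pe = sinusoidal_positional_encoding(max_len=512, d_model=64)  # (512, 64)
```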

### Normalization Techniques

Symptoms triggering this route:

  • "gradient explosion"
  • "training instability in deep network"
  • "BatchNorm vs LayerNorm"
  • "normalization layers"
  • "50+ layer network won't train"

Route to: See normalization-techniques.md for deep network stability and normalization methods.

When to route here:

  • Deep networks (> 20 layers) with training instability
  • Choosing between normalization methods
  • Architectural stability issues

CRITICAL: This is often the ROOT CAUSE of "training won't work" - fix architecture before blaming hyperparameters.
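
A minimal PyTorch sketch of the two most common choices and the shapes they operate on (the rule of thumb in the comment is a common default, not a universal rule):

```python
import torch
import torch.nn as nn

images = torch.randn(8, 16, 32, 32)  # (batch, channels, H, W)
bn = nn.BatchNorm2d(16)              # normalizes each channel across the batch
print(bn(images).shape)              # torch.Size([8, 16, 32, 32])

tokens = torch.randn(8, 50, 64)      # (batch, seq_len, embed_dim)
ln = nn.LayerNorm(64)                # normalizes each token's features independently of the batch
print(ln(tokens).shape)              # torch.Size([8, 50, 64])

# Rule of thumb: BatchNorm for CNNs, LayerNorm for Transformers / small or variable batches.
```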


### Architecture Design Principles

Symptoms triggering this route:

  • "how to design architecture"
  • "architecture best practices"
  • "when to use skip connections"
  • "how deep should network be"
  • "custom architecture for [novel task]"
  • Unclear problem modality

Route to: See architecture-design-principles.md for custom architecture design fundamentals.

When to route here:

  • Designing custom architectures
  • Novel problems without established architecture
  • Understanding WHY architectures work
  • User is unsure what modality/problem type they have

This is the foundational skill - route here if other specific skills don't match.
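
Skip connections are the canonical example of such a principle. A minimal sketch of a residual block (toy linear body; real blocks typically use convolutions plus normalization):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The block learns a residual F(x); the identity path keeps gradients flowing in deep stacks."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # identity shortcut + learned residual

x = torch.randn(4, 64)
print(ResidualBlock(64)(x).shape)  # torch.Size([4, 64])
```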


## Multi-Modal / Cross-Pack Routing

### When Problem Spans Multiple Modalities

Example: "Text + image classification" (multimodal)

Route to all three, in order:

  1. sequence-models-comparison.md (for text)
  2. cnn-families-and-selection.md (for images)
  3. architecture-design-principles.md (for fusion strategy)

Order matters: Understand individual modalities BEFORE fusion.
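
A minimal late-fusion sketch (toy encoders, arbitrary dimensions): encode each modality separately, then concatenate before the task head:

```python
import torch
import torch.nn as nn

# Toy per-modality encoders; real systems would use a CNN and a sequence model here.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
text_encoder = nn.Linear(300, 128)           # assumes pooled 300-dim text embeddings
fusion_head = nn.Linear(128 + 128, 2)        # 2 hypothetical classes

img = torch.randn(4, 3, 32, 32)
txt = torch.randn(4, 300)
fused = torch.cat([image_encoder(img), text_encoder(txt)], dim=-1)
logits = fusion_head(fused)                  # (4, 2)
```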

### When Architecture + Other Concerns

Example: "Select architecture AND optimize training"

Route order:

  1. Architecture skill FIRST (this pack)
  2. Training-optimization SECOND (after architecture chosen)

Why: Wrong architecture can't be fixed by better training.

Example: "Select architecture AND deploy efficiently"

Route order:

  1. Architecture skill FIRST
  2. ML-production SECOND (quantization, serving)

Deployment constraints might influence architecture choice - if so, note constraints during architecture selection.


## Common Routing Mistakes (DON'T DO THESE)

| Symptom | Wrong Route | Correct Route | Why |
|---|---|---|---|
| "My transformer won't train" | transformer-architecture-deepdive.md | training-optimization | Training issue, not architecture understanding |
| "Deploy image classifier" | cnn-families-and-selection.md | ml-production | Deployment, not selection |
| "ViT vs ResNet for medical imaging" | transformer-architecture-deepdive.md | cnn-families-and-selection.md | Comparative selection, not single-architecture detail |
| "Implement BatchNorm in PyTorch" | normalization-techniques.md | pytorch-engineering | Implementation, not architecture concept |
| "GAN won't converge" | generative-model-families.md | training-optimization | Training stability, not architecture selection |
| "Which optimizer for CNN" | cnn-families-and-selection.md | training-optimization | Optimization, not architecture |

Rule: Architecture pack is for CHOOSING and DESIGNING architectures. Training/deployment/implementation are other packs.


## Red Flags: Stop and Clarify

If query contains these patterns, ASK clarifying questions before routing:

| Pattern | Why Clarify | What to Ask |
|---|---|---|
| "Best architecture for X" | "Best" depends on constraints | "What are your data size, compute, and latency constraints?" |
| Generic problem description | Can't route without modality | "What type of data? (images, sequences, graphs, etc.)" |
| Latest trend mentioned (ViT, Diffusion) | Recency-bias risk | "Have you considered alternatives? What are your specific requirements?" |
| "Should I use X or Y" | May be the wrong question | "What's the underlying problem? There might be option Z." |
| Very deep network (> 50 layers) | Likely needs normalization first | "Are you using normalization layers? Skip connections?" |

Never guess modality or constraints. Always clarify.


## Recency Bias: Resistance Table

| Trendy Architecture | When NOT to Use | Better Alternative |
|---|---|---|
| Vision Transformers (ViT) | Small datasets (< 10k images) | CNNs (ResNet, EfficientNet) |
| Vision Transformers (ViT) | Edge deployment (latency/power) | EfficientNets, MobileNets |
| Transformers (general) | Very small datasets | RNNs, CNNs (less capacity, less overfitting) |
| Diffusion Models | Real-time generation needed | GANs (1 forward pass vs 50-1000 steps) |
| Diffusion Models | Limited compute for training | VAEs (faster training) |
| Graph Transformers | Small graphs (< 100 nodes) | Standard GNNs (GCN, GAT): simpler and effective |
| LLMs (GPT-style) | < 1M tokens of training data | Simpler language models or fine-tuning |

Counter-narrative: "New architecture ≠ better for your use case. Match architecture to constraints."


## Decision Tree

Start here: What's your primary goal?

┌─ SELECT architecture for task
│  ├─ Data modality?
│  │  ├─ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
│  │  ├─ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
│  │  ├─ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
│  │  ├─ Generation → [generative-model-families.md](generative-model-families.md)
│  │  └─ Unknown/Multiple → [architecture-design-principles.md](architecture-design-principles.md)
│  └─ Special requirements?
│     ├─ Deep network (>20 layers) unstable → [normalization-techniques.md](normalization-techniques.md) (CRITICAL)
│     ├─ Need attention mechanism → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│     └─ None → Proceed with modality-based route
│
├─ UNDERSTAND specific architecture
│  ├─ Transformers → [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)
│  ├─ Attention → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│  ├─ Normalization → [normalization-techniques.md](normalization-techniques.md)
│  └─ General principles → [architecture-design-principles.md](architecture-design-principles.md)
│
├─ DESIGN custom architecture
│  └─ [architecture-design-principles.md](architecture-design-principles.md) (start here always)
│
└─ COMPARE architectures
   ├─ CNNs (ResNet vs EfficientNet) → [cnn-families-and-selection.md](cnn-families-and-selection.md)
   ├─ Sequence models (RNN vs Transformer) → [sequence-models-comparison.md](sequence-models-comparison.md)
   ├─ Generative (GAN vs Diffusion) → [generative-model-families.md](generative-model-families.md)
   └─ General comparison → [architecture-design-principles.md](architecture-design-principles.md)

## Workflow

Standard Architecture Selection Workflow:

1. Clarify Problem
   ☐ What data modality? (images, sequences, graphs, etc.)
   ☐ What's the task? (classification, generation, regression, etc.)
   ☐ Dataset size?
   ☐ Computational constraints?
   ☐ Latency requirements?
   ☐ Interpretability needs?

2. Route Based on Modality
   ☐ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
   ☐ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
   ☐ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
   ☐ Generation → [generative-model-families.md](generative-model-families.md)
   ☐ Custom/Unclear → [architecture-design-principles.md](architecture-design-principles.md)

3. Check for Critical Issues
   ☐ Deep network unstable? → [normalization-techniques.md](normalization-techniques.md) FIRST
   ☐ Need specialized component? → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md) or [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)

4. Apply Architecture Skill
   ☐ Follow guidance from routed skill
   ☐ Consider trade-offs (accuracy vs speed vs data requirements)

5. Cross-Pack if Needed
   ☐ Architecture chosen → training-optimization (for training)
   ☐ Architecture chosen → ml-production (for deployment)

## Rationalization Table

| Rationalization | Reality | Counter |
|---|---|---|
| "Transformers are SOTA, recommend them" | SOTA on benchmark ≠ best for user's constraints | "Ask about dataset size and compute first" |
| "User said RNN vs LSTM, answer that" | Question premise might be outdated | "Challenge: Have you considered Transformers or TCN?" |
| "Just recommend latest architecture" | Latest ≠ appropriate | "Match architecture to requirements, not trends" |
| "Architecture doesn't matter, training matters" | Wrong architecture can't be fixed by training | "Architecture is foundation - get it right first" |
| "They seem rushed, skip clarification" | Wrong route wastes more time than clarification | "30 seconds to clarify saves hours of wasted effort" |
| "Generic architecture advice is safe" | Generic = useless for specific domains | "Route to domain-specific skill for actionable guidance" |

## Integration with Other Packs

### After Architecture Selection

Once architecture is chosen, route to:

Training the architecture: yzmir/training-optimization/using-training-optimization

  • Optimizer selection
  • Learning rate schedules
  • Debugging training issues

Implementing in PyTorch: yzmir/pytorch-engineering/using-pytorch-engineering

  • Module design patterns
  • Performance optimization
  • Custom components

Deploying to production: yzmir/ml-production/using-ml-production

  • Model serving
  • Quantization
  • Inference optimization

### Before Architecture Selection

If problem involves:

Reinforcement learning: yzmir/deep-rl/using-deep-rl FIRST

  • RL algorithms dictate architecture requirements
  • Value networks vs policy networks have different needs

Large language models: yzmir/llm-specialist/using-llm-specialist FIRST

  • LLM architectures are specialized transformers
  • Different considerations than general sequence models

Architecture is downstream of algorithm choice in RL and LLMs.


## Summary

Use this meta-skill to:

  • Route architecture queries to appropriate specialized skill
  • Identify data modality and problem type
  • Clarify constraints before recommending
  • Resist recency bias (latest ≠ best)
  • Recognize when architecture is the problem (vs training/implementation)

## Neural Architecture Specialist Skills

After routing, load the appropriate specialist skill for detailed guidance:

  1. architecture-design-principles.md - Custom design, architectural best practices, skip connections, network depth fundamentals
  2. attention-mechanisms-catalog.md - Self-attention, cross-attention, multi-head attention, attention in CNNs, attention variants comparison
  3. cnn-families-and-selection.md - ResNet, EfficientNet, MobileNet, YOLO, computer vision architecture selection
  4. generative-model-families.md - GANs, VAEs, Diffusion models, image generation, style transfer, generative modeling trade-offs
  5. graph-neural-networks-basics.md - GCN, GAT, node classification, link prediction, graph embeddings, molecular structures
  6. normalization-techniques.md - BatchNorm, LayerNorm, GroupNorm, training stability for deep networks (>20 layers)
  7. sequence-models-comparison.md - RNN, LSTM, Transformer, TCN comparison, time series, NLP, sequential data
  8. transformer-architecture-deepdive.md - Transformer internals, ViT, BERT, positional encoding, scaling transformers

Critical principle: Architecture comes BEFORE training. Get this right first.


END OF SKILL