# Model Loading and Management

## Overview

The transformers library provides flexible model loading with automatic architecture detection, device management, and configuration control.

## Loading Models

### AutoModel Classes

Use AutoModel classes for automatic architecture selection:

```python
from transformers import AutoModel, AutoModelForSequenceClassification, AutoModelForCausalLM

# Base model (no task head)
model = AutoModel.from_pretrained("bert-base-uncased")

# Sequence classification
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Causal language modeling (GPT-style)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Masked language modeling (BERT-style)
from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Sequence-to-sequence (T5-style)
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```

### Common AutoModel Classes

**NLP Tasks:**
- `AutoModelForSequenceClassification`: Text classification, sentiment analysis
- `AutoModelForTokenClassification`: NER, POS tagging
- `AutoModelForQuestionAnswering`: Extractive QA
- `AutoModelForCausalLM`: Text generation (GPT, Llama)
- `AutoModelForMaskedLM`: Masked language modeling (BERT)
- `AutoModelForSeq2SeqLM`: Translation, summarization (T5, BART)

**Vision Tasks:**
- `AutoModelForImageClassification`: Image classification
- `AutoModelForObjectDetection`: Object detection
- `AutoModelForImageSegmentation`: Image segmentation

**Audio Tasks:**
- `AutoModelForAudioClassification`: Audio classification
- `AutoModelForSpeechSeq2Seq`: Speech recognition

**Multimodal:**
- `AutoModelForVision2Seq`: Image captioning, VQA

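All of these classes follow the same `from_pretrained` pattern shown above. A minimal sketch for a vision class, using `google/vit-base-patch16-224` as one example checkpoint (any compatible image-classification model from the Hub works):

```python
from transformers import AutoModelForImageClassification

# Vision classes load exactly like the NLP classes above
model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")
print(model.config.num_labels)  # 1000 ImageNet-1k classes for this checkpoint
```
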
## Loading Parameters

### Basic Parameters

**pretrained_model_name_or_path**: Model identifier or local path
```python
model = AutoModel.from_pretrained("bert-base-uncased")   # From Hub
model = AutoModel.from_pretrained("./local/model/path")  # From disk
```

**num_labels**: Number of output labels for classification
```python
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3
)
```

**cache_dir**: Custom cache location
```python
model = AutoModel.from_pretrained("model-id", cache_dir="./my_cache")
```

### Device Management

**device_map**: Automatic device allocation for large models
```python
# Automatically distribute across GPUs and CPU
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto"
)

# Sequential placement
model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    device_map="sequential"
)

# Custom device map
device_map = {
    "transformer.layers.0": 0,      # GPU 0
    "transformer.layers.1": 1,      # GPU 1
    "transformer.layers.2": "cpu",  # CPU
}
model = AutoModel.from_pretrained("model-id", device_map=device_map)
```

Manual device placement:
```python
import torch

model = AutoModel.from_pretrained("model-id")
model.to("cuda:0")  # Move to GPU 0
model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
```

### Precision Control

**torch_dtype**: Set model precision
```python
import torch

# Float16 (half precision)
model = AutoModel.from_pretrained("model-id", torch_dtype=torch.float16)

# BFloat16 (better range than float16)
model = AutoModel.from_pretrained("model-id", torch_dtype=torch.bfloat16)

# Auto (use the dtype stored in the checkpoint)
model = AutoModel.from_pretrained("model-id", torch_dtype="auto")
```

### Attention Implementation

**attn_implementation**: Choose the attention mechanism
```python
# Scaled Dot Product Attention (PyTorch 2.0+; fast, no extra dependencies)
model = AutoModel.from_pretrained("model-id", attn_implementation="sdpa")

# Flash Attention 2 (requires the flash-attn package and a supported GPU)
model = AutoModel.from_pretrained("model-id", attn_implementation="flash_attention_2")

# Eager (plain PyTorch implementation, most compatible)
model = AutoModel.from_pretrained("model-id", attn_implementation="eager")
```

### Memory Optimization

**low_cpu_mem_usage**: Reduce CPU memory during loading
```python
model = AutoModelForCausalLM.from_pretrained(
    "large-model-id",
    low_cpu_mem_usage=True,
    device_map="auto"
)
```

**load_in_8bit**: 8-bit quantization (requires bitsandbytes)
```python
model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    load_in_8bit=True,
    device_map="auto"
)
```

**load_in_4bit**: 4-bit quantization
```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    quantization_config=quantization_config,
    device_map="auto"
)
```

## Model Configuration

### Loading with Custom Config

```python
from transformers import AutoConfig, AutoModel

# Load and modify config
config = AutoConfig.from_pretrained("bert-base-uncased")
config.hidden_dropout_prob = 0.2
config.attention_probs_dropout_prob = 0.2

# Initialize model with custom config
model = AutoModel.from_pretrained("bert-base-uncased", config=config)
```

### Initializing from Config Only

```python
config = AutoConfig.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_config(config)  # Random weights
```

## Model Modes

### Training vs Evaluation Mode

Models load in evaluation mode by default:

```python
model = AutoModel.from_pretrained("model-id")
print(model.training)  # False

# Switch to training mode
model.train()

# Switch back to evaluation mode
model.eval()
```

Evaluation mode disables dropout and makes batch normalization layers use their running statistics.

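For inference it is common to pair `eval()` with `torch.no_grad()` so no gradients are tracked. A minimal sketch, assuming `model` is any loaded model and `inputs` is a dict of tensors from a tokenizer (as in the forward-pass example later in this document):

```python
import torch

model.eval()               # dropout disabled, deterministic forward passes
with torch.no_grad():      # skip gradient tracking during inference
    outputs = model(**inputs)
```
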
## Saving Models

### Save Locally

```python
model.save_pretrained("./my_model")
```

This creates:
- `config.json`: Model configuration
- `pytorch_model.bin` or `model.safetensors`: Model weights

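A quick, purely illustrative way to confirm what was written is to list the output directory:

```python
import os

model.save_pretrained("./my_model")
print(sorted(os.listdir("./my_model")))
# e.g. ['config.json', 'model.safetensors']
```
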
### Save to Hugging Face Hub

```python
model.push_to_hub("username/model-name")

# With custom commit message
model.push_to_hub("username/model-name", commit_message="Update model")

# Private repository
model.push_to_hub("username/model-name", private=True)
```

## Model Inspection

### Parameter Count

```python
# Total parameters
total_params = model.num_parameters()

# Trainable parameters only
trainable_params = model.num_parameters(only_trainable=True)

print(f"Total: {total_params:,}")
print(f"Trainable: {trainable_params:,}")
```

### Memory Footprint

```python
memory_bytes = model.get_memory_footprint()
memory_mb = memory_bytes / 1024**2
print(f"Memory: {memory_mb:.2f} MB")
```

### Model Architecture

```python
print(model)  # Print full architecture

# Access specific components
print(model.config)
print(model.base_model)
```

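For a coarser view than printing the whole module tree, plain PyTorch introspection gives a per-submodule parameter breakdown. This is a generic sketch using standard `nn.Module` APIs, not a transformers-specific method:

```python
# Parameter count per top-level submodule
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params:,} parameters")
```
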
## Forward Pass

Basic inference:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model-id")
model = AutoModelForSequenceClassification.from_pretrained("model-id")

inputs = tokenizer("Sample text", return_tensors="pt")
outputs = model(**inputs)

logits = outputs.logits
predictions = logits.argmax(dim=-1)
```

## Model Formats

### SafeTensors vs PyTorch

SafeTensors loads faster and, unlike pickle-based `.bin` checkpoints, cannot execute arbitrary code on load:

```python
# Save as safetensors (recommended)
model.save_pretrained("./model", safe_serialization=True)

# Load either format automatically
model = AutoModel.from_pretrained("./model")
```

### ONNX Export

Export for optimized inference. The sketch below uses the legacy `transformers.onnx` API (the export call needs an `OnnxConfig` and an opset, not the model config); newer releases recommend the `optimum` library instead:

```python
from pathlib import Path

from transformers import AutoModel, AutoTokenizer
from transformers.onnx import FeaturesManager, export

model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Build the ONNX config for this architecture and task
_, onnx_config_factory = FeaturesManager.check_supported_model_or_raise(model, feature="default")
onnx_config = onnx_config_factory(model.config)

# Export to ONNX
export(
    tokenizer,                       # preprocessor
    model,
    onnx_config,
    onnx_config.default_onnx_opset,  # opset version
    Path("model.onnx"),
)
```

## Best Practices

1. **Use AutoModel classes**: Automatic architecture detection
2. **Specify dtype explicitly**: Control precision and memory
3. **Use device_map="auto"**: For large models
4. **Enable low_cpu_mem_usage**: When loading large models
5. **Use safetensors format**: Faster and safer serialization
6. **Check model.training**: Ensure the correct mode for the task
7. **Consider quantization**: For deployment on resource-constrained devices
8. **Cache models locally**: Set the `TRANSFORMERS_CACHE` (or `HF_HOME`) environment variable

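A sketch combining several of these practices when loading a large causal LM; `"model-id"` is a placeholder and the dtype should match the available hardware:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-id",                  # placeholder Hub ID or local path
    torch_dtype=torch.bfloat16,  # explicit precision
    device_map="auto",           # spread across available GPUs/CPU
    low_cpu_mem_usage=True,      # reduce peak CPU memory while loading
)
model.eval()  # models load in eval mode by default; double-check model.training
```
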
## Common Issues

**CUDA out of memory:**
```python
import torch

# Use smaller precision
model = AutoModel.from_pretrained("model-id", torch_dtype=torch.float16)

# Or use quantization (requires bitsandbytes)
model = AutoModel.from_pretrained("model-id", load_in_8bit=True)

# Or use CPU
model = AutoModel.from_pretrained("model-id", device_map="cpu")
```

**Slow loading:**
```python
# Enable low CPU memory mode
model = AutoModel.from_pretrained("model-id", low_cpu_mem_usage=True)
```

**Model not found:**
```python
# Verify the model ID on huggingface.co
# Check authentication for private models
from huggingface_hub import login
login()
```