# Pipeline API Reference

## Overview

Pipelines provide the simplest way to use pre-trained models for inference. They abstract away tokenization, model loading, and post-processing, offering a unified interface for dozens of tasks.

## Basic Usage

Create a pipeline by specifying a task:

```python
from transformers import pipeline

# Auto-select the default model for the task
pipe = pipeline("text-classification")
result = pipe("This is great!")
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

Or specify a model explicitly:

```python
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
```
## Supported Tasks

### Natural Language Processing

**text-generation**: Generate text continuations
```python
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time", max_length=50, num_return_sequences=2)
```

**text-classification**: Classify text into categories
```python
classifier = pipeline("text-classification")
result = classifier("I love this product!")  # Returns label and score
```

**token-classification**: Label individual tokens (NER, POS tagging)
```python
ner = pipeline("token-classification", model="dslim/bert-base-NER")
entities = ner("Hugging Face is based in New York City")
```

**question-answering**: Extract answers from context
```python
qa = pipeline("question-answering")
result = qa(question="What is the capital?", context="Paris is the capital of France.")
# result includes the answer string, a score, and start/end character offsets
```

**fill-mask**: Predict masked tokens
```python
unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("Paris is the [MASK] of France")
```

**summarization**: Summarize long texts
```python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130, min_length=30)
```

**translation**: Translate between languages
```python
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
```

**zero-shot-classification**: Classify without training data
```python
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "This is a course about Python programming",
    candidate_labels=["education", "politics", "business"]
)
```

**sentiment-analysis**: Alias for text-classification focused on sentiment
```python
sentiment = pipeline("sentiment-analysis")
result = sentiment("This product exceeded my expectations!")
```
### Computer Vision

**image-classification**: Classify images
```python
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("path/to/image.jpg")
# Or use a PIL Image or URL
from PIL import Image
result = classifier(Image.open("image.jpg"))
```

**object-detection**: Detect objects in images
```python
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("image.jpg")  # Returns bounding boxes and labels
```

**image-segmentation**: Segment images
```python
segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")
segments = segmenter("image.jpg")
```

**depth-estimation**: Estimate depth from images
```python
depth = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth("image.jpg")
```

**zero-shot-image-classification**: Classify images without training
```python
classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
result = classifier("image.jpg", candidate_labels=["cat", "dog", "bird"])
```
### Audio

**automatic-speech-recognition**: Transcribe speech
```python
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
result = asr("audio.mp3")  # Returns a dict with a "text" key
```

**audio-classification**: Classify audio
```python
classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
result = classifier("audio.wav")
```

**text-to-speech**: Generate speech from text (with specific models)
```python
tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
audio = tts("Hello, this is a test")
```
### Multimodal

**visual-question-answering**: Answer questions about images
```python
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(image="image.jpg", question="What color is the car?")
```

**document-question-answering**: Answer questions about documents
```python
doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
result = doc_qa(image="document.png", question="What is the invoice number?")
```

**image-to-text**: Generate captions for images
```python
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("image.jpg")
```
## Pipeline Parameters

### Common Parameters

**model**: Model identifier or path
```python
pipe = pipeline("task", model="model-id")
```

**device**: Device index (-1 for CPU, 0 and up for GPUs)
```python
pipe = pipeline("task", device=0)  # Use first GPU
```

**device_map**: Automatic device allocation for large models (requires the `accelerate` package)
```python
pipe = pipeline("task", model="large-model", device_map="auto")
```

**torch_dtype**: Model precision (lower precision reduces memory)
```python
import torch
pipe = pipeline("task", torch_dtype=torch.float16)
```

**batch_size**: Process multiple inputs at once
```python
pipe = pipeline("task", batch_size=8)
results = pipe(["text1", "text2", "text3"])
```

**framework**: Choose PyTorch or TensorFlow
```python
pipe = pipeline("task", framework="pt")  # or "tf"
```
## Batch Processing

Process multiple inputs efficiently:

```python
classifier = pipeline("text-classification")
texts = ["Great product!", "Terrible experience", "Just okay"]
results = classifier(texts)
```

For large datasets, use generators or `KeyDataset`:

```python
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("dataset-name", split="test")
pipe = pipeline("task", device=0)

for output in pipe(KeyDataset(dataset, "text")):
    print(output)
```
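
A plain Python generator works the same way: the pipeline consumes it lazily and yields outputs one at a time, so the whole dataset never sits in memory. A minimal sketch, with `data()` as a stand-in for your own source:

```python
def data():
    # Stand-in generator; replace with your own source (file, database, queue, ...)
    for i in range(1000):
        yield f"example text {i}"

# batch_size can also be passed at call time
for output in pipe(data(), batch_size=8):
    print(output)
```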
## Performance Optimization

### GPU Acceleration

Always specify the device for GPU usage:
```python
pipe = pipeline("task", device=0)
```

### Mixed Precision

Use float16 for up to a 2x speedup on supported GPUs:
```python
import torch
pipe = pipeline("task", torch_dtype=torch.float16, device=0)
```

### Batching Guidelines

- **CPU**: Usually skip batching
- **GPU with variable-length inputs**: Batching may reduce efficiency (padding overhead)
- **GPU with similar-length inputs**: Significant speedup
- **Real-time applications**: Skip batching (it increases latency)

```python
# Good for throughput
pipe = pipeline("task", batch_size=32, device=0)
results = pipe(list_of_texts)
```
### Streaming Output

For text generation, stream tokens as they're generated. `TextStreamer` needs the tokenizer so it can decode tokens:

```python
from transformers import AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = pipeline("text-generation", model="gpt2", streamer=TextStreamer(tokenizer))
generator("The future of AI", max_length=100)
```
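
To consume tokens from your own loop instead of printing to stdout, `TextIteratorStreamer` can be drained while generation runs in a background thread. A sketch; running generation in a thread is one common pattern, not the only one:

```python
from threading import Thread

from transformers import AutoTokenizer, TextIteratorStreamer, pipeline

tokenizer = AutoTokenizer.from_pretrained("gpt2")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
generator = pipeline("text-generation", model="gpt2", streamer=streamer)

# Generation runs in the background; the main thread iterates over decoded chunks
thread = Thread(target=generator, args=("The future of AI",), kwargs={"max_length": 100})
thread.start()
for text_chunk in streamer:
    print(text_chunk, end="", flush=True)
thread.join()
```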
## Custom Pipeline Configuration

Specify the tokenizer and model separately:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("model-id")
model = AutoModelForSequenceClassification.from_pretrained("model-id")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
```

Use custom pipeline classes:

```python
from transformers import TextClassificationPipeline

class CustomPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Custom post-processing
        return super().postprocess(model_outputs, **kwargs)

pipe = pipeline("text-classification", model="model-id", pipeline_class=CustomPipeline)
```
## Input Formats

Pipelines accept various input types:

**Text tasks**: Strings or lists of strings
```python
pipe("single text")
pipe(["text1", "text2"])
```

**Image tasks**: URLs, file paths, PIL Images, or NumPy arrays
```python
pipe("https://example.com/image.jpg")
pipe("local/path/image.png")
pipe(Image.open("image.jpg"))
pipe(numpy_array)
```

**Audio tasks**: File paths, NumPy arrays, or raw waveforms
```python
pipe("audio.mp3")
pipe(audio_array)
```
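
For raw arrays, ASR pipelines also accept a dict form that carries the sampling rate explicitly. A small sketch; the one-second silent array is just a stand-in for real audio:

```python
import numpy as np

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
speech = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
result = asr({"raw": speech, "sampling_rate": 16000})
```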
## Error Handling

Handle common issues:

```python
try:
    result = pipe(input_data)
except Exception as e:
    if "CUDA out of memory" in str(e):
        # Reduce batch size or use CPU
        pipe = pipeline("task", device=-1)
    elif "does not appear to have a file named" in str(e):
        # Model not found
        print("Check model identifier")
    else:
        raise
```
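
On recent PyTorch versions (1.13+), out-of-memory errors can also be caught as a dedicated exception class instead of string matching. A sketch, assuming `pipe` and an input list `batch` as above; halving the batch is one simple fallback, not a universal fix:

```python
import torch

try:
    results = pipe(batch)
except torch.cuda.OutOfMemoryError:
    torch.cuda.empty_cache()  # Release cached blocks before retrying
    results = pipe(batch[: len(batch) // 2])  # Retry with a smaller batch
```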
## Best Practices

1. **Use pipelines for prototyping**: Fast iteration without boilerplate
2. **Specify models explicitly**: Default models may change between releases
3. **Enable GPU when available**: Significant speedup
4. **Use batching for throughput**: When processing many inputs offline
5. **Consider memory usage**: Use float16 or smaller models for large batches
6. **Cache models locally**: Avoid repeated downloads (see the sketch below)
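
Models are cached automatically under the Hugging Face cache directory. To pin the cache to a specific location, set `HF_HOME` before importing transformers, or pass `cache_dir` when loading. A sketch; both paths are placeholders:

```python
import os

# Must be set before transformers is imported to take effect
os.environ["HF_HOME"] = "/data/hf-cache"  # placeholder path

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Alternatively, pin the cache per load with cache_dir
tokenizer = AutoTokenizer.from_pretrained("model-id", cache_dir="./models")
model = AutoModelForSequenceClassification.from_pretrained("model-id", cache_dir="./models")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
```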