2025-11-30 08:30:10 +08:00

# Pipeline API Reference
## Overview
Pipelines provide the simplest way to use pre-trained models for inference. They abstract away tokenization, model loading, and post-processing, offering a unified interface for dozens of tasks.
## Basic Usage
Create a pipeline by specifying a task:
```python
from transformers import pipeline
# Auto-select default model for task
pipe = pipeline("text-classification")
result = pipe("This is great!")
```
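A text-classification pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. The sketch below shows one way to consume that shape. The values are invented for illustration, and `top_label` and its `threshold` argument are hypothetical helpers, not part of the library.

```python
# Illustrative values only: the structure mirrors text-classification output,
# but the label and score are made up for this sketch.
result = [{"label": "POSITIVE", "score": 0.99}]


def top_label(predictions, threshold=0.5):
    """Return the highest-scoring label, or None if it scores below threshold."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


print(top_label(result))
```

Returning `None` below the threshold lets calling code treat low-confidence predictions as "undecided" instead of trusting a weak label.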
Or specify a model:
```python
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
```
## Supported Tasks
### Natural Language Processing
**text-generation**: Generate text continuations
```python
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time", max_length=50, num_return_sequences=2)
```
**text-classification**: Classify text into categories
```python
classifier = pipeline("text-classification")
result = classifier("I love this product!") # Returns label and score
```
**token-classification**: Label individual tokens (NER, POS tagging)
```python
ner = pipeline("token-classification", model="dslim/bert-base-NER")
entities = ner("Hugging Face is based in New York City")
```
**question-answering**: Extract answers from context
```python
qa = pipeline("question-answering")
result = qa(question="What is the capital?", context="Paris is the capital of France.")
```
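The question-answering result is a dictionary with `score`, `start`, `end`, and `answer`, where `start` and `end` are character offsets into the context. A minimal sketch of using those offsets follows. The result values are hypothetical, and `highlight` is a helper invented for this example.

```python
context = "Paris is the capital of France."

# Hypothetical result: the keys match question-answering output,
# the score is invented for illustration.
result = {"score": 0.98, "start": 0, "end": 5, "answer": "Paris"}


def highlight(context, start, end, marker="**"):
    """Wrap the answer span in markers so it stands out in the context."""
    return context[:start] + marker + context[start:end] + marker + context[end:]


# The offsets slice the answer straight out of the context string.
span = context[result["start"]:result["end"]]
print(span == result["answer"])
print(highlight(context, result["start"], result["end"]))
```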
**fill-mask**: Predict masked tokens
```python
unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("Paris is the [MASK] of France")
```
**summarization**: Summarize long texts
```python
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130, min_length=30)
```
**translation**: Translate between languages
```python
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
```
**zero-shot-classification**: Classify without training data
```python
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "This is a course about Python programming",
    candidate_labels=["education", "politics", "business"],
)
```
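The zero-shot result is a dictionary with `sequence`, `labels`, and `scores`, with labels ordered from most to least likely. The sketch below pairs labels with scores. The numbers are made up, and `labels_above` is a hypothetical helper.

```python
# Illustrative result: the structure matches zero-shot-classification output,
# the scores are invented for this sketch.
result = {
    "sequence": "This is a course about Python programming",
    "labels": ["education", "business", "politics"],
    "scores": [0.90, 0.07, 0.03],
}


def labels_above(result, threshold):
    """Return every label whose score is at least threshold."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# Labels arrive sorted by score, so the first entry is the best match.
best_label, best_score = result["labels"][0], result["scores"][0]
print(best_label, best_score)
print(labels_above(result, 0.05))
```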
**sentiment-analysis**: Alias for text-classification focused on sentiment
```python
sentiment = pipeline("sentiment-analysis")
result = sentiment("This product exceeded my expectations!")
```
### Computer Vision
**image-classification**: Classify images
```python
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("path/to/image.jpg")
# Or use PIL Image or URL
from PIL import Image
result = classifier(Image.open("image.jpg"))
```
**object-detection**: Detect objects in images
```python
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("image.jpg") # Returns bounding boxes and labels
```
**image-segmentation**: Segment images
```python
segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")
segments = segmenter("image.jpg")
```
**depth-estimation**: Estimate depth from images
```python
depth = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth("image.jpg")
```
**zero-shot-image-classification**: Classify images without training
```python
classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
result = classifier("image.jpg", candidate_labels=["cat", "dog", "bird"])
```
### Audio
**automatic-speech-recognition**: Transcribe speech
```python
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
text = asr("audio.mp3")
```
**audio-classification**: Classify audio
```python
classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
result = classifier("audio.wav")
```
**text-to-speech**: Generate speech from text (with specific models)
```python
tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
audio = tts("Hello, this is a test")
```
### Multimodal
**visual-question-answering**: Answer questions about images
```python
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(image="image.jpg", question="What color is the car?")
```
**document-question-answering**: Answer questions about documents
```python
doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
result = doc_qa(image="document.png", question="What is the invoice number?")
```
**image-to-text**: Generate captions for images
```python
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("image.jpg")
```
## Pipeline Parameters
### Common Parameters
**model**: Model identifier or path
```python
pipe = pipeline("task", model="model-id")
```
**device**: GPU device index (-1 for CPU, 0+ for GPU)
```python
pipe = pipeline("task", device=0) # Use first GPU
```
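To keep the same script working on machines with and without a GPU, the device index can be chosen at runtime. This is a minimal sketch assuming a PyTorch backend. `pick_device` is a hypothetical helper name, and it only considers the first CUDA device.

```python
def pick_device():
    """Return 0 when PyTorch sees a CUDA GPU, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no CUDA device to select.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)

# Usage with a real pipeline (requires transformers and a model download):
# pipe = pipeline("text-classification", device=device)
```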
**device_map**: Automatic device allocation for large models
```python
pipe = pipeline("task", model="large-model", device_map="auto")
```
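**dtype**: Model precision; lower precision reduces memory use. Older Transformers releases name this keyword `torch_dtype`, as in the example here, while recent releases prefer `dtype`.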
```python
import torch
pipe = pipeline("task", torch_dtype=torch.float16)
```
**batch_size**: Process multiple inputs at once
```python
pipe = pipeline("task", batch_size=8)
results = pipe(["text1", "text2", "text3"])
```
**framework**: Choose PyTorch or TensorFlow
```python
pipe = pipeline("task", framework="pt") # or "tf"
```
## Batch Processing
Process multiple inputs efficiently:
```python
classifier = pipeline("text-classification")
texts = ["Great product!", "Terrible experience", "Just okay"]
results = classifier(texts)
```
For large datasets, use generators or KeyDataset:
```python
from transformers.pipelines.pt_utils import KeyDataset
import datasets
dataset = datasets.load_dataset("dataset-name", split="test")
pipe = pipeline("task", device=0)
for output in pipe(KeyDataset(dataset, "text")):
    print(output)
```
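The generator option can be sketched as follows: a pipeline consumes the generator lazily, so the full dataset never has to sit in memory. `iter_texts` is a hypothetical helper, and the sample lines stand in for a file or database cursor.

```python
def iter_texts(lines):
    """Yield stripped, non-empty lines one at a time."""
    for line in lines:
        text = line.strip()
        if text:
            yield text


# Stand-in for an open file: any iterable of strings works.
sample = ["Great product!\n", "\n", "  Terrible experience\n"]
print(list(iter_texts(sample)))

# Usage with a real pipeline (requires transformers and a model download):
# for output in pipe(iter_texts(open("reviews.txt"))):
#     print(output)
```

Because blank lines are skipped, outputs no longer line up one-to-one with source line numbers. Yield the line index alongside the text if that mapping matters.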
## Performance Optimization
### GPU Acceleration
Always specify device for GPU usage:
```python
pipe = pipeline("task", device=0)
```
### Mixed Precision
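Use float16 to roughly halve model memory and speed up inference on GPUs with half-precision support; the actual speedup depends on the model and hardware, so measure it: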
```python
import torch
pipe = pipeline("task", torch_dtype=torch.float16, device=0)
```
### Batching Guidelines
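- **CPU**: Usually skip batching; it rarely helps
- **GPU with variable lengths**: Padding every input to the longest one in the batch wastes compute and may reduce efficiency
- **GPU with similar lengths**: Significant speedup
- **Real-time applications**: Skip batching, since waiting to fill a batch increases latency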
```python
# Good for throughput
pipe = pipeline("task", batch_size=32, device=0)
results = pipe(list_of_texts)
```
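One way to get "similar lengths" in practice is to sort inputs by length before batching and then restore the original order. The sketch below is an assumption-laden illustration, not a library feature: both function names are hypothetical, character count is used as a rough proxy for token count, and a plain callable stands in for the pipeline.

```python
def length_sorted_batches(texts, batch_size):
    """Yield (indices, batch) pairs that group texts of similar length."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]


def run_in_length_order(fn, texts, batch_size):
    """Apply fn to length-sorted batches; return results in input order."""
    results = [None] * len(texts)
    for idx, batch in length_sorted_batches(texts, batch_size):
        for i, out in zip(idx, fn(batch)):
            results[i] = out
    return results


texts = ["a much longer sentence than the rest", "hi", "medium length", "ok"]

# Stand-in for a pipeline call: any callable mapping a list to a list works.
lengths = run_in_length_order(lambda batch: [len(t) for t in batch], texts, 2)
print(lengths)
```

With a real pipeline, `fn` could be the pipeline object itself, since calling it on a list returns one result per input.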
### Streaming Output
For text generation, stream tokens as they're generated:
```python
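from transformers import TextStreamer
generator = pipeline("text-generation", model="gpt2")
# TextStreamer needs the tokenizer to decode tokens as they are generated
streamer = TextStreamer(generator.tokenizer)
generator("The future of AI", max_length=100, streamer=streamer)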
```
## Custom Pipeline Configuration
Specify tokenizer and model separately:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("model-id")
model = AutoModelForSequenceClassification.from_pretrained("model-id")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
```
Use custom pipeline classes:
```python
from transformers import TextClassificationPipeline
class CustomPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Custom post-processing
        return super().postprocess(model_outputs, **kwargs)
pipe = pipeline("text-classification", model="model-id", pipeline_class=CustomPipeline)
```
## Input Formats
Pipelines accept various input types:
**Text tasks**: Strings or lists of strings
```python
pipe("single text")
pipe(["text1", "text2"])
```
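**Image tasks**: URLs, file paths, or PIL Images (convert NumPy arrays to PIL Images first)
```python
from PIL import Image
pipe("https://example.com/image.jpg")
pipe("local/path/image.png")
pipe(Image.open("image.jpg"))
pipe(Image.fromarray(numpy_array))  # wrap a NumPy array as a PIL Image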
```
**Audio tasks**: File paths, numpy arrays, or raw waveforms
```python
pipe("audio.mp3")
pipe(audio_array)
```
## Error Handling
Handle common issues:
```python
try:
    result = pipe(input_data)
except Exception as e:
    if "CUDA out of memory" in str(e):
        # Reduce batch size or use CPU
        pipe = pipeline("task", device=-1)
    elif "does not appear to have a file named" in str(e):
        # Model not found
        print("Check model identifier")
    else:
        raise
```
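The out-of-memory branch can be turned into a retry that actually reruns the input on a fallback pipeline. The following is a sketch under stated assumptions: `run_with_fallback` is a hypothetical helper, the two `fake_*` functions stand in for GPU and CPU pipelines, and it relies on PyTorch reporting CUDA out-of-memory conditions as a `RuntimeError` whose message contains "out of memory".

```python
def run_with_fallback(primary, fallback, inputs):
    """Call primary(inputs); on an out-of-memory error, call fallback(inputs)."""
    try:
        return primary(inputs)
    except RuntimeError as err:
        if "out of memory" not in str(err):
            raise  # unrelated failure: let it propagate
        return fallback(inputs)


def fake_gpu(inputs):
    # Stand-in for a GPU pipeline that runs out of memory.
    raise RuntimeError("CUDA out of memory.")


def fake_cpu(inputs):
    # Stand-in for the same pipeline created with device=-1.
    return [text.upper() for text in inputs]


print(run_with_fallback(fake_gpu, fake_cpu, ["ok"]))
```

Catching only `RuntimeError` and re-raising anything that is not an out-of-memory message keeps unrelated bugs visible instead of silently rerouting them to the CPU.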
## Best Practices
1. **Use pipelines for prototyping**: Fast iteration without boilerplate
2. **Specify models explicitly**: Default models may change
3. **Enable GPU when available**: Significant speedup
4. **Use batching for throughput**: When processing many inputs
5. **Consider memory usage**: Use float16 or smaller models for large batches
6. **Cache models locally**: Avoid repeated downloads