Pipeline API Reference

Overview

Pipelines provide the simplest way to use pre-trained models for inference. They abstract away tokenization, model loading, and post-processing, offering a unified interface for dozens of tasks.

Basic Usage

Create a pipeline by specifying a task:

from transformers import pipeline

# Auto-select default model for task
pipe = pipeline("text-classification")
result = pipe("This is great!")
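# Typical output shape (exact score varies by model): [{'label': 'POSITIVE', 'score': 0.9998}]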

Or specify a model:

pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

Supported Tasks

Natural Language Processing

text-generation: Generate text continuations

generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time", max_length=50, num_return_sequences=2)

text-classification: Classify text into categories

classifier = pipeline("text-classification")
result = classifier("I love this product!")  # Returns label and score

token-classification: Label individual tokens (NER, POS tagging)

ner = pipeline("token-classification", model="dslim/bert-base-NER")
entities = ner("Hugging Face is based in New York City")

question-answering: Extract answers from context

qa = pipeline("question-answering")
result = qa(question="What is the capital?", context="Paris is the capital of France.")
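# Typical output shape (values illustrative): {'score': 0.98, 'start': 0, 'end': 5, 'answer': 'Paris'}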

fill-mask: Predict masked tokens

unmasker = pipeline("fill-mask", model="bert-base-uncased")
result = unmasker("Paris is the [MASK] of France")

summarization: Summarize long texts

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text...", max_length=130, min_length=30)

translation: Translate between languages

translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")

zero-shot-classification: Classify without training data

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "This is a course about Python programming",
    candidate_labels=["education", "politics", "business"]
)
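# result['labels'] is sorted by result['scores'], highest-scoring label first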

sentiment-analysis: Alias for text-classification focused on sentiment

sentiment = pipeline("sentiment-analysis")
result = sentiment("This product exceeded my expectations!")

Computer Vision

image-classification: Classify images

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("path/to/image.jpg")
# Or use PIL Image or URL
from PIL import Image
result = classifier(Image.open("image.jpg"))

object-detection: Detect objects in images

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("image.jpg")  # Returns bounding boxes and labels

image-segmentation: Segment images

segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")
segments = segmenter("image.jpg")

depth-estimation: Estimate depth from images

depth = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth("image.jpg")

zero-shot-image-classification: Classify images without training

classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
result = classifier("image.jpg", candidate_labels=["cat", "dog", "bird"])

Audio

automatic-speech-recognition: Transcribe speech

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
text = asr("audio.mp3")
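# For clips longer than ~30s with Whisper, enable chunking, e.g. pipeline(..., chunk_length_s=30)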

audio-classification: Classify audio

classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
result = classifier("audio.wav")

text-to-speech: Generate speech from text (with specific models)

tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
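# Note: SpeechT5 also expects speaker embeddings, passed at call time via forward_params={"speaker_embeddings": ...}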
audio = tts("Hello, this is a test")

Multimodal

visual-question-answering: Answer questions about images

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
result = vqa(image="image.jpg", question="What color is the car?")

document-question-answering: Answer questions about documents

doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
result = doc_qa(image="document.png", question="What is the invoice number?")
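# Note: OCR-based document models like LayoutLM typically require pytesseract and the Tesseract binary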

image-to-text: Generate captions for images

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("image.jpg")

Pipeline Parameters

Common Parameters

model: Model identifier or path

pipe = pipeline("task", model="model-id")

device: GPU device index (-1 for CPU, 0+ for GPU)

pipe = pipeline("task", device=0)  # Use first GPU

device_map: Automatic device allocation for large models

pipe = pipeline("task", model="large-model", device_map="auto")

torch_dtype: Model precision (reduces memory)

import torch
pipe = pipeline("task", torch_dtype=torch.float16)

batch_size: Process multiple inputs at once

pipe = pipeline("task", batch_size=8)
results = pipe(["text1", "text2", "text3"])

framework: Choose PyTorch or TensorFlow

pipe = pipeline("task", framework="pt")  # or "tf"

Batch Processing

Process multiple inputs efficiently:

classifier = pipeline("text-classification")
texts = ["Great product!", "Terrible experience", "Just okay"]
results = classifier(texts)

For large datasets, use generators or KeyDataset:

from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("dataset-name", split="test")
pipe = pipeline("task", device=0)

for output in pipe(KeyDataset(dataset, "text")):
    print(output)
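
Plain Python generators work the same way; the pipeline consumes inputs lazily and yields outputs as they are ready. A minimal sketch (data() is a hypothetical generator):

pipe = pipeline("text-classification", device=0)

def data():
    # Yield one input at a time; combine with batch_size for throughput
    for i in range(1000):
        yield f"example text {i}"

for output in pipe(data()):
    print(output)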

Performance Optimization

GPU Acceleration

Always specify device for GPU usage:

pipe = pipeline("task", device=0)

Mixed Precision

Use float16 for a roughly 2x speedup on GPUs with native half-precision support:

import torch
pipe = pipeline("task", torch_dtype=torch.float16, device=0)

Batching Guidelines

  • CPU: Usually skip batching
  • GPU with variable lengths: May reduce efficiency
  • GPU with similar lengths: Significant speedup
  • Real-time applications: Skip batching (increases latency)

# Good for throughput
pipe = pipeline("task", batch_size=32, device=0)
results = pipe(list_of_texts)

Streaming Output

For text generation, stream tokens as they're generated:

from transformers import AutoTokenizer, TextStreamer, pipeline

tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = pipeline("text-generation", model="gpt2", tokenizer=tokenizer)
# TextStreamer needs the tokenizer to decode tokens; pass the streamer at call time
generator("The future of AI", max_length=100, streamer=TextStreamer(tokenizer))

Custom Pipeline Configuration

Specify tokenizer and model separately:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("model-id")
model = AutoModelForSequenceClassification.from_pretrained("model-id")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

Use custom pipeline classes:

from transformers import TextClassificationPipeline

class CustomPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Custom post-processing
        return super().postprocess(model_outputs, **kwargs)

pipe = pipeline("text-classification", model="model-id", pipeline_class=CustomPipeline)

Input Formats

Pipelines accept various input types:

Text tasks: Strings or lists of strings

pipe("single text")
pipe(["text1", "text2"])

Image tasks: URLs, file paths, PIL Images, or numpy arrays

pipe("https://example.com/image.jpg")
pipe("local/path/image.png")
pipe(Image.open("image.jpg"))  # after: from PIL import Image
pipe(numpy_array)

Audio tasks: File paths, numpy arrays, or raw waveforms

pipe("audio.mp3")
pipe(audio_array)
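
For in-memory audio, speech pipelines also accept a dict with the raw waveform and its sampling rate; a minimal sketch, assuming a 16 kHz mono float32 array:

import numpy as np

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
waveform = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence at 16 kHz
text = asr({"raw": waveform, "sampling_rate": 16000})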

Error Handling

Handle common issues:

try:
    result = pipe(input_data)
except Exception as e:
    if "CUDA out of memory" in str(e):
        # Reduce batch size or use CPU
        pipe = pipeline("task", device=-1)
    elif "does not appear to have a file named" in str(e):
        # Model not found
        print("Check model identifier")
    else:
        raise

Best Practices

  1. Use pipelines for prototyping: Fast iteration without boilerplate
  2. Specify models explicitly: Default models may change
  3. Enable GPU when available: Significant speedup
  4. Use batching for throughput: When processing many inputs
  5. Consider memory usage: Use float16 or smaller models for large batches
  6. Cache models locally: Avoid repeated downloads (see the sketch after this list)
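
Downloads are cached automatically (by default under ~/.cache/huggingface). To pin the cache to a specific location, one option is setting HF_HOME before transformers is imported; a minimal sketch with a hypothetical path:

import os
os.environ["HF_HOME"] = "/data/hf-cache"  # hypothetical cache location; set before importing transformers

from transformers import pipeline
pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")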