11 KiB
Tile Extraction
Overview
Tile extraction is the process of cropping smaller, manageable regions from large whole slide images. Histolab provides three main extraction strategies, each suited for different analysis needs. All tilers share common parameters and provide methods for previewing and extracting tiles.
Common Parameters
All tiler classes accept these parameters:
tile_size: tuple = (512, 512) # Tile dimensions in pixels (width, height)
level: int = 0 # Pyramid level for extraction (0=highest resolution)
check_tissue: bool = True # Filter tiles by tissue content
tissue_percent: float = 80.0 # Minimum tissue coverage (0-100)
pixel_overlap: int = 0 # Overlap between adjacent tiles (GridTiler only)
prefix: str = "" # Prefix for saved tile filenames
suffix: str = ".png" # File extension for saved tiles
extraction_mask: BinaryMask = BiggestTissueBoxMask() # Mask defining extraction region
RandomTiler
Purpose: Extract a fixed number of randomly positioned tiles from tissue regions.
from histolab.tiler import RandomTiler
random_tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=100, # Number of random tiles to extract
level=0,
seed=42, # Random seed for reproducibility
check_tissue=True,
tissue_percent=80.0
)
# Extract tiles
random_tiler.extract(slide, extraction_mask=TissueMask())
Key Parameters:
n_tiles: Number of random tiles to extractseed: Random seed for reproducible tile selectionmax_iter: Maximum attempts to find valid tiles (default 1000)
Use cases:
- Exploratory analysis of slide content
- Sampling diverse regions for training data
- Quick assessment of tissue characteristics
- Balanced dataset creation from multiple slides
Advantages:
- Computationally efficient
- Good for sampling diverse tissue morphologies
- Reproducible with seed parameter
- Fast execution
Limitations:
- May miss rare tissue patterns
- No guarantee of coverage
- Random distribution may not capture structured features
GridTiler
Purpose: Extract tiles systematically across tissue regions following a grid pattern.
from histolab.tiler import GridTiler
grid_tiler = GridTiler(
tile_size=(512, 512),
level=0,
check_tissue=True,
tissue_percent=80.0,
pixel_overlap=0 # Overlap in pixels between adjacent tiles
)
# Extract tiles
grid_tiler.extract(slide)
Key Parameters:
pixel_overlap: Number of overlapping pixels between adjacent tilespixel_overlap=0: Non-overlapping tilespixel_overlap=128: 128-pixel overlap on each side- Can be used for sliding window approaches
Use cases:
- Comprehensive slide coverage
- Spatial analysis requiring positional information
- Image reconstruction from tiles
- Semantic segmentation tasks
- Region-based analysis
Advantages:
- Complete tissue coverage
- Preserves spatial relationships
- Predictable tile positions
- Suitable for whole-slide analysis
Limitations:
- Computationally intensive for large slides
- May generate many background-heavy tiles (mitigated by
check_tissue) - Larger output datasets
Grid Pattern:
[Tile 1][Tile 2][Tile 3]
[Tile 4][Tile 5][Tile 6]
[Tile 7][Tile 8][Tile 9]
With pixel_overlap=64:
[Tile 1-overlap-Tile 2-overlap-Tile 3]
[ overlap overlap overlap]
[Tile 4-overlap-Tile 5-overlap-Tile 6]
ScoreTiler
Purpose: Extract top-ranked tiles based on custom scoring functions.
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
score_tiler = ScoreTiler(
tile_size=(512, 512),
n_tiles=50, # Number of top-scoring tiles to extract
level=0,
scorer=NucleiScorer(), # Scoring function
check_tissue=True
)
# Extract top-scoring tiles
score_tiler.extract(slide)
Key Parameters:
n_tiles: Number of top-scoring tiles to extractscorer: Scoring function (e.g.,NucleiScorer,CellularityScorer, custom scorer)
Use cases:
- Extracting most informative regions
- Prioritizing tiles with specific features (nuclei, cells, etc.)
- Quality-based tile selection
- Focusing on diagnostically relevant areas
- Training data curation
Advantages:
- Focuses on most informative tiles
- Reduces dataset size while maintaining quality
- Customizable with different scorers
- Efficient for targeted analysis
Limitations:
- Slower than RandomTiler (must score all candidate tiles)
- Requires appropriate scorer for task
- May miss low-scoring but relevant regions
Available Scorers
NucleiScorer
Scores tiles based on nuclei detection and density.
from histolab.scorer import NucleiScorer
nuclei_scorer = NucleiScorer()
How it works:
- Converts tile to grayscale
- Applies thresholding to detect nuclei
- Counts nuclei-like structures
- Assigns score based on nuclei density
Best for:
- Cell-rich tissue regions
- Tumor detection
- Mitosis analysis
- Areas with high cellular content
CellularityScorer
Scores tiles based on overall cellular content.
from histolab.scorer import CellularityScorer
cellularity_scorer = CellularityScorer()
Best for:
- Identifying cellular vs. stromal regions
- Tumor cellularity assessment
- Separating dense from sparse tissue areas
Custom Scorers
Create custom scoring functions for specific needs:
from histolab.scorer import Scorer
import numpy as np
class ColorVarianceScorer(Scorer):
def __call__(self, tile):
"""Score tiles based on color variance."""
tile_array = np.array(tile.image)
# Calculate color variance
variance = np.var(tile_array, axis=(0, 1)).sum()
return variance
# Use custom scorer
variance_scorer = ColorVarianceScorer()
score_tiler = ScoreTiler(
tile_size=(512, 512),
n_tiles=30,
scorer=variance_scorer
)
Tile Preview with locate_tiles()
Preview tile locations before extraction to validate tiler configuration:
# Preview random tile locations
random_tiler.locate_tiles(
slide=slide,
extraction_mask=TissueMask(),
n_tiles=20 # Number of tiles to preview (for RandomTiler)
)
This displays the slide thumbnail with colored rectangles indicating tile positions.
Extraction Workflow
Basic Extraction
from histolab.slide import Slide
from histolab.tiler import RandomTiler
# Load slide
slide = Slide("slide.svs", processed_path="output/tiles/")
# Configure tiler
tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=100,
level=0,
seed=42
)
# Extract tiles (saved to processed_path)
tiler.extract(slide)
Extraction with Logging
import logging
# Enable logging
logging.basicConfig(level=logging.INFO)
# Extract tiles with progress information
tiler.extract(slide)
# Output: INFO: Tile 1/100 saved...
# Output: INFO: Tile 2/100 saved...
Extraction with Report
# Generate CSV report with tile information
score_tiler = ScoreTiler(
tile_size=(512, 512),
n_tiles=50,
scorer=NucleiScorer()
)
# Extract and save report
score_tiler.extract(slide, report_path="tiles_report.csv")
# Report contains: tile name, coordinates, score, tissue percentage
Report format:
tile_name,x_coord,y_coord,level,score,tissue_percent
tile_001.png,10240,5120,0,0.89,95.2
tile_002.png,15360,7680,0,0.85,91.7
...
Advanced Extraction Patterns
Multi-Level Extraction
Extract tiles at different magnification levels:
# High resolution tiles (level 0)
high_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=0)
high_res_tiler.extract(slide)
# Medium resolution tiles (level 1)
med_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=1)
med_res_tiler.extract(slide)
# Low resolution tiles (level 2)
low_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=2)
low_res_tiler.extract(slide)
Hierarchical Extraction
Extract at multiple scales from same locations:
# Extract random locations at level 0
random_tiler_l0 = RandomTiler(
tile_size=(512, 512),
n_tiles=30,
level=0,
seed=42,
prefix="level0_"
)
random_tiler_l0.extract(slide)
# Extract same locations at level 1 (use same seed)
random_tiler_l1 = RandomTiler(
tile_size=(512, 512),
n_tiles=30,
level=1,
seed=42,
prefix="level1_"
)
random_tiler_l1.extract(slide)
Custom Tile Filtering
Apply additional filtering after extraction:
from PIL import Image
import numpy as np
from pathlib import Path
def filter_blurry_tiles(tile_dir, threshold=100):
"""Remove blurry tiles using Laplacian variance."""
for tile_path in Path(tile_dir).glob("*.png"):
img = Image.open(tile_path)
gray = np.array(img.convert('L'))
laplacian_var = cv2.Laplacian(gray, cv2.CV_64F).var()
if laplacian_var < threshold:
tile_path.unlink() # Remove blurry tile
print(f"Removed blurry tile: {tile_path.name}")
# Use after extraction
tiler.extract(slide)
filter_blurry_tiles("output/tiles/")
Best Practices
- Preview before extraction: Always use
locate_tiles()to verify tile placement - Use appropriate level: Match extraction level to analysis resolution requirements
- Set tissue_percent threshold: Adjust based on staining and tissue type (70-90% typical)
- Choose right tiler:
- RandomTiler for sampling and exploration
- GridTiler for comprehensive coverage
- ScoreTiler for targeted, quality-driven extraction
- Enable logging: Monitor extraction progress for large datasets
- Use seeds for reproducibility: Set random seeds in RandomTiler
- Consider storage: GridTiler can generate thousands of tiles per slide
- Validate tile quality: Check extracted tiles for artifacts, blur, or focus issues
Performance Optimization
- Extract at appropriate level: Lower levels (1, 2) extract faster
- Adjust tissue_percent: Higher thresholds reduce invalid tile attempts
- Use BiggestTissueBoxMask: Faster than TissueMask for single tissue sections
- Limit n_tiles: For RandomTiler and ScoreTiler
- Use pixel_overlap=0: For non-overlapping GridTiler extraction
Troubleshooting
Issue: No tiles extracted
Solutions:
- Lower
tissue_percentthreshold - Verify slide contains tissue (check thumbnail)
- Ensure extraction_mask captures tissue regions
- Check that tile_size is appropriate for slide resolution
Issue: Many background tiles extracted
Solutions:
- Enable
check_tissue=True - Increase
tissue_percentthreshold - Use appropriate mask (TissueMask vs. BiggestTissueBoxMask)
Issue: Extraction is very slow
Solutions:
- Extract at lower pyramid level (level=1 or 2)
- Reduce
n_tilesfor RandomTiler/ScoreTiler - Use RandomTiler instead of GridTiler for sampling
- Use BiggestTissueBoxMask instead of TissueMask
Issue: Tiles have too much overlap (GridTiler)
Solutions:
- Set
pixel_overlap=0for non-overlapping tiles - Reduce
pixel_overlapvalue