Feb 2025 · 6 min read

Processing Gigapixel Images Efficiently

Python · Performance · Image Processing

A gigapixel image cannot fit in RAM: a 100,000 × 80,000 RGB slide is roughly 24 GB uncompressed, so a naive load exhausts memory before it finishes. Working with whole-slide imaging (WSI) data demands a fundamentally different approach to image I/O, processing, and ML inference.

Tiled Access

Modern WSI formats store images as pyramid tilesets — the same image at multiple resolution levels, each split into 256×256 or 512×512 tiles. You never load the full image; you request specific tiles at specific zoom levels.

import openslide

slide = openslide.OpenSlide("sample.svs")

# Image dimensions at full resolution
print(slide.dimensions)  # (100000, 80000)

# Read a 512x512 tile at level 0 (full res).
# read_region returns an RGBA PIL Image.
x, y = 10_000, 20_000  # example top-left corner, in level-0 coordinates
tile = slide.read_region(
    location=(x, y),
    level=0,            # 0 is the highest-resolution pyramid level
    size=(512, 512)
)

# Low-resolution thumbnail for an overview
thumbnail = slide.get_thumbnail((1024, 1024))
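
Which pyramid level to read depends on the downsample factor you need. OpenSlide exposes this through `level_downsamples` and `get_best_level_for_downsample`; the selection logic itself is just "the highest-resolution level whose downsample does not exceed the target," which can be sketched in plain Python (the downsample factors below are illustrative, not from a real slide):

```python
def best_level_for_downsample(level_downsamples, target):
    """Pick the highest-resolution pyramid level whose downsample
    factor does not exceed the requested one."""
    best = 0
    for level, factor in enumerate(level_downsamples):
        if factor <= target:
            best = level
    return best

# A typical 4-level pyramid: each level is 4x smaller than the last
downsamples = [1.0, 4.0, 16.0, 64.0]

best_level_for_downsample(downsamples, 10.0)  # level 1 (4x)
best_level_for_downsample(downsamples, 64.0)  # level 3 (64x)
```

Reading at the chosen level instead of level 0 and resizing is dramatically cheaper, since the decoder touches far fewer tiles.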

Parallel Tile Processing

A typical 40x slide has ~100,000 tiles. Processing them sequentially is too slow. We distribute tiles across a worker pool using Python multiprocessing, with each worker opening its own OpenSlide handle: handles cannot be pickled and shared across processes, and OpenSlide is not safe for concurrent use.

from concurrent.futures import ProcessPoolExecutor
from functools import partial

import openslide

def process_tile(coords, slide_path, model_path):
    # Each call opens its own handle; OpenSlide objects
    # cannot cross process boundaries.
    slide = openslide.OpenSlide(slide_path)
    x, y = coords
    tile = slide.read_region((x, y), 0, (512, 512))
    result = run_inference(tile, model_path)  # inference step elided
    slide.close()
    return result

tile_coords = generate_tile_coords(slide.dimensions)

with ProcessPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(
        partial(process_tile, slide_path=slide_path, model_path=model_path),
        tile_coords
    ))
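
`generate_tile_coords` above is left undefined; a minimal sketch, assuming non-overlapping tiles and skipping the partial tiles at the right and bottom edges, is a nested range walk written as a generator so coordinates are produced lazily:

```python
def generate_tile_coords(dimensions, tile_size=512):
    """Yield (x, y) top-left corners of non-overlapping tiles.

    Partial edge tiles are skipped, so every coordinate can be
    read with a full tile_size x tile_size read_region call.
    """
    width, height = dimensions
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            yield (x, y)

coords = list(generate_tile_coords((1024, 1536), tile_size=512))
# 2 columns x 3 rows = 6 tiles, starting at (0, 0)
```

A real pipeline would usually also filter out background-only coordinates (e.g. by thresholding the thumbnail) before fanning them out to workers.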

Memory Management

Even with tiling, numpy arrays of decoded tiles accumulate quickly. Key strategies: process tiles in batches and drop references as soon as each batch's results are written out, use generators instead of lists for tile sequences, and profile long-running pipeline processes with memory_profiler to catch leaks.
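
The batching idea can be sketched with `itertools.islice`: pull a fixed-size chunk from a tile generator, process it, and let the previous batch's arrays go out of scope before the next one is decoded. (The `range(10)` stand-in below replaces a real tile generator; only the chunking logic is the point.)

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable,
    without materializing the whole sequence in memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Stand-in: 10 "tiles" processed 4 at a time
batch_sizes = [len(b) for b in batched(range(10), 4)]
# [4, 4, 2] -- each batch is released as soon as the next is requested
```

Python 3.12 ships an equivalent `itertools.batched`, but the hand-rolled version works on any supported interpreter.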