2025-11-30 08:30:10 +08:00

Microscopy and Imaging File Formats Reference

This reference covers file formats used in microscopy, medical imaging, remote sensing, and scientific image analysis.

Microscopy-Specific Formats

.tif / .tiff - Tagged Image File Format

Description: Flexible image format supporting multiple pages and metadata
Typical Data: Microscopy images, z-stacks, time series, multi-channel
Use Cases: Fluorescence microscopy, confocal imaging, biological imaging
Python Libraries:

  • tifffile: tifffile.imread('file.tif') - Microscopy TIFF support
  • PIL/Pillow: Image.open('file.tif') - Basic TIFF
  • scikit-image: io.imread('file.tif')
  • AICSImageIO: Multi-format microscopy reader

EDA Approach:

  • Image dimensions and bit depth
  • Multi-page/z-stack analysis
  • Metadata extraction (OME-TIFF)
  • Channel analysis and intensity distributions
  • Temporal dynamics (time-lapse)
  • Pixel size and spatial calibration
  • Histogram analysis per channel
  • Dynamic range utilization
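
The first two checks above (dimensions, bit depth, page count) can be made without any imaging library by walking the TIFF image file directories. The following is a minimal sketch, assuming a classic (non-BigTIFF) file whose width, height, and bit depth are stored inline as single SHORT or LONG values; `summarize_tiff` is a hypothetical helper name, and for real work tifffile remains the better tool.

```python
import struct

# Baseline tag IDs from the TIFF 6.0 specification.
TAGS = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample"}


def summarize_tiff(data: bytes) -> dict:
    """Count pages and read width/height/bit depth of the first page.

    Sketch only: classic TIFF, inline single-value SHORT/LONG tags.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")

    info = {"byte_order": "little" if order == "<" else "big", "pages": 0}
    seen = set()  # guards against cyclic IFD chains in damaged files
    while offset and offset not in seen:
        seen.add(offset)
        (n_entries,) = struct.unpack_from(order + "H", data, offset)
        if info["pages"] == 0:
            for i in range(n_entries):
                entry = offset + 2 + 12 * i
                tag, ftype, count = struct.unpack_from(order + "HHI", data, entry)
                # Field type 3 = SHORT, 4 = LONG; multi-value tags are skipped.
                if tag in TAGS and count == 1 and ftype in (3, 4):
                    fmt = "H" if ftype == 3 else "I"
                    (value,) = struct.unpack_from(order + fmt, data, entry + 8)
                    info[TAGS[tag]] = value
        info["pages"] += 1
        # The offset of the next IFD follows the last 12-byte entry.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * n_entries)
    return info
```

Multi-channel RGB pages store BitsPerSample as several values at an offset, so that key may be absent from the result for such files.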

.nd2 - Nikon NIS-Elements

Description: Proprietary Nikon microscope format
Typical Data: Multi-dimensional microscopy (XYZCT)
Use Cases: Nikon microscope data, confocal, widefield
Python Libraries:

  • nd2reader: ND2Reader('file.nd2')
  • pims: pims.ND2_Reader('file.nd2')
  • AICSImageIO: Universal reader

EDA Approach:

  • Experiment metadata extraction
  • Channel configurations
  • Time-lapse frame analysis
  • Z-stack depth and spacing
  • XY stage positions
  • Laser settings and power
  • Pixel binning information
  • Acquisition timestamps

.lif - Leica Image Format

Description: Leica microscope proprietary format
Typical Data: Multi-experiment, multi-dimensional images
Use Cases: Leica confocal and widefield data
Python Libraries:

  • readlif: readlif.LifFile('file.lif')
  • AICSImageIO: LIF support
  • python-bioformats: Via Bio-Formats

EDA Approach:

  • Multiple experiment detection
  • Image series enumeration
  • Metadata per experiment
  • Channel and timepoint structure
  • Physical dimensions extraction
  • Objective and detector information
  • Scan settings analysis

.czi - Carl Zeiss Image

Description: Zeiss microscope format
Typical Data: Multi-dimensional microscopy with rich metadata
Use Cases: Zeiss confocal, lightsheet, widefield
Python Libraries:

  • czifile: czifile.CziFile('file.czi')
  • AICSImageIO: CZI support
  • pylibCZIrw: Official Zeiss library

EDA Approach:

  • Scene and position analysis
  • Mosaic tile structure
  • Channel wavelength information
  • Acquisition mode detection
  • Scaling and calibration
  • Instrument configuration
  • ROI definitions

.oib / .oif - Olympus Image Format

Description: Olympus microscope formats
Typical Data: Confocal and multiphoton imaging
Use Cases: Olympus FluoView data
Python Libraries:

  • AICSImageIO: OIB/OIF support
  • python-bioformats: Via Bio-Formats

EDA Approach:

  • Directory structure validation (OIF)
  • Metadata file parsing
  • Channel configuration
  • Scan parameters
  • Objective and filter information
  • PMT settings

.vsi - Olympus VSI

Description: Olympus slide scanner format
Typical Data: Whole slide imaging, large mosaics
Use Cases: Virtual microscopy, pathology
Python Libraries:

  • python-bioformats: Via Bio-Formats (OpenSlide does not list VSI among its supported formats)
  • AICSImageIO: VSI support

EDA Approach:

  • Pyramid level analysis
  • Tile structure and overlap
  • Macro and label images
  • Magnification levels
  • Whole slide statistics
  • Region detection

.ims - Imaris Format

Description: Bitplane Imaris HDF5-based format
Typical Data: Large 3D/4D microscopy datasets
Use Cases: 3D rendering, time-lapse analysis
Python Libraries:

  • h5py: Direct HDF5 access
  • imaris_ims_file_reader: Specialized reader

EDA Approach:

  • Resolution level analysis
  • Time point structure
  • Channel organization
  • Dataset hierarchy
  • Thumbnail generation
  • Memory-mapped access strategies
  • Chunking optimization

.lsm - Zeiss LSM

Description: Legacy Zeiss confocal format
Typical Data: Confocal laser scanning microscopy
Use Cases: Older Zeiss confocal data
Python Libraries:

  • tifffile: LSM support (TIFF-based)
  • python-bioformats: LSM reading

EDA Approach:

  • Similar to TIFF with LSM-specific metadata
  • Scan speed and resolution
  • Laser lines and power
  • Detector gain and offset
  • LUT information

.stk - MetaMorph Stack

Description: MetaMorph image stack format
Typical Data: Time-lapse or z-stack sequences
Use Cases: MetaMorph software output
Python Libraries:

  • tifffile: STK is TIFF-based
  • python-bioformats: STK support

EDA Approach:

  • Stack dimensionality
  • Plane metadata
  • Timing information
  • Stage positions
  • UIC tags parsing

.dv - DeltaVision

Description: Applied Precision DeltaVision format
Typical Data: Deconvolution microscopy
Use Cases: DeltaVision microscope data
Python Libraries:

  • mrc: Can read DV (MRC-related)
  • AICSImageIO: DV support

EDA Approach:

  • Wave information (channels)
  • Extended header analysis
  • Lens and magnification
  • Deconvolution status
  • Time stamps per section

.mrc - Medical Research Council

Description: Electron microscopy format
Typical Data: EM images, cryo-EM, tomography
Use Cases: Structural biology, electron microscopy
Python Libraries:

  • mrcfile: mrcfile.open('file.mrc')
  • EMAN2: EM-specific tools

EDA Approach:

  • Volume dimensions
  • Voxel size and units
  • Origin and map statistics
  • Symmetry information
  • Extended header analysis
  • Density statistics
  • Header consistency validation
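
Several of the checks above (volume dimensions, voxel size, density statistics, header consistency) come straight from the fixed 1024-byte main header. The sketch below reads those fields with `struct`, assuming the MRC2014 layout and little-endian byte order; `read_mrc_header` and the wording of the reported issues are illustrative choices, not part of mrcfile.

```python
import struct

# Data modes defined by the MRC2014 specification.
KNOWN_MODES = {0, 1, 2, 3, 4, 6, 12, 101}


def read_mrc_header(header: bytes) -> dict:
    """Parse key fields of an MRC2014 main header (little-endian assumed)."""
    if len(header) < 1024:
        raise ValueError("the MRC main header is 1024 bytes long")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    mx, my, mz = struct.unpack_from("<3i", header, 28)      # sampling grid
    cell = struct.unpack_from("<3f", header, 40)            # cell size in angstroms
    dmin, dmax, dmean = struct.unpack_from("<3f", header, 76)
    space_group, ext_header_bytes = struct.unpack_from("<2i", header, 88)
    map_id = bytes(header[208:212])

    # Voxel size is the cell edge divided by the number of grid intervals.
    voxel = tuple(c / m if m else 0.0 for c, m in zip(cell, (mx, my, mz)))

    issues = []
    if map_id != b"MAP ":
        issues.append("missing 'MAP ' identifier")
    if mode not in KNOWN_MODES:
        issues.append(f"unknown data mode {mode}")
    if min(nx, ny, nz) <= 0:
        issues.append("non-positive dimension")
    if dmin > dmax:
        issues.append("dmin > dmax (statistics may not have been computed)")

    return {
        "shape": (nx, ny, nz),
        "mode": mode,
        "voxel_size": voxel,
        "stats": {"min": dmin, "max": dmax, "mean": dmean},
        "space_group": space_group,
        "extended_header_bytes": ext_header_bytes,
        "issues": issues,
    }
```

Reading only the first 1024 bytes of the file is enough to call this, which keeps the check cheap even for very large tomograms.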

.dm3 / .dm4 - Gatan Digital Micrograph

Description: Gatan TEM/STEM format
Typical Data: Transmission electron microscopy
Use Cases: TEM imaging and analysis
Python Libraries:

  • hyperspy: hs.load('file.dm3')
  • ncempy: ncempy.io.dm.dmReader('file.dm3')

EDA Approach:

  • Microscope parameters
  • Energy dispersive spectroscopy data
  • Diffraction patterns
  • Calibration information
  • Tag structure analysis
  • Image series handling

.eer - Electron Event Representation

Description: Direct electron detector format
Typical Data: Electron counting data from detectors
Use Cases: Cryo-EM data collection
Python Libraries:

  • tifffile: EER support (EER uses a TIFF-based container)
  • Vendor-specific tools (Thermo Fisher)

EDA Approach:

  • Event counting statistics
  • Frame rate and dose
  • Detector configuration
  • Motion correction assessment
  • Gain reference validation

.ser - TIA Series

Description: FEI/TFS TIA format
Typical Data: EM image series
Use Cases: FEI/Thermo Fisher EM data
Python Libraries:

  • hyperspy: SER support
  • ncempy: TIA reader

EDA Approach:

  • Series structure
  • Calibration data
  • Acquisition metadata
  • Time stamps
  • Multi-dimensional data organization

Medical and Biological Imaging

.dcm - DICOM

Description: Digital Imaging and Communications in Medicine
Typical Data: Medical images with patient/study metadata
Use Cases: Clinical imaging, radiology, CT, MRI, PET
Python Libraries:

  • pydicom: pydicom.dcmread('file.dcm')
  • SimpleITK: sitk.ReadImage('file.dcm')
  • nibabel: Limited DICOM support

EDA Approach:

  • Patient metadata extraction (anonymization check)
  • Modality-specific analysis
  • Series and study organization
  • Slice thickness and spacing
  • Window/level settings
  • Hounsfield units (CT)
  • Image orientation and position
  • Multi-frame analysis
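
Before running any of the analyses above, it can help to confirm that a file really is a DICOM Part 10 file and to see which transfer syntax it declares. The sketch below does this with the standard library only, relying on the fact that the file meta group (0002) is always encoded as Explicit VR Little Endian; `read_file_meta` is a hypothetical helper, and pydicom should be preferred for anything beyond this quick check.

```python
import struct

# Value representations that use a 4-byte length after 2 reserved bytes.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_file_meta(data: bytes) -> dict:
    """Return the group 0002 elements of a DICOM Part 10 byte string."""
    if len(data) < 132 or data[128:132] != b"DICM":
        raise ValueError("missing DICM marker: not a DICOM Part 10 file")
    pos, meta = 132, {}
    while pos + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        if group != 0x0002:
            break  # the data set proper starts here
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            if pos + 12 > len(data):
                break  # truncated element
            (length,) = struct.unpack_from("<I", data, pos + 8)
            start = pos + 12
        else:
            (length,) = struct.unpack_from("<H", data, pos + 6)
            start = pos + 8
        value = data[start:start + length]
        if vr == b"UI":
            # UIDs are ASCII, padded to even length with a trailing NUL.
            value = value.rstrip(b"\x00").decode("ascii")
        meta[(group, element)] = value
        pos = start + length
    return meta
```

The transfer syntax UID is stored under tag (0002,0010); for example, `1.2.840.10008.1.2.1` denotes Explicit VR Little Endian.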

.nii / .nii.gz - NIfTI

Description: Neuroimaging Informatics Technology Initiative
Typical Data: Brain imaging, fMRI, structural MRI
Use Cases: Neuroimaging research, brain analysis
Python Libraries:

  • nibabel: nibabel.load('file.nii')
  • nilearn: Neuroimaging with ML
  • SimpleITK: NIfTI support

EDA Approach:

  • Volume dimensions and voxel size
  • Affine transformation matrix
  • Time series analysis (fMRI)
  • Intensity distribution
  • Brain extraction quality
  • Registration assessment
  • Orientation validation
  • Header information consistency
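
The dimension, voxel-size, and header-consistency checks above only need the 348-byte NIfTI-1 header. The following sketch unpacks the relevant fields with `struct` and detects byte order from `sizeof_hdr`; the function name and the returned keys are illustrative, and nibabel exposes the same information through its header object.

```python
import struct


def read_nifti1_header(header: bytes) -> dict:
    """Parse shape, voxel size, and data type from a NIfTI-1 header."""
    if len(header) < 348:
        raise ValueError("a NIfTI-1 header is 348 bytes long")
    # sizeof_hdr must equal 348; trying both byte orders reveals endianness.
    for order in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(order + "i", header, 0)
        if sizeof_hdr == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348: not a NIfTI-1 header")

    dim = struct.unpack_from(order + "8h", header, 40)
    datatype, bitpix = struct.unpack_from(order + "2h", header, 70)
    pixdim = struct.unpack_from(order + "8f", header, 76)
    (vox_offset,) = struct.unpack_from(order + "f", header, 108)
    magic = bytes(header[344:348])

    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError(f"invalid number of dimensions: {ndim}")
    return {
        "byte_order": "little" if order == "<" else "big",
        "shape": tuple(dim[1:1 + ndim]),
        "voxel_size": tuple(pixdim[1:1 + min(ndim, 3)]),
        # pixdim[4] holds the time step (TR) for 4D series.
        "time_step": pixdim[4] if ndim >= 4 else None,
        "datatype_code": datatype,
        "bits_per_voxel": bitpix,
        "data_offset": int(vox_offset),
        "single_file": magic == b"n+1\x00",
    }
```

For a compressed file, the header bytes can be obtained with `gzip.open(path, "rb").read(348)` from the standard library.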

.mnc - MINC Format

Description: Medical Imaging NetCDF (MINC1 is NetCDF-based; MINC2 is HDF5-based)
Typical Data: Medical imaging (predates NIfTI)
Use Cases: Legacy neuroimaging data
Python Libraries:

  • pyminc: MINC-specific tools
  • nibabel: MINC support

EDA Approach:

  • Similar to NIfTI
  • NetCDF structure exploration
  • Dimension ordering
  • Metadata extraction

.nrrd - Nearly Raw Raster Data

Description: Medical imaging format with a text header, either attached (.nrrd) or detached (.nhdr)
Typical Data: Medical images, research imaging
Use Cases: 3D Slicer, ITK-based applications
Python Libraries:

  • pynrrd: nrrd.read('file.nrrd')
  • SimpleITK: NRRD support

EDA Approach:

  • Header field analysis
  • Encoding format
  • Dimension and spacing
  • Orientation matrix
  • Compression assessment
  • Endianness handling
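
Because the NRRD header is plain text terminated by a blank line, the header fields listed above can be inspected without decoding any pixel data. This is a minimal sketch under that assumption; `read_nrrd_header` is a hypothetical name, and `key:=value` metadata pairs are deliberately skipped to keep it short.

```python
def read_nrrd_header(data: bytes) -> dict:
    """Parse the 'field: value' lines of an NRRD header.

    `data` only needs to contain the start of the file; parsing stops at
    the blank line that separates the header from the pixel data.
    """
    lines = data.split(b"\n")
    magic = lines[0].decode("ascii", "replace").rstrip("\r")
    if not magic.startswith("NRRD"):
        raise ValueError("missing NRRD magic line")

    fields = {"magic": magic}
    for raw in lines[1:]:
        line = raw.decode("ascii", "replace").rstrip("\r")
        if not line:
            break  # end of header
        if line.startswith("#") or ": " not in line:
            continue  # comment or key:=value pair
        key, value = line.split(": ", 1)
        fields[key.strip()] = value.strip()

    if "sizes" in fields:
        fields["sizes"] = tuple(int(v) for v in fields["sizes"].split())
    # A 'data file' field means the pixel data lives in a separate file.
    fields["detached"] = "data file" in fields or "datafile" in fields
    return fields
```

Passing the first few kilobytes of a file is usually sufficient, since headers are short compared with the image payload.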

.mha / .mhd - MetaImage

Description: MetaImage format (ITK)
Typical Data: Medical/scientific 3D images
Use Cases: ITK/SimpleITK applications
Python Libraries:

  • SimpleITK: Native MHA/MHD support
  • itk: Direct ITK integration

EDA Approach:

  • Header-data file pairing (MHD)
  • Transform matrix
  • Element spacing
  • Compression format
  • Data type and dimensions
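
The header-data pairing check for .mhd files can be sketched with the standard library, since the header is a list of `Key = Value` lines. The example below assumes the common single-file case where `ElementDataFile` names one raw file next to the header (or `LOCAL` for embedded data); `summarize_mhd` and the small element-size table are illustrative, not part of SimpleITK.

```python
from math import prod
from pathlib import Path

# Byte sizes for a few common MetaImage element types (not exhaustive).
ELEMENT_BYTES = {
    "MET_UCHAR": 1, "MET_CHAR": 1, "MET_SHORT": 2, "MET_USHORT": 2,
    "MET_INT": 4, "MET_UINT": 4, "MET_FLOAT": 4, "MET_DOUBLE": 8,
}


def summarize_mhd(header_path) -> dict:
    """Parse a MetaImage header and check that its data file exists."""
    path = Path(header_path)
    fields = {}
    for line in path.read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()

    dims = tuple(int(v) for v in fields.get("DimSize", "").split())
    spacing = tuple(float(v) for v in fields.get("ElementSpacing", "").split())
    element_type = fields.get("ElementType")
    data_file = fields.get("ElementDataFile", "")
    embedded = data_file == "LOCAL"  # pixel data follows the header (.mha)

    size = ELEMENT_BYTES.get(element_type)
    return {
        "dims": dims,
        "spacing": spacing,
        "element_type": element_type,
        "compressed": fields.get("CompressedData", "False").lower() == "true",
        "data_file": data_file,
        "data_present": embedded or (path.parent / data_file).is_file(),
        # Expected size of the uncompressed pixel data, if the type is known.
        "expected_bytes": prod(dims) * size if dims and size else None,
    }
```

For uncompressed data, comparing `expected_bytes` with the actual size of the raw file is a quick way to catch truncated or mismatched pairs.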

.hdr / .img - Analyze Format

Description: Legacy medical imaging format
Typical Data: Brain imaging (pre-NIfTI)
Use Cases: Old neuroimaging datasets
Python Libraries:

  • nibabel: Analyze support
  • Conversion to NIfTI recommended

EDA Approach:

  • Header-image pairing validation
  • Byte order issues
  • Conversion to modern formats
  • Metadata limitations

Scientific Image Formats

.png - Portable Network Graphics

Description: Lossless compressed image format
Typical Data: 2D images, screenshots, processed data
Use Cases: Publication figures, lossless storage
Python Libraries:

  • PIL/Pillow: Image.open('file.png')
  • scikit-image: io.imread('file.png')
  • imageio: imageio.imread('file.png')

EDA Approach:

  • Bit depth analysis (8-bit, 16-bit)
  • Color mode (grayscale, RGB, palette)
  • Metadata (PNG chunks)
  • Transparency handling
  • Compression efficiency
  • Histogram analysis

.jpg / .jpeg - Joint Photographic Experts Group

Description: Lossy compressed image format
Typical Data: Natural images, photos
Use Cases: Visualization, web graphics (not raw data)
Python Libraries:

  • PIL/Pillow: Standard JPEG support
  • scikit-image: JPEG reading

EDA Approach:

  • Compression artifacts detection
  • Quality factor estimation
  • Color space (RGB, grayscale)
  • EXIF metadata
  • Quantization table analysis
  • Note: Not suitable for quantitative analysis

.bmp - Bitmap Image

Description: Uncompressed raster image
Typical Data: Simple images, screenshots
Use Cases: Compatibility, simple storage
Python Libraries:

  • PIL/Pillow: BMP support
  • scikit-image: BMP reading

EDA Approach:

  • Color depth
  • Palette analysis (if indexed)
  • File size efficiency
  • Pixel format validation

.gif - Graphics Interchange Format

Description: Image format with animation support
Typical Data: Animated images, simple graphics
Use Cases: Animations, time-lapse visualization
Python Libraries:

  • PIL/Pillow: GIF support
  • imageio: Better GIF animation support

EDA Approach:

  • Frame count and timing
  • Palette limitations (256 colors)
  • Loop count
  • Disposal method
  • Transparency handling

.svg - Scalable Vector Graphics

Description: XML-based vector graphics
Typical Data: Vector drawings, plots, diagrams
Use Cases: Publication-quality figures, plots
Python Libraries:

  • svgpathtools: Path manipulation
  • cairosvg: Rasterization
  • lxml: XML parsing

EDA Approach:

  • Element structure analysis
  • Style information
  • Viewbox and dimensions
  • Path complexity
  • Text element extraction
  • Layer organization
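
Since SVG is XML, a first pass over element structure, viewBox, text, and path complexity can be done with the standard library parser. The sketch below is one way to do it; `summarize_svg` is a hypothetical helper, and counting path command letters is only a rough proxy for path complexity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PATH_COMMANDS = set("MmLlHhVvCcSsQqTtAaZz")


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_svg(svg_text: str) -> dict:
    """Summarize element counts, viewBox, text content, and path commands."""
    root = ET.fromstring(svg_text)
    elements = list(root.iter())
    counts = Counter(_local(el.tag) for el in elements)
    view_box = root.get("viewBox")
    return {
        "width": root.get("width"),
        "height": root.get("height"),
        "view_box": (tuple(float(v) for v in view_box.replace(",", " ").split())
                     if view_box else None),
        "element_counts": dict(counts),
        "text": ["".join(el.itertext()).strip()
                 for el in elements if _local(el.tag) == "text"],
        # Number of drawing commands summed over all <path d="..."> attributes.
        "path_commands": sum(
            sum(ch in PATH_COMMANDS for ch in el.get("d", ""))
            for el in elements if _local(el.tag) == "path"
        ),
    }
```

Figures exported by plotting libraries often contain thousands of path commands, so this count is a cheap way to flag files that will be slow to render or edit.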

.eps - Encapsulated PostScript

Description: Vector graphics format
Typical Data: Publication figures
Use Cases: Legacy publication graphics
Python Libraries:

  • PIL/Pillow: Basic EPS rasterization
  • Ghostscript via subprocess

EDA Approach:

  • Bounding box information
  • Preview image validation
  • Font embedding
  • Conversion to modern formats
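
The bounding box and the presence of a binary preview can be read directly from the file bytes. This is a small sketch, assuming an integer `%%BoundingBox` comment as described in the EPS conventions; `eps_bounding_box` is a hypothetical name, and `%%HiResBoundingBox` is not handled.

```python
import re

# Matches '%%BoundingBox: llx lly urx ury' at the start of a line.
# '(atend)' placeholders do not match, so the trailer value is found instead.
_BBOX = re.compile(
    rb"(?:^|[\r\n])%%BoundingBox:[ \t]*(-?\d+)[ \t]+(-?\d+)[ \t]+(-?\d+)[ \t]+(-?\d+)"
)


def eps_bounding_box(data: bytes):
    """Return bounding box details in PostScript points, or None if absent."""
    match = _BBOX.search(data)
    if match is None:
        return None
    llx, lly, urx, ury = (int(g) for g in match.groups())
    return {
        "bbox": (llx, lly, urx, ury),
        "width_pt": urx - llx,
        "height_pt": ury - lly,
        # DOS EPS files with an embedded preview start with these four bytes.
        "binary_preview": data[:4] == b"\xc5\xd0\xd3\xc6",
    }
```

One PostScript point is 1/72 inch, so dividing the width and height by 72 gives the figure size in inches.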

.pdf (Images)

Description: Portable Document Format with images
Typical Data: Publication figures, multi-page documents
Use Cases: Publication, data presentation
Python Libraries:

  • PyMuPDF/fitz: fitz.open('file.pdf')
  • pdf2image: Rasterization
  • pdfplumber: Text and layout extraction

EDA Approach:

  • Page count
  • Image extraction
  • Resolution and DPI
  • Embedded fonts and metadata
  • Compression methods
  • Image vs vector content

.fig - MATLAB Figure

Description: MATLAB figure file
Typical Data: MATLAB plots and figures
Use Cases: MATLAB data visualization
Python Libraries:

  • Custom parsers (MAT file structure)
  • Conversion to other formats

EDA Approach:

  • Figure structure
  • Data extraction from plots
  • Axes and label information
  • Plot type identification

.hdf5 (Imaging Specific)

Description: HDF5 for large imaging datasets
Typical Data: High-content screening, large microscopy
Use Cases: BigDataViewer, large-scale imaging
Python Libraries:

  • h5py: Universal HDF5 access
  • Imaging-specific readers (BigDataViewer)

EDA Approach:

  • Dataset hierarchy
  • Chunk and compression strategy
  • Multi-resolution pyramid
  • Metadata organization
  • Memory-mapped access
  • Parallel I/O performance

.zarr - Chunked Array Storage

Description: Cloud-optimized array storage
Typical Data: Large imaging datasets, OME-ZARR
Use Cases: Cloud microscopy, large-scale analysis
Python Libraries:

  • zarr: zarr.open('file.zarr')
  • ome-zarr-py: OME-ZARR support

EDA Approach:

  • Chunk size optimization
  • Compression codec analysis
  • Multi-scale representation
  • Array dimensions and dtype
  • Metadata structure (OME)
  • Cloud access patterns
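
Chunk size, compression codec, shape, and dtype can be assessed from array metadata alone. The sketch below assumes the Zarr v2 layout, where each array directory holds a `.zarray` JSON file, and a simple dtype string such as `<u2`; `summarize_zarray` is a hypothetical helper, and Zarr v3 stores (`zarr.json`) would need different keys.

```python
import json
import math


def summarize_zarray(zarray_json: str) -> dict:
    """Summarize chunking and compression from Zarr v2 '.zarray' metadata."""
    meta = json.loads(zarray_json)
    shape, chunks = meta["shape"], meta["chunks"]
    # Simple dtypes look like '<u2' or '|u1'; the trailing number is the item size.
    itemsize = int(meta["dtype"].lstrip("<>|")[1:])
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    compressor = meta.get("compressor") or {}
    return {
        "shape": tuple(shape),
        "chunks": tuple(chunks),
        "dtype": meta["dtype"],
        "chunk_grid": tuple(grid),
        "n_chunks": math.prod(grid),
        "chunk_bytes": math.prod(chunks) * itemsize,
        "uncompressed_bytes": math.prod(shape) * itemsize,
        "compressor": compressor.get("id"),
    }
```

Very small values of `chunk_bytes` tend to mean many tiny objects, which is typically slow on cloud storage, while very large chunks make partial reads expensive.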

.raw - Raw Image Data

Description: Unformatted binary pixel data
Typical Data: Raw detector output
Use Cases: Custom imaging systems
Python Libraries:

  • numpy: np.fromfile() with dtype
  • imageio: Raw format plugins

EDA Approach:

  • Dimensions determination (external info needed)
  • Byte order and data type
  • Header presence detection
  • Pixel value range
  • Noise characteristics
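
When the dimensions of a raw file are unknown, the file size at least narrows the options. The sketch below lists every (height, width) pair consistent with a given file size and pixel size; `candidate_shapes`, the `header_bytes` parameter, and the aspect-ratio cutoff are assumptions made for illustration, and the true shape still has to be confirmed from external information.

```python
import math


def candidate_shapes(file_size: int, bytes_per_pixel: int,
                     header_bytes: int = 0, max_aspect: float = 4.0) -> list:
    """List (height, width) pairs that exactly account for the payload size."""
    n_pixels, remainder = divmod(file_size - header_bytes, bytes_per_pixel)
    if remainder or n_pixels <= 0:
        return []  # size is not a whole number of pixels

    shapes = []
    for short in range(1, math.isqrt(n_pixels) + 1):
        if n_pixels % short:
            continue
        long = n_pixels // short
        if long / short <= max_aspect:
            shapes.append((short, long))
            if long != short:
                shapes.append((long, short))
    return sorted(shapes)
```

For example, a 524288-byte file of 16-bit pixels yields `(256, 1024)`, `(512, 512)`, and `(1024, 256)`; loading each candidate with `np.fromfile(...).reshape(...)` and viewing it usually makes the right one obvious.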

.bin - Binary Image Data

Description: Generic binary image format
Typical Data: Raw or custom-formatted images
Use Cases: Instrument-specific outputs
Python Libraries:

  • numpy: Custom binary reading
  • struct: For structured binary data

EDA Approach:

  • Format specification required
  • Header parsing (if present)
  • Data type inference
  • Dimension extraction
  • Validation with known parameters

Image Analysis Formats

.roi - ImageJ ROI

Description: ImageJ region of interest format
Typical Data: Geometric ROIs, selections
Use Cases: ImageJ/Fiji analysis workflows
Python Libraries:

  • read-roi: read_roi.read_roi_file('file.roi')
  • roifile: ROI manipulation

EDA Approach:

  • ROI type analysis (rectangle, polygon, etc.)
  • Coordinate extraction
  • ROI properties (area, perimeter)
  • Group analysis (ROI sets)
  • Z-position and time information

.zip (ROI sets)

Description: ZIP archive of ImageJ ROIs
Typical Data: Multiple ROI files
Use Cases: Batch ROI analysis
Python Libraries:

  • read-roi: read_roi.read_roi_zip('file.zip')
  • Standard zipfile module

EDA Approach:

  • ROI count in set
  • ROI type distribution
  • Spatial distribution
  • Overlapping ROI detection
  • Naming conventions
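
ROI count, type distribution, and bounding-box overlap can be estimated from the fixed part of each ROI header using only `zipfile` and `struct`. The sketch below assumes the ImageJ binary ROI layout (`Iout` magic, type byte at offset 6, big-endian top/left/bottom/right shorts from offset 8); the helper names are hypothetical, and read-roi or roifile should be used when full coordinates are needed.

```python
import struct
import zipfile
from collections import Counter

# ROI type codes as used by ImageJ's RoiDecoder (assumed here).
ROI_TYPES = {0: "polygon", 1: "rectangle", 2: "oval", 3: "line",
             4: "freeline", 5: "polyline", 6: "no_roi", 7: "freehand",
             8: "traced", 9: "angle", 10: "point"}


def summarize_roi_zip(zip_file) -> dict:
    """Count ROI types and collect bounding boxes from an ImageJ ROI set."""
    counts, boxes = Counter(), {}
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".roi"):
                continue
            data = zf.read(name)
            if len(data) < 16 or data[:4] != b"Iout":
                counts["invalid"] += 1
                continue
            roi_type = data[6]
            top, left, bottom, right = struct.unpack_from(">4h", data, 8)
            counts[ROI_TYPES.get(roi_type, f"unknown({roi_type})")] += 1
            boxes[name] = (left, top, right, bottom)
    return {"count": sum(counts.values()), "types": dict(counts), "boxes": boxes}


def boxes_overlap(a, b) -> bool:
    """True if two (left, top, right, bottom) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
```

Bounding-box overlap is only a coarse filter: two polygons whose boxes intersect may still be disjoint, so flagged pairs deserve a proper geometric test.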

.ome.tif / .ome.tiff - OME-TIFF

Description: TIFF with OME-XML metadata
Typical Data: Standardized microscopy with rich metadata
Use Cases: Bio-Formats compatible storage
Python Libraries:

  • tifffile: OME-TIFF support
  • AICSImageIO: OME reading
  • python-bioformats: Bio-Formats integration

EDA Approach:

  • OME-XML validation
  • Physical dimensions extraction
  • Channel naming and wavelengths
  • Plane positions (Z, C, T)
  • Instrument metadata
  • Bio-Formats compatibility
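
Once the OME-XML block has been extracted as a string (for instance from the first page's ImageDescription tag), the dimension, physical-size, and channel checks above reduce to reading attributes of the `Pixels` element. A minimal sketch using the standard library parser; `summarize_ome_xml` is a hypothetical helper and it reads the namespace from the root element rather than assuming a schema version.

```python
import xml.etree.ElementTree as ET


def _opt_float(value):
    """Convert an optional attribute string to float."""
    return float(value) if value is not None else None


def summarize_ome_xml(ome_xml: str) -> list:
    """Return one summary dict per Image element in an OME-XML document."""
    root = ET.fromstring(ome_xml)
    # Reuse whatever namespace the document declares, e.g. the 2016-06 schema.
    ns = root.tag[:root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    images = []
    for image in root.iter(f"{ns}Image"):
        pixels = image.find(f"{ns}Pixels")
        if pixels is None:
            continue
        images.append({
            "name": image.get("Name"),
            "dimension_order": pixels.get("DimensionOrder"),
            "pixel_type": pixels.get("Type"),
            "size": {ax: int(pixels.get(f"Size{ax}", 1)) for ax in "XYZCT"},
            "physical_size": {ax: _opt_float(pixels.get(f"PhysicalSize{ax}"))
                              for ax in "XYZ"},
            "channels": [ch.get("Name") for ch in pixels.findall(f"{ns}Channel")],
        })
    return images
```

Missing `PhysicalSize` attributes come back as `None`, which is itself a useful finding: it means spatial calibration was not recorded in the file.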

.ome.zarr - OME-ZARR

Description: OME-NGFF specification on ZARR
Typical Data: Next-generation file format for bioimaging
Use Cases: Cloud-native imaging, large datasets
Python Libraries:

  • ome-zarr-py: Official implementation
  • zarr: Underlying array storage

EDA Approach:

  • Multiscale resolution levels
  • Metadata compliance with OME-NGFF spec
  • Coordinate transformations
  • Label and ROI handling
  • Cloud storage optimization
  • Chunk access patterns
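
The multiscale levels and their coordinate transformations are described in JSON attributes, so they can be listed without opening any array data. The sketch below assumes OME-NGFF 0.4 metadata, where the group's `.zattrs` file has a top-level `multiscales` key; `summarize_multiscales` is a hypothetical helper, and later spec versions nest the same information differently.

```python
import json


def summarize_multiscales(zattrs_json: str) -> list:
    """List resolution levels with their axes and scale factors."""
    attrs = json.loads(zattrs_json)
    levels = []
    for multiscale in attrs.get("multiscales", []):
        # Axes are dicts with a 'name' key in 0.4 (plain strings in 0.3).
        axes = [a["name"] if isinstance(a, dict) else a
                for a in multiscale.get("axes", [])]
        for dataset in multiscale["datasets"]:
            scale = next(
                (t["scale"] for t in dataset.get("coordinateTransformations", [])
                 if t.get("type") == "scale"),
                None,
            )
            levels.append({"path": dataset["path"], "axes": axes, "scale": scale})
    return levels
```

Dividing each level's scale by the scale of the first level gives the downsampling factor per axis, which is a quick way to confirm that the pyramid is regular.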

.klb - Keller Lab Block

Description: Fast microscopy format for large data
Typical Data: Lightsheet microscopy, time-lapse
Use Cases: High-throughput imaging
Python Libraries:

  • pyklb: KLB reading and writing

EDA Approach:

  • Compression efficiency
  • Block structure
  • Multi-resolution support
  • Read performance benchmarking
  • Metadata extraction

.vsi - Whole Slide Imaging

Description: Virtual slide format (multiple vendors)
Typical Data: Pathology slides, large mosaics
Use Cases: Digital pathology
Python Libraries:

  • openslide-python: Multi-format WSI
  • tiffslide: Pure Python alternative

EDA Approach:

  • Pyramid level count
  • Downsampling factors
  • Associated images (macro, label)
  • Tile size and overlap
  • MPP (microns per pixel)
  • Background detection
  • Tissue segmentation
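
Downsampling factors and per-level resolution follow from the level dimensions and the base MPP by simple arithmetic. The sketch below takes plain tuples so it runs without a slide file; `pyramid_summary` is a hypothetical helper, and with openslide-python the inputs would typically come from `slide.level_dimensions` and the `openslide.mpp-x` property.

```python
def pyramid_summary(level_dimensions, base_mpp=None) -> list:
    """Compute downsample factor and microns-per-pixel for each pyramid level.

    level_dimensions: sequence of (width, height), level 0 first.
    base_mpp: microns per pixel at level 0, if known.
    """
    base_w, base_h = level_dimensions[0]
    levels = []
    for index, (width, height) in enumerate(level_dimensions):
        # Average the two axes, since rounding can make them differ slightly.
        downsample = (base_w / width + base_h / height) / 2
        levels.append({
            "level": index,
            "size": (width, height),
            "downsample": downsample,
            "mpp": base_mpp * downsample if base_mpp is not None else None,
        })
    return levels
```

For tissue detection, a level with roughly 4 to 16 microns per pixel is often small enough to threshold in memory while still resolving tissue boundaries.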

.ndpi - Hamamatsu NanoZoomer

Description: Hamamatsu slide scanner format
Typical Data: Whole slide pathology images
Use Cases: Digital pathology workflows
Python Libraries:

  • openslide-python: NDPI support

EDA Approach:

  • Multi-resolution pyramid
  • Lens and objective information
  • Scan area and magnification
  • Focal plane information
  • Tissue detection

.svs - Aperio ScanScope

Description: Aperio whole slide format
Typical Data: Digital pathology slides
Use Cases: Pathology image analysis
Python Libraries:

  • openslide-python: SVS support

EDA Approach:

  • Pyramid structure
  • MPP calibration
  • Label and macro images
  • Compression quality
  • Thumbnail generation

.scn - Leica SCN

Description: Leica slide scanner format
Typical Data: Whole slide imaging
Use Cases: Digital pathology
Python Libraries:

  • openslide-python: SCN support

EDA Approach:

  • Tile structure analysis
  • Collection organization
  • Metadata extraction
  • Magnification levels