2025-11-30 08:30:10 +08:00

FlowIO API Reference

Overview

FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.

Installation

pip install flowio

Supports Python 3.9 and later.

Core Classes

FlowData

The primary class for working with FCS files.

Constructor

FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)

Parameters:

  • fcs_file: File path (str), Path object, or file handle
  • ignore_offset_error (bool): Tolerate an off-by-one error in the reported end of the DATA segment (default: False)
  • ignore_offset_discrepancy (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
  • use_header_offsets (bool): Use HEADER section offsets instead of TEXT section (default: False)
  • only_text (bool): Only parse the TEXT segment, skip DATA and ANALYSIS (default: False)
  • nextdata_offset (int): Byte offset for reading multi-dataset files
  • null_channel_list (list): List of PnN labels for null channels to exclude

Attributes

File Information:

  • name: Name of the FCS file
  • file_size: Size of the file in bytes
  • version: FCS version (e.g., '3.0', '3.1')
  • header: Dictionary containing HEADER segment information
  • data_type: Type of data format ('I', 'F', 'D', 'A')

Channel Information:

  • channel_count: Number of channels in the dataset
  • channels: Dictionary mapping channel numbers to channel info
  • pnn_labels: List of PnN (short channel name) labels
  • pns_labels: List of PnS (descriptive stain name) labels
  • pnr_values: List of PnR (range) values for each channel
  • fluoro_indices: List of indices for fluorescence channels
  • scatter_indices: List of indices for scatter channels
  • time_index: Index of the time channel (or None)
  • null_channels: List of null channel indices

Event Data:

  • event_count: Number of events (rows) in the dataset
  • events: Raw, unprocessed event data as a flat 1-D sequence (use as_array() for a 2-D array)

Metadata:

  • text: Dictionary of TEXT segment key-value pairs
  • analysis: Dictionary of ANALYSIS segment key-value pairs (if present)

Methods

as_array()

as_array(preprocess=True)

Return event data as a 2-D NumPy array.

Parameters:

  • preprocess (bool): Apply gain, logarithmic, and time scaling transformations (default: True)

Returns:

  • NumPy ndarray with shape (event_count, channel_count)

Example:

from flowio import FlowData

flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array()  # Preprocessed data
raw_array = flow_data.as_array(preprocess=False)  # Raw data

write_fcs()

write_fcs(filename, metadata=None)

Export the FlowData instance as a new FCS file.

Parameters:

  • filename (str): Output file path
  • metadata (dict): Optional dictionary of TEXT segment keywords to add/update

Example:

from flowio import FlowData

flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})

Note: Exports as FCS 3.1 with single-precision floating-point data.
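
Because the export uses single-precision floats, values that need more than about seven significant digits will not survive a round trip unchanged. The following is a minimal standard-library sketch of that rounding. It does not call FlowIO, and `to_float32` is a hypothetical helper name used only for illustration.

```python
import struct


def to_float32(value):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


original = 123456.789
stored = to_float32(original)

# Small integers are exactly representable in 32 bits; most other values are rounded.
print(to_float32(1024.0) == 1024.0)
print(stored == original)
print(abs(stored - original))
```

If downstream analysis depends on exact values (for example, event identifiers stored in a channel), it may be worth checking that they stay within the exactly representable range before exporting.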

Utility Functions

read_multiple_data_sets()

read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)

Read all datasets from an FCS file containing multiple datasets.

Parameters:

  • fcs_file, ignore_offset_error, ignore_offset_discrepancy, use_header_offsets: Same meaning as in the FlowData constructor

Returns:

  • List of FlowData instances, one for each dataset

Example:

from flowio import read_multiple_data_sets

datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")

create_fcs()

create_fcs(file_handle,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata_dict=None)

Create a new FCS file from event data.

Parameters:

  • file_handle: File handle for the output file, opened in binary write mode ('wb')
  • event_data (list or 1-D array): Flattened event data in row-major order (all channel values for the first event, then the second, and so on)
  • channel_names (list): List of PnN (short) channel names
  • opt_channel_names (list): Optional list of PnS (descriptive) channel names
  • metadata_dict (dict): Optional dictionary of TEXT segment keywords

Example:

import numpy as np
from flowio import create_fcs

# Create synthetic data: 10,000 events x 5 channels
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']

# create_fcs() writes to an open binary file handle and expects flattened events
with open('synthetic.fcs', 'wb') as fh:
    create_fcs(fh,
               events.flatten().tolist(),
               channels,
               opt_channel_names=opt_channels,
               metadata_dict={'$SRC': 'Synthetic data'})

Exception Classes

FlowIOWarning

Generic warning class for non-critical issues.

PnEWarning

Warning raised when PnE values are invalid during FCS file creation.

FlowIOException

Base exception class for FlowIO errors.

FCSParsingError

Raised when there are issues parsing an FCS file.

DataOffsetDiscrepancyError

Raised when the HEADER and TEXT sections provide different byte offsets for data segments.

Workaround: Pass ignore_offset_discrepancy=True when creating the FlowData instance.

MultipleDataSetsError

Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.

Solution: Use the read_multiple_data_sets() function instead.
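
The two recoverable errors described above can be handled in one place. The sketch below is one possible arrangement, not a FlowIO-provided helper: `load_datasets` is a hypothetical name, and importing the exception classes from `flowio.exceptions` is an assumption about the package layout (adjust the import if your installed version exposes them elsewhere).

```python
def load_datasets(path):
    """Return a list of FlowData objects, one per dataset in the file."""
    # Imported lazily so the function can be defined without FlowIO installed.
    from flowio import FlowData, read_multiple_data_sets
    from flowio.exceptions import (  # assumed import location
        DataOffsetDiscrepancyError,
        MultipleDataSetsError,
    )

    try:
        return [FlowData(path)]
    except MultipleDataSetsError:
        # The file holds more than one dataset: read all of them.
        return read_multiple_data_sets(path)
    except DataOffsetDiscrepancyError:
        # HEADER and TEXT disagree on offsets: retry while ignoring the mismatch.
        return [FlowData(path, ignore_offset_discrepancy=True)]
```

Returning a list in every case lets calling code loop over the result without caring whether the file held one dataset or several.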

FCS File Structure Reference

FCS files consist of four segments:

  1. HEADER: Contains FCS version and byte locations of other segments
  2. TEXT: Key-value metadata pairs (delimited format)
  3. DATA: Raw event data (binary, floating-point, or ASCII)
  4. ANALYSIS (optional): Results from data processing
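
To make the HEADER segment concrete, here is a minimal standard-library sketch that decodes it by hand. It assumes the fixed layout from the FCS specification: a 6-byte version string, 4 spaces, then six right-justified 8-byte ASCII offsets. `parse_fcs_header` and the dictionary keys are hypothetical names, and this is not how FlowIO itself is implemented.

```python
def parse_fcs_header(raw):
    """Parse the first 58 bytes of an FCS file into (version, offsets)."""
    if len(raw) < 58:
        raise ValueError("FCS HEADER needs at least 58 bytes")
    version = raw[0:6].decode("ascii")  # e.g. "FCS3.1"
    if not version.startswith("FCS"):
        raise ValueError(f"not an FCS file: {version!r}")
    names = ("text_start", "text_end", "data_start",
             "data_end", "analysis_start", "analysis_end")
    offsets = {}
    for i, name in enumerate(names):
        # Each offset occupies 8 ASCII bytes, starting at byte 10.
        field = raw[10 + 8 * i: 18 + 8 * i].decode("ascii").strip()
        offsets[name] = int(field) if field else 0
    return version, offsets


# Build a synthetic header instead of reading a real file.
fields = [58, 1023, 1024, 41023, 0, 0]
header = b"FCS3.1" + b" " * 4 + "".join(f"{n:>8d}" for n in fields).encode("ascii")
version, offsets = parse_fcs_header(header)
print(version, offsets["data_start"], offsets["data_end"])
```

When a segment lies beyond what fits in 8 digits, the HEADER fields are written as 0 and the real offsets live in the TEXT keywords ($BEGINDATA, $ENDDATA), which is one reason the two sources can disagree.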

Common TEXT Segment Keywords

  • $BEGINDATA, $ENDDATA: Byte offsets for DATA segment
  • $BEGINANALYSIS, $ENDANALYSIS: Byte offsets for ANALYSIS segment
  • $BYTEORD: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
  • $DATATYPE: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
  • $MODE: Data mode ('L'=list mode, most common)
  • $NEXTDATA: Offset to next dataset (0 if single dataset)
  • $PAR: Number of parameters (channels)
  • $TOT: Total number of events
  • PnN: Short name for parameter n
  • PnS: Descriptive stain name for parameter n
  • PnR: Range for parameter n (for integer data, the number of distinct values, so the maximum stored value is PnR - 1)
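  • PnE: Amplification type for parameter n (format: "f1,f2", where f1 is the number of log decades and f2 is the linear value at channel 0; "0,0" means linear. A log-encoded channel value x converts to f2 * 10^(f1 * x / PnR))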
  • PnG: Amplification gain for parameter n
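
The keywords above are stored in the TEXT segment as a delimited string whose first character is the delimiter. The sketch below parses such a string with the standard library and maps $BYTEORD to a `struct` byte-order prefix. It is an illustration of the format, not FlowIO's parser: both function names are hypothetical, a doubled delimiter is treated as an escaped literal delimiter, and only the two 4-byte $BYTEORD values listed above are handled.

```python
def parse_text_segment(segment):
    """Split a delimited TEXT segment string into a keyword/value dict."""
    delim = segment[0]
    body = segment[1:]
    if body.endswith(delim):
        body = body[:-1]  # drop the trailing delimiter
    placeholder = "\x00"
    # Protect escaped (doubled) delimiters before splitting.
    tokens = body.replace(delim * 2, placeholder).split(delim)
    tokens = [t.replace(placeholder, delim) for t in tokens]
    if len(tokens) % 2:
        raise ValueError("TEXT segment has a keyword without a value")
    return dict(zip(tokens[0::2], tokens[1::2]))


def struct_byte_order(byteord):
    """Map a $BYTEORD value to a struct-module byte-order prefix."""
    if byteord == "1,2,3,4":
        return "<"  # little-endian
    if byteord == "4,3,2,1":
        return ">"  # big-endian
    raise ValueError(f"unsupported $BYTEORD: {byteord!r}")


text = parse_text_segment(
    "/$PAR/2/$TOT/100/$BYTEORD/1,2,3,4/$P1N/FSC-A/$P2N/SSC-A/"
)
print(text["$PAR"], text["$TOT"], struct_byte_order(text["$BYTEORD"]))
```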

Channel Types

FlowIO automatically categorizes channels:

  • Scatter channels: FSC (forward scatter), SSC (side scatter)
  • Fluorescence channels: FL1, FL2, FITC, PE, etc.
  • Time channel: Usually labeled "Time"

Access indices via:

  • flow_data.scatter_indices
  • flow_data.fluoro_indices
  • flow_data.time_index
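
The index attributes can be turned into readable groups of channel names. This sketch assumes the indices are zero-based positions into pnn_labels; `channel_groups` is a hypothetical helper, and a `SimpleNamespace` stands in for a real FlowData instance so the example runs without an FCS file.

```python
from types import SimpleNamespace


def channel_groups(flow):
    """Group PnN labels by channel type using the index attributes."""
    labels = flow.pnn_labels
    return {
        "scatter": [labels[i] for i in flow.scatter_indices],
        "fluoro": [labels[i] for i in flow.fluoro_indices],
        # time_index is None when the file has no time channel.
        "time": [] if flow.time_index is None else [labels[flow.time_index]],
    }


# Stand-in object with the same attribute names as FlowData.
demo = SimpleNamespace(
    pnn_labels=["FSC-A", "SSC-A", "FL1-A", "Time"],
    scatter_indices=[0, 1],
    fluoro_indices=[2],
    time_index=3,
)
groups = channel_groups(demo)
print(groups)
```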

Data Preprocessing

When calling as_array(preprocess=True), FlowIO applies:

  1. Gain scaling: Divide linear channel values by the PnG gain
  2. Logarithmic transformation: Convert log-encoded channels to linear scale using the PnE values, if present
  3. Time scaling: Multiply the time channel by the $TIMESTEP value, if present

To access raw, unprocessed data: as_array(preprocess=False)
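
The arithmetic behind these steps can be sketched for a single value. This follows the conventions of the FCS 3.1 text (log-encoded values become f2 * 10^(f1 * x / PnR), linear values are divided by the gain, and time values are multiplied by $TIMESTEP) and is not FlowIO's actual code. The function names are hypothetical, and treating an f2 of 0 as 1 is an assumption that mirrors a common workaround for invalid PnE values.

```python
def scale_value(raw, pne=(0.0, 0.0), png=1.0, pnr=None):
    """Convert one stored channel value to its scaled value."""
    decades, log_zero = pne
    if decades > 0:
        if pnr is None:
            raise ValueError("PnR is required for log-encoded channels")
        if log_zero == 0:
            log_zero = 1.0  # assumption: treat an invalid f2 of 0 as 1
        return log_zero * 10 ** (decades * raw / pnr)
    # Linear channel: undo the amplifier gain.
    return raw / png


def scale_time(raw, timestep=1.0):
    """Convert a time-channel value to seconds using $TIMESTEP."""
    return raw * timestep


log_scaled = scale_value(512, pne=(4.0, 1.0), pnr=1024)  # halfway up 4 decades
linear_scaled = scale_value(200, png=2.0)
seconds = scale_time(1500, timestep=0.01)
print(log_scaled, linear_scaled, seconds)
```

Comparing the output of as_array(preprocess=False) against values scaled this way is one way to check which transformations a particular file actually triggers.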

Best Practices

  1. Memory efficiency: Use only_text=True when only metadata is needed
  2. Error handling: Wrap file operations in try-except blocks for FCSParsingError
  3. Multi-dataset files: Always use read_multiple_data_sets() if unsure about dataset count
  4. Offset issues: If FlowData raises DataOffsetDiscrepancyError, retry with ignore_offset_discrepancy=True (or use_header_offsets=True to prefer the HEADER values)
  5. Channel selection: Use null_channel_list to exclude unwanted channels during parsing

Integration with FlowKit

For advanced flow cytometry analysis, including compensation, gating, and GatingML support, consider using the FlowKit library alongside FlowIO. FlowKit provides higher-level abstractions built on top of FlowIO's file parsing capabilities.

Example Workflows

Basic File Reading

from flowio import FlowData

# Read FCS file
flow = FlowData('experiment.fcs')

# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")

# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")

Metadata Extraction

from flowio import FlowData

flow = FlowData('sample.fcs', only_text=True)

# Access metadata (FlowIO stores TEXT keys in lowercase without the leading '$')
print(f"Acquisition date: {flow.text.get('date', 'N/A')}")
print(f"Instrument: {flow.text.get('cyt', 'N/A')}")

# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")

Creating New FCS Files

import numpy as np
from flowio import create_fcs

# Generate or process data
data = np.random.rand(5000, 3) * 1000

# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']

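# Create FCS file (create_fcs() takes a binary file handle and flattened events)
with open('output.fcs', 'wb') as fh:
    create_fcs(fh,
               data.flatten().tolist(),
               channels,
               opt_channel_names=stains,
               metadata_dict={
                   '$SRC': 'Python script',
                   '$DATE': '19-OCT-2025'
               })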

Processing Multi-Dataset Files

from flowio import read_multiple_data_sets

# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')

# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")

    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")

Modifying and Re-exporting

from flowio import FlowData, create_fcs

# Read original file
flow = FlowData('original.fcs')

# Get raw (unprocessed) event data as a 2-D array
events = flow.as_array(preprocess=False)

# Modify data (example: scale the first channel)
events[:, 0] = events[:, 0] * 1.5

# FlowIO does not support modifying event data in place on a FlowData
# instance, so write the modified array to a new file with create_fcs()
with open('modified.fcs', 'wb') as fh:
    create_fcs(fh,
               events.flatten().tolist(),
               flow.pnn_labels,
               opt_channel_names=flow.pns_labels,
               metadata_dict=flow.text)