FlowIO API Reference
Overview
FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.
Installation
pip install flowio
Supports Python 3.9 and later.
Core Classes
FlowData
The primary class for working with FCS files.
Constructor
FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)
Parameters:
- fcs_file: File path (str), Path object, or file handle
- ignore_offset_error (bool): Ignore offset errors (default: False)
- ignore_offset_discrepancy (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
- use_header_offsets (bool): Use HEADER section offsets instead of TEXT section offsets (default: False)
- only_text (bool): Only parse the TEXT segment, skip DATA and ANALYSIS (default: False)
- nextdata_offset (int): Byte offset for reading multi-dataset files (default: None)
- null_channel_list (list): List of PnN labels for null channels to exclude (default: None)
Attributes
File Information:
- name: Name of the FCS file
- file_size: Size of the file in bytes
- version: FCS version (e.g., '3.0', '3.1')
- header: Dictionary containing HEADER segment information
- data_type: Type of data format ('I', 'F', 'D', 'A')
Channel Information:
- channel_count: Number of channels in the dataset
- channels: Dictionary mapping channel numbers to channel info
- pnn_labels: List of PnN (short channel name) labels
- pns_labels: List of PnS (descriptive stain name) labels
- pnr_values: List of PnR (range) values for each channel
- fluoro_indices: List of indices for fluorescence channels
- scatter_indices: List of indices for scatter channels
- time_index: Index of the time channel (or None)
- null_channels: List of null channel indices
Event Data:
- event_count: Number of events (rows) in the dataset
- events: Raw event data as bytes
Metadata:
- text: Dictionary of TEXT segment key-value pairs
- analysis: Dictionary of ANALYSIS segment key-value pairs (if present)
Methods
as_array()
as_array(preprocess=True)
Return event data as a 2-D NumPy array.
Parameters:
- preprocess (bool): Apply gain, logarithmic, and time scaling transformations (default: True)
Returns:
- NumPy ndarray with shape (event_count, channel_count)
Example:
from flowio import FlowData
flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array()  # Preprocessed data
raw_array = flow_data.as_array(preprocess=False)  # Raw data
write_fcs()
write_fcs(filename, metadata=None)
Export the FlowData instance as a new FCS file.
Parameters:
- filename (str): Output file path
- metadata (dict): Optional dictionary of TEXT segment keywords to add/update
Example:
from flowio import FlowData
flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})
Note: Exports as FCS 3.1 with single-precision floating-point data.
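Because the export is single-precision, values that need more than roughly seven significant digits will be rounded when written. The sketch below illustrates that rounding with the standard library only; `to_float32` is a hypothetical helper for illustration and is not part of FlowIO.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float through a 4-byte IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Values exactly representable in 32 bits survive unchanged.
print(to_float32(0.5))

# 2**24 + 1 is not representable in single precision, so it is rounded.
print(to_float32(16777217.0))

# 0.1 is stored with less precision than a Python (double) float.
print(to_float32(0.1) == 0.1)
```

If the source data is double-precision ('D') or large integers, comparing a few values this way before export can show whether the rounding matters for the analysis.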
Utility Functions
read_multiple_data_sets()
read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)
Read all datasets from an FCS file containing multiple datasets.
Parameters:
- Same as FlowData constructor (except nextdata_offset)
Returns:
- List of FlowData instances, one for each dataset
Example:
from flowio import read_multiple_data_sets
datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")
create_fcs()
create_fcs(filename,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata=None)
Create a new FCS file from event data.
Parameters:
- filename (str): Output file path
- event_data (ndarray): 2-D NumPy array of event data (rows=events, columns=channels)
- channel_names (list): List of PnN (short) channel names
- opt_channel_names (list): Optional list of PnS (descriptive) channel names
- metadata (dict): Optional dictionary of TEXT segment keywords
Example:
import numpy as np
from flowio import create_fcs
# Create synthetic data
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']
create_fcs('synthetic.fcs',
           events,
           channels,
           opt_channel_names=opt_channels,
           metadata={'$SRC': 'Synthetic data'})
Exception Classes
FlowIOWarning
Generic warning class for non-critical issues.
PnEWarning
Warning raised when PnE values are invalid during FCS file creation.
FlowIOException
Base exception class for FlowIO errors.
FCSParsingError
Raised when there are issues parsing an FCS file.
DataOffsetDiscrepancyError
Raised when the HEADER and TEXT sections provide different byte offsets for data segments.
Workaround: Pass ignore_offset_discrepancy=True when creating the FlowData instance.
MultipleDataSetsError
Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.
Solution: Use the read_multiple_data_sets() function instead.
FCS File Structure Reference
FCS files consist of four segments:
- HEADER: Contains FCS version and byte locations of other segments
- TEXT: Key-value metadata pairs (delimited format)
- DATA: Raw event data (binary integer, floating-point, or ASCII)
- ANALYSIS (optional): Results from data processing
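To make the segment layout concrete, here is a minimal sketch of reading the HEADER and TEXT segments by hand. It assumes the fixed-width HEADER layout from the FCS specification (a 6-byte version string, 4 spaces, then six right-justified 8-byte ASCII offsets) and a TEXT segment whose first character is the delimiter, with a doubled delimiter standing for a literal one. The function names are hypothetical and this is not FlowIO's implementation, which handles many more edge cases.

```python
def parse_fcs_header(header: bytes) -> dict:
    """Parse the 58-byte fixed-width HEADER of an FCS file."""
    if len(header) < 58:
        raise ValueError("HEADER must be at least 58 bytes")
    version = header[0:6].decode("ascii")  # e.g. 'FCS3.1'
    if not version.startswith("FCS"):
        raise ValueError(f"not an FCS file: {version!r}")
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    result = {"version": version[3:]}
    for i, name in enumerate(names):
        start = 10 + 8 * i  # offsets begin after version + 4 spaces
        field = header[start:start + 8].decode("ascii").strip()
        result[name] = int(field) if field else 0  # blank means "absent"
    return result


def parse_text_segment(text: str) -> dict:
    """Split a delimited TEXT segment into a keyword/value dictionary."""
    delim = text[0]
    placeholder = "\x00"  # stands in for an escaped (doubled) delimiter
    tokens = text[1:].replace(delim * 2, placeholder).split(delim)
    if tokens and tokens[-1] == "":
        tokens.pop()  # drop the empty token after the trailing delimiter
    tokens = [t.replace(placeholder, delim) for t in tokens]
    return dict(zip(tokens[0::2], tokens[1::2]))


# Synthetic HEADER: version, 4 spaces, six 8-character offsets.
example_header = b"FCS3.1" + b" " * 4 + b"".join(
    str(n).rjust(8).encode("ascii") for n in (58, 1023, 1024, 41023, 0, 0))
info = parse_fcs_header(example_header)
print(info)

keywords = parse_text_segment("/$PAR/2/$TOT/100/$SRC/a//b/")
print(keywords)
```

In practice the offsets in `info` would be used to seek to each segment in the file; FlowData does this internally and exposes the results through its header and text attributes.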
Common TEXT Segment Keywords
- $BEGINDATA, $ENDDATA: Byte offsets for DATA segment
- $BEGINANALYSIS, $ENDANALYSIS: Byte offsets for ANALYSIS segment
- $BYTEORD: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
- $DATATYPE: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
- $MODE: Data mode ('L'=list mode, most common)
- $NEXTDATA: Offset to next dataset (0 if single dataset)
- $PAR: Number of parameters (channels)
- $TOT: Total number of events
- PnN: Short name for parameter n
- PnS: Descriptive stain name for parameter n
- PnR: Range (max value) for parameter n
- PnE: Amplification exponent for parameter n (format: "f1,f2", where f1 is the number of log decades and f2 is the linear value at channel 0; scaled value = f2 * 10^(f1 * x / PnR))
- PnG: Amplification gain for parameter n
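As an illustration of how a reader might act on these keywords, the sketch below turns $BYTEORD, $DATATYPE, and $PAR into a `struct` format string for one event row, and applies log scaling from a PnE value. It assumes the FCS convention that PnE is "f1,f2" with f1 the number of log decades and f2 the linear value at channel 0, and it treats an f2 of 0 as 1, which is a common convention rather than something the text above states. Both function names are hypothetical; this is not FlowIO's code, and integer ('I') and ASCII ('A') data are deliberately left out.

```python
import struct


def struct_format(text: dict) -> str:
    """Build a struct format for one event row (float data types only)."""
    endian = {"1,2,3,4": "<", "4,3,2,1": ">"}[text["$BYTEORD"]]
    type_code = {"F": "f", "D": "d"}[text["$DATATYPE"]]
    return endian + type_code * int(text["$PAR"])


def log_scale(channel_value: float, pne: str, pnr: float) -> float:
    """Convert a log-amplified channel value to linear scale via PnE."""
    decades, offset = (float(part) for part in pne.split(","))
    if decades == 0:
        return channel_value  # "0,0" marks a linear parameter
    if offset == 0:
        offset = 1.0  # assumed fallback for an invalid zero offset
    return offset * 10 ** (decades * channel_value / pnr)


text = {"$BYTEORD": "1,2,3,4", "$DATATYPE": "F", "$PAR": "3"}
row_format = struct_format(text)

# Pack and unpack one synthetic three-channel event.
row = struct.unpack(row_format, struct.pack("<fff", 1.0, 2.5, 100.0))
print(row_format, struct.calcsize(row_format), row)

# Mid-range channel value on a 4-decade log parameter with range 1024.
print(log_scale(512, "4,1", 1024))
```

When using FlowData, none of this is needed directly: as_array(preprocess=True) applies the scaling, and the raw keyword values remain available in the text dictionary.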
Channel Types
FlowIO automatically categorizes channels:
- Scatter channels: FSC (forward scatter), SSC (side scatter)
- Fluorescence channels: FL1, FL2, FITC, PE, etc.
- Time channel: Usually labeled "Time"
Access indices via:
- flow_data.scatter_indices
- flow_data.fluoro_indices
- flow_data.time_index
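The idea behind this categorization can be sketched with a simple label-based heuristic. The rules below (a "time" label is the time channel, labels starting with FSC or SSC are scatter, everything else is fluorescence) are an assumption made for illustration; FlowIO's actual rules may differ, and `categorize_channels` is a hypothetical function, not part of the library.

```python
def categorize_channels(pnn_labels):
    """Split channel indices into scatter, fluorescence, and time."""
    scatter, fluoro, time_index = [], [], None
    for i, label in enumerate(pnn_labels):
        name = label.strip().lower()
        if name == "time":
            time_index = i
        elif name.startswith(("fsc", "ssc")):
            scatter.append(i)
        else:
            fluoro.append(i)  # assumed default: anything else is fluorescence
    return scatter, fluoro, time_index


labels = ["FSC-A", "SSC-A", "FL1-A", "FL2-A", "Time"]
scatter, fluoro, time_index = categorize_channels(labels)
print(scatter, fluoro, time_index)
```

The resulting index lists can be used to select columns from the array returned by as_array(), in the same way as the scatter_indices, fluoro_indices, and time_index attributes.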
Data Preprocessing
When calling as_array(preprocess=True), FlowIO applies:
- Gain scaling: Multiply by PnG value
- Logarithmic transformation: Apply PnE exponential transformation if present
- Time scaling: Convert time values to appropriate units
To access raw, unprocessed data: as_array(preprocess=False)
Best Practices
- Memory efficiency: Use only_text=True when only metadata is needed
- Error handling: Wrap file operations in try-except blocks for FCSParsingError
- Multi-dataset files: Always use read_multiple_data_sets() if unsure about dataset count
- Offset issues: If encountering offset errors, try ignore_offset_discrepancy=True
- Channel selection: Use null_channel_list to exclude unwanted channels during parsing
Integration with FlowKit
For advanced flow cytometry analysis, including compensation, gating, and GatingML support, consider using the FlowKit library alongside FlowIO. FlowKit provides higher-level abstractions built on top of FlowIO's file parsing capabilities.
Example Workflows
Basic File Reading
from flowio import FlowData
# Read FCS file
flow = FlowData('experiment.fcs')
# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")
# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")
Metadata Extraction
from flowio import FlowData
flow = FlowData('sample.fcs', only_text=True)
# Access metadata
print(f"Acquisition date: {flow.text.get('$DATE', 'N/A')}")
print(f"Instrument: {flow.text.get('$CYT', 'N/A')}")
# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")
Creating New FCS Files
import numpy as np
from flowio import create_fcs
# Generate or process data
data = np.random.rand(5000, 3) * 1000
# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']
# Create FCS file
create_fcs('output.fcs',
           data,
           channels,
           opt_channel_names=stains,
           metadata={
               '$SRC': 'Python script',
               '$DATE': '19-OCT-2025'
           })
Processing Multi-Dataset Files
from flowio import read_multiple_data_sets
# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')
# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")
    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")
Modifying and Re-exporting
from flowio import FlowData, create_fcs
# Read original file
flow = FlowData('original.fcs')
# Get raw (unprocessed) event data
events = flow.as_array(preprocess=False)
# Modify data (example: scale the first channel)
events[:, 0] = events[:, 0] * 1.5
# Note: FlowIO currently doesn't support direct modification of a
# FlowData instance's event data, so write the modified array to a
# new file with create_fcs() instead:
create_fcs('modified.fcs',
           events,
           flow.pnn_labels,
           opt_channel_names=flow.pns_labels,
           metadata=flow.text)