# FlowIO API Reference

## Overview

FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.

## Installation

```bash
pip install flowio
```

FlowIO supports Python 3.9 and later.

## Core Classes

### FlowData

The primary class for working with FCS files.

#### Constructor

```python
FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)
```

**Parameters:**

- `fcs_file`: File path (str), Path object, or file handle
- `ignore_offset_error` (bool): Ignore offset errors (default: False)
- `ignore_offset_discrepancy` (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
- `use_header_offsets` (bool): Use HEADER section offsets instead of TEXT section offsets (default: False)
- `only_text` (bool): Only parse the TEXT segment, skipping DATA and ANALYSIS (default: False)
- `nextdata_offset` (int): Byte offset for reading multi-dataset files
- `null_channel_list` (list): List of PnN labels for null channels to exclude

#### Attributes

**File Information:**

- `name`: Name of the FCS file
- `file_size`: Size of the file in bytes
- `version`: FCS version (e.g., '3.0', '3.1')
- `header`: Dictionary containing HEADER segment information
- `data_type`: Type of data format ('I', 'F', 'D', 'A')

**Channel Information:**

- `channel_count`: Number of channels in the dataset
- `channels`: Dictionary mapping channel numbers to channel info
- `pnn_labels`: List of PnN (short channel name) labels
- `pns_labels`: List of PnS (descriptive stain name) labels
- `pnr_values`: List of PnR (range) values for each channel
- `fluoro_indices`: List of indices for fluorescence channels
- `scatter_indices`: List of indices for scatter channels
- `time_index`: Index of the time channel (or None)
- `null_channels`: List of null channel indices

**Event Data:**

- `event_count`: Number of events (rows) in the dataset
- `events`: Raw event data as bytes

**Metadata:**

- `text`: Dictionary of TEXT segment key-value pairs
- `analysis`: Dictionary of ANALYSIS segment key-value pairs (if present)

#### Methods

##### as_array()

```python
as_array(preprocess=True)
```

Return event data as a 2-D NumPy array.

**Parameters:**

- `preprocess` (bool): Apply gain, logarithmic, and time scaling transformations (default: True)

**Returns:**

- NumPy ndarray with shape (event_count, channel_count)

**Example:**

```python
flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array()  # Preprocessed data
raw_array = flow_data.as_array(preprocess=False)  # Raw data
```

##### write_fcs()

```python
write_fcs(filename, metadata=None)
```

Export the FlowData instance as a new FCS file.

**Parameters:**

- `filename` (str): Output file path
- `metadata` (dict): Optional dictionary of TEXT segment keywords to add/update

**Example:**

```python
flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})
```

**Note:** Exports as FCS 3.1 with single-precision floating-point data.

## Utility Functions

### read_multiple_data_sets()

```python
read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)
```

Read all datasets from an FCS file containing multiple datasets.

**Parameters:**

- Same as FlowData constructor (except `nextdata_offset`)

**Returns:**

- List of FlowData instances, one for each dataset

**Example:**

```python
from flowio import read_multiple_data_sets

datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")
```

### create_fcs()

```python
create_fcs(filename,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata=None)
```

Create a new FCS file from event data.

**Parameters:**

- `filename` (str): Output file path
- `event_data` (ndarray): 2-D NumPy array of event data (rows=events, columns=channels)
- `channel_names` (list): List of PnN (short) channel names
- `opt_channel_names` (list): Optional list of PnS (descriptive) channel names
- `metadata` (dict): Optional dictionary of TEXT segment keywords

**Example:**

```python
import numpy as np
from flowio import create_fcs

# Create synthetic data
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']

create_fcs('synthetic.fcs',
           events,
           channels,
           opt_channel_names=opt_channels,
           metadata={'$SRC': 'Synthetic data'})
```

## Exception Classes

### FlowIOWarning

Generic warning class for non-critical issues.

### PnEWarning

Warning raised when PnE values are invalid during FCS file creation.

### FlowIOException

Base exception class for FlowIO errors.

### FCSParsingError

Raised when there are issues parsing an FCS file.

### DataOffsetDiscrepancyError

Raised when the HEADER and TEXT sections provide different byte offsets for data segments.

**Workaround:** Pass `ignore_offset_discrepancy=True` when creating a FlowData instance.

### MultipleDataSetsError

Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.

**Solution:** Use the `read_multiple_data_sets()` function instead.

## FCS File Structure Reference

FCS files consist of four segments:

1. **HEADER**: Contains FCS version and byte locations of other segments
2. **TEXT**: Key-value metadata pairs (delimited format)
3. **DATA**: Raw event data (binary, floating-point, or ASCII)
4. **ANALYSIS** (optional): Results from data processing

### Common TEXT Segment Keywords

- `$BEGINDATA`, `$ENDDATA`: Byte offsets for DATA segment
- `$BEGINANALYSIS`, `$ENDANALYSIS`: Byte offsets for ANALYSIS segment
- `$BYTEORD`: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
- `$DATATYPE`: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
- `$MODE`: Data mode ('L'=list mode, most common)
- `$NEXTDATA`: Offset to next dataset (0 if single dataset)
- `$PAR`: Number of parameters (channels)
- `$TOT`: Total number of events
- `PnN`: Short name for parameter n
- `PnS`: Descriptive stain name for parameter n
- `PnR`: Range (max value) for parameter n
- `PnE`: Amplification type for parameter n (format: "f1,f2", where f1 is the number of log decades and f2 is the linear value at channel 0; "0,0" means linear. For log channels, scaled value = f2 * 10^(f1 * x / PnR))
- `PnG`: Amplification gain for parameter n

## Channel Types

FlowIO automatically categorizes channels:

- **Scatter channels**: FSC (forward scatter), SSC (side scatter)
- **Fluorescence channels**: FL1, FL2, FITC, PE, etc.
- **Time channel**: Usually labeled "Time"

Access indices via:

- `flow_data.scatter_indices`
- `flow_data.fluoro_indices`
- `flow_data.time_index`

## Data Preprocessing

When calling `as_array(preprocess=True)`, FlowIO applies:

1. **Gain scaling**: Apply the PnG gain value to linear channels
2. **Logarithmic transformation**: Apply the PnE exponential transformation if present
3. **Time scaling**: Convert time values to appropriate units

To access raw, unprocessed data, call `as_array(preprocess=False)`.

## Best Practices

1. **Memory efficiency**: Use `only_text=True` when only metadata is needed
2. **Error handling**: Wrap file operations in try-except blocks for FCSParsingError
3. **Multi-dataset files**: Always use `read_multiple_data_sets()` if unsure about dataset count
4. **Offset issues**: If encountering offset errors, try `ignore_offset_discrepancy=True`
5. **Channel selection**: Use `null_channel_list` to exclude unwanted channels during parsing

## Integration with FlowKit

For advanced flow cytometry analysis including compensation, gating, and GatingML support, consider using the FlowKit library alongside FlowIO. FlowKit provides higher-level abstractions built on top of FlowIO's file parsing capabilities.

## Example Workflows

### Basic File Reading

```python
from flowio import FlowData

# Read FCS file
flow = FlowData('experiment.fcs')

# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")

# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")
```

### Metadata Extraction

```python
from flowio import FlowData

flow = FlowData('sample.fcs', only_text=True)

# Access metadata
print(f"Acquisition date: {flow.text.get('$DATE', 'N/A')}")
print(f"Instrument: {flow.text.get('$CYT', 'N/A')}")

# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")
```

### Creating New FCS Files

```python
import numpy as np
from flowio import create_fcs

# Generate or process data
data = np.random.rand(5000, 3) * 1000

# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']

# Create FCS file
create_fcs('output.fcs',
           data,
           channels,
           opt_channel_names=stains,
           metadata={
               '$SRC': 'Python script',
               '$DATE': '19-OCT-2025'
           })
```

### Processing Multi-Dataset Files

```python
from flowio import read_multiple_data_sets

# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')

# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")

    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")
```

### Modifying and Re-exporting

```python
from flowio import FlowData, create_fcs

# Read original file
flow = FlowData('original.fcs')

# Get raw (unprocessed) event data
events = flow.as_array(preprocess=False)

# Modify data (example: apply custom transformation)
events[:, 0] = events[:, 0] * 1.5  # Scale first channel

# Note: Currently, FlowIO doesn't support direct modification of event data.
# For modifications, write a new file with create_fcs() instead:
create_fcs('modified.fcs',
           events,
           flow.pnn_labels,
           opt_channel_names=flow.pns_labels,
           metadata=flow.text)
```
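
### Sketch: PnE Logarithmic Scaling

The logarithmic transformation listed under Data Preprocessing can be illustrated with a few lines of plain Python. This is a minimal sketch based on the FCS standard's definition of the PnE keyword ("f1,f2": f1 log decades, f2 the linear value at channel 0), not FlowIO's own implementation. The helper names `parse_pne` and `decode_log_channel` are hypothetical and exist only for this example.

```python
def parse_pne(pne_text):
    """Split a PnE string such as '4,1' into (decades, zero_value) floats."""
    decades, zero_value = (float(part) for part in pne_text.split(","))
    return decades, zero_value


def decode_log_channel(raw_values, decades, zero_value, value_range):
    """Map log-amplified channel values onto a linear scale.

    Uses the FCS-standard relation: scaled = zero_value * 10 ** (decades * x / range),
    where `value_range` is the channel's PnR value.
    """
    if decades <= 0:
        # PnE of "0,0" marks a linear channel, so values pass through unchanged.
        return [float(x) for x in raw_values]
    return [zero_value * 10 ** (decades * x / value_range) for x in raw_values]


# Hypothetical channel: 4 log decades, linear value 1 at channel 0, PnR of 1024.
decades, zero_value = parse_pne("4,1")
scaled = decode_log_channel([0, 256, 512, 1024], decades, zero_value, value_range=1024)
print(scaled)

# A linear channel ("0,0") is returned as-is.
linear = decode_log_channel([10, 20], *parse_pne("0,0"), value_range=1024)
print(linear)
```

Each quarter of the 1024-channel range corresponds to one decade here, so the raw values 0, 256, 512, and 1024 land at 1, 10, 100, and 10000 on the linear scale. In practice `as_array(preprocess=True)` handles this step, so a helper like this is mainly useful for checking values by hand.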