zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

FlowIO API Reference

Overview

FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.

Installation

pip install flowio

Supports Python 3.9 and later.

Core Classes

FlowData

The primary class for working with FCS files.

Constructor

FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)

Parameters:

fcs_file: File path (str), Path object, or file handle
ignore_offset_error (bool): Ignore offset errors (default: False)
ignore_offset_discrepancy (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
use_header_offsets (bool): Use HEADER section offsets instead of TEXT section (default: False)
only_text (bool): Only parse the TEXT segment, skip DATA and ANALYSIS (default: False)
nextdata_offset (int): Byte offset for reading multi-dataset files
null_channel_list (list): List of PnN labels for null channels to exclude

Attributes

File Information:

name: Name of the FCS file
file_size: Size of the file in bytes
version: FCS version (e.g., '3.0', '3.1')
header: Dictionary containing HEADER segment information
data_type: Type of data format ('I', 'F', 'D', 'A')

Channel Information:

channel_count: Number of channels in the dataset
channels: Dictionary mapping channel numbers to channel info
pnn_labels: List of PnN (short channel name) labels
pns_labels: List of PnS (descriptive stain name) labels
pnr_values: List of PnR (range) values for each channel
fluoro_indices: List of indices for fluorescence channels
scatter_indices: List of indices for scatter channels
time_index: Index of the time channel (or None)
null_channels: List of null channel indices

Event Data:

event_count: Number of events (rows) in the dataset
events: Raw event data as bytes

Metadata:

text: Dictionary of TEXT segment key-value pairs
analysis: Dictionary of ANALYSIS segment key-value pairs (if present)

Methods

as_array()

as_array(preprocess=True)

Return event data as a 2-D NumPy array.

Parameters:

preprocess (bool): Apply gain, logarithmic, and time scaling transformations (default: True)

Returns:

NumPy ndarray with shape (event_count, channel_count)

Example:

flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array()  # Preprocessed data
raw_array = flow_data.as_array(preprocess=False)  # Raw data
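Conceptually, as_array(preprocess=False) turns the DATA segment into a matrix with one row per event. In the common list-mode case with $DATATYPE F or D, this means unpacking fixed-width floats in the byte order given by $BYTEORD and regrouping them into rows of $PAR values.

The sketch below shows that step with the standard `struct` module. It is not FlowIO's code. `decode_float_events` is a hypothetical helper that returns nested lists rather than a NumPy array. It also skips integer ('I') data, whose per-channel bit widths come from $PnB.

```python
import struct

def decode_float_events(data: bytes, par: int, datatype: str, byteord: str) -> list:
    """Decode list-mode 'F'/'D' DATA bytes into rows of $PAR values (hypothetical helper)."""
    if datatype not in ("F", "D"):
        raise ValueError("this sketch only handles $DATATYPE F or D")
    # FCS 3.1 allows only these two $BYTEORD values
    if byteord == "1,2,3,4":
        endian = "<"  # little-endian
    elif byteord == "4,3,2,1":
        endian = ">"  # big-endian
    else:
        raise ValueError(f"unsupported $BYTEORD: {byteord}")
    code = endian + ("f" if datatype == "F" else "d")
    size = struct.calcsize(code)
    if len(data) % (size * par):
        raise ValueError("DATA length is not a whole number of events")
    values = [v for (v,) in struct.iter_unpack(code, data)]
    # Regroup the flat value stream into one row per event
    return [values[i:i + par] for i in range(0, len(values), par)]

raw = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
print(decode_float_events(raw, par=3, datatype="F", byteord="1,2,3,4"))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```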

write_fcs()

write_fcs(filename, metadata=None)

Export the FlowData instance as a new FCS file.

Parameters:

filename (str): Output file path
metadata (dict): Optional dictionary of TEXT segment keywords to add/update

Example:

flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})

Note: Exports as FCS 3.1 with single-precision floating-point data.
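Because the exported values are 32-bit floats, 64-bit values (such as those in an array from as_array()) are rounded on export. The standard-library round trip below shows the size of that rounding. It does not call FlowIO.

```python
import struct

def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

print(to_float32(0.1))         # 0.10000000149011612
print(to_float32(16777217.0))  # 16777216.0 (not every integer above 2**24 fits in float32)
```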

Utility Functions

read_multiple_data_sets()

read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)

Read all datasets from an FCS file containing multiple datasets.

Parameters:

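fcs_file, ignore_offset_error, ignore_offset_discrepancy, use_header_offsets: same meaning as in the FlowData constructor (nextdata_offset is not accepted; the function locates each dataset itself)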

Returns:

List of FlowData instances, one for each dataset

Example:

from flowio import read_multiple_data_sets

datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")
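Multiple datasets are chained through the $NEXTDATA keyword. Each dataset's TEXT segment gives the byte offset of the next dataset, and 0 marks the last one.

The sketch below follows such a chain. It assumes, as described for FCS 3.x, that each $NEXTDATA value is measured from the start of the current dataset. The `next_offsets` mapping is a hypothetical stand-in for the $NEXTDATA values a real reader would parse from each TEXT segment. This is not FlowIO's implementation.

```python
def dataset_start_offsets(next_offsets: dict) -> list:
    """Return the absolute byte offset of every dataset in a chained FCS file.

    next_offsets maps a dataset's absolute start offset to its $NEXTDATA value
    (hypothetical input; a real reader parses it from each TEXT segment).
    """
    starts = [0]
    seen = {0}
    current = 0
    while next_offsets.get(current, 0) != 0:
        # Assumption: $NEXTDATA is relative to the start of the current dataset
        current += next_offsets[current]
        if current in seen:
            raise ValueError("circular $NEXTDATA chain")
        seen.add(current)
        starts.append(current)
    return starts

print(dataset_start_offsets({0: 50000, 50000: 42000, 92000: 0}))  # [0, 50000, 92000]
```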

create_fcs()

create_fcs(filename,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata=None)

Create a new FCS file from event data.

Parameters:

filename (str): Output file path
event_data (ndarray): 2-D NumPy array of event data (rows=events, columns=channels)
channel_names (list): List of PnN (short) channel names
opt_channel_names (list): Optional list of PnS (descriptive) channel names
metadata (dict): Optional dictionary of TEXT segment keywords

Example:

import numpy as np
from flowio import create_fcs

# Create synthetic data
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']

create_fcs('synthetic.fcs',
           events,
           channels,
           opt_channel_names=opt_channels,
           metadata={'$SRC': 'Synthetic data'})
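For orientation, the sketch below assembles the kind of TEXT keywords a writer such as create_fcs() must produce from channel names: $PAR, $TOT, and one $PnN (plus an optional $PnS) per channel. Any delimiter inside a keyword or value is escaped by doubling it.

`build_text_segment` is hypothetical and is not FlowIO's implementation. It deliberately omits required keywords such as $BEGINDATA/$ENDDATA, $DATATYPE, $BYTEORD, $PnB, $PnE and $PnR, so its output is not a complete TEXT segment.

```python
def build_text_segment(channel_names, event_count, opt_channel_names=None, delim="/"):
    """Assemble a minimal, incomplete TEXT segment string (hypothetical helper)."""
    def esc(s) -> str:
        # A literal delimiter inside a keyword or value is written twice
        return str(s).replace(delim, delim * 2)

    pairs = [("$PAR", len(channel_names)), ("$TOT", event_count)]
    for n, name in enumerate(channel_names, start=1):
        pairs.append((f"$P{n}N", name))
        if opt_channel_names and opt_channel_names[n - 1]:
            pairs.append((f"$P{n}S", opt_channel_names[n - 1]))
    body = delim.join(f"{esc(k)}{delim}{esc(v)}" for k, v in pairs)
    # The segment starts and ends with the delimiter character
    return f"{delim}{body}{delim}"

print(build_text_segment(["FSC-A", "FL1-A"], 10000, ["Forward Scatter", "CD4/CD8"]))
# /$PAR/2/$TOT/10000/$P1N/FSC-A/$P1S/Forward Scatter/$P2N/FL1-A/$P2S/CD4//CD8/
```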

Exception Classes

FlowIOWarning

Generic warning class for non-critical issues.

PnEWarning

Warning raised when PnE values are invalid during FCS file creation.

FlowIOException

Base exception class for FlowIO errors.

FCSParsingError

Raised when there are issues parsing an FCS file.

DataOffsetDiscrepancyError

Raised when the HEADER and TEXT sections provide different byte offsets for data segments.

Workaround: Use ignore_offset_discrepancy=True parameter when creating FlowData instance.
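The HEADER stores offsets in 8-character ASCII fields, so it cannot hold values above 99,999,999. FCS 3.x writers then put 0 in the HEADER and the real values in $BEGINDATA/$ENDDATA, which means a reader has to reconcile the two sources.

The sketch below shows one way to do that. `resolve_data_offsets` is hypothetical and only loosely mirrors the choices FlowIO exposes through use_header_offsets and ignore_offset_discrepancy. Its ValueError stands in for DataOffsetDiscrepancyError.

```python
def resolve_data_offsets(header_start, header_end, text,
                         use_header_offsets=False, ignore_discrepancy=False):
    """Pick DATA segment offsets from the HEADER or TEXT segment (hypothetical sketch)."""
    text_start = int(text.get("$BEGINDATA", 0))
    text_end = int(text.get("$ENDDATA", 0))
    if header_start == 0 and header_end == 0:
        # Offsets too large for the 8-character HEADER fields: only TEXT has them
        return text_start, text_end
    if text_start == 0 and text_end == 0:
        # FCS 2.0 files have no $BEGINDATA/$ENDDATA keywords
        return header_start, header_end
    if (header_start, header_end) != (text_start, text_end):
        if use_header_offsets:
            return header_start, header_end
        if not ignore_discrepancy:
            raise ValueError("HEADER and TEXT disagree on DATA offsets")
    # Default: trust the TEXT segment
    return text_start, text_end

print(resolve_data_offsets(0, 0, {"$BEGINDATA": "123456789", "$ENDDATA": "223456788"}))
# (123456789, 223456788)
```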

MultipleDataSetsError

Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.

Solution: Use read_multiple_data_sets() function instead.

FCS File Structure Reference

FCS files consist of four segments:

HEADER: Contains FCS version and byte locations of other segments
TEXT: Key-value metadata pairs (delimited format)
DATA: Raw event data (integer, single- or double-precision floating-point, or ASCII values)
ANALYSIS (optional): Results from data processing
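The HEADER layout is fixed by the FCS standard:

Bytes 0-5 hold the version string (e.g. FCS3.1).
Bytes 6-9 are spaces.
Six 8-character ASCII fields starting at byte 10 give the begin/end offsets of TEXT, DATA and ANALYSIS.

The sketch below parses those fields with the standard library only. `parse_fcs_header` is a hypothetical helper, not part of FlowIO, and it ignores any optional fields after byte 57.

```python
def parse_fcs_header(header_bytes: bytes) -> dict:
    """Parse the first 58 bytes of an FCS file (hypothetical helper, not FlowIO's API)."""
    if len(header_bytes) < 58:
        raise ValueError("FCS HEADER must be at least 58 bytes")
    version = header_bytes[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError(f"not an FCS file: {version!r}")
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    offsets = {}
    for i, name in enumerate(names):
        start = 10 + 8 * i
        field = header_bytes[start:start + 8].decode("ascii").strip()
        # A blank field (e.g. no ANALYSIS segment) is treated as 0
        offsets[name] = int(field) if field else 0
    return {"version": version[3:], **offsets}

header = b"FCS3.1    " + b"".join(f"{n:>8}".encode("ascii") for n in (58, 1023, 1024, 5023, 0, 0))
info = parse_fcs_header(header)
print(info["version"], info["data_start"])  # 3.1 1024
```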

Common TEXT Segment Keywords

$BEGINDATA, $ENDDATA: Byte offsets for DATA segment
$BEGINANALYSIS, $ENDANALYSIS: Byte offsets for ANALYSIS segment
$BYTEORD: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
$DATATYPE: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
$MODE: Data mode ('L'=list mode, most common)
$NEXTDATA: Offset to next dataset (0 if single dataset)
$PAR: Number of parameters (channels)
$TOT: Total number of events
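$PnN: Short name for parameter n
$PnS: Descriptive stain name for parameter n
$PnR: Range for parameter n (e.g., 1024 for channel values 0 to 1023)
$PnE: Amplification type for parameter n, written "f1,f2". f1 is the number of logarithmic decades and f2 is the linear value corresponding to channel value 0; "0,0" means linear amplification. A log-amplified channel value x converts to f2 * 10^(f1 * x / PnR)
$PnG: Amplifier gain for parameter n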
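The TEXT segment is a run of keyword/value pairs separated by a delimiter character, which is the first byte of the segment. A literal delimiter inside a keyword or value is written twice.

A minimal parser is sketched below. `parse_text_segment` is hypothetical, not FlowIO's parser, and keeps keywords exactly as they appear in the file.

```python
def parse_text_segment(segment: str) -> dict:
    """Split an FCS TEXT segment into a keyword -> value dict (hypothetical helper)."""
    delim = segment[0]
    fields, current = [], []
    i = 1
    while i < len(segment):
        ch = segment[i]
        if ch == delim:
            if i + 1 < len(segment) and segment[i + 1] == delim:
                # Doubled delimiter: a literal delimiter character
                current.append(delim)
                i += 2
                continue
            # Single delimiter: end of the current keyword or value
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        # Tolerate a segment that does not end with the delimiter
        fields.append("".join(current))
    if len(fields) % 2:
        raise ValueError("TEXT segment has an unpaired keyword")
    return dict(zip(fields[0::2], fields[1::2]))

text = parse_text_segment("/$PAR/2/$TOT/5000/$P1N/FSC-A/$P2N/SSC-A/$SRC/run 1//2/")
print(text["$TOT"], text["$SRC"])  # 5000 run 1/2
```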

Channel Types

FlowIO automatically categorizes channels:

Scatter channels: FSC (forward scatter), SSC (side scatter)
Fluorescence channels: FL1, FL2, FITC, PE, etc.
Time channel: Usually labeled "Time"

Access indices via:

flow_data.scatter_indices
flow_data.fluoro_indices
flow_data.time_index
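The exact matching rules FlowIO uses are not spelled out here. The sketch below only illustrates the idea with a simple label-prefix heuristic, which is an assumption of this example.

`classify_channels` is hypothetical. Any label that matches neither scatter nor time is treated as fluorescence.

```python
def classify_channels(pnn_labels):
    """Split channel indices into scatter, fluorescence and time (heuristic sketch)."""
    scatter, fluoro, time_index = [], [], None
    for i, label in enumerate(pnn_labels):
        upper = label.strip().upper()
        if upper.startswith(("FSC", "SSC")):
            scatter.append(i)
        elif upper == "TIME":
            time_index = i
        else:
            # Assumption: everything else is a fluorescence channel
            fluoro.append(i)
    return scatter, fluoro, time_index

print(classify_channels(["FSC-A", "SSC-A", "FL1-A", "PE-A", "Time"]))
# ([0, 1], [2, 3], 4)
```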

Data Preprocessing

When calling as_array(preprocess=True), FlowIO applies:

Gain scaling: Divide linear channel values by their $PnG amplifier gain
Logarithmic transformation: Convert log-amplified channels using $PnE (value = f2 * 10^(f1 * x / PnR))
Time scaling: Multiply time channel values by $TIMESTEP to obtain seconds

To access raw, unprocessed data: as_array(preprocess=False)
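The per-channel conversions can be written out following the FCS 3.1 formulas:

A channel with $PnE = f1,f2 and f1 > 0 maps a channel value xc to f2 * 10^(f1 * xc / $PnR).
A linear channel is divided by its $PnG gain.
The time channel is multiplied by $TIMESTEP.

The sketch below applies those formulas; it is not FlowIO's code. `scale_channel_value` is hypothetical. Treating f2 = 0 as 1 (seen in older files) is a common convention assumed here, and FlowIO may handle it differently.

```python
def scale_channel_value(xc, pne=(0.0, 0.0), pnr=1024, png=1.0, timestep=None):
    """Convert one raw channel value to a scale value (sketch of the FCS 3.1 formulas)."""
    f1, f2 = pne
    if timestep is not None:
        # Time channel: $TIMESTEP converts time units to seconds
        return xc * timestep
    if f1 > 0:
        # Log-amplified channel; f2 == 0 is invalid in FCS 3.1 but appears in
        # older files, and treating it as 1 is an assumption of this sketch
        f2 = f2 or 1.0
        return f2 * 10 ** (f1 * xc / pnr)
    # Linear channel: divide by the amplifier gain
    return xc / png

print(scale_channel_value(512, pne=(4.0, 1.0), pnr=1024))  # 100.0
print(scale_channel_value(200, png=2.0))                    # 100.0
print(scale_channel_value(1000, timestep=0.5))              # 500.0
```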

Best Practices

Memory efficiency: Use only_text=True when only metadata is needed
Error handling: Wrap file operations in try-except blocks for FCSParsingError
Multi-dataset files: Always use read_multiple_data_sets() if unsure about dataset count
Offset issues: If encountering offset errors, try ignore_offset_discrepancy=True
Channel selection: Use null_channel_list to exclude unwanted channels during parsing

Integration with FlowKit

For advanced flow cytometry analysis, including compensation, gating, and GatingML support, consider using the FlowKit library alongside FlowIO. FlowKit provides higher-level abstractions built on top of FlowIO's file parsing capabilities.

Example Workflows

Basic File Reading

from flowio import FlowData

# Read FCS file
flow = FlowData('experiment.fcs')

# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")

# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")

Metadata Extraction

from flowio import FlowData

flow = FlowData('sample.fcs', only_text=True)

# Access metadata (FlowIO stores TEXT keywords lowercased, without the leading '$')
print(f"Acquisition date: {flow.text.get('date', 'N/A')}")
print(f"Instrument: {flow.text.get('cyt', 'N/A')}")

# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")

Creating New FCS Files

import numpy as np
from flowio import create_fcs

# Generate or process data
data = np.random.rand(5000, 3) * 1000

# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']

# Create FCS file
create_fcs('output.fcs',
           data,
           channels,
           opt_channel_names=stains,
           metadata={
               '$SRC': 'Python script',
               '$DATE': '19-OCT-2025'
           })

Processing Multi-Dataset Files

from flowio import read_multiple_data_sets

# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')

# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")

    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")

Modifying and Re-exporting

from flowio import FlowData, create_fcs

# Read original file
flow = FlowData('original.fcs')

# Get event data
events = flow.as_array(preprocess=False)

# Modify data (example: apply custom transformation)
events[:, 0] = events[:, 0] * 1.5  # Scale first channel

# Note: Currently, FlowIO doesn't support direct modification of event data
# For modifications, use create_fcs() instead:

create_fcs('modified.fcs',
           events,
           flow.pnn_labels,
           opt_channel_names=flow.pns_labels,
           metadata=flow.text)
