# FlowIO API Reference

## Overview

FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.

## Installation

```bash
pip install flowio
```

Supports Python 3.9 and later.

## Core Classes

### FlowData

The primary class for working with FCS files.

#### Constructor

```python
FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)
```

**Parameters:**
- `fcs_file`: File path (str), Path object, or file handle
- `ignore_offset_error` (bool): Ignore offset errors (default: False)
- `ignore_offset_discrepancy` (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
- `use_header_offsets` (bool): Use HEADER section offsets instead of TEXT section offsets (default: False)
- `only_text` (bool): Only parse the TEXT segment, skip DATA and ANALYSIS (default: False)
- `nextdata_offset` (int): Byte offset for reading multi-dataset files
- `null_channel_list` (list): List of PnN labels for null channels to exclude

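The keyword options above can be combined, for example to do a quick metadata-only read or to drop an unwanted channel while parsing. The snippet below is a minimal sketch that assumes a local file named `sample.fcs` and a channel whose PnN label is `'Null'` (both purely illustrative).

```python
from flowio import FlowData

# Parse only the TEXT segment; no event data is read.
meta_only = FlowData('sample.fcs', only_text=True)
print(meta_only.version, meta_only.channel_count)

# Read the full file but exclude a channel by its PnN label.
# 'Null' is a placeholder label; use one that exists in your file.
trimmed = FlowData('sample.fcs', null_channel_list=['Null'])
print(trimmed.pnn_labels)
```
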
#### Attributes

**File Information:**
- `name`: Name of the FCS file
- `file_size`: Size of the file in bytes
- `version`: FCS version (e.g., '3.0', '3.1')
- `header`: Dictionary containing HEADER segment information
- `data_type`: Type of data format ('I', 'F', 'D', 'A')

**Channel Information:**
- `channel_count`: Number of channels in the dataset
- `channels`: Dictionary mapping channel numbers to channel info
- `pnn_labels`: List of PnN (short channel name) labels
- `pns_labels`: List of PnS (descriptive stain name) labels
- `pnr_values`: List of PnR (range) values for each channel
- `fluoro_indices`: List of indices for fluorescence channels
- `scatter_indices`: List of indices for scatter channels
- `time_index`: Index of the time channel (or None)
- `null_channels`: List of null channel indices

**Event Data:**
- `event_count`: Number of events (rows) in the dataset
- `events`: Raw event data as bytes

**Metadata:**
- `text`: Dictionary of TEXT segment key-value pairs
- `analysis`: Dictionary of ANALYSIS segment key-value pairs (if present)

#### Methods

##### as_array()

```python
as_array(preprocess=True)
```

Return event data as a 2-D NumPy array.

**Parameters:**
- `preprocess` (bool): Apply gain, logarithmic, and time scaling transformations (default: True)

**Returns:**
- NumPy ndarray with shape (event_count, channel_count)

**Example:**
```python
flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array()  # Preprocessed data
raw_array = flow_data.as_array(preprocess=False)  # Raw data
```

##### write_fcs()

```python
write_fcs(filename, metadata=None)
```

Export the FlowData instance as a new FCS file.

**Parameters:**
- `filename` (str): Output file path
- `metadata` (dict): Optional dictionary of TEXT segment keywords to add/update

**Example:**
```python
flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})
```

**Note:** Exports as FCS 3.1 with single-precision floating-point data.

## Utility Functions

### read_multiple_data_sets()

```python
read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)
```

Read all datasets from an FCS file containing multiple datasets.

**Parameters:**
- Same as FlowData constructor (except `nextdata_offset`)

**Returns:**
- List of FlowData instances, one for each dataset

**Example:**
```python
from flowio import read_multiple_data_sets

datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")
```

### create_fcs()

```python
create_fcs(filename,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata=None)
```

Create a new FCS file from event data.

**Parameters:**
- `filename` (str): Output file path
- `event_data` (ndarray): 2-D NumPy array of event data (rows=events, columns=channels)
- `channel_names` (list): List of PnN (short) channel names
- `opt_channel_names` (list): Optional list of PnS (descriptive) channel names
- `metadata` (dict): Optional dictionary of TEXT segment keywords

**Example:**
```python
import numpy as np
from flowio import create_fcs

# Create synthetic data
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']

create_fcs('synthetic.fcs',
           events,
           channels,
           opt_channel_names=opt_channels,
           metadata={'$SRC': 'Synthetic data'})
```

## Exception Classes

### FlowIOWarning

Generic warning class for non-critical issues.

### PnEWarning

Warning raised when PnE values are invalid during FCS file creation.

### FlowIOException

Base exception class for FlowIO errors.

### FCSParsingError

Raised when there are issues parsing an FCS file.

### DataOffsetDiscrepancyError

Raised when the HEADER and TEXT sections provide different byte offsets for data segments.

**Workaround:** Use the `ignore_offset_discrepancy=True` parameter when creating the FlowData instance.

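As a rough illustration of that workaround, the sketch below retries a failed load with `ignore_offset_discrepancy=True`. It assumes the exception class is importable from a `flowio.exceptions` module and that `sample.fcs` exists; adjust the import to match your FlowIO version.

```python
from flowio import FlowData
# Import path is an assumption; adjust if the exceptions live elsewhere.
from flowio.exceptions import DataOffsetDiscrepancyError

try:
    flow = FlowData('sample.fcs')
except DataOffsetDiscrepancyError:
    # HEADER and TEXT disagree on the data offsets; retry while
    # ignoring the discrepancy, as described above.
    flow = FlowData('sample.fcs', ignore_offset_discrepancy=True)

print(flow.event_count)
```
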
### MultipleDataSetsError

Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.

**Solution:** Use the `read_multiple_data_sets()` function instead.

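A minimal sketch of that fallback, again assuming the exception class is importable from `flowio.exceptions` and that `unknown.fcs` is a local file:

```python
from flowio import FlowData, read_multiple_data_sets
# Import path is an assumption; adjust to match your FlowIO version.
from flowio.exceptions import MultipleDataSetsError

try:
    datasets = [FlowData('unknown.fcs')]
except MultipleDataSetsError:
    # The file holds more than one dataset; read them all instead.
    datasets = read_multiple_data_sets('unknown.fcs')

print(f"Loaded {len(datasets)} dataset(s)")
```
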
## FCS File Structure Reference

FCS files consist of four segments:

1. **HEADER**: Contains FCS version and byte locations of other segments
2. **TEXT**: Key-value metadata pairs (delimited format)
3. **DATA**: Raw event data (binary, floating-point, or ASCII)
4. **ANALYSIS** (optional): Results from data processing

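Each segment surfaces through the FlowData attributes listed earlier. The snippet below is a small sketch of that mapping, assuming a local `sample.fcs`; how an absent ANALYSIS segment is represented may vary, so the last line hedges with a fallback.

```python
from flowio import FlowData

flow = FlowData('sample.fcs')

print(flow.header)          # HEADER: version and byte offsets of the other segments
print(len(flow.text))       # TEXT: number of keyword/value pairs
print(len(flow.events))     # DATA: raw event data (use as_array() for a NumPy view)
print(flow.analysis or {})  # ANALYSIS: empty if the optional segment is absent
```
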
### Common TEXT Segment Keywords

- `$BEGINDATA`, `$ENDDATA`: Byte offsets for DATA segment
- `$BEGINANALYSIS`, `$ENDANALYSIS`: Byte offsets for ANALYSIS segment
- `$BYTEORD`: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
- `$DATATYPE`: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
- `$MODE`: Data mode ('L'=list mode, most common)
- `$NEXTDATA`: Offset to next dataset (0 if single dataset)
- `$PAR`: Number of parameters (channels)
- `$TOT`: Total number of events
- `PnN`: Short name for parameter n
- `PnS`: Descriptive stain name for parameter n
- `PnR`: Range (max value) for parameter n
- `PnE`: Amplification exponent for parameter n (format: "f1,f2", where f1 is the number of log decades and f2 is the linear value at the bottom of the scale; "0,0" indicates linear amplification)
- `PnG`: Amplification gain for parameter n

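Several of these keywords are mirrored by FlowData attributes, so everyday code rarely needs to touch the TEXT dictionary directly. A short sketch, assuming a local `sample.fcs`:

```python
from flowio import FlowData

flow = FlowData('sample.fcs')

# $PAR and $TOT correspond to channel_count and event_count.
print(flow.channel_count, flow.event_count)

# $DATATYPE corresponds to data_type ('I', 'F', 'D', or 'A').
print(flow.data_type)

# PnN, PnS, and PnR are collected into per-channel lists.
for pnn, pns, pnr in zip(flow.pnn_labels, flow.pns_labels, flow.pnr_values):
    print(pnn, pns, pnr)
```
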
## Channel Types

FlowIO automatically categorizes channels:

- **Scatter channels**: FSC (forward scatter), SSC (side scatter)
- **Fluorescence channels**: FL1, FL2, FITC, PE, etc.
- **Time channel**: Usually labeled "Time"

Access indices via:
- `flow_data.scatter_indices`
- `flow_data.fluoro_indices`
- `flow_data.time_index`

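These index lists can be used to slice the event array by channel type. The following is a minimal sketch that assumes the indices are column positions in the array returned by `as_array()` and that `sample.fcs` exists.

```python
from flowio import FlowData

flow = FlowData('sample.fcs')
events = flow.as_array()

# Select only the scatter and fluorescence columns.
scatter = events[:, flow.scatter_indices]
fluoro = events[:, flow.fluoro_indices]
print(scatter.shape, fluoro.shape)

# The time channel may be absent, so check for None first.
if flow.time_index is not None:
    time_values = events[:, flow.time_index]
    print(time_values[:5])
```
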
## Data Preprocessing

When calling `as_array(preprocess=True)`, FlowIO applies:

1. **Gain scaling**: Multiply by the PnG value
2. **Logarithmic transformation**: Apply the PnE exponential transformation if present
3. **Time scaling**: Convert time values to appropriate units

To access raw, unprocessed data, call `as_array(preprocess=False)`.

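To see which channels preprocessing actually changes, the two arrays can be compared column by column; a small sketch, assuming `sample.fcs` exists:

```python
import numpy as np
from flowio import FlowData

flow = FlowData('sample.fcs')
processed = flow.as_array()             # gain, log, and time scaling applied
raw = flow.as_array(preprocess=False)   # values as stored in the DATA segment

for i, label in enumerate(flow.pnn_labels):
    changed = not np.allclose(processed[:, i], raw[:, i])
    print(f"{label}: {'scaled' if changed else 'unchanged'}")
```
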
## Best Practices

1. **Memory efficiency**: Use `only_text=True` when only metadata is needed
2. **Error handling**: Wrap file operations in try-except blocks for FCSParsingError
3. **Multi-dataset files**: Always use `read_multiple_data_sets()` if unsure about dataset count
4. **Offset issues**: If encountering offset errors, try `ignore_offset_discrepancy=True`
5. **Channel selection**: Use `null_channel_list` to exclude unwanted channels during parsing

## Integration with FlowKit

For advanced flow cytometry analysis, including compensation, gating, and GatingML support, consider using the FlowKit library alongside FlowIO. FlowKit provides higher-level abstractions built on top of FlowIO's file parsing capabilities.

## Example Workflows

### Basic File Reading

```python
from flowio import FlowData

# Read FCS file
flow = FlowData('experiment.fcs')

# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")

# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")
```

### Metadata Extraction

```python
from flowio import FlowData

flow = FlowData('sample.fcs', only_text=True)

# Access metadata
print(f"Acquisition date: {flow.text.get('$DATE', 'N/A')}")
print(f"Instrument: {flow.text.get('$CYT', 'N/A')}")

# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")
```

### Creating New FCS Files

```python
import numpy as np
from flowio import create_fcs

# Generate or process data
data = np.random.rand(5000, 3) * 1000

# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']

# Create FCS file
create_fcs('output.fcs',
           data,
           channels,
           opt_channel_names=stains,
           metadata={
               '$SRC': 'Python script',
               '$DATE': '19-OCT-2025'
           })
```

### Processing Multi-Dataset Files

```python
from flowio import read_multiple_data_sets

# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')

# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")

    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")
```

### Modifying and Re-exporting

```python
from flowio import FlowData, create_fcs

# Read original file
flow = FlowData('original.fcs')

# Get event data
events = flow.as_array(preprocess=False)

# Modify data (example: apply custom transformation)
events[:, 0] = events[:, 0] * 1.5  # Scale first channel

# Note: Currently, FlowIO doesn't support direct modification of event data
# For modifications, use create_fcs() instead:
create_fcs('modified.fcs',
           events,
           flow.pnn_labels,
           opt_channel_names=flow.pns_labels,
           metadata=flow.text)
```