# FlowIO API Reference
## Overview
FlowIO is a Python library for reading and writing Flow Cytometry Standard (FCS) files. It supports FCS versions 2.0, 3.0, and 3.1 with minimal dependencies.
## Installation
```bash
pip install flowio
```
Supports Python 3.9 and later.
## Core Classes
### FlowData
The primary class for working with FCS files.
#### Constructor
```python
FlowData(fcs_file,
         ignore_offset_error=False,
         ignore_offset_discrepancy=False,
         use_header_offsets=False,
         only_text=False,
         nextdata_offset=None,
         null_channel_list=None)
```
**Parameters:**
- `fcs_file`: File path (str), Path object, or file handle
- `ignore_offset_error` (bool): Ignore offset errors (default: False)
- `ignore_offset_discrepancy` (bool): Ignore offset discrepancies between HEADER and TEXT sections (default: False)
- `use_header_offsets` (bool): Use HEADER section offsets instead of TEXT section (default: False)
- `only_text` (bool): Only parse the TEXT segment, skip DATA and ANALYSIS (default: False)
- `nextdata_offset` (int): Byte offset for reading multi-dataset files
- `null_channel_list` (list): List of PnN labels for null channels to exclude
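
A few illustrative constructor calls combining these options (a sketch, not run here; `sample.fcs` and the `FL9-A` channel label are placeholders):

```python
def parsing_variants():
    from flowio import FlowData
    # Metadata only: skip the DATA and ANALYSIS segments entirely
    meta_only = FlowData('sample.fcs', only_text=True)
    # Tolerate HEADER/TEXT offset disagreements in malformed files
    tolerant = FlowData('sample.fcs', ignore_offset_discrepancy=True)
    # Exclude an unwanted channel while parsing
    trimmed = FlowData('sample.fcs', null_channel_list=['FL9-A'])
    return meta_only, tolerant, trimmed
```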
#### Attributes
**File Information:**
- `name`: Name of the FCS file
- `file_size`: Size of the file in bytes
- `version`: FCS version (e.g., '3.0', '3.1')
- `header`: Dictionary containing HEADER segment information
- `data_type`: Type of data format ('I', 'F', 'D', 'A')
**Channel Information:**
- `channel_count`: Number of channels in the dataset
- `channels`: Dictionary mapping channel numbers to channel info
- `pnn_labels`: List of PnN (short channel name) labels
- `pns_labels`: List of PnS (descriptive stain name) labels
- `pnr_values`: List of PnR (range) values for each channel
- `fluoro_indices`: List of indices for fluorescence channels
- `scatter_indices`: List of indices for scatter channels
- `time_index`: Index of the time channel (or None)
- `null_channels`: List of null channel indices
**Event Data:**
- `event_count`: Number of events (rows) in the dataset
- `events`: Raw event data as bytes
**Metadata:**
- `text`: Dictionary of TEXT segment key-value pairs
- `analysis`: Dictionary of ANALYSIS segment key-value pairs (if present)
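
The per-channel attributes (`pnn_labels`, `pns_labels`, `pnr_values`) are parallel lists, so a quick channel summary can be built by zipping them. The label values below are invented stand-ins for what a parsed file would provide:

```python
# Stand-in values; on a real file these come from a FlowData instance
pnn_labels = ['FSC-A', 'SSC-A', 'FL1-A']
pns_labels = ['Forward Scatter', 'Side Scatter', 'FITC']
pnr_values = [262144, 262144, 1024]

# Channel numbering in FCS keywords starts at 1
for n, (pnn, pns, pnr) in enumerate(zip(pnn_labels, pns_labels, pnr_values), start=1):
    print(f"P{n}: {pnn:8s} ({pns}), range {pnr}")
```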
#### Methods
##### as_array()
```python
as_array(preprocess=True)
```
Return event data as a 2-D NumPy array.
**Parameters:**
- `preprocess` (bool): Apply gain, logarithmic, and time scaling transformations (default: True)
**Returns:**
- NumPy ndarray with shape (event_count, channel_count)
**Example:**
```python
from flowio import FlowData

flow_data = FlowData('sample.fcs')
events_array = flow_data.as_array() # Preprocessed data
raw_array = flow_data.as_array(preprocess=False) # Raw data
```
##### write_fcs()
```python
write_fcs(filename, metadata=None)
```
Export the FlowData instance as a new FCS file.
**Parameters:**
- `filename` (str): Output file path
- `metadata` (dict): Optional dictionary of TEXT segment keywords to add/update
**Example:**
```python
from flowio import FlowData

flow_data = FlowData('sample.fcs')
flow_data.write_fcs('output.fcs', metadata={'$SRC': 'Modified data'})
```
**Note:** Exports as FCS 3.1 with single-precision floating-point data.
## Utility Functions
### read_multiple_data_sets()
```python
read_multiple_data_sets(fcs_file,
                        ignore_offset_error=False,
                        ignore_offset_discrepancy=False,
                        use_header_offsets=False)
```
Read all datasets from an FCS file containing multiple datasets.
**Parameters:**
- Same meanings as the corresponding `FlowData` constructor parameters
**Returns:**
- List of FlowData instances, one for each dataset
**Example:**
```python
from flowio import read_multiple_data_sets
datasets = read_multiple_data_sets('multi_dataset.fcs')
print(f"Found {len(datasets)} datasets")
for i, dataset in enumerate(datasets):
    print(f"Dataset {i}: {dataset.event_count} events")
```
### create_fcs()
```python
create_fcs(filename,
           event_data,
           channel_names,
           opt_channel_names=None,
           metadata=None)
```
Create a new FCS file from event data.
**Parameters:**
- `filename` (str): Output file path
- `event_data` (ndarray): 2-D NumPy array of event data (rows=events, columns=channels)
- `channel_names` (list): List of PnN (short) channel names
- `opt_channel_names` (list): Optional list of PnS (descriptive) channel names
- `metadata` (dict): Optional dictionary of TEXT segment keywords
**Example:**
```python
import numpy as np
from flowio import create_fcs
# Create synthetic data
events = np.random.rand(10000, 5)
channels = ['FSC-A', 'SSC-A', 'FL1-A', 'FL2-A', 'Time']
opt_channels = ['Forward Scatter', 'Side Scatter', 'FITC', 'PE', 'Time']
create_fcs('synthetic.fcs',
           events,
           channels,
           opt_channel_names=opt_channels,
           metadata={'$SRC': 'Synthetic data'})
```
## Exception Classes
### FlowIOWarning
Generic warning class for non-critical issues.
### PnEWarning
Warning raised when PnE values are invalid during FCS file creation.
### FlowIOException
Base exception class for FlowIO errors.
### FCSParsingError
Raised when there are issues parsing an FCS file.
### DataOffsetDiscrepancyError
Raised when the HEADER and TEXT sections provide different byte offsets for data segments.
**Workaround:** Pass `ignore_offset_discrepancy=True` when constructing the `FlowData` instance.
### MultipleDataSetsError
Raised when attempting to read a file with multiple datasets using the standard FlowData constructor.
**Solution:** Use `read_multiple_data_sets()` function instead.
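
These two recoverable errors suggest a natural fallback pattern. A sketch, assuming the exception classes are importable from `flowio.exceptions` (defined but not executed here, since it needs an actual file):

```python
def read_any_fcs(path):
    """Read a single- or multi-dataset FCS file into a list of FlowData."""
    from flowio import FlowData, read_multiple_data_sets
    from flowio.exceptions import MultipleDataSetsError
    try:
        return [FlowData(path)]
    except MultipleDataSetsError:
        # The file holds several datasets; read them all instead
        return read_multiple_data_sets(path)
```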
## FCS File Structure Reference
FCS files consist of four segments:
1. **HEADER**: Contains FCS version and byte locations of other segments
2. **TEXT**: Key-value metadata pairs (delimited format)
3. **DATA**: Raw event data (binary, floating-point, or ASCII)
4. **ANALYSIS** (optional): Results from data processing
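
The HEADER segment has a fixed-width ASCII layout in FCS 3.0/3.1: a 6-byte version string, 4 spaces, then six 8-byte right-justified offsets (TEXT start/stop, DATA start/stop, ANALYSIS start/stop). `FlowData` parses this internally; the stand-alone sketch below, run on a synthetic header, is for illustration only (real files may report 0 for absent ANALYSIS offsets, or 0 for segments too large to fit the 8-byte fields):

```python
def parse_header(raw: bytes) -> dict:
    """Decode the fixed-width FCS HEADER segment (illustrative sketch)."""
    fields = ['text_start', 'text_stop', 'data_start', 'data_stop',
              'analysis_start', 'analysis_stop']
    header = {'version': raw[3:6].decode('ascii')}  # e.g. b'FCS3.1' -> '3.1'
    for i, name in enumerate(fields):
        start = 10 + 8 * i  # offsets begin at byte 10, 8 bytes each
        header[name] = int(raw[start:start + 8])
    return header

# A synthetic 58-byte header for demonstration
fake = b'FCS3.1    ' + b''.join(b'%8d' % n for n in (58, 1024, 1025, 90000, 0, 0))
print(parse_header(fake))
```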
### Common TEXT Segment Keywords
- `$BEGINDATA`, `$ENDDATA`: Byte offsets for DATA segment
- `$BEGINANALYSIS`, `$ENDANALYSIS`: Byte offsets for ANALYSIS segment
- `$BYTEORD`: Byte order (1,2,3,4 for little-endian; 4,3,2,1 for big-endian)
- `$DATATYPE`: Data type ('I'=integer, 'F'=float, 'D'=double, 'A'=ASCII)
- `$MODE`: Data mode ('L'=list mode, most common)
- `$NEXTDATA`: Offset to next dataset (0 if single dataset)
- `$PAR`: Number of parameters (channels)
- `$TOT`: Total number of events
- `$PnN`: Short name for parameter n
- `$PnS`: Descriptive stain name for parameter n
- `$PnR`: Range (max value) for parameter n
- `$PnE`: Log-amplification description for parameter n (format: "f1,f2", where f1 is the number of log decades and f2 is the linear offset; a raw value x scales to f2 * 10^(f1 * x / PnR))
- `$PnG`: Amplification gain for parameter n
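
The TEXT segment itself is a flat delimited string: its first byte is the delimiter, and keys and values then alternate. A hedged sketch of decoding one (real parsers, including FlowIO's, must also handle escaped delimiters written as a doubled delimiter inside a value, which this sketch omits):

```python
def parse_text_segment(segment: bytes) -> dict:
    """Split a delimited FCS TEXT segment into a keyword dict (sketch)."""
    delim = segment[:1]                # first byte defines the delimiter
    tokens = segment[1:].split(delim)
    if tokens and tokens[-1] == b'':   # drop the trailing empty token
        tokens.pop()
    keys = [t.decode('utf-8') for t in tokens[0::2]]
    values = [t.decode('utf-8') for t in tokens[1::2]]
    return dict(zip(keys, values))

text = b'/$PAR/3/$TOT/10000/$DATATYPE/F/$BYTEORD/1,2,3,4/'
print(parse_text_segment(text))
```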
## Channel Types
FlowIO automatically categorizes channels:
- **Scatter channels**: FSC (forward scatter), SSC (side scatter)
- **Fluorescence channels**: FL1, FL2, FITC, PE, etc.
- **Time channel**: Usually labeled "Time"
Access indices via:
- `flow_data.scatter_indices`
- `flow_data.fluoro_indices`
- `flow_data.time_index`
## Data Preprocessing
When calling `as_array(preprocess=True)`, FlowIO applies:
1. **Gain scaling**: Multiply by PnG value
2. **Logarithmic transformation**: Apply PnE exponential transformation if present
3. **Time scaling**: Scale the time channel by the `$TIMESTEP` keyword value
To access raw, unprocessed data: `as_array(preprocess=False)`
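
The per-channel scaling steps above can be sketched as follows. This follows the document's description (PnE log transform for log-amplified channels, gain scaling otherwise); FlowIO's internal implementation may differ in detail:

```python
import numpy as np

def preprocess_channel(x, pne=(0.0, 0.0), png=1.0, pnr=1024):
    """Sketch of per-channel preprocessing: PnE log transform or PnG gain."""
    x = np.asarray(x, dtype=float)
    f1, f2 = pne
    if f1 > 0:
        # Log-amplified channel: value = f2 * 10^(f1 * x / PnR)
        return f2 * 10.0 ** (f1 * x / pnr)
    # Linear channel: apply gain scaling
    return x * png

raw = np.array([0.0, 512.0, 1024.0])
print(preprocess_channel(raw, pne=(4.0, 1.0), pnr=1024))  # 4-decade log scale
```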
## Best Practices
1. **Memory efficiency**: Use `only_text=True` when only metadata is needed
2. **Error handling**: Wrap file operations in try-except blocks for FCSParsingError
3. **Multi-dataset files**: Always use `read_multiple_data_sets()` if unsure about dataset count
4. **Offset issues**: If encountering offset errors, try `ignore_offset_discrepancy=True`
5. **Channel selection**: Use null_channel_list to exclude unwanted channels during parsing
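
Practices 1 and 2 combine into a cheap metadata probe. A sketch, assuming `FCSParsingError` is importable from `flowio.exceptions` (defined but not run here, since it needs an actual file):

```python
def probe_metadata(path):
    """Return the TEXT-segment dict, or None if the file fails to parse."""
    from flowio import FlowData
    from flowio.exceptions import FCSParsingError
    try:
        # only_text=True skips DATA/ANALYSIS, keeping memory use minimal
        return FlowData(path, only_text=True).text
    except FCSParsingError:
        return None
```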
## Integration with FlowKit
For advanced flow cytometry analysis, including compensation, gating, and GatingML support, consider pairing FlowIO with the FlowKit library. FlowKit provides higher-level abstractions built on top of FlowIO's file-parsing capabilities.
## Example Workflows
### Basic File Reading
```python
from flowio import FlowData
# Read FCS file
flow = FlowData('experiment.fcs')
# Print basic info
print(f"Version: {flow.version}")
print(f"Events: {flow.event_count}")
print(f"Channels: {flow.channel_count}")
print(f"Channel names: {flow.pnn_labels}")
# Get event data
events = flow.as_array()
print(f"Data shape: {events.shape}")
```
### Metadata Extraction
```python
from flowio import FlowData
flow = FlowData('sample.fcs', only_text=True)
# Access metadata
print(f"Acquisition date: {flow.text.get('$DATE', 'N/A')}")
print(f"Instrument: {flow.text.get('$CYT', 'N/A')}")
# Channel information
for i, (pnn, pns) in enumerate(zip(flow.pnn_labels, flow.pns_labels)):
    print(f"Channel {i}: {pnn} ({pns})")
```
### Creating New FCS Files
```python
import numpy as np
from flowio import create_fcs
# Generate or process data
data = np.random.rand(5000, 3) * 1000
# Define channels
channels = ['FSC-A', 'SSC-A', 'FL1-A']
stains = ['Forward Scatter', 'Side Scatter', 'GFP']
# Create FCS file
create_fcs('output.fcs',
           data,
           channels,
           opt_channel_names=stains,
           metadata={
               '$SRC': 'Python script',
               '$DATE': '19-OCT-2025'
           })
```
### Processing Multi-Dataset Files
```python
from flowio import read_multiple_data_sets
# Read all datasets
datasets = read_multiple_data_sets('multi.fcs')
# Process each dataset
for i, dataset in enumerate(datasets):
    print(f"\nDataset {i}:")
    print(f"  Events: {dataset.event_count}")
    print(f"  Channels: {dataset.pnn_labels}")
    # Get data array
    events = dataset.as_array()
    mean_values = events.mean(axis=0)
    print(f"  Mean values: {mean_values}")
```
### Modifying and Re-exporting
```python
from flowio import FlowData, create_fcs
# Read original file
flow = FlowData('original.fcs')
# Get raw (unprocessed) event data
events = flow.as_array(preprocess=False)
# Modify data (example: scale the first channel)
events[:, 0] = events[:, 0] * 1.5
# FlowIO does not support modifying event data in place,
# so write the modified array out with create_fcs():
create_fcs('modified.fcs',
           events,
           flow.pnn_labels,
           opt_channel_names=flow.pns_labels,
           metadata=flow.text)
```