Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions

View File

@@ -0,0 +1,633 @@
# Spectroscopy and Analytical Chemistry File Formats Reference
This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.
## NMR Spectroscopy
### .fid - NMR Free Induction Decay
**Description:** Raw time-domain NMR data from Bruker, Agilent, JEOL
**Typical Data:** Complex time-domain signal
**Use Cases:** NMR spectroscopy, structure elucidation
**Python Libraries:**
- `nmrglue`: `nmrglue.bruker.read_fid('fid')` or `nmrglue.varian.read_fid('fid')`
- `nmrstarlib`: NMR data handling
**EDA Approach:**
- Time-domain signal decay
- Sampling rate and acquisition time
- Number of data points
- Signal-to-noise ratio estimation
- Baseline drift assessment
- Digital filter effects
- Acquisition parameter validation
- Apodization function selection
### .ft / .ft1 / .ft2 - NMR Frequency Domain
**Description:** Fourier-transformed NMR spectrum
**Typical Data:** Processed frequency-domain data
**Use Cases:** NMR analysis, peak integration
**Python Libraries:**
- `nmrglue`: Frequency domain reading
- Custom processing pipelines
**EDA Approach:**
- Peak picking and integration
- Chemical shift range
- Baseline correction quality
- Phase correction assessment
- Reference peak identification
- Spectral resolution
- Artifacts detection
- Multiplicity analysis
### .1r / .2rr - Bruker NMR Processed Data
**Description:** Bruker processed spectrum (real part)
**Typical Data:** 1D or 2D processed NMR spectra
**Use Cases:** NMR data analysis with Bruker software
**Python Libraries:**
- `nmrglue`: Bruker format support
**EDA Approach:**
- Processing parameters review
- Window function effects
- Zero-filling assessment
- Linear prediction validation
- Spectral artifacts
### .dx - NMR JCAMP-DX
**Description:** JCAMP-DX format for NMR
**Typical Data:** Standardized NMR spectrum
**Use Cases:** Data exchange between software
**Python Libraries:**
- `jcamp`: JCAMP reader
- `nmrglue`: Can import JCAMP
**EDA Approach:**
- Format compliance
- Metadata completeness
- Peak table validation
- Integration values
- Compound identification info
### .mnova - Mnova Format
**Description:** Mestrelab Research Mnova format
**Typical Data:** NMR data with processing info
**Use Cases:** Mnova software workflows
**Python Libraries:**
- `nmrglue`: Limited Mnova support
- Conversion tools to standard formats
**EDA Approach:**
- Multi-spectrum handling
- Processing pipeline review
- Quantification data
- Structure assignment
## Mass Spectrometry
### .mzML - Mass Spectrometry Markup Language
**Description:** Standard XML-based MS format
**Typical Data:** MS spectra, chromatograms, metadata
**Use Cases:** Proteomics, metabolomics, lipidomics
**Python Libraries:**
- `pymzml`: `pymzml.run.Reader('file.mzML')`
- `pyteomics.mzml`: `pyteomics.mzml.read('file.mzML')`
- `MSFileReader`: Various wrappers
**EDA Approach:**
- Scan count and MS level distribution
- Retention time range and TIC
- m/z range and resolution
- Precursor ion selection
- Fragmentation patterns
- Instrument configuration
- Quality control metrics
- Data completeness
### .mzXML - Mass Spectrometry XML
**Description:** Legacy XML MS format
**Typical Data:** Mass spectra and chromatograms
**Use Cases:** Proteomics workflows (older)
**Python Libraries:**
- `pyteomics.mzxml`
- `pymzml`: Can read mzXML
**EDA Approach:**
- Similar to mzML
- Version compatibility
- Conversion quality assessment
### .mzData - mzData Format
**Description:** Legacy PSI MS format
**Typical Data:** Mass spectrometry data
**Use Cases:** Legacy data archives
**Python Libraries:**
- `pyteomics`: Limited support
- Conversion to mzML recommended
**EDA Approach:**
- Format conversion validation
- Data completeness
- Metadata extraction
### .raw - Vendor Raw Files (Thermo, Agilent, Bruker)
**Description:** Proprietary instrument data
**Typical Data:** Raw mass spectra and metadata
**Use Cases:** Direct instrument output
**Python Libraries:**
- `pymsfilereader`: Thermo RAW files
- `ThermoRawFileParser`: CLI wrapper
- Vendor-specific APIs
**EDA Approach:**
- Method parameter extraction
- Instrument performance metrics
- Calibration status
- Scan function analysis
- MS/MS quality metrics
- Dynamic exclusion evaluation
### .d - Agilent Data Directory
**Description:** Agilent MS data folder
**Typical Data:** LC-MS, GC-MS with methods
**Use Cases:** Agilent MassHunter workflows
**Python Libraries:**
- Community parsers
- Chemstation integration
**EDA Approach:**
- Directory structure validation
- Method parameters
- Calibration curves
- Sequence metadata
- Signal quality metrics
### .wiff - AB SCIEX Data
**Description:** AB SCIEX/SCIEX instrument format
**Typical Data:** Mass spectrometry data
**Use Cases:** SCIEX instrument workflows
**Python Libraries:**
- Vendor SDKs (limited Python support)
- Conversion tools
**EDA Approach:**
- Experiment type identification
- Scan properties
- Quantitation data
- Multi-experiment structure
### .mgf - Mascot Generic Format
**Description:** Peak list format for MS/MS
**Typical Data:** Precursor and fragment masses
**Use Cases:** Peptide identification, database searches
**Python Libraries:**
- `pyteomics.mgf`: `pyteomics.mgf.read('file.mgf')`
- `pyopenms`: MGF support
**EDA Approach:**
- Spectrum count
- Charge state distribution
- Precursor m/z and intensity
- Fragment peak count
- Mass accuracy
- Title and metadata parsing
### .pkl - Peak List (Binary)
**Description:** Binary peak list format
**Typical Data:** Serialized MS/MS spectra
**Use Cases:** Software-specific storage
**Python Libraries:**
- `pickle`: Standard deserialization
- `pyteomics`: PKL support
**EDA Approach:**
- Data structure inspection
- Conversion to standard formats
- Metadata preservation
### .ms1 / .ms2 - MS1/MS2 Formats
**Description:** Simple text format for MS data
**Typical Data:** MS1 and MS2 scans
**Use Cases:** Database searching, proteomics
**Python Libraries:**
- `pyteomics.ms1` and `ms2`
- Simple text parsing
**EDA Approach:**
- Scan count by level
- Retention time series
- Charge state analysis
- m/z range coverage
### .pepXML - Peptide XML
**Description:** TPP peptide identification format
**Typical Data:** Peptide-spectrum matches
**Use Cases:** Proteomics search results
**Python Libraries:**
- `pyteomics.pepxml`
**EDA Approach:**
- Search result statistics
- Score distribution
- Modification analysis
- FDR assessment
- Enzyme specificity
### .protXML - Protein XML
**Description:** TPP protein inference format
**Typical Data:** Protein identifications
**Use Cases:** Proteomics protein-level results
**Python Libraries:**
- `pyteomics.protxml`
**EDA Approach:**
- Protein group analysis
- Coverage statistics
- Confidence scoring
- Parsimony analysis
### .msp - NIST MS Search Format
**Description:** NIST spectral library format
**Typical Data:** Reference mass spectra
**Use Cases:** Spectral library searching
**Python Libraries:**
- `matchms`: Spectral library handling
- Custom parsers
**EDA Approach:**
- Library size and coverage
- Metadata completeness
- Peak count statistics
- Compound annotation quality
## Infrared and Raman Spectroscopy
### .spc - Galactic SPC
**Description:** Thermo Galactic spectroscopy format
**Typical Data:** IR, Raman, UV-Vis spectra
**Use Cases:** Various spectroscopy instruments
**Python Libraries:**
- `spc`: `spc.File('file.spc')`
- `specio`: Multi-format reader
**EDA Approach:**
- Wavenumber/wavelength range
- Data point density
- Multi-spectrum handling
- Baseline characteristics
- Peak identification
- Absorbance/transmittance mode
- Instrument information
### .spa - Thermo Nicolet
**Description:** Thermo Fisher FTIR format
**Typical Data:** FTIR spectra
**Use Cases:** OMNIC software data
**Python Libraries:**
- Custom binary parsers
- Conversion to JCAMP or SPC
**EDA Approach:**
- Interferogram vs spectrum
- Background spectrum validation
- Atmospheric compensation
- Resolution and scan number
- Sample information
### .0 - Bruker OPUS
**Description:** Bruker OPUS FTIR format (numbered files)
**Typical Data:** FTIR spectra and metadata
**Use Cases:** Bruker FTIR instruments
**Python Libraries:**
- `brukeropusreader`: OPUS format parser
- `specio`: OPUS support
**EDA Approach:**
- Multiple block types (AB, ScSm, etc.)
- Sample and reference spectra
- Instrument parameters
- Optical path configuration
- Beam splitter and detector info
### .dpt - Data Point Table
**Description:** Simple XY data format
**Typical Data:** Generic spectroscopic data
**Use Cases:** Renishaw Raman, generic exports
**Python Libraries:**
- `pandas`: CSV-like reading
- Text parsing
**EDA Approach:**
- X-axis type (wavelength, wavenumber, Raman shift)
- Y-axis units (intensity, absorbance, etc.)
- Data point spacing
- Header information
- Multi-column data handling
### .wdf - Renishaw Raman
**Description:** Renishaw WiRE data format
**Typical Data:** Raman spectra and maps
**Use Cases:** Renishaw Raman microscopy
**Python Libraries:**
- `renishawWiRE`: WDF reader
- Custom parsers for WDF format
**EDA Approach:**
- Spectral vs mapping data
- Laser wavelength
- Accumulation and exposure time
- Spatial coordinates (mapping)
- Z-scan data
- Baseline and cosmic ray correction
### .txt (Spectroscopy)
**Description:** Generic text export from instruments
**Typical Data:** Wavelength/wavenumber and intensity
**Use Cases:** Universal data exchange
**Python Libraries:**
- `pandas`: Text file reading
- `numpy`: Simple array loading
**EDA Approach:**
- Delimiter and format detection
- Header parsing
- Units identification
- Multiple spectrum handling
- Metadata extraction from comments
## UV-Visible Spectroscopy
### .asd / .asc - ASD Binary/ASCII
**Description:** ASD FieldSpec spectroradiometer
**Typical Data:** Hyperspectral UV-Vis-NIR data
**Use Cases:** Remote sensing, reflectance spectroscopy
**Python Libraries:**
- `spectral.io.asd`: ASD format support
- Custom parsers
**EDA Approach:**
- Wavelength range (UV to NIR)
- Reference spectrum validation
- Dark current correction
- Integration time
- GPS metadata (if present)
- Reflectance vs radiance
### .sp - Perkin Elmer
**Description:** Perkin Elmer UV/Vis format
**Typical Data:** UV-Vis spectrophotometer data
**Use Cases:** PE Lambda instruments
**Python Libraries:**
- Custom parsers
- Conversion to standard formats
**EDA Approach:**
- Scan parameters
- Baseline correction
- Multi-wavelength scans
- Time-based measurements
- Sample/reference handling
### .csv (Spectroscopy)
**Description:** CSV export from UV-Vis instruments
**Typical Data:** Wavelength and absorbance/transmittance
**Use Cases:** Universal format for UV-Vis data
**Python Libraries:**
- `pandas`: Native CSV support
**EDA Approach:**
- Lambda max identification
- Beer's law compliance
- Baseline offset
- Path length correction
- Concentration calculations
## X-ray and Diffraction
### .cif - Crystallographic Information File
**Description:** Crystal structure and diffraction data
**Typical Data:** Unit cell, atomic positions, structure factors
**Use Cases:** Crystallography, materials science
**Python Libraries:**
- `gemmi`: `gemmi.cif.read_file('file.cif')`
- `PyCifRW`: CIF reading/writing
- `pymatgen`: Materials structure analysis
**EDA Approach:**
- Crystal system and space group
- Unit cell parameters
- Atomic positions and occupancy
- Thermal parameters
- R-factors and refinement quality
- Completeness and redundancy
- Structure validation
### .hkl - Reflection Data
**Description:** Miller indices and intensities
**Typical Data:** Integrated diffraction intensities
**Use Cases:** Crystallographic refinement
**Python Libraries:**
- Custom parsers (format dependent)
- Crystallography packages (CCP4, etc.)
**EDA Approach:**
- Resolution range
- Completeness by shell
- I/sigma distribution
- Systematic absences
- Twinning detection
- Wilson plot
### .mtz - MTZ Format (CCP4)
**Description:** Binary crystallographic data
**Typical Data:** Reflections, phases, structure factors
**Use Cases:** Macromolecular crystallography
**Python Libraries:**
- `gemmi`: MTZ support
- `cctbx`: Comprehensive crystallography
**EDA Approach:**
- Column types and data
- Resolution limits
- R-factors (Rwork, Rfree)
- Phase probability distribution
- Map coefficients
- Batch information
### .xy / .xye - Powder Diffraction
**Description:** 2-theta vs intensity data
**Typical Data:** Powder X-ray diffraction patterns
**Use Cases:** Phase identification, Rietveld refinement
**Python Libraries:**
- `pandas`: Simple XY reading
- `pymatgen`: XRD pattern analysis
**EDA Approach:**
- 2-theta range
- Peak positions and intensities
- Background modeling
- Peak width analysis (strain/size)
- Phase identification via matching
- Preferred orientation effects
### .raw (XRD)
**Description:** Vendor-specific XRD raw data
**Typical Data:** XRD patterns with metadata
**Use Cases:** Bruker, PANalytical, Rigaku instruments
**Python Libraries:**
- Vendor-specific parsers
- Conversion tools
**EDA Approach:**
- Scan parameters (step size, time)
- Sample alignment
- Incident beam setup
- Detector configuration
- Background scan validation
### .gsa / .gsas - GSAS Format
**Description:** General Structure Analysis System
**Typical Data:** Powder diffraction for Rietveld
**Use Cases:** Rietveld refinement
**Python Libraries:**
- GSAS-II Python interface
- Custom parsers
**EDA Approach:**
- Histogram data
- Instrument parameters
- Phase information
- Refinement constraints
- Profile function parameters
## Electron Spectroscopy
### .vms - VG Scienta
**Description:** VG Scienta spectrometer format
**Typical Data:** XPS, UPS, ARPES spectra
**Use Cases:** Photoelectron spectroscopy
**Python Libraries:**
- Custom parsers for VMS
- `specio`: Multi-format support
**EDA Approach:**
- Binding energy calibration
- Pass energy and resolution
- Photoelectron line identification
- Satellite peak analysis
- Background subtraction quality
- Fermi edge position
### .spe - WinSpec/SPE Format
**Description:** Princeton Instruments/Roper Scientific
**Typical Data:** CCD spectra, Raman, PL
**Use Cases:** Spectroscopy with CCD detectors
**Python Libraries:**
- `spe2py`: SPE file reader
- `spe_loader`: Alternative parser
**EDA Approach:**
- CCD frame analysis
- Wavelength calibration
- Dark frame subtraction
- Cosmic ray identification
- Readout noise
- Accumulation statistics
### .pxt - Princeton PTI
**Description:** Photon Technology International
**Typical Data:** Fluorescence, phosphorescence spectra
**Use Cases:** Fluorescence spectroscopy
**Python Libraries:**
- Custom parsers
- Text-based format variants
**EDA Approach:**
- Excitation and emission spectra
- Quantum yield calculations
- Time-resolved measurements
- Temperature-dependent data
- Correction factors applied
### .dat (Spectroscopy Generic)
**Description:** Generic binary or text spectroscopy data
**Typical Data:** Various spectroscopic measurements
**Use Cases:** Many instruments use .dat extension
**Python Libraries:**
- Format-specific identification needed
- `numpy`, `pandas` for known formats
**EDA Approach:**
- Format detection (binary vs text)
- Header identification
- Data structure inference
- Units and axis labels
- Instrument signature detection
## Chromatography
### .chrom - Chromatogram Data
**Description:** Generic chromatography format
**Typical Data:** Retention time vs signal
**Use Cases:** HPLC, GC, LC-MS
**Python Libraries:**
- Vendor-specific parsers
- `pandas` for text exports
**EDA Approach:**
- Retention time range
- Peak detection and integration
- Baseline drift
- Resolution between peaks
- Signal-to-noise ratio
- Tailing factor
### .ch - ChemStation
**Description:** Agilent ChemStation format
**Typical Data:** Chromatograms and method parameters
**Use Cases:** Agilent HPLC and GC systems
**Python Libraries:**
- `agilent-chemstation`: Community tools
- Binary format parsers
**EDA Approach:**
- Method validation
- Integration parameters
- Calibration curve
- Sample sequence information
- Instrument status
### .arw - Empower (Waters)
**Description:** Waters Empower format
**Typical Data:** UPLC/HPLC chromatograms
**Use Cases:** Waters instrument data
**Python Libraries:**
- Vendor tools (limited Python access)
- Database extraction tools
**EDA Approach:**
- Audit trail information
- Processing methods
- Compound identification
- Quantitation results
- System suitability tests
### .lcd - Shimadzu LabSolutions
**Description:** Shimadzu chromatography format
**Typical Data:** GC/HPLC data
**Use Cases:** Shimadzu instruments
**Python Libraries:**
- Vendor-specific parsers
**EDA Approach:**
- Method parameters
- Peak purity analysis
- Spectral data (if PDA)
- Quantitative results
## Other Analytical Techniques
### .dta - DSC/TGA Data
**Description:** Thermal analysis data (TA Instruments)
**Typical Data:** Temperature vs heat flow or mass
**Use Cases:** Differential scanning calorimetry, thermogravimetry
**Python Libraries:**
- Custom parsers for TA formats
- `pandas` for exported data
**EDA Approach:**
- Transition temperature identification
- Enthalpy calculations
- Mass loss steps
- Heating rate effects
- Baseline determination
- Purity assessment
### .run - ICP-MS/ICP-OES
**Description:** Elemental analysis data
**Typical Data:** Element concentrations or counts
**Use Cases:** Inductively coupled plasma MS/OES
**Python Libraries:**
- Vendor-specific tools
- Custom parsers
**EDA Approach:**
- Element detection and quantitation
- Internal standard performance
- Spike recovery
- Dilution factor corrections
- Isotope ratios
- LOD/LOQ calculations
### .exp - Electrochemistry Data
**Description:** Electrochemical experiment data
**Typical Data:** Potential vs current or charge
**Use Cases:** Cyclic voltammetry, chronoamperometry
**Python Libraries:**
- Custom parsers per instrument (CHI, Gamry, etc.)
- `galvani`: Biologic EC-Lab files
**EDA Approach:**
- Redox peak identification
- Peak potential and current
- Scan rate effects
- Electron transfer kinetics
- Background subtraction
- Capacitance calculations