18 KiB
Spectroscopy and Analytical Chemistry File Formats Reference
This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.
NMR Spectroscopy
.fid - NMR Free Induction Decay
Description: Raw time-domain NMR data from Bruker, Agilent, JEOL Typical Data: Complex time-domain signal Use Cases: NMR spectroscopy, structure elucidation Python Libraries:
nmrglue:nmrglue.bruker.read_fid('fid')ornmrglue.varian.read_fid('fid')nmrstarlib: NMR data handling EDA Approach:- Time-domain signal decay
- Sampling rate and acquisition time
- Number of data points
- Signal-to-noise ratio estimation
- Baseline drift assessment
- Digital filter effects
- Acquisition parameter validation
- Apodization function selection
.ft / .ft1 / .ft2 - NMR Frequency Domain
Description: Fourier-transformed NMR spectrum Typical Data: Processed frequency-domain data Use Cases: NMR analysis, peak integration Python Libraries:
nmrglue: Frequency domain reading- Custom processing pipelines EDA Approach:
- Peak picking and integration
- Chemical shift range
- Baseline correction quality
- Phase correction assessment
- Reference peak identification
- Spectral resolution
- Artifacts detection
- Multiplicity analysis
.1r / .2rr - Bruker NMR Processed Data
Description: Bruker processed spectrum (real part) Typical Data: 1D or 2D processed NMR spectra Use Cases: NMR data analysis with Bruker software Python Libraries:
nmrglue: Bruker format support EDA Approach:- Processing parameters review
- Window function effects
- Zero-filling assessment
- Linear prediction validation
- Spectral artifacts
.dx - NMR JCAMP-DX
Description: JCAMP-DX format for NMR Typical Data: Standardized NMR spectrum Use Cases: Data exchange between software Python Libraries:
jcamp: JCAMP readernmrglue: Can import JCAMP EDA Approach:- Format compliance
- Metadata completeness
- Peak table validation
- Integration values
- Compound identification info
.mnova - Mnova Format
Description: Mestrelab Research Mnova format Typical Data: NMR data with processing info Use Cases: Mnova software workflows Python Libraries:
nmrglue: Limited Mnova support- Conversion tools to standard formats EDA Approach:
- Multi-spectrum handling
- Processing pipeline review
- Quantification data
- Structure assignment
Mass Spectrometry
.mzML - Mass Spectrometry Markup Language
Description: Standard XML-based MS format Typical Data: MS spectra, chromatograms, metadata Use Cases: Proteomics, metabolomics, lipidomics Python Libraries:
pymzml:pymzml.run.Reader('file.mzML')pyteomics.mzml:pyteomics.mzml.read('file.mzML')MSFileReader: Various wrappers EDA Approach:- Scan count and MS level distribution
- Retention time range and TIC
- m/z range and resolution
- Precursor ion selection
- Fragmentation patterns
- Instrument configuration
- Quality control metrics
- Data completeness
.mzXML - Mass Spectrometry XML
Description: Legacy XML MS format Typical Data: Mass spectra and chromatograms Use Cases: Proteomics workflows (older) Python Libraries:
pyteomics.mzxmlpymzml: Can read mzXML EDA Approach:- Similar to mzML
- Version compatibility
- Conversion quality assessment
.mzData - mzData Format
Description: Legacy PSI MS format Typical Data: Mass spectrometry data Use Cases: Legacy data archives Python Libraries:
pyteomics: Limited support- Conversion to mzML recommended EDA Approach:
- Format conversion validation
- Data completeness
- Metadata extraction
.raw - Vendor Raw Files (Thermo, Agilent, Bruker)
Description: Proprietary instrument data Typical Data: Raw mass spectra and metadata Use Cases: Direct instrument output Python Libraries:
pymsfilereader: Thermo RAW filesThermoRawFileParser: CLI wrapper- Vendor-specific APIs EDA Approach:
- Method parameter extraction
- Instrument performance metrics
- Calibration status
- Scan function analysis
- MS/MS quality metrics
- Dynamic exclusion evaluation
.d - Agilent Data Directory
Description: Agilent MS data folder Typical Data: LC-MS, GC-MS with methods Use Cases: Agilent MassHunter workflows Python Libraries:
- Community parsers
- Chemstation integration EDA Approach:
- Directory structure validation
- Method parameters
- Calibration curves
- Sequence metadata
- Signal quality metrics
.wiff - AB SCIEX Data
Description: AB SCIEX/SCIEX instrument format Typical Data: Mass spectrometry data Use Cases: SCIEX instrument workflows Python Libraries:
- Vendor SDKs (limited Python support)
- Conversion tools EDA Approach:
- Experiment type identification
- Scan properties
- Quantitation data
- Multi-experiment structure
.mgf - Mascot Generic Format
Description: Peak list format for MS/MS Typical Data: Precursor and fragment masses Use Cases: Peptide identification, database searches Python Libraries:
pyteomics.mgf:pyteomics.mgf.read('file.mgf')pyopenms: MGF support EDA Approach:- Spectrum count
- Charge state distribution
- Precursor m/z and intensity
- Fragment peak count
- Mass accuracy
- Title and metadata parsing
.pkl - Peak List (Binary)
Description: Binary peak list format Typical Data: Serialized MS/MS spectra Use Cases: Software-specific storage Python Libraries:
pickle: Standard deserializationpyteomics: PKL support EDA Approach:- Data structure inspection
- Conversion to standard formats
- Metadata preservation
.ms1 / .ms2 - MS1/MS2 Formats
Description: Simple text format for MS data Typical Data: MS1 and MS2 scans Use Cases: Database searching, proteomics Python Libraries:
pyteomics.ms1andms2- Simple text parsing EDA Approach:
- Scan count by level
- Retention time series
- Charge state analysis
- m/z range coverage
.pepXML - Peptide XML
Description: TPP peptide identification format Typical Data: Peptide-spectrum matches Use Cases: Proteomics search results Python Libraries:
pyteomics.pepxmlEDA Approach:- Search result statistics
- Score distribution
- Modification analysis
- FDR assessment
- Enzyme specificity
.protXML - Protein XML
Description: TPP protein inference format Typical Data: Protein identifications Use Cases: Proteomics protein-level results Python Libraries:
pyteomics.protxmlEDA Approach:- Protein group analysis
- Coverage statistics
- Confidence scoring
- Parsimony analysis
.msp - NIST MS Search Format
Description: NIST spectral library format Typical Data: Reference mass spectra Use Cases: Spectral library searching Python Libraries:
matchms: Spectral library handling- Custom parsers EDA Approach:
- Library size and coverage
- Metadata completeness
- Peak count statistics
- Compound annotation quality
Infrared and Raman Spectroscopy
.spc - Galactic SPC
Description: Thermo Galactic spectroscopy format Typical Data: IR, Raman, UV-Vis spectra Use Cases: Various spectroscopy instruments Python Libraries:
spc:spc.File('file.spc')specio: Multi-format reader EDA Approach:- Wavenumber/wavelength range
- Data point density
- Multi-spectrum handling
- Baseline characteristics
- Peak identification
- Absorbance/transmittance mode
- Instrument information
.spa - Thermo Nicolet
Description: Thermo Fisher FTIR format Typical Data: FTIR spectra Use Cases: OMNIC software data Python Libraries:
- Custom binary parsers
- Conversion to JCAMP or SPC EDA Approach:
- Interferogram vs spectrum
- Background spectrum validation
- Atmospheric compensation
- Resolution and scan number
- Sample information
.0 - Bruker OPUS
Description: Bruker OPUS FTIR format (numbered files) Typical Data: FTIR spectra and metadata Use Cases: Bruker FTIR instruments Python Libraries:
brukeropusreader: OPUS format parserspecio: OPUS support EDA Approach:- Multiple block types (AB, ScSm, etc.)
- Sample and reference spectra
- Instrument parameters
- Optical path configuration
- Beam splitter and detector info
.dpt - Data Point Table
Description: Simple XY data format Typical Data: Generic spectroscopic data Use Cases: Renishaw Raman, generic exports Python Libraries:
pandas: CSV-like reading- Text parsing EDA Approach:
- X-axis type (wavelength, wavenumber, Raman shift)
- Y-axis units (intensity, absorbance, etc.)
- Data point spacing
- Header information
- Multi-column data handling
.wdf - Renishaw Raman
Description: Renishaw WiRE data format Typical Data: Raman spectra and maps Use Cases: Renishaw Raman microscopy Python Libraries:
renishawWiRE: WDF reader- Custom parsers for WDF format EDA Approach:
- Spectral vs mapping data
- Laser wavelength
- Accumulation and exposure time
- Spatial coordinates (mapping)
- Z-scan data
- Baseline and cosmic ray correction
.txt (Spectroscopy)
Description: Generic text export from instruments Typical Data: Wavelength/wavenumber and intensity Use Cases: Universal data exchange Python Libraries:
pandas: Text file readingnumpy: Simple array loading EDA Approach:- Delimiter and format detection
- Header parsing
- Units identification
- Multiple spectrum handling
- Metadata extraction from comments
UV-Visible Spectroscopy
.asd / .asc - ASD Binary/ASCII
Description: ASD FieldSpec spectroradiometer Typical Data: Hyperspectral UV-Vis-NIR data Use Cases: Remote sensing, reflectance spectroscopy Python Libraries:
spectral.io.asd: ASD format support- Custom parsers EDA Approach:
- Wavelength range (UV to NIR)
- Reference spectrum validation
- Dark current correction
- Integration time
- GPS metadata (if present)
- Reflectance vs radiance
.sp - Perkin Elmer
Description: Perkin Elmer UV/Vis format Typical Data: UV-Vis spectrophotometer data Use Cases: PE Lambda instruments Python Libraries:
- Custom parsers
- Conversion to standard formats EDA Approach:
- Scan parameters
- Baseline correction
- Multi-wavelength scans
- Time-based measurements
- Sample/reference handling
.csv (Spectroscopy)
Description: CSV export from UV-Vis instruments Typical Data: Wavelength and absorbance/transmittance Use Cases: Universal format for UV-Vis data Python Libraries:
pandas: Native CSV support EDA Approach:- Lambda max identification
- Beer's law compliance
- Baseline offset
- Path length correction
- Concentration calculations
X-ray and Diffraction
.cif - Crystallographic Information File
Description: Crystal structure and diffraction data Typical Data: Unit cell, atomic positions, structure factors Use Cases: Crystallography, materials science Python Libraries:
gemmi:gemmi.cif.read_file('file.cif')PyCifRW: CIF reading/writingpymatgen: Materials structure analysis EDA Approach:- Crystal system and space group
- Unit cell parameters
- Atomic positions and occupancy
- Thermal parameters
- R-factors and refinement quality
- Completeness and redundancy
- Structure validation
.hkl - Reflection Data
Description: Miller indices and intensities Typical Data: Integrated diffraction intensities Use Cases: Crystallographic refinement Python Libraries:
- Custom parsers (format dependent)
- Crystallography packages (CCP4, etc.) EDA Approach:
- Resolution range
- Completeness by shell
- I/sigma distribution
- Systematic absences
- Twinning detection
- Wilson plot
.mtz - MTZ Format (CCP4)
Description: Binary crystallographic data Typical Data: Reflections, phases, structure factors Use Cases: Macromolecular crystallography Python Libraries:
gemmi: MTZ supportcctbx: Comprehensive crystallography EDA Approach:- Column types and data
- Resolution limits
- R-factors (Rwork, Rfree)
- Phase probability distribution
- Map coefficients
- Batch information
.xy / .xye - Powder Diffraction
Description: 2-theta vs intensity data Typical Data: Powder X-ray diffraction patterns Use Cases: Phase identification, Rietveld refinement Python Libraries:
pandas: Simple XY readingpymatgen: XRD pattern analysis EDA Approach:- 2-theta range
- Peak positions and intensities
- Background modeling
- Peak width analysis (strain/size)
- Phase identification via matching
- Preferred orientation effects
.raw (XRD)
Description: Vendor-specific XRD raw data Typical Data: XRD patterns with metadata Use Cases: Bruker, PANalytical, Rigaku instruments Python Libraries:
- Vendor-specific parsers
- Conversion tools EDA Approach:
- Scan parameters (step size, time)
- Sample alignment
- Incident beam setup
- Detector configuration
- Background scan validation
.gsa / .gsas - GSAS Format
Description: General Structure Analysis System Typical Data: Powder diffraction for Rietveld Use Cases: Rietveld refinement Python Libraries:
- GSAS-II Python interface
- Custom parsers EDA Approach:
- Histogram data
- Instrument parameters
- Phase information
- Refinement constraints
- Profile function parameters
Electron Spectroscopy
.vms - VG Scienta
Description: VG Scienta spectrometer format Typical Data: XPS, UPS, ARPES spectra Use Cases: Photoelectron spectroscopy Python Libraries:
- Custom parsers for VMS
specio: Multi-format support EDA Approach:- Binding energy calibration
- Pass energy and resolution
- Photoelectron line identification
- Satellite peak analysis
- Background subtraction quality
- Fermi edge position
.spe - WinSpec/SPE Format
Description: Princeton Instruments/Roper Scientific Typical Data: CCD spectra, Raman, PL Use Cases: Spectroscopy with CCD detectors Python Libraries:
spe2py: SPE file readerspe_loader: Alternative parser EDA Approach:- CCD frame analysis
- Wavelength calibration
- Dark frame subtraction
- Cosmic ray identification
- Readout noise
- Accumulation statistics
.pxt - Princeton PTI
Description: Photon Technology International Typical Data: Fluorescence, phosphorescence spectra Use Cases: Fluorescence spectroscopy Python Libraries:
- Custom parsers
- Text-based format variants EDA Approach:
- Excitation and emission spectra
- Quantum yield calculations
- Time-resolved measurements
- Temperature-dependent data
- Correction factors applied
.dat (Spectroscopy Generic)
Description: Generic binary or text spectroscopy data Typical Data: Various spectroscopic measurements Use Cases: Many instruments use .dat extension Python Libraries:
- Format-specific identification needed
numpy,pandasfor known formats EDA Approach:- Format detection (binary vs text)
- Header identification
- Data structure inference
- Units and axis labels
- Instrument signature detection
Chromatography
.chrom - Chromatogram Data
Description: Generic chromatography format Typical Data: Retention time vs signal Use Cases: HPLC, GC, LC-MS Python Libraries:
- Vendor-specific parsers
pandasfor text exports EDA Approach:- Retention time range
- Peak detection and integration
- Baseline drift
- Resolution between peaks
- Signal-to-noise ratio
- Tailing factor
.ch - ChemStation
Description: Agilent ChemStation format Typical Data: Chromatograms and method parameters Use Cases: Agilent HPLC and GC systems Python Libraries:
agilent-chemstation: Community tools- Binary format parsers EDA Approach:
- Method validation
- Integration parameters
- Calibration curve
- Sample sequence information
- Instrument status
.arw - Empower (Waters)
Description: Waters Empower format Typical Data: UPLC/HPLC chromatograms Use Cases: Waters instrument data Python Libraries:
- Vendor tools (limited Python access)
- Database extraction tools EDA Approach:
- Audit trail information
- Processing methods
- Compound identification
- Quantitation results
- System suitability tests
.lcd - Shimadzu LabSolutions
Description: Shimadzu chromatography format Typical Data: GC/HPLC data Use Cases: Shimadzu instruments Python Libraries:
- Vendor-specific parsers EDA Approach:
- Method parameters
- Peak purity analysis
- Spectral data (if PDA)
- Quantitative results
Other Analytical Techniques
.dta - DSC/TGA Data
Description: Thermal analysis data (TA Instruments) Typical Data: Temperature vs heat flow or mass Use Cases: Differential scanning calorimetry, thermogravimetry Python Libraries:
- Custom parsers for TA formats
pandasfor exported data EDA Approach:- Transition temperature identification
- Enthalpy calculations
- Mass loss steps
- Heating rate effects
- Baseline determination
- Purity assessment
.run - ICP-MS/ICP-OES
Description: Elemental analysis data Typical Data: Element concentrations or counts Use Cases: Inductively coupled plasma MS/OES Python Libraries:
- Vendor-specific tools
- Custom parsers EDA Approach:
- Element detection and quantitation
- Internal standard performance
- Spike recovery
- Dilution factor corrections
- Isotope ratios
- LOD/LOQ calculations
.exp - Electrochemistry Data
Description: Electrochemical experiment data Typical Data: Potential vs current or charge Use Cases: Cyclic voltammetry, chronoamperometry Python Libraries:
- Custom parsers per instrument (CHI, Gamry, etc.)
galvani: Biologic EC-Lab files EDA Approach:- Redox peak identification
- Peak potential and current
- Scan rate effects
- Electron transfer kinetics
- Background subtraction
- Capacitance calculations