gh-k-dense-ai-claude-scient…/skills/exploratory-data-analysis/references/spectroscopy_analytical_formats.md
2025-11-30 08:30:10 +08:00

Spectroscopy and Analytical Chemistry File Formats Reference

This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.

NMR Spectroscopy

.fid - NMR Free Induction Decay

Description: Raw time-domain NMR data from Bruker, Agilent, or JEOL spectrometers
Typical Data: Complex time-domain signal
Use Cases: NMR spectroscopy, structure elucidation
Python Libraries:

  • nmrglue: nmrglue.bruker.read('experiment_dir') for Bruker data or nmrglue.varian.read_fid('fid') for Agilent/Varian data
  • nmrstarlib: NMR-STAR (BMRB) file parsing; it does not read raw FID files

EDA Approach:

  • Time-domain signal decay
  • Sampling rate and acquisition time
  • Number of data points
  • Signal-to-noise ratio estimation
  • Baseline drift assessment
  • Digital filter effects
  • Acquisition parameter validation
  • Apodization function selection
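
The acquisition-parameter and signal-to-noise checks listed above can be sketched without any NMR library. The following is a minimal illustration on a synthetic FID; the function names, the 10% tail window, and the synthetic signal are assumptions made for the example, not part of nmrglue.

```python
import cmath
import math
import random
import statistics


def acquisition_summary(n_complex_points, spectral_width_hz):
    """Derive dwell time, acquisition time, and digital resolution."""
    dwell = 1.0 / spectral_width_hz
    acq_time = n_complex_points * dwell
    return {
        "dwell_s": dwell,
        "acq_time_s": acq_time,
        "resolution_hz": 1.0 / acq_time,
    }


def fid_snr(fid, tail_fraction=0.1):
    """Crude time-domain SNR: largest early magnitude over the noise SD
    of the real channel in the tail, where the signal has decayed."""
    n_tail = max(2, int(len(fid) * tail_fraction))
    noise_sd = statistics.stdev(z.real for z in fid[-n_tail:])
    signal = max(abs(z) for z in fid[:n_tail])
    return signal / noise_sd


# Synthetic single-resonance FID (hypothetical values): 400 Hz offset,
# 50 ms decay constant, Gaussian noise with SD 0.01 per channel.
random.seed(0)
sw_hz, n_points = 5000.0, 4096
fid = [
    cmath.exp(2j * math.pi * 400.0 * k / sw_hz) * math.exp(-k / (sw_hz * 0.05))
    + complex(random.gauss(0, 0.01), random.gauss(0, 0.01))
    for k in range(n_points)
]

summary = acquisition_summary(n_points, sw_hz)
snr = fid_snr(fid)
print(summary, round(snr, 1))
```

The tail-based noise estimate only makes sense when the acquisition time is several times longer than the decay constant; a truncated FID would inflate the noise figure.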

.ft / .ft1 / .ft2 - NMR Frequency Domain

Description: Fourier-transformed NMR spectrum (NMRPipe naming convention)
Typical Data: Processed frequency-domain data
Use Cases: NMR analysis, peak integration
Python Libraries:

  • nmrglue: nmrglue.pipe.read('spectrum.ft2') for NMRPipe files
  • Custom processing pipelines

EDA Approach:

  • Peak picking and integration
  • Chemical shift range
  • Baseline correction quality
  • Phase correction assessment
  • Reference peak identification
  • Spectral resolution
  • Artifact detection
  • Multiplicity analysis

.1r / .2rr - Bruker NMR Processed Data

Description: Bruker processed spectrum (real part)
Typical Data: 1D or 2D processed NMR spectra
Use Cases: NMR data analysis with Bruker software
Python Libraries:

  • nmrglue: nmrglue.bruker.read_pdata('pdata/1') reads a processed-data directory

EDA Approach:

  • Processing parameters review
  • Window function effects
  • Zero-filling assessment
  • Linear prediction validation
  • Spectral artifacts

.dx - NMR JCAMP-DX

Description: JCAMP-DX format for NMR
Typical Data: Standardized NMR spectrum
Use Cases: Data exchange between software
Python Libraries:

  • jcamp: JCAMP reader
  • nmrglue: nmrglue.jcampdx.read('spectrum.dx')

EDA Approach:

  • Format compliance
  • Metadata completeness
  • Peak table validation
  • Integration values
  • Compound identification info
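
The format-compliance and metadata-completeness checks above can be approximated by scanning the labeled data records (lines beginning with `##`). This is a minimal sketch using only the standard library; the helper names are hypothetical, and the list of core labels reflects the header records the JCAMP-DX specification treats as required, which should be confirmed against the version declared in the file.

```python
import re

# Core header labels after normalization (assumed minimal set).
CORE_LABELS = ("TITLE", "JCAMPDX", "DATATYPE", "ORIGIN", "OWNER")


def normalize_label(label):
    """JCAMP-DX compares labels ignoring case, spaces, '-', '/' and '_'."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def read_headers(text):
    """Collect '##LABEL=value' records into a dict keyed by normalized label."""
    headers = {}
    for line in text.splitlines():
        if line.startswith("##") and "=" in line:
            label, value = line[2:].split("=", 1)
            # '$$' starts an inline comment; drop it from the value.
            headers[normalize_label(label)] = value.split("$$", 1)[0].strip()
    return headers


def missing_core_labels(headers):
    """Return core labels that are absent or empty."""
    return [label for label in CORE_LABELS if not headers.get(label)]


sample = """##TITLE=ethanol 1H
##JCAMP-DX=5.01 $$ example file
##DATA TYPE=NMR SPECTRUM
##ORIGIN=example lab
##NPOINTS=4
##XYDATA=(X++(Y..Y))
1.0 2 3 4 5
##END=
"""

headers = read_headers(sample)
missing = missing_core_labels(headers)
print(missing)
```

A full reader (for example the jcamp package or nmrglue) is still needed to decode the compressed data tables; this sketch only inspects the header.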

.mnova - Mnova Format

Description: Mestrelab Research Mnova format
Typical Data: NMR data with processing info
Use Cases: Mnova software workflows
Python Libraries:

  • nmrglue: no documented .mnova reader; export from Mnova to JCAMP-DX or another open format first
  • Conversion tools to standard formats

EDA Approach:

  • Multi-spectrum handling
  • Processing pipeline review
  • Quantification data
  • Structure assignment

Mass Spectrometry

.mzML - Mass Spectrometry Markup Language

Description: Standard XML-based MS format
Typical Data: MS spectra, chromatograms, metadata
Use Cases: Proteomics, metabolomics, lipidomics
Python Libraries:

  • pymzml: pymzml.run.Reader('file.mzML')
  • pyteomics.mzml: pyteomics.mzml.read('file.mzML')
  • MSFileReader: Various wrappers

EDA Approach:

  • Scan count and MS level distribution
  • Retention time range and TIC
  • m/z range and resolution
  • Precursor ion selection
  • Fragmentation patterns
  • Instrument configuration
  • Quality control metrics
  • Data completeness
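
The first few checks above (scan counts per MS level, retention time range, TIC, m/z range) reduce to one pass over the spectra. The sketch below works on plain dictionaries keyed the way pyteomics.mzml.read yields them ('ms level', 'm/z array', 'intensity array', 'total ion current', and the nested 'scan start time'); the summarize_spectra helper and the three synthetic spectra are assumptions for illustration.

```python
from collections import Counter


def summarize_spectra(spectra):
    """One-pass summary of spectrum dicts keyed like pyteomics mzML output."""
    levels = Counter()
    rts, tics = [], []
    mz_min, mz_max = float("inf"), float("-inf")
    for spec in spectra:
        levels[spec["ms level"]] += 1
        rts.append(spec["scanList"]["scan"][0]["scan start time"])
        # Prefer the stored TIC; fall back to summing the intensities.
        tic = spec.get("total ion current")
        tics.append(sum(spec["intensity array"]) if tic is None else tic)
        mzs = spec["m/z array"]
        if len(mzs):
            mz_min = min(mz_min, min(mzs))
            mz_max = max(mz_max, max(mzs))
    return {
        "n_scans": sum(levels.values()),
        "ms_levels": dict(levels),
        "rt_range": (min(rts), max(rts)) if rts else None,
        "mz_range": (mz_min, mz_max) if rts else None,
        "tic": tics,
    }


# Synthetic stand-ins for parsed spectra (hypothetical values).
spectra = [
    {"ms level": 1, "scanList": {"scan": [{"scan start time": 0.50}]},
     "m/z array": [100.0, 500.0], "intensity array": [10.0, 30.0],
     "total ion current": 40.0},
    {"ms level": 2, "scanList": {"scan": [{"scan start time": 0.52}]},
     "m/z array": [150.0, 320.5], "intensity array": [5.0, 7.0]},
    {"ms level": 1, "scanList": {"scan": [{"scan start time": 1.00}]},
     "m/z array": [99.5, 800.0], "intensity array": [20.0, 20.0],
     "total ion current": 40.0},
]

summary = summarize_spectra(spectra)
print(summary)
```

With a real file, the same function could be fed the iterator returned by pyteomics.mzml.read('file.mzML'), assuming every spectrum carries a scan start time.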

.mzXML - Mass Spectrometry XML

Description: Legacy XML MS format
Typical Data: Mass spectra and chromatograms
Use Cases: Proteomics workflows (older)
Python Libraries:

  • pyteomics.mzxml: pyteomics.mzxml.read('file.mzXML')
  • pyopenms: MzXMLFile reader (pymzml targets mzML only)

EDA Approach:

  • Similar to mzML
  • Version compatibility
  • Conversion quality assessment

.mzData - mzData Format

Description: Legacy PSI MS format
Typical Data: Mass spectrometry data
Use Cases: Legacy data archives
Python Libraries:

  • pyopenms: MzDataFile reader (pyteomics has no dedicated mzData module)
  • Conversion to mzML recommended

EDA Approach:

  • Format conversion validation
  • Data completeness
  • Metadata extraction

.raw - Vendor Raw Files (Thermo, Agilent, Bruker)

Description: Proprietary instrument data
Typical Data: Raw mass spectra and metadata
Use Cases: Direct instrument output
Python Libraries:

  • pymsfilereader: Thermo RAW files
  • ThermoRawFileParser: CLI wrapper
  • Vendor-specific APIs

EDA Approach:

  • Method parameter extraction
  • Instrument performance metrics
  • Calibration status
  • Scan function analysis
  • MS/MS quality metrics
  • Dynamic exclusion evaluation

.d - Agilent Data Directory

Description: Agilent MS data folder
Typical Data: LC-MS, GC-MS with methods
Use Cases: Agilent MassHunter workflows
Python Libraries:

  • Community parsers
  • Chemstation integration

EDA Approach:

  • Directory structure validation
  • Method parameters
  • Calibration curves
  • Sequence metadata
  • Signal quality metrics

.wiff - AB SCIEX Data

Description: AB SCIEX/SCIEX instrument format
Typical Data: Mass spectrometry data
Use Cases: SCIEX instrument workflows
Python Libraries:

  • Vendor SDKs (limited Python support)
  • Conversion tools

EDA Approach:

  • Experiment type identification
  • Scan properties
  • Quantitation data
  • Multi-experiment structure

.mgf - Mascot Generic Format

Description: Peak list format for MS/MS
Typical Data: Precursor and fragment masses
Use Cases: Peptide identification, database searches
Python Libraries:

  • pyteomics.mgf: pyteomics.mgf.read('file.mgf')
  • pyopenms: MGF support

EDA Approach:

  • Spectrum count
  • Charge state distribution
  • Precursor m/z and intensity
  • Fragment peak count
  • Mass accuracy
  • Title and metadata parsing
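
Spectrum count, charge state distribution, and fragment peak counts can be illustrated with a small stand-in parser. The sketch below is not pyteomics; it is a simplified reader for the BEGIN IONS / END IONS block layout, written with the standard library so the checks can be shown on an in-memory example. All helper names are hypothetical.

```python
import re
from collections import Counter


def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': list} entries."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                parts = line.split()
                intensity = float(parts[1]) if len(parts) > 1 else 0.0
                current["peaks"].append((float(parts[0]), intensity))
    return spectra


def charge_of(params):
    """Read a CHARGE value such as '2+' or '1-'; None if absent."""
    match = re.match(r"(\d+)([+-]?)", params.get("CHARGE", ""))
    if not match:
        return None
    z = int(match.group(1))
    return -z if match.group(2) == "-" else z


sample = """BEGIN IONS
TITLE=scan 1
PEPMASS=445.12 1200.5
CHARGE=2+
100.1 10
200.2 20
END IONS
BEGIN IONS
TITLE=scan 2
PEPMASS=512.30
CHARGE=3+
150.0 5
END IONS
"""

spectra = parse_mgf(sample)
charges = Counter(charge_of(s["params"]) for s in spectra)
peak_counts = [len(s["peaks"]) for s in spectra]
precursor_mz = [float(s["params"]["PEPMASS"].split()[0]) for s in spectra]
print(len(spectra), dict(charges), peak_counts, precursor_mz)
```

For real files, pyteomics.mgf.read handles edge cases (multiple charges, embedded parameters) that this sketch ignores.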

.pkl - Peak List (Binary)

Description: Binary peak list format
Typical Data: Serialized MS/MS spectra
Use Cases: Software-specific storage
Python Libraries:

  • pickle: Standard deserialization (load only trusted files, since unpickling can execute arbitrary code)
  • pyteomics: PKL support

EDA Approach:

  • Data structure inspection
  • Conversion to standard formats
  • Metadata preservation

.ms1 / .ms2 - MS1/MS2 Formats

Description: Simple text format for MS data
Typical Data: MS1 and MS2 scans
Use Cases: Database searching, proteomics
Python Libraries:

  • pyteomics.ms1 and pyteomics.ms2
  • Simple text parsing

EDA Approach:

  • Scan count by level
  • Retention time series
  • Charge state analysis
  • m/z range coverage

.pepXML - Peptide XML

Description: TPP peptide identification format
Typical Data: Peptide-spectrum matches
Use Cases: Proteomics search results
Python Libraries:

  • pyteomics.pepxml

EDA Approach:

  • Search result statistics
  • Score distribution
  • Modification analysis
  • FDR assessment
  • Enzyme specificity
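
The FDR assessment step can be illustrated with the simple target-decoy estimate, FDR ≈ decoys / targets above a score cutoff. The sketch below assumes each peptide-spectrum match has already been reduced to a (score, is_decoy) pair with higher scores being better; the function names and example scores are hypothetical, and other estimators (for example ones that correct for decoy database size) may be more appropriate for a given search setup.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold."""
    targets = sum(1 for score, decoy in psms if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= threshold and decoy)
    return decoys / targets if targets else 0.0


def lowest_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    cutoff = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical PSM list: (search score, matched a decoy sequence?)
psms = [
    (0.99, False), (0.95, False), (0.90, True), (0.85, False),
    (0.80, False), (0.50, True), (0.40, False),
]

fdr_080 = fdr_at_threshold(psms, 0.80)
cutoff_1pct = lowest_cutoff_for_fdr(psms, 0.01)
cutoff_25pct = lowest_cutoff_for_fdr(psms, 0.25)
print(fdr_080, cutoff_1pct, cutoff_25pct)
```

Because the raw estimate is not monotonic in the cutoff, production tools usually convert it to q-values before filtering.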

.protXML - Protein XML

Description: TPP protein inference format
Typical Data: Protein identifications
Use Cases: Proteomics protein-level results
Python Libraries:

  • pyteomics.protxml

EDA Approach:

  • Protein group analysis
  • Coverage statistics
  • Confidence scoring
  • Parsimony analysis

.msp - NIST MS Search Format

Description: NIST spectral library format
Typical Data: Reference mass spectra
Use Cases: Spectral library searching
Python Libraries:

  • matchms: Spectral library handling
  • Custom parsers

EDA Approach:

  • Library size and coverage
  • Metadata completeness
  • Peak count statistics
  • Compound annotation quality

Infrared and Raman Spectroscopy

.spc - Galactic SPC

Description: Thermo Galactic spectroscopy format
Typical Data: IR, Raman, UV-Vis spectra
Use Cases: Various spectroscopy instruments
Python Libraries:

  • spc: spc.File('file.spc')
  • specio: Multi-format reader

EDA Approach:

  • Wavenumber/wavelength range
  • Data point density
  • Multi-spectrum handling
  • Baseline characteristics
  • Peak identification
  • Absorbance/transmittance mode
  • Instrument information
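
When checking the absorbance/transmittance mode, it helps to be able to convert between the two scales. The relation is A = -log10(T), or A = 2 - log10(%T) for percent transmittance. A minimal sketch with hypothetical helper names:

```python
import math


def transmittance_to_absorbance(percent_t):
    """A = 2 - log10(%T); percent_t must be greater than zero."""
    return 2.0 - math.log10(percent_t)


def absorbance_to_transmittance(absorbance):
    """%T = 100 * 10**(-A)."""
    return 100.0 * 10.0 ** (-absorbance)


a_10 = transmittance_to_absorbance(10.0)
a_50 = transmittance_to_absorbance(50.0)
t_back = absorbance_to_transmittance(a_50)
print(a_10, round(a_50, 5), round(t_back, 6))
```

Points with zero or negative transmittance (detector saturation, baseline overshoot) have no defined absorbance and need to be masked before conversion.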

.spa - Thermo Nicolet

Description: Thermo Fisher FTIR format
Typical Data: FTIR spectra
Use Cases: OMNIC software data
Python Libraries:

  • Custom binary parsers
  • Conversion to JCAMP or SPC

EDA Approach:

  • Interferogram vs spectrum
  • Background spectrum validation
  • Atmospheric compensation
  • Resolution and scan number
  • Sample information

.0 - Bruker OPUS

Description: Bruker OPUS FTIR format (numbered files)
Typical Data: FTIR spectra and metadata
Use Cases: Bruker FTIR instruments
Python Libraries:

  • brukeropusreader: OPUS format parser
  • specio: OPUS support

EDA Approach:

  • Multiple block types (AB, ScSm, etc.)
  • Sample and reference spectra
  • Instrument parameters
  • Optical path configuration
  • Beam splitter and detector info

.dpt - Data Point Table

Description: Simple XY data format
Typical Data: Generic spectroscopic data
Use Cases: Renishaw Raman, generic exports
Python Libraries:

  • pandas: CSV-like reading
  • Text parsing

EDA Approach:

  • X-axis type (wavelength, wavenumber, Raman shift)
  • Y-axis units (intensity, absorbance, etc.)
  • Data point spacing
  • Header information
  • Multi-column data handling

.wdf - Renishaw Raman

Description: Renishaw WiRE data format
Typical Data: Raman spectra and maps
Use Cases: Renishaw Raman microscopy
Python Libraries:

  • renishawWiRE: WDF reader
  • Custom parsers for WDF format

EDA Approach:

  • Spectral vs mapping data
  • Laser wavelength
  • Accumulation and exposure time
  • Spatial coordinates (mapping)
  • Z-scan data
  • Baseline and cosmic ray correction
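
The laser wavelength matters because the Raman shift axis is defined relative to it: shift (cm⁻¹) = 10⁷ × (1/λ_laser − 1/λ_scattered), with both wavelengths in nm. A minimal sketch of the conversion in both directions (function names are hypothetical):

```python
def raman_shift_cm1(laser_nm, scattered_nm):
    """Raman shift in cm^-1 from laser and scattered wavelengths in nm."""
    return 1e7 * (1.0 / laser_nm - 1.0 / scattered_nm)


def scattered_wavelength_nm(laser_nm, shift_cm1):
    """Inverse conversion: absolute wavelength of a given Raman shift."""
    return 1.0 / (1.0 / laser_nm - shift_cm1 / 1e7)


shift = raman_shift_cm1(532.0, 547.0)
wavelength = scattered_wavelength_nm(532.0, shift)
print(round(shift, 2), round(wavelength, 6))
```

Comparing spectra recorded with different lasers is only meaningful on the shift axis, so this conversion is worth verifying whenever an export stores absolute wavelengths.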

.txt (Spectroscopy)

Description: Generic text export from instruments
Typical Data: Wavelength/wavenumber and intensity
Use Cases: Universal data exchange
Python Libraries:

  • pandas: Text file reading
  • numpy: Simple array loading

EDA Approach:

  • Delimiter and format detection
  • Header parsing
  • Units identification
  • Multiple spectrum handling
  • Metadata extraction from comments
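
Delimiter handling, header parsing, and comment extraction can be combined in one tolerant reader. The sketch below is one possible approach using only the standard library; the comment characters and the set of accepted delimiters are assumptions, and files that use a decimal comma would need different handling.

```python
import re


def parse_spectrum_text(text, comment_chars="#%"):
    """Split a text export into metadata lines and numeric rows.

    Lines starting with a comment character, and lines that do not parse
    as numbers (such as column names), are collected as metadata.
    """
    metadata, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in comment_chars:
            metadata.append(line.lstrip(comment_chars).strip())
            continue
        # Accept comma, tab, semicolon, or space as delimiters.
        fields = [f for f in re.split(r"[,\t; ]+", line) if f]
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            metadata.append(line)
    return metadata, rows


sample = """# Instrument: example spectrometer
# Units: cm-1, a.u.
wavenumber,intensity
400.0,0.12
402.0,0.15
404.0,0.11
"""

metadata, rows = parse_spectrum_text(sample)
spacing = rows[1][0] - rows[0][0]
print(metadata, len(rows), spacing)
```

Once the layout is confirmed this way, pandas.read_csv or numpy.loadtxt with explicit delimiter and comment arguments is usually the faster route for large files.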

UV-Visible Spectroscopy

.asd / .asc - ASD Binary/ASCII

Description: ASD FieldSpec spectroradiometer
Typical Data: Hyperspectral UV-Vis-NIR data
Use Cases: Remote sensing, reflectance spectroscopy
Python Libraries:

  • spectral.io.asd: ASD format support
  • Custom parsers

EDA Approach:

  • Wavelength range (UV to NIR)
  • Reference spectrum validation
  • Dark current correction
  • Integration time
  • GPS metadata (if present)
  • Reflectance vs radiance

.sp - Perkin Elmer

Description: Perkin Elmer UV/Vis format
Typical Data: UV-Vis spectrophotometer data
Use Cases: PE Lambda instruments
Python Libraries:

  • Custom parsers
  • Conversion to standard formats

EDA Approach:

  • Scan parameters
  • Baseline correction
  • Multi-wavelength scans
  • Time-based measurements
  • Sample/reference handling

.csv (Spectroscopy)

Description: CSV export from UV-Vis instruments
Typical Data: Wavelength and absorbance/transmittance
Use Cases: Universal format for UV-Vis data
Python Libraries:

  • pandas: Native CSV support

EDA Approach:

  • Lambda max identification
  • Beer's law compliance
  • Baseline offset
  • Path length correction
  • Concentration calculations
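
Lambda max, Beer's law compliance, and concentration calculations all follow from A = ε·l·c. The sketch below is a minimal illustration: it finds the absorbance maximum, fits a calibration line by ordinary least squares to check linearity, and inverts the Beer-Lambert relation. The function names and calibration values are hypothetical.

```python
def lambda_max(wavelengths, absorbances):
    """Return (wavelength, absorbance) at the absorbance maximum."""
    idx = max(range(len(absorbances)), key=absorbances.__getitem__)
    return wavelengths[idx], absorbances[idx]


def concentration(absorbance, molar_absorptivity, path_length_cm=1.0):
    """Beer-Lambert law: A = epsilon * l * c, so c = A / (epsilon * l)."""
    return absorbance / (molar_absorptivity * path_length_cm)


def linear_fit(x, y):
    """Least-squares slope, intercept, and R^2 for a calibration curve."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Hypothetical calibration standards (mol/L) and absorbances, 1 cm cell.
conc_std = [0.0, 1e-5, 2e-5, 4e-5]
abs_std = [0.0, 0.15, 0.30, 0.60]

slope, intercept, r_squared = linear_fit(conc_std, abs_std)
peak = lambda_max([240, 260, 280], [0.2, 0.9, 0.4])
unknown = concentration(0.45, slope)
print(peak, round(slope), round(r_squared, 6), unknown)
```

With a 1 cm path length the fitted slope is the molar absorptivity; an R² noticeably below 1 or a non-zero intercept suggests deviation from Beer's law or a baseline offset.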

X-ray and Diffraction

.cif - Crystallographic Information File

Description: Crystal structure and diffraction data
Typical Data: Unit cell, atomic positions, structure factors
Use Cases: Crystallography, materials science
Python Libraries:

  • gemmi: gemmi.cif.read_file('file.cif')
  • PyCifRW: CIF reading/writing
  • pymatgen: Materials structure analysis

EDA Approach:

  • Crystal system and space group
  • Unit cell parameters
  • Atomic positions and occupancy
  • Thermal parameters
  • R-factors and refinement quality
  • Completeness and redundancy
  • Structure validation
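
A quick sanity check on the unit cell parameters is to recompute the cell volume and compare it with the value stored in the file. For a general (triclinic) cell, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). A minimal sketch, with a hypothetical function name and example cells:

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general unit cell; lengths in angstrom, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x))
                  for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


v_cubic = unit_cell_volume(5.64, 5.64, 5.64, 90, 90, 90)
v_hex = unit_cell_volume(3.0, 3.0, 5.0, 90, 90, 120)
print(round(v_cubic, 3), round(v_hex, 3))
```

A mismatch between the recomputed and reported volume usually points to swapped angles or a units problem rather than a structural issue.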

.hkl - Reflection Data

Description: Miller indices and intensities
Typical Data: Integrated diffraction intensities
Use Cases: Crystallographic refinement
Python Libraries:

  • Custom parsers (format dependent)
  • Crystallography packages (CCP4, etc.)

EDA Approach:

  • Resolution range
  • Completeness by shell
  • I/sigma distribution
  • Systematic absences
  • Twinning detection
  • Wilson plot
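
Resolution range and I/sigma statistics can be sketched for the simplest case, an orthogonal cell (all angles 90°), where 1/d² = h²/a² + k²/b² + l²/c². The example below assumes reflections have already been parsed into (h, k, l, I, sigma) tuples; the helper names and values are hypothetical, and non-orthogonal cells need the full reciprocal metric instead.

```python
import math


def d_orthogonal(h, k, l, a, b, c):
    """d-spacing for an orthogonal cell; (0, 0, 0) is not a valid reflection."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def mean_i_over_sigma(reflections):
    """Mean I/sigma over reflections with a positive sigma."""
    ratios = [i / s for _, _, _, i, s in reflections if s > 0]
    return sum(ratios) / len(ratios)


cell = (10.0, 10.0, 10.0)
reflections = [
    (1, 0, 0, 100.0, 5.0),
    (2, 0, 0, 40.0, 4.0),
    (1, 1, 0, 90.0, 3.0),
]

d_values = [d_orthogonal(h, k, l, *cell) for h, k, l, _, _ in reflections]
resolution_range = (max(d_values), min(d_values))
signal = mean_i_over_sigma(reflections)
print(resolution_range, signal)
```

Binning the same ratios by d-spacing gives the per-shell I/sigma curve used to choose a resolution cutoff.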

.mtz - MTZ Format (CCP4)

Description: Binary crystallographic data
Typical Data: Reflections, phases, structure factors
Use Cases: Macromolecular crystallography
Python Libraries:

  • gemmi: MTZ support
  • cctbx: Comprehensive crystallography

EDA Approach:

  • Column types and data
  • Resolution limits
  • R-factors (Rwork, Rfree)
  • Phase probability distribution
  • Map coefficients
  • Batch information

.xy / .xye - Powder Diffraction

Description: 2-theta vs intensity data
Typical Data: Powder X-ray diffraction patterns
Use Cases: Phase identification, Rietveld refinement
Python Libraries:

  • pandas: Simple XY reading
  • pymatgen: XRD pattern analysis

EDA Approach:

  • 2-theta range
  • Peak positions and intensities
  • Background modeling
  • Peak width analysis (strain/size)
  • Phase identification via matching
  • Preferred orientation effects
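
Peak positions and widths translate into physical quantities through two standard relations: Bragg's law, d = λ / (2·sinθ), and the Scherrer equation, L = K·λ / (β·cosθ) with β the peak FWHM in radians. The sketch below assumes Cu Kα radiation (1.5406 Å) and a shape factor K = 0.9; both are common defaults rather than values taken from any particular file, and the Scherrer estimate ignores instrumental and strain broadening.

```python
import math

CU_K_ALPHA_ANGSTROM = 1.5406  # assumed wavelength; check the instrument metadata


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA_ANGSTROM):
    """Bragg's law: d = lambda / (2 sin(theta))."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def scherrer_size(two_theta_deg, fwhm_deg,
                  wavelength=CU_K_ALPHA_ANGSTROM, shape_factor=0.9):
    """Scherrer crystallite size (same length unit as the wavelength)."""
    theta = math.radians(two_theta_deg / 2.0)
    beta = math.radians(fwhm_deg)
    return shape_factor * wavelength / (beta * math.cos(theta))


d_111 = d_spacing(28.44)              # near the Si (111) reflection
size = scherrer_size(28.44, 0.20)     # hypothetical 0.20 degree FWHM
print(round(d_111, 4), round(size, 1))
```

Converting the 2-theta axis to d-spacing also makes patterns collected with different wavelengths directly comparable during phase matching.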

.raw (XRD)

Description: Vendor-specific XRD raw data
Typical Data: XRD patterns with metadata
Use Cases: Bruker, PANalytical, Rigaku instruments
Python Libraries:

  • Vendor-specific parsers
  • Conversion tools

EDA Approach:

  • Scan parameters (step size, time)
  • Sample alignment
  • Incident beam setup
  • Detector configuration
  • Background scan validation

.gsa / .gsas - GSAS Format

Description: General Structure Analysis System
Typical Data: Powder diffraction for Rietveld
Use Cases: Rietveld refinement
Python Libraries:

  • GSAS-II Python interface
  • Custom parsers

EDA Approach:

  • Histogram data
  • Instrument parameters
  • Phase information
  • Refinement constraints
  • Profile function parameters

Electron Spectroscopy

.vms - VG Scienta

Description: VG Scienta spectrometer format
Typical Data: XPS, UPS, ARPES spectra
Use Cases: Photoelectron spectroscopy
Python Libraries:

  • Custom parsers for VMS
  • specio: Multi-format support

EDA Approach:

  • Binding energy calibration
  • Pass energy and resolution
  • Photoelectron line identification
  • Satellite peak analysis
  • Background subtraction quality
  • Fermi edge position

.spe - WinSpec/SPE Format

Description: Princeton Instruments/Roper Scientific
Typical Data: CCD spectra, Raman, PL
Use Cases: Spectroscopy with CCD detectors
Python Libraries:

  • spe2py: SPE file reader
  • spe_loader: Alternative parser

EDA Approach:

  • CCD frame analysis
  • Wavelength calibration
  • Dark frame subtraction
  • Cosmic ray identification
  • Readout noise
  • Accumulation statistics

.pxt - Princeton PTI

Description: Photon Technology International
Typical Data: Fluorescence, phosphorescence spectra
Use Cases: Fluorescence spectroscopy
Python Libraries:

  • Custom parsers
  • Text-based format variants

EDA Approach:

  • Excitation and emission spectra
  • Quantum yield calculations
  • Time-resolved measurements
  • Temperature-dependent data
  • Correction factors applied

.dat (Spectroscopy Generic)

Description: Generic binary or text spectroscopy data
Typical Data: Various spectroscopic measurements
Use Cases: Many instruments use .dat extension
Python Libraries:

  • Format-specific identification needed
  • numpy, pandas for known formats

EDA Approach:

  • Format detection (binary vs text)
  • Header identification
  • Data structure inference
  • Units and axis labels
  • Instrument signature detection
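
Format detection (binary vs text) can start from a byte-level heuristic on the first few kilobytes. The sketch below is one simple approach: treat the sample as binary if it contains NUL bytes or too many bytes outside printable ASCII and common whitespace. The 5% threshold and the helper names are assumptions, and UTF-16 text would be misclassified as binary by this rule.

```python
TEXT_BYTES = set(range(32, 127)) | {9, 10, 13}  # printable ASCII, tab, LF, CR


def looks_like_text(sample, max_other_fraction=0.05):
    """Heuristic text check on a bytes sample."""
    if not sample:
        return True
    if 0 in sample:  # NUL bytes almost never appear in ASCII/UTF-8 exports
        return False
    other = sum(1 for byte in sample if byte not in TEXT_BYTES)
    return other / len(sample) <= max_other_fraction


def sniff_file(path, n_bytes=4096):
    """Apply the heuristic to the first n_bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(n_bytes))


text_sample = b"# header\n400.0\t0.12\n402.0\t0.15\n"
binary_sample = bytes([0, 0, 128, 63, 0, 0, 0, 64])  # float32 1.0 and 2.0
print(looks_like_text(text_sample), looks_like_text(binary_sample))
```

A text result can go straight to pandas or numpy; a binary result calls for inspecting the leading bytes for an instrument signature before attempting to decode.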

Chromatography

.chrom - Chromatogram Data

Description: Generic chromatography format
Typical Data: Retention time vs signal
Use Cases: HPLC, GC, LC-MS
Python Libraries:

  • Vendor-specific parsers
  • pandas for text exports

EDA Approach:

  • Retention time range
  • Peak detection and integration
  • Baseline drift
  • Resolution between peaks
  • Signal-to-noise ratio
  • Tailing factor
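
Three of the metrics above have simple closed forms once peak positions and widths are known: resolution Rs = 2(t2 − t1)/(w1 + w2) using baseline widths, the USP-style tailing factor T = W0.05 / (2f), and signal-to-noise S/N = 2H/h with h the peak-to-peak baseline noise. A minimal sketch with hypothetical function names and example values:

```python
def resolution(t1, t2, w1, w2):
    """Resolution between adjacent peaks from retention times and baseline widths."""
    return 2.0 * (t2 - t1) / (w1 + w2)


def tailing_factor(width_at_5pct, front_half_width):
    """Tailing factor: full width at 5% height over twice the front half-width."""
    return width_at_5pct / (2.0 * front_half_width)


def signal_to_noise(peak_height, noise_peak_to_peak):
    """S/N = 2H / h, with h the peak-to-peak baseline noise."""
    return 2.0 * peak_height / noise_peak_to_peak


rs = resolution(5.0, 6.0, 0.4, 0.6)   # minutes
tf = tailing_factor(0.30, 0.12)       # minutes
sn = signal_to_noise(50.0, 2.0)       # detector units
print(rs, tf, sn)
```

As rough guides, Rs of about 1.5 or more corresponds to baseline separation and a tailing factor near 1 indicates a symmetric peak; acceptance limits depend on the method.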

.ch - ChemStation

Description: Agilent ChemStation format
Typical Data: Chromatograms and method parameters
Use Cases: Agilent HPLC and GC systems
Python Libraries:

  • agilent-chemstation: Community tools
  • Binary format parsers

EDA Approach:

  • Method validation
  • Integration parameters
  • Calibration curve
  • Sample sequence information
  • Instrument status

.arw - Empower (Waters)

Description: Waters Empower format
Typical Data: UPLC/HPLC chromatograms
Use Cases: Waters instrument data
Python Libraries:

  • Vendor tools (limited Python access)
  • Database extraction tools

EDA Approach:

  • Audit trail information
  • Processing methods
  • Compound identification
  • Quantitation results
  • System suitability tests

.lcd - Shimadzu LabSolutions

Description: Shimadzu chromatography format
Typical Data: GC/HPLC data
Use Cases: Shimadzu instruments
Python Libraries:

  • Vendor-specific parsers

EDA Approach:

  • Method parameters
  • Peak purity analysis
  • Spectral data (if PDA)
  • Quantitative results

Other Analytical Techniques

.dta - DSC/TGA Data

Description: Thermal analysis data (TA Instruments)
Typical Data: Temperature vs heat flow or mass
Use Cases: Differential scanning calorimetry, thermogravimetry
Python Libraries:

  • Custom parsers for TA formats
  • pandas for exported data

EDA Approach:

  • Transition temperature identification
  • Enthalpy calculations
  • Mass loss steps
  • Heating rate effects
  • Baseline determination
  • Purity assessment
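
Enthalpy calculation and baseline determination are linked: the transition enthalpy is the area between the heat flow curve and a baseline, divided by sample mass. The sketch below assumes a linear baseline drawn between the first and last points of the selected peak window and integrates against time with the trapezoid rule; the function names and data are hypothetical, and the sign depends on the instrument's endo-up or endo-down convention.

```python
def trapezoid(x, y):
    """Trapezoid-rule integral of y over x."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2.0
               for i in range(len(x) - 1))


def peak_enthalpy_j_per_g(time_s, heat_flow_mw, mass_mg):
    """Enthalpy from a peak window using a linear two-point baseline."""
    t0, t1 = time_s[0], time_s[-1]
    y0, y1 = heat_flow_mw[0], heat_flow_mw[-1]
    baseline = [y0 + (y1 - y0) * (t - t0) / (t1 - t0) for t in time_s]
    excess = [y - b for y, b in zip(heat_flow_mw, baseline)]
    area_mj = trapezoid(time_s, excess)  # mW * s = mJ
    return area_mj / mass_mg             # mJ / mg = J / g


# Hypothetical melting peak window for a 4 mg sample.
time_s = [0.0, 10.0, 20.0, 30.0, 40.0]
heat_flow_mw = [1.0, 3.0, 5.0, 3.0, 1.0]

enthalpy = peak_enthalpy_j_per_g(time_s, heat_flow_mw, mass_mg=4.0)
print(enthalpy)
```

The result is sensitive to where the window limits are placed, so it is worth repeating the calculation with slightly shifted limits to see how stable the value is.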

.run - ICP-MS/ICP-OES

Description: Elemental analysis data
Typical Data: Element concentrations or counts
Use Cases: Inductively coupled plasma MS/OES
Python Libraries:

  • Vendor-specific tools
  • Custom parsers

EDA Approach:

  • Element detection and quantitation
  • Internal standard performance
  • Spike recovery
  • Dilution factor corrections
  • Isotope ratios
  • LOD/LOQ calculations
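
LOD/LOQ and spike recovery are short calculations once blank replicates and a calibration slope are available. The sketch below uses the common 3σ and 10σ conventions (some guidelines use 3.3σ for the LOD instead); the function names and numbers are hypothetical.

```python
import statistics


def detection_limits(blank_signals, slope, lod_factor=3.0, loq_factor=10.0):
    """LOD and LOQ from the SD of blank replicates and the calibration slope."""
    sd_blank = statistics.stdev(blank_signals)
    return lod_factor * sd_blank / slope, loq_factor * sd_blank / slope


def spike_recovery_percent(spiked_result, unspiked_result, amount_added):
    """Percent recovery of a known spike."""
    return 100.0 * (spiked_result - unspiked_result) / amount_added


# Hypothetical blank counts and a slope of 100 counts per ug/L.
blanks = [10.0, 12.0, 11.0, 9.0, 13.0]
lod, loq = detection_limits(blanks, slope=100.0)
recovery = spike_recovery_percent(14.6, 5.0, 10.0)
print(round(lod, 4), round(loq, 4), round(recovery, 1))
```

Results below the LOQ are usually reported as estimates or flagged, and recoveries far from 100% point to matrix effects or an internal standard problem.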

.exp - Electrochemistry Data

Description: Electrochemical experiment data
Typical Data: Potential vs current or charge
Use Cases: Cyclic voltammetry, chronoamperometry
Python Libraries:

  • Custom parsers per instrument (CHI, Gamry, etc.)
  • galvani: Biologic EC-Lab files

EDA Approach:

  • Redox peak identification
  • Peak potential and current
  • Scan rate effects
  • Electron transfer kinetics
  • Background subtraction
  • Capacitance calculations
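
Redox peak identification on a single cyclic voltammogram can be sketched by taking the current extremes. The example below assumes anodic current is positive, that the data cover one cycle, and that no background subtraction is needed; the function name and data points are hypothetical.

```python
def cv_peak_summary(potentials_v, currents_ua):
    """Locate anodic and cathodic peaks and derive basic CV descriptors."""
    i_pa = max(currents_ua)
    i_pc = min(currents_ua)
    e_pa = potentials_v[currents_ua.index(i_pa)]
    e_pc = potentials_v[currents_ua.index(i_pc)]
    return {
        "E_pa": e_pa,
        "E_pc": e_pc,
        "delta_Ep": e_pa - e_pc,           # peak-to-peak separation
        "E_half": (e_pa + e_pc) / 2.0,     # midpoint potential estimate
        "current_ratio": abs(i_pa / i_pc),
    }


# Hypothetical one-cycle voltammogram (V, uA).
potentials = [0.10, 0.20, 0.28, 0.35, 0.45, 0.35, 0.22, 0.15, 0.10]
currents = [0.5, 2.0, 5.0, 3.0, 2.0, -1.0, -4.8, -2.0, -0.5]

peaks = cv_peak_summary(potentials, currents)
print(peaks)
```

For a reversible couple the separation is often quoted as roughly 59/n mV at 25 °C with a current ratio near 1; repeating the summary across scan rates shows whether the separation grows, which indicates slower electron transfer.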