# Spectroscopy and Analytical Chemistry File Formats Reference

This reference covers file formats used in various spectroscopic techniques and analytical chemistry instrumentation.

## NMR Spectroscopy

### .fid - NMR Free Induction Decay
**Description:** Raw time-domain NMR data from Bruker, Agilent, JEOL
**Typical Data:** Complex time-domain signal
**Use Cases:** NMR spectroscopy, structure elucidation
**Python Libraries:**
- `nmrglue`: `nmrglue.bruker.read_fid('fid')` or `nmrglue.varian.read_fid('fid')`
- `nmrstarlib`: NMR data handling

**EDA Approach:**
- Time-domain signal decay
- Sampling rate and acquisition time
- Number of data points
- Signal-to-noise ratio estimation
- Baseline drift assessment
- Digital filter effects
- Acquisition parameter validation
- Apodization function selection

### .ft / .ft1 / .ft2 - NMR Frequency Domain
**Description:** Fourier-transformed NMR spectrum
**Typical Data:** Processed frequency-domain data
**Use Cases:** NMR analysis, peak integration
**Python Libraries:**
- `nmrglue`: Frequency domain reading
- Custom processing pipelines

**EDA Approach:**
- Peak picking and integration
- Chemical shift range
- Baseline correction quality
- Phase correction assessment
- Reference peak identification
- Spectral resolution
- Artifact detection
- Multiplicity analysis

### .1r / .2rr - Bruker NMR Processed Data
**Description:** Bruker processed spectrum (real part)
**Typical Data:** 1D or 2D processed NMR spectra
**Use Cases:** NMR data analysis with Bruker software
**Python Libraries:**
- `nmrglue`: Bruker format support

**EDA Approach:**
- Processing parameters review
- Window function effects
- Zero-filling assessment
- Linear prediction validation
- Spectral artifacts

### .dx - NMR JCAMP-DX
**Description:** JCAMP-DX format for NMR
**Typical Data:** Standardized NMR spectrum
**Use Cases:** Data exchange between software
**Python Libraries:**
- `jcamp`: JCAMP reader
- `nmrglue`: Can import JCAMP

**EDA Approach:**
- Format compliance
- Metadata completeness
- Peak table validation
- Integration values
- Compound identification info

### .mnova - Mnova Format
**Description:** Mestrelab Research Mnova format
**Typical Data:** NMR data with processing info
**Use Cases:** Mnova software workflows
**Python Libraries:**
- `nmrglue`: Limited Mnova support
- Conversion tools to standard formats

**EDA Approach:**
- Multi-spectrum handling
- Processing pipeline review
- Quantification data
- Structure assignment

## Mass Spectrometry

### .mzML - Mass Spectrometry Markup Language
**Description:** Standard XML-based MS format
**Typical Data:** MS spectra, chromatograms, metadata
**Use Cases:** Proteomics, metabolomics, lipidomics
**Python Libraries:**
- `pymzml`: `pymzml.run.Reader('file.mzML')`
- `pyteomics.mzml`: `pyteomics.mzml.read('file.mzML')`
- `MSFileReader`: Various wrappers

**EDA Approach:**
- Scan count and MS level distribution
- Retention time range and TIC
- m/z range and resolution
- Precursor ion selection
- Fragmentation patterns
- Instrument configuration
- Quality control metrics
- Data completeness

### .mzXML - Mass Spectrometry XML
**Description:** Legacy XML MS format
**Typical Data:** Mass spectra and chromatograms
**Use Cases:** Proteomics workflows (older)
**Python Libraries:**
- `pyteomics.mzxml`
- `pymzml`: Can read mzXML

**EDA Approach:**
- Similar to mzML
- Version compatibility
- Conversion quality assessment

### .mzData - mzData Format
**Description:** Legacy PSI MS format
**Typical Data:** Mass spectrometry data
**Use Cases:** Legacy data archives
**Python Libraries:**
- `pyteomics`: Limited support
- Conversion to mzML recommended

**EDA Approach:**
- Format conversion validation
- Data completeness
- Metadata extraction

### .raw - Vendor Raw Files (Thermo, Agilent, Bruker)
**Description:** Proprietary instrument data
**Typical Data:** Raw mass spectra and metadata
**Use Cases:** Direct instrument output
**Python Libraries:**
- `pymsfilereader`: Thermo RAW files
- `ThermoRawFileParser`: CLI wrapper
- Vendor-specific APIs

**EDA Approach:**
- Method parameter extraction
- Instrument performance metrics
- Calibration status
- Scan function analysis
- MS/MS quality metrics
- Dynamic exclusion evaluation

### .d - Agilent Data Directory
**Description:** Agilent MS data folder
**Typical Data:** LC-MS, GC-MS with methods
**Use Cases:** Agilent MassHunter workflows
**Python Libraries:**
- Community parsers
- ChemStation integration

**EDA Approach:**
- Directory structure validation
- Method parameters
- Calibration curves
- Sequence metadata
- Signal quality metrics

### .wiff - AB SCIEX Data
**Description:** AB SCIEX/SCIEX instrument format
**Typical Data:** Mass spectrometry data
**Use Cases:** SCIEX instrument workflows
**Python Libraries:**
- Vendor SDKs (limited Python support)
- Conversion tools

**EDA Approach:**
- Experiment type identification
- Scan properties
- Quantitation data
- Multi-experiment structure

### .mgf - Mascot Generic Format
**Description:** Peak list format for MS/MS
**Typical Data:** Precursor and fragment masses
**Use Cases:** Peptide identification, database searches
**Python Libraries:**
- `pyteomics.mgf`: `pyteomics.mgf.read('file.mgf')`
- `pyopenms`: MGF support

**EDA Approach:**
- Spectrum count
- Charge state distribution
- Precursor m/z and intensity
- Fragment peak count
- Mass accuracy
- Title and metadata parsing

### .pkl - Peak List (Binary)
**Description:** Binary peak list format
**Typical Data:** Serialized MS/MS spectra
**Use Cases:** Software-specific storage
**Python Libraries:**
- `pickle`: Standard deserialization
- `pyteomics`: PKL support

**EDA Approach:**
- Data structure inspection
- Conversion to standard formats
- Metadata preservation

### .ms1 / .ms2 - MS1/MS2 Formats
**Description:** Simple text format for MS data
**Typical Data:** MS1 and MS2 scans
**Use Cases:** Database searching, proteomics
**Python Libraries:**
- `pyteomics.ms1` and `ms2`
- Simple text parsing

**EDA Approach:**
- Scan count by level
- Retention time series
- Charge state analysis
- m/z range coverage

### .pepXML - Peptide XML
**Description:** TPP peptide identification format
**Typical Data:** Peptide-spectrum matches
**Use Cases:** Proteomics search results
**Python Libraries:**
- `pyteomics.pepxml`

**EDA Approach:**
- Search result statistics
- Score distribution
- Modification analysis
- FDR assessment
- Enzyme specificity

### .protXML - Protein XML
**Description:** TPP protein inference format
**Typical Data:** Protein identifications
**Use Cases:** Proteomics protein-level results
**Python Libraries:**
- `pyteomics.protxml`

**EDA Approach:**
- Protein group analysis
- Coverage statistics
- Confidence scoring
- Parsimony analysis

### .msp - NIST MS Search Format
**Description:** NIST spectral library format
**Typical Data:** Reference mass spectra
**Use Cases:** Spectral library searching
**Python Libraries:**
- `matchms`: Spectral library handling
- Custom parsers

**EDA Approach:**
- Library size and coverage
- Metadata completeness
- Peak count statistics
- Compound annotation quality

## Infrared and Raman Spectroscopy

### .spc - Galactic SPC
**Description:** Thermo Galactic spectroscopy format
**Typical Data:** IR, Raman, UV-Vis spectra
**Use Cases:** Various spectroscopy instruments
**Python Libraries:**
- `spc`: `spc.File('file.spc')`
- `specio`: Multi-format reader

**EDA Approach:**
- Wavenumber/wavelength range
- Data point density
- Multi-spectrum handling
- Baseline characteristics
- Peak identification
- Absorbance/transmittance mode
- Instrument information

### .spa - Thermo Nicolet
**Description:** Thermo Fisher FTIR format
**Typical Data:** FTIR spectra
**Use Cases:** OMNIC software data
**Python Libraries:**
- Custom binary parsers
- Conversion to JCAMP or SPC

**EDA Approach:**
- Interferogram vs spectrum
- Background spectrum validation
- Atmospheric compensation
- Resolution and scan number
- Sample information

### .0 - Bruker OPUS
**Description:** Bruker OPUS FTIR format (numbered files)
**Typical Data:** FTIR spectra and metadata
**Use Cases:** Bruker FTIR instruments
**Python Libraries:**
- `brukeropusreader`: OPUS format parser
- `specio`: OPUS support

**EDA Approach:**
- Multiple block types (AB, ScSm, etc.)
- Sample and reference spectra
- Instrument parameters
- Optical path configuration
- Beam splitter and detector info

### .dpt - Data Point Table
**Description:** Simple XY data format
**Typical Data:** Generic spectroscopic data
**Use Cases:** Renishaw Raman, generic exports
**Python Libraries:**
- `pandas`: CSV-like reading
- Text parsing

**EDA Approach:**
- X-axis type (wavelength, wavenumber, Raman shift)
- Y-axis units (intensity, absorbance, etc.)
- Data point spacing
- Header information
- Multi-column data handling

### .wdf - Renishaw Raman
**Description:** Renishaw WiRE data format
**Typical Data:** Raman spectra and maps
**Use Cases:** Renishaw Raman microscopy
**Python Libraries:**
- `renishawWiRE`: WDF reader
- Custom parsers for WDF format

**EDA Approach:**
- Spectral vs mapping data
- Laser wavelength
- Accumulation and exposure time
- Spatial coordinates (mapping)
- Z-scan data
- Baseline and cosmic ray correction

### .txt (Spectroscopy)
**Description:** Generic text export from instruments
**Typical Data:** Wavelength/wavenumber and intensity
**Use Cases:** Universal data exchange
**Python Libraries:**
- `pandas`: Text file reading
- `numpy`: Simple array loading

**EDA Approach:**
- Delimiter and format detection
- Header parsing
- Units identification
- Multiple spectrum handling
- Metadata extraction from comments

## UV-Visible Spectroscopy

### .asd / .asc - ASD Binary/ASCII
**Description:** ASD FieldSpec spectroradiometer
**Typical Data:** Hyperspectral UV-Vis-NIR data
**Use Cases:** Remote sensing, reflectance spectroscopy
**Python Libraries:**
- `spectral.io.asd`: ASD format support
- Custom parsers

**EDA Approach:**
- Wavelength range (UV to NIR)
- Reference spectrum validation
- Dark current correction
- Integration time
- GPS metadata (if present)
- Reflectance vs radiance

### .sp - Perkin Elmer
**Description:** Perkin Elmer UV/Vis format
**Typical Data:** UV-Vis spectrophotometer data
**Use Cases:** PE Lambda instruments
**Python Libraries:**
- Custom parsers
- Conversion to standard formats

**EDA Approach:**
- Scan parameters
- Baseline correction
- Multi-wavelength scans
- Time-based measurements
- Sample/reference handling

### .csv (Spectroscopy)
**Description:** CSV export from UV-Vis instruments
**Typical Data:** Wavelength and absorbance/transmittance
**Use Cases:** Universal format for UV-Vis data
**Python Libraries:**
- `pandas`: Native CSV support

**EDA Approach:**
- Lambda max identification
- Beer's law compliance
- Baseline offset
- Path length correction
- Concentration calculations

## X-ray and Diffraction

### .cif - Crystallographic Information File
**Description:** Crystal structure and diffraction data
**Typical Data:** Unit cell, atomic positions, structure factors
**Use Cases:** Crystallography, materials science
**Python Libraries:**
- `gemmi`: `gemmi.cif.read_file('file.cif')`
- `PyCifRW`: CIF reading/writing
- `pymatgen`: Materials structure analysis

**EDA Approach:**
- Crystal system and space group
- Unit cell parameters
- Atomic positions and occupancy
- Thermal parameters
- R-factors and refinement quality
- Completeness and redundancy
- Structure validation

### .hkl - Reflection Data
**Description:** Miller indices and intensities
**Typical Data:** Integrated diffraction intensities
**Use Cases:** Crystallographic refinement
**Python Libraries:**
- Custom parsers (format dependent)
- Crystallography packages (CCP4, etc.)
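
Since `.hkl` layouts depend on the producing program, a custom parser usually starts from one known fixed-width layout. The following is a minimal sketch, assuming the SHELX HKLF 4 layout (h, k, l in three 4-character integer columns, intensity and sigma in two 8-character float columns, with a `0 0 0` record ending the list). The helper names `parse_hklf4` and `mean_i_over_sigma` are hypothetical, and the sample records are made up for illustration; other `.hkl` variants will need different column widths.

```python
from typing import Iterable, List, Tuple

Reflection = Tuple[int, int, int, float, float]


def parse_hklf4(lines: Iterable[str]) -> List[Reflection]:
    """Parse fixed-width records: 3 x 4-char ints, then 2 x 8-char floats.

    Assumption: SHELX HKLF 4 style columns; adjust slices for other layouts.
    """
    reflections: List[Reflection] = []
    for line in lines:
        if len(line.rstrip()) < 28:  # too short to hold h, k, l, I, sigma
            continue
        h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
        if (h, k, l) == (0, 0, 0):  # terminator record ends the reflection list
            break
        intensity = float(line[12:20])
        sigma = float(line[20:28])
        reflections.append((h, k, l, intensity, sigma))
    return reflections


def mean_i_over_sigma(reflections: List[Reflection]) -> float:
    """Average I/sigma over reflections with a positive sigma."""
    ratios = [i / s for (_, _, _, i, s) in reflections if s > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Made-up records, formatted to the assumed column widths.
records = [
    (1, 0, 0, 120.5, 4.2),
    (0, 2, 0, 15.25, 3.1),
    (-1, 1, 3, 980.0, 12.0),
    (0, 0, 0, 0.0, 0.0),   # terminator
    (2, 2, 2, 50.0, 5.0),  # ignored: appears after the terminator
]
sample = ["%4d%4d%4d%8.2f%8.2f" % r for r in records]

refl = parse_hklf4(sample)
print(len(refl), "reflections, mean I/sigma =", round(mean_i_over_sigma(refl), 2))
```

The parsed tuples can feed the checks listed for this format, such as the I/sigma distribution, once resolution is computed from the unit cell.
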

**EDA Approach:**
- Resolution range
- Completeness by shell
- I/sigma distribution
- Systematic absences
- Twinning detection
- Wilson plot

### .mtz - MTZ Format (CCP4)
**Description:** Binary crystallographic data
**Typical Data:** Reflections, phases, structure factors
**Use Cases:** Macromolecular crystallography
**Python Libraries:**
- `gemmi`: MTZ support
- `cctbx`: Comprehensive crystallography

**EDA Approach:**
- Column types and data
- Resolution limits
- R-factors (Rwork, Rfree)
- Phase probability distribution
- Map coefficients
- Batch information

### .xy / .xye - Powder Diffraction
**Description:** 2-theta vs intensity data
**Typical Data:** Powder X-ray diffraction patterns
**Use Cases:** Phase identification, Rietveld refinement
**Python Libraries:**
- `pandas`: Simple XY reading
- `pymatgen`: XRD pattern analysis

**EDA Approach:**
- 2-theta range
- Peak positions and intensities
- Background modeling
- Peak width analysis (strain/size)
- Phase identification via matching
- Preferred orientation effects

### .raw (XRD)
**Description:** Vendor-specific XRD raw data
**Typical Data:** XRD patterns with metadata
**Use Cases:** Bruker, PANalytical, Rigaku instruments
**Python Libraries:**
- Vendor-specific parsers
- Conversion tools

**EDA Approach:**
- Scan parameters (step size, time)
- Sample alignment
- Incident beam setup
- Detector configuration
- Background scan validation

### .gsa / .gsas - GSAS Format
**Description:** General Structure Analysis System
**Typical Data:** Powder diffraction for Rietveld
**Use Cases:** Rietveld refinement
**Python Libraries:**
- GSAS-II Python interface
- Custom parsers

**EDA Approach:**
- Histogram data
- Instrument parameters
- Phase information
- Refinement constraints
- Profile function parameters

## Electron Spectroscopy

### .vms - VG Scienta
**Description:** VG Scienta spectrometer format
**Typical Data:** XPS, UPS, ARPES spectra
**Use Cases:** Photoelectron spectroscopy
**Python Libraries:**
- Custom parsers for VMS
- `specio`: Multi-format support

**EDA Approach:**
- Binding energy calibration
- Pass energy and resolution
- Photoelectron line identification
- Satellite peak analysis
- Background subtraction quality
- Fermi edge position

### .spe - WinSpec/SPE Format
**Description:** Princeton Instruments/Roper Scientific
**Typical Data:** CCD spectra, Raman, PL
**Use Cases:** Spectroscopy with CCD detectors
**Python Libraries:**
- `spe2py`: SPE file reader
- `spe_loader`: Alternative parser

**EDA Approach:**
- CCD frame analysis
- Wavelength calibration
- Dark frame subtraction
- Cosmic ray identification
- Readout noise
- Accumulation statistics

### .pxt - Princeton PTI
**Description:** Photon Technology International
**Typical Data:** Fluorescence, phosphorescence spectra
**Use Cases:** Fluorescence spectroscopy
**Python Libraries:**
- Custom parsers
- Text-based format variants

**EDA Approach:**
- Excitation and emission spectra
- Quantum yield calculations
- Time-resolved measurements
- Temperature-dependent data
- Correction factors applied

### .dat (Spectroscopy Generic)
**Description:** Generic binary or text spectroscopy data
**Typical Data:** Various spectroscopic measurements
**Use Cases:** Many instruments use .dat extension
**Python Libraries:**
- Format-specific identification needed
- `numpy`, `pandas` for known formats

**EDA Approach:**
- Format detection (binary vs text)
- Header identification
- Data structure inference
- Units and axis labels
- Instrument signature detection

## Chromatography

### .chrom - Chromatogram Data
**Description:** Generic chromatography format
**Typical Data:** Retention time vs signal
**Use Cases:** HPLC, GC, LC-MS
**Python Libraries:**
- Vendor-specific parsers
- `pandas` for text exports

**EDA Approach:**
- Retention time range
- Peak detection and integration
- Baseline drift
- Resolution between peaks
- Signal-to-noise ratio
- Tailing factor

### .ch - ChemStation
**Description:** Agilent ChemStation format
**Typical Data:** Chromatograms and method parameters
**Use Cases:** Agilent HPLC and GC systems
**Python Libraries:**
- `agilent-chemstation`: Community tools
- Binary format parsers

**EDA Approach:**
- Method validation
- Integration parameters
- Calibration curve
- Sample sequence information
- Instrument status

### .arw - Empower (Waters)
**Description:** Waters Empower format
**Typical Data:** UPLC/HPLC chromatograms
**Use Cases:** Waters instrument data
**Python Libraries:**
- Vendor tools (limited Python access)
- Database extraction tools

**EDA Approach:**
- Audit trail information
- Processing methods
- Compound identification
- Quantitation results
- System suitability tests

### .lcd - Shimadzu LabSolutions
**Description:** Shimadzu chromatography format
**Typical Data:** GC/HPLC data
**Use Cases:** Shimadzu instruments
**Python Libraries:**
- Vendor-specific parsers

**EDA Approach:**
- Method parameters
- Peak purity analysis
- Spectral data (if PDA)
- Quantitative results

## Other Analytical Techniques

### .dta - DSC/TGA Data
**Description:** Thermal analysis data (TA Instruments)
**Typical Data:** Temperature vs heat flow or mass
**Use Cases:** Differential scanning calorimetry, thermogravimetry
**Python Libraries:**
- Custom parsers for TA formats
- `pandas` for exported data

**EDA Approach:**
- Transition temperature identification
- Enthalpy calculations
- Mass loss steps
- Heating rate effects
- Baseline determination
- Purity assessment

### .run - ICP-MS/ICP-OES
**Description:** Elemental analysis data
**Typical Data:** Element concentrations or counts
**Use Cases:** Inductively coupled plasma MS/OES
**Python Libraries:**
- Vendor-specific tools
- Custom parsers

**EDA Approach:**
- Element detection and quantitation
- Internal standard performance
- Spike recovery
- Dilution factor corrections
- Isotope ratios
- LOD/LOQ calculations

### .exp - Electrochemistry Data
**Description:** Electrochemical experiment data
**Typical Data:** Potential vs current or charge
**Use Cases:** Cyclic voltammetry, chronoamperometry
**Python Libraries:**
- Custom parsers per instrument (CHI, Gamry, etc.)
- `galvani`: Biologic EC-Lab files

**EDA Approach:**
- Redox peak identification
- Peak potential and current
- Scan rate effects
- Electron transfer kinetics
- Background subtraction
- Capacitance calculations
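
The first two checks in the list above (redox peak identification, peak potential and current) can be sketched on an already-parsed voltammogram. This is a minimal illustration, assuming potential and current are available as plain lists of equal length, that the anodic peak is the global current maximum and the cathodic peak the global minimum, and that no baseline correction is needed. The function name `find_cv_peaks` is hypothetical and the voltammogram is synthetic (two Gaussian-shaped peaks), not instrument data.

```python
import math
from typing import Dict, Sequence


def find_cv_peaks(potential: Sequence[float], current: Sequence[float]) -> Dict[str, float]:
    """Locate anodic/cathodic peaks as the global current maximum/minimum."""
    i_pa = max(range(len(current)), key=current.__getitem__)  # anodic peak index
    i_pc = min(range(len(current)), key=current.__getitem__)  # cathodic peak index
    e_pa, e_pc = potential[i_pa], potential[i_pc]
    return {
        "E_pa": e_pa,
        "I_pa": current[i_pa],
        "E_pc": e_pc,
        "I_pc": current[i_pc],
        "delta_Ep": e_pa - e_pc,      # peak-to-peak separation
        "E_half": (e_pa + e_pc) / 2,  # midpoint potential estimate
    }


# Synthetic sweep: -0.2 V -> 0.6 V -> -0.2 V in 10 mV steps (illustrative only).
n = 80
forward = [-0.2 + 0.01 * k for k in range(n + 1)]
reverse = [0.6 - 0.01 * k for k in range(1, n + 1)]
potential = forward + reverse
current = (
    [10e-6 * math.exp(-((e - 0.23) / 0.05) ** 2) for e in forward]
    + [-9e-6 * math.exp(-((e - 0.17) / 0.05) ** 2) for e in reverse]
)

peaks = find_cv_peaks(potential, current)
print({name: round(value, 7) for name, value in peaks.items()})
```

On real data the global extrema can be dominated by capacitive background or solvent limits, so background subtraction would normally come first. Repeating the readout across scan rates gives the inputs for the scan rate and kinetics checks.
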