Pymatgen I/O and File Format Reference
This reference documents pymatgen's extensive input/output capabilities for reading and writing structural and computational data across 100+ file formats.
General I/O Philosophy
Pymatgen provides a unified interface for file operations through the from_file() and to() methods, which detect the format from the file name or extension (for example, POSCAR or .cif).
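To make name-based detection concrete, here is a minimal sketch that uses only the standard library. The guess_format helper and its pattern table are hypothetical and much smaller than pymatgen's real dispatch logic; they only illustrate matching a file name against glob patterns.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional

# Hypothetical pattern table for illustration; pymatgen's own list is longer.
PATTERNS = [
    ("*.cif*", "cif"),
    ("*poscar*", "poscar"),
    ("*contcar*", "poscar"),
    ("*.xyz*", "xyz"),
    ("*.json*", "json"),
]


def guess_format(filename: str) -> Optional[str]:
    """Return a format label for filename, or None if no pattern matches."""
    # Only the final path component matters, compared case-insensitively.
    name = Path(filename).name.lower()
    for pattern, fmt in PATTERNS:
        if fnmatch(name, pattern):
            return fmt
    return None


print(guess_format("structure.cif"), guess_format("POSCAR"), guess_format("file.txt"))
```

A name such as file.txt matches nothing, which is the case where the format has to be stated explicitly.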
Reading Files
from pymatgen.core import Structure, Molecule
# Automatic format detection
struct = Structure.from_file("POSCAR")
struct = Structure.from_file("structure.cif")
mol = Molecule.from_file("molecule.xyz")
# Explicit format specification (when the file name gives no hint)
with open("file.txt") as f:
    struct = Structure.from_str(f.read(), fmt="cif")
Writing Files
# Write to file (format inferred from extension)
struct.to(filename="output.cif")
struct.to(filename="POSCAR")
struct.to(filename="structure.xyz")
# Get string representation without writing
cif_string = struct.to(fmt="cif")
poscar_string = struct.to(fmt="poscar")
Structure File Formats
CIF (Crystallographic Information File)
Standard format for crystallographic data.
from pymatgen.io.cif import CifParser, CifWriter
# Reading
parser = CifParser("structure.cif")
structure = parser.parse_structures()[0] # Returns a list of structures (get_structures() in older releases)
# Writing
writer = CifWriter(struct)
writer.write_file("output.cif")
# Or using convenience methods
struct = Structure.from_file("structure.cif")
struct.to(filename="output.cif")
Key features:
- Supports symmetry information
- Can contain multiple structures
- Preserves space group and symmetry operations
- Handles partial occupancies
POSCAR/CONTCAR (VASP)
VASP's structure format.
from pymatgen.io.vasp import Poscar
# Reading
poscar = Poscar.from_file("POSCAR")
structure = poscar.structure
# Writing
poscar = Poscar(struct)
poscar.write_file("POSCAR")
# Or using convenience methods
struct = Structure.from_file("POSCAR")
struct.to(filename="POSCAR")
Key features:
- Supports selective dynamics
- Can include velocities for MD restarts (trajectories are stored separately in XDATCAR)
- Preserves lattice and coordinate precision
XYZ
Simple molecular coordinates format.
# For molecules
mol = Molecule.from_file("molecule.xyz")
mol.to(filename="output.xyz")
# For structures (Cartesian coordinates)
struct.to(filename="structure.xyz")
PDB (Protein Data Bank)
Common format for biomolecules.
mol = Molecule.from_file("protein.pdb")
mol.to(filename="output.pdb")
JSON/YAML
Serialization via dictionaries.
import json
import yaml
# JSON
with open("structure.json", "w") as f:
    json.dump(struct.as_dict(), f)
with open("structure.json", "r") as f:
    struct = Structure.from_dict(json.load(f))
# YAML
with open("structure.yaml", "w") as f:
    yaml.dump(struct.as_dict(), f)
with open("structure.yaml", "r") as f:
    struct = Structure.from_dict(yaml.safe_load(f))
Electronic Structure Code I/O
VASP
The most comprehensive integration in pymatgen.
Input Files
from pymatgen.io.vasp.inputs import Incar, Kpoints, Poscar, Potcar, VaspInput
# INCAR (calculation parameters)
incar = Incar.from_file("INCAR")
incar = Incar({"ENCUT": 520, "ISMEAR": 0, "SIGMA": 0.05})
incar.write_file("INCAR")
# KPOINTS (k-point mesh)
kpoints = Kpoints.automatic(20) # Fully automatic scheme with length parameter 20 (not a 20x20x20 mesh)
kpoints = Kpoints.gamma_automatic((4, 4, 4)) # Explicit 4x4x4 Gamma-centered mesh
kpoints = Kpoints.automatic_density(struct, 1000) # By density (k-points per reciprocal atom)
kpoints.write_file("KPOINTS")
# POSCAR (structure)
poscar = Poscar(struct)
# POTCAR (pseudopotentials); requires a configured pseudopotential directory
potcar = Potcar(["Fe_pv", "O"]) # One POTCAR symbol per species, in POSCAR order
# Complete input set
vasp_input = VaspInput(incar, kpoints, poscar, potcar)
vasp_input.write_input("./vasp_calc")
Output Files
from pymatgen.io.vasp.outputs import Vasprun, Outcar, Oszicar, Eigenval
# vasprun.xml (comprehensive output)
vasprun = Vasprun("vasprun.xml")
final_structure = vasprun.final_structure
energy = vasprun.final_energy
band_structure = vasprun.get_band_structure()
dos = vasprun.complete_dos
# OUTCAR
outcar = Outcar("OUTCAR")
magnetization = outcar.total_mag
outcar.read_elastic_tensor() # Parse the elastic moduli block on demand
elastic_tensor = outcar.data["elastic_tensor"]
# OSZICAR (convergence information)
oszicar = Oszicar("OSZICAR")
Input Sets
Pymatgen provides pre-configured input sets for common calculations:
from pymatgen.io.vasp.sets import (
MPRelaxSet, # Materials Project relaxation
MPStaticSet, # Static calculation
MPNonSCFSet, # Non-self-consistent (band structure)
MPSOCSet, # Spin-orbit coupling
MPHSERelaxSet, # HSE06 hybrid functional
)
# Create input set
relax = MPRelaxSet(struct)
relax.write_input("./relax_calc")
# Customize parameters
static = MPStaticSet(struct, user_incar_settings={"ENCUT": 600})
static.write_input("./static_calc")
Gaussian
Quantum chemistry package integration.
from pymatgen.io.gaussian import GaussianInput, GaussianOutput
# Input
gin = GaussianInput(
mol,
charge=0,
spin_multiplicity=1,
functional="B3LYP",
basis_set="6-31G(d)",
route_parameters={"Opt": None, "Freq": None}
)
gin.write_file("input.gjf")
# Output
gout = GaussianOutput("output.log")
final_mol = gout.final_structure
energy = gout.final_energy
frequencies = gout.frequencies
LAMMPS
Classical molecular dynamics.
from pymatgen.io.lammps.data import LammpsData
from pymatgen.io.lammps.inputs import LammpsInputFile
# Structure to LAMMPS data file
lammps_data = LammpsData.from_structure(struct)
lammps_data.write_file("data.lammps")
# LAMMPS input script
lammps_input = LammpsInputFile.from_file("in.lammps")
Quantum ESPRESSO
from pymatgen.io.pwscf import PWInput, PWOutput
# Input
pwin = PWInput(
    struct,
    pseudo=pseudos, # dict mapping each element symbol to its pseudopotential file name
    control={"calculation": "scf"},
    system={"ecutwfc": 50, "ecutrho": 400},
    electrons={"conv_thr": 1e-8},
)
pwin.write_file("pw.in")
# Output (parses energies and lattice parameters; it does not rebuild a Structure)
pwout = PWOutput("pw.out")
energy = pwout.final_energy
ABINIT
from pymatgen.io.abinit.inputs import BasicAbinitInput
# pseudos: pseudopotential files, one per element in struct
abin = BasicAbinitInput(struct, pseudos)
abin.set_vars(ecut=10, nband=10)
abin.write("abinit.in")
CP2K
from pymatgen.io.cp2k.inputs import Cp2kInput
from pymatgen.io.cp2k.outputs import Cp2kOutput
# Input
cp2k_input = Cp2kInput.from_file("cp2k.inp")
# Output
cp2k_output = Cp2kOutput("cp2k.out")
FEFF (XAS/XANES)
from pymatgen.io.feff.sets import MPXANESSet
feff_set = MPXANESSet("Fe", struct) # Absorbing atom first, then the structure
feff_set.write_input("./feff_calc")
LMTO (Stuttgart TB-LMTO-ASA)
from pymatgen.io.lmto import LMTOCtrl
ctrl = LMTOCtrl.from_file("CTRL")
structure = ctrl.structure
Q-Chem
from pymatgen.io.qchem.inputs import QCInput
from pymatgen.io.qchem.outputs import QCOutput
# Input
qc_input = QCInput(
mol,
rem={"method": "B3LYP", "basis": "6-31G*", "job_type": "opt"}
)
qc_input.write_file("mol.qin")
# Output
qc_output = QCOutput("mol.qout")
Exciting
from pymatgen.io.exciting import ExcitingInput
exc_input = ExcitingInput(struct)
exc_input.write_file("unchanged", False, "input.xml") # celltype, cartesian, filename
ATAT (Alloy Theoretic Automated Toolkit)
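from pymatgen.io.atat import Mcsqs
mcsqs = Mcsqs(struct)
# Serialize to the mcsqs structure format (to_string() in older releases)
with open("rndstr.in", "w") as f:
    f.write(mcsqs.to_str())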
Special Purpose Formats
Phonopy
from pymatgen.io.phonopy import get_phonopy_structure, get_pmg_structure
# Convert to phonopy structure
phonopy_struct = get_phonopy_structure(struct)
# Convert from phonopy
struct = get_pmg_structure(phonopy_struct)
ASE (Atomic Simulation Environment)
from pymatgen.io.ase import AseAtomsAdaptor
adaptor = AseAtomsAdaptor()
# Pymatgen to ASE
atoms = adaptor.get_atoms(struct)
# ASE to Pymatgen
struct = adaptor.get_structure(atoms)
Zeo++ (Porous Materials)
from pymatgen.io.zeopp import get_voronoi_nodes, get_high_accuracy_voronoi_nodes
# Analyze pore structure
vor_nodes = get_voronoi_nodes(struct)
BabelMolAdaptor (OpenBabel)
from pymatgen.io.babel import BabelMolAdaptor
adaptor = BabelMolAdaptor(mol)
# Convert to different formats
pdb_str = adaptor.pybel_mol.write("pdb") # Returns the PDB text as a string
adaptor.write_file("mol.sdf", file_format="sdf") # Writes to disk; returns nothing
# Generate 3D coordinates
adaptor.add_hydrogen()
adaptor.make3d()
Alchemy and Transformation I/O
TransformedStructure
Structures that track their transformation history.
from pymatgen.alchemy.materials import TransformedStructure
from pymatgen.transformations.standard_transformations import (
SupercellTransformation,
SubstitutionTransformation
)
# Create transformed structure
ts = TransformedStructure(struct, [])
ts.append_transformation(SupercellTransformation([[2,0,0],[0,2,0],[0,0,2]]))
ts.append_transformation(SubstitutionTransformation({"Fe": "Mn"}))
# Write with history
ts.write_vasp_input(output_dir="./calc_dir")
# Read from SNL (Structure Notebook Language); snl is an existing StructureNL object
ts = TransformedStructure.from_snl(snl)
Batch Operations
CifTransmuter
Process multiple CIF files.
from pymatgen.alchemy.transmuters import CifTransmuter
from pymatgen.transformations.standard_transformations import SupercellTransformation
transmuter = CifTransmuter.from_filenames(
    ["structure1.cif", "structure2.cif"],
    [SupercellTransformation([[2,0,0],[0,2,0],[0,0,2]])]
)
# Write all structures
transmuter.write_vasp_input(output_dir="./batch_calc")
PoscarTransmuter
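Works the same way for POSCAR files.
from pymatgen.alchemy.transmuters import PoscarTransmuter
# transformation1, transformation2: transformation objects such as SupercellTransformation
transmuter = PoscarTransmuter.from_filenames(
    ["POSCAR1", "POSCAR2"],
    [transformation1, transformation2]
)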
Best Practices
- Automatic format detection: Use from_file() and to() methods whenever possible
- Error handling: Always wrap file I/O in try-except blocks
- Format-specific parsers: Use specialized parsers (e.g., Vasprun) for detailed output analysis
- Input sets: Prefer pre-configured input sets over manual parameter specification
- Serialization: Use JSON/YAML for long-term storage and version control
- Batch processing: Use transmuters for applying transformations to multiple structures
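The error-handling advice can be sketched with a small wrapper. The load_or_none helper below is hypothetical, and the set of exceptions it catches is an assumption that should be adjusted to the parser in use; with pymatgen one would typically pass Structure.from_file as the loader.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_none(path: str, loader: Callable[[str], T]) -> Optional[T]:
    """Call loader(path); return None instead of raising on common I/O failures."""
    try:
        return loader(path)
    except FileNotFoundError:
        print(f"{path}: file not found")
    except (OSError, ValueError) as exc:
        # ValueError is a common signal for unparsable content.
        print(f"{path}: could not be read ({exc})")
    return None


def read_text(path: str) -> str:
    """Stand-in loader so the sketch runs without any input files."""
    with open(path) as handle:
        return handle.read()


result = load_or_none("does_not_exist.cif", read_text)
print(result)
```

Returning None keeps batch loops running when a single file is bad; callers that need the failure reason may prefer to log and re-raise instead.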
Supported Format Summary
Structure formats:
CIF, POSCAR/CONTCAR, XYZ, PDB, XSF, PWMAT, Res, CSSR, JSON, YAML
Electronic structure codes:
VASP, Gaussian, LAMMPS, Quantum ESPRESSO, ABINIT, CP2K, FEFF, Q-Chem, LMTO, Exciting, NWChem, AIMS
Molecular formats:
XYZ, PDB, MOL, SDF, PQR, plus many additional formats via OpenBabel
Special purpose:
Phonopy, ASE, Zeo++, Lobster, BoltzTraP