2025-11-30 08:30:10 +08:00

Pymatgen I/O and File Format Reference

This reference documents pymatgen's extensive input/output capabilities for reading and writing structural and computational data across 100+ file formats.

General I/O Philosophy

Pymatgen provides a unified interface for file operations through the from_file() and to() methods. The format is detected automatically from the file name, either from an extension such as .cif or from a name pattern such as POSCAR.
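
The detection described above can be pictured as a small table of file-name patterns. The following is a minimal sketch using only the standard library. The pattern table and the `guess_format` helper are hypothetical illustrations, not pymatgen's actual implementation.

```python
from fnmatch import fnmatch

# Illustrative (pattern, format) pairs; NOT pymatgen's real lookup table.
FORMAT_PATTERNS = [
    ("*.cif*", "cif"),
    ("*poscar*", "poscar"),
    ("*contcar*", "poscar"),
    ("*.xyz*", "xyz"),
    ("*.json*", "json"),
    ("*.yaml*", "yaml"),
]


def guess_format(filename: str) -> str:
    """Return a format label for a file name, or raise if nothing matches."""
    name = filename.lower()
    for pattern, fmt in FORMAT_PATTERNS:
        if fnmatch(name, pattern):
            return fmt
    raise ValueError(f"Cannot infer format from {filename!r}; specify it explicitly")


print(guess_format("POSCAR"))
print(guess_format("structure.cif.gz"))
```

Matching on name patterns rather than strict extensions is what lets extension-less names such as POSCAR, and compressed files such as structure.cif.gz, be recognized.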

Reading Files

from pymatgen.core import Structure, Molecule

# Automatic format detection
struct = Structure.from_file("POSCAR")
struct = Structure.from_file("structure.cif")
mol = Molecule.from_file("molecule.xyz")

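# Explicit format specification: from_file() infers the format from the
# file name, so read the text yourself and pass fmt to from_str()
with open("file.txt") as f:
    struct = Structure.from_str(f.read(), fmt="cif")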

Writing Files

# Write to file (format inferred from the file name)
struct.to(filename="output.cif")
struct.to(filename="POSCAR")
mol.to(filename="molecule.xyz")  # XYZ is a Molecule format

# Get string representation without writing
cif_string = struct.to(fmt="cif")
poscar_string = struct.to(fmt="poscar")

Structure File Formats

CIF (Crystallographic Information File)

Standard format for crystallographic data.

from pymatgen.io.cif import CifParser, CifWriter

# Reading
parser = CifParser("structure.cif")
# Returns a list of structures; older releases use get_structures() instead
structure = parser.parse_structures(primitive=True)[0]

# Writing
writer = CifWriter(structure)
writer.write_file("output.cif")

# Or using convenience methods
struct = Structure.from_file("structure.cif")
struct.to(filename="output.cif")

Key features:

  • Supports symmetry information when reading
  • Can contain multiple structures
  • Writes space group and symmetry operations when CifWriter is given a symprec tolerance (otherwise the structure is written in P1)
  • Handles partial occupancies

POSCAR/CONTCAR (VASP)

VASP's structure format.

from pymatgen.io.vasp import Poscar

# Reading
poscar = Poscar.from_file("POSCAR")
structure = poscar.structure

# Writing
poscar = Poscar(struct)
poscar.write_file("POSCAR")

# Or using convenience methods
struct = Structure.from_file("POSCAR")
struct.to(filename="POSCAR")

Key features:

  • Supports selective dynamics
  • Can include velocities (as in CONTCAR files from molecular dynamics runs)
  • Preserves lattice and coordinate precision
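
To make the file layout concrete, here is a minimal standard-library sketch that assembles VASP 5-style POSCAR text by hand. The `poscar_text` helper and the CsCl-like cell are illustrative assumptions with made-up numbers. In practice the Poscar class shown above handles this, along with selective dynamics and velocities.

```python
def poscar_text(comment, lattice, species_counts, frac_coords):
    """Build VASP 5-style POSCAR text with fractional ("Direct") coordinates.

    species_counts: ordered list of (symbol, count) pairs.
    frac_coords: coordinates listed in the same species order.
    """
    if sum(count for _, count in species_counts) != len(frac_coords):
        raise ValueError("species counts do not match the number of coordinates")
    lines = [comment, "1.0"]  # line 2 is the universal scaling factor
    lines += [" ".join(f"{x:.6f}" for x in vec) for vec in lattice]
    lines.append(" ".join(symbol for symbol, _ in species_counts))
    lines.append(" ".join(str(count) for _, count in species_counts))
    lines.append("Direct")
    lines += [" ".join(f"{x:.6f}" for x in xyz) for xyz in frac_coords]
    return "\n".join(lines) + "\n"


# Illustrative cubic cell; the lattice constant is a placeholder value.
text = poscar_text(
    "CsCl-type example",
    [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)],
    [("Cs", 1), ("Cl", 1)],
    [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
)
print(text)
```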

XYZ

Simple molecular coordinates format.

# For molecules
mol = Molecule.from_file("molecule.xyz")
mol.to(filename="output.xyz")

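# For structures, use the XYZ writer directly (Cartesian coordinates;
# the lattice is not stored in the XYZ format)
from pymatgen.io.xyz import XYZ
XYZ(struct).write_file("structure.xyz")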
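
The XYZ layout itself is simple: an atom count, a free-form comment line, then one "symbol x y z" line per atom. The sketch below writes and re-reads that layout with the standard library only. `write_xyz` and `parse_xyz` are hypothetical helpers for illustration and the water geometry is approximate. pymatgen's own XYZ class should be preferred for real work.

```python
def write_xyz(symbols, coords, comment=""):
    """Return XYZ text: atom count, comment line, then one line per atom."""
    lines = [str(len(symbols)), comment]
    lines += [
        f"{sym} {x:.6f} {y:.6f} {z:.6f}" for sym, (x, y, z) in zip(symbols, coords)
    ]
    return "\n".join(lines) + "\n"


def parse_xyz(text):
    """Parse single-frame XYZ text into (symbols, coords)."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    symbols, coords = [], []
    for line in lines[2 : 2 + n_atoms]:  # skip the count and comment lines
        sym, x, y, z = line.split()[:4]
        symbols.append(sym)
        coords.append((float(x), float(y), float(z)))
    return symbols, coords


# Approximate water geometry in angstroms, for illustration only.
water_symbols = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
xyz_text = write_xyz(water_symbols, water_coords, comment="water")
round_trip = parse_xyz(xyz_text)
print(xyz_text)
```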

PDB (Protein Data Bank)

Common format for biomolecules.

# PDB support goes through OpenBabel, which must be installed
mol = Molecule.from_file("protein.pdb")
mol.to(filename="output.pdb")

JSON/YAML

Serialization via dictionaries.

import json
import yaml

# JSON
with open("structure.json", "w") as f:
    json.dump(struct.as_dict(), f)

with open("structure.json", "r") as f:
    struct = Structure.from_dict(json.load(f))

# YAML
with open("structure.yaml", "w") as f:
    yaml.dump(struct.as_dict(), f)

with open("structure.yaml", "r") as f:
    struct = Structure.from_dict(yaml.safe_load(f))
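
The as_dict()/from_dict() pairing is what makes this round trip work: the object is reduced to plain lists, strings, and numbers, and rebuilt from them later. A minimal sketch of the same pattern with a toy class follows. `ToySite` is a hypothetical stand-in, not a pymatgen type.

```python
import json
from dataclasses import dataclass


@dataclass
class ToySite:
    """Hypothetical stand-in for a serializable object."""

    species: str
    frac_coords: tuple

    def as_dict(self):
        # Only JSON-friendly types (str, list, float) go into the dict.
        return {"species": self.species, "frac_coords": list(self.frac_coords)}

    @classmethod
    def from_dict(cls, data):
        return cls(data["species"], tuple(data["frac_coords"]))


site = ToySite("Fe", (0.0, 0.5, 0.5))
payload = json.dumps(site.as_dict())
restored = ToySite.from_dict(json.loads(payload))
print(payload)
```

Because the payload is plain text, it diffs cleanly under version control, which is one reason dictionary-based serialization suits long-term storage.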

Electronic Structure Code I/O

VASP

The most comprehensive integration in pymatgen.

Input Files

from pymatgen.io.vasp.inputs import Incar, Poscar, Potcar, Kpoints, VaspInput

# INCAR (calculation parameters)
incar = Incar.from_file("INCAR")
incar = Incar({"ENCUT": 520, "ISMEAR": 0, "SIGMA": 0.05})
incar.write_file("INCAR")

# KPOINTS (k-point mesh)
kpoints = Kpoints.gamma_automatic((20, 20, 20))  # 20x20x20 Gamma-centered mesh
kpoints = Kpoints.automatic_density(struct, 1000)  # By density (k-points per reciprocal atom)
kpoints.write_file("KPOINTS")

# POSCAR (structure)
poscar = Poscar(struct)

# POTCAR (pseudopotentials); requires a configured PMG_VASP_PSP_DIR
potcar = Potcar(["Fe_pv", "O"])  # POTCAR symbols, in POSCAR species order

# Complete input set
vasp_input = VaspInput(incar, kpoints, poscar, potcar)
vasp_input.write_input("./vasp_calc")
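
For orientation, the text these classes produce is simple. INCAR is a list of KEY = value lines, and a Gamma-centered KPOINTS file has five lines. The sketch below builds both with the standard library. `incar_text` and `gamma_kpoints_text` are hypothetical helpers, not part of pymatgen, and they cover only the simplest cases.

```python
def incar_text(params):
    """Render a dict as INCAR text, one 'KEY = value' line per entry."""

    def fmt(value):
        # VASP expects Fortran-style logicals.
        if isinstance(value, bool):
            return ".TRUE." if value else ".FALSE."
        return str(value)

    return "\n".join(f"{key.upper()} = {fmt(val)}" for key, val in params.items()) + "\n"


def gamma_kpoints_text(mesh, shift=(0, 0, 0), comment="Automatic mesh"):
    """Render a Gamma-centered automatic-mesh KPOINTS file."""
    return "\n".join(
        [
            comment,
            "0",  # 0 means the k-points are generated automatically
            "Gamma",
            " ".join(str(n) for n in mesh),
            " ".join(str(s) for s in shift),
        ]
    ) + "\n"


incar_str = incar_text({"ENCUT": 520, "ISMEAR": 0, "SIGMA": 0.05, "LWAVE": False})
kpoints_str = gamma_kpoints_text((4, 4, 4))
print(incar_str)
print(kpoints_str)
```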

Output Files

from pymatgen.io.vasp.outputs import Vasprun, Outcar, Oszicar, Eigenval

# vasprun.xml (comprehensive output)
vasprun = Vasprun("vasprun.xml")
final_structure = vasprun.final_structure
energy = vasprun.final_energy
band_structure = vasprun.get_band_structure()
dos = vasprun.complete_dos

# OUTCAR
outcar = Outcar("OUTCAR")
magnetization = outcar.total_mag
outcar.read_elastic_tensor()  # parsed on demand; stored in outcar.data
elastic_tensor = outcar.data["elastic_tensor"]

# OSZICAR (convergence information)
oszicar = Oszicar("OSZICAR")

Input Sets

Pymatgen provides pre-configured input sets for common calculations:

from pymatgen.io.vasp.sets import (
    MPRelaxSet,      # Materials Project relaxation
    MPStaticSet,     # Static calculation
    MPNonSCFSet,     # Non-self-consistent (band structure)
    MPSOCSet,        # Spin-orbit coupling
    MPHSERelaxSet,   # HSE06 hybrid functional
)

# Create input set
relax = MPRelaxSet(struct)
relax.write_input("./relax_calc")

# Customize parameters
static = MPStaticSet(struct, user_incar_settings={"ENCUT": 600})
static.write_input("./static_calc")
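
Conceptually, user_incar_settings is layered on top of the preset: keys the user supplies replace the preset's values, and everything else is kept. The sketch below shows that precedence with plain dictionaries. The default values are placeholders rather than the real Materials Project settings, and `merged_incar` is a hypothetical helper.

```python
# Placeholder defaults; NOT the actual Materials Project values.
DEFAULT_INCAR = {"ENCUT": 520, "ISMEAR": -5, "NSW": 0}


def merged_incar(defaults, user_incar_settings=None):
    """Return a new dict in which user-supplied keys replace preset keys."""
    merged = dict(defaults)  # copy so the preset itself is never mutated
    merged.update(user_incar_settings or {})
    return merged


settings = merged_incar(DEFAULT_INCAR, {"ENCUT": 600})
print(settings)
```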

Gaussian

Quantum chemistry package integration.

from pymatgen.io.gaussian import GaussianInput, GaussianOutput

# Input
gin = GaussianInput(
    mol,
    charge=0,
    spin_multiplicity=1,
    functional="B3LYP",
    basis_set="6-31G(d)",
    route_parameters={"Opt": None, "Freq": None}
)
gin.write_file("input.gjf")

# Output
gout = GaussianOutput("output.log")
final_mol = gout.final_structure
energy = gout.final_energy
frequencies = gout.frequencies

LAMMPS

Classical molecular dynamics.

from pymatgen.io.lammps.data import LammpsData
from pymatgen.io.lammps.inputs import LammpsInputFile

# Structure to LAMMPS data file
lammps_data = LammpsData.from_structure(struct)
lammps_data.write_file("data.lammps")

# LAMMPS input script
lammps_input = LammpsInputFile.from_file("in.lammps")

Quantum ESPRESSO

from pymatgen.io.pwscf import PWInput, PWOutput

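# Input
pwin = PWInput(
    struct,
    # Placeholder pseudopotential file names, one per element
    pseudo={el.symbol: f"{el.symbol}.UPF" for el in struct.composition.elements},
    control={"calculation": "scf"},
    system={"ecutwfc": 50, "ecutrho": 400},
    electrons={"conv_thr": 1e-8}
)
pwin.write_file("pw.in")

# Output (PWOutput parses energies and basic cell data, not the final structure)
pwout = PWOutput("pw.out")
energy = pwout.final_energy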

ABINIT

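from pymatgen.io.abinit.inputs import BasicAbinitInput

# pseudos: pseudopotential file paths, one per element
# (the full-featured AbinitInput class lives in the separate AbiPy package)
abin = BasicAbinitInput(struct, pseudos)
abin.set_vars(ecut=10, nband=10)
abin.write("abinit.in")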

CP2K

from pymatgen.io.cp2k.inputs import Cp2kInput
from pymatgen.io.cp2k.outputs import Cp2kOutput

# Input
cp2k_input = Cp2kInput.from_file("cp2k.inp")

# Output
cp2k_output = Cp2kOutput("cp2k.out")

FEFF (XAS/XANES)

from pymatgen.io.feff.sets import MPXANESSet

# Input set for a XANES calculation with Fe as the absorbing atom
feff_set = MPXANESSet("Fe", struct)
feff_set.write_input("./feff_calc")

LMTO (Stuttgart TB-LMTO-ASA)

from pymatgen.io.lmto import LMTOCtrl

ctrl = LMTOCtrl.from_file("CTRL")
structure = ctrl.structure

Q-Chem

from pymatgen.io.qchem.inputs import QCInput
from pymatgen.io.qchem.outputs import QCOutput

# Input
qc_input = QCInput(
    mol,
    rem={"method": "B3LYP", "basis": "6-31G*", "job_type": "opt"}
)
qc_input.write_file("mol.qin")

# Output
qc_output = QCOutput("mol.qout")

Exciting

from pymatgen.io.exciting import ExcitingInput

exc_input = ExcitingInput(struct)
# First argument is the cell type: "unchanged", "conventional", or "primitive"
exc_input.write_file("unchanged", "input.xml")

ATAT (Alloy Theoretic Automated Toolkit)

from pymatgen.io.atat import Mcsqs

mcsqs = Mcsqs(struct)

# Write the structure in the mcsqs input format
with open("rndstr.in", "w") as f:
    f.write(mcsqs.to_str())

Special Purpose Formats

Phonopy

from pymatgen.io.phonopy import get_phonopy_structure, get_pmg_structure

# Convert to phonopy structure
phonopy_struct = get_phonopy_structure(struct)

# Convert from phonopy
struct = get_pmg_structure(phonopy_struct)

ASE (Atomic Simulation Environment)

from pymatgen.io.ase import AseAtomsAdaptor

adaptor = AseAtomsAdaptor()

# Pymatgen to ASE
atoms = adaptor.get_atoms(struct)

# ASE to Pymatgen
struct = adaptor.get_structure(atoms)

Zeo++ (Porous Materials)

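from pymatgen.io.zeopp import get_voronoi_nodes, get_high_accuracy_voronoi_nodes

# Analyze pore structure (requires the Zeo++ Python bindings).
# Returns Voronoi nodes, edge centers, and face centers as structures.
vor_nodes, vor_edge_centers, vor_face_centers = get_voronoi_nodes(struct)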

BabelMolAdaptor (OpenBabel)

from pymatgen.io.babel import BabelMolAdaptor

adaptor = BabelMolAdaptor(mol)

# Convert to different formats
pdb_str = adaptor.pybel_mol.write("pdb")  # returns the contents as a string
adaptor.write_file("mol.sdf", file_format="sdf")  # writes to disk

# Generate 3D coordinates
adaptor.add_hydrogen()
adaptor.make3d()

Alchemy and Transformation I/O

TransformedStructure

Structures that track their transformation history.

from pymatgen.alchemy.materials import TransformedStructure
from pymatgen.transformations.standard_transformations import (
    SupercellTransformation,
    SubstitutionTransformation
)

# Create transformed structure
ts = TransformedStructure(struct, [])
ts.append_transformation(SupercellTransformation([[2,0,0],[0,2,0],[0,0,2]]))
ts.append_transformation(SubstitutionTransformation({"Fe": "Mn"}))

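# Write with history (output_dir must be passed by keyword;
# the first positional argument is the input set class)
ts.write_vasp_input(output_dir="./calc_dir")

# Read from SNL (Structure Notebook Language); snl is an existing StructureNL object
ts = TransformedStructure.from_snl(snl)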

Batch Operations

CifTransmuter

Process multiple CIF files.

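from pymatgen.alchemy.transmuters import CifTransmuter
from pymatgen.transformations.standard_transformations import SupercellTransformation

transmuter = CifTransmuter.from_filenames(
    ["structure1.cif", "structure2.cif"],
    [SupercellTransformation([[2,0,0],[0,2,0],[0,0,2]])]
)

# Write all structures (options are passed by keyword)
transmuter.write_vasp_input(output_dir="./batch_calc")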

PoscarTransmuter

Similar for POSCAR files.

from pymatgen.alchemy.transmuters import PoscarTransmuter

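# transformation1 and transformation2 are placeholders for transformation
# objects such as SupercellTransformation
transmuter = PoscarTransmuter.from_filenames(
    ["POSCAR1", "POSCAR2"],
    [transformation1, transformation2]
)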
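
The idea behind a transmuter, applying the same ordered list of transformations to every input while recording what was done, can be sketched without pymatgen. In this toy version a "structure" is just a list of element symbols. All names here are hypothetical and the functions only mimic the real transformations.

```python
def supercell_2x(symbols):
    """Toy stand-in for a supercell transformation: repeat the site list."""
    return symbols * 2


def fe_to_mn(symbols):
    """Toy stand-in for a substitution transformation."""
    return ["Mn" if s == "Fe" else s for s in symbols]


def transmute(inputs, transformations):
    """Apply each transformation, in order, to every input and keep a history."""
    results = []
    for name, symbols in inputs.items():
        history = []
        for transform in transformations:
            symbols = transform(symbols)
            history.append(transform.__name__)
        results.append({"source": name, "final": symbols, "history": history})
    return results


batch = transmute(
    {"POSCAR1": ["Fe", "O"], "POSCAR2": ["Fe", "Fe", "S"]},
    [supercell_2x, fe_to_mn],
)
for entry in batch:
    print(entry["source"], entry["final"], entry["history"])
```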

Best Practices

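  1. Automatic format detection: Use from_file() and to() whenever the file name identifies the format, and fall back to from_str() with an explicit fmt otherwise
  2. Error handling: Catch specific exceptions around file I/O (for example, a missing file or an unrecognized format) rather than using a bare except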
  3. Format-specific parsers: Use specialized parsers (e.g., Vasprun) for detailed output analysis
  4. Input sets: Prefer pre-configured input sets over manual parameter specification
  5. Serialization: Use JSON/YAML for long-term storage and version control
  6. Batch processing: Use transmuters for applying transformations to multiple structures
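
As one way to apply the error-handling advice when loading many files, the sketch below collects successes and failures separately instead of stopping at the first bad file. `load_many` is a hypothetical helper, and the choice of OSError and ValueError as the exceptions to catch is an assumption. Adjust it to whatever your loader actually raises.

```python
import logging

logger = logging.getLogger(__name__)


def load_many(paths, loader):
    """Call loader(path) for each path; return (loaded, failed) dicts."""
    loaded, failed = {}, {}
    for path in paths:
        try:
            loaded[path] = loader(path)
        except (OSError, ValueError) as exc:  # assumed exception types
            logger.warning("Skipping %s: %s", path, exc)
            failed[path] = exc
    return loaded, failed


def read_text(path):
    """Simple loader used for demonstration."""
    with open(path) as handle:
        return handle.read()


loaded, failed = load_many(["no_such_file_for_demo.cif"], read_text)
print(sorted(loaded), sorted(failed))
```

With pymatgen installed, the same helper could be called as load_many(paths, Structure.from_file).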

Supported Format Summary

Structure formats:

CIF, POSCAR/CONTCAR, XYZ, PDB, XSF, PWMAT, Res, CSSR, JSON, YAML

Electronic structure and simulation codes:

VASP, Gaussian, LAMMPS, Quantum ESPRESSO, ABINIT, CP2K, FEFF, Q-Chem, LMTO, Exciting, ATAT, NWChem, AIMS

Molecular formats:

XYZ natively; PDB, MOL, SDF, PQR, and many additional formats via OpenBabel

Special purpose:

Phonopy, ASE, Zeo++, Lobster, BoltzTraP