# Pymatgen I/O and File Format Reference

This reference documents pymatgen's input/output capabilities for reading and writing structural and computational data across the file formats used by common materials science and chemistry codes.

## General I/O Philosophy

Pymatgen provides a unified interface for file operations through the `from_file()` and `to()` methods, with automatic format detection based on the file name or extension.

### Reading Files

```python
from pymatgen.core import Structure, Molecule

# Automatic format detection
struct = Structure.from_file("POSCAR")
struct = Structure.from_file("structure.cif")
mol = Molecule.from_file("molecule.xyz")

# Explicit format specification: from_file() infers the format from the
# file name, so parse the text with from_str() when the name is uninformative
with open("file.txt") as f:
    struct = Structure.from_str(f.read(), fmt="cif")
```
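
The name-based detection described above can be pictured as a small pattern table. The sketch below is a simplified illustration using only the standard library; the `FORMAT_PATTERNS` table and the `infer_format` helper are hypothetical names, and the patterns are an assumption rather than pymatgen's actual list.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical pattern table for illustration; not pymatgen's actual list.
FORMAT_PATTERNS = [
    ("cif", "*.cif*"),
    ("poscar", "*poscar*"),
    ("poscar", "*contcar*"),
    ("json", "*.json*"),
    ("xyz", "*.xyz*"),
]


def infer_format(filename):
    """Return a format label for filename, or None if no pattern matches."""
    # Only the final path component matters, compared case-insensitively.
    name = Path(filename).name.lower()
    for fmt, pattern in FORMAT_PATTERNS:
        if fnmatch(name, pattern):
            return fmt
    return None
```

A name such as `file.txt` matches nothing in this table, which is the situation where an explicit `fmt` is needed.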

### Writing Files

```python
# Write to file (format inferred from the file name or extension)
struct.to(filename="output.cif")
struct.to(filename="POSCAR")
struct.to(filename="structure.json")

# Get string representation without writing
cif_string = struct.to(fmt="cif")
poscar_string = struct.to(fmt="poscar")
```

## Structure File Formats

### CIF (Crystallographic Information File)
Standard format for crystallographic data.

```python
from pymatgen.core import Structure
from pymatgen.io.cif import CifParser, CifWriter

# Reading
parser = CifParser("structure.cif")
# parse_structures() returns a list of structures; it replaces the
# deprecated get_structures() in recent pymatgen releases
structure = parser.parse_structures(primitive=True)[0]

# Writing
writer = CifWriter(structure)
writer.write_file("output.cif")

# Or using convenience methods
struct = Structure.from_file("structure.cif")
struct.to(filename="output.cif")
```

**Key features:**
- Supports symmetry information
- Can contain multiple structures
- Writes space group and symmetry operations when `CifWriter` is given a `symprec` tolerance
- Handles partial occupancies

### POSCAR/CONTCAR (VASP)
VASP's structure format.

```python
from pymatgen.core import Structure
from pymatgen.io.vasp import Poscar

# Reading
poscar = Poscar.from_file("POSCAR")
structure = poscar.structure

# Writing
poscar = Poscar(structure)
poscar.write_file("POSCAR")

# Or using convenience methods
struct = Structure.from_file("POSCAR")
struct.to(filename="POSCAR")
```

**Key features:**
- Supports selective dynamics
- Can include velocities (as written to CONTCAR by molecular dynamics runs)
- Preserves lattice and coordinate precision

### XYZ
Simple molecular coordinates format.

```python
from pymatgen.core import Molecule
from pymatgen.io.xyz import XYZ

# For molecules
mol = Molecule.from_file("molecule.xyz")
mol.to(filename="output.xyz")

# For structures (Cartesian coordinates): Structure.to() does not infer
# XYZ from the file name, so use the XYZ writer directly
XYZ(struct).write_file("structure.xyz")
```
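
The XYZ layout itself is simple enough to sketch with the standard library: an atom count, a comment line, then one `symbol x y z` row per atom. The `write_xyz` and `read_xyz` helpers below are hypothetical illustrations of the format, not pymatgen functions.

```python
def write_xyz(symbols, coords, comment=""):
    """Serialize atoms to XYZ text: count line, comment line, one atom per line."""
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines)


def read_xyz(text):
    """Parse XYZ text back into (symbols, coords)."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    symbols, coords = [], []
    # Atom rows start after the count line and the comment line.
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return symbols, coords


water_symbols = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
xyz_text = write_xyz(water_symbols, water_coords, comment="water")
```

Because the format stores only Cartesian positions, lattice information is lost when a periodic structure is written this way.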

### PDB (Protein Data Bank)
Common format for biomolecules. Pymatgen reads and writes PDB through OpenBabel, so the `openbabel` package must be installed.

```python
from pymatgen.core import Molecule

mol = Molecule.from_file("protein.pdb")
mol.to(filename="output.pdb")
```

### JSON/YAML
Serialization via dictionaries.

```python
import json
import yaml

from pymatgen.core import Structure

# JSON
with open("structure.json", "w") as f:
    json.dump(struct.as_dict(), f)

with open("structure.json", "r") as f:
    struct = Structure.from_dict(json.load(f))

# YAML
with open("structure.yaml", "w") as f:
    yaml.dump(struct.as_dict(), f)

with open("structure.yaml", "r") as f:
    struct = Structure.from_dict(yaml.safe_load(f))
```

## Electronic Structure Code I/O

### VASP

The most comprehensive integration in pymatgen.

#### Input Files

```python
from pymatgen.io.vasp.inputs import Incar, Poscar, Potcar, Kpoints, VaspInput

# INCAR (calculation parameters)
incar = Incar.from_file("INCAR")
incar = Incar({"ENCUT": 520, "ISMEAR": 0, "SIGMA": 0.05})
incar.write_file("INCAR")

# KPOINTS (k-point mesh)
kpoints = Kpoints.gamma_automatic((20, 20, 20))  # 20x20x20 Gamma-centered mesh
kpoints = Kpoints.automatic_density(struct, 1000)  # ~1000 k-points per reciprocal atom
kpoints.write_file("KPOINTS")

# POSCAR (structure)
poscar = Poscar(struct)

# POTCAR (pseudopotentials); needs PMG_VASP_PSP_DIR set in the pymatgen config
potcar = Potcar(["Fe_pv", "O"])  # One POTCAR symbol per species, in POSCAR order

# Complete input set
vasp_input = VaspInput(incar, kpoints, poscar, potcar)
vasp_input.write_input("./vasp_calc")
```
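
The idea behind a density-based mesh can be sketched with plain arithmetic: choose subdivisions so that the number of k-points times the number of atoms is close to the requested density, with fewer subdivisions along longer cell axes. The `density_to_mesh` helper below is a hypothetical illustration of that idea and is not guaranteed to match what `Kpoints.automatic_density` does in every case.

```python
import math


def density_to_mesh(lengths, n_atoms, kppa):
    """Pick mesh subdivisions so that n_kpoints * n_atoms is close to kppa."""
    # Target number of k-points for this cell.
    n_grid = kppa / n_atoms
    # Scale factor chosen so the product of (mult / length) equals n_grid.
    a, b, c = lengths
    mult = (n_grid * a * b * c) ** (1 / 3)
    # Longer real-space axes get fewer subdivisions; never go below 1.
    return tuple(max(1, math.floor(mult / length)) for length in lengths)


# Cubic 4 Angstrom cell with 2 atoms at 1000 k-points per reciprocal atom.
cubic_mesh = density_to_mesh(lengths=(4.0, 4.0, 4.0), n_atoms=2, kppa=1000)
# Same density for a cell elongated along c.
slab_mesh = density_to_mesh(lengths=(4.0, 4.0, 20.0), n_atoms=2, kppa=1000)
```

Rounding down keeps the actual density at or below the requested value, which is why converged calculations usually test a few densities.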

#### Output Files

```python
from pymatgen.io.vasp.outputs import Vasprun, Outcar, Oszicar, Eigenval

# vasprun.xml (comprehensive output)
vasprun = Vasprun("vasprun.xml")
final_structure = vasprun.final_structure
energy = vasprun.final_energy
band_structure = vasprun.get_band_structure()
dos = vasprun.complete_dos

# OUTCAR
outcar = Outcar("OUTCAR")
magnetization = outcar.total_mag
outcar.read_elastic_tensor()  # Parses the tensor into outcar.data
elastic_tensor = outcar.data["elastic_tensor"]

# OSZICAR (convergence information)
oszicar = Oszicar("OSZICAR")
```

#### Input Sets

Pymatgen provides pre-configured input sets for common calculations:

```python
from pymatgen.io.vasp.sets import (
    MPRelaxSet,     # Materials Project relaxation
    MPStaticSet,    # Static calculation
    MPNonSCFSet,    # Non-self-consistent (band structure)
    MPSOCSet,       # Spin-orbit coupling
    MPHSERelaxSet,  # HSE06 hybrid functional
)

# Create input set
relax = MPRelaxSet(struct)
relax.write_input("./relax_calc")

# Customize parameters
static = MPStaticSet(struct, user_incar_settings={"ENCUT": 600})
static.write_input("./static_calc")
```

### Gaussian

Quantum chemistry package integration.

```python
from pymatgen.io.gaussian import GaussianInput, GaussianOutput

# Input
gin = GaussianInput(
    mol,
    charge=0,
    spin_multiplicity=1,
    functional="B3LYP",
    basis_set="6-31G(d)",
    route_parameters={"Opt": None, "Freq": None},
)
gin.write_file("input.gjf")

# Output
gout = GaussianOutput("output.log")
final_mol = gout.final_structure
energy = gout.final_energy
frequencies = gout.frequencies
```

### LAMMPS

Classical molecular dynamics.

```python
from pymatgen.io.lammps.data import LammpsData
from pymatgen.io.lammps.inputs import LammpsInputFile

# Structure to LAMMPS data file
lammps_data = LammpsData.from_structure(struct)
lammps_data.write_file("data.lammps")

# LAMMPS input script
lammps_input = LammpsInputFile.from_file("in.lammps")
```

### Quantum ESPRESSO

```python
from pymatgen.io.pwscf import PWInput, PWOutput

# Input
pwin = PWInput(
    struct,
    # One pseudopotential file per element in struct (placeholder file names)
    pseudo={"Fe": "Fe.UPF", "O": "O.UPF"},
    control={"calculation": "scf"},
    system={"ecutwfc": 50, "ecutrho": 400},
    electrons={"conv_thr": 1e-8},
)
pwin.write_file("pw.in")

# Output (PWOutput parses energies and cell parameters, not a final structure)
pwout = PWOutput("pw.out")
energy = pwout.final_energy
```

### ABINIT

```python
# pymatgen ships BasicAbinitInput; the fuller AbinitInput class lives in AbiPy
from pymatgen.io.abinit.inputs import BasicAbinitInput

# pseudos: pseudopotential file paths, one per element in struct
abin = BasicAbinitInput(struct, pseudos)
abin.set_vars(ecut=10, nband=10)
abin.write("abinit.in")
```

### CP2K

```python
from pymatgen.io.cp2k.inputs import Cp2kInput
from pymatgen.io.cp2k.outputs import Cp2kOutput

# Input
cp2k_input = Cp2kInput.from_file("cp2k.inp")

# Output
cp2k_output = Cp2kOutput("cp2k.out")
```

### FEFF (XAS/XANES)

```python
from pymatgen.io.feff.sets import MPXANESSet

# Input set for a XANES calculation with Fe as the absorbing atom
feff_set = MPXANESSet(absorbing_atom="Fe", structure=struct)
feff_set.write_input("./feff_calc")
```

### LMTO (Stuttgart TB-LMTO-ASA)

```python
from pymatgen.io.lmto import LMTOCtrl

ctrl = LMTOCtrl.from_file("CTRL")
structure = ctrl.structure
```

### Q-Chem

```python
from pymatgen.io.qchem.inputs import QCInput
from pymatgen.io.qchem.outputs import QCOutput

# Input
qc_input = QCInput(
    mol,
    rem={"method": "B3LYP", "basis": "6-31G*", "job_type": "opt"},
)
qc_input.write_file("mol.qin")

# Output
qc_output = QCOutput("mol.qout")
```

### Exciting

```python
from pymatgen.io.exciting import ExcitingInput

exc_input = ExcitingInput(struct)
# celltype is "unchanged", "conventional", or "primitive"
exc_input.write_file(celltype="unchanged", filename="input.xml", cartesian=False, bandstr=False)
```

### ATAT (Alloy Theoretic Automated Toolkit)

```python
from pymatgen.io.atat import Mcsqs

# Mcsqs serializes a structure to the mcsqs rndstr.in text format
mcsqs = Mcsqs(struct)
with open("rndstr.in", "w") as f:
    f.write(mcsqs.to_str())
```

## Special Purpose Formats

### Phonopy

```python
from pymatgen.io.phonopy import get_phonopy_structure, get_pmg_structure

# Convert to phonopy structure
phonopy_struct = get_phonopy_structure(struct)

# Convert from phonopy
struct = get_pmg_structure(phonopy_struct)
```

### ASE (Atomic Simulation Environment)

```python
from pymatgen.io.ase import AseAtomsAdaptor

adaptor = AseAtomsAdaptor()

# Pymatgen to ASE
atoms = adaptor.get_atoms(struct)

# ASE to Pymatgen
struct = adaptor.get_structure(atoms)
```

### Zeo++ (Porous Materials)

```python
from pymatgen.io.zeopp import get_voronoi_nodes, get_high_accuracy_voronoi_nodes

# Analyze pore structure (requires the Zeo++ Python bindings)
vor_nodes, vor_edge_centers, vor_face_centers = get_voronoi_nodes(struct)
```

### BabelMolAdaptor (OpenBabel)

```python
from pymatgen.io.babel import BabelMolAdaptor

adaptor = BabelMolAdaptor(mol)

# Convert to different formats
pdb_str = adaptor.pybel_mol.write("pdb")  # Returns the PDB text
adaptor.write_file("mol.sdf", file_format="sdf")  # Writes to disk, returns None

# Add hydrogens and generate 3D coordinates
adaptor.add_hydrogen()
adaptor.make3d()
```

## Alchemy and Transformation I/O

### TransformedStructure

Structures that track their transformation history.

```python
from pymatgen.alchemy.materials import TransformedStructure
from pymatgen.transformations.standard_transformations import (
    SupercellTransformation,
    SubstitutionTransformation,
)

# Create transformed structure
ts = TransformedStructure(struct, [])
ts.append_transformation(SupercellTransformation([[2, 0, 0], [0, 2, 0], [0, 0, 2]]))
ts.append_transformation(SubstitutionTransformation({"Fe": "Mn"}))

# Write with history (the first positional argument is the input set class,
# so pass the directory by keyword)
ts.write_vasp_input(output_dir="./calc_dir")

# Read from SNL (Structure Notation Language); snl is an existing StructureNL object
ts = TransformedStructure.from_snl(snl)
```

## Batch Operations

### CifTransmuter

Process multiple CIF files.

```python
from pymatgen.alchemy.transmuters import CifTransmuter
from pymatgen.transformations.standard_transformations import SupercellTransformation

transmuter = CifTransmuter.from_filenames(
    ["structure1.cif", "structure2.cif"],
    [SupercellTransformation([[2, 0, 0], [0, 2, 0], [0, 0, 2]])],
)

# Write all structures (options are passed by keyword)
transmuter.write_vasp_input(output_dir="./batch_calc")
```

### PoscarTransmuter

The same workflow for POSCAR files.

```python
from pymatgen.alchemy.transmuters import PoscarTransmuter

# transformation1 and transformation2 are transformation objects,
# e.g. SupercellTransformation or SubstitutionTransformation instances
transmuter = PoscarTransmuter.from_filenames(
    ["POSCAR1", "POSCAR2"],
    [transformation1, transformation2],
)
```

## Best Practices

1. **Automatic format detection**: Use `from_file()` and `to()` methods whenever possible
2. **Error handling**: Wrap file I/O in try-except blocks so that a missing or malformed file is reported instead of aborting the run
3. **Format-specific parsers**: Use specialized parsers (e.g., `Vasprun`) for detailed output analysis
4. **Input sets**: Prefer pre-configured input sets over manual parameter specification
5. **Serialization**: Use JSON/YAML for long-term storage and version control
6. **Batch processing**: Use transmuters for applying transformations to multiple structures
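
The error-handling advice above can be sketched as a small batch loader that records failures instead of stopping at the first bad file. The `load_all` helper and the `fake_loader` stand-in are hypothetical names, and the choice of `OSError` and `ValueError` as the caught exceptions is an assumption that may need widening for real parsers.

```python
def load_all(paths, loader):
    """Apply loader to each path; collect results and failures separately."""
    loaded, failed = {}, {}
    for path in paths:
        try:
            loaded[path] = loader(path)
        except (OSError, ValueError) as exc:
            # OSError covers missing or unreadable files; ValueError covers
            # unrecognized or malformed content in this sketch.
            failed[path] = exc
    return loaded, failed


def fake_loader(path):
    """Stand-in parser for the demo: only accepts names ending in .cif."""
    if not path.endswith(".cif"):
        raise ValueError(f"cannot infer format from {path!r}")
    return f"parsed:{path}"


loaded, failed = load_all(["a.cif", "notes.txt"], fake_loader)
```

In practice `Structure.from_file` would be passed as `loader`, and the `failed` mapping can be logged or reviewed after the batch finishes.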

## Supported Format Summary

### Structure formats:
CIF, POSCAR/CONTCAR, XYZ, PDB, XSF, PWMAT, Res, CSSR, JSON, YAML

### Electronic structure codes:
VASP, Gaussian, LAMMPS, Quantum ESPRESSO, ABINIT, CP2K, FEFF, Q-Chem, LMTO, Exciting, NWChem, AIMS

### Molecular formats:
XYZ, PDB, MOL, SDF, PQR, and many additional formats via OpenBabel

### Special purpose:
Phonopy, ASE, Zeo++, Lobster, BoltzTraP