2.4. Data file formats (input and output)
This section gives an overview of the file formats supported by HORTON. Some formats can be used for input and output, others only for input or for output. The formats are always used in the same way:
To load data from a file, use the from_file() method of the IOData class:
    mol = IOData.from_file('example.xyz')
The format is recognized through the file extension (or in some cases by a prefix, as indicated in the following sections). The loaded data are accessible as attributes of the mol object, e.g.:
    print mol.coordinates
Each file format has its corresponding set of attributes that are filled with data read from the file. For some formats, the available attributes may also depend on the data available in the file.
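The recognition rule described above (file extension, or a prefix of the file name for some formats) can be illustrated with a small, self-contained sketch. This is not HORTON's actual implementation: the function name guess_format and the returned labels are hypothetical, and only the extensions and prefixes listed in this section are covered.

```python
import os


def guess_format(filename):
    """Return a format label for ``filename`` (hypothetical helper, not HORTON API)."""
    name = os.path.basename(filename)
    # Formats that are recognized by (part of) the file name.
    if name.startswith('POSCAR'):
        return 'poscar'
    if name.startswith('CHGCAR'):
        return 'chgcar'
    if name.startswith('LOCPOT'):
        return 'locpot'
    if 'FCIDUMP' in name:
        return 'fcidump'
    # All remaining formats are recognized by the file extension.
    extension = os.path.splitext(name)[1]
    known = ['.xyz', '.cif', '.cube', '.fchk', '.molden', '.mkl', '.wfn', '.log', '.h5']
    if extension in known:
        return extension[1:]
    raise ValueError('Unknown file format: %s' % filename)
```

For example, guess_format('example.xyz') gives 'xyz', while a file named POSCAR_relaxed is matched by its prefix and gives 'poscar'.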
To dump data into a file, create an IOData instance, assign attributes to this instance and call the to_file() method, e.g.:
    mol = IOData(title='Example')
    mol.numbers = np.array([10])
    mol.coordinates = np.array([[0.0, 0.0, 0.0]])
    mol.to_file('example.xyz')
As shown in the above example, there are two ways to set the attributes: (i) by passing them as arguments to the constructor of the IOData class (first line) or (ii) by setting the attributes after creating an IOData instance (second and third line). Again, the file format is deduced from the file name. If not all required attributes for a given format are set, the to_file method will raise an AttributeError.
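The difference between required and optional attributes can be sketched without HORTON itself. The class MiniData and the function format_xyz below are hypothetical stand-ins, not part of the HORTON API. To stay short, the sketch writes atomic numbers rather than element symbols and performs no unit conversion, so it only illustrates the attribute handling, not a complete XYZ writer.

```python
class MiniData(object):
    """Hypothetical stand-in for IOData: keeps any keyword as an attribute."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


def format_xyz(data):
    """Return XYZ-like text for ``data`` (hypothetical helper)."""
    # Required attributes: plain access raises AttributeError when one is missing.
    numbers = data.numbers
    coordinates = data.coordinates
    # Optional attribute: fall back to a default when it is not set.
    title = getattr(data, 'title', 'Untitled')
    lines = [str(len(numbers)), title]
    for number, (x, y, z) in zip(numbers, coordinates):
        lines.append('%3i %15.10f %15.10f %15.10f' % (number, x, y, z))
    return '\n'.join(lines) + '\n'
```

Calling format_xyz on an object that has numbers but no coordinates raises an AttributeError, whereas a missing title is silently replaced by the default.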
The complete list of all possible attributes (the superset for all supported formats) is documented here: horton.io.iodata.IOData. Note that HORTON’s internal format supports all of these and any other attribute that you assign to an IOData instance.
2.4.1. Molecular geometry file formats
2.4.1.1. The .xyz format
Load | Yes |
Dump | Yes |
Recognized by | File extension .xyz |
Interoperation | Nearly all molecular simulation codes and Open Babel |
Always loading | title numbers coordinates |
Derived when loading | natom pseudo_numbers |
Required for dumping | numbers coordinates |
Optional for dumping | title |
2.4.1.2. The POSCAR format
Load | Yes |
Dump | Yes |
Recognized by | File prefix POSCAR |
Interoperation | VASP 5.X, VESTA |
Always loading | title numbers coordinates cell |
Derived when loading | natom pseudo_numbers |
Required for dumping | numbers coordinates cell |
Optional for dumping | title |
2.4.1.3. The .cif (Crystallographic Information File) format
Load | Works only for simple files |
Dump | Yes, except for symmetry information |
Recognized by | File extension .cif |
Interoperation | CCDC, COD, ... |
Always loading | title numbers coordinates cell symmetry links |
Derived when loading | natom pseudo_numbers |
Required for dumping | numbers coordinates cell |
Optional for dumping | title |
2.4.2. Cube file formats
2.4.2.1. The Gaussian .cube format
Load | Yes |
Dump | Yes |
Recognized by | File extension .cube |
Interoperation | Gaussian, CP2K, GPAW, Q-Chem (http://www.q-chem.com/), ... |
Always loading | title numbers pseudo_numbers coordinates cell grid cube_data |
Derived when loading | natom |
Required for dumping | numbers coordinates cell grid cube_data |
Optional for dumping | title pseudo_numbers |
Note
The second column in the geometry specification of the cube file is used for the pseudo-numbers.
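As an illustration of the note above, an atom line in the geometry block of a cube file holds the atomic number, a second real-valued column, and the three Cartesian coordinates. The helper name parse_cube_atom_line is hypothetical and the sketch is not HORTON's reader; it only shows which column ends up in which attribute.

```python
def parse_cube_atom_line(line):
    """Split one atom line of a cube file (hypothetical helper)."""
    words = line.split()
    number = int(words[0])            # first column: atomic number
    pseudo_number = float(words[1])   # second column: used as pseudo-number
    coordinate = tuple(float(word) for word in words[2:5])
    return number, pseudo_number, coordinate
```

For a line such as `8  6.000000  0.000000  0.000000  0.224310`, the sketch returns the atomic number 8 and the pseudo-number 6.0, which would be the case when the generating program writes an effective core charge in the second column.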
2.4.2.2. The VASP CHGCAR and LOCPOT formats
Load | Yes |
Dump | No |
Recognized by | File prefix CHGCAR or LOCPOT |
Interoperation | VASP 5.X, VESTA |
Always loading | title coordinates numbers cell grid cube_data |
Derived when loading | natom pseudo_numbers |
Note
Even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.
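The following sketch shows why the two conversions differ. It assumes the conventions usually described for VASP, namely that CHGCAR stores the electron density multiplied by the cell volume (with the volume in cubic ångström) and that LOCPOT stores a potential in electronvolt. These assumptions, the function names and the constants are not taken from this manual, so check them against the VASP documentation before relying on them.

```python
# CODATA 2018 values, truncated (assumed constants for this sketch).
BOHR_IN_ANGSTROM = 0.529177210903
HARTREE_IN_EV = 27.211386245988


def chgcar_to_au(value, volume_angstrom3):
    """Convert a CHGCAR grid value to a density in electrons per cubic bohr."""
    density_per_angstrom3 = value / volume_angstrom3
    # One cubic bohr is BOHR_IN_ANGSTROM**3 cubic angstrom.
    return density_per_angstrom3 * BOHR_IN_ANGSTROM**3


def locpot_to_au(value_ev):
    """Convert a LOCPOT grid value from electronvolt to hartree."""
    return value_ev / HARTREE_IN_EV
```

The first conversion needs the cell volume, the second does not, which is why the two formats cannot share a single conversion factor even though their layout is nearly identical.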
2.4.3. Wavefunction formats (using a Gaussian basis set)
All wavefunction formats share the following behavior:
In case of a restricted wavefunction, only the alpha orbitals are loaded.
In case of an unrestricted wavefunction, both the alpha and beta orbitals are loaded.
Some formats also load a permutation and/or a signs attribute. These are generated when loading the file, such that appropriate permutations and sign changes can be applied to convert to the proper HORTON conventions for Gaussian basis functions. These corrections are applied in the from_file method. This also allows you to fix the order of elements in arrays loaded from another file. For example, you can load an .fchk and a .log file at the same time:
    mol = IOData.from_file('foo.fchk', 'foo.log')
In this case, permutation is deduced from the file foo.fchk but is also applied to reorder the matrix elements loaded from foo.log, for the sake of consistency.
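What applying a permutation and sign changes to a matrix of integrals means can be sketched in plain Python. The function apply_conventions is hypothetical, and the index convention used here (element i of the new basis is element permutation[i] of the old basis, multiplied by signs[i]) is an assumption made for illustration; HORTON's own convention may differ.

```python
def apply_conventions(matrix, permutation=None, signs=None):
    """Reorder and re-sign a square matrix given as nested lists (hypothetical helper)."""
    size = len(matrix)
    if permutation is None:
        permutation = list(range(size))
    if signs is None:
        signs = [1] * size
    # Rows and columns are treated identically, so a symmetric matrix stays symmetric.
    return [
        [signs[i] * signs[j] * matrix[permutation[i]][permutation[j]]
         for j in range(size)]
        for i in range(size)
    ]
```

Because the same permutation and signs are used for every matrix, integrals read from a second file end up in the same basis-function order as the orbitals read from the first one.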
2.4.3.1. The Gaussian .fchk format
Load | Yes |
Dump | No |
Recognized by | File extension .fchk |
Interoperation | Gaussian |
Always loading | title coordinates numbers obasis exp_alpha permutation energy pseudo_numbers mulliken_charges |
loading if present | npa_charges esp_charges exp_beta dm_full_mp2 dm_spin_mp2 dm_full_mp3 dm_spin_mp3 dm_full_cc dm_spin_cc dm_full_ci dm_spin_ci dm_full_scf dm_spin_scf |
Derived when loading | natom |
2.4.3.2. The .molden format
Load | Yes |
Dump | Yes |
Recognized by | File extension .molden |
Interoperation | Molpro, Orca, PSI4, Molden |
Always loading | coordinates numbers obasis exp_alpha signs |
loading if present | title exp_beta |
Derived when loading | natom |
Required for dumping | coordinates numbers obasis exp_alpha |
Optional for dumping | title exp_beta |
2.4.3.3. The .mkl (Molekel) format
Load | Yes |
Dump | No |
Recognized by | File extension .mkl |
Interoperation | Molekel, Orca |
Always loading | coordinates numbers obasis exp_alpha |
loading if present | exp_beta signs |
Derived when loading | natom |
2.4.3.4. The .wfn format
Load | Yes |
Dump | No |
Recognized by | File extension .wfn |
Interoperation | GAMESS, Gaussian |
Always loading | title coordinates numbers obasis exp_alpha |
loading if present | exp_beta |
Derived when loading | natom |
Note
Only use this format if the program that generated it does not offer any alternatives that HORTON can load. The WFN format has the disadvantage that it cannot represent contractions and therefore expands all orbitals into a decontracted basis. This makes the post-processing less efficient compared to formats that do support contractions of Gaussian functions.
2.4.4. Hamiltonian file formats
2.4.4.1. The Molpro 2012 FCIDUMP format
Load | Yes |
Dump | Yes |
Recognized by | File name contains FCIDUMP |
Interoperation | Molpro, PSI4 |
Always loading | lf nelec ms2 one_mo two_mo core_energy |
Required for dumping | one_mo two_mo |
Optional for dumping | core_energy nelec ms2 |
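To relate the attributes above to the file contents: after its header, an FCIDUMP file lists one integral per line as a value followed by four one-based orbital indices. The sketch below classifies such a line by its index pattern. The function name classify_fcidump_line and the returned labels are hypothetical, the header is not parsed, and index patterns outside the three common cases are simply reported as 'other'.

```python
def classify_fcidump_line(line):
    """Classify one integral line of an FCIDUMP file (hypothetical helper)."""
    words = line.split()
    value = float(words[0])
    i, j, k, l = (int(word) for word in words[1:5])
    if i == 0 and j == 0 and k == 0 and l == 0:
        kind = 'core_energy'   # all indices zero
    elif i > 0 and j > 0 and k == 0 and l == 0:
        kind = 'one_mo'        # two non-zero indices: one-electron integral
    elif i > 0 and j > 0 and k > 0 and l > 0:
        kind = 'two_mo'        # four non-zero indices: two-electron integral
    else:
        kind = 'other'         # e.g. orbital energies written by some programs
    return kind, (i, j, k, l), value
```

A line such as `0.5  1  1  1  1` is then reported as a two-electron integral, and `-1.25  1  1  0  0` as a one-electron integral.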
2.4.4.2. The Gaussian .log file
Load | Yes |
Dump | No |
Recognized by | File extension .log |
Interoperation | Gaussian |
loading if present | olp kin na er |
In order to let Gaussian print out all the matrix elements (Gaussian integrals), the following commands must be used in the Gaussian input file:
scf(conventional) iop(3/33=5) extralinks=l316 iop(3/27=999)
Keep in mind that this feature in Gaussian only works for a small number of basis functions. The FCIDUMP files generated with Molpro or PSI4 are more reliable and also have the advantage that all integrals are stored in double precision.
2.4.5. HORTON’s internal file format
The internal HDF5-based format of HORTON is effectively a superset of all formats listed above. Moreover, the user is free to store any additional data not covered by the file formats above. Many (not all) Python data types can be dumped into the internal format:
- int
- float
- str
- Any NumPy array
- Classes in the HORTON library that have a to_hdf5 and from_hdf5 method. For example: AtomicGridSpec, BeckeMolGrid, Cell, CubicSpline, ESPCost, GBasis, GOBasis, Symmetry, UniformGrid and all classes in the package horton.matrix
- A dictionary with strings as keys and any mixture of the above data types as values.
Load | Yes |
Dump | Yes |
Recognized by | File extension .h5 |
Interoperation | Custom scripts. Archiving of data generated with any other code. |
loading when present | Any attribute |
Optional for dumping | Any attribute with the right type |