2.4. Data file formats (input and output)

This section gives an overview of the file formats supported by HORTON. Some formats can be used for input and output, others only for input or for output. The formats are always used in the same way:

  • To load data from a file, you use the from_file() method of the IOData class:

    from horton import IOData
    mol = IOData.from_file('example.xyz')
    

    The format is recognized through the file extension (or in some cases by a prefix, as indicated in the following sections). The loaded data are accessible as attributes of the mol object, e.g.:

    print mol.coordinates
    

    Each file format has its corresponding set of attributes that are filled with data read from the file. For some formats, the available attributes may also depend on the data available in the file.

  • To dump data into a file, you create an IOData instance, assign attributes to this instance and call the to_file() method, e.g.:

    import numpy as np
    from horton import IOData

    mol = IOData(title='Example')
    mol.numbers = np.array([10])  # a single neon atom
    mol.coordinates = np.array([[0.0, 0.0, 0.0]])
    mol.to_file('example.xyz')
    

    As shown in the above example, there are two ways to set the attributes: (i) by passing them as arguments to the constructor of the IOData class (as done for title) or (ii) by setting them after creating an IOData instance (as done for numbers and coordinates). Again, the file format is deduced from the file name. If not all required attributes for a given format are set, the to_file() method will raise an AttributeError.

The complete list of all possible attributes (the superset for all supported formats) is documented here: horton.io.iodata.IOData. Note that HORTON’s internal format supports all of these and any other attribute that you assign to an IOData instance.
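
The recognition rules listed in the following sections (file extension, file-name prefix, or a name containing FCIDUMP) can be summarized in a few lines of Python. The sketch below is only an illustration assembled from the "Recognized by" rows of this section. The function guess_format, the two lookup tables, the format labels and the order in which the rules are tried are all hypothetical and are not taken from the HORTON source code.

```python
import os

# Hypothetical lookup tables built from the "Recognized by" rows in this
# section; they are not copied from the HORTON source code.
EXTENSIONS = {
    '.xyz': 'xyz', '.cif': 'cif', '.cube': 'cube', '.fchk': 'fchk',
    '.molden': 'molden', '.mkl': 'mkl', '.wfn': 'wfn', '.log': 'gaussian-log',
    '.h5': 'internal',
}
PREFIXES = {'POSCAR': 'poscar', 'CHGCAR': 'chgcar', 'LOCPOT': 'locpot'}


def guess_format(filename):
    """Return a format label for filename or raise ValueError (hypothetical)."""
    basename = os.path.basename(filename)
    # Formats that are recognized by a fixed prefix of the file name.
    for prefix, label in sorted(PREFIXES.items()):
        if basename.startswith(prefix):
            return label
    # FCIDUMP files are recognized when the name contains 'FCIDUMP'.
    if 'FCIDUMP' in basename:
        return 'fcidump'
    # All remaining formats are recognized by the file extension.
    extension = os.path.splitext(basename)[1]
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    raise ValueError('Unknown file format: {0}'.format(filename))
```

With these assumptions, guess_format('example.xyz') gives 'xyz' and guess_format('/tmp/POSCAR_bulk') gives 'poscar', while an unrecognized name such as 'foo.bar' raises a ValueError.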

2.4.1. Molecular geometry file formats

2.4.1.1. The .xyz format

Load Yes
Dump Yes
Recognized by File extension .xyz
Interoperation Nearly all molecular simulation codes and Open Babel
Always loading title numbers coordinates
Derived when loading natom pseudo_numbers
Required for dumping numbers coordinates
Optional for dumping title
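
To make the table above more concrete, the following sketch writes and reads the conventional .xyz layout: a line with the number of atoms, a title line, and one line per atom with an element symbol and three Cartesian coordinates. The helpers format_xyz and parse_xyz are hypothetical and are not HORTON functions. They work with element symbols rather than the numbers attribute and leave out any unit conversion (coordinates in .xyz files are conventionally given in ångström).

```python
def format_xyz(symbols, coordinates, title=''):
    """Return the text of an .xyz file (hypothetical helper, not part of HORTON)."""
    # First line: number of atoms. Second line: free-form title.
    lines = [str(len(symbols)), title]
    # One line per atom: element symbol followed by x, y and z.
    for symbol, (x, y, z) in zip(symbols, coordinates):
        lines.append('{0:2s} {1:15.10f} {2:15.10f} {3:15.10f}'.format(symbol, x, y, z))
    return '\n'.join(lines) + '\n'


def parse_xyz(text):
    """Return (title, symbols, coordinates) parsed from the text of an .xyz file."""
    lines = text.splitlines()
    natom = int(lines[0])
    title = lines[1].strip()
    symbols = []
    coordinates = []
    # Only the natom lines after the title are interpreted as atoms.
    for line in lines[2:2 + natom]:
        words = line.split()
        symbols.append(words[0])
        coordinates.append([float(word) for word in words[1:4]])
    return title, symbols, coordinates
```

For example, parse_xyz(format_xyz(['Ne'], [[0.0, 0.0, 0.0]], title='Example')) recovers the title, the single symbol and its coordinates.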

2.4.1.2. The POSCAR format

Load Yes
Dump Yes
Recognized by File prefix POSCAR
Interoperation VASP 5.X, VESTA
Always loading title numbers coordinates cell
Derived when loading natom pseudo_numbers
Required for dumping numbers coordinates cell
Optional for dumping title

2.4.1.3. The .cif (Crystallographic Information File) format

Load Works only for simple files
Dump Yes, except for symmetry information
Recognized by File extension .cif
Interoperation CCDC, COD, ...
Always loading title numbers coordinates cell symmetry links
Derived when loading natom pseudo_numbers
Required for dumping numbers coordinates cell
Optional for dumping title

2.4.2. Cube file formats

2.4.2.1. The Gaussian .cube format

Load Yes
Dump Yes
Recognized by File extension .cube
Interoperation Gaussian, CP2K, GPAW, Q-Chem (http://www.q-chem.com/), ...
Always loading title numbers pseudo_numbers coordinates cell grid cube_data
Derived when loading natom
Required for dumping numbers coordinates cell grid cube_data
Optional for dumping title pseudo_numbers

Note

The second column in the geometry specification of the cube file is used for the pseudo-numbers.
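
As a small illustration of this note, the sketch below splits one atom line from the geometry block of a cube file into an atomic number, a pseudo-number and the three coordinates as written in the file. The function parse_cube_atom_line is hypothetical, and the oxygen line in the usage note is made up for the example.

```python
def parse_cube_atom_line(line):
    """Split one atom line of a cube file (hypothetical helper, not part of HORTON).

    Returns the atomic number (first column), the pseudo-number (second
    column) and the coordinates (last three columns) as found in the file.
    """
    words = line.split()
    number = int(words[0])
    pseudo_number = float(words[1])
    coordinate = [float(word) for word in words[2:5]]
    return number, pseudo_number, coordinate
```

For a line such as '    8    6.000000    0.000000    0.000000    0.221665', this returns the atomic number 8, the pseudo-number 6.0 and the coordinates [0.0, 0.0, 0.221665].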

2.4.2.2. The VASP CHGCAR and LOCPOT formats

Load Yes
Dump No
Recognized by File prefix CHGCAR or LOCPOT
Interoperation VASP 5.X, VESTA
Always loading title coordinates numbers cell grid cube_data
Derived when loading natom pseudo_numbers

Note

Even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.

2.4.3. Wavefunction formats (using a Gaussian basis set)

All wavefunction formats share the following behavior:

  • In case of a restricted wavefunction, only the alpha orbitals are loaded.

  • In case of an unrestricted wavefunction, both the alpha and beta orbitals are loaded.

  • Some formats also load a permutation and/or a signs attribute. These are generated when loading the file, such that appropriate permutations and sign changes can be applied to convert to the proper HORTON conventions for Gaussian basis functions. The conversion is carried out inside the from_file method, which also makes it possible to reorder arrays loaded from another file in the same call. For example, you can load an .fchk and a .log file at the same time:

    mol = IOData.from_file('foo.fchk', 'foo.log')
    

    In this case, permutation is deduced from the file foo.fchk but is also applied to reorder the matrix elements loaded from foo.log, for the sake of consistency.
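
What applying a permutation and a set of sign changes to a matrix in the basis could look like is sketched below with plain Python lists. The helpers apply_permutation and apply_signs are hypothetical, and the index convention (new[i][j] = old[permutation[i]][permutation[j]]) is an assumption made for this illustration. It is not claimed to be the exact convention used inside from_file.

```python
def apply_permutation(matrix, permutation):
    """Reorder rows and columns of a square matrix (hypothetical helper).

    Assumed convention: new[i][j] = old[permutation[i]][permutation[j]].
    """
    return [[matrix[p][q] for q in permutation] for p in permutation]


def apply_signs(matrix, signs):
    """Multiply element (i, j) of a square matrix by signs[i]*signs[j]."""
    n = len(signs)
    return [[signs[i] * signs[j] * matrix[i][j] for j in range(n)]
            for i in range(n)]


# A made-up 3x3 overlap-like matrix, only meant to show the bookkeeping.
olp = [[1.0, 0.1, 0.2],
       [0.1, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
permutation = [0, 2, 1]  # swap the last two basis functions
signs = [1, -1, 1]       # flip the sign of the second basis function
converted = apply_signs(apply_permutation(olp, permutation), signs)
```

With NumPy arrays, the same reordering can be written as olp[permutation][:, permutation]. Applying one permutation to every matrix, whichever file it was loaded from, keeps all arrays in the same basis-function order.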

2.4.3.1. The Gaussian .fchk format

Load Yes
Dump No
Recognized by File extension .fchk
Interoperation Gaussian
Always loading title coordinates numbers obasis exp_alpha permutation energy pseudo_numbers mulliken_charges
Loading if present npa_charges esp_charges exp_beta dm_full_mp2 dm_spin_mp2 dm_full_mp3 dm_spin_mp3 dm_full_cc dm_spin_cc dm_full_ci dm_spin_ci dm_full_scf dm_spin_scf
Derived when loading natom

2.4.3.2. The .molden format

Load Yes
Dump Yes
Recognized by File extension .molden
Interoperation Molpro, Orca, PSI4, Molden
Always loading coordinates numbers obasis exp_alpha signs
Loading if present title exp_beta
Derived when loading natom
Required for dumping coordinates numbers obasis exp_alpha
Optional for dumping title exp_beta

2.4.3.3. The .mkl (Molekel) format

Load Yes
Dump No
Recognized by File extension .mkl
Interoperation Molekel, Orca
Always loading coordinates numbers obasis exp_alpha
Loading if present exp_beta signs
Derived when loading natom

2.4.3.4. The .wfn format

Load Yes
Dump No
Recognized by File extension .wfn
Interoperation GAMESS, Gaussian
Always loading title coordinates numbers obasis exp_alpha
Loading if present exp_beta
Derived when loading natom

Note

Only use this format if the program that generated it does not offer any alternatives that HORTON can load. The WFN format has the disadvantage that it cannot represent contractions and therefore expands all orbitals into a decontracted basis. This makes the post-processing less efficient compared to formats that do support contractions of Gaussian functions.

2.4.4. Hamiltonian file formats

2.4.4.1. The Molpro 2012 FCIDUMP format

Load Yes
Dump Yes
Recognized by File name contains FCIDUMP
Interoperation Molpro, PSI4
Always loading lf nelec ms2 one_mo two_mo core_energy
Required for dumping one_mo two_mo
Optional for dumping core_energy nelec ms2
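
The attributes in this table map onto the usual FCIDUMP layout: a namelist header (which carries values such as NELEC and MS2) followed by lines of the form "value i j k l" with one-based orbital indices. A line with all indices zero holds the core energy, a line with k = l = 0 holds a one-electron integral, and all other lines hold two-electron integrals. The sketch below sorts the integral lines accordingly. The function read_fcidump_integrals is hypothetical and deliberately simple: it ignores the header contents, keeps the index order of the file and does not expand permutational symmetry.

```python
def read_fcidump_integrals(lines):
    """Sort FCIDUMP lines into core energy, one- and two-electron integrals.

    Hypothetical helper, not part of HORTON. Indices are converted to
    zero-based tuples; the header is skipped without being interpreted.
    """
    core_energy = 0.0
    one_mo = {}
    two_mo = {}
    in_header = True
    for line in lines:
        if in_header:
            # The namelist header ends with '&END' (or '/' in some files).
            if line.strip() in ('&END', '/'):
                in_header = False
            continue
        words = line.split()
        if len(words) != 5:
            continue
        value = float(words[0])
        i, j, k, l = [int(word) for word in words[1:]]
        if i == 0:
            core_energy = value
        elif k == 0:
            one_mo[(i - 1, j - 1)] = value
        else:
            two_mo[(i - 1, j - 1, k - 1, l - 1)] = value
    return core_energy, one_mo, two_mo
```

The function expects a list of text lines, for example the result of calling readlines() on an open FCIDUMP file.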

2.4.4.2. The Gaussian .log file

Load Yes
Dump No
Recognized by File extension .log
Interoperation Gaussian
Loading if present olp kin na er

In order to let Gaussian print out all the matrix elements (Gaussian integrals), the following commands must be used in the Gaussian input file:

scf(conventional) iop(3/33=5) extralinks=l316 iop(3/27=999)

Keep in mind that this feature in Gaussian only works for a small number of basis functions. The FCIDUMP files generated with Molpro or PSI4 are more reliable and also have the advantage that all integrals are stored in double precision.

2.4.5. HORTON’s internal file format

The internal HDF5-based format of HORTON is effectively a superset of all formats listed above. Moreover, the user is free to store any additional data not covered by the file formats above. Many (not all) Python data types can be dumped into the internal format:

  • int
  • float
  • str
  • Any NumPy array
  • Classes in the HORTON library that have a to_hdf5 and from_hdf5 method. For example: AtomicGridSpec, BeckeMolGrid, Cell, CubicSpline, ESPCost, GBasis, GOBasis, Symmetry, UniformGrid and all classes in the package horton.matrix
  • A dictionary with strings as keys and any mixture of the above data types as values.

Load Yes
Dump Yes
Recognized by File extension .h5
Interoperation Custom scripts. Archiving of data generated with any other code.
Loading if present Any attribute
Optional for dumping Any attribute with the right type