5.1. Atoms-in-Molecules (AIM) analysis¶

5.1.1. Introduction¶

HORTON supports several real-space atoms-in-molecules (AIM) partitioning schemes. For this purpose, a wavefunction or an electron density on a grid can be loaded from several file formats. An overview of all supported file formats can be found here: Data file formats (input and output). For a given wavefunction or cube file, several AIM partitioning schemes can be used to derive AIM observables, e.g. atomic charge, atomic multipole expansion, etc.

HORTON supports two different approaches to perform the partitioning: WPart (based on wavefunction files) and CPart (based on cube files). The WPart and CPart implementations are built on the same algorithms but use different types of grids for numerical integration. An overview of both implementations is given by the following table:

Feature	WPart	CPart	References
Command-line script	`horton-wpart.py`	`horton-cpart.py`
Boundary conditions
0D	X	X
1D		X
2D		X
3D		X
AIM schemes
Becke	X		[becke1988_multicenter]
Hirshfeld	X	X	[hirshfeld1977]
Iterative Hirshfeld	X	X	[bultinck2007]
Iterative Stockholder	X		[lillestolen2008]
Extended Hirshfeld	X	X	[verstraelen2013]
AIM properties
Charges	X	X
Spin charges	X
Cartesian multipoles	X	X
Pure/harmonic multipoles	X	X
Radial moments	X	X
Dispersion coefficients	X	X	[tkatchenko2009]
Spherical decomposition	X
ESP due to each AIM	X		[becke1988_poisson]
ESP due to pro-atom	X	X	[becke1988_poisson]
Wiberg bond order	X
Kohn-Sham response	X
Extra options
Symmetry analysis		X

Note that Gaussian cube files can be generated from other programs like CPMD, ADF, Siesta, Crystal, etc. The output of all these programs should be compatible with HORTON. Moreover, both all-electron and pseudo-potential wavefunctions can be partitioned with HORTON. However, very fine grids are required for all-electron partitioning with CPart.

For almost all the Hirshfeld variants, you must first set up a pro-atom database densities, preferably at the same level of theory used for the molecular computation. The horton-atomdb.py script facilitates setting up such a database.

All of the outputs generated by HORTON are written to HDF5 files (with extension .h5). These are binary (compact) platform-independent files that can be post-processed easily with Python scripts. The horton-hdf2csv.py script can be used to convert (part of) an HDF5 file into the “comma-separated value” (CSV) format, which is supported by most spreadsheet software.

The usage of the four scripts (horton-atomdb.py, horton-wpart.py, horton-cpart.py and horton-hdf2csv.py) will be discussed in horton-atomdb.py – Build a pro-atom database, horton-wpart.py – AIM analysis based on a wavefunction file, horton-cpart.py – AIM analysis based on a cube file, and horton-hdf2csv.py – Conversion of HDF5 files to CSV format, respectively. All scripts have a --help flag that prints out their complete list of command-line arguments. The penultimate section, Python interface to the partitioning code, shows how to use the partitioning code through the Python interface. The last section, Frequently asked questions, answers some frequently asked questions regarding AIM partitioning with HORTON.

5.1.2. `horton-atomdb.py` – Build a pro-atom database¶

The horton-atomdb.py script should be used in three steps to build a pro-atom database:

Generate input files for isolated atom computations using one of the following programs: Gaussian03/09, Orca, PSI4 or CP2K. The following example generates Gaussian09 input files for hydrogen, carbon, nitrogen and oxygen:
```
horton-atomdb.py input g09 1,6-8 template.com
```
The template.com file is used to generate the input files and will be discussed in detail below. A series of directories is created with input files for the atomic computations, like 001__h_001_q+00, 001__h_002_q-01, 001__h_003_q-02, etc. Optional arguments can be used to control the range of cations and anions, the spin multiplicities, etc. Also a run_g09.sh script is generated to take care of the next step.

Run the command below to obtain a complete list of all arguments:
```
horton-atomdb.py input --help
```
Run the atomic computations by executing the run_PROGRAM.sh script. In this case:
```
./run_g09.sh
```
Note that the run_PROGRAM.sh scripts assume a default installation of the corresponding software. In the case of non-standard installations or when special calls or environment variables are needed, you must modify this script first. (It will not be overwritten when horton-atomdb.py input ... is executed again.)
Convert the output files of the external programs into a database of spherically averaged pro-atom densities (atoms.h5). Just run:
```
./horton-atomdb.py convert
```
This script also generates figures of the radial densities and Fukui functions, if matplotlib is installed. In this step, you may use the – grid option, although the default setting should be fine for nearly all cases.

If you remove some directories for atomic computations before or after executing the run_PROGRAM.sh script, the corresponding atoms will not be included in the database. Similarly, you may rerun horton-atomdb.py input ... to generate more input files. In that case run_PROGRAM.sh will only consider the atomic computations that have not been completed yet. In some cases, the run_PROGRAM.sh script needs to be customized, e.g. you may want to use mpirun to run the atomic computations in parallel. When such modifications are made, subsequent runs of horton-atomdb.py input ... will not overwrite the run_PROGRAM.sh script.

5.1.2.1. Template files¶

A template file is simply an input file for an atomic computation, where the distinguishing parameters (element, charge, …) are replaced with keys that are recognized by the input generator of horton-atomdb.py. These keys are:

${element}: The element symbol of the atom.
${number}: The atomic number of the atom.
${charge}: The charge of the atom (or cation, or anion).
${mult}: The spin multiplicity of the atom.

For more advanced cases, you may include (parts of) other files with generic keys, e.g. for basis sets that are different for every element:

${file:filename}: This is replaced by the contents of filename.NNN_PPP_MM, where NNN is the atomic number, PPP is the atomic population and MM is the spin multiplicity. These numbers are left-padded with zeros to fix the the length. If a field in the filename is zero, it is considered as a wild card. For example, you may use ${file:basis} in the template file and store a basis set specification for oxygen in the file basis.008_000_00. The basis contained in this file will be used for all cations, anions, and multiplicities of oxygen, because the zeros in the file name act as a wildcard.
${line:filename}: This is comparable to the previous key, except that all replacements are stored in one file. Each (non-empty) line in this file that starts with NNN_PPP_MM has a string that will be filled into the field of the file that corresponds to the specified NNN_PPP_MM.

None of the keys are mandatory, although ${element} (or ${number}), ${charge} and ${mult} must be present to obtain sensible results.

5.1.2.2. Basic template file for Gaussian 03/09¶

This is a simple template file for atomic computations at the HF/3-21G level:

%chk=atom.chk
#p HF/3-21G scf(xqc)

A random title line

${charge} ${mult}
${element} 0.0 0.0 0.0

Do not forget to include an empty line at the end. Otherwise, Gaussian will complain. The first line %chk=atom.chk is required to write the atomic wavefunction to a file that is needed in the convert step of the horton-atomdb.py script.

5.1.2.3. Advanced template file for Gaussian 03/09¶

When custom basis sets are specified with the Gen keyword in Gaussian, you have to use keys that include other files. For a database containing H, C and O, you could use the following template:

template.com:

%chk=atom.chk
#p PBE1PBE/Gen scf(xqc)

A random title line

${charge} ${mult}
${element} 0.0 0.0 0.0

${file:basis}

basis.001_000_00:
```
H 0
6-31G(d,p)
****
```
basis.006_000_00:
```
C 0
6-31G(d,p)
****
```

basis.008_000_00:

O     0
S   6   1.00
   8588.5000000              0.00189515
   1297.2300000              0.0143859
    299.2960000              0.0707320
     87.3771000              0.2400010
     25.6789000              0.5947970
      3.7400400              0.2808020
SP   3   1.00
     42.1175000              0.1138890              0.0365114
      9.6283700              0.9208110              0.2371530
      2.8533200             -0.00327447             0.8197020
SP   1   1.00
      0.9056610              1.0000000              1.0000000
SP   1   1.00
      0.2556110              1.0000000              1.0000000
SP   1   1.00
      0.0845000              1.0000000              1.0000000
D   1   1.00
      5.1600000              1.0000000
D   1   1.00
      1.2920000              1.0000000
D   1   1.00
      0.3225000              1.0000000
F   1   1.00
      1.4000000              1.0000000
****

5.1.2.4. Simple template file for ORCA¶

The following template file uses the built-in cc_pVQZ basis set of ORCA:

!HF TightSCF

%basis
  Basis cc_pVQZ
end

*xyz ${charge} ${mult}
${element} 0.0 0.0 0.0
*

5.1.2.5. Template file for CP2K¶

You must use CP2K version 2.4-r12857 (or newer). The computation of pro-atoms with CP2K is more involved because you have to specify the occupation of each subshell. The ATOM program of CP2K does not simply follow the Aufbau rule to assign orbital occupations. At this moment, only the computation of atomic densities with contracted basis sets and pseudopotentials are supported.

The following example can be used to generate a pro-atom database with the elements, O, Na, Al and Si, using the GTH pseudopotential and the MolOpt basis set.

template.inp:

&GLOBAL
  PROJECT ATOM
  PROGRAM_NAME ATOM
&END GLOBAL
&ATOM
  ATOMIC_NUMBER ${number}
  ELECTRON_CONFIGURATION (${mult}) CORE ${line:valence.inc}
  CORE none

  MAX_ANGULAR_MOMENTUM 1
  &METHOD
     METHOD_TYPE UKS
     &XC
       &XC_FUNCTIONAL PBE
       &END XC_FUNCTIONAL
     &END XC
  &END METHOD
  &POTENTIAL
      PSEUDO_TYPE GTH
      POTENTIAL_FILE_NAME ../../PBE_PSEUDOPOTENTIALS
      POTENTIAL_NAME ${line:ppot.inc}
  &END POTENTIAL
  &PP_BASIS
      BASIS_SET_FILE_NAME ../../BASIS_MOLOPT
      BASIS_TYPE CONTRACTED_GTO
      BASIS_SET DZVP-MOLOPT-SR-GTH
  &END PP_BASIS
  &PRINT
    &BASIS_SET ON
    &END
    &ORBITALS ON
    &END
    &POTENTIAL ON
    &END
  &END
&END ATOM

ppot.inc:

008_000_00 GTH-PBE-q6
011_000_00 GTH-PBE-q9
013_000_00 GTH-PBE-q3
014_000_00 GTH-PBE-q4

valence.inc:

008_005_02 1s2 2p1
008_006_03 1s2 2p2
008_007_04 1s2 2p3
008_008_03 1s2 2p4
008_009_02 1s2 2p5
008_010_01 1s2 2p6

011_005_02 1s2 2p1
011_006_03 1s2 2p2
011_007_04 1s2 2p3
011_008_03 1s2 2p4
011_009_02 1s2 2p5
011_010_01 1s2 2p6
011_011_02 1s2 2p6 2s1
011_012_01 1s2 2p6 2s2
011_013_02 1s2 2p6 2s2 3p1

013_011_02 1s1
013_012_01 1s2
013_013_02 1s2 2p1
013_014_03 1s2 2p2

014_011_02 1s1
014_012_01 1s2
014_013_02 1s2 2p1
014_014_03 1s2 2p2

5.1.2.6. Simple template file for PSI4¶

The following template file uses the BLYP functional and the built-in cc-pvdz basis set of PSI4:

molecule {
 ${charge} ${mult}
 ${element} 0.0 0.0 0.0
}

set {
 basis cc-pvdz
 scf_type df
 guess sad
 molden_write true
 reference uhf
}

energy('blyp')

Note that the flags molden_write true and reference uhf are required. The former instructs the SCF program to write the orbitals, which HORTON picks up to compute atomic densities. The latter is needed because, usually, most of the atomic computations are on open-shell systems. Because PSI4 only writes Molden files for SCF computations, it is not possible to prepare atomic densities with other levels of theory with PSI4.

5.1.3. `horton-wpart.py` – AIM analysis based on a wavefunction file¶

The basic usage of horton-wpart.py is as follows:

horton-wpart.py wfn output.h5[:group] {b,h,hi,is,he} [atoms.h5]

This script has four important arguments:

The wfn argument is the wave-function file, like a Gaussian03 or 09 formatted checkpoint file, a Molekel file or a Molden input file.
The output file (HDF5 format) in which the partitioning results are written. Usually, the output file has the .h5 extension. The output file is optionally followed by a colon and a group suffix. This suffix can be used to specify a group in the HDF5 output file where the output is to be stored.
The third argument refers to the partitioning scheme:
- b: Becke partitioning. [becke1988_multicenter]
- h: Hirshfeld partitioning. [hirshfeld1977]
- hi: Iterative Hirshfeld partitioning. [bultinck2007]
- is: Iterative Stockholder partitioning. [lillestolen2008]
- he: Extended Hirshfeld partitioning. [verstraelen2013]
The atoms.h5 argument is only needed for the Hirshfeld variants, not for b and is. The file atoms.h5 contains a database of pro-atoms and can be generated as explained in horton-atomdb.py – Build a pro-atom database.

This script computes atomic weight functions, and derives all AIM observables that are implemented for that scheme. These results are stored in the specified HDF5 output file.

An elaborate description of all command-line arguments can be obtained from:

horton-wpart.py --help

The integration grid can be tuned with The --grid option.

Note

When a post-Hartree-Fock (post-HF) level is used in Gaussian 03/09 (MP2, MP3, CC or CI), you must add the keyword Density=current to the route section in the Gaussian input file. This is needed to have the corresponding density matrix in the formatted checkpoint file. When such a post-HF density matrix is present, HORTON will load that density matrix instead of the SCF density matrix. Also note that for some levels of theory, no 1RDM is constructed, including MP4, MP5, ZINDO and QCISD(T).

5.1.4. `horton-cpart.py` – AIM analysis based on a cube file¶

The basic usage of horton-cpart.py is as follows:

horton-cpart.py cube output.h5:group {h,hi,he} atoms.h5

The script takes three arguments:

The cube argument can be a Gaussian cube file (possibly generated with another program, e.g. CP2K). Some information must be added to the cube file about the effective core charges that were used. This information is needed to compute the correct net charges. (This is not only relevant for the final output, but also for constructing the proper pro-atoms. When these numbers are not set correctly, you’ll get garbage output.)

For example, given an original cube file with the following header:

-Quickstep-
 ELECTRON DENSITY
  0.000000    0.000000    0.000000
  0.183933    0.000000    0.000000
  0.000000    0.187713    0.000000
  0.000000    0.000000    0.190349
  0.000000    0.000000    3.677294    0.000000
  0.000000    4.966200   10.401166    0.000000
  0.000000    0.000000   10.401166    0.000000
  0.000000    4.966200    3.677294    0.000000
  0.000000    2.483100    0.000000    2.243351
  0.000000    7.449300    0.000000   13.174925
  0.000000    2.483100    4.554391    4.206115
  0.000000    2.483100    9.524087    4.206115
  0.000000    7.449300    9.524087   11.212180
  0.000000    7.449300    4.554391   11.212180
  0.000000    4.966200    7.039230    7.709138
  0.000000    0.000000    7.039230    7.709138
  0.000000    2.483100    2.971972    1.606588
  0.000000    2.483100   11.106506    1.606588
  0.000000    7.449300   11.106506   13.811687
  0.000000    7.449300    2.971972   13.811687
  0.000000    2.483100    7.039230    5.959157
  0.000000    7.449300    7.039230    9.459119

Now consider the case that 6 effective electrons were used for oxygen and 4 effective electrons for silicon. Then, the second column in the atom lines has to be set to the effective core charge. (This number is not used in the cube format.) In this case, you get:

-Quickstep-
 ELECTRON DENSITY
  0.000000    0.000000    0.000000
  0.183933    0.000000    0.000000
  0.000000    0.187713    0.000000
  0.000000    0.000000    0.190349
  6.0         0.000000    3.677294    0.000000
  6.0         4.966200   10.401166    0.000000
  6.0         0.000000   10.401166    0.000000
  6.0         4.966200    3.677294    0.000000
  6.0         2.483100    0.000000    2.243351
  6.0         7.449300    0.000000   13.174925
  6.0         2.483100    4.554391    4.206115
  6.0         2.483100    9.524087    4.206115
  6.0         7.449300    9.524087   11.212180
  6.0         7.449300    4.554391   11.212180
  6.0         4.966200    7.039230    7.709138
  6.0         0.000000    7.039230    7.709138
  4.0         2.483100    2.971972    1.606588
  4.0         2.483100   11.106506    1.606588
  4.0         7.449300   11.106506   13.811687
  4.0         7.449300    2.971972   13.811687
  4.0         2.483100    7.039230    5.959157
  4.0         7.449300    7.039230    9.459119

The output file (HDF5 format) in which the partitioning results are written. Usually, the output file has the extension .h5. The output file is optionally followed by a colon and a group suffix. This suffix can be used to specify a group in the HDF5 output file where the output is stored.
The third argument refers to the partitioning scheme:
- h: Hirshfeld partitioning. [hirshfeld1977]
- hi: Iterative Hirshfeld partitioning. [bultinck2007]
- he: Extended Hirshfeld partitioning. [verstraelen2013]
The fourth argument is the pro-atom database generated with horton-atomdb.py script.

The horton-cpart.py script computes atomic weight functions, and then derives all AIM observables that are implemented for that scheme. These results are stored in the specifief HDF5 output file.

The horton-cpart.py script is somewhat experimental. Always make sure that the numbers converge with an increase in the number of grid points. You may need the following options to control the efficiency of the program:

--compact COMPACT. Automatically determine cutoff radii for the pro-atoms, where COMPACT is the maximum number of electrons lost in the tail after the cutoff radius. 0.001 is typically a reasonable value for COMPACT. The pro-atoms are renormalized after setting the cutoff radii. One cutoff radius is defined per element. This implies that the tail of the most diffuse anion determines the cutoff radius when the --compact option is used.
--greedy. This enables a more memory-hungry version of the Iterative and Extended Hirshfeld algorithms that runs considerably faster. This becomes unfeasible for systems with huge unit cells.
--stride STRIDE. The STRIDE parameter controls the sub-sampling of the cube file prior to the partitioning. It is 1 by default.

To get a complete list of command-line options, run:

horton-cpart.py --help

5.1.5. Python interface to the partitioning code¶

The horton-wpart.py and horton-cpart.py scripts have a rather intuitive Python interface that allows you to run a more customized analysis. The script data/examples/wpart/becke.py is a simple example that runs a Becke partitioning to compute charges and that writes the charges into a simple text file.

#!/usr/bin/env python

import numpy as np
from horton import *

# Load the Gaussian output from file
fn_fchk = context.get_fn('test/water_sto3g_hf_g03.fchk')
# Replace the previous line with any other fchk file, e.g. fn_fchk = 'yourfile.fchk'.
mol = IOData.from_file(fn_fchk)

# Partition the density with the Becke scheme
grid = BeckeMolGrid(mol.coordinates, mol.numbers, mol.pseudo_numbers, mode='only')
moldens = mol.obasis.compute_grid_density_dm(mol.get_dm_full(), grid.points)
wpart = BeckeWPart(mol.coordinates, mol.numbers, mol.pseudo_numbers, grid, moldens, local=True)
wpart.do_charges()

# Write the result to a file
np.savetxt('charges.txt', wpart['charges'])

The unit tests in the source code contain many small examples that can be used as a starting point for similar scripts. These unit tests can be found in horton/part/test/test_wpart.py and horton/part/test/test_cpart.py.

5.1.6. Frequently asked questions¶

Which atoms-in-molecules (AIM) partitioning scheme should I use?

There is no single partitioning scheme that is most suitable for any application. Nevertheless, some people claim that only one AIM scheme should be used above all others, especially in the QTAIM and electron density communities. Try to stay away from such fanboyism, as it has little to do with good science.

In practice, the choice depends on the purposes you have in mind. You should try to select a scheme that shows some desirable behavior. Typically, you would like to have a compromise between some of the following features:

Numerical stability

Robustness with respect to conformational changes

Uniqueness of the result

Computational efficiency

Mathematical elegance

Simplicity

Linearity in the density (matrix)

Chemical transferability

Applicable to a broad range of systems

Applicable to periodic/isolated systems

Applicable to a broad range of electronic structure methods/implementations

Accuracy of electrostatic interactions with a limited multipole expansion of the atoms

Good (empirical) correlations between some AIM property with some experimental observable

…

It goes beyond the scope of this FAQ to describe how each partitioning scheme (implemented in HORTON or not) performs for these features. Some of these features are very difficult to assess and, therefore, are subject to intense debate in the literature.

Regarding our own work, the following papers are directly related to this question:

[verstraelen2011a] “Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides”

[verstraelen2012a] “The conformational sensitivity of iterative stockholder partitioning schemes”

[verstraelen2013] “Hirshfeld-E Partitioning: AIM Charges with an Improved Trade-off between Robustness and Accurate Electrostatics”

The following are also related to this question, but the list is far from complete:

[bultinck2007] “Critical analysis and extension of the Hirshfeld atoms in molecules”

[bultinck2007b] “Uniqueness and basis set dependence of iterative Hirshfeld charges”

If you have more suggestions, please drop me a note: Toon.Verstraelen@UGent.be.

What is the recommended level of theory for computing the pro-atom databses?

This question is relevant to the following methods implemented in HORTON: Hirshfeld, Hirshfeld-I and Hirshfeld-E, which require a pro-atoms database. They are referred to in this answer as Hirshfeld-like schemes.

In principle, you are free to choose any level of theory. In practice, several papers tend to be consistent in the level of theory (and basis set) that is used for the molecular and pro-atomic computations. See for example: [bultinck2007], [verstraelen2009], [verstraelen2011a], [verstraelen2011b], [vanduyfhuys2012], [verstraelen2012a], [verstraelen2012b], [verstraelen2013].

One motivation for this consistency is that Hirshfeld-like partitioning schemes yield atoms-in-molecules that are maximally similar to the pro-atoms. [nalewajski2000] (Technically speaking, the sum of the Kullback-Leibler divergence between the atomic and pro-atomic densities for all the atoms is minimized.) This principle is (or should be) one of the reasons that Hirshfeld-like charges are somewhat transferable between chemically similar atoms. You could hope that the Kullback-Leibler divergence is easier to minimize when the molecular and pro-atomic densities are computed as consistently as possible. Thus, using the same level of theory and basis set is advised.

Another motivation is that such consistency may (to some degree) result in, cancellation of errors, when comparing Hirshfeld-like charges computed at different levels of theory. For example, it is found that Hirshfeld-I charges have only a small basis set dependency based on [bultinck2007b]. In this work, the molecular and pro-atomic densities were computed consistently.

At the end, it can also be argued that without consistency in the level of theory, there are so many possible combinations that it becomes impossible to make a well-motivated choice. Then again, this is a problem that computational chemists usually embrace with open arms hoping that they will find a unique combination of levels of theory that turns out to be useful.

How can one generate Cube files with CP2K?

Electron density cube files can be generated with CP2K by adding the following section to the CP2K input file: E_DENSITY_CUBE.