5.1. Atoms-in-Molecules (AIM) analysis

5.1.1. Introduction

HORTON supports several real-space atoms-in-molecules (AIM) partitioning schemes. For this purpose, a wavefunction or an electron density on a grid can be loaded from several file formats. An overview of all supported file formats can be found here: Data file formats (input and output). For a given wavefunction or cube file, several AIM partitioning schemes can be used to derive AIM observables, e.g. atomic charge, atomic multipole expansion, etc. An overview of features is given in the following table.

Feature References
Boundary conditions  
0D  
1D  
2D  
3D  
AIM schemes  
Becke [becke1988_multicenter]
Hirshfeld [hirshfeld1977]
Iterative Hirshfeld [bultinck2007]
Iterative Stockholder [lillestolen2008]
Minimal Basis Iterative Stockolder [verstraelen2016]
AIM properties  
Charges  
Spin charges  
Cartesian multipoles  
Pure/harmonic multipoles  
Radial moments  
Dispersion coefficients [tkatchenko2009]
Spherical decomposition  
ESP due to each AIM [becke1988_poisson]
ESP due to pro-atom [becke1988_poisson]
Wiberg bond order  
Kohn-Sham response  
Extra options  
Symmetry analysis  

For almost all the Hirshfeld variants, you must first set up a pro-atom database densities, preferably at the same level of theory used for the molecular computation. The horton-atomdb.py script facilitates setting up such a database.

All of the outputs generated by HORTON are written to HDF5 files (with extension .h5). These are binary (compact) platform-independent files that can be post-processed easily with Python scripts. The horton-hdf2csv.py script can be used to convert (part of) an HDF5 file into the “comma-separated value” (CSV) format, which is supported by most spreadsheet software.

The usage of the tjree scripts (horton-atomdb.py, horton-wpart.py, and horton-hdf2csv.py) will be discussed in horton-atomdb.py – Build a pro-atom database, horton-wpart.py – AIM analysis based on a wavefunction file, and horton-hdf2csv.py – Conversion of HDF5 files to CSV format, respectively. All scripts have a --help flag that prints out their complete list of command-line arguments. The penultimate section, Python interface to the partitioning code, shows how to use the partitioning code through the Python interface. The last section, Frequently asked questions, answers some frequently asked questions regarding AIM partitioning with HORTON.

5.1.2. horton-atomdb.py – Build a pro-atom database

The horton-atomdb.py script should be used in three steps to build a pro-atom database:

  1. Generate input files for isolated atom computations using one of the following programs: Gaussian03/09, Orca, PSI4 or CP2K. The following example generates Gaussian09 input files for hydrogen, carbon, nitrogen and oxygen:

    horton-atomdb.py input g09 1,6-8 template.com
    

    The template.com file is used to generate the input files and will be discussed in detail below. A series of directories is created with input files for the atomic computations, like 001__h_001_q+00, 001__h_002_q-01, 001__h_003_q-02, etc. Optional arguments can be used to control the range of cations and anions, the spin multiplicities, etc. Also a run_g09.sh script is generated to take care of the next step.

    Run the command below to obtain a complete list of all arguments:

    horton-atomdb.py input --help
    
  2. Run the atomic computations by executing the run_PROGRAM.sh script. In this case:

    ./run_g09.sh
    

    Note that the run_PROGRAM.sh scripts assume a default installation of the corresponding software. In the case of non-standard installations or when special calls or environment variables are needed, you must modify this script first. (It will not be overwritten when horton-atomdb.py input ... is executed again.)

  3. Convert the output files of the external programs into a database of spherically averaged pro-atom densities (atoms.h5). Just run:

    ./horton-atomdb.py convert
    

    This script also generates figures of the radial densities and Fukui functions, if matplotlib is installed. In this step, you may use the – grid option, although the default setting should be fine for nearly all cases.

If you remove some directories for atomic computations before or after executing the run_PROGRAM.sh script, the corresponding atoms will not be included in the database. Similarly, you may rerun horton-atomdb.py input ... to generate more input files. In that case run_PROGRAM.sh will only consider the atomic computations that have not been completed yet. In some cases, the run_PROGRAM.sh script needs to be customized, e.g. you may want to use mpirun to run the atomic computations in parallel. When such modifications are made, subsequent runs of horton-atomdb.py input ... will not overwrite the run_PROGRAM.sh script.

5.1.2.1. Template files

A template file is simply an input file for an atomic computation, where the distinguishing parameters (element, charge, ...) are replaced with keys that are recognized by the input generator of horton-atomdb.py. These keys are:

  • ${element}: The element symbol of the atom.
  • ${number}: The atomic number of the atom.
  • ${charge}: The charge of the atom (or cation, or anion).
  • ${mult}: The spin multiplicity of the atom.

For more advanced cases, you may include (parts of) other files with generic keys, e.g. for basis sets that are different for every element:

  • ${file:filename}: This is replaced by the contents of filename.NNN_PPP_MM, where NNN is the atomic number, PPP is the atomic population and MM is the spin multiplicity. These numbers are left-padded with zeros to fix the the length. If a field in the filename is zero, it is considered as a wild card. For example, you may use ${file:basis} in the template file and store a basis set specification for oxygen in the file basis.008_000_00. The basis contained in this file will be used for all cations, anions, and multiplicities of oxygen, because the zeros in the file name act as a wildcard.
  • ${line:filename}: This is comparable to the previous key, except that all replacements are stored in one file. Each (non-empty) line in this file that starts with NNN_PPP_MM has a string that will be filled into the field of the file that corresponds to the specified NNN_PPP_MM.

None of the keys are mandatory, although ${element} (or ${number}), ${charge} and ${mult} must be present to obtain sensible results.

5.1.2.2. Basic template file for Gaussian 03/09

This is a simple template file for atomic computations at the HF/3-21G level:

%chk=atom.chk
#p HF/3-21G scf(xqc)

A random title line

${charge} ${mult}
${element} 0.0 0.0 0.0

Do not forget to include an empty line at the end. Otherwise, Gaussian will complain. The first line %chk=atom.chk is required to write the atomic wavefunction to a file that is needed in the convert step of the horton-atomdb.py script.

5.1.2.3. Advanced template file for Gaussian 03/09

When custom basis sets are specified with the Gen keyword in Gaussian, you have to use keys that include other files. For a database containing H, C and O, you could use the following template:

  • template.com:

    %chk=atom.chk
    #p PBE1PBE/Gen scf(xqc)
    
    A random title line
    
    ${charge} ${mult}
    ${element} 0.0 0.0 0.0
    
    ${file:basis}
    
  • basis.001_000_00:

    H 0
    6-31G(d,p)
    ****
    
  • basis.006_000_00:

    C 0
    6-31G(d,p)
    ****
    
  • basis.008_000_00:

    O     0
    S   6   1.00
       8588.5000000              0.00189515
       1297.2300000              0.0143859
        299.2960000              0.0707320
         87.3771000              0.2400010
         25.6789000              0.5947970
          3.7400400              0.2808020
    SP   3   1.00
         42.1175000              0.1138890              0.0365114
          9.6283700              0.9208110              0.2371530
          2.8533200             -0.00327447             0.8197020
    SP   1   1.00
          0.9056610              1.0000000              1.0000000
    SP   1   1.00
          0.2556110              1.0000000              1.0000000
    SP   1   1.00
          0.0845000              1.0000000              1.0000000
    D   1   1.00
          5.1600000              1.0000000
    D   1   1.00
          1.2920000              1.0000000
    D   1   1.00
          0.3225000              1.0000000
    F   1   1.00
          1.4000000              1.0000000
    ****
    

5.1.2.4. Simple template file for ORCA

The following template file uses the built-in cc_pVQZ basis set of ORCA:

!HF TightSCF

%basis
  Basis cc_pVQZ
end

*xyz ${charge} ${mult}
${element} 0.0 0.0 0.0
*

5.1.2.5. Template file for CP2K

You must use CP2K version 2.4-r12857 (or newer). The computation of pro-atoms with CP2K is more involved because you have to specify the occupation of each subshell. The ATOM program of CP2K does not simply follow the Aufbau rule to assign orbital occupations. At this moment, only the computation of atomic densities with contracted basis sets and pseudopotentials are supported.

The following example can be used to generate a pro-atom database with the elements, O, Na, Al and Si, using the GTH pseudopotential and the MolOpt basis set.

  • template.inp:

    &GLOBAL
      PROJECT ATOM
      PROGRAM_NAME ATOM
    &END GLOBAL
    &ATOM
      ATOMIC_NUMBER ${number}
      ELECTRON_CONFIGURATION (${mult}) CORE ${line:valence.inc}
      CORE none
    
      MAX_ANGULAR_MOMENTUM 1
      &METHOD
         METHOD_TYPE UKS
         &XC
           &XC_FUNCTIONAL PBE
           &END XC_FUNCTIONAL
         &END XC
      &END METHOD
      &POTENTIAL
          PSEUDO_TYPE GTH
          POTENTIAL_FILE_NAME ../../PBE_PSEUDOPOTENTIALS
          POTENTIAL_NAME ${line:ppot.inc}
      &END POTENTIAL
      &PP_BASIS
          BASIS_SET_FILE_NAME ../../BASIS_MOLOPT
          BASIS_TYPE CONTRACTED_GTO
          BASIS_SET DZVP-MOLOPT-SR-GTH
      &END PP_BASIS
      &PRINT
        &BASIS_SET ON
        &END
        &ORBITALS ON
        &END
        &POTENTIAL ON
        &END
      &END
    &END ATOM
    
  • ppot.inc:

    008_000_00 GTH-PBE-q6
    011_000_00 GTH-PBE-q9
    013_000_00 GTH-PBE-q3
    014_000_00 GTH-PBE-q4
    
  • valence.inc:

    008_005_02 1s2 2p1
    008_006_03 1s2 2p2
    008_007_04 1s2 2p3
    008_008_03 1s2 2p4
    008_009_02 1s2 2p5
    008_010_01 1s2 2p6
    
    011_005_02 1s2 2p1
    011_006_03 1s2 2p2
    011_007_04 1s2 2p3
    011_008_03 1s2 2p4
    011_009_02 1s2 2p5
    011_010_01 1s2 2p6
    011_011_02 1s2 2p6 2s1
    011_012_01 1s2 2p6 2s2
    011_013_02 1s2 2p6 2s2 3p1
    
    013_011_02 1s1
    013_012_01 1s2
    013_013_02 1s2 2p1
    013_014_03 1s2 2p2
    
    014_011_02 1s1
    014_012_01 1s2
    014_013_02 1s2 2p1
    014_014_03 1s2 2p2
    

5.1.2.6. Simple template file for PSI4

The following template file uses the BLYP functional and the built-in cc-pvdz basis set of PSI4:

molecule {
 ${charge} ${mult}
 ${element} 0.0 0.0 0.0
}

set {
 basis cc-pvdz
 scf_type df
 guess sad
 molden_write true
 reference uhf
}

energy('blyp')

Note that the flags molden_write true and reference uhf are required. The former instructs the SCF program to write the orbitals, which HORTON picks up to compute atomic densities. The latter is needed because, usually, most of the atomic computations are on open-shell systems. Because PSI4 only writes Molden files for SCF computations, it is not possible to prepare atomic densities with other levels of theory with PSI4.

5.1.3. horton-wpart.py – AIM analysis based on a wavefunction file

The basic usage of horton-wpart.py is as follows:

horton-wpart.py wfn output.h5[:group] {b,h,hi,is,he} [atoms.h5]

This script has four important arguments:

  1. The wfn argument is the wave-function file, like a Gaussian03 or 09 formatted checkpoint file, a Molekel file or a Molden input file.

  2. The output file (HDF5 format) in which the partitioning results are written. Usually, the output file has the .h5 extension. The output file is optionally followed by a colon and a group suffix. This suffix can be used to specify a group in the HDF5 output file where the output is to be stored.

  3. The third argument refers to the partitioning scheme:

  4. The atoms.h5 argument is only needed for the Hirshfeld variants h and hi, not for b, is or mbis. The file atoms.h5 contains a database of pro-atoms and can be generated as explained in horton-atomdb.py – Build a pro-atom database.

This script computes atomic weight functions, and derives all AIM observables that are implemented for that scheme. These results are stored in the specified HDF5 output file.

An elaborate description of all command-line arguments can be obtained from:

horton-wpart.py --help

The integration grid can be tuned with The –grid option.

Note

When a post-Hartree-Fock (post-HF) level is used in Gaussian 03/09 (MP2, MP3, CC or CI), you must add the keyword Density=current to the route section in the Gaussian input file. This is needed to have the corresponding density matrix in the formatted checkpoint file. When such a post-HF density matrix is present, HORTON will load that density matrix instead of the SCF density matrix. Also note that for some levels of theory, no 1RDM is constructed, including MP4, MP5, ZINDO and QCISD(T).

5.1.4. Python interface to the partitioning code

The horton-wpart.py script has a rather intuitive Python interface that allows you to run a more customized analysis. The script data/examples/wpart/becke.py is a simple example that runs a Becke partitioning to compute charges and that writes the charges into a simple text file.

#!/usr/bin/env python

import numpy as np

from horton import *  # pylint: disable=wildcard-import,unused-wildcard-import


# Load the Gaussian output from file
fn_fchk = context.get_fn('test/water_sto3g_hf_g03.fchk')
# Replace the previous line with any other fchk file, e.g. fn_fchk = 'yourfile.fchk'.
mol = IOData.from_file(fn_fchk)

# Partition the density with the Becke scheme
grid = BeckeMolGrid(mol.coordinates, mol.numbers, mol.pseudo_numbers, mode='only')
moldens = mol.obasis.compute_grid_density_dm(mol.get_dm_full(), grid.points)
wpart = BeckeWPart(mol.coordinates, mol.numbers, mol.pseudo_numbers, grid, moldens, local=True)
wpart.do_charges()

# Write the result to a file
np.savetxt('charges.txt', wpart['charges'])


# CODE BELOW IS FOR horton-regression-test.py ONLY. IT IS NOT PART OF THE EXAMPLE.
rt_results = {'charges': wpart['charges']}
rt_atols = {'charges': 1e-4}
# BEGIN AUTOGENERATED CODE. DO NOT CHANGE MANUALLY.
rt_previous = {
    'charges': np.array([-0.1353791863255509, 0.067677948336146487, 0.067657348711337439]),
}

The unit tests in the source code contain many small examples that can be used as a starting point for similar scripts. These unit tests can be found in horton/part/test/test_wpart.py.

5.1.5. Frequently asked questions

Which atoms-in-molecules (AIM) partitioning scheme should I use?

There is no single partitioning scheme that is most suitable for any application. Nevertheless, some people claim that only one AIM scheme should be used above all others, especially in the QTAIM and electron density communities. Try to stay away from such fanboyism, as it has little to do with good science.

In practice, the choice depends on the purposes you have in mind. You should try to select a scheme that shows some desirable behavior. Typically, you would like to have a compromise between some of the following features:

  • Numerical stability
  • Robustness with respect to conformational changes
  • Uniqueness of the result
  • Computational efficiency
  • Mathematical elegance
  • Simplicity
  • Linearity in the density (matrix)
  • Chemical transferability
  • Applicable to a broad range of systems
  • Applicable to periodic/isolated systems
  • Applicable to a broad range of electronic structure methods/implementations
  • Accuracy of electrostatic interactions with a limited multipole expansion of the atoms
  • Good (empirical) correlations between some AIM property with some experimental observable
  • ...

It goes beyond the scope of this FAQ to describe how each partitioning scheme (implemented in HORTON or not) performs for these features. Some of these features are very difficult to assess and, therefore, are subject to intense debate in the literature.

Regarding our own work, the following papers are directly related to this question:

  • [verstraelen2011a] “Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides”
  • [verstraelen2012a] “The conformational sensitivity of iterative stockholder partitioning schemes”
  • [verstraelen2013] “Hirshfeld-E Partitioning: AIM Charges with an Improved Trade-off between Robustness and Accurate Electrostatics”
  • [verstraelen2016] “Minimal Basis Iterative Stockholder: Atoms in Molecules for Force-Field Development”

The following are also related to this question, but the list is far from complete:

  • [bultinck2007] “Critical analysis and extension of the Hirshfeld atoms in molecules”
  • [bultinck2007b] “Uniqueness and basis set dependence of iterative Hirshfeld charges”

If you have more suggestions, please drop me a note: Toon.Verstraelen@UGent.be.

What is the recommended level of theory for computing the pro-atom databses?

This question is relevant to the following methods implemented in HORTON: Hirshfeld and Hirshfeld-I, which require a pro-atoms database. They are referred to in this answer as Hirshfeld-like schemes.

In principle, you are free to choose any level of theory. In practice, several papers tend to be consistent in the level of theory (and basis set) that is used for the molecular and pro-atomic computations. See for example: [bultinck2007], [verstraelen2009], [verstraelen2011a], [verstraelen2011b], [vanduyfhuys2012], [verstraelen2012a], [verstraelen2012b], [verstraelen2013].

One motivation for this consistency is that Hirshfeld-like partitioning schemes yield atoms-in-molecules that are maximally similar to the pro-atoms. [nalewajski2000] (Technically speaking, the sum of the Kullback-Leibler divergence between the atomic and pro-atomic densities for all the atoms is minimized.) This principle is (or should be) one of the reasons that Hirshfeld-like charges are somewhat transferable between chemically similar atoms. You could hope that the Kullback-Leibler divergence is easier to minimize when the molecular and pro-atomic densities are computed as consistently as possible. Thus, using the same level of theory and basis set is advised.

Another motivation is that such consistency may (to some degree) result in, cancellation of errors, when comparing Hirshfeld-like charges computed at different levels of theory. For example, it is found that Hirshfeld-I charges have only a small basis set dependency based on [bultinck2007b]. In this work, the molecular and pro-atomic densities were computed consistently.

At the end, it can also be argued that without consistency in the level of theory, there are so many possible combinations that it becomes impossible to make a well-motivated choice. Then again, this is a problem that computational chemists usually embrace with open arms hoping that they will find a unique combination of levels of theory that turns out to be useful.

How can one generate Cube files with CP2K?

Electron density cube files can be generated with CP2K by adding the following section to the CP2K input file: E_DENSITY_CUBE.