5.1. Atoms-in-Molecules (AIM) analysis¶
5.1.1. Introduction¶
HORTON supports several real-space atoms-in-molecules (AIM) partitioning schemes. For this purpose, a wavefunction or an electron density on a grid can be loaded from several file formats. An overview of all supported file formats can be found here: Data file formats (input and output). For a given wavefunction or cube file, several AIM partitioning schemes can be used to derive AIM observables, e.g. atomic charge, atomic multipole expansion, etc.
HORTON supports two different approaches to perform the partitioning: WPart (based on wavefunction files) and CPart (based on cube files). The WPart and CPart implementations are built on the same algorithms but use different types of grids for numerical integration. An overview of both implementations is given by the following table:
Feature | WPart | CPart | References |
---|---|---|---|
Command-line script | horton-wpart.py |
horton-cpart.py |
|
Boundary conditions | |||
0D | X | X | |
1D | X | ||
2D | X | ||
3D | X | ||
AIM schemes | |||
Becke | X | [becke1988_multicenter] | |
Hirshfeld | X | X | [hirshfeld1977] |
Iterative Hirshfeld | X | X | [bultinck2007] |
Iterative Stockholder | X | [lillestolen2008] | |
Extended Hirshfeld | X | X | [verstraelen2013] |
AIM properties | |||
Charges | X | X | |
Spin charges | X | ||
Cartesian multipoles | X | X | |
Pure/harmonic multipoles | X | X | |
Radial moments | X | X | |
Dispersion coefficients | X | X | [tkatchenko2009] |
Spherical decomposition | X | ||
ESP due to each AIM | X | [becke1988_poisson] | |
ESP due to pro-atom | X | X | [becke1988_poisson] |
Wiberg bond order | X | ||
Kohn-Sham response | X | ||
Extra options | |||
Symmetry analysis | X |
Note that Gaussian cube files can be generated from other programs like CPMD, ADF, Siesta, Crystal, etc. The output of all these programs should be compatible with HORTON. Moreover, both all-electron and pseudo-potential wavefunctions can be partitioned with HORTON. However, very fine grids are required for all-electron partitioning with CPart.
For almost all the Hirshfeld variants, you must first set up a pro-atom database
densities, preferably at the same level of theory used for the molecular
computation. The horton-atomdb.py
script facilitates setting up such a
database.
All of the outputs generated by HORTON are written to HDF5 files (with extension .h5
). These are
binary (compact) platform-independent files that can be post-processed easily
with Python scripts. The horton-hdf2csv.py
script can be used to convert
(part of) an HDF5 file into the “comma-separated value” (CSV) format, which is
supported by most spreadsheet software.
The usage of the four scripts (horton-atomdb.py
, horton-wpart.py
,
horton-cpart.py
and horton-hdf2csv.py
) will be discussed in horton-atomdb.py – Build a pro-atom database,
horton-wpart.py – AIM analysis based on a wavefunction file, horton-cpart.py – AIM analysis based on a cube file, and horton-hdf2csv.py – Conversion of HDF5 files to CSV format, respectively. All scripts have
a --help
flag that prints out their complete list of command-line arguments. The penultimate
section, Python interface to the partitioning code, shows how to use the partitioning code through the
Python interface. The last section, Frequently asked questions, answers some frequently asked
questions regarding AIM partitioning with HORTON.
5.1.2. horton-atomdb.py
– Build a pro-atom database¶
The horton-atomdb.py
script should be used in three steps to build a pro-atom database:
Generate input files for isolated atom computations using one of the following programs: Gaussian03/09, Orca, PSI4 or CP2K. The following example generates Gaussian09 input files for hydrogen, carbon, nitrogen and oxygen:
horton-atomdb.py input g09 1,6-8 template.com
The
template.com
file is used to generate the input files and will be discussed in detail below. A series of directories is created with input files for the atomic computations, like001__h_001_q+00
,001__h_002_q-01
,001__h_003_q-02
, etc. Optional arguments can be used to control the range of cations and anions, the spin multiplicities, etc. Also arun_g09.sh
script is generated to take care of the next step.Run the command below to obtain a complete list of all arguments:
horton-atomdb.py input --help
Run the atomic computations by executing the
run_PROGRAM.sh
script. In this case:./run_g09.sh
Note that the
run_PROGRAM.sh
scripts assume a default installation of the corresponding software. In the case of non-standard installations or when special calls or environment variables are needed, you must modify this script first. (It will not be overwritten whenhorton-atomdb.py input ...
is executed again.)Convert the output files of the external programs into a database of spherically averaged pro-atom densities (
atoms.h5
). Just run:./horton-atomdb.py convert
This script also generates figures of the radial densities and Fukui functions, if
matplotlib
is installed. In this step, you may use the – grid option, although the default setting should be fine for nearly all cases.
If you remove some directories for atomic computations before or after
executing the run_PROGRAM.sh
script, the corresponding atoms will not be
included in the database. Similarly, you may rerun horton-atomdb.py input ...
to generate more input files. In that case run_PROGRAM.sh
will only consider
the atomic computations that have not been completed yet. In some cases, the
run_PROGRAM.sh
script needs to be customized, e.g. you may want to use
mpirun
to run the atomic computations in parallel. When such modifications
are made, subsequent runs of horton-atomdb.py input ...
will not overwrite
the run_PROGRAM.sh
script.
5.1.2.1. Template files¶
A template file is simply an input file for an atomic computation, where the
distinguishing parameters (element, charge, …) are replaced with keys that are
recognized by the input generator of horton-atomdb.py
. These keys are:
${element}
: The element symbol of the atom.${number}
: The atomic number of the atom.${charge}
: The charge of the atom (or cation, or anion).${mult}
: The spin multiplicity of the atom.
For more advanced cases, you may include (parts of) other files with generic keys, e.g. for basis sets that are different for every element:
${file:filename}
: This is replaced by the contents offilename.NNN_PPP_MM
, whereNNN
is the atomic number,PPP
is the atomic population andMM
is the spin multiplicity. These numbers are left-padded with zeros to fix the the length. If a field in the filename is zero, it is considered as a wild card. For example, you may use${file:basis}
in the template file and store a basis set specification for oxygen in the filebasis.008_000_00
. The basis contained in this file will be used for all cations, anions, and multiplicities of oxygen, because the zeros in the file name act as a wildcard.${line:filename}
: This is comparable to the previous key, except that all replacements are stored in one file. Each (non-empty) line in this file that starts withNNN_PPP_MM
has a string that will be filled into the field of the file that corresponds to the specifiedNNN_PPP_MM
.
None of the keys are mandatory, although ${element}
(or ${number}
),
${charge}
and ${mult}
must be present to obtain sensible results.
5.1.2.2. Basic template file for Gaussian 03/09¶
This is a simple template file for atomic computations at the HF/3-21G level:
%chk=atom.chk
#p HF/3-21G scf(xqc)
A random title line
${charge} ${mult}
${element} 0.0 0.0 0.0
Do not forget to include an empty line at the end. Otherwise, Gaussian will
complain. The first line %chk=atom.chk
is required to write the atomic
wavefunction to a file that is needed in the convert
step of
the horton-atomdb.py
script.
5.1.2.3. Advanced template file for Gaussian 03/09¶
When custom basis sets are specified with the Gen
keyword in Gaussian, you
have to use keys that include other files. For a database containing H, C and O, you
could use the following template:
template.com
:%chk=atom.chk #p PBE1PBE/Gen scf(xqc) A random title line ${charge} ${mult} ${element} 0.0 0.0 0.0 ${file:basis}
basis.001_000_00
:H 0 6-31G(d,p) ****
basis.006_000_00
:C 0 6-31G(d,p) ****
basis.008_000_00
:O 0 S 6 1.00 8588.5000000 0.00189515 1297.2300000 0.0143859 299.2960000 0.0707320 87.3771000 0.2400010 25.6789000 0.5947970 3.7400400 0.2808020 SP 3 1.00 42.1175000 0.1138890 0.0365114 9.6283700 0.9208110 0.2371530 2.8533200 -0.00327447 0.8197020 SP 1 1.00 0.9056610 1.0000000 1.0000000 SP 1 1.00 0.2556110 1.0000000 1.0000000 SP 1 1.00 0.0845000 1.0000000 1.0000000 D 1 1.00 5.1600000 1.0000000 D 1 1.00 1.2920000 1.0000000 D 1 1.00 0.3225000 1.0000000 F 1 1.00 1.4000000 1.0000000 ****
5.1.2.4. Simple template file for ORCA¶
The following template file uses the built-in cc_pVQZ
basis set of ORCA:
!HF TightSCF
%basis
Basis cc_pVQZ
end
*xyz ${charge} ${mult}
${element} 0.0 0.0 0.0
*
5.1.2.5. Template file for CP2K¶
You must use CP2K
version 2.4-r12857 (or newer). The computation of pro-atoms
with CP2K
is more involved because you have to specify the occupation
of each subshell. The ATOM
program of CP2K
does not simply follow the
Aufbau rule to assign orbital occupations. At this moment, only the computation of
atomic densities with contracted basis sets and pseudopotentials are supported.
The following example can be used to generate a pro-atom database with the elements, O, Na, Al and Si, using the GTH pseudopotential and the MolOpt basis set.
template.inp
:&GLOBAL PROJECT ATOM PROGRAM_NAME ATOM &END GLOBAL &ATOM ATOMIC_NUMBER ${number} ELECTRON_CONFIGURATION (${mult}) CORE ${line:valence.inc} CORE none MAX_ANGULAR_MOMENTUM 1 &METHOD METHOD_TYPE UKS &XC &XC_FUNCTIONAL PBE &END XC_FUNCTIONAL &END XC &END METHOD &POTENTIAL PSEUDO_TYPE GTH POTENTIAL_FILE_NAME ../../PBE_PSEUDOPOTENTIALS POTENTIAL_NAME ${line:ppot.inc} &END POTENTIAL &PP_BASIS BASIS_SET_FILE_NAME ../../BASIS_MOLOPT BASIS_TYPE CONTRACTED_GTO BASIS_SET DZVP-MOLOPT-SR-GTH &END PP_BASIS &PRINT &BASIS_SET ON &END &ORBITALS ON &END &POTENTIAL ON &END &END &END ATOM
ppot.inc
:008_000_00 GTH-PBE-q6 011_000_00 GTH-PBE-q9 013_000_00 GTH-PBE-q3 014_000_00 GTH-PBE-q4
valence.inc
:008_005_02 1s2 2p1 008_006_03 1s2 2p2 008_007_04 1s2 2p3 008_008_03 1s2 2p4 008_009_02 1s2 2p5 008_010_01 1s2 2p6 011_005_02 1s2 2p1 011_006_03 1s2 2p2 011_007_04 1s2 2p3 011_008_03 1s2 2p4 011_009_02 1s2 2p5 011_010_01 1s2 2p6 011_011_02 1s2 2p6 2s1 011_012_01 1s2 2p6 2s2 011_013_02 1s2 2p6 2s2 3p1 013_011_02 1s1 013_012_01 1s2 013_013_02 1s2 2p1 013_014_03 1s2 2p2 014_011_02 1s1 014_012_01 1s2 014_013_02 1s2 2p1 014_014_03 1s2 2p2
5.1.2.6. Simple template file for PSI4¶
The following template file uses the BLYP functional and the built-in
cc-pvdz
basis set of PSI4:
molecule {
${charge} ${mult}
${element} 0.0 0.0 0.0
}
set {
basis cc-pvdz
scf_type df
guess sad
molden_write true
reference uhf
}
energy('blyp')
Note that the flags molden_write true
and reference uhf
are required.
The former instructs the SCF program to write the orbitals, which HORTON picks
up to compute atomic densities. The latter is needed because, usually, most of the
atomic computations are on open-shell systems. Because PSI4 only writes Molden
files for SCF computations, it is not possible to prepare atomic densities with
other levels of theory with PSI4.
5.1.3. horton-wpart.py
– AIM analysis based on a wavefunction file¶
The basic usage of horton-wpart.py
is as follows:
horton-wpart.py wfn output.h5[:group] {b,h,hi,is,he} [atoms.h5]
This script has four important arguments:
The
wfn
argument is the wave-function file, like a Gaussian03 or 09 formatted checkpoint file, a Molekel file or a Molden input file.The output file (HDF5 format) in which the partitioning results are written. Usually, the output file has the
.h5
extension. The output file is optionally followed by a colon and a group suffix. This suffix can be used to specify a group in the HDF5 output file where the output is to be stored.The third argument refers to the partitioning scheme:
b
: Becke partitioning. [becke1988_multicenter]h
: Hirshfeld partitioning. [hirshfeld1977]hi
: Iterative Hirshfeld partitioning. [bultinck2007]is
: Iterative Stockholder partitioning. [lillestolen2008]he
: Extended Hirshfeld partitioning. [verstraelen2013]
The
atoms.h5
argument is only needed for the Hirshfeld variants, not forb
andis
. The fileatoms.h5
contains a database of pro-atoms and can be generated as explained in horton-atomdb.py – Build a pro-atom database.
This script computes atomic weight functions, and derives all AIM observables that are implemented for that scheme. These results are stored in the specified HDF5 output file.
An elaborate description of all command-line arguments can be obtained from:
horton-wpart.py --help
The integration grid can be tuned with The --grid option.
Note
When a post-Hartree-Fock (post-HF) level is used in Gaussian 03/09 (MP2, MP3, CC or
CI), you must add the keyword Density=current
to the route section in the
Gaussian input file. This is needed to have the corresponding density matrix
in the formatted checkpoint file. When such a post-HF density matrix is
present, HORTON will load that density matrix instead of the SCF density
matrix. Also note that for some levels of theory, no 1RDM is constructed,
including MP4, MP5, ZINDO and QCISD(T).
5.1.4. horton-cpart.py
– AIM analysis based on a cube file¶
The basic usage of horton-cpart.py
is as follows:
horton-cpart.py cube output.h5:group {h,hi,he} atoms.h5
The script takes three arguments:
The
cube
argument can be a Gaussian cube file (possibly generated with another program, e.g. CP2K). Some information must be added to the cube file about the effective core charges that were used. This information is needed to compute the correct net charges. (This is not only relevant for the final output, but also for constructing the proper pro-atoms. When these numbers are not set correctly, you’ll get garbage output.)For example, given an original cube file with the following header:
-Quickstep- ELECTRON DENSITY 18 0.000000 0.000000 0.000000 54 0.183933 0.000000 0.000000 75 0.000000 0.187713 0.000000 81 0.000000 0.000000 0.190349 8 0.000000 0.000000 3.677294 0.000000 8 0.000000 4.966200 10.401166 0.000000 8 0.000000 0.000000 10.401166 0.000000 8 0.000000 4.966200 3.677294 0.000000 8 0.000000 2.483100 0.000000 2.243351 8 0.000000 7.449300 0.000000 13.174925 8 0.000000 2.483100 4.554391 4.206115 8 0.000000 2.483100 9.524087 4.206115 8 0.000000 7.449300 9.524087 11.212180 8 0.000000 7.449300 4.554391 11.212180 8 0.000000 4.966200 7.039230 7.709138 8 0.000000 0.000000 7.039230 7.709138 14 0.000000 2.483100 2.971972 1.606588 14 0.000000 2.483100 11.106506 1.606588 14 0.000000 7.449300 11.106506 13.811687 14 0.000000 7.449300 2.971972 13.811687 14 0.000000 2.483100 7.039230 5.959157 14 0.000000 7.449300 7.039230 9.459119
Now consider the case that 6 effective electrons were used for oxygen and 4 effective electrons for silicon. Then, the second column in the atom lines has to be set to the effective core charge. (This number is not used in the cube format.) In this case, you get:
-Quickstep- ELECTRON DENSITY 18 0.000000 0.000000 0.000000 54 0.183933 0.000000 0.000000 75 0.000000 0.187713 0.000000 81 0.000000 0.000000 0.190349 8 6.0 0.000000 3.677294 0.000000 8 6.0 4.966200 10.401166 0.000000 8 6.0 0.000000 10.401166 0.000000 8 6.0 4.966200 3.677294 0.000000 8 6.0 2.483100 0.000000 2.243351 8 6.0 7.449300 0.000000 13.174925 8 6.0 2.483100 4.554391 4.206115 8 6.0 2.483100 9.524087 4.206115 8 6.0 7.449300 9.524087 11.212180 8 6.0 7.449300 4.554391 11.212180 8 6.0 4.966200 7.039230 7.709138 8 6.0 0.000000 7.039230 7.709138 14 4.0 2.483100 2.971972 1.606588 14 4.0 2.483100 11.106506 1.606588 14 4.0 7.449300 11.106506 13.811687 14 4.0 7.449300 2.971972 13.811687 14 4.0 2.483100 7.039230 5.959157 14 4.0 7.449300 7.039230 9.459119
The output file (HDF5 format) in which the partitioning results are written. Usually, the output file has the extension
.h5
. The output file is optionally followed by a colon and a group suffix. This suffix can be used to specify a group in the HDF5 output file where the output is stored.The third argument refers to the partitioning scheme:
h
: Hirshfeld partitioning. [hirshfeld1977]hi
: Iterative Hirshfeld partitioning. [bultinck2007]he
: Extended Hirshfeld partitioning. [verstraelen2013]
The fourth argument is the pro-atom database generated with
horton-atomdb.py
script.
The horton-cpart.py
script computes atomic weight functions, and then derives
all AIM observables that are implemented for that scheme. These results are
stored in the specifief HDF5 output file.
The horton-cpart.py
script is somewhat experimental. Always make sure that the
numbers converge with an increase in the number of grid points. You may need the
following options to control the efficiency of the program:
--compact COMPACT
. Automatically determine cutoff radii for the pro-atoms, whereCOMPACT
is the maximum number of electrons lost in the tail after the cutoff radius. 0.001 is typically a reasonable value forCOMPACT
. The pro-atoms are renormalized after setting the cutoff radii. One cutoff radius is defined per element. This implies that the tail of the most diffuse anion determines the cutoff radius when the--compact
option is used.--greedy
. This enables a more memory-hungry version of the Iterative and Extended Hirshfeld algorithms that runs considerably faster. This becomes unfeasible for systems with huge unit cells.--stride STRIDE
. TheSTRIDE
parameter controls the sub-sampling of the cube file prior to the partitioning. It is1
by default.
To get a complete list of command-line options, run:
horton-cpart.py --help
5.1.5. Python interface to the partitioning code¶
The horton-wpart.py
and horton-cpart.py
scripts have a rather intuitive
Python interface that allows you to run a more customized analysis. The script
data/examples/wpart/becke.py
is a simple example that runs a Becke partitioning
to compute charges and that writes the charges into a simple text file.
#!/usr/bin/env python
import numpy as np
from horton import *
# Load the Gaussian output from file
fn_fchk = context.get_fn('test/water_sto3g_hf_g03.fchk')
# Replace the previous line with any other fchk file, e.g. fn_fchk = 'yourfile.fchk'.
mol = IOData.from_file(fn_fchk)
# Partition the density with the Becke scheme
grid = BeckeMolGrid(mol.coordinates, mol.numbers, mol.pseudo_numbers, mode='only')
moldens = mol.obasis.compute_grid_density_dm(mol.get_dm_full(), grid.points)
wpart = BeckeWPart(mol.coordinates, mol.numbers, mol.pseudo_numbers, grid, moldens, local=True)
wpart.do_charges()
# Write the result to a file
np.savetxt('charges.txt', wpart['charges'])
The unit tests in the source code contain many small examples that can be used
as a starting point for similar scripts. These unit tests can be found in
horton/part/test/test_wpart.py
and horton/part/test/test_cpart.py
.
5.1.6. Frequently asked questions¶
Which atoms-in-molecules (AIM) partitioning scheme should I use?
There is no single partitioning scheme that is most suitable for any application. Nevertheless, some people claim that only one AIM scheme should be used above all others, especially in the QTAIM and electron density communities. Try to stay away from such fanboyism, as it has little to do with good science.
In practice, the choice depends on the purposes you have in mind. You should try to select a scheme that shows some desirable behavior. Typically, you would like to have a compromise between some of the following features:
- Numerical stability
- Robustness with respect to conformational changes
- Uniqueness of the result
- Computational efficiency
- Mathematical elegance
- Simplicity
- Linearity in the density (matrix)
- Chemical transferability
- Applicable to a broad range of systems
- Applicable to periodic/isolated systems
- Applicable to a broad range of electronic structure methods/implementations
- Accuracy of electrostatic interactions with a limited multipole expansion of the atoms
- Good (empirical) correlations between some AIM property with some experimental observable
- …
It goes beyond the scope of this FAQ to describe how each partitioning scheme (implemented in HORTON or not) performs for these features. Some of these features are very difficult to assess and, therefore, are subject to intense debate in the literature.
Regarding our own work, the following papers are directly related to this question:
- [verstraelen2011a] “Assessment of Atomic Charge Models for Gas-Phase Computations on Polypeptides”
- [verstraelen2012a] “The conformational sensitivity of iterative stockholder partitioning schemes”
- [verstraelen2013] “Hirshfeld-E Partitioning: AIM Charges with an Improved Trade-off between Robustness and Accurate Electrostatics”
The following are also related to this question, but the list is far from complete:
- [bultinck2007] “Critical analysis and extension of the Hirshfeld atoms in molecules”
- [bultinck2007b] “Uniqueness and basis set dependence of iterative Hirshfeld charges”
If you have more suggestions, please drop me a note: Toon.Verstraelen@UGent.be.
What is the recommended level of theory for computing the pro-atom databses?
This question is relevant to the following methods implemented in HORTON: Hirshfeld, Hirshfeld-I and Hirshfeld-E, which require a pro-atoms database. They are referred to in this answer as Hirshfeld-like schemes.
In principle, you are free to choose any level of theory. In practice, several papers tend to be consistent in the level of theory (and basis set) that is used for the molecular and pro-atomic computations. See for example: [bultinck2007], [verstraelen2009], [verstraelen2011a], [verstraelen2011b], [vanduyfhuys2012], [verstraelen2012a], [verstraelen2012b], [verstraelen2013].
One motivation for this consistency is that Hirshfeld-like partitioning schemes yield atoms-in-molecules that are maximally similar to the pro-atoms. [nalewajski2000] (Technically speaking, the sum of the Kullback-Leibler divergence between the atomic and pro-atomic densities for all the atoms is minimized.) This principle is (or should be) one of the reasons that Hirshfeld-like charges are somewhat transferable between chemically similar atoms. You could hope that the Kullback-Leibler divergence is easier to minimize when the molecular and pro-atomic densities are computed as consistently as possible. Thus, using the same level of theory and basis set is advised.
Another motivation is that such consistency may (to some degree) result in, cancellation of errors, when comparing Hirshfeld-like charges computed at different levels of theory. For example, it is found that Hirshfeld-I charges have only a small basis set dependency based on [bultinck2007b]. In this work, the molecular and pro-atomic densities were computed consistently.
At the end, it can also be argued that without consistency in the level of theory, there are so many possible combinations that it becomes impossible to make a well-motivated choice. Then again, this is a problem that computational chemists usually embrace with open arms hoping that they will find a unique combination of levels of theory that turns out to be useful.
How can one generate Cube files with CP2K?
Electron density cube files can be generated with CP2K by adding the following section to the CP2K input file: E_DENSITY_CUBE.