5.2. Electrostatic potential fitting¶

5.2.1. Introduction¶

The general idea behind the electrostatic potential (ESP) fitting tools in HORTON is to allow:

ESP-fitting: Fitting atomic charges to reproduce the actual molecular ESP, and

ESP-testing: Assessing how well a given set of charges reproduce the actual molecular ESP

in isolated and 3D-periodic systems with compatible methodologies. These procedures are broken into small steps, such that you can carry out more specialized ESP fittings.

These two computations, fitting charges and testing the quality of given charges to reproduce the ESP, can be carried out once an ESP cost function is constructed. In HORTON, this ESP cost function takes the following form:

\[\text{COST}(\{q\}, \Delta V_\text{ref}) = \int_V d\mathbf{r} w(\mathbf{r}) \left(V_\text{ai}(\mathbf{r}) - \sum_A q_A V_\text{point}(\mathbf{r} - \mathbf{R}_A) - \Delta V_\text{ref} \right)^2\]

where the symbols have the following meaning:

\(\{q\}\): the set of the atomic charges
\(V\): the volume where the point-charge ESP (approximated molecular ESP) is compared to the reference ESP (actual molecular ESP). In isolated systems, this is some volume surrounding the molecule. In periodic systems, this is the volume of a single primitive cell.
\(w(\mathbf{r})\): the weight function with range [0, 1] that selects the relevant parts of the volume for the ESP fit. In HORTON, the weight function of Hu, Lu and Yang is used. [hu2007] This weight function is zero far away from and inside the atoms. It becomes one in the part of the electron density tail where non-bonding contacts are typically found. The weight function smoothly varies from 0 to 1. In the Hu-Lu-Yang paper, a pro-density is recommended for this purpose while we recommend the use of a proper all-electron density or a corrected pseudo-density.
\(V_\text{ai}(\mathbf{r})\): the actual molecular ESP from ab initio or DFT calculations obtained from programs, like Gaussian, CP2K, VASP, etc.
\(q_A\): the charge of atom \(A\).
\(V_\text{point}(\mathbf{r} - \mathbf{R}_A)\): the ESP due to a unit point charge. In an isolated molecule, this is simply \(1/|\mathbf{r} - \mathbf{R}_A|\) (in atomic units). For periodic systems, this part is quite a bit more involved.
\(\Delta V_\text{ref}\): constant that may account for differences in reference between the ab initio ESP and the point charge ESP. The need for such a term was established in the REPEAT paper for ESP fitting in 3D periodic systems. [campana2009]

This approach differs from traditional ESP fitting methods (RESP, MKS, CHELPG) in the sense that the cost function is defined as an integral rather than a sum over sample points. This idea comes from the Hu-Lu-Yang paper. [hu2007] The main difference with the REPEAT method [campana2009] is that the selected volume for the fit is defined by a smooth weight function. When this weight function is based on an all-electron density, there is no need to define atomic radii as is done usually in ESP fitting.

The cost function is obviously a quadratic function of the unknown parameters, \(X\), the charges (and \(\Delta V_\text{ref}\) in the case of a 3D periodic system), which we can always write as follows:

\[\text{COST}(X) = X^T A X - 2 B^T X - C\]

The script horton-esp-cost.py constructs the matrix \(A\), the vector \(B\) and the constant \(C\). These results are stored in an HDF5 file that is subsequently used by the horton-esp-fit.py script to obtain ESP-fitted charges, or by the horton-esp-test.py script to evaluate the cost function for a given set of charges.

In the output of these three scripts, the following quantities are reported (if applicable):

\(\text{RMSD} = \sqrt{\text{COST}/V_w}\)
\(\text{RRMSD} = \sqrt{\text{COST}/C}\) (expressed as a percentage)

where the cost is computed with a certain set of charges (and the optimal \(\Delta V_\text{ref}\) when applicable). The parameter \(V_w\) is the integral of the weight function. When these charges are the ESP-fitted charges, \(\text{COST}\), \(\text{RMSD}\) and \(\text{RRMSD}\) are minimal. When the charges are all set to zero, you obtain the worst-case value for \(\text{COST}=C\), \(\text{RMSD}=\sqrt{C/V_w}\) and \(\text{RRMSD}=100\%\). (You can still obtain a higher \(\text{COST}\), but only when using completely insensible charges. When that is the case, it is better to set the charges equal to zero to model the ESP.)

5.2.2. `horton-esp-cost.py` – Set up an ESP cost function¶

The horton-esp-cost.py script can be used to construct the cost function for the ESP fitting or charge testing.

For an all-electron calculation, it is recommended to use:

horton-esp-cost.py esp.cube cost.h5 --wdens=rho.cube --pbc={000|111}

where esp.cube is a Gaussian cube file containing the actual molecular ESP on a grid. The results will be written in cost.h5. The option --wdens=rho.cube implies that the weight function is constructed according to the Hu-Lu-Yang method [hu2007] using the given density cube file. The option --pbc can be used to construct a cost function for an isolated system (000) or a 3D periodic system (111).

When a pseudo-potential computation is used, the density cube file contains regions of low electron density close to the nucleus. These regions may not be excluded from the fit with the Hu-Lu-Yang weight function. Therefore, HORTON allows you to build the weight function as a product of several factors: \(w(\mathbf{r}) = w_1(\mathbf{r})w_2(\mathbf{r})w_3(\mathbf{r})w_4(\mathbf{r}) \ldots\), where the first one is typically the Hu-Lu-Yang weight function and additional weight functions can be included for every pseudo-core,

For a pseudo-potential calculation, it is recommended to use:

horton-esp-cost.py esp.cube cost.h5 --wdens=rho.cube --pbc={000|111} \
                   --wnear Z1:r1:gamma1 [Z2:r2:gamma2 ...]

where the new option, --wnear, takes at least one argument. This argument must be presented for every element in the system for which a pseudo potential is used. The parameter Z1 is the atomic number of atom 1. The parameter r1 is the radius of the core region that you would like to exclude for atom 1. The parameter gamma1 determines how quickly the weight factor for elements Z1 switches from 0 (inside a sphere with radius r1) to 1 (outside a sphere with radius r1). Both r1 and gamma1 must be given in angstrom.

The horton-esp-cost.py script has several more options. As discussed in the Running HORTON as horton-*.py scripts section, the commands below describe the arguments that each script take:

horton-esp-cost.py --help
horton-esp-fit.py  --help
horton-esp-test.py --help
horton-cubehead.py --help

5.2.3. `horton-esp-fit.py` – Fit atomic charges to the ESP¶

Once a cost function is constructed, it can be used to estimate the atomic charges by minimizing the cost function. A bare-bones fit can be carried out as follows:

horton-esp-fit.py cost.h5 charges.h5

where the file cost.h5 is the file generated by the horton-esp-cost.py script.

Useful ESP-fitted charges typically involve a more advanced minimizations of the ESP cost function, for example, by adding constraints, restraints, transforming to bond-charge increments, fitting to several different molecules concurrently, etc. Such advanced features are not supported in horton-esp-fit.py script, but you can implement these in customized scripts that use the the ESP cost functions obtained from the horton-esp-cost.py script.

5.2.4. `horton-esp-test.py` – Test the ESP quality of a given set of charges¶

The horton-esp-test.py script can be used to test the quality of a given set of charges to reproduce the ESP. These charges are typically obtained with horton-wpart.py or horton-cpart.py. This script can be used as follows:

horton-esp-test.py cost.h5 wpart.h5:hi/charges wpart_espcost.h5:hi

The first file, cost.h5, contains the cost function and is generated with the horton-esp-cost.py script. The second file, wpart.h5, contains the given atomic chanrges and, for example, is generated with horton-wpart.py gaussian.fchk wpart.h5:hi hi atoms.h5. The last file, wpart_espcost.h5, will contain the assessment results in the HDF5 group hi.

5.2.5. Making nice cube files with Gaussian¶

HORTON contains an auxiliary tool, horton-cubehead.py, to prepare an input header for a cube file aligned with the molecule of interest. This is more efficient than the default settings of cubegen, which makes a significant difference in disk space when working with molecular databases. For occasional use, horton-cubehead.py is probably an overkill. The script is used as follows:

horton-cubehead.py structure.xyz cubehead.txt

The cubehead.txt file will contain something along the following lines:

 16.5695742234   -2.4411573645  -11.3378429796
-61   -0.0000100512    0.0000288090    0.3779452256
 -0.2210334948    0.3065726468   -0.0000292468
 -0.3065726480   -0.2210334949    0.0000086952

This file can be used for the cubegen utility as follows:

cubegen 0 fdensity=scf somefile.fchk rho.cube -1 < cubehead.txt
cubegen 0 potential=scf somefile.fchk esp.cube -1 < cubehead.txt

where scf must be replaced by the type of wavefunction to be analyzed. Read the cubegen manual for more details.

5.2.6. Making cube files with CP2K¶

CP2K can generate cube files for periodic systems. You have to use the input sections E_DENSITY_CUBE and V_HARTREE_CUBE for density and ESP cube files, respectively. (The name V_HARTREE_CUBE is misleading. CP2K will print out the potential due to the electrons and the nuclei felt by a point charge with charge \(-e\).)

5.2.7. Python interface to the ESP fitting code¶

You can use the ESP cost function constructed by horton-esp-cost.py script to implement customized charge fitting protocols, e.g. using bond-charge increments, constraints or hyperbolic restraints. At the beginning of such a customized script, the cost function can be loaded as follows:

cost = load_h5("cost.h5")['cost']

The cost object is an instance of the ESPCost class. This instance can, for example, be used to evaluate the ESP cost or its gradient for a given array of atomic charges:

print cost.value(charges)
print cost.gradient(charges)

If desired, you can also directly access \(A\), \(B\), \(C\) that define the quadratic cost functions:

print cost._A
print cost._B
print cost._C

For more information, please refer to Introduction.