..
DOCUMENTATION BUILT FROM RELEASE: 2.0.0-244-gaaf5e2b
..
: HORTON: Helpful Open-source Research TOol for N-fermion systems.
: Copyright (C) 2011-2016 The HORTON Development Team
:
: This file is part of HORTON.
:
: HORTON is free software; you can redistribute it and/or
: modify it under the terms of the GNU General Public License
: as published by the Free Software Foundation; either version 3
: of the License, or (at your option) any later version.
:
: HORTON is distributed in the hope that it will be useful,
: but WITHOUT ANY WARRANTY; without even the implied warranty of
: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
: GNU General Public License for more details.
:
: You should have received a copy of the GNU General Public License
: along with this program; if not, see
:
: --
Accelerating HORTON code with Cython
####################################
HORTON was designed to prioritize ease of programming over performance. This is a reasonable decision in light of the fact that the vast majority of time in a quantum chemistry code is only spent in a relatively small section of code. In HORTON, we rewrite these critical portion s of code in C++ and link them into Python using the Cython framework. We identify these critical portions using profiling tools.
Before you begin
================
There are several downsides to accelerating code with Cython. Please make sure they are acceptable to you before starting to code.
- Developer tools will break. PDB cannot read C++ debug symbols. cProfile will break. IDEs will not syntax highlight correctly. Valgrind will report false positive memory leaks.
- Cython is still a fairly new project. The API may not be stable. Don't be surprised if your code breaks after a few versions.
- A steep learning curve. Unless you are already familiar with C/C++ and Python profiling tools, you may not obtain the speed up you expected. Memory allocation and management of arrays is particularly tricky.
Basic example
=============
Background
----------
We will take a simplified example from the slicing code of the matrix class. The original Cholesky decomposed 2-electron integrals had an operation to slice along 3-indices which was consuming a significant portion of the time in the code. This was implemented using the ``numpy.einsum`` method.
The original code is here
.. code-block:: python
def get_slice_slow(self):
return numpy.einsum("ack, bck-> abc", B, B_prime)
This code takes a slice of B where the indices ``c`` are kept the same and then contracts across the last index.
.. math::
A_{abc} = \sum_k B_{ack} B'_{bck}
A quick check using the python cProfile module (``python -m cProfile -o slice.pstats get_slice_slow.py; gprof2dot -f pstats slice.pstats | dot -Tpng -o slice.png``) showed ``get_slice_slow`` method required almost 40% of the total code runtime. Since this operation was simple to implement in C++, it was a good candidate for Cythonizing.
The C++ code to implement the same operation is below:
.. code-block:: c++
//get_slice_abcc.cpp
void get_slice_abcc(double* B, double* B_prime, double* out, long nbasis, long nvec){
new long k; //example
for (k=0; k nvec
dims[1] = nbasis
dims[2] = nbasis
result = numpy.PyArray_SimpleNewFromData(3, dims, np.NPY_DOUBLE, data)
The method ``PyArray_SimpleNewFromData`` creates a new Numpy array from memory which has already been allocated. The numpy data types must be specified, as well as the dimensionality. Data is simply a 1D ``double*`` array of size nvec * nbasis * nbasis.
Further reading
===============
http://docs.cython.org/
http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html