.. DOCUMENTATION BUILT FROM RELEASE: 2.1.0b1-1-ge018244 (Feb 03, 2017)

..
    : HORTON: Helpful Open-source Research TOol for N-fermion systems.
    : Copyright (C) 2011-2016 The HORTON Development Team
    :
    : This file is part of HORTON.
    :
    : HORTON is free software; you can redistribute it and/or
    : modify it under the terms of the GNU General Public License
    : as published by the Free Software Foundation; either version 3
    : of the License, or (at your option) any later version.
    :
    : HORTON is distributed in the hope that it will be useful,
    : but WITHOUT ANY WARRANTY; without even the implied warranty of
    : MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    : GNU General Public License for more details.
    :
    : You should have received a copy of the GNU General Public License
    : along with this program; if not, see <http://www.gnu.org/licenses/>
    :
    : --

Accelerating HORTON code with Cython
####################################

HORTON was designed to prioritize ease of programming over performance. This is
a reasonable trade-off because the vast majority of the runtime of a quantum
chemistry code is spent in a relatively small portion of the code. In HORTON,
we rewrite these critical portions of code in C++ and link them into Python
using the Cython framework. We identify these critical portions using profiling
tools.

Before you begin
================

There are several downsides to accelerating code with Cython. Please make sure
they are acceptable to you before starting to code.

- Developer tools will break. PDB cannot read C++ debug symbols, cProfile will
  break, IDEs will not highlight syntax correctly, and Valgrind will report
  false-positive memory leaks.
- Cython is still a fairly young project. The API may not be stable, so don't
  be surprised if your code breaks after a few versions.
- There is a steep learning curve. Unless you are already familiar with C/C++
  and Python profiling tools, you may not obtain the speed-up you expected.
  Memory allocation and management of arrays is particularly tricky.
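To locate such critical portions before rewriting anything, Python's
standard-library ``cProfile`` and ``pstats`` modules are enough. The
``hot_loop`` function below is a hypothetical stand-in for a slow routine, not
part of the actual HORTON code base:

```python
import cProfile
import io
import pstats

def hot_loop(n):
    # Hypothetical hotspot: a pure-Python double loop that profiling should flag.
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

profiler = cProfile.Profile()
profiler.enable()
hot_loop(200)
profiler.disable()

# Report the most expensive calls by cumulative time; a function that
# dominates this list is a candidate for rewriting in C++.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Functions that dominate the cumulative-time column are the ones worth moving to
C++; everything else is usually not worth the downsides listed above.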
Basic example
=============

Background
----------

We will take a simplified example from the slicing code of the matrix class.
The original Cholesky-decomposed 2-electron integrals had an operation to slice
along three indices, which consumed a significant portion of the runtime. It
was implemented using the ``numpy.einsum`` method. The original code is here:

.. code-block:: python

    def get_slice_slow(self):
        return numpy.einsum("ack,bck->abc", B, B_prime)

This code takes a slice of ``B`` where the indices ``c`` are kept the same and
then contracts over the last index:

.. math:: A_{abc} = \sum_k B_{ack} B'_{bck}

A quick check using the Python cProfile module (``python -m cProfile -o
slice.pstats get_slice_slow.py; gprof2dot -f pstats slice.pstats | dot -Tpng -o
slice.png``) showed that the ``get_slice_slow`` method required almost 40% of
the total code runtime. Since this operation was simple to implement in C++, it
was a good candidate for Cythonizing.

The C++ code to implement the same operation is below (the index arithmetic
assumes row-major arrays with ``k`` as the fastest-varying index, matching the
einsum subscripts above):

.. code-block:: c++

    //get_slice_abcc.cpp

    void get_slice_abcc(double* B, double* B_prime, double* out,
                        long nbasis, long nvec){
        for (long a=0; a<nbasis; a++){
            for (long b=0; b<nbasis; b++){
                for (long c=0; c<nbasis; c++){
                    // out[a,b,c] = sum_k B[a,c,k] * B_prime[b,c,k]
                    double sum = 0.0;
                    for (long k=0; k<nvec; k++){
                        sum += B[(a*nbasis + c)*nvec + k]
                             * B_prime[(b*nbasis + c)*nvec + k];
                    }
                    out[(a*nbasis + b)*nbasis + c] = sum;
                }
            }
        }
    }

On the Cython side, the dimensions of the pre-allocated memory are filled in
and the raw buffer is wrapped into a Numpy array:

.. code-block:: python

    dims[0] = nvec
    dims[1] = nbasis
    dims[2] = nbasis
    result = numpy.PyArray_SimpleNewFromData(3, dims, np.NPY_DOUBLE, data)

The method ``PyArray_SimpleNewFromData`` creates a new Numpy array from memory
which has already been allocated. The numpy data type must be specified, as
well as the dimensionality. ``data`` is simply a 1D ``double*`` array of size
``nvec * nbasis * nbasis``.

Further reading
===============

- http://docs.cython.org/
- http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html
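When replacing an ``einsum`` call such as the one above with a hand-written
kernel, it is worth verifying that both give the same result on small random
arrays before worrying about speed. A minimal pure-NumPy sketch of that check
(the array sizes here are made up for illustration):

```python
import numpy as np

nbasis, nvec = 4, 6  # made-up sizes, small enough to check by brute force

# Random stand-ins for the Cholesky vectors B_{ack} and B'_{bck}.
rng = np.random.default_rng(1)
B = rng.standard_normal((nbasis, nbasis, nvec))
B_prime = rng.standard_normal((nbasis, nbasis, nvec))

# The original einsum form of the slice.
A_einsum = np.einsum("ack,bck->abc", B, B_prime)

# The same contraction written as explicit loops, mirroring the C++ kernel.
A_loops = np.zeros((nbasis, nbasis, nbasis))
for a in range(nbasis):
    for b in range(nbasis):
        for c in range(nbasis):
            for k in range(nvec):
                A_loops[a, b, c] += B[a, c, k] * B_prime[b, c, k]

assert np.allclose(A_einsum, A_loops)
```

The explicit-loop version doubles as a reference implementation: once it agrees
with the einsum, the C++ kernel only has to reproduce the loop nest.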