solve_banded is very slow as a tridiagonal solver
Marcel Oliver <m.oliver <at> jacobs-university.de>
2015-08-14 16:45:40 GMT
I noticed that using solve_banded to solve a tridiagonal linear system
is at least an order of magnitude slower than it should be based on a
straightforward operation count of the Thomas algorithm.
In the python source code, solve_banded is pretty much a
straightforward call into gbsv from lapack, but I did not dare to
follow up on what is going on in there. Maybe it's doing pivoting, or
there is some inefficiency due to being coded for more general
In any event, in a lot of textbook applications of tridiagonal
solvers, pivoting is not necessary (see, e.g., Higham, Accuracy and
Stability of Numerical Algorithms, SIAM 1996, Section 9.5) and the
Thomas algorithm does not vectorize efficiently.
So it would be very good if either solve_banded could be fixed if it
does something stupid, or otherwise if a plain Thomas algorithm
tridiagonal solver could be added as this would cover a lot of
run-off-the-mill problems while solve_banded could handle the rest.
In any event, just to demonstrate that the performance gap is really
there, I coded up a Thomas algorithm in cython and here is the output
for a tridiagonal system of size 10^6:
CPU time Thomas algorithm in python: 3.47022
CPU time solve_banded: 0.428928
CPU time Thomas algorithm in cython: 0.032048
Speedup python/cython: 108.281952072
Speedup solve_banded/cython: 13.3839241138
The code is attached, but it's quickly put together and I am not
claiming that it's well. In fact, it's the first time I try cython
which was easy enough, except that <at> cython.boundscheck(False) makes
the code run slower, not faster...
SciPy-Dev mailing list
SciPy-Dev <at> scipy.org