ENH: add fast toeplitz matrix multiplication using FFT #11346

InfProbSciX · 2020-01-11T23:15:51Z

This commit adds a function, matmul_toeplitz, for efficient (loglinear time) matrix-matrix products involving a Toeplitz matrix. The Toeplitz matrix is embedded in a circulant matrix, which is diagonalisable using its discrete Fourier transform. It follows that matrix-vector products and hence matrix-matrix products can be computed using the FFT.

This is described in this book, this MIT page and in this math.stackexchange post.

Further, I split some code from the solve_toeplitz function into a cleaning and validation function _validate_args_for_toeplitz_ops because the cleaning needed for both functions is very similar. In the future, if someone implements other fast methods such as matmul_circulant, log_det_circulant, log_det_toeplitz and so on, they could rely on the same validation function.

There are three reference implementations (that I could find) available online which are written in python, these are:

.. [a] : Jacob R Gardner, Geoff Pleiss, David Bindel, Kilian
Q Weinberger, Andrew Gordon Wilson, "GPyTorch: Blackbox Matrix-Matrix
Gaussian Process Inference with GPU Acceleration" with contributions
from Max Balandat and Ruihan Wu.

Available online

.. [b] : R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python
package for audio room simulations and array processing algorithms,
Proc. IEEE ICASSP, Calgary, CA, 2018.

Available online

.. [c] : Maranò S, Edwards B, Ferrari G and Fäh D (2017), "Fitting
Earthquake Spectra: Colored Noise and Incomplete Data", Bulletin of
the Seismological Society of America., January, 2017. Vol. 107(1),
pp. 276-291.

Implementation available online

My implementation seems to be consistent with all of them (the code isn't a direct 'copy-paste' of any of them - the github implementations have MIT licenses and I've referenced these in the function), and I've tried to remain consistent with existing Toeplitz functions in SciPy.

The possible use-cases for this function include fast matrix-matrix multiplications, fast solving using preconditioned conjugate gradient methods (which can readily be built using the scipy.sparse.linalg.LinearOperator class and the scipy.sparse.linalg.cg function).

Tests show that the implementation is correct, and the tests corresponding to solve_toeplitz also all pass. The code passes PEP8 checks locally.

mdhaber · 2020-01-11T23:27:02Z

@InfProbSciX this sounds cool. Can you also post some speed test results that demonstrate the speedup?

InfProbSciX · 2020-01-11T23:46:47Z

Yes! Forgot about that :)

Some simple code to run these benchmarks:

import numpy as np
from scipy.linalg import toeplitz, matmul_toeplitz
from numpy.testing import assert_allclose

def fast_sim(n):
    c = np.random.normal(size=n)
    r = np.random.normal(size=n)
    b = np.random.normal(size=n)
    result = matmul_toeplitz((c, r), b)

def naive_sim(n):
    c = np.random.normal(size=n)
    r = np.random.normal(size=n)
    b = np.random.normal(size=n)
    expected = toeplitz(c, r) @ b

# %timeit fast_sim(10000)
# %timeit naive_sim(10000)

Results on my laptop:

Row Len	This_Commit	Naive
100	164 µs	64 µs
1000	768 µs	2650 µs
1000	9640 µs	617000 µs

scipy/linalg/basic.py

Co-Authored-By: Matt Haberland <mdhaber@mit.edu>

mdhaber · 2020-01-12T00:12:34Z

I verified the performance improvement and accuracy on my machine. Impressive!

scipy/linalg/basic.py

mdhaber · 2020-01-12T00:33:55Z

scipy/linalg/basic.py

+    r, c, b, dtype, b_shape = _validate_args_for_toeplitz_ops(
+        c_or_cr, b, check_finite, keep_b_shape=False)
+    n, m = b.shape
+


I didn't check the math here, but it looks pretty straightforward, and it does appear to work.

scipy/linalg/basic.py

mdhaber · 2020-01-12T00:44:10Z

scipy/linalg/tests/test_matmul_toeplitz.py

+from numpy.testing import assert_allclose
+np.random.seed(42)
+
+class ToeplitzTestCase:


Tests look appropriate.

scipy/linalg/basic.py

ilayn · 2020-01-12T11:01:32Z

@InfProbSciX Thanks a lot for such a thorough pull request. I wish we get more of such PRs.

I am a bit pessimistic about adding this to the public API since to people who need this, they are quite well-known (in particular the netlib links you gave are famous). The use case is for people who are aware that they have a toeplitz matrix and not aware that there are more performant ways to multiply these matrices.

That number of people to me doesn't appear to be too many to add another function to the public API. For those who are aware of these facts, it is basically a couple of lines. This applies to circulant and hankel matrices too.

Don't get me wrong; I think this is great to have at one's bag of tricks but the curse of SciPy is that once something goes in it takes years to take it out if we decide to have a different way of bundling these together. So I tend to be on the conservative side. But I'll wait for others to chime in too.

InfProbSciX · 2020-01-12T11:18:05Z

It's true that this is just a few lines, but scipy has functionality for solve_toeplitz only at the moment. When I was getting into this, I felt like it was so difficult to try and learn how these operations can be made more performant. Seeing solve_toeplitz was an eye opener, but there are other tricks like the matrix multiplication one, conjugate gradients and so on, that would be great to support a newcomer into the field. Scipy is also (I feel like) the first place I would look for these - I was also very disappointed to learn that BLAS and LINPACK don't come with so many interesting efficient linear algebra routines, so we must rely on third party algos. Matlab seems to have many of these algos built in - I would like to see scipy having some of these too. It's also quite frustrating to look for simple implementations of the Bariess and so on to find that, outside of those closely knit communities who know how it works, it's difficult to find some code that actually does what you were looking for, particularly python code.

Moreover, there are some implementations of this in the public domain (referenced), that perhaps wouldn't be necessary if scipy had the functionality to begin with. The special matrices in scipy are mentioned but I feel like not a lot can be done with them as the most important operations that make use of their special structure are missing.

I actually want to add the other functions as well, like the log_det_* functions at some point, when I have enough time to write them.

ilayn · 2020-01-12T11:34:33Z

That's certainly true and makes a good segway to our mantra "Try to avoid small tricks as public functions". Note that solve_toeplitz uses a LAPACK routine behind the scenes and in my opinion still doesn't fit our mantra but historically these additions came in a very heterogeneous way and now we don't dare to decommission them. That is also why BLAS/LAPACK maintainers don't add every function to the catalogue because they are thinking along the same manner.

There are many things I wish we had, for example, one of the longest discussions we had is to have a .H for a shortcut of .conj().T in NumPy arrays. The use case is very clear and it is possibly the most used shortcut but even then there are many things to consider and we still don't have it due to historical reasons.

I have a PR waiting for linalg.solve that auto-detects the matrix structure and uses the proper LAPACK routine that is what matlab's backslash operator is famous for. But it touches so many things that I am still reluctant to propose.

All these to say, even though the functionality is nice, we might not want to add because it pollutes the namespace and also makes it very difficult a few years later to maintain if we figure out a better way to do this. I actually expect this to be handled by the matmul operator of ndarray since it is responsible multiplying the matrices.

InfProbSciX · 2020-01-12T12:31:56Z

I definitely see what you're saying - and I completely agree that we don't want to pollute the public namespace. The main reason I wrote this the way I did was consistency - as solve_toeplitz is written in the same way (as a standalone function instead of a method in a class).

I don't see this as a small trick, as I also want to write the determinant functions which wouldn't be such a trivial implementation. I'm trying to do this so that the scipy toeplitz/special matrices section isn't so incomplete.

If the others are unhappy with the public function as well, here are some proposals to make this hidden from the user:

I could write this as a static method of a toeplitz class. The main problem with this is that we'd need to change the toeplitz function to a class inheriting from ndarray (because I think that's the return on that function). It would be a static method as it also takes in a tuple of row, column as the toeplitz function does. This wouldn't be clean and might cause problems along the way if we were ever to expand the special matrices part.
I could write something into ndarray. I'm extremely reluctant to do this for two main reasons. The major problem with this is that it would lose the efficiency as the ndarray would have to be in memory before the multiplication happens. I also don't really want to cause discrepancies between numpy and scipy, and further, the choice should be up to the user to prefer this method over classic matrix multiplication.
Make this a hidden function, so that only users who really want to use this function would access the function as import _xxx from scipy.linalg (I'm not even sure if this works, and I'm sure it would be bad practice).

I want the special matrices section to be more complete, and I completely understand that the implementation looks trivial, but I don't see the other functions that I'd like to contribute in the future being just tricks, as these are great routines to use. There's a good use case for it - toeplitz matrices are common in statistics, particularly stationary processes - perhaps more people would come to use the routines if they were more easily accessible. I see numpy as the more minimalistic linalg package; scipy's design states that "A good rule of thumb is that if it’s covered in a general textbook on numerical computing ... it’s probably implemented in SciPy." I would like to really complete the special matrices part of scipy to live up to that, because at the moment, there's lots of stuff missing.

ilayn · 2020-01-12T15:52:20Z

There are other obvious implementations that currently SciPy doesn't support for matrix multiplication say, diagonal matrices, triangular matrices etc. even when they have a corresponding BLAS routines for them say ?gbmv or ?trmv and their ...mm flavors. They are not exposed currently but as you argue above then we have to create a whole matmul class for each and everyone of them. That's what I mean by polluting namespace; otherwise they are all valuable routines.

I would actually put the actual lines into the documentation of linalg.toeplitz for the interested.

The main issue here for a possible generalization is that discovering a Toeplitz matrix takes more time than just straightforward matrix multiplication hence the benefit of the whole thing is reduced down to the user who is aware that they have a toeplitz matrix at hand and would likely to look at linalg.toeplitz.

DWesl · 2020-01-12T18:44:37Z

As another approach for Toeplitz matrix-vector products, there's an implementation for symmetric Toeplitz matrices in the HomogeneousIsotropicCorrelation class here, especially the from_array factory method with is_cyclic=False and the _matmat method (is_cyclic should probably be renamed to is_periodic for clarity; the terminology refers to whether the random field whose correlation matrix is represented by the operator has periodic boundary conditions or not). There's also a draft PR for calculating the determinant that could be adopted for log-determinant, though I never figured out how to get it to return the same answer as numpy.linalg.det() if the matrix is not also circulant.

It sounds like one solution to not wanting all the special matrix implementations in scipy proper would be to make a package/repository of LinearOperator subclasses for special matrices somewhere. PyLops has one batch, though it seems targeted at geophysical applications rather than general numerics. The separate package would be somewhat more work than the web page with example code that I thought was mentioned above but can't find now.

I hope at least some of this was useful to someone.

InfProbSciX · 2020-01-14T00:15:41Z

@ilayn The log determinant calculation seems to be a straightforward implementation that utilises the levinson algorithm. Would you be okay if I modified the existing hidden cython implementation by adding an additional flag like return_log_det = False, so that if the flag is set to true, the function returns a tuple of result, log_det? I don't think that there would be any significant overhead if the flag is set to false as the log_det seems to be a free/low-cost side effect if the levinson is run.

There wouldn't be any additional public functions and if everyone else agrees, I'll put the matrix multiplication into the toeplitz documentation.

mrava87 · 2020-01-14T07:20:29Z

Hi @DWesl, thanks for mentioning fa PyLops. I second you here, and this is the main reason I started developing this library some years ago: I didn’t feel that scipy would ever allow/be the best place to host many specialized ‘matrix’ implementations. It is true that geophysics is currently the main target (given that it is a very data hungry field) but there is a very extensive set of basic functionalities and we have been even discussing about the possibility of moving all geophysics specific components out to a new library and keep PyLops for just general purpose operators. If people are interested we could speed this up.

Regarding this PR, @InfProbSciX, I am glad to see that more and more people feel the need to optimizing matrix-vector multiplications for special matrices. My main question is: why not subclassing a LinearOperator already (as you also mention eventually users would do that...), this may seem a break in the current trend in scipy but it may pay off in the long run. For example if you want to add log_det* as you say, you will need another public method and raise the same concerns as in this thread. If instead you went for a Toeplitz class this could be just a method of the class, so you limit what it is effectively exposed in the API? Just a suggestion :)

InfProbSciX · 2020-01-14T09:36:45Z

@mrava87 I'd love to add a Toeplitz class to SciPy but this adds more publicly accessible stuff in SciPy, so the decision is up to the devs really. This is the cleanest way to add the functionality imo.

My suggestion for adding the log_det flag to the existing solve function was just a way to add the functionality without adding a public function.

mrava87 · 2020-01-14T12:13:41Z

I see your point. I was just thinking that in your case you still add a new public function (while in the case of making Toeplitz class you would add a class). Of course this is a bit different from current scipy structure so it’s obviously up to the devs.

Regarding the det computation, my personal opinion as user is that I prefer more well defined methods with ‘atomic’ scope than fewer with of additional parameters that lead to many slightly different behaviors, but again I’m not a core dev of scipy so it’s just a comment from user point of view.

ilayn · 2020-01-17T13:36:13Z

Sorry for the delay, I will come back to this as soon as possible. In the meantime, perhaps, can whoever is available from @scipy/scipy-core-team also chime in so we get a different point of view ?

rgommers · 2020-08-24T22:10:48Z

This seems to be in good shape and from the history it looks like we're leaning towards merging this in its current form. Does anyone have objections to that?

WarrenWeckesser · 2020-08-24T23:56:39Z

Don't merge yet, I'm working on a more detailed review now.

WarrenWeckesser

This is a nice addition to SciPy. I requested a bunch of changes inline, and have one more requested change, as follows.

The PR assumes that the Toeplitz matrix is square, but this is not necessary for the fast Toeplitz multiplication algorithm. The code can be made to work with an array with shape (nrows, ncols) with just a few small changes.

First, the validation function will have to be modified to allow an input that is not square. This could be an additional boolean flag added to the arguments.

Then, adjust the sizes in the main code as follows.

In the code, there is already n, m = b.shape(). Define

    # The shape of the Toeplitiz matrix is (nrows, ncols)
    nrows = len(c)
    ncols = len(r)
    p = nrows + ncols - 1  # equivalent to len(embedded_col)

In the complex case of the if statement, change the second two lines to:

        fft_b = fft(b, n=p, axis=0, workers=workers)

        mat_times_b = ifft(fft_mat*fft_b, axis=0, workers=workers)[:nrows, :]

and in the real case, change the two lines to

        fft_b = rfft(b, n=p, axis=0, workers=workers)

        mat_times_b = irfft(fft_mat*fft_b, axis=0, workers=workers,
                            n=p)[:nrows, :]

Finally, change the return statement to

    return mat_times_b.reshape(nrows, m)

I think that covers all the needed changes to the code, but double-check to be sure!

Tests with nonsquare inputs will have to be added, too.

scipy/linalg/basic.py

scipy/linalg/tests/test_matmul_toeplitz.py

scipy/linalg/basic.py

InfProbSciX · 2020-08-25T16:17:26Z

Thanks for the comments, I'll make these changes.

InfProbSciX · 2020-08-25T22:49:41Z

@WarrenWeckesser I've made the suggested changes - thank you for such a detailed comment on the case of non-square matrices :) (I've also added a few tests for non-square matrices.)

scipy/linalg/tests/test_matmul_toeplitz.py

InfProbSciX · 2020-08-28T22:51:20Z

All the CI checks now pass.

InfProbSciX · 2020-10-07T16:17:03Z

Hi @rgommers @WarrenWeckesser I'm just following up to see if there's any further changes that you'd suggest for this PR.

WarrenWeckesser

I requested some nitpicky changes, but otherwise, this looks good.

scipy/linalg/basic.py

Co-authored-by: Warren Weckesser <warren.weckesser@gmail.com>

InfProbSciX · 2020-10-17T09:15:05Z

@WarrenWeckesser All done!

WarrenWeckesser · 2020-10-17T18:40:10Z

I've given this the green check mark, but I'll wait a few days before merging, in case anyone else want to comment first.

When it comes time to merge, the commits should be squashed to a single commit.

InfProbSciX · 2020-10-27T17:04:31Z

@WarrenWeckesser Just wanted to check if you'd like me to squash these commits (or if GitHub's squash and merge is fine).

THANKS.txt

WarrenWeckesser · 2020-11-09T03:23:47Z

@InfProbSciX, squashed and merged, thanks! I undid the change to THANKS.txt--that file has been removed.

WarrenWeckesser · 2020-11-09T03:34:01Z

I added a "thanks" to https://github.com/scipy/scipy/wiki/*THANKS.txt-additions-modifications-for-1.6.0*, and a release note in https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.6.0. Feel free to edit those as you see fit.

ENH: add fast toeplitz multiplication

732da2e

mdhaber reviewed Jan 11, 2020

View reviewed changes