Discussion:
[Numpy-discussion] ENH: compute many inner products quickly
Scott Sievert
2016-05-29 03:53:33 UTC
Permalink
I recently ran into an application where I had to compute many inner products quickly (roughly 50k inner products in less than a second). I wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1, …, xn.T @ A @ xn]` with A.shape = (1k, 1k).

My first instinct was to look for a NumPy function to quickly compute this, such as np.inner. However, it looks like np.inner has some other behavior and I couldn’t get tensordot/einsum to work for me.
Then a labmate pointed out that I can just do some slick matrix multiplication. I opened [a PR] with this, and proposed that we define a new function called `inner_prods` for this.
The main challenge is to figure out how to transition the behavior of all these operations, while preserving backwards compatibility. Quite likely, we need to pick new names for these functions, though we should try to pick something that doesn't suggest that they are second class alternatives.
Do we choose new function names? Do we add a keyword arg that changes what np.inner returns?

[a PR]:https://github.com/numpy/numpy/pull/7690
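[For concreteness: the batched computation can be done with a single matrix product plus an elementwise multiply and a row-wise sum. A scaled-down sketch (dimensions shrunk from the 50k x 1k problem so it runs quickly; names are illustrative, not taken from the PR):]

```python
import numpy as np

# Scaled-down stand-in for the real problem (n = 50k vectors of length d = 1k).
n, d = 200, 50
X = np.random.rand(n, d)   # row i is the vector x_i
A = np.random.rand(d, d)

# Naive version: n separate quadratic forms x_i.T @ A @ x_i.
naive = np.array([x @ A @ x for x in X])

# Batched version: one matmul, one elementwise product, one row-wise sum.
batched = np.sum((X @ A) * X, axis=1)

assert np.allclose(naive, batched)
```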
Stephan Hoyer
2016-06-06 00:41:52 UTC
Permalink
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.

There are three types of matrix/vector products for which ufuncs would be
nice:
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product

It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.

So I'll start by throwing out some potential new names:

For matrix-vector products:
matvecmul (if it's worth making a new operator)

For inner products:
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
Post by Scott Sievert
I recently ran into an application where I had to compute many inner
products quickly (roughly 50k inner products in less than a second). I
My first instinct was to look for a NumPy function to quickly compute
this, such as np.inner. However, it looks like np.inner has some other
behavior and I couldn’t get tensordot/einsum to work for me.
Then a labmate pointed out that I can just do some slick matrix
I opened [a PR] with this, and proposed that we define a new function
called `inner_prods` for this.
The main challenge is to figure out how to transition the behavior of
all these operations, while preserving backwards compatibility. Quite
likely, we need to pick new names for these functions, though we should try
to pick something that doesn't suggest that they are second class
alternatives.
Do we choose new function names? Do we add a keyword arg that changes what
np.inner returns?
[a PR]:https://github.com/numpy/numpy/pull/7690
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
j***@gmail.com
2016-06-06 00:50:44 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
How about names in the plural, as in the PR? I thought the `s` in
inner_prods would better signal the broadcasting behavior.

dot_products
...

"dots" ? (I guess not)

Josef
Charles R Harris
2016-06-06 01:20:20 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking at
this. I suppose the mul could also go in the middle, or maybe change it to
x and put it in the middle: matxvec, vecxmat, vecxvec.

Chuck
Sebastian Berg
2016-06-06 07:35:21 UTC
Permalink
Post by Charles R Harris
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc"
linear algebra, and then deprecate (or at least discourage) using
the older versions with inferior broadcasting rules. Adding a new
keyword arg means we'll be stuck with an awkward API for a long
time to come.
There are three types of matrix/vector products for which ufuncs
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by
inserting dummy dimensions and then calling matmul, but that's a
pretty awkward API, especially for inner products. Unfortunately,
we already use the two most obvious one word names for vector inner
products (inner and dot). But on the other hand, one word names are
not very descriptive, and the short name "dot" probably mostly
exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
at this. I suppose the mul could also go in the middle, or maybe
change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
Weren't some of these part of the gufunc linalg functions, and we just
removed them because we were not sure about the API? Not sure anymore,
but it might be worth a look.

- Sebastian
Post by Charles R Harris
Chuck  
Marten van Kerkwijk
2016-06-06 20:42:19 UTC
Permalink
There I was thinking vector-vector inner product was in fact covered by
`np.inner`. Yikes, half inner, half outer.

As for names, I think `matvecmul` and `vecmul` do seem quite OK (probably
need `vecmatmul` as well, which does the same as `matmul` would for 1-D
first argument).

But as other suggestions, keeping the `dot` one could think of
`vec_dot_vec` and `mat_dot_vec`, etc. More obscure but shorter would be to
use the equivalent `einsum` notation: `i_i`, `ij_j`, `i_ij`, `ij_jk`.
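[Spelled out with the actual einsum calls, those signatures correspond to:]

```python
import numpy as np

u, v = np.random.rand(4), np.random.rand(4)
A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

assert np.allclose(np.einsum('i,i', u, v), np.dot(u, v))   # i_i: inner product
assert np.allclose(np.einsum('ij,j', A, v), A @ v)         # ij_j: matrix-vector
assert np.allclose(np.einsum('i,ij', v, B), v @ B)         # i_ij: vector-matrix
assert np.allclose(np.einsum('ij,jk', A, B), A @ B)        # ij_jk: matrix-matrix
```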

-- Marten
Jaime Fernández del Río
2016-06-06 22:32:44 UTC
Permalink
Post by Sebastian Berg
Post by Charles R Harris
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc"
linear algebra, and then deprecate (or at least discourage) using
the older versions with inferior broadcasting rules. Adding a new
keyword arg means we'll be stuck with an awkward API for a long
time to come.
There are three types of matrix/vector products for which ufuncs
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by
inserting dummy dimensions and then calling matmul, but that's a
pretty awkward API, especially for inner products. Unfortunately,
we already use the two most obvious one word names for vector inner
products (inner and dot). But on the other hand, one word names are
not very descriptive, and the short name "dot" probably mostly
exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
at this. I suppose the mul could also go in the middle, or maybe
change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
Were not some of this part of the gufunc linalg functions and we just
removed it because we were not sure about the API? Not sure anymore,
but might be worth to have a look.
We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's undocumented.
There is also a matrix_multiply in that same module that does the obvious
thing.
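[For readers trying this at home: inner1d contracts over the last axis and broadcasts over the rest, matching the einsum spelling below. Note that numpy.core.umath_tests was undocumented at the time and has since been removed from NumPy, so the import is guarded:]

```python
import numpy as np

a = np.random.rand(5, 4)
b = np.random.rand(5, 4)

# einsum equivalent of inner1d: contract the last axis, broadcast the rest.
expected = np.einsum('...i,...i->...', a, b)
assert expected.shape == (5,)

try:
    from numpy.core.umath_tests import inner1d  # undocumented; later removed
    assert np.allclose(inner1d(a, b), expected)
except ImportError:
    pass  # newer NumPy releases no longer ship this module
```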

And when gufuncs were introduced in linalg, there were a bunch of functions
doing all sorts of operations without intermediate storage, e.g. sum3(a, b,
c) -> a + b + c, that were removed before merging the PR. Wasn't involved
at the time, so not sure what the rationale was.

Since we are at it, should quadratic/bilinear forms get their own function
too? That is, after all, what the OP was asking for.

Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
Sebastian Berg
2016-06-06 22:41:15 UTC
Permalink
Post by Jaime Fernández del Río
Post by Sebastian Berg
Were not some of this part of the gufunc linalg functions and we just
removed it because we were not sure about the API? Not sure anymore,
but might be worth to have a look.
We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's
undocumented. There is also a matrix_multiply in that same module
that does the obvious thing.
And when gufuncs were introduced in linalg, there were a bunch of
functions doing all sorts of operations without intermediate storage,
e.g. sum3(a, b, c) -> a + b + c, that were removed before merging the
PR. Wasn't involved at the time, so not sure what the rationale was.
I think it was probably just that the API was not thought out much.
Adding sum3 to linalg does seem a bit funny ;). I would not mind it in
numpy as such, I guess, if it is quite a bit faster anyway, but maybe in
its own submodule for these kinds of performance optimizations.

- Sebastian
Post by Jaime Fernández del Río
Since we are at it, should quadratic/bilinear forms get their own
function too?  That is, after all, what the OP was asking for.
Jaime
Stephan Hoyer
2016-06-06 23:35:34 UTC
Permalink
On Mon, Jun 6, 2016 at 3:32 PM, Jaime Fernández del Río wrote:
Post by Jaime Fernández del Río
Since we are at it, should quadratic/bilinear forms get their own function
too? That is, after all, what the OP was asking for.
If we have matvecmul and vecmul, then how to implement bilinear forms
efficiently becomes pretty clear:
np.vecmul(b, np.matvecmul(A, b))

I'm not sure writing a dedicated function in numpy itself makes sense for
something this easy.

I suppose there would be some performance gains from not saving the
intermediate result, but I suspect this would be premature optimization in
most cases.
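[For reference, since vecmul and matvecmul are only proposed names in this thread: the same composition, and the fused alternative it would replace, can both be written today with einsum:]

```python
import numpy as np

A = np.random.rand(4, 4)
b = np.random.rand(4)

# Two-step version, materializing the intermediate matrix-vector product A @ b.
two_step = b @ (A @ b)

# Fused version: einsum contracts both indices without storing the intermediate.
fused = np.einsum('i,ij,j', b, A, b)

assert np.allclose(two_step, fused)
```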

Nathaniel Smith
2016-06-06 22:42:25 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on the
other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
Given how core to linear algebra these are, and that this is a family
of somewhat expert-oriented functions, I think it'd even be fine to
leave the "product" part implicit, like:

np.linalg.matrix_matrix
np.linalg.matrix_vector
np.linalg.vector_matrix
np.linalg.vector_vector
np.linalg.vector_matrix_vector (for bilinear forms)

(or we could shorten matrix -> mat, vector -> vec if we must.)

-n
--
Nathaniel J. Smith -- https://vorpus.org