Discussion:
[Numpy-discussion] ENH: compute many inner products quickly
Scott Sievert
2016-05-29 03:53:33 UTC
Permalink
I recently ran into an application where I had to compute many inner products quickly (roughly 50k inner products in less than a second). I wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1, …, xn.T @ A @ xn]` with A.shape = (1k, 1k).

My first instinct was to look for a NumPy function to quickly compute this, such as np.inner. However, it looks like np.inner has some other behavior and I couldn’t get tensordot/einsum to work for me.
Then a labmate pointed out that I can just do some slick matrix multiplication. I opened [a PR] with this, and proposed that we define a new function called `inner_prods` for this.
The main challenge is to figure out how to transition the behavior of all these operations, while preserving backwards compatibility. Quite likely, we need to pick new names for these functions, though we should try to pick something that doesn't suggest that they are second class alternatives.
Do we choose new function names? Do we add a keyword arg that changes what np.inner returns?

[a PR]:https://github.com/numpy/numpy/pull/7690
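[For concreteness: the batched computation can be done with a single matrix product plus an elementwise multiply and a row-wise sum. A scaled-down sketch (dimensions shrunk from the 50k x 1k problem so it runs quickly; names are illustrative, not taken from the PR):]

```python
import numpy as np

# Scaled-down stand-in for the real problem (n = 50k vectors of length d = 1k).
n, d = 200, 50
X = np.random.rand(n, d)   # row i is the vector x_i
A = np.random.rand(d, d)

# Naive version: n separate quadratic forms x_i.T @ A @ x_i.
naive = np.array([x @ A @ x for x in X])

# Batched version: one matmul, one elementwise product, one row-wise sum.
batched = np.sum((X @ A) * X, axis=1)

assert np.allclose(naive, batched)
```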
Stephan Hoyer
2016-06-06 00:41:52 UTC
Permalink
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.

There are three types of matrix/vector products for which ufuncs would be
nice:
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product

It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.

So I'll start by throwing out some potential new names:

For matrix-vector products:
matvecmul (if it's worth making a new operator)

For inner products:
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
Post by Scott Sievert
I recently ran into an application where I had to compute many inner
products quickly (roughly 50k inner products in less than a second). I
My first instinct was to look for a NumPy function to quickly compute
this, such as np.inner. However, it looks like np.inner has some other
behavior and I couldn’t get tensordot/einsum to work for me.
Then a labmate pointed out that I can just do some slick matrix
I opened [a PR] with this, and proposed that we define a new function
called `inner_prods` for this.
The main challenge is to figure out how to transition the behavior of
all these operations, while preserving backwards compatibility. Quite
likely, we need to pick new names for these functions, though we should try
to pick something that doesn't suggest that they are second class
alternatives.
Do we choose new function names? Do we add a keyword arg that changes what
np.inner returns?
[a PR]:https://github.com/numpy/numpy/pull/7690
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
j***@gmail.com
2016-06-06 00:50:44 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
How about names in the plural, as in the PR? I thought the `s` in
inner_prods would better signal the broadcasting behavior.

dot_products
...

"dots" ? (I guess not)

Josef
Charles R Harris
2016-06-06 01:20:20 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on
the other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking at
this. I suppose the mul could also go in the middle, or maybe change it to
x and put it in the middle: matxvec, vecxmat, vecxvec.

Chuck
Sebastian Berg
2016-06-06 07:35:21 UTC
Permalink
Post by Charles R Harris
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc"
linear algebra, and then deprecate (or at least discourage) using
the older versions with inferior broadcasting rules. Adding a new
keyword arg means we'll be stuck with an awkward API for a long
time to come.
There are three types of matrix/vector products for which ufuncs
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by
inserting dummy dimensions and then calling matmul, but that's a
pretty awkward API, especially for inner products. Unfortunately,
we already use the two most obvious one word names for vector inner
products (inner and dot). But on the other hand, one word names are
not very descriptive, and the short name "dot" probably mostly
exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
at this. I suppose the mul could also go in the middle, or maybe
change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
Weren't some of these part of the gufunc linalg functions, and we just
removed them because we were not sure about the API? Not sure anymore,
but it might be worth a look.

- Sebastian
Post by Charles R Harris
Chuck  
Marten van Kerkwijk
2016-06-06 20:42:19 UTC
Permalink
There I was thinking vector-vector inner product was in fact covered by
`np.inner`. Yikes, half inner, half outer.

As for names, I think `matvecmul` and `vecmul` do seem quite OK (probably
need `vecmatmul` as well, which does the same as `matmul` would for 1-D
first argument).

But as other suggestions, keeping the `dot` one could think of
`vec_dot_vec` and `mat_dot_vec`, etc. More obscure but shorter would be to
use the equivalent `einsum` notation: `i_i`, `ij_j`, `i_ij`, `ij_jk`.
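[Spelled out with the actual einsum calls, those signatures correspond to:]

```python
import numpy as np

u, v = np.random.rand(4), np.random.rand(4)
A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

assert np.allclose(np.einsum('i,i', u, v), np.dot(u, v))   # i_i: inner product
assert np.allclose(np.einsum('ij,j', A, v), A @ v)         # ij_j: matrix-vector
assert np.allclose(np.einsum('i,ij', v, B), v @ B)         # i_ij: vector-matrix
assert np.allclose(np.einsum('ij,jk', A, B), A @ B)        # ij_jk: matrix-matrix
```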

-- Marten
Jaime Fernández del Río
2016-06-06 22:32:44 UTC
Permalink
Post by Sebastian Berg
Post by Charles R Harris
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc"
linear algebra, and then deprecate (or at least discourage) using
the older versions with inferior broadcasting rules. Adding a new
keyword arg means we'll be stuck with an awkward API for a long
time to come.
There are three types of matrix/vector products for which ufuncs
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by
inserting dummy dimensions and then calling matmul, but that's a
pretty awkward API, especially for inner products. Unfortunately,
we already use the two most obvious one word names for vector inner
products (inner and dot). But on the other hand, one word names are
not very descriptive, and the short name "dot" probably mostly
exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
at this. I suppose the mul could also go in the middle, or maybe
change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
Were not some of this part of the gufunc linalg functions and we just
removed it because we were not sure about the API? Not sure anymore,
but might be worth to have a look.
We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's undocumented.
There is also a matrix_multiply in that same module that does the obvious
thing.
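[For readers trying this at home: inner1d contracts over the last axis and broadcasts over the rest, matching the einsum spelling below. Note that numpy.core.umath_tests was undocumented at the time and has since been removed from NumPy, so the import is guarded:]

```python
import numpy as np

a = np.random.rand(5, 4)
b = np.random.rand(5, 4)

# einsum equivalent of inner1d: contract the last axis, broadcast the rest.
expected = np.einsum('...i,...i->...', a, b)
assert expected.shape == (5,)

try:
    from numpy.core.umath_tests import inner1d  # undocumented; later removed
    assert np.allclose(inner1d(a, b), expected)
except ImportError:
    pass  # newer NumPy releases no longer ship this module
```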

And when gufuncs were introduced in linalg, there were a bunch of functions
doing all sorts of operations without intermediate storage, e.g. sum3(a, b,
c) -> a + b + c, that were removed before merging the PR. Wasn't involved
at the time, so not sure what the rationale was.

Since we are at it, should quadratic/bilinear forms get their own function
too? That is, after all, what the OP was asking for.

Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
Sebastian Berg
2016-06-06 22:41:15 UTC
Permalink
Post by Jaime Fernández del Río
Post by Sebastian Berg
Were not some of this part of the gufunc linalg functions and we just
removed it because we were not sure about the API? Not sure anymore,
but might be worth to have a look.
We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's
undocumented. There is also a matrix_multiply in that same module
that does the obvious thing.
And when gufuncs were introduced in linalg, there were a bunch of
functions doing all sorts of operations without intermediate storage,
e.g. sum3(a, b, c) -> a + b + c, that were removed before merging the
PR. Wasn't involved at the time, so not sure what the rationale was.
I think it was probably just that the API was not thought out much.
Adding sum3 to linalg does seem a bit funny ;). I would not mind it in
numpy as such, I guess, if it is quite a bit faster anyway, but maybe in
its own submodule for these kinds of performance optimizations.

- Sebastian
Post by Jaime Fernández del Río
Since we are at it, should quadratic/bilinear forms get their own
function too?  That is, after all, what the OP was asking for.
Jaime
Stephan Hoyer
2016-06-06 23:35:34 UTC
Permalink
On Mon, Jun 6, 2016 at 3:32 PM, Jaime Fernández del Río wrote:
Post by Jaime Fernández del Río
Since we are at it, should quadratic/bilinear forms get their own function
too? That is, after all, what the OP was asking for.
If we have matvecmul and vecmul, then how to implement bilinear forms
efficiently becomes pretty clear:
np.vecmul(b, np.matvecmul(A, b))

I'm not sure writing a dedicated function in numpy itself makes sense for
something this easy.

I suppose there would be some performance gains from not saving the
intermediate result, but I suspect this would be premature optimization in
most cases.
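[For reference, since vecmul and matvecmul are only proposed names in this thread: the same composition, and the fused alternative it would replace, can both be written today with einsum:]

```python
import numpy as np

A = np.random.rand(4, 4)
b = np.random.rand(4)

# Two-step version, materializing the intermediate matrix-vector product A @ b.
two_step = b @ (A @ b)

# Fused version: einsum contracts both indices without storing the intermediate.
fused = np.einsum('i,ij,j', b, A, b)

assert np.allclose(two_step, fused)
```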

Nathaniel Smith
2016-06-06 22:42:25 UTC
Permalink
Post by Stephan Hoyer
If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
There are three types of matrix/vector products for which ufuncs would be
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product
It's straightforward to implement either of the latter two options by inserting
dummy dimensions and then calling matmul, but that's a pretty awkward API,
especially for inner products. Unfortunately, we already use the two most
obvious one word names for vector inner products (inner and dot). But on the
other hand, one word names are not very descriptive, and the short name
"dot" probably mostly exists because of the lack of an infix operator.
matvecmul (if it's worth making a new operator)
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
Given how core to linear algebra these are, and that this is a family
of somewhat expert-oriented functions, I think it'd even be fine to
leave the "product" part implicit, like:

np.linalg.matrix_matrix
np.linalg.matrix_vector
np.linalg.vector_matrix
np.linalg.vector_vector
np.linalg.vector_matrix_vector (for bilinear forms)

(or we could shorten matrix -> mat, vector -> vec if we must.)

-n
--
Nathaniel J. Smith -- https://vorpus.org