Discussion:
[Numpy-discussion] Get rid of special scalar arithmetic
Charles R Harris
2016-01-13 05:18:43 UTC
Hi All,

I've opened issue #7002 <https://github.com/numpy/numpy/issues/7002>,
reproduced below, for discussion.
Numpy umath has a file scalarmath.c.src that implements scalar arithmetic
using special functions that are about 10x faster than the equivalent
ufuncs.
In [1]: a = np.float64(1)
In [2]: timeit a*a
10000000 loops, best of 3: 69.5 ns per loop
In [3]: timeit np.multiply(a, a)
1000000 loops, best of 3: 722 ns per loop
I contend that in large programs this improvement in execution time is not
worth the complexity and maintenance overhead; it is unlikely that
scalar-scalar arithmetic is a significant part of their execution time.
Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic.
This would also bring the benefits of __numpy_ufunc__ to scalars with
minimal effort.
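Roughly, the idea is that the scalar special methods would just defer to
the corresponding ufunc. A pure-Python sketch (illustrative only; the
function name is made up, the real change would be in the C scalar types):

import numpy as np

def scalar_multiply(a, b):
    # Under the proposal there is no hand-written fast path: np.float64.__mul__
    # and friends simply call the ufunc, so anything hooking the ufunc
    # machinery (e.g. __numpy_ufunc__) sees scalar operations as well.
    return np.multiply(a, b)

print(scalar_multiply(np.float64(1), np.float64(2)))  # 2.0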
Thoughts?

Chuck
Nathaniel Smith
2016-01-13 08:52:26 UTC
On Tue, Jan 12, 2016 at 9:18 PM, Charles R Harris wrote:
Post by Charles R Harris
Hi All,
I've opened issue #7002, reproduced below, for discussion.
Numpy umath has a file scalarmath.c.src that implements scalar arithmetic using special functions that are about 10x faster than the equivalent ufuncs.
In [1]: a = np.float64(1)
In [2]: timeit a*a
10000000 loops, best of 3: 69.5 ns per loop
In [3]: timeit np.multiply(a, a)
1000000 loops, best of 3: 722 ns per loop
I contend that in large programs this improvement in execution time is not worth the complexity and maintenance overhead; it is unlikely that scalar-scalar arithmetic is a significant part of their execution time. Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic. This would also bring the benefits of __numpy_ufunc__ to scalars with minimal effort.
Thoughts?
+1e6, scalars are a maintenance disaster in so many ways.

But can we actually pull it off? IIRC there were complaints about
scalars getting slower at some point (and not even by 10x), because
it's not actually that hard to end up with code that is heavy on scalar
arithmetic. (Indexing an array returns a numpy scalar rather than a
Python object, even though the two look similar, so any code that, say,
does a Python loop over the elements of an array may well be
bottlenecked by scalar arithmetic. Obviously it's better to avoid such
loops, but...)
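For example (illustrative only), every pass through a loop like this does
its arithmetic on numpy scalars:

import numpy as np

a = np.arange(1000, dtype=np.float64)
print(type(a[0]))          # <class 'numpy.float64'>, not a Python float

total = np.float64(0)
for x in a:                # each iteration does scalar * scalar and scalar + scalar
    total = total + x * x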

It still seems to me that surely we can speed up ufuncs on
scalars / small arrays? Also, I am somewhat encouraged that, like you,
I get ~700 ns for multiply(scalar, scalar) versus ~70 ns for
scalar * scalar, but I also get ~380 ns for 0d-array * 0d-array. (I
guess that for multiply(scalar, scalar) we're first calling asarray on
both scalar objects, which is certainly avoidable.)
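(A timeit-module version of the comparison, for anyone who wants to
reproduce it; exact numbers will of course differ by machine:)

import timeit
import numpy as np

s = np.float64(1)
z = np.array(1.0)          # 0-d array

cases = [
    ("scalar * scalar   ", "s * s",             {"s": s}),
    ("0d * 0d           ", "z * z",             {"z": z}),
    ("multiply(s, s)    ", "np.multiply(s, s)", {"np": np, "s": s}),
]
for label, stmt, env in cases:
    t = timeit.timeit(stmt, globals=env, number=1000000)  # needs Python >= 3.5
    print(label, "%.0f ns per loop" % (t * 1e3))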

Here's a profile of zerod * zerod [0]: http://vorpus.org/~njs/tmp/zerod.svg
(Click on PyNumber_Multiply to zoom in on the relevant part)

And here's multiply(scalar, scalar) [1]: http://vorpus.org/~njs/tmp/scalar.svg

In principle it feels like tons of this stuff is fat that can be
trimmed -- even in the first, faster, profile, we're allocating a 0d
array and then converting it to a scalar, and the latter conversion in
PyArray_Return takes 12% of the time on its own; something like 14% of the time is
spent trying to figure out from scratch the complicated type
resolution and casting procedure needed to multiply two float64s, ...
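(For reference, the registered loop signatures that type resolution has to
choose between are visible from Python:)

import numpy as np

# Each entry is an input->output signature of a registered inner loop; for
# two float64s the matching entry is 'dd->d'.
print(np.multiply.types)                        # [..., 'dd->d', ...]
print(np.result_type(np.float64, np.float64))   # float64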

[0]
import numpy as np
a = np.array(1, dtype=float)
for i in range(...):
    a * a

[1]
import numpy as np
s = np.float64(1)
m = np.multiply
for i in range(...):
    m(s, s)

-n
--
Nathaniel J. Smith -- http://vorpus.org
Robert Kern
2016-01-13 09:12:08 UTC
Post by Charles R Harris
Hi All,
I've opened issue #7002, reproduced below, for discussion.
Numpy umath has a file scalarmath.c.src that implements scalar
arithmetic using special functions that are about 10x faster than the
equivalent ufuncs.
In [1]: a = np.float64(1)
In [2]: timeit a*a
10000000 loops, best of 3: 69.5 ns per loop
In [3]: timeit np.multiply(a, a)
1000000 loops, best of 3: 722 ns per loop
I contend that in large programs this improvement in execution time is
not worth the complexity and maintenance overhead; it is unlikely that
scalar-scalar arithmetic is a significant part of their execution time.
Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic.
This would also bring the benefits of __numpy_ufunc__ to scalars with
minimal effort.
Thoughts?
Not all important-to-optimize programs are large in our field; interactive
use is rampant. The scalar optimizations weren't added speculatively:
people noticed that their Numeric code ran much slower under numpy and were
reluctant to migrate. I was forever responding on comp.lang.python, "It's
because scalar arithmetic hasn't been optimized yet. We know how to do it,
we just need a volunteer to do the work. Contributions gratefully
accepted!" The most critical areas tended to be optimization where you are
often working with implicit scalars that pop out in the optimization loop.
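A toy example of the kind of loop I mean, with the scalars falling out of
array reductions (purely illustrative):

import numpy as np

x = np.linspace(0.0, 1.0, 1000)
y = 3.0 * x
theta = np.float64(0.0)
step = np.float64(0.1)
for _ in range(500):
    grad = np.mean(2.0 * (theta * x - y) * x)  # np.mean returns an np.float64 scalar
    theta = theta - step * grad                # pure scalar arithmetic every iteration
print(theta)                                   # ~3.0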

--
Robert Kern
Marten van Kerkwijk
2016-01-13 15:33:58 UTC
Just thought I would add here a general comment I made in the thread:
replacing scalars everywhere with 0-d arrays (ndim=0) would also be
great from the perspective of ndarray subclasses; as is, it is quite
annoying to have to special-case getting a single element of a subclass
and rewrapping the scalar in the subclass.
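A minimal sketch of the kind of special-casing I mean (the subclass here is
made up purely for illustration):

import numpy as np

class MySubclass(np.ndarray):
    def __getitem__(self, item):
        result = super(MySubclass, self).__getitem__(item)
        if isinstance(result, np.generic):
            # Getting a single element returned a plain NumPy scalar and
            # dropped the subclass, so rewrap it by hand as a 0-d instance.
            return np.asarray(result).view(type(self))
        return result

a = np.arange(3.0).view(MySubclass)
print(type(a[0]))   # MySubclass (0-d) instead of numpy.float64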
-- Marten
Sebastian Berg
2016-01-13 17:59:35 UTC
Post by Marten van Kerkwijk
replacing scalars everywhere with 0-d arrays (ndim=0) would also be
great from the perspective of ndarray subclasses; as is, it is quite
annoying to have to special-case getting a single element of a
subclass and rewrapping the scalar in the subclass.
-- Marten
I understand the sentiment, and right now I think we usually give the
subclass the chance to rewrap itself around 0-d arrays. But ideally I
think this is incorrect: either you want the scalar to be a scalar, or
the array actually holds information which is associated with the dtype
(e.g. units) and thus should survive conversion to a scalar.

Personally, I don't think we can really remove scalars, due to things
such as mutability, sequence ABC registration, and with that also
hashability.

My gut feeling is that there is actually an advantage in having a
scalar object, even if internally this scalar object could reuse a lot
of the array machinery. Note that, e.g., a read-only 0-d array would
raise an error on `a += 1`...
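To make those differences concrete (a small illustration; the read-only
flag is only a stand-in for how a scalar-like 0-d array would have to
behave):

import numpy as np

s = np.float64(1.0)            # scalar: immutable and hashable
z = np.array(1.0)              # 0-d array: mutable, not hashable
print(hash(s))                 # fine
try:
    hash(z)
except TypeError as exc:
    print(exc)                 # unhashable type: 'numpy.ndarray'

z.flags.writeable = False      # a read-only 0-d array...
try:
    z += 1                     # ...raises on in-place arithmetic
except ValueError as exc:
    print(exc)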

Now practicality beating purity and all that, but to me it is not
obvious that it would be the best thing to get rid of scalars completely
(getting rid of the code duplication is a different issue).

- Sebastian