Erik Brinkman
2016-07-29 18:42:23 UTC
Hi,
The behavior of a ufunc's reduceat on empty slices seems a little strange,
and I wonder if there's a reason behind it, and whether there's a route to
potentially changing it. First, I'll go into a little background.
I've been making a lot of use of ufuncs' reduceat functionality on staggered
arrays. In general, I'll have "n" arrays, the i-th having size "s[i]", and I'll
store them concatenated in one array "x", such that "s.sum() == x.size".
reduceat is great because I use
ufunc.reduceat(x, np.insert(s[:-1].cumsum(), 0, 0))
to get some summary information about each array. However, reduceat seems
to behave strangely for empty slices. To make things concrete, let's assume:
import numpy as np
s = np.array([3, 0, 2])
x = np.arange(s.sum())
inds = np.insert(s[:-1].cumsum(), 0, 0)
# [0, 3, 3]
np.add.reduceat(x, inds)
# [3, 3, 7] not [3, 0, 7]
# This is distinct from
np.fromiter(map(np.add.reduce, np.array_split(x, inds[1:])), x.dtype, s.size)
# [3, 0, 7] what I wanted
The current documentation
<http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html>
on reduceat first states:
For i in range(len(indices)), reduceat computes
ufunc.reduce(a[indices[i]:indices[i+1]])
That would suggest the outcome that I expected. However, the examples
section gives a number of examples which contradict that statement
and instead suggest that the actual algorithm is more akin to:
ufunc.reduce(a[indices[i]:indices[i+1]]) if indices[i+1] > indices[i] else
a[indices[i]]
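To spell out my reading, here's a rough pure-Python sketch of that observed
behavior (an emulation of the one-dimensional case, not the actual
implementation):

import numpy as np

def reduceat_observed(ufunc, a, indices):
    # Mimics what reduceat appears to do: an empty slice yields a[indices[i]]
    # rather than the ufunc identity.
    out = []
    for i, start in enumerate(indices):
        end = indices[i + 1] if i + 1 < len(indices) else len(a)
        if end > start:
            out.append(ufunc.reduce(a[start:end]))
        else:
            out.append(a[start])
    return np.array(out, dtype=a.dtype)

reduceat_observed(np.add, np.arange(5), [0, 3, 3])
# [3, 3, 7], matching np.add.reduceat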
Looking at the source
<https://github.com/numpy/numpy/blob/2af06c804931aae4b30bb3349bc60271b0b65381/numpy/core/src/umath/ufunc_object.c#L3697>,
it seems like it copies a[indices[i]] into the output, and then, while there
are more elements left in the slice, it keeps reducing into that value, which
results in this unexpected behavior for empty slices. It seems like the proper
thing to do would be to start with ufunc.identity and then reduce. This is
slightly less performant than what's implemented, but more "correct." There
could, of course, just be a switch that copies the identity only when the
slice is empty.
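In code, the semantics I expected look more like the following sketch, which
just leans on ufunc.reduce returning the identity for an empty slice (ufuncs
without an identity would raise instead):

import numpy as np

def reduceat_identity(ufunc, a, indices):
    # Proposed semantics: an empty slice reduces to ufunc.identity
    # (0 for add, 1 for multiply, ...).
    out = []
    for i, start in enumerate(indices):
        end = indices[i + 1] if i + 1 < len(indices) else len(a)
        out.append(ufunc.reduce(a[start:end]))
    return np.array(out, dtype=a.dtype)

reduceat_identity(np.add, np.arange(5), [0, 3, 3])
# [3, 0, 7], what I expected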
Is there a reason it's implemented like this? Is it just for performance,
or is this strange behavior *useful* somewhere? Simply "fixing" it seems
problematic because it would change somewhat-documented functionality in a
backwards-incompatible way. What would the best approach to "fixing" this be?
Add another function "reduceat_"? Add a flag to reduceat that does the proper
thing for empty slices?
Finally, is there a good way to work around this? I think for now I'm just
going to mask out the empty slices and use insert to add them back in, but
if I'm missing an obvious solution, I'll look at that too. I need to mask
them out because np.add.reduceat(x, 5) would ideally return 0, but instead
it throws an error since 5 is out of range...
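Concretely, the masking workaround I have in mind looks roughly like this
(sketched for np.add, so the fill value for an empty slice is 0, and assuming
at least one slice is non-empty):

import numpy as np

s = np.array([3, 0, 2])
x = np.arange(s.sum())
inds = np.insert(s[:-1].cumsum(), 0, 0)

nonempty = s > 0
# Reduce only over the non-empty slices...
partial = np.add.reduceat(x, inds[nonempty])
# ...then splice the identity (0 for add) back in at the empty positions.
empty_pos = np.flatnonzero(~nonempty)
result = np.insert(partial, empty_pos - np.arange(empty_pos.size), 0)
# result is [3, 0, 7]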
Thanks for indulging my curiosity,
Erik