Discussion:
[Numpy-discussion] Behavior of .reduceat()
Jaime Fernández del Río
2016-03-27 08:36:21 UTC
Two of the oldest issues in the tracker (#834
<https://github.com/numpy/numpy/issues/834> and #835
<https://github.com/numpy/numpy/issues/835>) are about how .reduceat()
handles its indices parameter. I have been taking a look at the source
code, and it would be relatively easy to modify, the hardest part being to
figure out what the exact behavior should be.

Current behavior is that np.ufunc.reduceat(x, ind) returns
[np.ufunc.reduce(x[ind[i]:ind[i+1]]) for i in range(len(ind))]
with a couple of caveats:

1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
reduction over an empty slice.
2. an index of len(x) is appended to the indices argument, to be used
as the endpoint of the last slice to reduce over.
3. aside from this last case, the indices are required to be strictly
in bounds, 0 <= index < len(x), or an error is raised.
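For concreteness, here is the first example from the reduceat documentation
run under the current rules; all three caveats show up in the output:

```python
import numpy as np

x = np.arange(8)
ind = [0, 4, 1, 5, 2, 6, 3, 7]

# Where ind[i] < ind[i+1], the slice x[ind[i]:ind[i+1]] is reduced;
# where ind[i] >= ind[i+1] (caveat 1), x[ind[i]] is returned instead.
# The last slice runs to the implicitly appended endpoint len(x) (caveat 2).
result = np.add.reduceat(x, ind)
print(result)  # [ 6  4 10  5 14  6 18  7]

# Caveat 3: explicit indices must satisfy 0 <= index < len(x),
# so even len(x) itself is rejected:
try:
    np.add.reduceat(x, [0, len(x)])
except IndexError as exc:
    print(exc)
```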

The proposed new behavior, with some optional behaviors, would be:

1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e. the
ufunc identity, is returned. This includes raising an error if the ufunc
does not have an identity, e.g. np.minimum.
2. to fully support the "reduction over slices" idea, some form of out
of bounds indices should be allowed. This could mean either that:
1. only index = len(x) is allowed without raising an error, so that
the reduction to the end of the array can be computed anywhere, not just as
the last entry of the result, or
2. allow any index in -len(x) <= index <= len(x), with the usual
meaning given to negative values, or
3. any index is allowed, with out-of-bounds indices clipped to the
valid range (and the usual meaning for negative values).
3. Regarding the appending of that last index of len(x) to indices, we
could:
1. keep appending it, or
2. never append it, since you can now request it without an error
being raised, or
3. only append it if the last index is smaller than len(x).
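A pure-Python sketch of what this would mean (hypothetical code, not a NumPy
patch; it combines option 1 with options 2.2 and 3.1):

```python
import numpy as np

def reduceat_proposed(ufunc, a, indices):
    """Hypothetical sketch of the proposed semantics: empty slices reduce
    to the ufunc identity (option 1), any -len(a) <= index <= len(a) is
    allowed with the usual meaning for negative values (option 2.2), and
    len(a) is still appended as the final endpoint (option 3.1)."""
    n = len(a)
    ind = [i + n if i < 0 else i for i in indices]
    if any(i < 0 or i > n for i in ind):
        raise IndexError("index out of bounds")
    ind.append(n)  # option 3.1: keep appending the last endpoint
    out = []
    for start, stop in zip(ind[:-1], ind[1:]):
        if start >= stop:  # option 1: empty slice -> identity
            if ufunc.identity is None:
                raise ValueError("empty reduction for a ufunc without identity")
            out.append(ufunc.identity)
        else:
            out.append(ufunc.reduce(a[start:stop]))
    return np.array(out)

# Under the current rules np.add.reduceat returns x[4] == 4 for the
# (4, 1) pair; here the empty slice yields np.add's identity, 0:
print(reduceat_proposed(np.add, np.arange(8), [0, 4, 1, 5]))  # [ 6  0 10 18]
```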

My thoughts on the options:

- The minimal, more conservative approach would go with 2.1 and 3.1. And
of course 1: if we don't implement that, none of this makes sense.
- I kind of think 2.2 or even 2.3 are a nice enhancement that shouldn't
break too much stuff.
- 3.2 I'm not sure about; it probably hurts more than it helps at this
point, although in a brand new design you would probably either not append
the last index or also prepend a zero, as in np.split.
- And 3.3 seems too magical and probably not a good idea; I only listed
it for completeness.

Any other thoughts or votes on what, if anything, we should implement, and
what the deprecation of the current behavior should look like?

Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
Marten van Kerkwijk
2016-05-22 19:15:29 UTC
Hi Jaime,

Very belated reply, but only with the semester over do I seem to have
regained some time to think.

The behaviour of reduceat has always seemed a bit odd to me: logical for
dividing up an array into irregular but contiguous pieces, but illogical
for more random ones (where one effectively passes in pairs of points, only
to remove the unwanted calculations after the fact by slicing with [::2];
indeed, the very first example in the documentation does exactly this [1]).
I'm not sure any of your proposals helps all that much for the latter case,
while it risks breaking existing code in unexpected ways.

For me, for irregular pieces, it would be much nicer to simply pass in
pairs of points. I think this can be quite easily done in the current API,
by expanding it to recognize multidimensional index arrays (with last
dimension of 2; maybe 3 for step as well?). These numbers would just be the
equivalent of start, end (and step?) of `slice`, so I think one can allow
any integer with negative values having the usual meaning and clipping at 0
and length. So, specifically, the first example in the documentation would
change from:

np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]

to

np.add.reduceat(np.arange(8),[(0, 4), (1, 5), (2, 6), (3,7)])

(Or an equivalent ndarray. Note how horrid the example is: really, you'd
want (4, 8) as a pair too, but in the current API you'd get that by
appending a 4.)
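If I understand the proposal correctly, it could be prototyped as a thin
wrapper (the name reduceat_pairs is mine, purely illustrative):

```python
import numpy as np

def reduceat_pairs(ufunc, a, pairs):
    """Illustrative sketch of the pairs-of-points proposal: each row of
    `pairs` is a (start, stop) pair interpreted exactly like
    slice(start, stop), so negative values count from the end and
    out-of-range values clip."""
    out = []
    for start, stop in np.asarray(pairs):
        # slice() already provides the negative-index and clipping rules
        out.append(ufunc.reduce(a[slice(start, stop)]))
    return np.array(out)

# The documentation example without the [::2] workaround, plus the
# (4, 8) pair that is awkward to express in the flat-index API:
print(reduceat_pairs(np.add, np.arange(8),
                     [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]))
# [ 6 10 14 18 22]
```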

What do you think? Would this also be easy to implement?

All the best,

Marten

[1]
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html
Feng Yu
2016-05-23 02:00:30 UTC
Hi Marten,

As a user of reduceat I seriously like your new proposal!

I notice that in your current proposal, each element in the 'at' list
is interpreted as if it were the parameters to `slice`.

I wonder if it is meaningful to define reduceat on other `fancy` indexing types?

Cheers,

- Yu

_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion