Discussion:
[Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Sebastian Berg
2016-09-03 19:08:46 UTC
Permalink
Hi all,

not that I am planning to spend much time on this right now, however, I
did a small rebase of the stuff I had (did not push yet) on oindex and
remembered the old problem ;).

The one remaining issue I have with adding things like (except making
the code prettier and writing tests):

arr.oindex[...]  # outer/orthogonal indexing
arr.vindex[...]  # Picking of elements (much like current)
arr.lindex[...]  # current behaviour for backward compat

is what to do about subclasses. Now what I can do (and have currently
in my branch) is to tell someone on `subclass.oindex[...]`: This won't
work, the subclass implements `__getitem__` or `__setitem__` so I don't
know if the result would be correct (its a bit annoying if you also
warn about using those attributes, but...).

However, with or without such error, we need a nice way for subclasses
to define these attributes! This is important even within numpy at
least for masked arrays (possibly also matrix and memmap).

They (typically) do some stuff before or after the plain indexing
operation, so how do we make it convenient to allow them to do the same
stuff for the special indexing attributes without weird code
duplication? I can think of things, but nothing too great yet so maybe
you guys got an elegant idea.

- Sebastian
Sebastian Berg
2016-09-04 12:10:23 UTC
Permalink
On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> Hi all,
>
> not that I am planning to spend much time on this right now, however,
> I
> did a small rebase of the stuff I had (did not push yet) on oindex
> and
> remembered the old problem ;).
>
> The one remaining issue I have with adding things like (except making
> the code prettier and writing tests):
>
> arr.oindex[...]  # outer/orthogonal indexing
> arr.vindex[...]  # Picking of elements (much like current)
> arr.lindex[...]  # current behaviour for backward compat
>
> is what to do about subclasses. Now what I can do (and have currently
> in my branch) is to tell someone on `subclass.oindex[...]`: This
> won't
> work, the subclass implements `__getitem__` or `__setitem__` so I
> don't
> know if the result would be correct (its a bit annoying if you also
> warn about using those attributes, but...).
>

Hmm, I am considering to expose a new indexing helper object. So that
subclasses could implement something like `__numpy_getitem__` and
`__numpy_setitem__` and if they do (and preferably nothing else) they
would get back passed a small object with some information about the
indexing operation. So that the subclass would implement:

```
def __numpy_setitem__(self, indexer, values):
    indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
    indexer.scalar  # Will the result be a scalar?
    indexer.view  # Will the result be a view or a copy?
    # More information might be possible (note that not all checks are
# done at this point, just basic checks will have happened already).

    # Do some code, that prepares self or values, could also use
    # indexer for another array (e.g. mask) of the same shape.

    result = indexer(self, values)

    # Do some coded to fixup the result if necessary.
    # Should discuss whether result is first a plain ndarray or
    # already wrapped.
```

This could be implemented in the C-side without much hassle, I think.
Of course it adds some new API which we would have to support
indefinitely. But it seems something like this would also fix the
hassle of identifying e.g. if the result should be a scalar for a
subclass (which may even be impossible in some cases).

Would be very happy about feedback from subclassers!

- Sebastian


> However, with or without such error, we need a nice way for
> subclasses
> to define these attributes! This is important even within numpy at
> least for masked arrays (possibly also matrix and memmap).
>
> They (typically) do some stuff before or after the plain indexing
> operation, so how do we make it convenient to allow them to do the
> same
> stuff for the special indexing attributes without weird code
> duplication? I can think of things, but nothing too great yet so
> maybe
> you guys got an elegant idea.
>
> - Sebastian
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Sebastian Berg
2016-09-04 12:23:03 UTC
Permalink
On So, 2016-09-04 at 14:10 +0200, Sebastian Berg wrote:
> On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> >
> > Hi all,
> >
> > not that I am planning to spend much time on this right now,
> > however,
> > I
> > did a small rebase of the stuff I had (did not push yet) on oindex
> > and
> > remembered the old problem ;).
> >
> > The one remaining issue I have with adding things like (except
> > making
> > the code prettier and writing tests):
> >
> > arr.oindex[...]  # outer/orthogonal indexing
> > arr.vindex[...]  # Picking of elements (much like current)
> > arr.lindex[...]  # current behaviour for backward compat
> >
> > is what to do about subclasses. Now what I can do (and have
> > currently
> > in my branch) is to tell someone on `subclass.oindex[...]`: This
> > won't
> > work, the subclass implements `__getitem__` or `__setitem__` so I
> > don't
> > know if the result would be correct (its a bit annoying if you also
> > warn about using those attributes, but...).
> >
> Hmm, I am considering to expose a new indexing helper object. So that
> subclasses could implement something like `__numpy_getitem__` and
> `__numpy_setitem__` and if they do (and preferably nothing else) they
> would get back passed a small object with some information about the
> indexing operation. So that the subclass would implement:
>
> ```
> def __numpy_setitem__(self, indexer, values):
>     indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
>     indexer.scalar  # Will the result be a scalar?
>     indexer.view  # Will the result be a view or a copy?
>     # More information might be possible (note that not all checks
> are
>     # done at this point, just basic checks will have happened
> already).
>
>     # Do some code, that prepares self or values, could also use
>     # indexer for another array (e.g. mask) of the same shape.
>
>     result = indexer(self, values)
>
>     # Do some coded to fixup the result if necessary.
>     # Should discuss whether result is first a plain ndarray or
>     # already wrapped.
> ```

Hmm, field access is a bit annoying, but I guess can/has to be
included.

>
> This could be implemented in the C-side without much hassle, I think.
> Of course it adds some new API which we would have to support
> indefinitely. But it seems something like this would also fix the
> hassle of identifying e.g. if the result should be a scalar for a
> subclass (which may even be impossible in some cases).
>
> Would be very happy about feedback from subclassers!
>
> - Sebastian
>
>
> >
> > However, with or without such error, we need a nice way for
> > subclasses
> > to define these attributes! This is important even within numpy at
> > least for masked arrays (possibly also matrix and memmap).
> >
> > They (typically) do some stuff before or after the plain indexing
> > operation, so how do we make it convenient to allow them to do the
> > same
> > stuff for the special indexing attributes without weird code
> > duplication? I can think of things, but nothing too great yet so
> > maybe
> > you guys got an elegant idea.
> >
> > - Sebastian
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-***@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Marten van Kerkwijk
2016-09-04 15:20:53 UTC
Permalink
Hi Sebastian,

I haven't given this as much thought as it deserves, but thought I
would comment from the astropy perspective, where we both have direct
subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
classes that store their data internally as ndarray (subclass)
instances (`Time`, `SkyCoord`, ...).

One comment would be that if one were to introduce a special method,
one should perhaps think a bit more broadly, and capture more than the
indexing methods with it. I wonder about this because for the
array-holding classes mentioned above, we initially just had
`__getitem__`, which got the relevant items from the underlying
arrays, and then constructed a new instance with those. But recently
we realised that methods like `reshape`, `transpose`, etc., require
essentially the same steps, and so we constructed a new
`ShapedLikeNDArray` mixin, which provides all of those [1] as long as
one defines a single `_apply` method. (Indeed, it turns out that the
same technique works without any real change for some numpy functions
such as `np.broadcast_to`.)

That said, in the actual ndarray subclasses, we have not found a need
to overwrite any of the reshaping methods, since those methods are all
handled OK via `__array_finalize__`. We do overwrite `__getitem__`
(and `item`) as we need to take care of scalars. And we would
obviously have to overwrite `oindex`, etc., as well, for the same
reason, so in that respect a common method might be useful.

However, perhaps it is worth considering that the only reason we need
to overwrite them in the first place, unlike what is the case for all
the shape-changing methods, is that scalar output does not get put
through `__array_finalize__`. Might it be an idea to have the new
indexing methods return array scalars instead of normal ones so we can
get rid of this?

All the best,

Marten

[1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856
Sebastian Berg
2016-09-05 05:48:25 UTC
Permalink
On So, 2016-09-04 at 11:20 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
>
> I haven't given this as much thought as it deserves, but thought I
> would comment from the astropy perspective, where we both have direct
> subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and
> classes that store their data internally as ndarray (subclass)
> instances (`Time`, `SkyCoord`, ...).
>
> One comment would be that if one were to introduce a special method,
> one should perhaps think a bit more broadly, and capture more than
> the
> indexing methods with it. I wonder about this because for the
> array-holding classes mentioned above, we initially just had
> `__getitem__`, which got the relevant items from the underlying
> arrays, and then constructed a new instance with those. But recently
> we realised that methods like `reshape`, `transpose`, etc., require
> essentially the same steps, and so we constructed a new
> `ShapedLikeNDArray` mixin, which provides all of those [1] as long as
> one defines a single `_apply` method. (Indeed, it turns out that the
> same technique works without any real change for some numpy functions
> such as `np.broadcast_to`.)
>
> That said, in the actual ndarray subclasses, we have not found a need
> to overwrite any of the reshaping methods, since those methods are
> all
> handled OK via `__array_finalize__`. We do overwrite `__getitem__`
> (and `item`) as we need to take care of scalars. And we would
> obviously have to overwrite `oindex`, etc., as well, for the same
> reason, so in that respect a common method might be useful.
>
> However, perhaps it is worth considering that the only reason we need
> to overwrite them in the first place, unlike what is the case for all
> the shape-changing methods, is that scalar output does not get put
> through `__array_finalize__`. Might it be an idea to have the new
> indexing methods return array scalars instead of normal ones so we
> can
> get rid of this?

I did not realize the new numpys are special with the scalar handling?
The indexing (already before 1.9. I believe) always goes through
PyArray_ScalarReturn or so, which I thought was used by almost all
functions.

If you mean the attributes (oindex, etc.), they could behave a bit
different of course, though not sure to what it extend it actually
helps since that would also create disparity.
If we implement a new special method (__numpy_getitem__), they
definitely should behave slightly different in some places. One option
might be to not even do the wrapping, but leave it to the subclass.

However, if you have an array with arrays inside, knowing whether to
return a scalar correctly would have to rely on inspecting the index
object, which is why I suggested the indexer to give a few extra
informations (such as this one).

Of course, since the scalar return goes through a ScalarReturn
function, that function could maybe also be tought to indicate the
scalar to `__array_finalize__`/`__array_wrap__` (not sure what exactly
applies).

- Sebastian


> All the best,
>
> Marten
>
> [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc
> .py#L856
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Marten van Kerkwijk
2016-09-05 18:54:07 UTC
Permalink
Hi Sebastian,

Indeed, having the scalar pass through `__array_wrap__` would have
been useful (_finalize__ is too late, since one cannot change the
class any more, just set attributes). But that is water under the
bridge, since we're stuck with people not expecting that.

I think the slightly larger question, but one somewhat orthogonal to
your suggestion of a new dundermethod, is whether one cannot avoid
more such methods by the new indexing routines returning array scalars
instead of regular ones.

Obviously, though, this has larger scope, as it might be part of the
merging of the now partially separate code paths for scalar and array
arithmetic, etc.

All the best,

Marten
Sebastian Berg
2016-09-05 19:22:15 UTC
Permalink
On Mo, 2016-09-05 at 14:54 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
>
> Indeed, having the scalar pass through `__array_wrap__` would have
> been useful (_finalize__ is too late, since one cannot change the
> class any more, just set attributes).  But that is water under the
> bridge, since we're stuck with people not expecting that.
>
> I think the slightly larger question, but one somewhat orthogonal to
> your suggestion of a new dundermethod, is whether one cannot avoid
> more such methods by the new indexing routines returning array
> scalars
> instead of regular ones.
>
> Obviously, though, this has larger scope, as it might be part of the
> merging of the now partially separate code paths for scalar and array
> arithmetic, etc.

Thanks for the input. I am not quite sure about all of the things.
Calling array wrap for the scalar returns does not sound like a problem
(it would also effect other code paths). Calling it only for the new
methods creates a bit of branching, but is not a big deal.

Would it help you though? You could avoid implementing all the new
indexing methods for many/most subclasses, but how do you tell numpy
that you are supporting them? Right now I thought it would make sense
to give an error if you try `subclass.vindex[...]` but the subclass has
`__getitem__` implemented (and not overwritten vindex).

The dundermethod gives a way to tell numpy: you know what to do. For
the sake of masked arrays it is also convenient (you can use the
indexer also on the mask), but masked arrays are rather special. It
would be interesting if there are more complex subclasses out there,
which implement `__getitem__` or `__setitem__`. Maybe all we need is
some new trick for the scalars and most subclasses can just remove
their `__getitem__` methods....

- Sebastian


>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Marten van Kerkwijk
2016-09-05 22:31:36 UTC
Permalink
Actually, on those names: an alternative to your proposal would be to
introduce only one new method which can do all types of indexing,
depending on a keyword argument, i.e., something like
```
def getitem(self, item, mode='outer'):
...
```

-- Marten
Sebastian Berg
2016-09-06 06:53:44 UTC
Permalink
On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> Actually, on those names: an alternative to your proposal would be to
> introduce only one new method which can do all types of indexing,
> depending on a keyword argument, i.e., something like
> ```
> def getitem(self, item, mode='outer'):
>     ...
> ```

Yeah we can do that easily. The only disadvantage is losing the square
bracket notation.


>
> -- Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Sebastian Berg
2016-09-06 07:46:20 UTC
Permalink
On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> >
> > Actually, on those names: an alternative to your proposal would be
> > to
> > introduce only one new method which can do all types of indexing,
> > depending on a keyword argument, i.e., something like
> > ```
> > def getitem(self, item, mode='outer'):
> >     ...
> > ```
> Have I been overthinking this, eh? Just making it `__getitem__(self,
> index, mode=...)` and then from `vindex` calling the subclasses
> `__getitem__(self, index, mode="vector")` or so would already solve
> the
> issue almost fully? Only thing I am not quite sure about:
>
> 1. Is `__getitem__` in some way special to make this difficult (also
> considering some new ideas like allowing object[a=4]?

OK; I think the C-side slot cannot get the kwarg likely, but probably
you can find a solution for that....

> 2. Do we currently have serious deficiencies we want to fix, and
> could
> maybe not fix like that.
>
>
> >
> > -- Marten
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-***@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Robert Kern
2016-09-06 09:57:17 UTC
Permalink
On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg <***@sipsolutions.net>
wrote:
>
> On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > >
> > > Actually, on those names: an alternative to your proposal would be
> > > to
> > > introduce only one new method which can do all types of indexing,
> > > depending on a keyword argument, i.e., something like
> > > ```
> > > def getitem(self, item, mode='outer'):
> > > ...
> > > ```
> > Have I been overthinking this, eh? Just making it `__getitem__(self,
> > index, mode=...)` and then from `vindex` calling the subclasses
> > `__getitem__(self, index, mode="vector")` or so would already solve
> > the
> > issue almost fully? Only thing I am not quite sure about:
> >
> > 1. Is `__getitem__` in some way special to make this difficult (also
> > considering some new ideas like allowing object[a=4]?
>
> OK; I think the C-side slot cannot get the kwarg likely, but probably
> you can find a solution for that....

Well, the solution is to use a different name, I think.

--
Robert Kern
Sebastian Berg
2016-09-06 16:26:27 UTC
Permalink
On Di, 2016-09-06 at 10:57 +0100, Robert Kern wrote:
> On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg <***@sipsolution
> s.net> wrote:
> >
> > On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > > >
> > > > Actually, on those names: an alternative to your proposal would
> be
> > > > to
> > > > introduce only one new method which can do all types of
> indexing,
> > > > depending on a keyword argument, i.e., something like
> > > > ```
> > > > def getitem(self, item, mode='outer'):
> > > >     ...
> > > > ```
> > > Have I been overthinking this, eh? Just making it
> `__getitem__(self,
> > > index, mode=...)` and then from `vindex` calling the subclasses
> > > `__getitem__(self, index, mode="vector")` or so would already
> solve
> > > the
> > > issue almost fully? Only thing I am not quite sure about:
> > >
> > > 1. Is `__getitem__` in some way special to make this difficult
> (also
> > > considering some new ideas like allowing object[a=4]?
> >
> > OK; I think the C-side slot cannot get the kwarg likely, but
> probably
> > you can find a solution for that....
>
> Well, the solution is to use a different name, I think.
>

Yeah :). Which goes back to `__numpy_getitem__` or so, just with a
slightly different (simpler) API. Something more along:

1. If subclass has `__numpy_getitem__` call it with the method
   keyword. Or just add the argument to `__getitem__` which should
   likely work as well.
2. Implement `ndarray.__numpy_getitem__` which takes the method
   keyword and subclasses would call it instead of
   `ndarray.__getitem__` their base class call.

The option I first mentioned would be similar but allows to give a bit
of extra information to the subclass which may be useful. But if no one
actually needs that information (this information would be things
available after inspection of the indexing object) it just adds quite a
bit of code and thus a maintenance burden).

Such a new method could of course do things slightly different (such as
the scalar cases, I really have to understand that wrapping thing, I am
always worried about the array of array case as well. Or that annoying
setitem calls getitem. Or maybe not wrap the array at all.).

- Sebastian


> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Nathan Goldbaum
2016-09-05 23:19:01 UTC
Permalink
On Monday, September 5, 2016, Marten van Kerkwijk <***@gmail.com>
wrote:

> Hi Sebastian,
>
> It would seem to me that any subclass has to keep up to date with new
> features in ndarray, and while I think ndarray has a responsibility
> not to break backward compatibility, I do not think it has to protect
> against new features possibly not working as expected in subclasses.
> In particular, I think it is overly complicated (and an unnecessary
> maintenance burden) to error out if a subclass has `__getitem__`
> overwritten, but not `oindex`.
>
> For somewhat similar reasons, I'm not too keen on a new
> `__numpy_getitem__` method; I realise it might reduce complexity for
> some ndarray subclasses eventually, but it also is an additional
> maintenance burden. If you really think it is useful, I think it might
> be more helpful to define a new mixin class which provides a version
> of all indexing methods that just call `__numpy_getitem__` if that is
> provided by the class that uses the mixin. I would *not* put it in
> `ndarray` proper.


I disagree that multiple inheritance (i.e. with your proposed mixin and
ndarray) is something that numpy should enshrine in its API for subclasses.
As the maintainer of an ndarray subclass, I'd much rather prefer just to
implement a new duner method that older numpy versions will ignore.


>
> Indeed, the above might even be handier for subclasses, since they can
> choose, if they wish, to implement a similar mixin for older numpy
> versions, so that all the numpy version stuff can be moved to a single
> location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)
>
> Overall, my sense would be to keep your PR to just implementing the
> various new index methods (which are great -- I still don't really
> like the names, but sadly don't have better suggestions...).
>
> But it might be good if others pipe in here too, in particular those
> maintaining ndarray subclasses!
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org <javascript:;>
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Marten van Kerkwijk
2016-09-06 01:00:42 UTC
Permalink
Hi Nathan,

The question originally posed is whether `ndarray` should provide that
single method as a convenience already, even though it doesn't
actually use it itself. Do you think that is useful, i.e., a big
advantage over overwriting the new oindex, vindex, and another that I
forget?

My own feeling is that while it is good to provide some hooks for
subclasses (__array_prepare__, wrap, finalize, numpy_ufunc), this one
is too fine-grained and the benefits do not outweigh the cost,
especially since it could easily be done with a mixin (unlike those
other cases, which are not used to cover ndarray methods, but rather
numpy functions, i.e., they provide subclasses with hooks into those
functions, which no mixin could possibly do).

All the best,

Marten
Marten van Kerkwijk
2016-09-06 01:02:27 UTC
Permalink
p.s. Just to be clear: personally, I think we should have neither
`__numpy_getitem__` nor a mixin; we should just get the quite
wonderful new indexing methods!
Sebastian Berg
2016-09-06 06:52:39 UTC
Permalink
On Mo, 2016-09-05 at 21:02 -0400, Marten van Kerkwijk wrote:
> p.s. Just to be clear: personally, I think we should have neither
> `__numpy_getitem__` nor a mixin; we should just get the quite
> wonderful new indexing methods!

Hehe, yes but see MaskedArrays. They need logic to also index the mask,
so `__getitem__`, etc. actually do a lot of things. Without any complex
changes (implementing a unified method within that class or similar).
The new indexing attributes simply cannot work right. And I think at
least a warning might be in order (from the numpy side) for such a
subclass.

But maybe MaskedArrays are pretty much the most complex subclass
available in that regard....

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Sebastian Berg
2016-09-06 21:29:55 UTC
Permalink
On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another
> option would be work by analogy with np.ix_ and define pre-dispatch
> index preparation routines np.ox_ and np.vx_ (say), which would be
> used as in:
> ```
> array[np.ox_[:, 10]]   -- or -- array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer, or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.

Sure, it would be a solution, but not sure it is any better
implementation wise then just passing an extra argument. As for the
syntax for plain arrays, I am not convinced to be honest.

- Sebastian


> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
>
> -- Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Sebastian Berg
2016-09-06 21:23:34 UTC
Permalink
On Di, 2016-09-06 at 10:10 -0700, Stephan Hoyer wrote:
> On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk <m.h.vankerkwijk@
> gmail.com> wrote:
> > p.s. Just to be clear: personally, I think we should have neither
> > `__numpy_getitem__` nor a mixin; we should just get the quite
> > wonderful new indexing methods!
> +1
>
> I don't maintain ndarray subclasses (I prefer composition), but I
> don't think it's too difficult to require implementing vindex and
> oindex properties from scratch.
>

Well, in some sense why I brought it up is masked arrays. They have
quite a bit of code in `__getitem__` including doing an identical
indexing operation to the mask. I suppose you can always solve it with
such code:

def __special_getitem__(self, index, method=None):
    # do magic
    if method is None:
        # (not sure if this gets passed the base class
        res = super()[index]
    elif method == "outer":
        res = super().oindex[index]
    # ...
    # more magic.

def __getitem__(self, index):
    self.__special_getitem__(index)

@property
def oindex(self):
    # define class elsewhere I guess
    class _indexer(object):
        def __init__(self, arr):
            self.arr = arr
        def __getitem__(self, index)
            arr.__special_getitem__(index, method='oter')
    return _indexer(self)

Though I am not 100% sure without testing, a superclass method that
understands the `method` kwarg might work better. We can teach numpy to
pass in that `method` to getitem so that you don't have to implement
that `_indexer` class for the special attribute. I first started to do
that for MaskedArrays, and while it is not hard, it seemed a bit
tedious.

If we move this to a method with a new name, a slight advantage would
be that other oddities could be removed maybe.

By now it seems to me that nobody really needs the extra information
(i.e. preprocessing information of the indexing tuple)...? I thought it
could be useful to know things about the result, but I suppose you can
check most things (view vs. no view; field access; scalar access) also
afterwards?


> Side note: I would prefer the more verbose "legacy_index" to
> "lindex". We really want to discourage this one, and two new
> abbreviations are bad enough.

Sounds good to me.

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Sebastian Berg
2016-09-11 13:07:09 UTC
Permalink
On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another
> option would be work by analogy with np.ix_ and define pre-dispatch
> index preparation routines np.ox_ and np.vx_ (say), which would be
> used as in:
> ```
> array[np.ox_[:, 10]]   -- or -- array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer, or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.
>

A specific subclass of tuple.... Part of me thinks this is horrifying,
but it actually would solve some of the subclassing issues if
`arr.vindex[...]` could end up calling `__getitem__` with a bit special
indexing tuple value.

I simply can't quite find the end of subclassing issues. We have tests
for things like masked array correctly calling the `_data` subclass,
but if the `_data` subclass does not implement the new method, numpy
would have to run in circles (or something)....

- Sebastian

> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
>
> -- Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Marten van Kerkwijk
2016-09-11 15:19:33 UTC
Permalink
There remains the option to just let subclasses deal with new ndarray
features... Certainly, for `Quantity`, I'll quite happily do that.
And if it alllows the ndarray code to remain simple and efficient, it
is probably the best solution. -- Marten
Sebastian Berg
2016-09-11 16:28:51 UTC
Permalink
On So, 2016-09-11 at 11:19 -0400, Marten van Kerkwijk wrote:
> There remains the option to just let subclasses deal with new ndarray
> features...  Certainly, for `Quantity`, I'll quite happily do that.
> And if it alllows the ndarray code to remain simple and efficient, it
> is probably the best solution.  -- Marten
>

Maybe, but I can't quite shake the feeling that we would see a lot of
annoying bugs for subclasses that don't adept very quickely.


> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Sebastian Berg
2016-09-06 06:49:48 UTC
Permalink
On Mo, 2016-09-05 at 18:19 -0500, Nathan Goldbaum wrote:
>
>
> On Monday, September 5, 2016, Marten van Kerkwijk <***@gm
> ail.com> wrote:
> > Hi Sebastian,
> >
> > It would seem to me that any subclass has to keep up to date with
> > new
> > features in ndarray, and while I think ndarray has a responsibility
> > not to break backward compatibility, I do not think it has to
> > protect
> > against new features possibly not working as expected in
> > subclasses.
> > In particular, I think it is overly complicated (and an unnecessary
> > maintenance burden) to error out if a subclass has `__getitem__`
> > overwritten, but not `oindex`.
> >
> > For somewhat similar reasons, I'm not too keen on a new
> > `__numpy_getitem__` method; I realise it might reduce complexity
> > for
> > some ndarray subclasses eventually, but it also is an additional
> > maintenance burden. If you really think it is useful, I think it
> > might
> > be more helpful to define a new mixin class which provides a
> > version
> > of all indexing methods that just call `__numpy_getitem__` if that
> > is
> > provided by the class that uses the mixin. I would *not* put it in
> > `ndarray` proper.
> I disagree that multiple inheritance (i.e. with your proposed mixin
> and ndarray) is something that numpy should enshrine in its API for
> subclasses. As the maintainer of an ndarray subclass, I'd much rather
> prefer just to implement a new duner method that older numpy versions
> will ignore.
>  

Hmm, OK, so that would be a + for the method solution even without the
need of any of the extra capabilities that may be possible.

> >
> > Indeed, the above might even be handier for subclasses, since they
> > can
> > choose, if they wish, to implement a similar mixin for older numpy
> > versions, so that all the numpy version stuff can be moved to a
> > single
> > location. (E.g., I can imagine doing the same for
> > `__numpy_ufunc__`.)
> >
> > Overall, my sense would be to keep your PR to just implementing the
> > various new index methods (which are great -- I still don't really
> > like the names, but sadly don't have better suggestions...).
> >
> > But it might be good if others pipe in here too, in particular
> > those
> > maintaining ndarray subclasses!
> >
> > All the best,
> >
> > Marten
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-***@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
Sebastian Berg
2016-09-06 06:48:15 UTC
Permalink
On Mo, 2016-09-05 at 18:24 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
>
> It would seem to me that any subclass has to keep up to date with new
> features in ndarray, and while I think ndarray has a responsibility
> not to break backward compatibility, I do not think it has to protect
> against new features possibly not working as expected in subclasses.
> In particular, I think it is overly complicated (and an unnecessary
> maintenance burden) to error out if a subclass has `__getitem__`
> overwritten, but not `oindex`.
>

It is not complicated implementation wise to check for `__getitem__`
existence. However, I start to agree that a warning icould be the
better option. It might work after all.

> For somewhat similar reasons, I'm not too keen on a new
> `__numpy_getitem__` method; I realise it might reduce complexity for
> some ndarray subclasses eventually, but it also is an additional
> maintenance burden. If you really think it is useful, I think it
> might
> be more helpful to define a new mixin class which provides a version
> of all indexing methods that just call `__numpy_getitem__` if that is
> provided by the class that uses the mixin. I would *not* put it in
> `ndarray` proper.
>

Yes, that is maybe a simplier option (in the sense of maintainability),
the other would have a bit of extra information available. If this
extra information is unnecessary, a MixIn is probably a bit simpler.

> Indeed, the above might even be handier for subclasses, since they
> can
> choose, if they wish, to implement a similar mixin for older numpy
> versions, so that all the numpy version stuff can be moved to a
> single
> location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)
>

You can always implement a mixin for older verions if you do all the
logic yourself, but I would prefer not to duplicate that logic (Jaime
wrote a python class that does it for normal arrays -- not sure if its
100% the same as I did, but you could probably use it in a subclass).

So that a numpy provided mixin would not help with that supporting it
in old numpy versions, I think.

> Overall, my sense would be to keep your PR to just implementing the
> various new index methods (which are great -- I still don't really
> like the names, but sadly don't have better suggestions...).
>

Well... The thing is that we have to fix the subclasses within numpy as
well (most importantly MaskedArrays). Of course you could delay things
a bit, but in the end whatever we use internally could likely also be
whatever subclasses might end up using.

> But it might be good if others pipe in here too, in particular those
> maintaining ndarray subclasses!
>

Yeah :).

> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-***@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Loading...