Discussion:
[Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy
Pauli Virtanen
2016-09-07 18:16:55 UTC
Permalink
Wed, 07 Sep 2016 09:22:24 -0700, Nathaniel Smith kirjoitti:
[clip]
I wonder if there is any way we can avoid the flag, and just make this
happen automatically when appropriate? nditer has too many "unbreak-me"
flags already.
Are there any cases where we *don't* want the copy-if-overlap behavior?
Traditionally overlap has triggered undefined behavior, so there's no
backcompat issue, right?
I didn't put it on by default, because of backward compatibility and side
effects that break things.

On side effects: there are some bugs in ufunc code that need fixing if
the flag is turned on (wheremask code breaks, and ufuncs write to wrong
output arrays). Moreover, copying write operands with updateifcopy marks
the original arrays as read-only, until the copied array is decrefed.
There may also be other side effects that are not so obvious.

The PR is not mergeable if the flag would be on by default --- that
requires inspecting all the uses of the iterator in the numpy codebase
and making sure there's no weird stuff done. I'm not sure how much 3rd
party code is using the iterator, but I'm a bit worried also that copies
break assumptions also there.

It might be possible to turn it on by default for operands with COPY or
UPDATEIFCOPY flags --- but I'm not sure if that's helpful (now you'd need
to set the flags to all input operands).
--
Pauli Virtanen
Sebastian Berg
2016-09-07 16:36:24 UTC
Permalink
Hi all,
Pauli just opened a nice pull request [1] to add overlap detection
to
`NPY_ITER_COPY_IF_OVERLAP`
If passed to the iterator (also exposed in python), the iterator
will
copy the operands such that reading and writing should only occur
for
identical operands. For now this is implemented by always copying
the
output/writable operand (this could be improved though, so I would
not
say its fixed API).
I wonder if there is any way we can avoid the flag, and just make
this happen automatically when appropriate? nditer has too many
"unbreak-me" flags already.
Are there any cases where we *don't* want the copy-if-overlap
behavior? Traditionally overlap has triggered undefined behavior, so
there's no backcompat issue, right?
Puh, I remember weird abuses, that sometimes stopped working. Even just
adding it to ufuncs might destroy some weird cases in someones
script....

Whether or not we can just make it default, might be worth thinking
about it. What do downstream projects that use the API think? My guess
is that would be projects such as numexpr, numba, or I think theano?

Maybe another approach is to think about some other way to make good
defaults to the iterator easier/saner. Heck, I wonder if we would
default to things like "zero size ok" and warned about it, anyone would
notice unless as in: Oh I should make it zero size ok ;).

- Sebastian
-n
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Pauli Virtanen
2016-09-11 21:21:13 UTC
Permalink
Hi,

In the end some further API additions turn out to appear needed:

* NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
flags for NpyIter_New.

* New API function PyArray_MapIterArrayCopyIfOverlap,
as ufunc.at needs to check overlaps for index arrays
before constructing iterators, and the parsing is done
in multiarray.

Continuation here: https://github.com/numpy/numpy/pull/8043
Hi all,
Pauli just opened a nice pull request [1] to add overlap detection to
`NPY_ITER_COPY_IF_OVERLAP`
If passed to the iterator (also exposed in python), the iterator will
copy the operands such that reading and writing should only occur for
identical operands. For now this is implemented by always copying the
output/writable operand (this could be improved though, so I would not
say its fixed API).
Since adding this flag is new API, please feel free to suggest other
names/approaches or even decline the change ;).
This is basically a first step, which should be easily followed by
adding overlap detection to ufuncs, removing traps such as the well (or
not so well known) `a += a.T`. Other parts of numpy may follow one by
one.
The work is based on his older awesome new memory overlap detection
implementation.
If there are no comments, I will probably merge it very soon, so we can
look at the follow up things.
- Sebastian
[1] https://github.com/numpy/numpy/
pull/8026_______________________________________________
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Sebastian Berg
2016-09-12 09:31:07 UTC
Permalink
Post by Pauli Virtanen
Hi,
Very nice :).
Post by Pauli Virtanen
* NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
  flags for NpyIter_New.
* New API function PyArray_MapIterArrayCopyIfOverlap,
  as ufunc.at needs to check overlaps for index arrays 
  before constructing iterators, and the parsing is done 
  in multiarray.
I think here Nathaniels point might be right. It could be we can assume
that copying is always fine, there is probably only one or two
downstream projects using this function, plus it seems harder to create
abusing structures that actually do something useful.
It was only exposed for usage in `ufunc.at` if I remember right. I know
theano uses it though, but not sure about anyone else, maybe numba. On
the other hand.... It is not the worst API clutter in history.
Post by Pauli Virtanen
Continuation here: https://github.com/numpy/numpy/pull/8043
Hi all,
Pauli just opened a nice pull request [1] to add overlap detection to
`NPY_ITER_COPY_IF_OVERLAP`
If passed to the iterator (also exposed in python), the iterator will
copy the operands such that reading and writing should only occur for
identical operands. For now this is implemented by always copying the
output/writable operand (this could be improved though, so I would not
say its fixed API).
Since adding this flag is new API, please feel free to suggest other
names/approaches or even decline the change ;).
This is basically a first step, which should be easily followed by
adding overlap detection to ufuncs, removing traps such as the well (or
not so well known) `a += a.T`. Other parts of numpy may follow one by
one.
The work is based on his older awesome new memory overlap detection
implementation.
If there are no comments, I will probably merge it very soon, so we can
look at the follow up things.
- Sebastian
[1] https://github.com/numpy/numpy/
pull/8026_______________________________________________
https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Pauli Virtanen
2016-09-12 20:22:40 UTC
Permalink
Post by Sebastian Berg
Post by Pauli Virtanen
* NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
  flags for NpyIter_New.
* New API function PyArray_MapIterArrayCopyIfOverlap,
  as ufunc.at needs to check overlaps for index arrays before
  constructing iterators, and the parsing is done in multiarray.
I think here Nathaniels point might be right. It could be we can assume
that copying is always fine, there is probably only one or two
downstream projects using this function, plus it seems harder to create
abusing structures that actually do something useful.
It was only exposed for usage in `ufunc.at` if I remember right. I know
theano uses it though, but not sure about anyone else, maybe numba. On
the other hand.... It is not the worst API clutter in history.
Do you suggest that I break the PyArray_MapIterArray API?

One issue here is that the function doesn't make distinction between read-
only access and read-write access, so copying may give unnecessary
slowdown. The second thing is that it will result to a bit uglier code, as
I need to manage the overlap with the second operation in ufunc_at.

For NpyIter, I'd still be wary about copying by default, because it's not
needed everywhere (the may_share_memory checks are better done earlier),
and since the semantic change can break things inside Numpy.

Pauli
Sebastian Berg
2016-09-12 20:46:02 UTC
Permalink
Post by Pauli Virtanen
Post by Sebastian Berg
Post by Pauli Virtanen
* NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
  flags for NpyIter_New.
* New API function PyArray_MapIterArrayCopyIfOverlap,
  as ufunc.at needs to check overlaps for index arrays before
  constructing iterators, and the parsing is done in multiarray.
I think here Nathaniels point might be right. It could be we can assume
that copying is always fine, there is probably only one or two
downstream projects using this function, plus it seems harder to create
abusing structures that actually do something useful.
It was only exposed for usage in `ufunc.at` if I remember right. I know
theano uses it though, but not sure about anyone else, maybe numba. On
the other hand.... It is not the worst API clutter in history.
Do you suggest that I break the PyArray_MapIterArray API?
One issue here is that the function doesn't make distinction between read-
only access and read-write access, so copying may give unnecessary 
slowdown. The second thing is that it will result to a bit uglier
code, as 
I need to manage the overlap with the second operation in ufunc_at.
Yeah, was only wondering about MapIterArray, because I might get away
with the API break in the case that it works everywhere for our
internal usage. But if its not quite straight forward, there is no
point in thinking about it.
Post by Pauli Virtanen
For NpyIter, I'd still be wary about copying by default, because it's
not 
needed everywhere (the may_share_memory checks are better done
earlier), 
and since the semantic change can break things inside Numpy.
Yes, I tend to agree here about it. You can always wonder whether its
still the most convenient place to do the checks (at least for a few
places), but from glancing at the code, it still seems elegant to me.
If we are concerned about making the iterator more and more complex,
maybe we can really do something else about it as well.

I am not sure whether I will manage to look at it very closely soon, so
would invite anyone to take a look; this is definitely a highlight!

- Sebastian
Post by Pauli Virtanen
Pauli
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Loading...