Discussion:
[Numpy-discussion] Indexing issue with ndarrays
Joseph Fox-Rabinovitz
2016-08-25 14:36:50 UTC
This issue recently came up on Stack Overflow:
http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array.
The poster attempted to index an ndarray with a pandas boolean Series
object (all False), but the result was as if he had indexed with an array
of integer zeros.
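
For illustration, here is a minimal sketch of the two outcomes being contrasted. It uses plain NumPy arrays rather than the poster's pandas Series, and the names `a` and `mask` are made up for this example:

```python
import numpy as np

a = np.array([10, 20, 30])
mask = np.array([False, False, False])  # all-False boolean mask

# Boolean indexing: keeps the elements where the mask is True (none here).
as_bool = a[mask]

# Integer indexing with the same values cast to integers: [0, 0, 0]
# selects element 0 three times.
as_int = a[mask.astype(np.intp)]

print(as_bool)  # empty array
print(as_int)   # three copies of a[0]
```

The poster expected the first result but got something that looked like the second.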

Can someone explain this behavior? I can see two obvious possibilities:

1. ndarray checks if the input to __getitem__ is of exactly the right
type, not using isinstance.
2. pandas actually uses a wider datatype than boolean internally, so
indexing with the series is in fact indexing with an integer array.

In my attempt to reproduce the poster's results, I got the following
warning:

FutureWarning: in the future, boolean array-likes will be handled as a
boolean array index

This indicates that the issue is probably #1 and that a fix is already on
the way. Please correct me if I am wrong. Also, where does the code for
ndarray.__getitem__ live?

Thanks,

-Joe
Sebastian Berg
2016-08-25 20:37:45 UTC
You are overthinking it ;). The reason is quite simply that the logic
used to be:

 * Boolean array? -> think about boolean indexing.
 * Everything "array-like" (not caught earlier) -> convert to `intp`
array and do integer indexing.
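
The two-step logic above can be imitated in pure Python. This is only a rough sketch: the function name `legacy_index` is hypothetical, the real decision is made in NumPy's C code, and a plain list stands in here for any array-like that is not itself a boolean ndarray:

```python
import numpy as np

def legacy_index(arr, index):
    """Hypothetical imitation of the old dispatch, not NumPy's actual code."""
    # Step 1: a genuine boolean ndarray gets boolean (mask) indexing.
    if isinstance(index, np.ndarray) and index.dtype == np.bool_:
        return arr[index]
    # Step 2: everything else array-like is converted to an intp array
    # and used for integer indexing, so False becomes 0 and True becomes 1.
    return arr[np.asarray(index).astype(np.intp)]

arr = np.array([10, 20, 30])

from_ndarray = legacy_index(arr, np.array([False, False, False]))
from_array_like = legacy_index(arr, [False, False, False])

print(from_ndarray)     # empty: treated as a mask
print(from_array_like)  # arr[0] repeated: treated as integer indices
```

Under these assumptions, the same all-False values give different results depending only on whether they arrive as a boolean ndarray or as some other array-like.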

Now you might wonder why, but it is probably quite simply because
boolean indexing was tacked on later.

- Sebastian
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Joseph Fox-Rabinovitz
2016-08-26 13:57:22 UTC
This makes perfect sense. I would like to help fix it if a fix is desired
and has not been done already. Could you point me to where the "Boolean
array?, etc." decision happens? I have had trouble navigating to
`__getitem__` (which I assume is somewhere in the np.core.multiarray C code).

-Joe
Sebastian Berg
2016-08-26 14:08:14 UTC
As the warning says, it is already fixed in a sense (we just have to
move forward with the deprecation, which you can maybe actually do at
this time). This is all in mapping.c. Without checking, there is a
function called something like "prepare index" which goes through
all the different types of indexing objects. It should be pretty
straightforward to find the warning.

The old behaviour originated in a completely different code base,
though (you would have to check out some pre-1.9 version of numpy if
you are interested in archaeology).

- Sebastian