Discussion:
[Numpy-discussion] Sign of NaN
Charles R Harris
2015-09-29 15:13:15 UTC
Permalink
Hi All,

Due to a recent commit, Numpy master now raises an error when applying the
sign function to an object array containing NaN. Other options may be
preferable, returning NaN for instance, so I would like to open the topic
for discussion on the list.

Thoughts?

Chuck
Freddy Rietdijk
2015-09-29 15:17:58 UTC
Permalink
I don't know of any valid output when applying the sign function to NaN.
Therefore, I think it is correct to raise a ValueError. Furthermore, I
would prefer such an error over just returning NaN since it helps you
locate where the NaN is generated.
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Anne Archibald
2015-09-29 15:25:13 UTC
Permalink
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?

Anne

P.S. If you want exceptions when NaNs appear, that's what np.seterr is for.
-A
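
For illustration, a minimal sketch of the two behaviours mentioned above, assuming a reasonably recent NumPy. `np.errstate` is the scoped counterpart of `np.seterr`; note that it traps floating-point operations flagged as invalid (such as 0/0), which is not quite the same thing as detecting a NaN that is already stored in an array.

```python
import numpy as np

# np.sign on a floating-point array passes NaN through instead of raising.
values = np.array([-2.5, 0.0, 3.0, np.nan])
signs = np.sign(values)
print(signs)

# Opt in to exceptions for invalid floating-point operations.
raised = False
with np.errstate(invalid="raise"):
    try:
        np.array([0.0]) / np.array([0.0])  # 0/0 is an invalid operation
    except FloatingPointError:
        raised = True
print(raised)
```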
j***@gmail.com
2015-09-29 15:39:15 UTC
Permalink
Post by Anne Archibald
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?
Anne
P.S. If you want exceptions when NaNs appear, that's what np.seterr is
for. -A
I also think NaN should be treated the same way as floating point numbers
(whatever that is). Otherwise it is difficult to remember when nan is
essentially a float dtype or another dtype.
(given that float is the smallest dtype that can hold a nan)

Josef
Allan Haldane
2015-09-29 15:44:10 UTC
Permalink
Post by Anne Archibald
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point
arrays. Why should it be different for object arrays?
Anne
P.S. If you want exceptions when NaNs appear, that's what np.seterr
is for. -A
Post by j***@gmail.com
I also think NaN should be treated the same way as floating point
numbers (whatever that is). Otherwise it is difficult to remember when
nan is essentially a float dtype or another dtype.
(given that float is the smallest dtype that can hold a nan)
Note that I've reimplemented np.sign for object arrays along these lines
in this open PR:
https://github.com/numpy/numpy/pull/6320

That PR recursively uses the np.sign ufunc to evaluate object arrays
containing float and complex numbers. This way the behavior on object
arrays is identical to float/complex arrays.

Here is what the np.sign ufunc does (for arbitrary x):

np.sign(np.nan) -> nan
np.sign(complex(np.nan, x)) -> complex(nan, 0)
np.sign(complex(x, np.nan)) -> complex(nan, 0)
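
The idea of delegating to the np.sign ufunc per element can be sketched in pure Python as follows. This is only an illustration, not the code from the PR; `sign_elementwise` is a hypothetical helper name, and complex inputs are left out because their result may depend on the NumPy version.

```python
import numpy as np

def sign_elementwise(obj_array):
    """Apply np.sign to every element of an object array (hypothetical helper)."""
    out = np.empty(obj_array.shape, dtype=object)
    for index in np.ndindex(obj_array.shape):
        # Each stored Python number is handed back to the ufunc, so a float
        # NaN comes out as NaN, exactly as it would in a float array.
        out[index] = np.sign(obj_array[index])
    return out

mixed = np.array([1.5, -2, float("nan")], dtype=object)
result = sign_elementwise(mixed)
print(result)
```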

Allan
Charles R Harris
2015-09-29 15:43:53 UTC
Permalink
Post by Anne Archibald
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?
What about non-numeric objects in general?

<snip>

Chuck
Nathaniel Smith
2015-09-29 18:16:37 UTC
Permalink
Post by Anne Archibald
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?

The argument for doing it this way would be that arbitrary python objects
don't have a sign, and the natural way to implement something like
np.sign's semantics using only the "object" interface is

if obj < 0:
    return -1
elif obj > 0:
    return 1
elif obj == 0:
    return 0
else:
    raise
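
A runnable version of the fragment above, as a sketch only: the function name `object_sign` and the choice of ValueError are assumptions, since the fragment does not say what is raised. It shows why a float NaN ends up in the error branch: all three comparisons are false.

```python
from fractions import Fraction

def object_sign(obj):
    """Sign computed from comparisons alone (hypothetical helper)."""
    if obj < 0:
        return -1
    elif obj > 0:
        return 1
    elif obj == 0:
        return 0
    else:
        # Reached when no comparison holds, e.g. for float('nan').
        raise ValueError("object has no well-defined sign: %r" % (obj,))

print(object_sign(3), object_sign(Fraction(-1, 2)), object_sign(0.0))

try:
    object_sign(float("nan"))
    nan_raised = False
except ValueError:
    nan_raised = True
print(nan_raised)
```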

In general I'm not a big fan of trying to do all kinds of guessing about
how to handle random objects in object arrays, the kind that ends up with a
big chain of type checks and fallback behaviors. Pretty soon we find
ourselves trying to extend the language with our own generic dispatch
system for arbitrary python types, just for object arrays. (The current
hack where for object arrays np.log will try calling obj.log() is
particularly horrible. There is no rule in python that "log" is a reserved
method name for "logarithm" on arbitrary objects. Ditto for the other
ufuncs that implement this hack.)
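
To make the objection concrete, here is a small sketch of the method-lookup behaviour described above. `Record` is a made-up class whose `log` method has nothing to do with logarithms; the behaviour shown is what the thread describes and may differ between NumPy versions.

```python
import numpy as np

class Record:
    """Hypothetical class with an unrelated method that happens to be called log."""
    def log(self):
        return "wrote a log entry"

records = np.array([Record()], dtype=object)

# For object arrays np.log looks up and calls a method named "log" on each
# element, whatever that method actually does.
result = np.log(records)
print(result)
```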

Plus we hope that many use cases for object arrays will soon be supplanted
by better dtype support, so now may not be the best time to invest heavily
in making object arrays complicated and powerful.

OTOH sometimes practicality beats purity, and at least object arrays are
already kinda cordoned off from the rest of the system, so I don't feel as
strongly as if we were talking about core functionality.

...is there a compelling reason to even support np.sign on object arrays?
This seems pretty far into the weeds, and that tends to lead to poor
intuition and decision making.

-n
Charles R Harris
2015-09-29 18:53:08 UTC
Permalink
Post by Nathaniel Smith
Post by Anne Archibald
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?
The argument for doing it this way would be that arbitrary python objects
don't have a sign, and the natural way to implement something like
np.sign's semantics using only the "object" interface is
if obj < 0:
    return -1
elif obj > 0:
    return 1
elif obj == 0:
    return 0
else:
    raise
That is what current master does, using PyObject_RichCompareBool for the
comparisons.

Chuck
j***@gmail.com
2015-09-29 18:58:24 UTC
Permalink
Post by Nathaniel Smith
...is there a compelling reason to even support np.sign on object arrays?
This seems pretty far into the weeds, and that tends to lead to poor
intuition and decision making.
One of the usecases that has sneaked in during the last few numpy versions
is that object arrays contain numerical arrays where the shapes don't add
up to a rectangular array.
In those cases being able to apply numerical operations might be useful.
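
A short sketch of that use case, assuming the ragged data is stored by filling a pre-allocated object array (one of several ways to build it). Arithmetic operators on the outer array are forwarded to the inner arrays element by element.

```python
import numpy as np

# Two rows of different length cannot form a rectangular float array,
# so each row is stored as one element of an object array.
ragged = np.empty(2, dtype=object)
ragged[0] = np.array([1.0, -2.0, 3.0])
ragged[1] = np.array([-4.0, 5.0])

doubled = ragged * 2   # each inner array is multiplied by 2
negated = -ragged      # each inner array is negated
print(doubled)
print(negated)
```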

But I'm +0 since I don't work with object arrays.

Josef
Chris Barker - NOAA Federal
2015-09-30 00:58:26 UTC
Permalink
Post by j***@gmail.com
One of the usecases that has sneaked in during the last few numpy versions
is that object arrays contain numerical arrays where the shapes don't add
up to a rectangular array.


I think that's the wrong way to solve that problem -- we really should have a
"proper" ragged array implementation. But it is the easiest way at this
point.

For this, and other use-cases, special casing Numpy arrays stored in object
arrays does make sense:

"If this is a Numpy array, pass the operation through."

-CHB


Charles R Harris
2015-09-30 01:31:56 UTC
Permalink
Post by Chris Barker - NOAA Federal
For this, and other use-cases, special casing Numpy arrays stored in object
arrays does make sense:
"If this is a Numpy array, pass the operation through."
Because we now (development) use rich compare, the result looks like

In [1]: a = ones(3)

In [2]: b = array([a, -a], object)

In [3]: b
Out[3]:
array([[1.0, 1.0, 1.0],
       [-1.0, -1.0, -1.0]], dtype=object)

In [4]: sign(b)
Out[4]:
array([[1L, 1L, 1L],
       [-1L, -1L, -1L]], dtype=object)

The function returns long integers in order to not special case Python 3.
Hmm, wonder if we might want to change that.

Chuck
Charles R Harris
2015-09-30 01:35:12 UTC
Permalink
Post by Charles R Harris
Because we now (development) use rich compare, the result looks like
In [1]: a = ones(3)
In [2]: b = array([a, -a], object)
In [3]: b
array([[1.0, 1.0, 1.0],
[-1.0, -1.0, -1.0]], dtype=object)
In [4]: sign(b)
array([[1L, 1L, 1L],
[-1L, -1L, -1L]], dtype=object)
The function returns long integers in order to not special case Python 3.
Hmm, wonder if we might want to change that.
Oops, not what was intended. In fact it raises an error

In [7]: b
Out[7]: array([array([ 1., 1., 1.]), array([-1., -1., -1.])],
dtype=object)

In [8]: sign(b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-3b1a81271d2e> in <module>()
----> 1 sign(b)

ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
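
The difference between the two sessions can be reproduced with the sketch below. It assumes the nested array is built by filling a pre-allocated object array, which is one way (not necessarily the one used above) to keep each element as a whole ndarray. The error then comes from using an elementwise comparison result where a single truth value is needed.

```python
import numpy as np

a = np.ones(3)

# Equal-length inputs are merged into a single 2-D object array of scalars.
flat = np.array([a, -a], dtype=object)
print(flat.shape)

# Filling a pre-allocated array keeps every element as a whole ndarray.
nested = np.empty(2, dtype=object)
nested[0] = a
nested[1] = -a
print(nested.shape)

# A comparison-based sign needs one truth value per element, but comparing
# an ndarray with 0 yields an array of booleans.
try:
    if nested[0] < 0:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)
```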

Chuck
Chris Barker
2015-09-30 16:11:08 UTC
Permalink
Post by Charles R Harris
Post by Chris Barker - NOAA Federal
For this, and other use-cases, special casing Numpy arrays stored in object
arrays does make sense:
"If this is a Numpy array, pass the operation through."
Oops, not what was intended. In fact it raises an error
In [7]: b
Out[7]: array([array([ 1., 1., 1.]), array([-1., -1., -1.])],
dtype=object)
In [8]: sign(b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-3b1a81271d2e> in <module>()
----> 1 sign(b)
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
exactly -- it seems to me that a special case for numpy arrays as objects
in object arrays makes sense, so you'd get:

In [6]: oa
Out[6]:
array([[1.0, 1.0, 1.0],
       [-1.0, -1.0, -1.0]], dtype=object)

In [7]: np.sign(oa)
Out[7]:
array([[1, 1, 1],
       [-1, -1, -1]], dtype=object)

(which you do now in the version I'm running).

Though rather than the special case, maybe we really need dtype=ndarray
arrays?

oa = np.array([a1, a2], dtype=np.ndarray)

Then we could count on everything in the array being an array.....

-CHB
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Sebastian Berg
2015-09-30 16:25:15 UTC
Permalink
Post by Chris Barker
Though rather than the special case, maybe we really need
dtype=ndarray arrays?
I think this (as a dtype) is an obvious solution. The other solution, I
am not sure about in general, to be honest. We may have to be more
careful about creating a monster with new dtypes, rather than being
careful to implement all possible features ;).
It is not that I think we would not have consistent rules, etc.; it is
just that we *want* to force code to be obvious. If someone has arrays
inside arrays, maybe they should be expected to specify that.

It actually breaks some logic (or cannot be implemented for everything),
because we have signatures such as `O->?`, which do not support array
output.

- Sebastian
Sebastian Berg
2015-09-29 21:07:18 UTC
Permalink
Post by Nathaniel Smith
Plus we hope that many use cases for object arrays will soon be
supplanted by better dtype support, so now may not be the best time to
invest heavily in making object arrays complicated and powerful.
I have the little dream here that what could happen is that we create a
PyFloatDtype kind of thing (it is a bit different from our float because
it would always convert back to a python float and maybe raises more
errors), which "registers" with the dtype system in that it says "I know
how to handle python floats and store them in an array and provide ufunc
implementations for it".

Then, the "object" dtype ufuncs would try to call the ufunc on each
element, including "conversion". They would find a "float"; since it is
not an array-like container, they would interpret it as a PyFloatDtype
scalar and call the scalar's ufunc (the PyFloatDtype scalar would be a
python float).

Of course I am likely thinking down the wrong road, but if you want e.g.
an array of Decimals, you need some way to tell numpy about that, such as
a PyDecimalDtype.
Now "object" would possibly be just a fallback to mean "figure out what
to use for each element". It would be a bit slower, but it would work
very generally, because numpy would not impose limits as such.

- Sebastian
Nathaniel Smith
2015-09-30 07:01:17 UTC
Permalink
On Tue, Sep 29, 2015 at 2:07 PM, Sebastian Berg
[...]
Post by Sebastian Berg
I have the little dream here that what could happen is that we create a
PyFloatDtype kind of thing (it is a bit different from our float because
it would always convert back to a python float and maybe raises more
errors), which "registers" with the dtype system in that it says "I know
how to handle python floats and store them in an array and provide ufunc
implementations for it".
Then, the "object" dtype ufuncs would try to call the ufunc on each
element, including "conversion". They would find a "float", since it is
not an array-like container, they interpret it as a PyFloatDtype scalar
and call the scalars ufunc (the PyFloatDtype scalar would be a python
float).
I'm not sure I understand this, but it did make me think of one
possible approach --

in my notebook sketches for what the New and Improved ufunc API might
look like, I was already pondering whether the inner loop should
receive a pointer to the ufunc object itself. Not for any reason in
particular, but just because hey they're sorta vaguely like methods
and methods get pointers to the object. But now I know what this is
useful for :-).

If ufunc loops get a pointer to the ufunc object itself, then we can
define a single inner loop function that looks like (sorta-Cython
code):

cdef generic_object_inner_loop(ufunc, args, strides, n, ...):
    for i in range(n):
        arg_objs = []
        for j in range(ufunc.nargs):
            arg_objs.append(<object> (args[j] + strides[j] * i))
        ufunc(*arg_objs[:ufunc.nin], out=arg_objs[ufunc.nin:])

and register it by default in every ufunc with signature
"{}->{}".format("O" * ufunc.nin, "O" * ufunc.nout). And this would in
just a few lines of code provide a pretty sensible generic behavior
for *all* object array ufuncs -- they recursively call the ufunc on
their contents.
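
A pure-Python analogue of the loop above, to show the intended behaviour without touching the C-level ufunc machinery. This is a sketch under stated assumptions: `apply_ufunc_to_objects` is a hypothetical helper, it handles single-output ufuncs only, and it returns a new array instead of writing into `out`.

```python
import numpy as np

def apply_ufunc_to_objects(ufunc, *object_arrays):
    """Call `ufunc` again on the objects stored in the given object arrays."""
    inputs = np.broadcast_arrays(*object_arrays)
    out = np.empty(inputs[0].shape, dtype=object)
    for index in np.ndindex(out.shape):
        # Re-dispatch on the stored objects: floats, ndarrays, and so on.
        out[index] = ufunc(*(arr[index] for arr in inputs))
    return out

mixed = np.empty(2, dtype=object)
mixed[0] = np.array([1.0, -2.0])
mixed[1] = -3.0

signs = apply_ufunc_to_objects(np.sign, mixed)
sums = apply_ufunc_to_objects(np.add, mixed, mixed)
print(signs)
print(sums)
```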

As a prerequisite of course we would need to remove the auto-coercion
of unknown objects to object arrays, otherwise this becomes an
infinite recursion. But we already decided to do that.

And for this to be really useful for arbitrary objects, not just the
ones that asarray recognizes, then we need __numpy_ufunc__. But again,
we already decided to do that :-).

-n
--
Nathaniel J. Smith -- http://vorpus.org
Sebastian Berg
2015-09-30 07:32:02 UTC
Permalink
Post by Nathaniel Smith
And for this to be really useful for arbitrary objects, not just the
ones that asarray recognizes, then we need __numpy_ufunc__. But again,
we already decided to do that :-).
Well, what I mean is: a `Decimal` will probably never know about numpy
itself. So I was wondering if you should go the other way around and
teach numpy about it.
I.e. you would create an object which has all the information about
ufuncs and casting for Decimal and register it with numpy. Then when
numpy sees a Decimal (also in `asarray`), it would know what to do with
it, how to store it in an array, etc. The `Decimal` object would be
the scalar version of an array of Decimals.
By the way, in some way an array is a "Scalar" as well, it can be put
into another array and if you apply the ufunc to it, it applies the
ufunc to all its elements.
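
The registration idea could look roughly like the sketch below. Everything in it is hypothetical: `SIGN_HANDLERS`, `register_sign` and `sign_of` are illustrative names, not an existing or planned NumPy API, and only a sign handler for `Decimal` is registered.

```python
from decimal import Decimal

# Hypothetical registry: Python type -> function computing the sign of one scalar.
SIGN_HANDLERS = {}

def register_sign(py_type, handler):
    """Teach the registry how to take the sign of instances of py_type."""
    SIGN_HANDLERS[py_type] = handler

def decimal_sign(value):
    # Decimal has its own NaN; propagate it instead of comparing it.
    if value.is_nan():
        return Decimal("NaN")
    return Decimal((value > 0) - (value < 0))

register_sign(Decimal, decimal_sign)

def sign_of(obj):
    """Dispatch on the exact type of obj; unknown types are an error."""
    try:
        handler = SIGN_HANDLERS[type(obj)]
    except KeyError:
        raise TypeError("no sign handler registered for %s" % type(obj).__name__) from None
    return handler(obj)

print(sign_of(Decimal("-2.5")), sign_of(Decimal("0")), sign_of(Decimal("NaN")))
```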

This is all likely too complicated though; maybe it is better to just
force the user to subclass Decimal to achieve this. I am sure there
are quite a few roads we could take, and we just need to think some
more about what we want and what we can do. :)

- Sebastian
Chris Barker
2015-09-30 16:13:20 UTC
Permalink
Post by Sebastian Berg
Post by Nathaniel Smith
Plus we hope that many use cases for object arrays will soon be
supplanted by better dtype support, so now may not be the best time to
invest heavily in making object arrays complicated and powerful.
Well, what I mean is. A `Decimal` will probably never know about numpy
itself. So I was wondering if you should teach numpy the other way
around about it.
indeed -- but the way to do that is to create a Decimal dtype -- if we have
the "better dtype support", then that shouldn't be hard to do.

-CHB
Allan Haldane
2015-09-29 22:28:12 UTC
Permalink
Post by Nathaniel Smith
Plus we hope that many use cases for object arrays will soon be
supplanted by better dtype support, so now may not be the best time to
invest heavily in making object arrays complicated and powerful.
Even though I submitted the PR to make object arrays more powerful, this
makes a lot of sense to me.

Let's say we finish a new dtype system, in which (I imagine) each dtype
specifies how to calculate each ufunc elementwise for that type. What
are the remaining use cases for generic object arrays? The only one I
see is having an array with elements of different types, which seems
like a dubious idea anyway. (Nested ndarrays of varying length could be
implemented as a dtype, as could the PyFloatDtype Sebastian mentioned,
without need for a generic 'object' dtype which has to figure out how
to call ufuncs on individual objects of different type).

Allan
Antoine Pitrou
2015-09-29 16:14:51 UTC
Permalink
On Tue, 29 Sep 2015 09:13:15 -0600
Post by Charles R Harris
Due to a recent commit, Numpy master now raises an error when applying the
sign function to an object array containing NaN. Other options may be
preferable, returning NaN for instance, so I would like to open the topic
for discussion on the list.
None for example? float('nan') may be a bit weird amongst e.g. an array
of Decimals.

Regards

Antoine.
Joe Kington
2015-09-29 16:40:35 UTC
Permalink
Post by Antoine Pitrou
None for example? float('nan') may be a bit weird amongst e.g. an array
of Decimals
The downside to `None` is that it's one more thing to check for and makes
object arrays an even weirder edge case. (Incidentally, Decimal does have
its own non-float NaN which throws a whole different wrench in the works.
`np.sign(Decimal('NaN'))` is going to raise an error no matter what.)

A float (or numpy) NaN makes more sense to return for mixed datatypes than
None does, in my opinion. At least then one can use `isfinite`, etc to
check while `np.isfinite(None)` will raise an error. Furthermore, if
there's at least one floating point NaN in the object array, getting a
float NaN out makes sense.
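
A quick check of the two claims above, assuming Python 3 with the default decimal context. An ordering comparison on a Decimal NaN raises, which is why a comparison-based sign cannot handle it, and `np.isfinite` rejects `None` while accepting a float NaN.

```python
import decimal
import numpy as np

# Ordering comparisons involving a Decimal NaN signal InvalidOperation.
try:
    decimal.Decimal("NaN") < 0
    decimal_raised = False
except decimal.InvalidOperation:
    decimal_raised = True

# None is not a number, so isfinite has no loop for it.
try:
    np.isfinite(None)
    none_raised = False
except TypeError:
    none_raised = True

# A float NaN is a valid input and is simply reported as not finite.
nan_is_finite = bool(np.isfinite(float("nan")))
print(decimal_raised, none_raised, nan_is_finite)
```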

Just my $0.02, anyway.
Stephan Hoyer
2015-09-29 17:59:47 UTC
Permalink
Post by Charles R Harris
Due to a recent commit, Numpy master now raises an error when applying the
sign function to an object array containing NaN. Other options may be
preferable, returning NaN for instance, so I would like to open the topic
for discussion on the list.
We discussed this last month on the list and on GitHub:
https://mail.scipy.org/pipermail/numpy-discussion/2015-August/073503.html
https://github.com/numpy/numpy/issues/6265
https://github.com/numpy/numpy/pull/6269/files

The discussion was focused on what to do in the generic fallback case. Now
that I think about this more, I think it makes sense to explicitly check
for NaN in the unorderable case, and return NaN if the input is NaN. I
would not return NaN in general from unorderable objects, though -- in
general we should raise an error.
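
One way to read this proposal, sketched in Python. The function name and the use of `obj != obj` as the NaN test are assumptions for illustration; the actual check in NumPy would be done in C and may look different.

```python
def sign_with_nan_check(obj):
    """Comparison-based sign that returns NaN inputs unchanged (hypothetical)."""
    if obj < 0:
        return -1
    if obj > 0:
        return 1
    if obj == 0:
        return 0
    # Unorderable case: a NaN is recognisable because it is not equal to itself.
    if obj != obj:
        return obj
    raise TypeError("unorderable object: %r" % (obj,))

nan_result = sign_with_nan_check(float("nan"))
print(nan_result, sign_with_nan_check(-3), sign_with_nan_check(2.5))
```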

It sounds like Allan has already fixed this in his PR, but it also would
not be hard to add that logic to the existing code. Is this code in
NumPy 1.10?

Stephan
Charles R Harris
2015-09-29 18:15:24 UTC
Permalink
Post by Stephan Hoyer
It sounds like Allan has already fixed this in his PR, but it also would
not be hard to add that logic to the existing code. Is this code in the
NumPy 1.10?
No. NumPy 1.10 also has differing behavior between python 2 and python 3.
The reason I raise the question now is that current master has replaced use
of PyObject_Compare by PyObject_RichCompare for both python 2 and 3. It
would be easy to extend it. I'm less sure of Allan's work; on a quick look
it seems more complicated.

***@fc [~]$ python3
Python 3.4.2 (default, Jul 9 2015, 17:24:30)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.sign(np.array([float('nan')]*3, np.object))
array([None, None, None], dtype=object)
***@fc [~]$ python2
Python 2.7.10 (default, Jul 5 2015, 14:15:43)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.sign(np.array([float('nan')]*3, np.object))
array([-1, -1, -1], dtype=object)

Chuck