Discussion:
[Numpy-discussion] Nansum function behavior
Charles Rilhac
2015-10-23 16:45:57 UTC
Permalink
Hello,

I noticed the change regarding the NaN functions, and especially the nansum function. I think this choice is a big mistake. I know that Matlab and R made the same choice, but it is illogical and counterintuitive.

My first argument is about logic. An arithmetic operation between Nothing and Nothing cannot produce a number or an object. Nothing + Object can be an object or something else, but from nothing alone, nothing but nothing can follow. I hope you see what I mean.

Secondly, it is counterintuitive and inconvenient, because if nansum returned NaN you could fill in the result easily:

a = np.array([[np.nan, np.nan], [1, np.nan]])
a = np.nansum(a, axis=1)
print(a)  # [nan  1.]  (under the old behaviour)
a[np.isnan(a)] = 0
Whereas, if the result is already filled with zeros on the all-NaN rows, you cannot easily replace the result for those rows with NaN. In the case above you cannot, because you have lost the information about which rows were all NaN.

I know it is hard to go back to a previous behaviour, but I really think it is wrong to always fill the result of an arithmetic operation over NaNs with zeros.

Thanks for your work, guys ;-)
Robert Kern
2015-10-23 17:08:07 UTC
Permalink

What change are you referring to?

--
Robert Kern
Benjamin Root
2015-10-23 17:11:13 UTC
Permalink
The change to nansum() happened several years ago. The main thrust of it
was to make the following consistent:

np.sum([]) # zero
np.nansum([np.nan]) # zero
np.sum([1]) # one
np.nansum([np.nan, 1]) # one

If you want to propagate masks and such, use masked arrays.
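For example, a minimal masked-array sketch of the row-sum case from the original post (variable names are illustrative): `masked_invalid` masks the NaNs instead of discarding them, and `filled` turns a fully masked sum back into NaN.

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

# mask the NaNs instead of discarding them
m = np.ma.masked_invalid(a)

# summing a fully masked row yields a masked element,
# which filled() converts back to NaN
row_sums = m.sum(axis=1).filled(np.nan)
print(row_sums)  # [nan  1.]
```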
Ben Root
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Charles Rilhac
2015-10-24 00:47:28 UTC
Permalink
Why do we keep this behaviour?
np.nansum([np.nan]) # zero

Firstly, you lose information.
You can easily fill NaN with zero after applying nansum, but you cannot keep NaN for the all-NaN rows unless you have a mask, or record which rows were all NaN beforehand.
It is neither convenient nor useful.
Secondly, it is illogical. An arithmetic operation, or anything else, between Nothing and Nothing cannot return Something.
We can accept that Nothing + Object = Object, but we cannot get a number from nothing. It is counterintuitive. I really disagree with this change made a few years ago.
Stephan Hoyer
2015-10-24 01:28:27 UTC
Permalink
Hi Charles,

You should read the previous discussion about this issue on GitHub:
https://github.com/numpy/numpy/issues/1721

For what it's worth, I do think the new definition of nansum is more
consistent.

If you want to preserve NaN when there are no non-NaN values, you can often
calculate the desired quantity from nanmean, which does return NaN when
there are only NaNs.
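A sketch of one way to read that suggestion, using nanmean as an all-NaN detector to restore NaN in the nansum result (names are illustrative):

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

s = np.nansum(a, axis=1)   # the all-NaN row becomes 0.
m = np.nanmean(a, axis=1)  # NaN for the all-NaN row (emits a RuntimeWarning)
s[np.isnan(m)] = np.nan    # restore NaN where the whole row was NaN
print(s)  # [nan  1.]
```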

Stephan
Charles Rilhac
2015-10-24 01:43:54 UTC
Permalink
I saw this thread and I totally disagree with thouis's argument…
Of course you can get NaN when there are only NaNs. Thank goodness, there are a lot of ways to do that.
But it is neither convenient nor consistent, and above all it is logically wrong. NaN does not mean zero, and an operation over NaNs only cannot return a number…
You lose information about your array. It is easier to fill the result of nansum with zeros than to keep a mask of your original array, or whatever else you do.

Why is it misleading?
For example, say you want to sum the rows of an array and then take the mean of the result:

a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])
b = np.nansum(a, axis=1)  # array([6., 0.])
m = np.nanmean(b)         # 3.0 -- WRONG, because you wanted to get 6
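To be concrete, the workaround this behaviour forces on you is to record the all-NaN rows before reducing; a sketch:

```python
import numpy as np

a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])

# the all-NaN rows must be recorded *before* the reduction
all_nan = np.isnan(a).all(axis=1)
b = np.nansum(a, axis=1)   # array([6., 0.])
b[all_nan] = np.nan        # put the lost NaN back
m = np.nanmean(b)
print(m)  # 6.0, the intended result
```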
Juan Nunez-Iglesias
2015-10-24 06:08:18 UTC
Permalink
Hi Charles,


Just providing an outsider's perspective...

Your specific use-case doesn't address the general definition of nansum: perform a sum while ignoring NaNs. As others have pointed out (especially in the linked thread), the sum of nothing is 0. Although the current behaviour of nansum doesn't quite match your use-case, there is no doubt at all that it follows a consistent convention. "Wrong" is certainly not the right way to describe it.

You can easily cater to your use case as follows:

def rilhac_nansum(ar, axis=None):
    # scale the NaN-aware mean by the slice length, so that
    # all-NaN slices propagate NaN instead of becoming 0
    # (note: NaNs in a partially-NaN slice are effectively
    # replaced by the slice mean, so this differs from nansum)
    if axis is None:
        return np.nanmean(ar) * ar.size
    else:
        return np.nanmean(ar, axis=axis) * ar.shape[axis]

nanmean _consistently_ returns NaN when encountering NaN-only values, because the mean of nothing is NaN: the sum of nothing divided by the length of nothing, i.e. 0/0.
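Concretely, the 0/0 intuition can be checked directly (a tiny sketch):

```python
import numpy as np

row = np.array([np.nan, np.nan])

print(np.nansum(row))   # 0.0 -- the sum of nothing is the empty sum, zero
print(np.nanmean(row))  # nan -- 0/0 (also emits a "Mean of empty slice" RuntimeWarning)
```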

Hope this helps...

Juan.