Discussion:
[Numpy-discussion] Nansum function behavior
Charles Rilhac
2015-10-23 16:45:57 UTC
Permalink
Hello,

I noticed the change regarding the NaN functions, and especially the nansum function. I think this choice is a big mistake. I know that Matlab and R made the same choice, but it is illogical and counterintuitive.

My first argument is about logic. An arithmetic operation between Nothing and Nothing cannot produce a number or an object. Nothing + Object can be an object or something else, but from nothing alone, nothing but nothing can follow. I hope you see what I mean.

Secondly, it is counterintuitive and inconvenient, because if nansum returned NaN you could fill in the result easily:

a = np.array([[np.nan, np.nan], [1, np.nan]])
a = np.nansum(a, axis=1)
print(a)  # [nan  1.]  (under the old behaviour)
a[np.isnan(a)] = 0
Whereas, if the result is already filled with zeros on the all-NaN rows, you cannot easily replace the result for those rows with NaN. In the case above you cannot, because you have lost the information about which rows were all NaN.

I know it is hard to go back to a previous behaviour, but I really think it is wrong to always fill the result of an arithmetic operation over NaNs with zeros.

Thanks for your work, guys ;-)
Robert Kern
2015-10-23 17:08:07 UTC
Permalink

What change are you referring to?

--
Robert Kern
Benjamin Root
2015-10-23 17:11:13 UTC
Permalink
The change to nansum() happened several years ago. The main thrust of it
was to make the following consistent:

np.sum([]) # zero
np.nansum([np.nan]) # zero
np.sum([1]) # one
np.nansum([np.nan, 1]) # one

If you want to propagate masks and such, use masked arrays.
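For example, a minimal masked-array sketch of the row-sum case from the original post (variable names are illustrative): `masked_invalid` masks the NaNs instead of discarding them, and `filled` turns a fully masked sum back into NaN.

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

# mask the NaNs instead of discarding them
m = np.ma.masked_invalid(a)

# summing a fully masked row yields a masked element,
# which filled() converts back to NaN
row_sums = m.sum(axis=1).filled(np.nan)
print(row_sums)  # [nan  1.]
```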
Ben Root
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Charles Rilhac
2015-10-24 00:47:28 UTC
Permalink
Why do we keep this behaviour?
np.nansum([np.nan]) # zero

Firstly, you lose information.
You can easily fill NaN with zero after applying nansum, but you cannot keep NaN for the all-NaN rows unless you have a mask, or record which rows were all NaN beforehand.
It is neither convenient nor useful.
Secondly, it is illogical. An arithmetic operation, or anything else, between Nothing and Nothing cannot return Something.
We can accept that Nothing + Object = Object, but we cannot get a number from nothing. It is counterintuitive. I really disagree with this change made a few years ago.
Stephan Hoyer
2015-10-24 01:28:27 UTC
Permalink
Hi Charles,

You should read the previous discussion about this issue on GitHub:
https://github.com/numpy/numpy/issues/1721

For what it's worth, I do think the new definition of nansum is more
consistent.

If you want to preserve NaN when there are no non-NaN values, you can often
calculate the desired quantity from nanmean, which does return NaN when
there are only NaNs.
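A sketch of one way to read that suggestion, using nanmean as an all-NaN detector to restore NaN in the nansum result (names are illustrative):

```python
import numpy as np

a = np.array([[np.nan, np.nan], [1, np.nan]])

s = np.nansum(a, axis=1)   # the all-NaN row becomes 0.
m = np.nanmean(a, axis=1)  # NaN for the all-NaN row (emits a RuntimeWarning)
s[np.isnan(m)] = np.nan    # restore NaN where the whole row was NaN
print(s)  # [nan  1.]
```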

Stephan
Charles Rilhac
2015-10-24 01:43:54 UTC
Permalink
I saw this thread and I totally disagree with thouis's argument…
Of course you can get NaN when there are only NaNs. Thank goodness, there are a lot of ways to do that.
But it is neither convenient nor consistent, and above all it is logically wrong. NaN does not mean zero, and an operation over NaNs only cannot return a number…
You lose information about your array. It is easier to fill the result of nansum with zeros than to keep a mask of your original array, or whatever else you do.

Why is it misleading?
For example, say you want to sum the rows of an array and then take the mean of the result:

a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])
b = np.nansum(a, axis=1)  # array([6., 0.])
m = np.nanmean(b)         # 3.0 -- WRONG, because you wanted to get 6
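To be concrete, the workaround this behaviour forces on you is to record the all-NaN rows before reducing; a sketch:

```python
import numpy as np

a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])

# the all-NaN rows must be recorded *before* the reduction
all_nan = np.isnan(a).all(axis=1)
b = np.nansum(a, axis=1)   # array([6., 0.])
b[all_nan] = np.nan        # put the lost NaN back
m = np.nanmean(b)
print(m)  # 6.0, the intended result
```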
Juan Nunez-Iglesias
2015-10-24 06:08:18 UTC
Permalink
Hi Charles,


Just providing an outsider's perspective...

Your specific use-case doesn't address the general definition of nansum: perform a sum while ignoring NaNs. As others have pointed out (especially in the linked thread), the sum of nothing is 0. Although the current behaviour of nansum doesn't quite match your use-case, there is no doubt at all that it follows a consistent convention. "Wrong" is certainly not the right way to describe it.

You can easily cater to your use case as follows:

def rilhac_nansum(ar, axis=None):
    # scale the NaN-aware mean by the slice length, so that
    # all-NaN slices propagate NaN instead of becoming 0
    # (note: NaNs in a partially-NaN slice are effectively
    # replaced by the slice mean, so this differs from nansum)
    if axis is None:
        return np.nanmean(ar) * ar.size
    else:
        return np.nanmean(ar, axis=axis) * ar.shape[axis]

nanmean _consistently_ returns NaN when encountering NaN-only values, because the mean of nothing is NaN: the sum of nothing divided by the length of nothing, i.e. 0/0.
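Concretely, the 0/0 intuition can be checked directly (a tiny sketch):

```python
import numpy as np

row = np.array([np.nan, np.nan])

print(np.nansum(row))   # 0.0 -- the sum of nothing is the empty sum, zero
print(np.nanmean(row))  # nan -- 0/0 (also emits a "Mean of empty slice" RuntimeWarning)
```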

Hope this helps...

Juan.