[Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

Post by Allan Haldane
Hi all,
Eric Wieser has a PR which defines new functions np.ma.correlate and
https://github.com/numpy/numpy/pull/7922
We're deciding how to name the keyword arg which determines whether
masked elements are "propagated" in the convolution sums. Currently we
Any thoughts?

Sounds a bit overly odd to me, to be honest. Just brainstorming: could you
think of/name it the other way around, maybe? Should the masked
values be considered as zero/ignored?

- Sebastian

Post by Allan Haldane
Cheers,
Allan
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Benjamin Root

2016-10-14 17:44:50 UTC

Why not "propagated"?

Allan Haldane

2016-10-14 18:23:09 UTC

I think the possibilities that have been mentioned so far (here or in
the PR) are:

contagious
contagious_mask
propagate
propagate_mask
propagated

`propagate_mask=False` seemed to imply that the mask would never be set,
so Eric also suggested
propagate_mask='any' or propagate_mask='all'

I would be happy with 'propagated=False' as the name/default. As Eric
pointed out, most MaskedArray functions like sum implicitly don't
propagate, currently, so maybe we should do likewise here.
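
To make the 'any' / 'all' distinction concrete, here is a minimal pure-Python sketch. It is not the code in the PR: the function name `convolve_masked` and its `rule` argument are hypothetical, `None` stands in for a masked element, and the reading of 'any' (mask an output if any contributing pair is masked) versus 'all' (mask it only if every contributing pair is masked) is an assumption about what the proposal means.

```python
def convolve_masked(a, b, rule):
    """Full 1-D convolution of two lists in which None marks a masked element.

    rule='any': mask an output if any contributing pair has a masked element.
    rule='all': mask an output only if every contributing pair does.
    """
    if rule not in ("any", "all"):
        raise ValueError("rule must be 'any' or 'all'")
    out = []
    for k in range(len(a) + len(b) - 1):
        # Pairs (a[i], b[k - i]) that contribute to output element k.
        pairs = [(a[i], b[k - i]) for i in range(len(a)) if 0 <= k - i < len(b)]
        # Products of the pairs in which neither element is masked.
        valid = [x * y for x, y in pairs if x is not None and y is not None]
        if rule == "any":
            masked = len(valid) < len(pairs)
        else:
            masked = not valid
        out.append(None if masked else sum(valid))
    return out


m = None  # stand-in for a masked element
a = [1, 1, 1, m, 1, 1, 1, m, m, m, 1, 1, 1]
b = [1, 1, 1]
print(convolve_masked(a, b, "any"))
print(convolve_masked(a, b, "all"))
```

Under the 'all' rule an output is masked only where no valid pair contributes at all, which is why a name like `propagate_mask=False` could wrongly suggest that the mask is never set.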

Allan

Juan Nunez-Iglesias

2016-10-14 23:49:48 UTC

+1 for propagate_mask. That is the only proposal that immediately makes sense to me. "contagious" may be cute but I think approximately 0% of users would guess its purpose on first use.

Can you elaborate on what happens with the masks exactly? I didn't quite get why propagate_mask=False was unintuitive. My expectation is that any mask present in the input will not be set in the output, but the mask will be "respected" by the function.

Allan Haldane

2016-10-16 01:21:13 UTC

Here's an illustration of how the PR currently works with convolve:

import numpy as np

m = np.ma.masked
a = np.ma.array([1,1,1,m,1,1,1,m,m,m,1,1,1])
b = np.ma.array([1,1,1])
print(np.ma.convolve(a, b, propagate_mask=True))

[1 2 3 -- -- -- 3 -- -- -- -- -- 3 2 1]

print(np.ma.convolve(a, b, propagate_mask=False))

[1 2 3 2 2 2 3 2 1 -- 1 2 3 2 1]

Allan

Hanno Klemm

2016-10-16 09:52:57 UTC

Given this behaviour, I'm actually more concerned about the logic ma.convolve uses in the propagate_mask=False case. It appears that the masked values are essentially replaced by zero. Is my interpretation correct and if so does this make sense?

When I have similar situations, I usually interpolate between the valid values. I assume there are a lot of use cases for convolutions but I have difficulties imagining that ignoring a missing value and, for the purpose of the computation, treating it as zero is useful in many of them.
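
The difference between the two treatments can be sketched in plain Python. This is only an illustration under stated assumptions: `None` stands in for a masked value, the helper names are hypothetical, and `fill_linear` assumes that the first and last elements are valid.

```python
def convolve_full(a, b):
    """Plain full 1-D convolution of two lists of numbers."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def fill_zero(a):
    """Replace masked entries (None) by zero."""
    return [0.0 if x is None else float(x) for x in a]


def fill_linear(a):
    """Replace interior masked entries by linear interpolation between the
    nearest valid neighbours (assumes both end points are valid)."""
    valid = [i for i, x in enumerate(a) if x is not None]
    out = [None if x is None else float(x) for x in a]
    for lo, hi in zip(valid, valid[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            out[i] = (1 - t) * a[lo] + t * a[hi]
    return out


m = None  # stand-in for a masked element
a = [1, 1, 1, m, 1, 1, 1, m, m, m, 1, 1, 1]
b = [1, 1, 1]
zero_filled = convolve_full(fill_zero(a), b)
interpolated = convolve_full(fill_linear(a), b)
print(zero_filled)
print(interpolated)
```

With zero filling the sums dip wherever a gap falls inside the window; with interpolation this constant signal yields the same result as if no values were missing.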

Hanno

Pierre Haessig

2016-10-17 17:01:14 UTC

Hi,

When estimating the autocorrelation of a signal, it makes sense to drop
missing pairs of values. In this use case, though, it raises the question of
whether or not to correct for the number of missing elements when
computing the mean. I don't remember what the R function "acf" does.
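
A minimal sketch of that correction, in plain Python. The function name `lagged_products` is hypothetical, `None` marks a missing value, and no mean is subtracted, so this shows only the pair-dropping and counting step rather than a full autocorrelation estimator.

```python
def lagged_products(x, max_lag):
    """For each lag, sum x[t] * x[t + lag] over the pairs in which both
    values are present, and count how many such pairs there are."""
    sums, counts = [], []
    for lag in range(max_lag + 1):
        pairs = [(u, v) for u, v in zip(x, x[lag:])
                 if u is not None and v is not None]
        sums.append(sum(u * v for u, v in pairs))
        counts.append(len(pairs))
    return sums, counts


# Alternating signal (+1, -1, ...) with three missing samples.
x = [1.0, -1.0, None, -1.0, 1.0, -1.0, None, None, 1.0, -1.0, 1.0, -1.0]
sums, counts = lagged_products(x, 2)

# Dividing by the full length ignores the missing pairs ...
uncorrected = [s / len(x) for s in sums]
# ... while dividing by the number of valid pairs corrects for them.
corrected = [s / c for s, c in zip(sums, counts)]
print(counts)
print(uncorrected)
print(corrected)
```

For this signal the corrected averages recover the alternating pattern exactly, while the uncorrected ones shrink towards zero as more pairs go missing.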

Also, coming back to the initial question, I feel that it is necessary
that the word "mask" (or "na" or similar) appears in the parameter name.
Otherwise, people will wonder: "what on earth is contagious/being
propagated...."

Just thinking of yet another keyword name: ignore_masked (or drop_masked).

If I remember correctly, in R it is dropna. It would be nice if the boolean
switch followed the same logic.

Now of course the convolution function is more general than just
autocorrelation...

best,
Pierre

j***@gmail.com

2016-10-18 17:25:37 UTC

I think "drop" or "ignore" is too generic; for correlation it could mean,
for example, ignoring pairs versus ignoring cases.

To me, propagate sounds ok, but something with `valid` might be
more explicit for convolution or `correlate`. However, `valid` also
refers to the end points, so maybe valid_na or valid_masked=True.

Josef

j***@gmail.com

2016-10-18 17:30:52 UTC

As an aside: statsmodels now has an option for acf and similar functions:

missing : str
    A string in ['none', 'raise', 'conservative', 'drop'] specifying how
    the NaNs are to be treated.

Josef

j***@gmail.com

2016-10-18 17:49:13 UTC

aside to the aside: statsmodels was just catching up in this

The original for masked array acf including correct counting of "valid" terms is

https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/lib/avcf.py

(which I looked at way before statsmodels had any acf)

Josef

Allan Haldane

2016-10-18 22:37:56 UTC

There is an old unimplemented NEP which uses similar language, like
"ignorena", and np.NA.

http://docs.scipy.org/doc/numpy/neps/missing-data.html

But right now that isn't part of numpy, so I think it would be confusing
to use that terminology.

Allan

Allan Haldane

2016-10-18 23:18:18 UTC

Based on feedback so far, I think "propagate_mask" sounds like the best
word to use. Let's go with that.

As for whether it should default to "True" or "False", the arguments I
see are:

* False, because that is the way most functions like `np.ma.sum`
already work, as well as matlab and octave's similar "nanconv".

* True, because its effects are more visible and might lead to fewer
surprises. The "False" case seems like it is often not what the user
intended. E.g., it affects the overall normalization of normalized
kernels, and the choice of 0 seems arbitrary.

If no one says anything, I'd probably go with True.
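
The normalization point can be illustrated with a small pure-Python sketch. It is not NumPy's implementation: the function name and the `renormalize` flag are hypothetical, `None` marks a masked value, the kernel is assumed symmetric, and positions beyond the ends of the array are treated like masked values.

```python
def moving_average_ignoring_masked(a, kernel, renormalize):
    """Same-length weighted moving average of a list (None = masked) with a
    symmetric kernel; masked values contribute nothing to the sum."""
    half = len(kernel) // 2
    out = []
    for k in range(len(a)):
        total, weight = 0.0, 0.0
        for j, w in enumerate(kernel):
            i = k + j - half
            if 0 <= i < len(a) and a[i] is not None:
                total += w * a[i]
                weight += w
        if weight == 0.0:
            out.append(None)            # no valid value under the window
        elif renormalize:
            out.append(total / weight)  # divide by the weights actually used
        else:
            out.append(total)           # same as treating masked values as 0
    return out


a = [4, 4, None, 4, 4]
kernel = [0.25, 0.5, 0.25]  # sums to 1
plain = moving_average_ignoring_masked(a, kernel, renormalize=False)
fixed = moving_average_ignoring_masked(a, kernel, renormalize=True)
print(plain)
print(fixed)
```

Without renormalization the smoothed value of a constant signal drops wherever part of the kernel lands on a masked element, even though no output is masked.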

Allan

Stephan Hoyer

2016-10-18 23:44:03 UTC

I also have serious concerns about whether it ever actually makes sense to use
`propagate_mask=False`.

So, I think it's definitely appropriate to default to `propagate_mask=True`.

Pierre Haessig

2016-10-19 08:10:18 UTC

Sounds good!

Pierre

Allan Haldane

2016-10-18 22:49:16 UTC

Post by Hanno Klemm

m = np.ma.masked
a = np.ma.array([1,1,1,m,1,1,1,m,m,m,1,1,1])
b = np.ma.array([1,1,1])
print np.ma.convolve(a, b, propagate_mask=True)

[1 2 3 -- -- -- 3 -- -- -- -- -- 3 2 1]