Discussion:
[Numpy-discussion] how to name "contagious" keyword in np.ma.convolve
Allan Haldane
2016-10-14 17:00:28 UTC
Hi all,

Eric Wieser has a PR which defines new functions np.ma.correlate and
np.ma.convolve:

https://github.com/numpy/numpy/pull/7922

We're deciding how to name the keyword arg which determines whether
masked elements are "propagated" in the convolution sums. Currently we
are leaning towards calling it "contagious", with default of True:

def convolve(a, v, mode='full', contagious=True):

Any thoughts?

Cheers,
Allan
Sebastian Berg
2016-10-14 17:08:17 UTC
Post by Allan Haldane
We're deciding how to name the keyword arg which determines whether
masked elements are "propagated" in the convolution sums. Currently we
are leaning towards calling it "contagious", with default of True.
Any thoughts?
Sounds a bit odd to me, to be honest. Just brainstorming: maybe you
could think of/name it the other way around? Should the masked
values be considered as zero/ignored?

- Sebastian
_______________________________________________
NumPy-Discussion mailing list
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Benjamin Root
2016-10-14 17:44:50 UTC
Why not "propagated"?
Allan Haldane
2016-10-14 18:23:09 UTC
I think the possibilities that have been mentioned so far (here or in
the PR) are:

contagious
contagious_mask
propagate
propagate_mask
propagated

`propagate_mask=False` seemed to imply that the mask would never be set,
so Eric also suggested
propagate_mask='any' or propagate_mask='all'.


I would be happy with 'propagated=False' as the name/default. As Eric
pointed out, most MaskedArray functions, like sum, currently don't
propagate the mask (masked values are implicitly skipped), so maybe we
should do likewise here.
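For reference, a quick check of the existing `np.ma.sum` behaviour mentioned here (standard NumPy, nothing from the PR): masked entries are skipped, which is the same as treating them as 0, and the result is itself masked only when every element is masked.

```python
import numpy as np

# Masked entries are skipped (equivalent to filling them with 0).
x = np.ma.array([1, 2, 3], mask=[False, False, True])
partial = x.sum()

# If every entry is masked, the sum is the masked constant itself.
y = np.ma.array([1, 2, 3], mask=True)
empty = y.sum()

print(partial)                 # 3
print(empty is np.ma.masked)   # True
```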


Allan
Juan Nunez-Iglesias
2016-10-14 23:49:48 UTC
+1 for propagate_mask. That is the only proposal that immediately makes sense to me. "contagious" may be cute but I think approximately 0% of users would guess its purpose on first use.

Can you elaborate on what happens with the masks exactly? I didn't quite get why propagate_mask=False was unintuitive. My expectation is that any mask present in the input will not be set in the output, but the mask will be "respected" by the function.
Allan Haldane
2016-10-16 01:21:13 UTC
Post by Juan Nunez-Iglesias
Can you elaborate on what happens with the masks exactly? I didn't quite
get why propagate_mask=False was unintuitive. My expectation is that any
mask present in the input will not be set in the output, but the mask
will be "respected" by the function.
Here's an illustration of how the PR currently works with convolve:
>>> import numpy as np
>>> m = np.ma.masked
>>> a = np.ma.array([1,1,1,m,1,1,1,m,m,m,1,1,1])
>>> b = np.ma.array([1,1,1])
>>> print(np.ma.convolve(a, b, propagate_mask=True))
[1 2 3 -- -- -- 3 -- -- -- -- -- 3 2 1]
>>> print(np.ma.convolve(a, b, propagate_mask=False))
[1 2 3 2 2 2 3 2 1 -- 1 2 3 2 1]
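A minimal sketch that reproduces both outputs above using only `np.convolve`, assuming the semantics visible in the illustration: masked inputs are filled with 0, and the output mask follows Eric's 'any' rule for `propagate_mask=True` and an 'all' rule (no valid pair contributes) for `propagate_mask=False`. The function name `masked_convolve` is hypothetical, only `mode='full'` is handled, and this is not the PR's actual code.

```python
import numpy as np


def masked_convolve(a, v, propagate_mask=True):
    """Hypothetical sketch of the two masking behaviours (mode='full' only)."""
    a = np.ma.asarray(a)
    v = np.ma.asarray(v)
    a_bad = np.ma.getmaskarray(a).astype(int)
    v_bad = np.ma.getmaskarray(v).astype(int)

    # Data: masked inputs are replaced by 0 before convolving.
    data = np.convolve(a.filled(0), v.filled(0))

    if propagate_mask:
        # 'any' rule: mask an output if any element of a or v that
        # overlaps it is masked.
        hits = (np.convolve(a_bad, np.ones(v.size, dtype=int))
                + np.convolve(np.ones(a.size, dtype=int), v_bad))
        mask = hits > 0
    else:
        # 'all' rule: mask an output only if no pair of unmasked
        # elements contributes to it.
        valid_pairs = np.convolve(1 - a_bad, 1 - v_bad)
        mask = valid_pairs == 0

    return np.ma.array(data, mask=mask)


# Same input as the illustration, with the mask given explicitly (keeps int dtype).
a = np.ma.array([1] * 13, mask=[0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0])
b = np.ma.array([1, 1, 1])
print(masked_convolve(a, b, propagate_mask=True))
print(masked_convolve(a, b, propagate_mask=False))
```

The two printed rows should carry the same values and masked positions as Allan's listing.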

Allan
Hanno Klemm
2016-10-16 09:52:57 UTC
Permalink
Given this behaviour, I'm actually more concerned about the logic ma.convolve uses in the propagate_mask=False case. It appears that the masked values are essentially replaced by zero. Is my interpretation correct, and if so, does this make sense?

In similar situations, I usually interpolate between the valid values. I assume there are a lot of use cases for convolutions, but I have difficulty imagining that ignoring a missing value, i.e. treating it as zero for the purpose of the computation, is useful in many of them.
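Hanno's alternative, interpolating across gaps before convolving, could look roughly like this. The helper `interpolate_masked` is hypothetical; it uses linear interpolation via `np.interp`, which repeats the nearest valid value beyond the ends, and it requires at least one unmasked sample.

```python
import numpy as np


def interpolate_masked(x):
    """Hypothetical helper: fill masked samples of a 1-D array by linear interpolation."""
    x = np.ma.asarray(x, dtype=float)
    bad = np.ma.getmaskarray(x)
    idx = np.arange(x.size)
    out = x.filled(np.nan).copy()
    # np.interp needs at least one valid sample; beyond the ends it
    # repeats the first/last valid value.
    out[bad] = np.interp(idx[bad], idx[~bad], out[~bad])
    return out


signal = np.ma.array([0.0, 1.0, 2.0, 99.0, 4.0], mask=[0, 0, 0, 1, 0])
filled = interpolate_masked(signal)   # the masked 99.0 becomes 3.0
smoothed = np.convolve(filled, np.ones(3) / 3, mode="valid")
print(filled, smoothed)
```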

Hanno
Pierre Haessig
2016-10-17 17:01:14 UTC
Hi,
Post by Hanno Klemm
When I have similar situations, I usually interpolate between the valid values. I assume there are a lot of use cases for convolutions but I have difficulties imagining that ignoring a missing value and, for the purpose of the computation, treating it as zero is useful in many of them.
When estimating the autocorrelation of a signal, it makes sense to drop
missing pairs of values. In this use case, though, it raises the question of
whether or not to correct for the number of missing elements when
computing the mean. I don't remember what R's "acf" function does.
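One way to estimate an autocorrelation that drops missing pairs, as described here, is to average each lag's products only over pairs where both samples are present. This is a hypothetical sketch (`masked_autocorr` is not an existing function); it normalizes each lag by its own count of valid pairs, which is only one of the possible corrections, and it centers with the mean of the valid samples.

```python
import numpy as np


def masked_autocorr(x, max_lag):
    """Hypothetical sketch: autocorrelation that drops pairs with a masked value."""
    x = np.ma.asarray(x, dtype=float)
    ok = ~np.ma.getmaskarray(x)
    # Center with the mean of the valid samples, then zero the masked ones.
    centered = (x - x.mean()).filled(0.0)
    n = x.size
    acov = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        both = ok[:n - k] & ok[k:]          # pairs (t, t + k) with both present
        count = both.sum()
        prod = centered[:n - k] * centered[k:]
        # Normalize by the number of valid pairs at this lag (NaN if none).
        acov[k] = prod[both].sum() / count if count else np.nan
    return acov / acov[0]


x = np.ma.array([1.0, 2.0, 100.0, 4.0, 5.0, 3.0], mask=[0, 0, 1, 0, 0, 0])
print(masked_autocorr(x, 2))
```

Dividing by the full length `n` instead of `count` would give the other convention; which one is preferable is exactly the open question raised above.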


Also, coming back to the initial question, I feel it is necessary for
the word "mask" (or "na" or similar) to appear in the parameter name.
Otherwise, people will wonder: "what on earth is contagious/being
propagated?"

Just thinking of yet another keyword name: ignore_masked (or drop_masked).

If I remember correctly, in R it is dropna. It would be nice if the boolean
switch followed the same logic.

Now of course the convolution function is more general than just
autocorrelation...

best,
Pierre
j***@gmail.com
2016-10-18 17:25:37 UTC
Post by Pierre Haessig
just thinking of yet another keyword name : ignore_masked (or drop_masked)
If I remember well, in R it is dropna. It would be nice if the boolean
switch followed the same logic.
Now of course the convolution function is more general than just
autocorrelation...
I think "drop" or "ignore" is too generic; for correlation, for example,
it could mean ignoring pairs versus ignoring cases.

To me propagate sounds OK, but something with `valid` might be
more explicit for `convolve` or `correlate`. However, `valid` also
refers to the end points (as in `mode='valid'`), so maybe valid_na or valid_masked=True.
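For context on the `valid` naming clash: `np.convolve` and `np.correlate` already use `mode='valid'` to mean "keep only the positions where the kernel fully overlaps the input", as this small check of output lengths shows.

```python
import numpy as np

a = np.ones(13)
v = np.ones(3)

# Output length per mode: full = 13 + 3 - 1, same = 13, valid = 13 - 3 + 1.
sizes = {mode: np.convolve(a, v, mode=mode).size for mode in ("full", "same", "valid")}
print(sizes)  # {'full': 15, 'same': 13, 'valid': 11}

# In 'valid' mode every output sees three real samples, so there are no edge effects.
print(np.convolve(a, v, mode="valid"))
```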

Josef
j***@gmail.com
2016-10-18 17:30:52 UTC
Post by Pierre Haessig
When estimating the autocorrelation of a signal, it make sense to drop
missing pairs of values. Only in this use case, it opens the question of
correcting or not correcting for the number of missing elements when
computing the mean. I don't remember what R function "acf" is doing.
As an aside: statsmodels now has an option for acf and similar functions:

missing : str
    A string in ['none', 'raise', 'conservative', 'drop'] specifying how
    the NaNs are to be treated.

Josef
j***@gmail.com
2016-10-18 17:49:13 UTC
Post by j***@gmail.com
as aside: statsmodels has now an option for acf and similar
missing : str
A string in ['none', 'raise', 'conservative', 'drop']
specifying how the NaNs
are to be treated.
Aside to the aside: statsmodels was just catching up here.

The original masked-array acf, including correct counting of "valid" terms, is

https://github.com/pierregm/scikits.timeseries/blob/master/scikits/timeseries/lib/avcf.py

(which I looked at way before statsmodels had any acf)

Josef
Allan Haldane
2016-10-18 22:37:56 UTC
Post by Pierre Haessig
just thinking of yet another keyword name : ignore_masked (or drop_masked)
If I remember well, in R it is dropna. It would be nice if the boolean
switch followed the same logic.
There is an old unimplemented NEP which uses similar language, like
"ignorena", and np.NA.

http://docs.scipy.org/doc/numpy/neps/missing-data.html

But right now that isn't part of numpy, so I think it would be confusing
to use that terminology.

Allan
Allan Haldane
2016-10-18 23:18:18 UTC
Post by Pierre Haessig
Also, coming back to the initial question, I feel that it is necessary
that the name "mask" (or "na" or similar) appears in the parameter name.
Otherwise, people will wonder : "what on earth is contagious/being
propagated...."
Based on feedback so far, I think "propagate_mask" sounds like the best
word to use. Let's go with that.

As for whether it should default to "True" or "False", the arguments I
see are:

* False, because that is the way most functions like `np.ma.sum`
already work, as do MATLAB's and Octave's similar "nanconv".

* True, because its effects are more visible and might lead to fewer
surprises. The "False" case often seems not to be what the user
intended. E.g., it affects the overall normalization of normalized
kernels, and the choice of 0 seems arbitrary.
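The normalization point can be seen with a 3-point moving average: under the zero-fill rule, outputs next to a masked sample shrink from 1 to 2/3. Dividing by the kernel weight that actually fell on valid samples (the simplest form of what is often called normalized convolution) is one common correction; it is shown here only for comparison and is not something proposed for `np.ma.convolve` in this thread.

```python
import numpy as np

kernel = np.ones(3) / 3            # normalized: weights sum to 1
signal = np.ma.array(np.ones(7), mask=[0, 0, 0, 1, 0, 0, 0])

# Zero-fill rule (the propagate_mask=False data): outputs near the gap drop to 2/3.
zero_filled = np.convolve(signal.filled(0.0), kernel, mode="valid")

# For comparison only: divide by the kernel weight that landed on valid samples.
# (An output whose window is entirely masked would divide by zero.)
weight = np.convolve((~np.ma.getmaskarray(signal)).astype(float), kernel, mode="valid")
renormalized = zero_filled / weight

print(zero_filled)
print(renormalized)
```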

If no one says anything, I'd probably go with True.

Allan
Stephan Hoyer
2016-10-18 23:44:03 UTC
Post by Allan Haldane
If no one says anything, I'd probably go with True
I also have serious concerns about whether it ever actually makes sense to use
`propagate_mask=False`.

So, I think it's definitely appropriate to default to `propagate_mask=True`.
Pierre Haessig
2016-10-19 08:10:18 UTC
Post by Allan Haldane
If no one says anything, I'd probably go with True.
Sounds good!

Pierre

Allan Haldane
2016-10-18 22:49:16 UTC
Post by Hanno Klemm
Given this behaviour, I'm actually more concerned about the logic ma.convolve uses in the propagate_mask=False case. It appears that the masked values are essentially replaced by zero. Is my interpretation correct and if so does this make sense?
I think that's right.

Its usefulness wasn't obvious to me either, but searching online shows that
MATLAB users like the file "nanconv.m", which works this way, using
NaNs much as the mask is used here.

Just as convolution functions often add zero-padding around an image,
here the mask behavior would allow you to have different borders, e.g.
[m,m,m,1,1,1,1,m,m,m,m]
using my notation from before.
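A minimal sketch of the border idea with plain `np.convolve`, assuming the `propagate_mask=False` semantics from Allan's earlier illustration (masked samples act as zeros, and outputs with no valid contributor are masked). For the border `[m,m,m,1,1,1,1,m,m,m,m]`, the unmasked outputs should equal an ordinary zero-padded convolution of the four valid samples.

```python
import numpy as np

kernel = np.ones(3)
# [m,m,m,1,1,1,1,m,m,m,m]: four valid samples inside a masked border.
x = np.ma.array(np.ones(11), mask=[1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])

valid = (~np.ma.getmaskarray(x)).astype(float)
data = np.convolve(x.filled(0.0), kernel)        # masked border acts like zeros
no_valid = np.convolve(valid, kernel) == 0       # outputs that see only the border
result = np.ma.array(data, mask=no_valid)

# The unmasked outputs match a zero-padded convolution of the valid core.
core = np.convolve(np.ones(4), kernel)
print(result)
print(np.array_equal(result.compressed(), core))  # True
```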

Octave's "nanconv" does this too.

I still agree that in most cases people should handle the missing
values more carefully (manually) if they are doing convolutions, but
I think this default behaviour may still be reasonable.

Allan